Publications from Uppsala University (uu.se)
Kulmizev, Artur
Publications (10 of 10)
Kulmizev, A. & Nivre, J. (2023). Investigating UD Treebanks via Dataset Difficulty Measures. In: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics. Paper presented at the 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Dubrovnik, Croatia, May 2-6, 2023 (pp. 1076-1089). Dubrovnik, Croatia: Association for Computational Linguistics.
Investigating UD Treebanks via Dataset Difficulty Measures
2023 (English). In: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia: Association for Computational Linguistics, 2023, p. 1076-1089. Conference paper, Published paper (Refereed).
Abstract [en]

Treebanks annotated with Universal Dependencies (UD) are currently available for over 100 languages and are widely utilized by the community. However, their inherent characteristics are hard to measure and are only partially reflected in parser evaluations via accuracy metrics like LAS. In this study, we analyze a large subset of the UD treebanks using three recently proposed accuracy-free dataset analysis methods: dataset cartography, 𝒱-information, and minimum description length. Each method provides insights about UD treebanks that would remain undetected if only LAS was considered. Specifically, we identify a number of treebanks that, despite yielding high LAS, contain very little information that is usable by a parser to surpass what can be achieved by simple heuristics. Furthermore, we make note of several treebanks that score consistently low across numerous metrics, indicating a high degree of noise or annotation inconsistency present therein.
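To make the first of the three measures concrete, dataset cartography summarizes each training example by the model's confidence in the gold label across training epochs (mean) and its variability (standard deviation). The following is a minimal, self-contained sketch of that bookkeeping; the function name and the per-epoch probabilities are invented for illustration, not taken from the paper's experiments:

```python
# Hedged sketch of dataset cartography: per-example training dynamics are
# summarized by mean confidence (p(gold label)) and its variability.
# The probability sequences below are illustrative stand-ins for real model output.
from statistics import mean, pstdev

def cartography_stats(gold_probs_per_epoch):
    """gold_probs_per_epoch: p(gold label) for one example, one value per epoch."""
    confidence = mean(gold_probs_per_epoch)     # high -> "easy to learn"
    variability = pstdev(gold_probs_per_epoch)  # high -> "ambiguous"
    return confidence, variability

# Three hypothetical examples observed over five epochs:
easy = [0.9, 0.95, 0.97, 0.98, 0.99]   # consistently high confidence
hard = [0.05, 0.1, 0.08, 0.12, 0.1]    # consistently low: likely noisy or hard
ambiguous = [0.2, 0.8, 0.3, 0.9, 0.4]  # high variability across epochs

for name, probs in [("easy", easy), ("hard", hard), ("ambiguous", ambiguous)]:
    conf, var = cartography_stats(probs)
    print(f"{name}: confidence={conf:.2f}, variability={var:.2f}")
```

Treebanks whose examples cluster in the low-confidence region would be candidates for the noise or annotation inconsistency the abstract describes.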

Place, publisher, year, edition, pages
Dubrovnik, Croatia: Association for Computational Linguistics, 2023
Keywords
computational linguistics, syntax, universal dependencies, parsing, natural language processing
National Category
General Language Studies and Linguistics Computer Sciences
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-508035 (URN)
Conference
The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Dubrovnik, Croatia, May 2-6, 2023
Available from: 2023-07-18 Created: 2023-07-18 Last updated: 2023-08-11. Bibliographically approved.
Kulmizev, A. (2023). The Search for Syntax: Investigating the Syntactic Knowledge of Neural Language Models Through the Lens of Dependency Parsing. (Doctoral dissertation). Uppsala: Acta Universitatis Upsaliensis
The Search for Syntax: Investigating the Syntactic Knowledge of Neural Language Models Through the Lens of Dependency Parsing
2023 (English). Doctoral thesis, comprehensive summary (Other academic).
Abstract [en]

Syntax, the study of the hierarchical structure of language, has long featured as a prominent research topic in the field of natural language processing (NLP). Traditionally, its role in NLP was confined to developing parsers: supervised algorithms tasked with predicting the structure of utterances (often for use in downstream applications). More recently, however, syntax (and syntactic theory) has factored much less into the development of NLP models, and much more into their analysis. This has been particularly true with the nascent relevance of language models: semi-supervised algorithms trained to predict (or infill) strings given a provided context. In this dissertation, I describe four separate studies that seek to explore the interplay between syntactic parsers and language models against the backdrop of dependency syntax. In the first study, I investigate the error profiles of neural transition-based and graph-based dependency parsers, showing that they are effectively homogenized when leveraging representations from pre-trained language models. Following this, I report the results of two additional studies which show that dependency tree structure can be partially decoded from the internal components of neural language models, specifically hidden state representations and self-attention distributions. I then expand on these findings by exploring a set of additional results, which serve to highlight the influence of experimental factors, such as the choice of annotation framework or learning objective, in decoding syntactic structure from model components. In the final study, I describe efforts to quantify the overall learnability of a large set of multilingual dependency treebanks, the data on which the previous experiments were based, and how it may be affected by factors such as annotation quality or tokenization decisions.
Finally, I conclude the thesis with a conceptual analysis that relates the aforementioned studies to a broader body of work concerning the syntactic knowledge of language models.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2023. p. 101
Series
Studia Linguistica Upsaliensia, ISSN 1652-1366 ; 30
Keywords
syntax, language models, dependency parsing, universal dependencies
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-508379 (URN), 978-91-513-1850-9 (ISBN)
Public defence
2023-09-22, Humanistiska Teatern, Engelska parken, Thunbergsvägen 3C, Uppsala, 14:00 (English)
Available from: 2023-08-24 Created: 2023-07-30 Last updated: 2023-08-24
Kulmizev, A. & Nivre, J. (2022). Schrödinger's tree: On syntax and neural language models. Frontiers in Artificial Intelligence, 5, Article ID 796788.
Schrödinger's tree: On syntax and neural language models
2022 (English). In: Frontiers in Artificial Intelligence, E-ISSN 2624-8212, Vol. 5, article id 796788. Article in journal (Refereed), Published.
Abstract [en]

In the last half-decade, the field of natural language processing (NLP) has undergone two major transitions: the switch to neural networks as the primary modeling paradigm and the homogenization of the training regime (pre-train, then fine-tune). Amidst this process, language models have emerged as NLP's workhorse, displaying increasingly fluent generation capabilities and proving to be an indispensable means of knowledge transfer downstream. Due to the otherwise opaque, black-box nature of such models, researchers have employed aspects of linguistic theory in order to characterize their behavior. Questions central to syntax, the study of the hierarchical structure of language, have factored heavily into such work, shedding invaluable insights about models' inherent biases and their ability to make human-like generalizations. In this paper, we attempt to take stock of this growing body of literature. In doing so, we observe a lack of clarity across numerous dimensions, which influences the hypotheses that researchers form, as well as the conclusions they draw from their findings. To remedy this, we urge researchers to make careful considerations when investigating coding properties, selecting representations, and evaluating via downstream tasks. Furthermore, we outline the implications of the different types of research questions exhibited in studies on syntax, as well as the inherent pitfalls of aggregate metrics. Ultimately, we hope that our discussion adds nuance to the prospect of studying language models and paves the way for a less monolithic perspective on syntax in this context.

Place, publisher, year, edition, pages
Frontiers Media S.A., 2022
Keywords
neural networks, language models, syntax, coding properties, representations, natural language understanding
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-492066 (URN), 10.3389/frai.2022.796788 (DOI), 000915268600001 (), 36325030 (PubMedID)
Funder
Uppsala University
Available from: 2023-01-01 Created: 2023-01-01 Last updated: 2023-07-30. Bibliographically approved.
Abdou, M., Ravishankar, V., Kulmizev, A. & Søgaard, A. (2022). Word Order Does Matter (And Shuffled Language Models Know It). In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol. 1 (Long Papers). Paper presented at the 60th Annual Meeting of the Association for Computational Linguistics (ACL), May 22-27, 2022, Dublin, Ireland (pp. 6907-6919). Association for Computational Linguistics.
Word Order Does Matter (And Shuffled Language Models Know It)
2022 (English). In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol. 1 (Long Papers), Association for Computational Linguistics, 2022, p. 6907-6919. Conference paper, Published paper (Refereed).
Abstract [en]

Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE, putting into question the importance of word order information. Somewhat counter-intuitively, some of these studies also report that position embeddings appear to be crucial for models' good performance with shuffled text. We probe these language models for word order information and investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order. We show this is in part due to a subtlety in how shuffling is implemented in previous work: before rather than after subword segmentation. Surprisingly, we find that even language models trained on text shuffled after subword segmentation retain some semblance of information about word order because of the statistical dependencies between sentence length and unigram probabilities. Finally, we show that beyond GLUE, a variety of language understanding tasks do require word order information, often to an extent that cannot be learned through fine-tuning.
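The before-vs-after-segmentation subtlety can be made concrete with a toy sketch. The `segment` function below is an invented stand-in for a real subword tokenizer (BPE/WordPiece), not the paper's setup; the point is only that shuffling words first leaves each word's subword pieces adjacent, leaking local order:

```python
# Illustrative sketch: word-level shuffling before subword segmentation keeps
# subword pieces adjacent; shuffling after segmentation scatters them.
import random

random.seed(0)  # fixed seed so the example is reproducible

def segment(word):
    """Toy stand-in for subword segmentation (real work uses BPE/WordPiece)."""
    return [word[:2], "##" + word[2:]] if len(word) > 3 else [word]

sentence = "language models know order".split()

# Shuffle BEFORE segmentation: words are permuted first, so each word's
# pieces (e.g. "la", "##nguage") remain contiguous in the result.
shuffled_words = random.sample(sentence, len(sentence))
before = [piece for w in shuffled_words for piece in segment(w)]

# Shuffle AFTER segmentation: the pieces themselves are scattered.
pieces = [piece for w in sentence for piece in segment(w)]
after = random.sample(pieces, len(pieces))

print("shuffled before segmentation:", before)
print("shuffled after segmentation: ", after)
```

In the first output, every `##`-continuation still directly follows its word-initial piece; in the second, that local order signal is destroyed.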

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2022
National Category
Language Technology (Computational Linguistics) Computer Sciences
Identifiers
urn:nbn:se:uu:diva-484790 (URN), 000828702307003 (), 978-1-955917-21-6 (ISBN)
Conference
60th Annual Meeting of the Association for Computational Linguistics (ACL), May 22-27, 2022, Dublin, Ireland
Available from: 2022-09-19 Created: 2022-09-19 Last updated: 2024-01-15. Bibliographically approved.
Ravishankar, V., Kulmizev, A., Abdou, M., Søgaard, A. & Nivre, J. (2021). Attention Can Reflect Syntactic Structure (If You Let It). In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. Paper presented at the 16th Conference of the European Chapter of the Association for Computational Linguistics, 19-23 April 2021, online (pp. 3031-3045). Association for Computational Linguistics.
Attention Can Reflect Syntactic Structure (If You Let It)
2021 (English). In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, 2021, p. 3031-3045. Conference paper, Published paper (Refereed).
Abstract [en]

Since the popularization of the Transformer as a general-purpose feature encoder for NLP, many studies have attempted to decode linguistic structure from its novel multi-head attention mechanism. However, much of such work focused almost exclusively on English - a language with rigid word order and a lack of inflectional morphology. In this study, we present decoding experiments for multilingual BERT across 18 languages in order to test the generalizability of the claim that dependency syntax is reflected in attention patterns. We show that full trees can be decoded above baseline accuracy from single attention heads, and that individual relations are often tracked by the same heads across languages. Furthermore, in an attempt to address recent debates about the status of attention as an explanatory mechanism, we experiment with fine-tuning mBERT on a supervised parsing objective while freezing different series of parameters. Interestingly, in steering the objective to learn explicit linguistic structure, we find much of the same structure represented in the resulting attention patterns, with interesting differences with respect to which parameters are frozen.
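The decoding setup described above can be sketched in miniature. The paper decodes full trees with a proper maximum-spanning-tree algorithm (Chu-Liu/Edmonds); the simplification below instead greedily attaches each word to the position it attends to most, which does not guarantee a well-formed tree. The attention matrix and function name are invented for illustration:

```python
# Hedged sketch: read head attachments off a single attention head's weights.
toy_sentence = ["<root>", "the", "cat", "sleeps"]

# attn[i][j] = attention weight from token i to token j (each row sums to 1)
attn = [
    [0.7, 0.1, 0.1, 0.1],  # <root> (never treated as a dependent)
    [0.1, 0.1, 0.7, 0.1],  # "the" attends mostly to "cat"
    [0.1, 0.1, 0.1, 0.7],  # "cat" attends mostly to "sleeps"
    [0.7, 0.1, 0.1, 0.1],  # "sleeps" attends mostly to <root>
]

def greedy_heads(attn):
    """Head index for each non-root token (0 = artificial root node)."""
    heads = {}
    for i in range(1, len(attn)):
        # pick the most-attended position other than the token itself
        candidates = [(w, j) for j, w in enumerate(attn[i]) if j != i]
        heads[i] = max(candidates)[1]
    return heads

print(greedy_heads(attn))  # {1: 2, 2: 3, 3: 0}
```

Here the head of "the" (index 1) is "cat" (index 2), and so on; a real MST decoder additionally enforces a single root and acyclicity.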

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2021
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-462768 (URN), 10.18653/v1/2021.eacl-main.264 (DOI), 000863557003010 (), 978-1-954085-02-2 (ISBN)
Conference
16th Conference of the European Chapter of the Association for Computational Linguistics, 19-23 April 2021, online
Funder
Google
Available from: 2022-01-02 Created: 2022-01-02 Last updated: 2023-07-30. Bibliographically approved.
Kulmizev, A., Ravishankar, V., Abdou, M. & Nivre, J. (2020). Do Neural Language Models Show Preferences for Syntactic Formalisms? In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020). Paper presented at the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), July 5-10, 2020 (pp. 4077-4091). Association for Computational Linguistics.
Do Neural Language Models Show Preferences for Syntactic Formalisms?
2020 (English). In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Association for Computational Linguistics, 2020, p. 4077-4091. Conference paper, Published paper (Refereed).
Abstract [en]

Recent work on the interpretability of deep neural language models has concluded that many properties of natural language syntax are encoded in their representational spaces. However, such studies often suffer from limited scope by focusing on a single language and a single linguistic formalism. In this study, we aim to investigate the extent to which the semblance of syntactic structure captured by language models adheres to a surface-syntactic or deep syntactic style of analysis, and whether the patterns are consistent across different languages. We apply a probe for extracting directed dependency trees to BERT and ELMo models trained on 13 different languages, probing for two different syntactic annotation styles: Universal Dependencies (UD), prioritizing deep syntactic relations, and Surface-Syntactic Universal Dependencies (SUD), focusing on surface structure. We find that both models exhibit a preference for UD over SUD - with interesting variations across languages and layers - and that the strength of this preference is correlated with differences in tree shape.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2020
National Category
General Language Studies and Linguistics Computer Sciences
Identifiers
urn:nbn:se:uu:diva-423307 (URN), 000570978204034 (), 978-1-952148-25-5 (ISBN)
Conference
58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), July 5-10, 2020
Available from: 2020-10-23 Created: 2020-10-23 Last updated: 2023-07-30. Bibliographically approved.
Hershcovich, D., de Lhoneux, M., Kulmizev, A., Pejhan, E. & Nivre, J. (2020). Kopsala: Transition-Based Graph Parsing via Efficient Training and Effective Encoding. In: 16th International Conference on Parsing Technologies and IWPT 2020 Shared Task on Parsing Into Enhanced Universal Dependencies. Paper presented at the 16th International Conference on Parsing Technologies (IWPT) - Shared Task on Parsing into Enhanced Universal Dependencies, online, July 9, 2020 (pp. 236-244). Association for Computational Linguistics.
Kopsala: Transition-Based Graph Parsing via Efficient Training and Effective Encoding
Show others...
2020 (English). In: 16th International Conference on Parsing Technologies and IWPT 2020 Shared Task on Parsing Into Enhanced Universal Dependencies, Association for Computational Linguistics, 2020, p. 236-244. Conference paper, Published paper (Refereed).
Abstract [en]

We present Kopsala, the Copenhagen-Uppsala system for the Enhanced Universal Dependencies Shared Task at IWPT 2020. Our system is a pipeline consisting of off-the-shelf models for everything but enhanced graph parsing, and for the latter, a transition-based graph parser adapted from Che et al. (2019). We train a single enhanced parser model per language, using gold sentence splitting and tokenization for training, and rely only on tokenized surface forms and multilingual BERT for encoding. While a bug introduced just before submission resulted in a severe drop in precision, its post-submission fix would bring us to 4th place in the official ranking, according to average ELAS. Our parser demonstrates that a unified pipeline is effective for both Meaning Representation Parsing and Enhanced Universal Dependencies.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2020
National Category
Language Technology (Computational Linguistics) Computer Sciences
Identifiers
urn:nbn:se:uu:diva-422886 (URN), 10.18653/v1/2020.iwpt-1.25 (DOI), 000563425200025 (), 978-1-952148-11-8 (ISBN)
Conference
16th International Conference on Parsing Technologies (IWPT) - Shared Task on Parsing into Enhanced Universal Dependencies, Electronic Network, July 9, 2020
Available from: 2020-10-27 Created: 2020-10-27 Last updated: 2020-10-27. Bibliographically approved.
Kulmizev, A., de Lhoneux, M., Gontrum, J., Fano, E. & Nivre, J. (2019). Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing – A Tale of Two Parsers Revisited. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP). Paper presented at the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), November 3-7, Hong Kong, China (pp. 2755-2768).
Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing – A Tale of Two Parsers Revisited
2019 (English). In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, p. 2755-2768. Conference paper, Published paper (Refereed).
Abstract [en]

Transition-based and graph-based dependency parsers have previously been shown to have complementary strengths and weaknesses: transition-based parsers exploit rich structural features but suffer from error propagation, while graph-based parsers benefit from global optimization but have restricted feature scope. In this paper, we show that, even though some details of the picture have changed after the switch to neural networks and continuous representations, the basic trade-off between rich features and global optimization remains essentially the same. Moreover, we show that deep contextualized word embeddings, which allow parsers to pack information about global sentence structure into local feature representations, benefit transition-based parsers more than graph-based parsers, making the two approaches virtually equivalent in terms of both accuracy and error profile. We argue that the reason is that these representations help prevent search errors and thereby allow transition-based parsers to better exploit their inherent strength of making accurate local decisions. We support this explanation by an error analysis of parsing experiments on 13 languages.
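The "accurate local decisions" of transition-based parsing can be illustrated with a minimal arc-standard sketch. This is a simplified, unlabeled toy system with invented names, not the parsers evaluated in the paper; each action consumes stack/buffer items and commits to one arc locally:

```python
# Minimal unlabeled arc-standard transition system (illustrative sketch).
def parse(words, actions):
    """Apply a transition sequence; return arcs as (head_index, dep_index)."""
    stack, buffer, arcs = [], list(range(len(words))), []
    for act in actions:
        if act == "SHIFT":
            stack.append(buffer.pop(0))
        elif act == "LEFT-ARC":    # top becomes head of second-top
            dep = stack.pop(-2)
            arcs.append((stack[-1], dep))
        elif act == "RIGHT-ARC":   # second-top becomes head of top
            dep = stack.pop()
            arcs.append((stack[-1], dep))
    return arcs

words = ["the", "cat", "sleeps"]
# Attach "the" under "cat", then "cat" under "sleeps":
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "LEFT-ARC"]
print(parse(words, actions))  # [(1, 0), (2, 1)]
```

Each arc is decided from local stack context only; a wrong early action removes words from consideration, which is exactly the error-propagation risk the abstract contrasts with globally optimized graph-based parsing.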

National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-406697 (URN), 000854193302085 ()
Conference
2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), November 3-7, Hong Kong, China
Funder
Swedish Research Council, 2016-01817
Available from: 2020-03-11 Created: 2020-03-11 Last updated: 2023-07-30. Bibliographically approved.
Abdou, M., Kulmizev, A., Hill, F., Low, D. M. & Søgaard, A. (2019). Higher-order Comparisons of Sentence Encoder Representations. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019). Paper presented at the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), November 3-7, 2019, Hong Kong, China (pp. 5838-5845). Association for Computational Linguistics.
Higher-order Comparisons of Sentence Encoder Representations
2019 (English). In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019), Association for Computational Linguistics, 2019, p. 5838-5845. Conference paper, Published paper (Refereed).
Abstract [en]

Representational Similarity Analysis (RSA) is a technique developed by neuroscientists for comparing activity patterns of different measurement modalities (e.g., fMRI, electrophysiology, behavior). As a framework, RSA has several advantages over existing approaches to interpretation of language encoders based on probing or diagnostic classification: namely, it does not require large training samples, is not prone to overfitting, and it enables a more transparent comparison between the representational geometries of different models and modalities. We demonstrate the utility of RSA by establishing a previously unknown correspondence between widely-employed pre-trained language encoders and human processing difficulty via eye-tracking data, showcasing its potential in the interpretability toolbox for neural models.
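The core RSA computation is simple enough to sketch: build a similarity matrix per representation source, then correlate the two matrices' upper triangles. The toy vectors below stand in for encoder states and the second modality (eye-tracking in the paper); for brevity this sketch uses Pearson correlation, whereas RSA typically uses Spearman rank correlation:

```python
# Hedged sketch of RSA: correlate the representational geometries of two
# "modalities" by comparing their pairwise-similarity matrices.
from math import sqrt

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

def rsm(vectors):
    """Representational similarity matrix: pairwise cosine similarities."""
    return [[cosine(u, v) for v in vectors] for u in vectors]

def upper_triangle(m):
    n = len(m)
    return [m[i][j] for i in range(n) for j in range(i + 1, n)]

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Two toy "modalities" representing the same three items:
model_a = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
model_b = [[2.0, 0.1], [1.8, 0.3], [0.1, 2.0]]

score = pearson(upper_triangle(rsm(model_a)), upper_triangle(rsm(model_b)))
print(f"RSA correlation: {score:.3f}")  # near 1.0: similar geometries
```

Because only the second-order similarity structure is compared, the two sources need not share dimensionality, which is what lets RSA bridge encoder representations and behavioral measurements.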

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2019
National Category
Computer Sciences
Identifiers
urn:nbn:se:uu:diva-494310 (URN), 000854193306004 (), 978-1-950737-90-1 (ISBN)
Conference
2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), November 3-7, 2019, Hong Kong, China
Available from: 2023-01-17 Created: 2023-01-17 Last updated: 2023-01-17. Bibliographically approved.
Basirat, A., de Lhoneux, M., Kulmizev, A., Kurfal, M., Nivre, J. & Östling, R. (2019). Polyglot Parsing for One Thousand and One Languages (And Then Some). Paper presented at the First workshop on Typology for Polyglot NLP, Florence, Italy, August 1, 2019.
Polyglot Parsing for One Thousand and One Languages (And Then Some)
2019 (English). Conference paper, Poster (with or without abstract) (Other academic).
National Category
General Language Studies and Linguistics Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:uu:diva-392156 (URN)
Conference
First workshop on Typology for Polyglot NLP, Florence, Italy, August 1 2019
Available from: 2019-08-29 Created: 2019-08-29 Last updated: 2021-01-18. Bibliographically approved.