The Search for Syntax: Investigating the Syntactic Knowledge of Neural Language Models Through the Lens of Dependency Parsing
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Computational Linguistics)
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Description
Abstract [en]

Syntax, the study of the hierarchical structure of language, has long featured as a prominent research topic in the field of natural language processing (NLP). Traditionally, its role in NLP was confined to developing parsers: supervised algorithms tasked with predicting the structure of utterances (often for use in downstream applications). More recently, however, syntax (and syntactic theory) has factored much less into the development of NLP models, and much more into their analysis. This has been particularly true with the growing relevance of language models: semi-supervised algorithms trained to predict (or infill) strings given a provided context. In this dissertation, I describe four separate studies that seek to explore the interplay between syntactic parsers and language models against the backdrop of dependency syntax. In the first study, I investigate the error profiles of neural transition-based and graph-based dependency parsers, showing that they are effectively homogenized when leveraging representations from pre-trained language models. Following this, I report the results of two additional studies which show that dependency tree structure can be partially decoded from the internal components of neural language models, specifically hidden state representations and self-attention distributions. I then expand on these findings by exploring a set of additional results, which serve to highlight the influence of experimental factors, such as the choice of annotation framework or learning objective, in decoding syntactic structure from model components. In the final study, I describe efforts to quantify the overall learnability of a large set of multilingual dependency treebanks (the data upon which the previous experiments were based) and how it may be affected by factors such as annotation quality or tokenization decisions.
Finally, I conclude the thesis with a conceptual analysis that relates the aforementioned studies to a broader body of work concerning the syntactic knowledge of language models.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2023, p. 101
Series
Studia Linguistica Upsaliensia, ISSN 1652-1366 ; 30
Keywords [en]
syntax, language models, dependency parsing, universal dependencies
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-508379
ISBN: 978-91-513-1850-9 (print)
OAI: oai:DiVA.org:uu-508379
DiVA, id: diva2:1784732
Public defence
2023-09-22, Humanistiska Teatern, Engelska parken, Thunbergsvägen 3C, Uppsala, 14:00 (English)
Opponent
Supervisors
Available from: 2023-08-24 Created: 2023-07-30 Last updated: 2023-08-24
List of papers
1. Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing – A Tale of Two Parsers Revisited
2019 (English). In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, p. 2755-2768. Conference paper, Published paper (Refereed)
Abstract [en]

Transition-based and graph-based dependency parsers have previously been shown to have complementary strengths and weaknesses: transition-based parsers exploit rich structural features but suffer from error propagation, while graph-based parsers benefit from global optimization but have restricted feature scope. In this paper, we show that, even though some details of the picture have changed after the switch to neural networks and continuous representations, the basic trade-off between rich features and global optimization remains essentially the same. Moreover, we show that deep contextualized word embeddings, which allow parsers to pack information about global sentence structure into local feature representations, benefit transition-based parsers more than graph-based parsers, making the two approaches virtually equivalent in terms of both accuracy and error profile. We argue that the reason is that these representations help prevent search errors and thereby allow transition-based parsers to better exploit their inherent strength of making accurate local decisions. We support this explanation by an error analysis of parsing experiments on 13 languages.

National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-406697 (URN), 000854193302085 ()
Conference
2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), November 3-7, Hong Kong, China
Funder
Swedish Research Council, 2016-01817
Available from: 2020-03-11 Created: 2020-03-11 Last updated: 2023-07-30 Bibliographically approved
2. Do Neural Language Models Show Preferences for Syntactic Formalisms?
2020 (English). In: 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Association for Computational Linguistics, 2020, p. 4077-4091. Conference paper, Published paper (Refereed)
Abstract [en]

Recent work on the interpretability of deep neural language models has concluded that many properties of natural language syntax are encoded in their representational spaces. However, such studies often suffer from limited scope by focusing on a single language and a single linguistic formalism. In this study, we aim to investigate the extent to which the semblance of syntactic structure captured by language models adheres to a surface-syntactic or deep syntactic style of analysis, and whether the patterns are consistent across different languages. We apply a probe for extracting directed dependency trees to BERT and ELMo models trained on 13 different languages, probing for two different syntactic annotation styles: Universal Dependencies (UD), prioritizing deep syntactic relations, and Surface-Syntactic Universal Dependencies (SUD), focusing on surface structure. We find that both models exhibit a preference for UD over SUD - with interesting variations across languages and layers - and that the strength of this preference is correlated with differences in tree shape.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2020
National Category
General Language Studies and Linguistics; Computer Sciences
Identifiers
urn:nbn:se:uu:diva-423307 (URN), 000570978204034 (), 978-1-952148-25-5 (ISBN)
Conference
58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), July 5-10, 2020
Available from: 2020-10-23 Created: 2020-10-23 Last updated: 2023-07-30 Bibliographically approved
3. Attention Can Reflect Syntactic Structure (If You Let It)
2021 (English). In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, 2021, p. 3031-3045. Conference paper, Published paper (Refereed)
Abstract [en]

Since the popularization of the Transformer as a general-purpose feature encoder for NLP, many studies have attempted to decode linguistic structure from its novel multi-head attention mechanism. However, much of such work focused almost exclusively on English - a language with rigid word order and a lack of inflectional morphology. In this study, we present decoding experiments for multilingual BERT across 18 languages in order to test the generalizability of the claim that dependency syntax is reflected in attention patterns. We show that full trees can be decoded above baseline accuracy from single attention heads, and that individual relations are often tracked by the same heads across languages. Furthermore, in an attempt to address recent debates about the status of attention as an explanatory mechanism, we experiment with fine-tuning mBERT on a supervised parsing objective while freezing different series of parameters. Interestingly, in steering the objective to learn explicit linguistic structure, we find much of the same structure represented in the resulting attention patterns, with interesting differences with respect to which parameters are frozen.

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2021
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-462768 (URN), 10.18653/v1/2021.eacl-main.264 (DOI), 000863557003010 (), 978-1-954085-02-2 (ISBN)
Conference
16th Conference of the European Chapter of the Association for Computational Linguistics, 19-23 April 2021, online
Funder
Google
Available from: 2022-01-02 Created: 2022-01-02 Last updated: 2023-07-30 Bibliographically approved
4. Schrödinger's tree: On syntax and neural language models
2022 (English). In: Frontiers in Artificial Intelligence, E-ISSN 2624-8212, Vol. 5, article id 796788. Article in journal (Refereed), Published
Abstract [en]

In the last half-decade, the field of natural language processing (NLP) has undergone two major transitions: the switch to neural networks as the primary modeling paradigm and the homogenization of the training regime (pre-train, then fine-tune). Amidst this process, language models have emerged as NLP’s workhorse, displaying increasingly fluent generation capabilities and proving to be an indispensable means of knowledge transfer downstream. Due to the otherwise opaque, black-box nature of such models, researchers have employed aspects of linguistic theory in order to characterize their behavior. Questions central to syntax, the study of the hierarchical structure of language, have factored heavily into such work, shedding invaluable insights about models’ inherent biases and their ability to make human-like generalizations. In this paper, we attempt to take stock of this growing body of literature. In doing so, we observe a lack of clarity across numerous dimensions, which influences the hypotheses that researchers form, as well as the conclusions they draw from their findings. To remedy this, we urge researchers to make careful considerations when investigating coding properties, selecting representations, and evaluating via downstream tasks. Furthermore, we outline the implications of the different types of research questions exhibited in studies on syntax, as well as the inherent pitfalls of aggregate metrics. Ultimately, we hope that our discussion adds nuance to the prospect of studying language models and paves the way for a less monolithic perspective on syntax in this context.

Place, publisher, year, edition, pages
Frontiers Media S.A., 2022
Keywords
neural networks, language models, syntax, coding properties, representations, natural language understanding
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-492066 (URN), 10.3389/frai.2022.796788 (DOI), 000915268600001 (), 36325030 (PubMedID)
Funder
Uppsala University
Available from: 2023-01-01 Created: 2023-01-01 Last updated: 2023-07-30 Bibliographically approved
5. Investigating UD Treebanks via Dataset Difficulty Measures
2023 (English). In: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia: Association for Computational Linguistics, 2023, p. 1076-1089. Conference paper, Published paper (Refereed)
Abstract [en]

Treebanks annotated with Universal Dependencies (UD) are currently available for over 100 languages and are widely utilized by the community. However, their inherent characteristics are hard to measure and are only partially reflected in parser evaluations via accuracy metrics like LAS. In this study, we analyze a large subset of the UD treebanks using three recently proposed accuracy-free dataset analysis methods: dataset cartography, 𝒱-information, and minimum description length. Each method provides insights about UD treebanks that would remain undetected if only LAS was considered. Specifically, we identify a number of treebanks that, despite yielding high LAS, contain very little information that is usable by a parser to surpass what can be achieved by simple heuristics. Furthermore, we make note of several treebanks that score consistently low across numerous metrics, indicating a high degree of noise or annotation inconsistency present therein.
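
For readers unfamiliar with the accuracy metric named in this abstract, the following is a minimal sketch of how the labeled attachment score (LAS) is conventionally computed, alongside its unlabeled counterpart (UAS). It is not taken from the paper: the function name `attachment_scores`, the `(head, label)` token encoding, and the toy sentence are assumptions made purely for illustration.

```python
from typing import Sequence, Tuple

# Each token is encoded as (head_index, relation_label); head 0 denotes the root.
# This encoding is an assumption for the sketch, not the paper's data format.
Arc = Tuple[int, str]


def attachment_scores(gold: Sequence[Arc], pred: Sequence[Arc]) -> Tuple[float, float]:
    """Return (UAS, LAS) for one sentence or a concatenated corpus of tokens."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must cover the same tokens")
    if not gold:
        raise ValueError("cannot score an empty analysis")
    total = len(gold)
    # UAS: fraction of tokens whose predicted head matches the gold head.
    uas_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS: fraction of tokens whose head AND relation label both match.
    las_hits = sum(g == p for g, p in zip(gold, pred))
    return uas_hits / total, las_hits / total


# Toy example: "She reads books", with one mislabeled relation in the prediction.
gold = [(2, "nsubj"), (0, "root"), (2, "obj")]
pred = [(2, "nsubj"), (0, "root"), (2, "nmod")]
uas, las = attachment_scores(gold, pred)
print(f"UAS={uas:.3f} LAS={las:.3f}")
```

In the toy example all three heads are correct but one label is wrong, so the two scores differ. Because LAS only counts matches, it says nothing about how much of that accuracy a trivial heuristic could reach, which is the gap the accuracy-free methods in the abstract are meant to address.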

Place, publisher, year, edition, pages
Dubrovnik, Croatia: Association for Computational Linguistics, 2023
Keywords
computational linguistics, syntax, universal dependencies, parsing, natural language processing
National Category
General Language Studies and Linguistics; Computer Sciences
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-508035 (URN)
Conference
The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Dubrovnik, Croatia, May 2-6, 2023
Available from: 2023-07-18 Created: 2023-07-18 Last updated: 2023-08-11 Bibliographically approved

Open Access in DiVA

fulltext (986 kB), 440 downloads
File information
File name: FULLTEXT01.pdf
File size: 986 kB
Checksum (SHA-512):
56e70b9d675551d85b1a87035bd1ac7fd22210abca43ca756b1120afaade24552275532b1e52fe93d0e9ca9d42e4a3067efb1745bbca56c5dd955b04f43da370
Type: fulltext
Mimetype: application/pdf

Authority records

Kulmizev, Artur

Total: 446 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 422 hits