
Publications from Uppsala University
1 - 10 of 10
  • 1.
    Abdou, Mostafa
    et al.
    Univ Copenhagen, Dept Comp Sci, Copenhagen, Denmark.
    Kulmizev, Artur
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Hill, Felix
    DeepMind, London, England.
    Low, Daniel M.
    Harvard Med Sch MIT, Program Speech & Hearing Biosci & Technol, Cambridge, MA 02139 USA.
    Søgaard, Anders
    Univ Copenhagen, Dept Comp Sci, Copenhagen, Denmark.
    Higher-order Comparisons of Sentence Encoder Representations (2019). In: 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP 2019): Proceedings of the Conference, Association for Computational Linguistics, 2019, p. 5838-5845. Conference paper (Refereed)
    Abstract [en]

    Representational Similarity Analysis (RSA) is a technique developed by neuroscientists for comparing activity patterns of different measurement modalities (e.g., fMRI, electrophysiology, behavior). As a framework, RSA has several advantages over existing approaches to interpretation of language encoders based on probing or diagnostic classification: namely, it does not require large training samples, is not prone to overfitting, and it enables a more transparent comparison between the representational geometries of different models and modalities. We demonstrate the utility of RSA by establishing a previously unknown correspondence between widely-employed pre-trained language encoders and human processing difficulty via eye-tracking data, showcasing its potential in the interpretability toolbox for neural models.
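    Illustratively, the second-order comparison at the heart of RSA can be sketched in a few lines (a hedged example with random placeholder data, not the authors' code): each encoder's sentence representations are reduced to a representational dissimilarity matrix (RDM) of pairwise distances, and the two RDMs are then rank-correlated.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr


def rdm(representations: np.ndarray) -> np.ndarray:
    """Representational dissimilarity matrix: condensed pairwise cosine distances."""
    return pdist(representations, metric="cosine")


def rsa_score(reps_a: np.ndarray, reps_b: np.ndarray) -> float:
    """Second-order comparison: Spearman-correlate the two RDMs."""
    rho, _ = spearmanr(rdm(reps_a), rdm(reps_b))
    return float(rho)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reps_a = rng.normal(size=(100, 768))   # stand-in for encoder A's sentence vectors
    reps_b = rng.normal(size=(100, 1024))  # stand-in for encoder B's sentence vectors
    print(f"RSA (Spearman rho) = {rsa_score(reps_a, reps_b):.3f}")
```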

  • 2.
    Abdou, Mostafa
    et al.
    Univ Copenhagen, Dept Comp Sci, Copenhagen, Denmark.
    Ravishankar, Vinit
    Univ Oslo, Dept Informat, Language Technol Grp, Oslo, Norway.
    Kulmizev, Artur
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Søgaard, Anders
    Univ Copenhagen, Dept Comp Sci, Copenhagen, Denmark.
    Word Order Does Matter (And Shuffled Language Models Know It) (2022). In: Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL 2022), Vol. 1 (Long Papers), Association for Computational Linguistics, 2022, p. 6907-6919. Conference paper (Refereed)
    Abstract [en]

    Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE, putting into question the importance of word order information. Somewhat counter-intuitively, some of these studies also report that position embeddings appear to be crucial for models' good performance with shuffled text. We probe these language models for word order information and investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order. We show this is in part due to a subtlety in how shuffling is implemented in previous work - before rather than after subword segmentation. Surprisingly, we find that even language models trained on text shuffled after subword segmentation retain some semblance of information about word order because of the statistical dependencies between sentence length and unigram probabilities. Finally, we show that beyond GLUE, a variety of language understanding tasks do require word order information, often to an extent that cannot be learned through fine-tuning.
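    The segmentation subtlety noted in the abstract can be illustrated with a toy sketch (the `segment` function below is a hypothetical stand-in for a real BPE/WordPiece tokenizer, not the paper's implementation): shuffling before segmentation keeps each word's subword pieces contiguous, whereas shuffling after segmentation scatters them.

```python
import random


def segment(word: str, piece_len: int = 3) -> list:
    """Toy stand-in for a subword tokenizer: split a word into fixed-size pieces."""
    return [word[i:i + piece_len] for i in range(0, len(word), piece_len)]


def shuffle_before_segmentation(sentence: str, rng: random.Random) -> list:
    """Permute whole words, then segment: each word's pieces stay contiguous."""
    words = sentence.split()
    rng.shuffle(words)
    return [piece for word in words for piece in segment(word)]


def shuffle_after_segmentation(sentence: str, rng: random.Random) -> list:
    """Segment first, then permute the pieces themselves: subwords are scattered."""
    pieces = [piece for word in sentence.split() for piece in segment(word)]
    rng.shuffle(pieces)
    return pieces


if __name__ == "__main__":
    sentence = "shuffled language models retain word order information"
    print(shuffle_before_segmentation(sentence, random.Random(42)))
    print(shuffle_after_segmentation(sentence, random.Random(42)))
```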

  • 3.
    Basirat, Ali
    et al.
    de Lhoneux, Miryam
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Kulmizev, Artur
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Kurfal, Murathan
    Department of Linguistics, Stockholm University.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Östling, Robert
    Department of Linguistics, Stockholm University.
    Polyglot Parsing for One Thousand and One Languages (And Then Some) (2019). Conference paper (Other academic)
  • 4.
    Hershcovich, Daniel
    et al.
    Univ Copenhagen, Copenhagen, Denmark.
    de Lhoneux, Miryam
    Univ Copenhagen, Copenhagen, Denmark.
    Kulmizev, Artur
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Pejhan, Elham
    Univ Copenhagen, Copenhagen, Denmark.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Kopsala: Transition-Based Graph Parsing via Efficient Training and Effective Encoding (2020). In: 16th International Conference on Parsing Technologies and IWPT 2020 Shared Task on Parsing Into Enhanced Universal Dependencies, Association for Computational Linguistics, 2020, p. 236-244. Conference paper (Refereed)
    Abstract [en]

    We present Kopsala, the Copenhagen-Uppsala system for the Enhanced Universal Dependencies Shared Task at IWPT 2020. Our system is a pipeline consisting of off-the-shelf models for everything but enhanced graph parsing, and for the latter, a transition-based graph parser adapted from Che et al. (2019). We train a single enhanced parser model per language, using gold sentence splitting and tokenization for training, and rely only on tokenized surface forms and multilingual BERT for encoding. While a bug introduced just before submission resulted in a severe drop in precision, its post-submission fix would bring us to 4th place in the official ranking, according to average ELAS. Our parser demonstrates that a unified pipeline is effective for both Meaning Representation Parsing and Enhanced Universal Dependencies.

  • 5.
    Kulmizev, Artur
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    The Search for Syntax: Investigating the Syntactic Knowledge of Neural Language Models Through the Lens of Dependency Parsing (2023). Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    Syntax — the study of the hierarchical structure of language — has long featured as a prominent research topic in the field of natural language processing (NLP). Traditionally, its role in NLP was confined towards developing parsers: supervised algorithms tasked with predicting the structure of utterances (often for use in downstream applications). More recently, however, syntax (and syntactic theory) has factored much less into the development of NLP models, and much more into their analysis. This has been particularly true with the nascent relevance of language models: semi-supervised algorithms trained to predict (or infill) strings given a provided context. In this dissertation, I describe four separate studies that seek to explore the interplay between syntactic parsers and language models upon the backdrop of dependency syntax. In the first study, I investigate the error profiles of neural transition-based and graph-based dependency parsers, showing that they are effectively homogenized when leveraging representations from pre-trained language models. Following this, I report the results of two additional studies which show that dependency tree structure can be partially decoded from the internal components of neural language models — specifically, hidden state representations and self-attention distributions. I then expand on these findings by exploring a set of additional results, which serve to highlight the influence of experimental factors, such as the choice of annotation framework or learning objective, in decoding syntactic structure from model components. In the final study, I describe efforts to quantify the overall learnability of a large set of multilingual dependency treebanks — the data upon which the previous experiments were based — and how it may be affected by factors such as annotation quality or tokenization decisions. Finally, I conclude the thesis with a conceptual analysis that relates the aforementioned studies to a broader body of work concerning the syntactic knowledge of language models.

    List of papers
    1. Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing – A Tale of Two Parsers Revisited
    2019 (English). In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, p. 2755-2768. Conference paper, Published paper (Refereed)
    Abstract [en]

    Transition-based and graph-based dependency parsers have previously been shown to have complementary strengths and weaknesses: transition-based parsers exploit rich structural features but suffer from error propagation, while graph-based parsers benefit from global optimization but have restricted feature scope. In this paper, we show that, even though some details of the picture have changed after the switch to neural networks and continuous representations, the basic trade-off between rich features and global optimization remains essentially the same. Moreover, we show that deep contextualized word embeddings, which allow parsers to pack information about global sentence structure into local feature representations, benefit transition-based parsers more than graph-based parsers, making the two approaches virtually equivalent in terms of both accuracy and error profile. We argue that the reason is that these representations help prevent search errors and thereby allow transition-based parsers to better exploit their inherent strength of making accurate local decisions. We support this explanation by an error analysis of parsing experiments on 13 languages.

    National Category
    Language Technology (Computational Linguistics)
    Research subject
    Computational Linguistics
    Identifiers
    urn:nbn:se:uu:diva-406697 (URN), 000854193302085 (ISI)
    Conference
    2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), November 3-7, Hong Kong, China
    Funder
    Swedish Research Council, 2016-01817
    Available from: 2020-03-11. Created: 2020-03-11. Last updated: 2023-07-30. Bibliographically approved
    2. Do Neural Language Models Show Preferences for Syntactic Formalisms?
    2020 (English). In: 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Association for Computational Linguistics, 2020, p. 4077-4091. Conference paper, Published paper (Refereed)
    Abstract [en]

    Recent work on the interpretability of deep neural language models has concluded that many properties of natural language syntax are encoded in their representational spaces. However, such studies often suffer from limited scope by focusing on a single language and a single linguistic formalism. In this study, we aim to investigate the extent to which the semblance of syntactic structure captured by language models adheres to a surface-syntactic or deep syntactic style of analysis, and whether the patterns are consistent across different languages. We apply a probe for extracting directed dependency trees to BERT and ELMo models trained on 13 different languages, probing for two different syntactic annotation styles: Universal Dependencies (UD), prioritizing deep syntactic relations, and Surface-Syntactic Universal Dependencies (SUD), focusing on surface structure. We find that both models exhibit a preference for UD over SUD - with interesting variations across languages and layers - and that the strength of this preference is correlated with differences in tree shape.

    Place, publisher, year, edition, pages
    Association for Computational Linguistics, 2020
    National Category
    General Language Studies and Linguistics; Computer Sciences
    Identifiers
    urn:nbn:se:uu:diva-423307 (URN), 000570978204034 (ISI), 978-1-952148-25-5 (ISBN)
    Conference
    58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), July 5-10, 2020
    Available from: 2020-10-23. Created: 2020-10-23. Last updated: 2023-07-30. Bibliographically approved
    3. Attention Can Reflect Syntactic Structure (If You Let It)
    2021 (English). In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, 2021, p. 3031-3045. Conference paper, Published paper (Refereed)
    Abstract [en]

    Since the popularization of the Transformer as a general-purpose feature encoder for NLP, many studies have attempted to decode linguistic structure from its novel multi-head attention mechanism. However, much of such work focused almost exclusively on English - a language with rigid word order and a lack of inflectional morphology. In this study, we present decoding experiments for multilingual BERT across 18 languages in order to test the generalizability of the claim that dependency syntax is reflected in attention patterns. We show that full trees can be decoded above baseline accuracy from single attention heads, and that individual relations are often tracked by the same heads across languages. Furthermore, in an attempt to address recent debates about the status of attention as an explanatory mechanism, we experiment with fine-tuning mBERT on a supervised parsing objective while freezing different series of parameters. Interestingly, in steering the objective to learn explicit linguistic structure, we find much of the same structure represented in the resulting attention patterns, with interesting differences with respect to which parameters are frozen.

    Place, publisher, year, edition, pages
    Association for Computational Linguistics, 2021
    National Category
    Language Technology (Computational Linguistics)
    Research subject
    Computational Linguistics
    Identifiers
    urn:nbn:se:uu:diva-462768 (URN), 10.18653/v1/2021.eacl-main.264 (DOI), 000863557003010 (ISI), 978-1-954085-02-2 (ISBN)
    Conference
    16th Conference of the European Chapter of the Association for Computational Linguistics, 19-23 April, 2021, online
    Funder
    Google
    Available from: 2022-01-02. Created: 2022-01-02. Last updated: 2023-07-30. Bibliographically approved
    4. Schrödinger's tree: On syntax and neural language models
    2022 (English). In: Frontiers in Artificial Intelligence, E-ISSN 2624-8212, Vol. 5, article id 796788. Article in journal (Refereed), Published
    Abstract [en]

    In the last half-decade, the field of natural language processing (NLP) has undergone two major transitions: the switch to neural networks as the primary modeling paradigm and the homogenization of the training regime (pre-train, then fine-tune). Amidst this process, language models have emerged as NLP’s workhorse, displaying increasingly fluent generation capabilities and proving to be an indispensable means of knowledge transfer downstream. Due to the otherwise opaque, black-box nature of such models, researchers have employed aspects of linguistic theory in order to characterize their behavior. Questions central to syntax—the study of the hierarchical structure of language—have factored heavily into such work, shedding invaluable insights about models’ inherent biases and their ability to make human-like generalizations. In this paper, we attempt to take stock of this growing body of literature. In doing so, we observe a lack of clarity across numerous dimensions, which influences the hypotheses that researchers form, as well as the conclusions they draw from their findings. To remedy this, we urge researchers to make careful considerations when investigating coding properties, selecting representations, and evaluating via downstream tasks. Furthermore, we outline the implications of the different types of research questions exhibited in studies on syntax, as well as the inherent pitfalls of aggregate metrics. Ultimately, we hope that our discussion adds nuance to the prospect of studying language models and paves the way for a less monolithic perspective on syntax in this context.

    Place, publisher, year, edition, pages
    Frontiers Media S.A., 2022
    Keywords
    neural networks, language models, syntax, coding properties, representations, natural language understanding
    National Category
    Language Technology (Computational Linguistics)
    Research subject
    Computational Linguistics
    Identifiers
    urn:nbn:se:uu:diva-492066 (URN), 10.3389/frai.2022.796788 (DOI), 000915268600001 (ISI), 36325030 (PubMedID)
    Funder
    Uppsala University
    Available from: 2023-01-01. Created: 2023-01-01. Last updated: 2023-07-30. Bibliographically approved
    5. Investigating UD Treebanks via Dataset Difficulty Measures
    2023 (English). In: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia: Association for Computational Linguistics, 2023, p. 1076-1089. Conference paper, Published paper (Refereed)
    Abstract [en]

    Treebanks annotated with Universal Dependencies (UD) are currently available for over 100 languages and are widely utilized by the community. However, their inherent characteristics are hard to measure and are only partially reflected in parser evaluations via accuracy metrics like LAS. In this study, we analyze a large subset of the UD treebanks using three recently proposed accuracy-free dataset analysis methods: dataset cartography, 𝒱-information, and minimum description length. Each method provides insights about UD treebanks that would remain undetected if only LAS was considered. Specifically, we identify a number of treebanks that, despite yielding high LAS, contain very little information that is usable by a parser to surpass what can be achieved by simple heuristics. Furthermore, we make note of several treebanks that score consistently low across numerous metrics, indicating a high degree of noise or annotation inconsistency present therein.

    Place, publisher, year, edition, pages
    Dubrovnik, Croatia: Association for Computational Linguistics, 2023
    Keywords
    computational linguistics, syntax, universal dependencies, parsing, natural language processing
    National Category
    General Language Studies and Linguistics; Computer Sciences
    Research subject
    Computational Linguistics
    Identifiers
    urn:nbn:se:uu:diva-508035 (URN)
    Conference
    The 17th Conference of the European Chapter of the Association for Computational Linguistics (EACL 2023), Dubrovnik, Croatia, May 2-6, 2023
    Available from: 2023-07-18. Created: 2023-07-18. Last updated: 2023-08-11. Bibliographically approved
  • 6.
    Kulmizev, Artur
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    de Lhoneux, Miryam
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Gontrum, Johannes
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Fano, Elena
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing – A Tale of Two Parsers Revisited (2019). In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, p. 2755-2768. Conference paper (Refereed)
    Abstract [en]

    Transition-based and graph-based dependency parsers have previously been shown to have complementary strengths and weaknesses: transition-based parsers exploit rich structural features but suffer from error propagation, while graph-based parsers benefit from global optimization but have restricted feature scope. In this paper, we show that, even though some details of the picture have changed after the switch to neural networks and continuous representations, the basic trade-off between rich features and global optimization remains essentially the same. Moreover, we show that deep contextualized word embeddings, which allow parsers to pack information about global sentence structure into local feature representations, benefit transition-based parsers more than graph-based parsers, making the two approaches virtually equivalent in terms of both accuracy and error profile. We argue that the reason is that these representations help prevent search errors and thereby allow transition-based parsers to better exploit their inherent strength of making accurate local decisions. We support this explanation by an error analysis of parsing experiments on 13 languages.
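    As a rough sketch of the setup discussed above (shapes and module names are assumptions for illustration, not the paper's architecture), a parser's local token representation can be built by concatenating a static word embedding with a frozen deep contextualized vector; both transition-based and graph-based parsers can then score the same vectors.

```python
import torch
import torch.nn as nn


class ParserTokenEncoder(nn.Module):
    """Builds the per-token input vector a parser (transition- or graph-based)
    would score: a static word embedding concatenated with a frozen deep
    contextualized vector (e.g., one layer of ELMo or BERT output)."""

    def __init__(self, vocab_size: int, word_dim: int = 100, ctx_dim: int = 768, out_dim: int = 256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.proj = nn.Linear(word_dim + ctx_dim, out_dim)

    def forward(self, word_ids: torch.Tensor, ctx_vectors: torch.Tensor) -> torch.Tensor:
        # word_ids: (batch, seq_len); ctx_vectors: (batch, seq_len, ctx_dim)
        static = self.word_emb(word_ids)
        return torch.relu(self.proj(torch.cat([static, ctx_vectors], dim=-1)))


if __name__ == "__main__":
    encoder = ParserTokenEncoder(vocab_size=1000)
    word_ids = torch.randint(0, 1000, (2, 7))
    ctx_vectors = torch.randn(2, 7, 768)         # stand-in for pre-computed BERT/ELMo states
    print(encoder(word_ids, ctx_vectors).shape)  # torch.Size([2, 7, 256])
```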

  • 7.
    Kulmizev, Artur
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. RISE.
    Investigating UD Treebanks via Dataset Difficulty Measures (2023). In: Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, Dubrovnik, Croatia: Association for Computational Linguistics, 2023, p. 1076-1089. Conference paper (Refereed)
    Abstract [en]

    Treebanks annotated with Universal Dependencies (UD) are currently available for over 100 languages and are widely utilized by the community. However, their inherent characteristics are hard to measure and are only partially reflected in parser evaluations via accuracy metrics like LAS. In this study, we analyze a large subset of the UD treebanks using three recently proposed accuracy-free dataset analysis methods: dataset cartography, 𝒱-information, and minimum description length. Each method provides insights about UD treebanks that would remain undetected if only LAS was considered. Specifically, we identify a number of treebanks that, despite yielding high LAS, contain very little information that is usable by a parser to surpass what can be achieved by simple heuristics. Furthermore, we make note of several treebanks that score consistently low across numerous metrics, indicating a high degree of noise or annotation inconsistency present therein.
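    The 𝒱-information measure mentioned above can be sketched as follows (a hedged illustration with placeholder probability functions and toy numbers, not the paper's models): usable information is estimated as the average gap, in bits, between a model's confidence in the gold label with access to the input and a null model's confidence without it.

```python
import math
from typing import Callable, Sequence, Tuple


def pointwise_v_information(
    examples: Sequence[Tuple[str, str]],
    p_with_input: Callable[[str, str], float],  # p(y | x) from a model trained with inputs
    p_null: Callable[[str], float],             # p(y) from a model trained on empty inputs
) -> list:
    """PVI(x -> y) = -log2 p_null(y) + log2 p_with_input(x, y), in bits."""
    return [-math.log2(p_null(y)) + math.log2(p_with_input(x, y)) for x, y in examples]


def v_information(pvi_scores: Sequence[float]) -> float:
    """Dataset-level estimate: the mean pointwise score."""
    return sum(pvi_scores) / len(pvi_scores)


if __name__ == "__main__":
    # Toy numbers only: if the "null" model is already confident (labels are
    # predictable by simple heuristics), little usable information remains.
    examples = [("sentence-1", "label-a"), ("sentence-2", "label-b")]
    pvi = pointwise_v_information(
        examples,
        p_with_input=lambda x, y: 0.9,
        p_null=lambda y: 0.5,
    )
    print(pvi, v_information(pvi))
```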

  • 8.
    Kulmizev, Artur
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. RISE Research Institutes of Sweden, Kista, Sweden.
    Schrödinger's tree: On syntax and neural language models (2022). In: Frontiers in Artificial Intelligence, E-ISSN 2624-8212, Vol. 5, article id 796788. Article in journal (Refereed)
    Abstract [en]

    In the last half-decade, the field of natural language processing (NLP) has undergone two major transitions: the switch to neural networks as the primary modeling paradigm and the homogenization of the training regime (pre-train, then fine-tune). Amidst this process, language models have emerged as NLP’s workhorse, displaying increasingly fluent generation capabilities and proving to be an indispensable means of knowledge transfer downstream. Due to the otherwise opaque, black-box nature of such models, researchers have employed aspects of linguistic theory in order to characterize their behavior. Questions central to syntax—the study of the hierarchical structure of language—have factored heavily into such work, shedding invaluable insights about models’ inherent biases and their ability to make human-like generalizations. In this paper, we attempt to take stock of this growing body of literature. In doing so, we observe a lack of clarity across numerous dimensions, which influences the hypotheses that researchers form, as well as the conclusions they draw from their findings. To remedy this, we urge researchers to make careful considerations when investigating coding properties, selecting representations, and evaluating via downstream tasks. Furthermore, we outline the implications of the different types of research questions exhibited in studies on syntax, as well as the inherent pitfalls of aggregate metrics. Ultimately, we hope that our discussion adds nuance to the prospect of studying language models and paves the way for a less monolithic perspective on syntax in this context.

  • 9.
    Kulmizev, Artur
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Ravishankar, Vinit
    Univ Oslo, Oslo, Norway.
    Abdou, Mostafa
    Univ Copenhagen, Copenhagen, Denmark.
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Do Neural Language Models Show Preferences for Syntactic Formalisms? (2020). In: 58th Annual Meeting of the Association for Computational Linguistics (ACL 2020), Association for Computational Linguistics, 2020, p. 4077-4091. Conference paper (Refereed)
    Abstract [en]

    Recent work on the interpretability of deep neural language models has concluded that many properties of natural language syntax are encoded in their representational spaces. However, such studies often suffer from limited scope by focusing on a single language and a single linguistic formalism. In this study, we aim to investigate the extent to which the semblance of syntactic structure captured by language models adheres to a surface-syntactic or deep syntactic style of analysis, and whether the patterns are consistent across different languages. We apply a probe for extracting directed dependency trees to BERT and ELMo models trained on 13 different languages, probing for two different syntactic annotation styles: Universal Dependencies (UD), prioritizing deep syntactic relations, and Surface-Syntactic Universal Dependencies (SUD), focusing on surface structure. We find that both models exhibit a preference for UD over SUD - with interesting variations across languages and layers - and that the strength of this preference is correlated with differences in tree shape.
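    A simplified sketch of such a tree-extraction probe (illustrative only: the bilinear probe weights, greedy decoding, and random gold heads below are stand-ins for a trained probe and real UD/SUD annotations): pairwise head scores are computed from frozen hidden states, a head is decoded per token, and the resulting arcs are scored against each annotation style.

```python
import numpy as np


def score_heads(hidden: np.ndarray, probe_w: np.ndarray) -> np.ndarray:
    """Bilinear head scores from frozen hidden states.
    hidden: (n_tokens, dim); probe_w: (dim, dim) trained probe parameters.
    Entry [i, j] scores token j as the head of token i."""
    return hidden @ probe_w @ hidden.T


def greedy_decode(scores: np.ndarray) -> np.ndarray:
    """Pick the best-scoring head per token. (A real probe would use a
    maximum-spanning-tree decoder to guarantee a well-formed tree.)"""
    scores = scores.copy()
    np.fill_diagonal(scores, -np.inf)  # a token cannot head itself
    return scores.argmax(axis=1)


def uas(predicted_heads: np.ndarray, gold_heads: np.ndarray) -> float:
    """Unlabeled attachment score."""
    return float((predicted_heads == gold_heads).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    hidden = rng.normal(size=(8, 768))            # frozen BERT/ELMo states for one sentence
    probe_w = rng.normal(size=(768, 768)) * 0.01  # stand-in for trained probe weights
    heads = greedy_decode(score_heads(hidden, probe_w))
    gold_ud = rng.integers(0, 8, size=8)          # placeholder UD head indices
    gold_sud = rng.integers(0, 8, size=8)         # placeholder SUD head indices
    print("UAS vs UD :", uas(heads, gold_ud))
    print("UAS vs SUD:", uas(heads, gold_sud))
```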

  • 10.
    Ravishankar, Vinit
    et al.
    Kulmizev, Artur
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Abdou, Mostafa
    Søgaard, Anders
    Nivre, Joakim
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Attention Can Reflect Syntactic Structure (If You Let It) (2021). In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Association for Computational Linguistics, 2021, p. 3031-3045. Conference paper (Refereed)
    Abstract [en]

    Since the popularization of the Transformer as a general-purpose feature encoder for NLP, many studies have attempted to decode linguistic structure from its novel multi-head attention mechanism. However, much of such work focused almost exclusively on English - a language with rigid word order and a lack of inflectional morphology. In this study, we present decoding experiments for multilingual BERT across 18 languages in order to test the generalizability of the claim that dependency syntax is reflected in attention patterns. We show that full trees can be decoded above baseline accuracy from single attention heads, and that individual relations are often tracked by the same heads across languages. Furthermore, in an attempt to address recent debates about the status of attention as an explanatory mechanism, we experiment with fine-tuning mBERT on a supervised parsing objective while freezing different series of parameters. Interestingly, in steering the objective to learn explicit linguistic structure, we find much of the same structure represented in the resulting attention patterns, with interesting differences with respect to which parameters are frozen.
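    The decoding step described above can be sketched mechanically as follows (an assumption-laden illustration: random data stands in for a real attention head, and an undirected maximum spanning tree stands in for the directed decoders typically used): one head's weight matrix is symmetrized and a spanning tree is extracted from it, yielding an undirected structure that can then be compared with gold trees.

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree


def decode_tree_from_attention(attn: np.ndarray) -> set:
    """Extract an undirected spanning tree from one head's attention matrix.
    attn: (n_tokens, n_tokens) attention weights from a single head."""
    sym = (attn + attn.T) / 2.0       # symmetrize, ignoring arc direction
    cost = sym.max() - sym + 1e-9     # invert: strong attention becomes small cost
    mst = minimum_spanning_tree(cost).toarray()
    rows, cols = np.nonzero(mst)
    return {(int(min(i, j)), int(max(i, j))) for i, j in zip(rows, cols)}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    attn = rng.random((6, 6))         # stand-in for one mBERT attention head
    print(decode_tree_from_attention(attn))
```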
