Logotyp: till Uppsala universitets webbplats

uu.sePublikationer från Uppsala universitet
Ändra sökning
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf
Deep Contextualized Word Embeddings in Transition-Based and Graph-Based Dependency Parsing – A Tale of Two Parsers Revisited
Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.ORCID-id: 0000-0001-8844-2126
Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
Uppsala universitet, Humanistisk-samhällsvetenskapliga vetenskapsområdet, Språkvetenskapliga fakulteten, Institutionen för lingvistik och filologi.
Visa övriga samt affilieringar
2019 (Engelska)Ingår i: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), 2019, s. 2755-2768Konferensbidrag, Publicerat paper (Refereegranskat)
Abstract [en]

Transition-based and graph-based dependency parsers have previously been shown to have complementary strengths and weaknesses: transition-based parsers exploit rich structural features but suffer from error propagation, while graph-based parsers benefit from global optimization but have restricted feature scope. In this paper, we show that, even though some details of the picture have changed after the switch to neural networks and continuous representations, the basic trade-off between rich features and global optimization remains essentially the same. Moreover, we show that deep contextualized word embeddings, which allow parsers to pack information about global sentence structure into local feature representations, benefit transition-based parsers more than graph-based parsers, making the two approaches virtually equivalent in terms of both accuracy and error profile. We argue that the reason is that these representations help prevent search errors and thereby allow transitionbased parsers to better exploit their inherent strength of making accurate local decisions. We support this explanation by an error analysis of parsing experiments on 13 languages.

Ort, förlag, år, upplaga, sidor
2019. s. 2755-2768
Nationell ämneskategori
Språkbehandling och datorlingvistik
Forskningsämne
Datorlingvistik
Identifikatorer
URN: urn:nbn:se:uu:diva-406697ISI: 000854193302085OAI: oai:DiVA.org:uu-406697DiVA, id: diva2:1413749
Konferens
2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), November 3-7, Hong Kong, China
Forskningsfinansiär
Vetenskapsrådet, 2016-01817Tillgänglig från: 2020-03-11 Skapad: 2020-03-11 Senast uppdaterad: 2025-02-07Bibliografiskt granskad
Ingår i avhandling
1. The Search for Syntax: Investigating the Syntactic Knowledge of Neural Language Models Through the Lens of Dependency Parsing
Öppna denna publikation i ny flik eller fönster >>The Search for Syntax: Investigating the Syntactic Knowledge of Neural Language Models Through the Lens of Dependency Parsing
2023 (Engelska)Doktorsavhandling, sammanläggning (Övrigt vetenskapligt)
Abstract [en]

Syntax — the study of the hierarchical structure of language — has long featured as a prominent research topic in the field of natural language processing (NLP). Traditionally, its role in NLP was confined towards developing parsers: supervised algorithms tasked with predicting the structure of utterances (often for use in downstream applications). More recently, however, syntax (and syntactic theory) has factored much less into the development of NLP models, and much more into their analysis. This has been particularly true with the nascent relevance of language models: semi-supervised algorithms trained to predict (or infill) strings given a provided context. In this dissertation, I describe four separate studies that seek to explore the interplay between syntactic parsers and language models upon the backdrop of dependency syntax. In the first study, I investigate the error profiles of neural transition-based and graph-based dependency parsers, showing that they are effectively homogenized when leveraging representations from pre-trained language models. Following this, I report the results of two additional studies which show that dependency tree structure can be partially decoded from the internal components of neural language models — specifically, hidden state representations and self-attention distributions. I then expand on these findings by exploring a set of additional results, which serve to highlight the influence of experimental factors, such as the choice of annotation framework or learning objective, in decoding syntactic structure from model components. In the final study, I describe efforts to quantify the overall learnability of a large set of multilingual dependency treebanks — the data upon which the previous experiments were based — and how it may be affected by factors such as annotation quality or tokenization decisions. Finally, I conclude the thesis with a conceptual analysis that relates the aforementioned studies to a broader body of work concerning the syntactic knowledge of language models.

Ort, förlag, år, upplaga, sidor
Uppsala: Acta Universitatis Upsaliensis, 2023. s. 101
Serie
Studia Linguistica Upsaliensia, ISSN 1652-1366 ; 30
Nyckelord
syntax, language models, dependency parsing, universal dependencies
Nationell ämneskategori
Språkbehandling och datorlingvistik
Forskningsämne
Datorlingvistik
Identifikatorer
urn:nbn:se:uu:diva-508379 (URN)978-91-513-1850-9 (ISBN)
Disputation
2023-09-22, Humanistiska Teatern, Engelska parken, Thunbergsvägen 3C, Uppsala, 14:00 (Engelska)
Opponent
Handledare
Tillgänglig från: 2023-08-24 Skapad: 2023-07-30 Senast uppdaterad: 2025-02-07

Open Access i DiVA

Fulltext saknas i DiVA

Övriga länkar

https://www.aclweb.org/anthology/D19-1277https://www.emnlp-ijcnlp2019.org/

Person

Kulmizev, Arturde Lhoneux, MiryamNivre, Joakim

Sök vidare i DiVA

Av författaren/redaktören
Kulmizev, Arturde Lhoneux, MiryamNivre, Joakim
Av organisationen
Institutionen för lingvistik och filologi
Språkbehandling och datorlingvistik

Sök vidare utanför DiVA

GoogleGoogle Scholar

urn-nbn

Altmetricpoäng

urn-nbn
Totalt: 129 träffar
RefereraExporteraLänk till posten
Permanent länk

Direktlänk
Referera
Referensformat
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Annat format
Fler format
Språk
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Annat språk
Fler språk
Utmatningsformat
  • html
  • text
  • asciidoc
  • rtf