Uppsala University Publications (uu.se)
Linguistically Informed Neural Dependency Parsing for Typologically Diverse Languages
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Computational Linguistics)
ORCID iD: 0000-0001-8844-2126
2019 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]

This thesis presents several studies in neural dependency parsing for typologically diverse languages, using treebanks from Universal Dependencies (UD). The focus is on informing models with linguistic knowledge. We first extend a parser to work well on typologically diverse languages, including morphologically complex languages and languages whose treebanks have a high ratio of non-projective sentences, a notorious difficulty in dependency parsing. We propose a general methodology where we sample a representative subset of UD treebanks for parser development and evaluation. Our parser uses recurrent neural networks which construct information sequentially, and we study the incorporation of a recursive neural network layer in our parser. This follows the intuition that language is hierarchical. This layer turns out to be superfluous in our parser and we study its interaction with other parts of the network. We subsequently study transitivity and agreement information learned by our parser for auxiliary verb constructions (AVCs). We suggest that a parser should learn similar information about AVCs as it learns for finite main verbs. This is motivated by work in theoretical dependency grammar. Our parser learns different information about these two if we do not augment it with a recursive layer, but similar information if we do, indicating that there may be benefits from using that layer and we may not yet have found the best way to incorporate it in our parser. We finally investigate polyglot parsing. Training one model for multiple related languages leads to substantial improvements in parsing accuracy over a monolingual baseline. We also study different parameter sharing strategies for related and unrelated languages. Sharing parameters that partially abstract away from word order appears to be beneficial in both cases but sharing parameters that represent words and characters is more beneficial for related than unrelated languages.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019, p. 178
Series
Studia Linguistica Upsaliensia, ISSN 1652-1366 ; 24
Keywords [en]
Dependency parsing, multilingual NLP, Universal Dependencies, Linguistically informed NLP
National Category
General Language Studies and Linguistics
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-394133
ISBN: 978-91-513-0767-1 (print)
OAI: oai:DiVA.org:uu-394133
DiVA, id: diva2:1357373
Public defence
2019-11-25, Bertil Hammer, Blåsenhus, von Kraemers Allé 1, Uppsala, 13:15 (English)
Opponent
Supervisors
Available from: 2019-10-28 Created: 2019-10-03 Last updated: 2019-11-12

Open Access in DiVA

fulltext (1299 kB), 150 downloads
File information
File name: FULLTEXT01.pdf
File size: 1299 kB
Checksum SHA-512:
24fef4fcc9436b53dfea47284dcda7e18282f5f41454f668606297392ef37f9105e09568998a291378882207fe599cb2349eb5e65e0844fd6739055611e74f00
Type: fulltext
Mimetype: application/pdf

Authority records

de Lhoneux, Miryam
