Linguistically Informed Neural Dependency Parsing for Typologically Diverse Languages
2019 (English). Doctoral thesis, monograph (Other academic)
Abstract [en]
This thesis presents several studies in neural dependency parsing for typologically diverse languages, using treebanks from Universal Dependencies (UD). The focus is on informing models with linguistic knowledge. We first extend a parser to work well on typologically diverse languages, including morphologically complex languages and languages whose treebanks have a high ratio of non-projective sentences, a notorious difficulty in dependency parsing. We propose a general methodology in which we sample a representative subset of UD treebanks for parser development and evaluation.

Our parser uses recurrent neural networks, which construct information sequentially, and we study the incorporation of a recursive neural network layer, following the intuition that language is hierarchical. This layer turns out to be superfluous in our parser, and we study its interaction with other parts of the network.

We subsequently study the transitivity and agreement information learned by our parser for auxiliary verb constructions (AVCs). Motivated by work in theoretical dependency grammar, we suggest that a parser should learn similar information about AVCs as it learns about finite main verbs. Our parser learns different information about the two unless we augment it with a recursive layer, in which case it learns similar information. This indicates that the recursive layer may be beneficial and that we may not yet have found the best way to incorporate it into our parser.

We finally investigate polyglot parsing. Training one model for multiple related languages leads to substantial improvements in parsing accuracy over a monolingual baseline. We also study different parameter sharing strategies for related and unrelated languages. Sharing parameters that partially abstract away from word order appears to be beneficial in both cases, but sharing parameters that represent words and characters is more beneficial for related languages than for unrelated ones.
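As background for the non-projectivity mentioned above (this sketch is not from the thesis itself): a dependency tree is projective when no two arcs cross, i.e. no arc's endpoint falls strictly inside one arc and strictly outside the other. A minimal check, with the hypothetical function name `is_projective`, over a head-index encoding of the tree:

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[i] is the 1-based head index of token i+1; 0 denotes the root.
    Returns False if any pair of arcs crosses (non-projective tree).
    """
    # Represent each arc as an ordered (left, right) position pair,
    # including the artificial root arc at position 0.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, (i, j) in enumerate(arcs):
        for (k, l) in arcs[a + 1:]:
            # Arcs cross iff exactly one endpoint of (k, l)
            # lies strictly between i and j.
            if i < k < j < l or k < i < l < j:
                return False
    return True


# "the dog barks": det <- subj <- root, no crossing arcs.
print(is_projective([2, 3, 0]))   # projective: True
# Arc 1<-4 spans token 2, whose own arc reaches outside it.
print(is_projective([2, 0, 4, 1]))  # non-projective: False
```

The quadratic pairwise check is the simplest formulation; treebank tooling typically uses an equivalent linear-time stack-based test.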
Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019, p. 178
Series
Studia Linguistica Upsaliensia, ISSN 1652-1366 ; 24
Keywords [en]
Dependency parsing, multilingual NLP, Universal Dependencies, Linguistically informed NLP
National Category
General Language Studies and Linguistics
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-394133
ISBN: 978-91-513-0767-1 (print)
OAI: oai:DiVA.org:uu-394133
DiVA, id: diva2:1357373
Public defence
2019-11-25, Bertil Hammer, Blåsenhus, von Kraemers Allé 1, Uppsala, 13:15 (English)