Uppsala University Publications (DiVA)
An Investigation of the Interactions Between Pre-Trained Word Embeddings, Character Models and POS Tags in Dependency Parsing
Smith, Aaron. Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. ORCID iD: 0000-0002-2837-3648
de Lhoneux, Miryam. Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. ORCID iD: 0000-0001-8844-2126
Stymne, Sara. Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
Nivre, Joakim. Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2018 (English). In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, 2018, p. 2711–2720. Conference paper, Published paper (Refereed)
Abstract [en]

We provide a comprehensive analysis of the interactions between pre-trained word embeddings, character models and POS tags in a transition-based dependency parser. While previous studies have shown POS information to be less important in the presence of character models, we show that in fact there are complex interactions between all three techniques. In isolation each produces large improvements over a baseline system using randomly initialised word embeddings only, but combining them quickly leads to diminishing returns. We categorise words by frequency, POS tag and language in order to systematically investigate how each of the techniques affects parsing quality. For many word categories, applying any two of the three techniques is almost as good as the full combined system. Character models tend to be more important for low-frequency open-class words, especially in morphologically rich languages, while POS tags can help disambiguate high-frequency function words. We also show that large character embedding sizes help even for languages with small character sets, especially in morphologically rich languages.
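The combination studied in the abstract can be pictured as concatenating three vectors per token before they enter the parser. The sketch below is illustrative only: the dimensions, the lookup tables, and the toy character model are assumptions for this example, not the paper's actual architecture (which uses learned neural components such as a character-level recurrent model inside a transition-based parser).

```python
import numpy as np

# Illustrative dimensions (assumptions, not taken from the paper).
WORD_DIM, CHAR_DIM, POS_DIM = 100, 50, 20

rng = np.random.default_rng(0)
word_emb = {"the": rng.standard_normal(WORD_DIM)}  # stand-in for pre-trained word vectors
pos_emb = {"DET": rng.standard_normal(POS_DIM)}    # stand-in for learned POS tag embeddings

def char_model(word: str) -> np.ndarray:
    """Toy character model: average a deterministic vector per character.
    A real system would use a trained character-level model instead."""
    vecs = [np.sin(np.arange(CHAR_DIM) * ord(c)) for c in word]
    return np.mean(vecs, axis=0)

def token_representation(word: str, pos: str) -> np.ndarray:
    """Concatenate the three information sources into one input vector."""
    return np.concatenate([word_emb[word], char_model(word), pos_emb[pos]])

vec = token_representation("the", "DET")
print(vec.shape)  # (170,)
```

The point of the concatenation view is that each component can be ablated independently, which is how the paper isolates the contribution of each technique per word category.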

Place, publisher, year, edition, pages
2018. p. 2711-2720
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-371245
OAI: oai:DiVA.org:uu-371245
DiVA, id: diva2:1272808
Conference
The 2018 Conference on Empirical Methods in Natural Language Processing, October 31–November 4, 2018, Brussels, Belgium
Available from: 2018-12-19. Created: 2018-12-19. Last updated: 2019-03-06. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

http://aclweb.org/anthology/D18-1291

Authority records BETA

Smith, Aaron; de Lhoneux, Miryam; Stymne, Sara; Nivre, Joakim

Search in DiVA

By author/editor
Smith, Aaron; de Lhoneux, Miryam; Stymne, Sara; Nivre, Joakim
By organisation
Department of Linguistics and Philology
Language Technology (Computational Linguistics)

Search outside of DiVA

Google
Google Scholar
