An Analysis of Attention Mechanisms: The Case of Word Sense Disambiguation in Neural Machine Translation
Tang, Gongbo: Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
Sennrich, Rico: School of Informatics, University of Edinburgh.
Nivre, Joakim: Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2018 (English). In: Proceedings of the Third Conference on Machine Translation, 2018, p. 26-35. Conference paper, published paper (refereed).
Abstract [en]

Recent work has shown that the encoder-decoder attention mechanisms in neural machine translation (NMT) are different from the word alignment in statistical machine translation. In this paper, we focus on analyzing encoder-decoder attention mechanisms, in the case of word sense disambiguation (WSD) in NMT models. We hypothesize that attention mechanisms pay more attention to context tokens when translating ambiguous words. We explore the attention distribution patterns when translating ambiguous nouns. Counter-intuitively, we find that attention mechanisms are likely to distribute more attention to the ambiguous noun itself rather than context tokens, in comparison to other nouns. We conclude that attention is not the main mechanism used by NMT models to incorporate contextual information for WSD. The experimental results suggest that NMT models learn to encode contextual information necessary for WSD in the encoder hidden states. For the attention mechanism in Transformer models, we reveal that the first few layers gradually learn to “align” source and target tokens and the last few layers learn to extract features from the related but unaligned context tokens.
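
As a concrete illustration of the analysis the abstract describes, the sketch below measures how much attention mass a target token places on its aligned source token (for example, an ambiguous noun) versus the surrounding context, layer by layer. This is a minimal, hypothetical example, not the authors' code: the function, the toy attention weights, and the layer-wise pattern are illustrative assumptions.

```python
# Hypothetical sketch (not the paper's released code): given encoder-decoder
# attention weights for one target token, measure the attention mass on the
# aligned source token versus the remaining context tokens, per layer.
import numpy as np

def self_vs_context_mass(attn, aligned_idx):
    """attn: array of shape (num_layers, src_len); one attention
    distribution over source tokens per layer, each row summing to 1.
    aligned_idx: index of the source token assumed aligned to the target.
    Returns per-layer (mass on aligned token, mass on context tokens)."""
    attn = attn / attn.sum(axis=-1, keepdims=True)  # normalize defensively
    self_mass = attn[:, aligned_idx]
    return self_mass, 1.0 - self_mass

# Toy weights: 3 layers over 5 source tokens. Alignment to the ambiguous
# noun (index 2) builds up over the first layers, then the last layer
# spreads mass to context tokens, mirroring the pattern the abstract
# reports for Transformer models.
attn = np.array([
    [0.15, 0.15, 0.40, 0.15, 0.15],  # early layer: alignment emerging
    [0.05, 0.05, 0.80, 0.05, 0.05],  # middle layer: sharply "aligned"
    [0.25, 0.20, 0.20, 0.20, 0.15],  # last layer: mass on context tokens
])
self_mass, context_mass = self_vs_context_mass(attn, aligned_idx=2)
print(self_mass)     # [0.4 0.8 0.2]
print(context_mass)  # [0.6 0.2 0.8]
```

Comparing such per-layer curves for ambiguous nouns against other nouns is one simple way to quantify the "attention on the noun itself versus context" contrast the abstract reports.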

Place, publisher, year, edition, pages
2018. p. 26-35
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-369712
OAI: oai:DiVA.org:uu-369712
DiVA, id: diva2:1271212
Conference
Third Conference on Machine Translation, October 31 – November 1, 2018, Brussels, Belgium
Available from: 2018-12-17. Created: 2018-12-17. Last updated: 2019-03-06. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Other links

http://www.statmt.org/wmt18/pdf/WMT004.pdf

Authority records BETA

Tang, Gongbo; Nivre, Joakim

