2024 (English). In: Text, Speech, and Dialogue: 27th International Conference, TSD 2024, Brno, Czech Republic, September 9–13, 2024, Proceedings, Part I / [ed] Elmar Nöth; Aleš Horák; Petr Sojka. Cham: Springer, 2024, p. 267–278.
Conference paper, Published paper (Refereed)
Abstract [en]
Word sense disambiguation (WSD) is a core task in computational linguistics that involves interpreting polysemous words in context by identifying senses from a predefined sense inventory. Despite the dominance of BERT and its derivatives in WSD evaluation benchmarks, their effectiveness in encoding and retrieving word senses, especially in languages other than English, remains relatively unexplored. This paper provides a detailed quantitative analysis, comparing various BERT-based models for Russian, and examines two primary WSD strategies: fine-tuning and feature-based nearest-neighbor classification. The best results are obtained with the ruBERT model coupled with the feature-based nearest neighbor strategy. This approach adeptly captures even fine-grained meanings with limited data and diverse sense distributions.
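The feature-based nearest-neighbor strategy described in the abstract can be sketched roughly as follows: each sense in the inventory is represented by the centroid of the contextual embeddings of its annotated training examples, and a new occurrence is assigned the sense with the most similar centroid. This is a minimal illustrative sketch, not the authors' implementation; the function names are invented, and random stand-in vectors replace the ruBERT contextual embeddings the paper actually uses.

```python
# Hedged sketch of feature-based nearest-neighbor WSD.
# Assumption: contextual embeddings (e.g. from ruBERT) are already computed;
# toy 4-dimensional vectors stand in for them here.
import numpy as np

def build_sense_centroids(embeddings_by_sense):
    """Average the contextual embeddings of each sense's training examples."""
    return {sense: np.mean(vecs, axis=0)
            for sense, vecs in embeddings_by_sense.items()}

def predict_sense(query_vec, centroids):
    """Return the sense whose centroid is closest by cosine similarity."""
    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return max(centroids, key=lambda s: cos(query_vec, centroids[s]))

# Toy sense inventory for an ambiguous word, with stand-in embeddings.
train = {
    "sense_A": [np.array([1.0, 0.1, 0.0, 0.0]),
                np.array([0.9, 0.2, 0.1, 0.0])],
    "sense_B": [np.array([0.0, 0.1, 1.0, 0.9]),
                np.array([0.1, 0.0, 0.8, 1.0])],
}
centroids = build_sense_centroids(train)
print(predict_sense(np.array([0.95, 0.15, 0.05, 0.0]), centroids))  # sense_A
```

One appeal of this strategy, consistent with the abstract's claim, is that it needs only a handful of labeled examples per sense: no fine-tuning of the encoder is required, so rare and fine-grained senses with skewed distributions can still be represented.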
Place, publisher, year, edition, pages
Cham: Springer, 2024
Series
Lecture Notes in Artificial Intelligence (LNCS), ISSN 2945-9133, E-ISSN 1611-3349 ; 15048
Keywords
word sense disambiguation, BERT, Russian
National Category
Natural Language Processing
Identifiers
urn:nbn:se:uu:diva-541107 (URN)
10.1007/978-3-031-70563-2_21 (DOI)
001307840300021 ()
978-3-031-70562-5 (ISBN)
978-3-031-70563-2 (ISBN)
Conference
27th International Conference, TSD 2024, Brno, Czech Republic, September 9–13, 2024
Available from: 2024-10-29 Created: 2024-10-29 Last updated: 2025-02-07 Bibliographically approved