
Publications from Uppsala University
Publications (10 of 230)
Carlsson, F., Broberg, J., Hillbom, E., Sahlgren, M. & Nivre, J. (2024). Branch-GAN: Improving Text Generation with (not so) Large Language Models. In: The Twelfth International Conference on Learning Representations. Paper presented at The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7, 2024.
2024 (English). In: The Twelfth International Conference on Learning Representations, 2024. Conference paper, Published paper (Refereed)
Abstract [en]

The current advancements in open domain text generation have been spearheaded by Transformer-based large language models. Leveraging efficient parallelization and vast training datasets, these models achieve unparalleled text generation capabilities. Even so, current models are known to suffer from deficiencies such as repetitive texts, looping issues, and lack of robustness. While adversarial training through generative adversarial networks (GAN) is a proposed solution, earlier research in this direction has predominantly focused on older architectures, or narrow tasks. As a result, this approach is not yet compatible with modern language models for open-ended text generation, leading to diminished interest within the broader research community. We propose a computationally efficient GAN approach for sequential data that utilizes the parallelization capabilities of Transformer models. Our method revolves around generating multiple branching sequences from each training sample, while also incorporating the typical next-step prediction loss on the original data. In this way, we achieve a dense reward and loss signal for both the generator and the discriminator, resulting in a stable training dynamic. We apply our training method to pre-trained language models, using data from their original training set but less than 0.01% of the available data. A comprehensive human evaluation shows that our method significantly improves the quality of texts generated by the model while avoiding the previously reported sparsity problems of GAN approaches. Even our smaller models outperform larger original baseline models with more than 16 times the number of parameters. Finally, we corroborate previous claims that perplexity on held-out data is not a sufficient metric for measuring the quality of generated texts.
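The core branching idea described in the abstract can be illustrated with a toy sketch. This is our own minimal illustration, not the paper's code: the function names (`branch_continuations`, `toy_sampler`) and the trivial random sampler are placeholders for the Transformer generator, and the real method scores all branches with a discriminator in parallel.

```python
import random

def branch_continuations(seq, sample_next, depth=3, branches=2, rng=None):
    """For every non-empty proper prefix of a training sequence, sample
    several short model continuations ("branches"). Each branch can then
    receive its own discriminator score, giving a dense per-position
    training signal instead of a single reward per full sequence."""
    rng = rng or random.Random(0)
    out = []
    for t in range(1, len(seq)):          # branch point after each prefix
        prefix = seq[:t]
        for _ in range(branches):
            ctx, cont = list(prefix), []
            for _ in range(depth):
                nxt = sample_next(ctx, rng)
                cont.append(nxt)
                ctx.append(nxt)
            out.append((t, tuple(cont)))
    return out

# Stand-in generator: uniform sampling over a 3-symbol vocabulary.
def toy_sampler(ctx, rng):
    return rng.choice("abc")

sample = "hello"
branches = branch_continuations(sample, toy_sampler, depth=3, branches=2)
# (len(sample) - 1) branch points x 2 branches, each of depth 3
```

Combining the discriminator signal on these branches with the usual next-step prediction loss on the original sequence is what, per the abstract, yields the dense and stable training dynamic.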

National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-545945 (URN)
Conference
The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7, 2024
Available from: 2025-01-01 Created: 2025-01-01 Last updated: 2025-06-24. Bibliographically approved
Gogoulou, E., Lesort, T., Boman, M. & Nivre, J. (2024). Continual Learning Under Language Shift. In: Elmar Nöth; Aleš Horák; Petr Sojka (Eds.), Text, Speech, and Dialogue: 27th International Conference, TSD 2024, Brno, Czech Republic, September 9–13, 2024, Proceedings, Part I. Paper presented at 27th International Conference, TSD 2024, Brno, Czech Republic, September 9–13, 2024 (pp. 71-84). Cham: Springer
2024 (English). In: Text, Speech, and Dialogue: 27th International Conference, TSD 2024, Brno, Czech Republic, September 9–13, 2024, Proceedings, Part I / [ed] Elmar Nöth; Aleš Horák; Petr Sojka, Cham: Springer, 2024, p. 71-84. Conference paper, Published paper (Refereed)
Abstract [en]

The recent increase in data and model scale for language model pre-training has led to huge training costs. In scenarios where new data become available over time, updating a model instead of fully retraining it would therefore provide significant gains. We study the pros and cons of updating a language model when new data comes from new languages – the case of continual learning under language shift. Starting from a monolingual English language model, we incrementally add data from Danish, Icelandic and Norwegian to investigate how forward and backward transfer effects depend on pre-training order and characteristics of languages, for models with 126M, 356M and 1.3B parameters. Our results show that, while forward transfer is largely positive and independent of language order, backward transfer can be positive or negative depending on the order and characteristics of new languages. We explore a number of potentially explanatory factors and find that a combination of language contamination and syntactic similarity best fits our results.

Place, publisher, year, edition, pages
Cham: Springer, 2024
Series
Lecture Notes in Computer Science (LNCS), ISSN 0302-9743, E-ISSN 1611-3349 ; 15048
Keywords
Multilingual NLP, Continual Learning, Large Language Models
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-518957 (URN); 10.1007/978-3-031-70563-2_6 (DOI); 001307840300006 (); 978-3-031-70562-5 (ISBN); 978-3-031-70563-2 (ISBN)
Conference
27th International Conference, TSD 2024, Brno, Czech Republic, September 9–13, 2024
Funder
Swedish Research Council, 2022-02909; Knut and Alice Wallenberg Foundation, Berzelius-2023-178
Available from: 2024-01-01 Created: 2024-01-01 Last updated: 2025-02-07. Bibliographically approved
Karlgren, J., Dürlich, L., Gogoulou, E., Guillou, L., Nivre, J., Sahlgren, M. & Talman, A. (2024). ELOQUENT CLEF Shared Tasks for Evaluation of Generative Language Model Quality. In: Advances in Information Retrieval (ECIR 2024). Paper presented at Advances in Information Retrieval (ECIR 2024).
2024 (English). In: Advances in Information Retrieval (ECIR 2024), 2024. Conference paper, Published paper (Refereed)
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-545940 (URN)
Conference
Advances in Information Retrieval (ECIR 2024)
Available from: 2025-01-01 Created: 2025-01-01 Last updated: 2025-02-07
De Marneffe, M.-C., Nivre, J. & Zeman, D. (2024). Function Words in Universal Dependencies. Linguistic Analysis, 43(3–4), 549-588
2024 (English). In: Linguistic Analysis, ISSN 0098-9053, Vol. 43, no 3–4, p. 549-588. Article in journal (Refereed), Published
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-545937 (URN)
Available from: 2025-01-01 Created: 2025-01-01 Last updated: 2025-02-17. Bibliographically approved
Aleksandrova, A. & Nivre, J. (2024). Models and Strategies for Russian Word Sense Disambiguation: A Comparative Analysis. In: Elmar Nöth; Aleš Horák; Petr Sojka (Eds.), Text, Speech, and Dialogue: 27th International Conference, TSD 2024, Brno, Czech Republic, September 9–13, 2024, Proceedings, Part I. Paper presented at 27th International Conference, TSD 2024, Brno, Czech Republic, September 9–13, 2024 (pp. 267-278). Cham: Springer
2024 (English). In: Text, Speech, and Dialogue: 27th International Conference, TSD 2024, Brno, Czech Republic, September 9–13, 2024, Proceedings, Part I / [ed] Elmar Nöth; Aleš Horák; Petr Sojka, Cham: Springer, 2024, p. 267-278. Conference paper, Published paper (Refereed)
Abstract [en]

Word sense disambiguation (WSD) is a core task in computational linguistics that involves interpreting polysemous words in context by identifying senses from a predefined sense inventory. Despite the dominance of BERT and its derivatives in WSD evaluation benchmarks, their effectiveness in encoding and retrieving word senses, especially in languages other than English, remains relatively unexplored. This paper provides a detailed quantitative analysis, comparing various BERT-based models for Russian, and examines two primary WSD strategies: fine-tuning and feature-based nearest-neighbor classification. The best results are obtained with the ruBERT model coupled with the feature-based nearest neighbor strategy. This approach adeptly captures even fine-grained meanings with limited data and diverse sense distributions.
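The winning strategy in the abstract, feature-based nearest-neighbor classification over contextual embeddings, can be sketched in a few lines. This is a rough illustration under our own assumptions, not the paper's code: the toy 3-dimensional vectors stand in for ruBERT contextual embeddings of sense-annotated occurrences, and the function names are ours.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def predict_sense(query_vec, labeled_examples):
    """1-nearest-neighbor over embeddings of sense-annotated examples:
    the query occurrence inherits the sense of its closest example."""
    best_sense, best_sim = None, -2.0
    for vec, sense in labeled_examples:
        sim = cosine(query_vec, vec)
        if sim > best_sim:
            best_sense, best_sim = sense, sim
    return best_sense

# Toy "embeddings" for two senses of the polysemous Russian word "лук"
# ('onion' vs. 'bow'); real features would come from a BERT encoder.
examples = [
    ([0.9, 0.1, 0.0], "onion"),
    ([0.8, 0.2, 0.1], "onion"),
    ([0.1, 0.9, 0.3], "bow"),
]
print(predict_sense([0.85, 0.15, 0.05], examples))  # -> onion
```

Because each sense only needs a handful of annotated occurrences to populate the neighbor set, this strategy copes with limited data and skewed sense distributions, which matches the abstract's finding.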

Place, publisher, year, edition, pages
Cham: Springer, 2024
Series
Lecture Notes in Artificial Intelligence (LNCS), ISSN 2945-9133, E-ISSN 1611-3349 ; 15048
Keywords
word sense disambiguation, BERT, Russian
National Category
Natural Language Processing
Identifiers
urn:nbn:se:uu:diva-541107 (URN); 10.1007/978-3-031-70563-2_21 (DOI); 001307840300021 (); 978-3-031-70562-5 (ISBN); 978-3-031-70563-2 (ISBN)
Conference
27th International Conference, TSD 2024, Brno, Czech Republic, September 9–13, 2024
Available from: 2024-10-29 Created: 2024-10-29 Last updated: 2025-02-07. Bibliographically approved
Karlgren, J., Dürlich, L., Gogoulou, E., Guillou, L., Nivre, J., Sahlgren, M., . . . Zahra, S. (2024). Overview of ELOQUENT 2024: Shared Tasks for Evaluating Generative Language Model Quality. In: Lorraine Goeuriot; Philippe Mulhem; Georges Quénot; Didier Schwab; Giorgio Maria Di Nunzio; Laure Soulier; Petra Galuščáková; Alba García Seco de Herrera; Guglielmo Faggioli; Nicola Ferro (Eds.), Experimental IR Meets Multilinguality, Multimodality, and Interaction: 15th International Conference of the CLEF Association, CLEF 2024, Grenoble, France, September 9–12, 2024, Proceedings, Part II. Paper presented at Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2024), Grenoble, France, September 9-12, 2024 (pp. 53-72). Cham: Springer, II
2024 (English). In: Experimental IR Meets Multilinguality, Multimodality, and Interaction: 15th International Conference of the CLEF Association, CLEF 2024, Grenoble, France, September 9–12, 2024, Proceedings, Part II / [ed] Lorraine Goeuriot; Philippe Mulhem; Georges Quénot; Didier Schwab; Giorgio Maria Di Nunzio; Laure Soulier; Petra Galuščáková; Alba García Seco de Herrera; Guglielmo Faggioli; Nicola Ferro, Cham: Springer, 2024, Vol. II, p. 53-72. Conference paper, Published paper (Refereed)
Abstract [en]

ELOQUENT is a set of shared tasks for evaluating the quality and usefulness of generative language models. ELOQUENT aims to apply high-level quality criteria, grounded in experiences from deploying models in real-life tasks, and to formulate tests for those criteria, preferably implemented to require minimal human assessment effort and in a multilingual setting. The tasks for the first year of ELOQUENT were (1) Topical quiz, in which language models are probed for topical competence; (2) HalluciGen, in which we assessed the ability of models to generate and detect hallucinations; (3) Robustness, in which we assessed the robustness and consistency of a model output given variation in the input prompts; and (4) Voight-Kampff, run in partnership with the PAN lab, with the aim of discovering whether it is possible to automatically distinguish human-generated text from machine-generated text. This first year of experimentation has shown—as expected—that using self-assessment with models judging models is feasible, but not entirely straightforward, and that a judicious comparison with human assessment and the application context is necessary to be able to trust self-assessed quality judgments.

Place, publisher, year, edition, pages
Cham: Springer, 2024
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349 ; 14959
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-545939 (URN); 10.1007/978-3-031-71908-0_3 (DOI); 001336411000003 (); 2-s2.0-85205360663 (Scopus ID); 978-3-031-71907-3 (ISBN); 978-3-031-71908-0 (ISBN)
Conference
Experimental IR Meets Multilinguality, Multimodality, and Interaction (CLEF 2024), Grenoble, France, September 9-12, 2024
Available from: 2025-01-01 Created: 2025-01-01 Last updated: 2025-06-24. Bibliographically approved
Dürlich, L., Gogoulou, E., Guillou, L., Nivre, J. & Zahra, S. (2024). Overview of the CLEF-2024 Eloquent Lab: Task 2 on HalluciGen. In: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024). Paper presented at Conference and Labs of the Evaluation Forum (CLEF 2024) (pp. 691-702).
2024 (English). In: Working Notes of the Conference and Labs of the Evaluation Forum (CLEF 2024), 2024, p. 691-702. Conference paper, Published paper (Refereed)
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-545941 (URN)
Conference
Conference and Labs of the Evaluation Forum (CLEF 2024)
Available from: 2025-01-01 Created: 2025-01-01 Last updated: 2025-02-07
Bhatia, A., Bouma, G., Doğruöz, A. S., Evang, K., Garcia, M., Giouli, V., . . . Rademaker, A. (Eds.). (2024). Proceedings of the Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024. Paper presented at Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Torino, Italia. European Language Resources Association
2024 (English). Conference proceedings (editor) (Refereed)
Place, publisher, year, edition, pages
European Language Resources Association, 2024
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-545938 (URN)
Conference
Joint Workshop on Multiword Expressions and Universal Dependencies (MWE-UD) @ LREC-COLING 2024, Torino, Italia
Available from: 2025-01-01 Created: 2025-01-01 Last updated: 2025-04-28. Bibliographically approved
Nivre, J. (2024). Ten Years of Universal Dependencies. In: Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024). Paper presented at 6th International Conference on Computational Linguistics in Bulgaria (CLIB), Sep 09-10, 2024, Sofia, Bulgaria (pp. 3-3). Institute for Bulgarian Language, Bulgarian Academy of Sciences
2024 (English). In: Proceedings of the Sixth International Conference on Computational Linguistics in Bulgaria (CLIB 2024), Institute for Bulgarian Language, Bulgarian Academy of Sciences, 2024, p. 3-3. Conference paper, Oral presentation with published abstract (Refereed)
Place, publisher, year, edition, pages
Institute for Bulgarian Language, Bulgarian Academy of Sciences, 2024
Series
Proceedings of the International Conference Computational Linguistics in Bulgaria, ISSN 2367-5675 ; 6
National Category
General Language Studies and Linguistics
Natural Language Processing
Identifiers
urn:nbn:se:uu:diva-544680 (URN); 001324798800002 ()
Conference
6th International Conference on Computational Linguistics in Bulgaria (CLIB), Sep 09-10, 2024, Sofia, Bulgaria
Available from: 2024-12-10 Created: 2024-12-10 Last updated: 2025-02-01. Bibliographically approved
Weissweiler, L., Böbel, N., Guiller, K., Herrera, S., Scivetti, W., Lorenzi, A., . . . Schneider, N. (2024). UCxn: Typologically Informed Annotation of Constructions Atop Universal Dependencies. In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024). Paper presented at The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024) (pp. 16919-16932).
2024 (English). In: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), 2024, p. 16919-16932. Conference paper, Published paper (Refereed)
National Category
Natural Language Processing
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-545943 (URN)
Conference
The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)
Available from: 2025-01-01 Created: 2025-01-01 Last updated: 2025-02-07
Projects
Syntactic parsing of synthetic languages [2008-02073_VR]; Uppsala University
The Gender and Work Database at Uppsala University [2010-06012_VR]; Uppsala University
Universal Dependency Parsing [2016-01817_VR]; Uppsala University
An advanced database of the effects of extreme climate events in Europe from web texts [2022-03448_VR]; Uppsala University
Centre of excellence on Impacts of Climate Extremes under global change (ICE) [2022-06599_VR]; Uppsala University

Publications
Schutte, M., Portal, A., Lee, S. H. & Messori, G. (2025). Dynamics of stratospheric wave reflection over the North Pacific. Weather and Climate Dynamics, 6(2), 521-548
Identifiers
ORCID iD: orcid.org/0000-0002-7873-3971
