Extending the View: Explorations in Bootstrapping a Swedish PoS Tagger
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Språkteknologi)
2009 (English). In: Proceedings of the 17th Nordic Conference on Computational Linguistics NODALIDA 2009, Tartu, Estonia: Tartu University Library, 2009, pp. 34-40. Conference paper (Refereed)
Abstract [en]

State-of-the-art statistical part-of-speech taggers mainly use information on tag bi- or trigrams, depending on the size of the training corpus. Some also use lexical emission probabilities above unigrams with beneficial results. In both cases, a wider context usually gives better accuracy for a large training corpus, which in turn gives better accuracy than a smaller one. Large corpora with validated tags, however, are scarce, so a bootstrap technique can be used. As the corpus grows, it is probable that a widened context would improve results even further.

In this paper, we looked at the contribution to accuracy of such an extended view for both tag transitions and lexical emissions, applied to both a validated Swedish source corpus and a raw bootstrap corpus. We found that the extended view was more important for tag transitions, in particular if applied to the bootstrap corpus. For lexical emission, it was also more important if applied to the bootstrap corpus than to the source corpus, although it was beneficial for both. The overall best tagger had an accuracy of 98.05%.
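
The idea of widening the tag-transition context, and of adding counts from a bootstrap corpus, can be illustrated with a minimal sketch. This is not the tagger described in the paper: the function names, the toy sentences, the tag labels and the unsmoothed maximum-likelihood estimate are all assumptions made for illustration only.

```python
from collections import Counter

def transition_counts(tagged_sentences, order):
    """Count tag n-grams of the given order (2 = bigram, 3 = trigram)."""
    ngrams, contexts = Counter(), Counter()
    for sentence in tagged_sentences:
        # Pad with start symbols so the first tags also have a full context.
        tags = ["<s>"] * (order - 1) + [tag for _, tag in sentence]
        for i in range(order - 1, len(tags)):
            context = tuple(tags[i - order + 1:i])
            ngrams[context + (tags[i],)] += 1
            contexts[context] += 1
    return ngrams, contexts

def transition_prob(ngrams, contexts, context, tag):
    """Unsmoothed maximum-likelihood estimate of P(tag | context)."""
    context = tuple(context)
    if contexts[context] == 0:
        return 0.0  # unseen context; a real tagger would smooth or back off
    return ngrams[context + (tag,)] / contexts[context]

# Hypothetical validated source corpus: (word, tag) pairs.
source = [
    [("hon", "PN"), ("ser", "VB"), ("huset", "NN")],
    [("han", "PN"), ("ser", "VB"), ("henne", "PN")],
]
# Hypothetical bootstrap corpus: raw text that has been tagged automatically.
bootstrap = [
    [("hon", "PN"), ("köper", "VB"), ("huset", "NN")],
]

# Narrow view: bigram transitions from the source corpus only.
bi_ngrams, bi_contexts = transition_counts(source, order=2)
p_bigram = transition_prob(bi_ngrams, bi_contexts, ["VB"], "NN")

# Extended view: trigram transitions, source plus bootstrap counts.
src_ngrams, src_contexts = transition_counts(source, order=3)
boot_ngrams, boot_contexts = transition_counts(bootstrap, order=3)
tri_ngrams = src_ngrams + boot_ngrams
tri_contexts = src_contexts + boot_contexts
p_trigram = transition_prob(tri_ngrams, tri_contexts, ["PN", "VB"], "NN")

print(p_bigram, p_trigram)
```

In this sketch the bootstrap counts are simply added to the source counts with equal weight. Whether and how the two corpora should be weighted, and how lexical emissions are conditioned on a wider context, are design choices the abstract does not specify.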

Place, publisher, year, edition, pages
Tartu, Estonia: Tartu University Library, 2009. pp. 34-40.
Series: NEALT Proceedings Series, ISSN 1736-6305; 4
Keyword [en]
part-of-speech tagging, machine learning
Keyword [sv]
ordklassuppmärkning, maskininlärning
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-103321
OAI: oai:DiVA.org:uu-103321
DiVA: diva2:218046
Available from: 2009-05-18; Created: 2009-05-18; Last updated: 2009-05-19; Bibliographically approved
