A Study on Automatically Extracted Keywords in Text Categorization
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Computational Linguistics. ORCID iD: 0000-0002-4838-6518
2006 (English). In: Proceedings of International Conference of Association for Computational Linguistics, 2006. Conference paper (Refereed)
Abstract [en]

This paper presents a study of whether and how automatically extracted keywords can be used to improve text categorization. In summary, we show that higher performance, as measured by micro-averaged F-measure on a standard text categorization collection, is achieved when the full-text representation is combined with the automatically extracted keywords. The combination is obtained by giving higher weights to words in the full texts that are also extracted as keywords. We also present results for experiments in which the keywords are the only input to the categorizer, represented either as unigrams or intact. Of these two representations, unigrams perform best, although neither performs as well as headlines only.
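
The two technical ideas in the abstract, boosting the weight of full-text words that were also extracted as keywords and scoring with a micro-averaged F-measure, can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the function names, the raw-count term weights, the boost factor of 2.0, and the choice of F1 (equal weight on precision and recall) are not taken from the paper, whose actual weighting scheme and classifier are not described in this record.

```python
from collections import Counter


def keyword_unigrams(keywords):
    """Split each keyword (possibly multi-word) into lowercase unigrams."""
    return [word for kw in keywords for word in kw.lower().split()]


def boosted_term_weights(tokens, keywords, boost=2.0):
    """Return term counts for a full text, scaled by `boost` for keyword terms.

    `boost` is a hypothetical factor chosen for illustration only.
    """
    boosted = set(keyword_unigrams(keywords))
    counts = Counter(token.lower() for token in tokens)
    return {
        term: count * (boost if term in boosted else 1.0)
        for term, count in counts.items()
    }


def micro_f1(per_category):
    """Micro-averaged F1: pool tp/fp/fn over all categories, then compute F1.

    Each item is a dict with integer counts under "tp", "fp" and "fn".
    """
    tp = sum(c["tp"] for c in per_category)
    fp = sum(c["fp"] for c in per_category)
    fn = sum(c["fn"] for c in per_category)
    denominator = 2 * tp + fp + fn
    # F1 = 2*tp / (2*tp + fp + fn); defined as 0.0 when nothing was counted.
    return 2 * tp / denominator if denominator else 0.0


if __name__ == "__main__":
    text = "the stock market fell as market fears grew".split()
    print(boosted_term_weights(text, ["stock market"]))
    print(micro_f1([{"tp": 8, "fp": 2, "fn": 2}, {"tp": 1, "fp": 1, "fn": 3}]))
```

In this sketch, `keyword_unigrams` corresponds to the unigram representation of keywords; the intact representation would instead treat each whole keyword string as a single feature. Because micro-averaging pools the counts before computing the score, categories with many documents influence the result more than rare ones.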

National Category
Language Technology (Computational Linguistics)
URN: urn:nbn:se:uu:diva-18164
OAI: oai:DiVA.org:uu-18164
DiVA: diva2:45936
International Conference of Association for Computational Linguistics
Available from: 2006-11-20 Created: 2006-11-20 Last updated: 2016-03-08
Author/editor: Megyesi, Beata