Uppsala University Publications
General-Purpose Text Categorization Applied to the Medical Domain.
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology, Computational Linguistics.
ORCID iD: 0000-0002-4838-6518
2007 (English). Report (Other academic)
Abstract [en]

This paper presents experiments in which a general-purpose text categorization method was applied to medical free-text reports. The aim was to examine how such a method performs without any domain-specific knowledge, hand-crafting, or tuning. We also compare the general-purpose method with runs in which a medical thesaurus and automatically extracted keywords were used to build the classifiers. We show that standard text categorization techniques, with stemmed unigrams as the basis for learning, can be applied directly to categorize medical reports, yielding an F-measure of 83.9 and outperforming the more sophisticated thesaurus- and keyword-based methods.
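The abstract names two technical ingredients: stemmed unigrams as classifier features and the F-measure as the evaluation score. The sketch below illustrates both under stated assumptions. The record does not say which tokenizer, stemmer, or learning algorithm was used, so `toy_stem` is a hypothetical crude suffix stripper (not the Porter stemmer), and no classifier is shown. It also assumes the reported 83.9 is the balanced F-measure (F1 = 2PR / (P + R)) on a 0 to 100 scale; the record does not confirm this.

```python
from collections import Counter
import re

# Hypothetical toy stemmer: strips a few common English suffixes.
# A crude stand-in, NOT the stemmer used in the report (which is not specified).
SUFFIXES = ("ations", "ation", "ings", "ing", "edly", "ed", "es", "s")


def toy_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Strip only if a stem of at least three characters remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stemmed_unigrams(text: str) -> Counter:
    """Bag of stemmed unigrams: lowercase, split into letter runs, stem, count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(toy_stem(t) for t in tokens)


def f_measure(tp: int, fp: int, fn: int) -> float:
    """Balanced F-measure (F1 = 2PR / (P + R)), scaled to 0..100 (assumed)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 100 * 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # "coughing" and "coughed" collapse to the same feature "cough".
    print(stemmed_unigrams("Coughing and fever; patient coughed"))
    print(round(f_measure(tp=8, fp=2, fn=2), 1))
```

Stemming merges inflected forms into one feature, which shrinks the vocabulary without any medical knowledge; that is the kind of domain-independent preprocessing the abstract contrasts with thesaurus- and keyword-based feature sets.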

Place, publisher, year, edition, pages
Department of Computer and Systems Sciences, Stockholm University, 2007. Research Report 2007-016
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics; Computer Systems Sciences
URN: urn:nbn:se:uu:diva-13201
OAI: oai:DiVA.org:uu-13201
DiVA: diva2:40971
Available from: 2008-01-21 Created: 2008-01-21 Last updated: 2016-03-08

Open Access in DiVA

No full text

Author: Megyesi, Beata
