Using Linguistic Data for Genre Classification
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Computational Linguistics (Datorlingvistik). ORCID iD: 0000-0002-4838-6518
2005 (English). In: Proceedings of SAIS-SSLS, 2005. Conference paper, Published paper (Refereed)
Abstract [en]

Automatic categorization of texts into genres, rather than subject categories, is typically quite difficult. We ran a series of experiments on an annotated Swedish text corpus to determine whether linguistic metadata (in this case, parts of speech) can improve the performance of such categorizers. Compared to the traditional approach of using word frequencies, we consistently achieved better results and reduced the error rate by 8.6%.

Place, publisher, year, edition, pages
2005.
Keyword [en]
text categorization, genre detection
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-18166
OAI: oai:DiVA.org:uu-18166
DiVA: diva2:45938
Conference
SAIS-SSLS
Available from: 2006-11-20 Created: 2006-11-20 Last updated: 2017-01-25

Open Access in DiVA

No full text

Authority records BETA

Megyesi, Beata
