Uppsala University Publications (uu.se)
Merging Comparable Data Sources for the Discrimination of Similar Languages: The DSL Corpus Collection
Univ Saarland, D-66123 Saarbrucken, Germany.
Univ Saarland, D-66123 Saarbrucken, Germany.
Univ Zagreb, Zagreb 41000, Croatia.
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2014 (English). In: LREC 2014 - Ninth International Conference on Language Resources and Evaluation, 2014. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents the compilation of the DSL corpus collection, created for the DSL (Discriminating Similar Languages) shared task to be held at the VarDial workshop at COLING 2014. The DSL corpus collection was merged from three comparable corpora to provide a suitable dataset for automatic classification to discriminate between similar languages and language varieties. Along with the description of the DSL corpus collection, we also present the results of baseline discrimination experiments, which reach an accuracy of up to 87.4%.

Place, publisher, year, edition, pages
2014.
Keyword [en]
language identification, language discrimination, comparable corpus, similar languages, language varieties
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-264098
ISI: 000355611000019
ISBN: 978-2-9517408-8-4 (print)
OAI: oai:DiVA.org:uu-264098
DiVA: diva2:860454
Conference
9th International Conference on Language Resources and Evaluation (LREC), May 26-31, 2014, Reykjavik, Iceland
Available from: 2015-10-12. Created: 2015-10-05. Last updated: 2015-10-12. Bibliographically approved.

Authority records

Tiedemann, Jorg
