uu.se: Uppsala University Publications
The Uppsala Corpus of Student Writings: Corpus Creation, Annotation, and Analysis
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Computational Linguistics). ORCID iD: 0000-0002-4838-6518
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Computational Linguistics)
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Scandinavian Languages.
2016 (English). In: Language Resources and Evaluation, 2016. Conference paper, Published paper (Refereed)
Abstract [en]

The Uppsala Corpus of Student Writings consists of Swedish texts produced as part of a national test of students ranging in age from nine (in year three of primary school) to nineteen (the last year of upper secondary school) who are studying either Swedish or Swedish as a second language. National tests have been collected since 1996. The corpus currently consists of 2,500 texts containing over 1.5 million tokens. Parts of the texts have been annotated on several linguistic levels using existing state-of-the-art natural language processing tools. In order to make the corpus easy to interpret for scholars in the humanities, we chose the CoNLL format instead of an XML-based representation. Since spelling and grammatical errors are common in student writings, the texts are automatically corrected while keeping the original tokens in the corpus. Each token is annotated with part-of-speech and morphological features as well as syntactic structure. The main purpose of the corpus is to facilitate the systematic and quantitative empirical study of the writings of various student groups based on gender, geographic area, age, grade awarded or a combination of these, synchronically or diachronically. The intention is for this to be a monitor corpus, currently under development.
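
As an illustration of why a tab-separated CoNLL representation is easy to work with, the following minimal Python sketch reads CoNLL-style lines into sentences and counts part-of-speech tags. The abstract does not specify the corpus's exact column layout, so the sketch assumes the first eight columns of the CoNLL-X shared-task format (ID, FORM, LEMMA, CPOSTAG, POSTAG, FEATS, HEAD, DEPREL). The names `Token`, `read_conll`, `pos_counts` and the sample sentence with its tags are hypothetical and are not taken from the corpus.

```python
from collections import Counter
from typing import Iterable, Iterator, List, NamedTuple


class Token(NamedTuple):
    """One token line. Column names assume the CoNLL-X layout (an assumption)."""
    id: str
    form: str
    lemma: str
    cpostag: str
    postag: str
    feats: str
    head: str
    deprel: str


def read_conll(lines: Iterable[str]) -> Iterator[List[Token]]:
    """Yield one list of Token objects per blank-line-separated sentence."""
    sentence: List[Token] = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            # Skip comment lines, if the file has any.
            continue
        # Pad short lines with "_" and keep only the first eight columns.
        cols = (line.split("\t") + ["_"] * 8)[:8]
        sentence.append(Token(*cols))
    if sentence:
        yield sentence


def pos_counts(sentences: Iterable[List[Token]]) -> Counter:
    """Count fine-grained POS tags over all tokens."""
    return Counter(tok.postag for sent in sentences for tok in sent)


# Hypothetical sample sentence; tags and relation labels are illustrative only.
SAMPLE = (
    "1\tJag\tjag\tPN\tPN\t_\t2\tnsubj\t_\t_\n"
    "2\tgillar\tgilla\tVB\tVB\t_\t0\troot\t_\t_\n"
    "3\tböcker\tbok\tNN\tNN\t_\t2\tobj\t_\t_\n"
    "4\t.\t.\tMAD\tMAD\t_\t2\tpunct\t_\t_\n"
    "\n"
)

if __name__ == "__main__":
    sentences = list(read_conll(SAMPLE.splitlines()))
    for tag, count in pos_counts(sentences).most_common():
        print(tag, count)
```

Because each token occupies one line with fixed columns, counts of this kind can be produced with a few lines of scripting or even a spreadsheet, without an XML parser. Where the original and corrected token forms are stored is not stated in the abstract, so the sketch makes no assumption about it.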

Place, publisher, year, edition, pages
2016.
Keyword [en]
student writings, digital humanities, educational applications
National Category
Specific Languages; Language Technology (Computational Linguistics)
Research subject
Computational Linguistics; Scandinavian Languages
Identifiers
URN: urn:nbn:se:uu:diva-280192
OAI: oai:DiVA.org:uu-280192
DiVA: diva2:910321
Conference
Language Resources and Evaluation (LREC) 2016
Projects
SWE-CLARIN
Funder
Swedish Research Council
Available from: 2016-03-08. Created: 2016-03-08. Last updated: 2018-01-10. Bibliographically approved.

Open Access in DiVA

No full text in DiVA

Authority records BETA

Megyesi, Beata; Näsman, Jesper; Palmér, Anne
