A Multilingual Evaluation of Three Spelling Normalization Methods for Historical Text.
2014 (English)
In: Workshop on Language Technology for Cultural Heritage, Social Sciences and Humanities, LaTeCH 2014, 2014
Conference paper (Refereed)
We present a multilingual evaluation of approaches for spelling normalisation of historical text based on data from five languages: English, German, Hungarian, Icelandic, and Swedish. Three different normalisation methods are evaluated: a simplistic filtering model, a Levenshtein-based approach, and a character-based statistical machine translation approach. The evaluation shows that the machine translation approach often gives the best results, but also that all approaches improve over the baseline and that no single method works best for all languages.
Language Technology (Computational Linguistics)
Research subject: Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-264781
OAI: oai:DiVA.org:uu-264781
DiVA: diva2:861521
European Association for Computational Linguistics, EACL 2014.