A Multilingual Evaluation of Three Spelling Normalization Methods for Historical Text
2014 (English). In: Proceedings of the 8th Workshop on Language Technology for Cultural Heritage, Social Sciences, and Humanities (LaTeCH), 2014, 32-41 p. Conference paper (Refereed)
We present a multilingual evaluation of approaches for spelling normalisation of historical text based on data from five languages: English, German, Hungarian, Icelandic, and Swedish. Three different normalisation methods are evaluated: a simplistic filtering model, a Levenshtein-based approach, and a character-based statistical machine translation approach. The evaluation shows that the machine translation approach often gives the best results, but also that all approaches improve over the baseline and that no single method works best for all languages.
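To make the Levenshtein-based idea concrete, the following is a minimal sketch of one way such a normaliser could work: each historical token is mapped to the closest entry in a modern lexicon, provided the edit distance stays within a threshold. The function names, the `max_distance` parameter, the alphabetical tie-breaking, and the toy lexicon are all illustrative assumptions; the abstract does not describe the paper's actual implementation, which may differ (for example, by weighting edits).

```python
from __future__ import annotations


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete ca
                               current[j - 1] + 1,       # insert cb
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def normalise(token: str, lexicon: set[str], max_distance: int = 1) -> str:
    """Hypothetical normaliser: return the nearest lexicon word, or the
    token unchanged if nothing lies within max_distance edits."""
    if token in lexicon:
        return token
    # Ties are broken alphabetically (an assumption made for determinism).
    best = min(lexicon, key=lambda w: (levenshtein(token, w), w), default=token)
    return best if levenshtein(token, best) <= max_distance else token


# Toy modern lexicon, assumed purely for illustration.
modern_lexicon = {"he", "she", "the"}
print(normalise("hee", modern_lexicon))
```

A threshold keeps the sketch from forcing unrelated tokens onto a lexicon entry; tokens with no close match are passed through unchanged.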
Place, publisher, year, edition, pages
2014. 32-41 p.
spelling normalization, historical texts
Language Technology (Computational Linguistics)
Research subject Computational Linguistics
Identifiers: URN: urn:nbn:se:uu:diva-239449; ISBN: 978-1-937284-85-5; OAI: oai:DiVA.org:uu-239449; DiVA: diva2:774587
14th Conference of the European Association for Computational Linguistics, EACL 2014, 26–30 April, Gothenburg, Sweden
Funder: Swedish Research Council