Tunable Distortion Limits and Corpus Cleaning for SMT
2013 (English). In: Proceedings of the Eighth Workshop on Statistical Machine Translation, Association for Computational Linguistics, 2013, 225-231 p. Conference paper (Refereed)
We describe the Uppsala University system for WMT13, for English-to-German translation. We use the Docent decoder, a local search decoder that translates at the document level. We add tunable distortion limits, that is, soft constraints on the maximum distortion allowed, to Docent. We also investigate cleaning of the noisy Common Crawl corpus and show that alignment-based filtering can be used for cleaning with good results. Finally, we investigate the effects of corpus selection for recasing.
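The abstract does not say how the soft distortion constraint is computed, so the following is only a minimal sketch of one possible formulation, not Docent's implementation. It assumes the penalty is the total amount by which each reordering jump exceeds a limit; the function names, the span representation, and the penalty definition are all hypothetical.

```python
def distortion_jumps(source_spans):
    """Return the jump distance before each phrase.

    source_spans: (start, end) source positions of the phrases, listed in
    target order, with `end` exclusive. This representation is an assumption.
    """
    jumps = []
    prev_end = 0
    for start, end in source_spans:
        # A monotone continuation starts exactly where the previous phrase ended.
        jumps.append(start - prev_end)
        prev_end = end
    return jumps


def soft_distortion_penalty(jumps, limit):
    """Sum the excess of each jump over `limit` (hypothetical penalty).

    A hard limit would reject any hypothesis with a positive excess; a soft
    limit instead turns the excess into a feature value whose weight can be
    tuned together with the other model weights.
    """
    return sum(max(0, abs(jump) - limit) for jump in jumps)


# Three phrases covering source positions 0-1, 5-6 and 2-4, in target order.
spans = [(0, 2), (5, 7), (2, 5)]
jumps = distortion_jumps(spans)
penalty = soft_distortion_penalty(jumps, limit=3)
```

In this sketch, jumps within the limit contribute nothing to the penalty, so the feature only affects hypotheses that a hard limit would have excluded.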
Place, publisher, year, edition, pages
Association for Computational Linguistics, 2013. 225-231 p.
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-207765
OAI: oai:DiVA.org:uu-207765
DiVA: diva2:649447
WMT 2013; 8-9 August; Sofia, Bulgaria