Building a Large Machine-Aligned Parallel Treebank
2009 (English). In: Proceedings of the 8th International Workshop on Treebanks and Linguistic Theories (TLT’08) / [ed] Marco Passarotti, Adam Przepiórkowski, Savina Raynaud & Frank Van Eynde, Milano/Italy: EDUCatt, 2009, 197-208 p. Conference paper (Refereed)
This paper reports ongoing work on building a large, automatically tree-aligned parallel treebank in the context of a syntax-based machine translation (MT) approach. For this purpose, we develop a discriminative tree aligner based on a log-linear model with a rich feature set. We incorporate various language-independent and language-specific features, taking advantage of existing tools and annotation. Our initial experiments on a small hand-aligned treebank show promising results even with small amounts of training data. The performance of our approach is well above that of unsupervised techniques reported elsewhere. This enables us to quickly create training material and alignment models for additional language pairs. In recent work, we aligned more than one million sentence pairs and started experiments on extracting transfer knowledge for our example-based machine translation system.
Place, publisher, year, edition, pages
Milano/Italy: EDUCatt, 2009. 197-208 p.
Language Technology (Computational Linguistics)
Research subject: Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-112295
ISBN: 978-88-8311-712-1
OAI: oai:DiVA.org:uu-112295
DiVA: diva2:285826
8th International Workshop on Treebanks and Linguistic Theories