Building a Large Machine-Aligned Parallel Treebank
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Datorlingvistik)
Faculteit der Letteren, Rijksuniversiteit Groningen. (Informatiekunde)
2009 (English). In: Proceedings of the 8th International Workshop on Treebanks and Linguistic Theories (TLT’08) / [ed] Marco Passarotti, Adam Przepiórkowski, Savina Raynaud & Frank Van Eynde, Milano/Italy: EDUCatt, 2009, 197-208 p. Conference paper, Published paper (Refereed)
Abstract [en]

This paper reports ongoing work on building a large, automatically tree-aligned parallel treebank in the context of a syntax-based machine translation (MT) approach. For this purpose, we develop a discriminative tree aligner based on a log-linear model with a rich feature set. We incorporate various language-independent and language-specific features, taking advantage of existing tools and annotation. Our initial experiments on a small hand-aligned treebank show promising results even with small amounts of training data. The performance of our approach is well above that of unsupervised techniques reported elsewhere. This enables us to quickly create training material and alignment models for additional language pairs. In recent work, we aligned more than one million sentence pairs and started experiments on extracting transfer knowledge for our example-based machine translation system.
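The abstract does not spell out the model, so the following is only a minimal sketch of how a log-linear (logistic) scorer over node pairs might look. The `Node` fields, the three features, the weight values, the threshold, and the greedy one-to-one linking step are all illustrative assumptions, not the authors' feature set or search procedure.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical tree node: an id, a category label, and its leaf count."""
    id: str
    label: str
    leaves: int


def features(src: Node, trg: Node) -> Dict[str, float]:
    """Illustrative features for one (source node, target node) pair."""
    return {
        "bias": 1.0,
        "same_label": 1.0 if src.label == trg.label else 0.0,
        "leaf_ratio": min(src.leaves, trg.leaves) / max(src.leaves, trg.leaves),
    }


def link_probability(weights: Dict[str, float], feats: Dict[str, float]) -> float:
    """Binary log-linear model: sigmoid of the weighted feature sum."""
    score = sum(weights.get(name, 0.0) * value for name, value in feats.items())
    return 1.0 / (1.0 + math.exp(-score))


def greedy_align(
    src_nodes: List[Node],
    trg_nodes: List[Node],
    weights: Dict[str, float],
    threshold: float = 0.5,
) -> List[Tuple[str, str, float]]:
    """Assumed search step: take the best-scoring links first, one link per node."""
    scored = sorted(
        (
            (link_probability(weights, features(s, t)), s.id, t.id)
            for s in src_nodes
            for t in trg_nodes
        ),
        reverse=True,
    )
    used_src, used_trg, links = set(), set(), []
    for prob, sid, tid in scored:
        if prob < threshold:
            break  # remaining candidates score even lower
        if sid in used_src or tid in used_trg:
            continue
        links.append((sid, tid, prob))
        used_src.add(sid)
        used_trg.add(tid)
    return links


# Toy input with hand-picked (not learned) weights.
src = [Node("s1", "NP", 2), Node("s2", "VP", 3)]
trg = [Node("t1", "NP", 2), Node("t2", "VP", 3)]
weights = {"bias": -2.0, "same_label": 2.0, "leaf_ratio": 2.0}
alignment = greedy_align(src, trg, weights)
```

In a trained system the weights would be estimated from hand-aligned tree pairs rather than set by hand, which is presumably where the small hand-aligned treebank mentioned in the abstract comes in.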

Place, publisher, year, edition, pages
Milano/Italy: EDUCatt, 2009. 197-208 p.
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-112295
ISBN: 978-88-8311-712-1 (print)
OAI: oai:DiVA.org:uu-112295
DiVA: diva2:285826
Conference
8th International Workshop on Treebanks and Linguistic Theories
Available from: 2010-01-13 Created: 2010-01-12 Last updated: 2010-12-07 Bibliographically approved

Open Access in DiVA

No full text

Other links

http://tlt8.unicatt.it/allegati/Proceedings_TLT8.pdf