ParCor 1.0: A Parallel Pronoun-Coreference Corpus to Support Statistical MT
2014 (English). In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Paris: European Language Resources Association, 2014, 3191-3198 p. Conference paper (Refereed)
We present ParCor, a parallel corpus of texts in which pronoun coreference – reduced coreference in which pronouns are used as referring expressions – has been annotated. The corpus is intended to be used both as a resource from which to learn systematic differences in pronoun use between languages and ultimately for developing and testing informed Statistical Machine Translation systems aimed at addressing the problem of pronoun coreference in translation. At present, the corpus consists of a collection of parallel English-German documents from two different text genres: TED Talks (transcribed planned speech), and EU Bookshop publications (written text). All documents in the corpus have been manually annotated with respect to the type and location of each pronoun and, where relevant, its antecedent. We provide details of the texts that we selected, the guidelines and tools used to support annotation and some corpus statistics. The texts in the corpus have already been translated into many languages, and we plan to expand the corpus into these other languages, as well as other genres, in the future.
Place, publisher, year, edition, pages
Paris: European Language Resources Association, 2014. 3191-3198 p.
Language Technology (Computational Linguistics)
Research subject Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-231280
ISI: 000355611004132
ISBN: 978-2-9517408-8-4
OAI: oai:DiVA.org:uu-231280
DiVA: diva2:744173
9th International Conference on Language Resources and Evaluation (LREC), MAY 26-31, 2014, Reykjavik, ICELAND
Funder
Swedish Research Council, 2012-916