Uppsala University Publications
ParCor 1.0: A Parallel Pronoun-Coreference Corpus to Support Statistical MT
University of Edinburgh.
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Computational Linguistics)
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2014 (English). In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), Paris: European Language Resources Association, 2014, 3191-3198 p. Conference paper, Published paper (Refereed)
Abstract [en]

We present ParCor, a parallel corpus of texts in which pronoun coreference (reduced coreference in which pronouns are used as referring expressions) has been annotated. The corpus is intended to be used both as a resource from which to learn systematic differences in pronoun use between languages and ultimately for developing and testing informed Statistical Machine Translation systems aimed at addressing the problem of pronoun coreference in translation. At present, the corpus consists of a collection of parallel English-German documents from two different text genres: TED Talks (transcribed planned speech), and EU Bookshop publications (written text). All documents in the corpus have been manually annotated with respect to the type and location of each pronoun and, where relevant, its antecedent. We provide details of the texts that we selected, the guidelines and tools used to support annotation and some corpus statistics. The texts in the corpus have already been translated into many languages, and we plan to expand the corpus into these other languages, as well as other genres, in the future.

Place, publisher, year, edition, pages
Paris: European Language Resources Association, 2014. 3191-3198 p.
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-231280
ISI: 000355611004132
ISBN: 978-2-9517408-8-4 (print)
OAI: oai:DiVA.org:uu-231280
DiVA: diva2:744173
Conference
9th International Conference on Language Resources and Evaluation (LREC), May 26-31, 2014, Reykjavik, Iceland
Funder
Swedish Research Council, 2012-916
Available from: 2014-09-06. Created: 2014-09-06. Last updated: 2015-10-06. Bibliographically approved.

Open Access in DiVA

LREC2014 (125 kB), 156 downloads
File information
File name: FULLTEXT03.pdf
File size: 125 kB
Checksum (SHA-512): 207aa0e639dca56b151b00c488beeb7c461d7a3853c6d2d4af3de7538d158532bbf8fa14c63d1062e0895a826fc47b86fe68ca91f5bce584f1f78beb15271f8e
Type: fulltext
MIME type: application/pdf

Other links

http://www.lrec-conf.org/proceedings/lrec2014/summaries/298.html

Authority records

Hardmeier, Christian; Smith, Aaron; Tiedemann, Jörg
