Uppsala University Publications (uu.se)
A Persian Treebank with Stanford Typed Dependencies
Seraji, Mojgan: Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Computational Linguistics)
Jahani, Carina: Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Iranian Studies)
Megyesi, Beata: Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (Computational Linguistics)
Nivre, Joakim: Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2014 (English). In: Proceedings of Language Resources and Evaluation, 2014, 796-801 p. Conference paper, Published paper (Refereed)
Abstract [en]

We present the Uppsala Persian Dependency Treebank (UPDT) with a syntactic annotation scheme based on Stanford Typed Dependencies.

The treebank consists of 6,000 sentences and 151,671 tokens with an average sentence length of 25 words. The data is from different genres, including newspaper articles and fiction, as well as technical descriptions and texts about culture and art, taken from the open source Uppsala Persian Corpus (UPC). The syntactic annotation scheme is extended for Persian to include all syntactic relations that could not be covered by the primary scheme developed for English. In addition, we present open source tools for automatic analysis of Persian containing a text normalizer, a sentence segmenter and tokenizer, a part-of-speech tagger, and a parser. The treebank and the parser have been developed simultaneously in a bootstrapping procedure. The result of a parsing experiment shows an overall labeled attachment score of 82.05% and an unlabeled attachment score of 85.29%. The treebank is freely available as an open source resource.
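The two scores reported above can be illustrated with a minimal sketch. It assumes the common definitions: the unlabeled attachment score (UAS) is the percentage of tokens assigned the correct head, and the labeled attachment score (LAS) is the percentage assigned both the correct head and the correct dependency label. The function name `attachment_scores`, the `(head, label)` representation, and the toy sentence are hypothetical; the abstract does not state further evaluation details (for example, how punctuation is treated), so this is not the authors' evaluation script.

```python
from typing import List, Tuple

# One token's analysis: (index of its head, dependency label).
# Head index 0 denotes the artificial root. This layout is an assumption.
Arc = Tuple[int, str]


def attachment_scores(gold: List[Arc], pred: List[Arc]) -> Tuple[float, float]:
    """Return (LAS, UAS) in percent for token-aligned gold and predicted arcs."""
    if len(gold) != len(pred):
        raise ValueError("gold and predicted analyses must have the same length")
    if not gold:
        raise ValueError("cannot score an empty token sequence")

    # UAS counts tokens whose predicted head matches the gold head.
    head_hits = sum(g[0] == p[0] for g, p in zip(gold, pred))
    # LAS counts tokens whose predicted head and label both match.
    full_hits = sum(g == p for g, p in zip(gold, pred))

    n = len(gold)
    return 100.0 * full_hits / n, 100.0 * head_hits / n


# Toy four-token sentence with invented analyses (not treebank data).
gold = [(2, "nsubj"), (0, "root"), (2, "dobj"), (2, "punct")]
pred = [(2, "nsubj"), (0, "root"), (2, "nsubj"), (3, "punct")]

las, uas = attachment_scores(gold, pred)
print(f"LAS = {las:.2f}%, UAS = {uas:.2f}%")
```

Because every token that counts toward LAS also counts toward UAS, LAS can never exceed UAS, which is consistent with the reported 82.05% and 85.29%.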

Place, publisher, year, edition, pages
2014. 796-801 p.
Keyword [en]
treebank, Persian, Stanford Typed Dependencies
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics; Iranian Languages
Identifiers
URN: urn:nbn:se:uu:diva-239450
ISI: 000355611002062
ISBN: 978-2-9517408-8-4 (print)
OAI: oai:DiVA.org:uu-239450
DiVA: diva2:774588
Conference
The 9th International Conference on Language Resources and Evaluation (LREC), 2014, 26-31 May, Reykjavik, Iceland
Available from: 2014-12-26. Created: 2014-12-26. Last updated: 2017-01-25. Bibliographically approved.

Open Access in DiVA

fulltext (215 kB), 125 downloads
File information
File name: FULLTEXT01.pdf
File size: 215 kB
Checksum (SHA-512): 9fc5d4ff72729ea6b3fbf58e71b8e406cc4fc4cbc0754c3fc4afd751d33626f23a04da0d89fc9771da3f92d859adec20c77307029db729d0810e7ab27a54d8db
Type: fulltext
Mimetype: application/pdf

Authority records BETA

Seraji, Mojgan; Jahani, Carina; Megyesi, Beata; Nivre, Joakim

Total: 125 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 491 hits