Uppsala University Publications
A bootstrapping method for development of Treebank
Univ Tehran, Sch Elect & Comp Engn, Coll Engn, Tehran, Iran.
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Univ Tehran, Sch Elect & Comp Engn, Coll Engn, Tehran, Iran.
Univ Tehran, Sch Elect & Comp Engn, Coll Engn, Tehran, Iran; Inst Res Fundamental Sci IPM, Sch Comp Sci, POB 19395-5746, Tehran, Iran.
Univ Tehran, Sch Elect & Comp Engn, Coll Engn, Tehran, Iran.
2017 (English). In: Journal of experimental and theoretical artificial intelligence (Print), ISSN 0952-813X, E-ISSN 1362-3079, Vol. 29, no 1, 19-42 p.
Article in journal (Refereed). Published.
Abstract [en]

Using statistical approaches alongside the traditional methods of natural language processing could significantly improve both the quality and the performance of several natural language processing (NLP) tasks. The effective use of these approaches depends on the availability of informative, accurate and detailed corpora on which the learners are trained. This article introduces a bootstrapping method for developing annotated corpora based on a complex, linguistically rich elementary structure called a supertag. To this end, a hybrid supertagging method is proposed that combines the generative and discriminative approaches to supertagging. The method was applied to a subset of the Wall Street Journal (WSJ) corpus in order to annotate its sentences with the linguistically motivated elementary structures of the English XTAG grammar, which uses the lexicalised tree-adjoining grammar formalism. The empirical results confirm that the bootstrapping method provides a satisfactory way of annotating English sentences with these structures. The experiments show that the method was able to automatically annotate about 20% of the WSJ with an F-measure of about 80%, which is 12% higher than the F-measure of the XTAG Treebank generated automatically by the approach proposed by Basirat and Faili [(2013). Bridge the gap between statistical and hand-crafted grammars. Computer Speech and Language, 27, 1085-1104].
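The abstract does not spell out the bootstrapping loop itself. As a rough illustration of the generic semi-supervised idea behind it (self-training), and explicitly not the authors' hybrid generative/discriminative supertagger, the following Python sketch retrains a toy tagger on a small seed set, annotates unlabeled sentences, and keeps only those tagged above a confidence threshold. Everything in it is hypothetical: the functions `train`, `tag_with_confidence`, `best` and `bootstrap`, the `threshold` parameter, the greedy frequency-based tagger, and the use of Penn-style POS tags as stand-ins for XTAG supertags.

```python
from collections import Counter, defaultdict

# Hypothetical sketch of a generic self-training loop with a toy greedy tagger.
# It is NOT the hybrid supertagger described in the abstract; POS tags such as
# DT/NN stand in for supertags.

START = "<s>"  # pseudo-tag marking the beginning of a sentence


def train(labeled):
    """Count word->tag emissions and previous-tag->tag transitions.

    labeled: list of sentences, each a list of (word, tag) pairs.
    """
    emissions = defaultdict(Counter)
    transitions = defaultdict(Counter)
    for sentence in labeled:
        prev = START
        for word, tag in sentence:
            emissions[word][tag] += 1
            transitions[prev][tag] += 1
            prev = tag
    return emissions, transitions


def best(counter):
    """Return the most frequent key of a non-empty Counter and its relative frequency."""
    tag, freq = counter.most_common(1)[0]
    return tag, freq / sum(counter.values())


def tag_with_confidence(model, words):
    """Greedy left-to-right tagging.

    Known words take their most frequent tag; unseen words take the tag that
    most often follows the previous tag. Sentence confidence is the minimum
    per-word relative frequency, or 0.0 if some word cannot be guessed.
    """
    emissions, transitions = model
    tags, confidence, prev = [], 1.0, START
    for word in words:
        if word in emissions:
            tag, p = best(emissions[word])
        elif transitions.get(prev):
            tag, p = best(transitions[prev])
        else:
            return None, 0.0
        tags.append(tag)
        confidence = min(confidence, p)
        prev = tag
    return tags, confidence


def bootstrap(seed, unlabeled, threshold=0.9, max_rounds=10):
    """Retrain on everything labeled so far, annotate the unlabeled pool, and
    move confidently tagged sentences into the labeled set. Stops when a round
    accepts nothing or max_rounds is reached."""
    labeled, pool = list(seed), list(unlabeled)
    for _ in range(max_rounds):
        model = train(labeled)
        accepted, remaining = [], []
        for words in pool:
            tags, confidence = tag_with_confidence(model, words)
            if tags is not None and confidence >= threshold:
                accepted.append(list(zip(words, tags)))
            else:
                remaining.append(words)
        if not accepted:  # nothing new passed the threshold
            break
        labeled.extend(accepted)
        pool = remaining
    return labeled, pool


# Tiny illustrative data (invented for this sketch).
seed = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("dogs", "NNS"), ("bark", "VBP")],
]
unlabeled = [["the", "bird", "runs"], ["cats", "bark"]]

labeled, pool = bootstrap(seed, unlabeled, threshold=0.9)
```

With the default threshold of 0.9, "the bird runs" is accepted (the unseen word "bird" is tagged NN from its left context), while "cats bark" stays in the pool because its first word can only be guessed with confidence 2/3. Lowering the threshold to 0.6 accepts it with the wrong tag DT for "cats", which shows why the confidence cut-off matters: errors admitted in one round are trained on in the next.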

Place, publisher, year, edition, pages
2017. Vol. 29, no 1, 19-42 p.
Keyword [en]
Treebank, supertagging, parser, annotated corpus, bootstrapping, semi-supervised
National Category
Computer Science
Identifiers
URN: urn:nbn:se:uu:diva-316124
DOI: 10.1080/0952813X.2015.1057239
ISI: 000392422400002
OAI: oai:DiVA.org:uu-316124
DiVA: diva2:1080606
Available from: 2017-03-10
Created: 2017-03-10
Last updated: 2017-03-10
Bibliographically approved

Open Access in DiVA

No full text


Search in DiVA

By author/editor
Basirat, Ali
By organisation
Department of Linguistics and Philology
In the same journal
Journal of experimental and theoretical artificial intelligence (Print)
Computer Science
