Uppsala University Publications
A statistical model for grammar mapping
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. University of Tehran.
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
2016 (English). In: Natural Language Engineering, ISSN 1351-3249, E-ISSN 1469-8110, Vol. 22, no 2, 215-255 p.
Article in journal (Refereed), Published
Abstract [en]

The two main classes of grammars are (a) hand-crafted grammars, which are developed by language experts, and (b) data-driven grammars, which are extracted from annotated corpora. This paper introduces a statistical method for mapping the elementary structures of a data-driven grammar onto the elementary structures of a hand-crafted grammar in order to combine their advantages. The idea is employed in the context of Lexicalized Tree-Adjoining Grammars (LTAG) and tested on two LTAGs of English: the hand-crafted LTAG developed in the XTAG project, and the data-driven LTAG, which is automatically extracted from the Penn Treebank and used by the MICA parser. We propose a statistical model for mapping any elementary tree sequence of the MICA grammar onto a proper elementary tree sequence of the XTAG grammar. The model has been tested on three subsets of the WSJ corpus that have average lengths of 10, 16, and 18 words, respectively. The experimental results show that full-parse trees with average F1-scores of 72.49, 64.80, and 62.30 points could be built from 94.97%, 96.01%, and 90.25% of the XTAG elementary tree sequences assigned to the subsets, respectively. Moreover, by reducing the amount of syntactic lexical ambiguity of sentences, the proposed model significantly improves the efficiency of parsing in the XTAG system.
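The abstract does not state the form of the statistical model. Purely as an illustration of what mapping one elementary tree sequence onto another can look like, the sketch below treats XTAG trees as hidden states and MICA supertags as observations, and decodes the most probable XTAG sequence with the Viterbi algorithm. This HMM-style formulation, the function name `map_sequence`, the probability tables, and the tree labels (`m_np`, `x_v_trans`, and so on) are hypothetical assumptions; they are not the model, grammars, or data used in the paper.

```python
import math
from collections import defaultdict

# Hypothetical toy tables (assumption, not from the paper).
# EMIT[(mica_tag, xtag_tree)] = P(mica_tag | xtag_tree)
EMIT = {
    ("m_np", "x_np"): 1.0,
    ("m_v", "x_v_trans"): 1.0,
    ("m_v", "x_v_intrans"): 1.0,
}
# TRANS[(previous_xtag_tree, xtag_tree)] = P(xtag_tree | previous_xtag_tree)
TRANS = {
    ("<s>", "x_np"): 1.0,
    ("x_np", "x_v_trans"): 0.3,
    ("x_np", "x_v_intrans"): 0.5,
    ("x_v_trans", "x_np"): 0.9,
    ("x_v_intrans", "x_np"): 0.1,
}


def map_sequence(mica_seq, emit, trans, start="<s>"):
    """Return the most probable XTAG tree sequence for a MICA tag sequence
    under a first-order HMM, or None if no sequence has nonzero probability."""
    # Candidate XTAG trees for each MICA tag: the trees that can emit it.
    candidates = defaultdict(list)
    for (mica, xtag), p in emit.items():
        if p > 0:
            candidates[mica].append(xtag)

    # best[x] = (log-probability, path) of the best partial sequence ending in x.
    best = {start: (0.0, [])}
    for mica in mica_seq:
        new_best = {}
        for xtag in candidates[mica]:
            scored = [
                (lp + math.log(trans[(prev, xtag)]) + math.log(emit[(mica, xtag)]), path)
                for prev, (lp, path) in best.items()
                if trans.get((prev, xtag), 0) > 0
            ]
            if scored:
                lp, path = max(scored, key=lambda s: s[0])
                new_best[xtag] = (lp, path + [xtag])
        if not new_best:
            return None  # no XTAG sequence is licensed under these tables
        best = new_best
    return max(best.values(), key=lambda s: s[0])[1]


print(map_sequence(["m_np", "m_v", "m_np"], EMIT, TRANS))
# ['x_np', 'x_v_trans', 'x_np']
print(map_sequence(["m_np", "m_v"], EMIT, TRANS))
# ['x_np', 'x_v_intrans']
```

Because one MICA tag can correspond to several XTAG trees, the transition table does the disambiguation here: with these toy numbers, `m_v` maps to the transitive tree when a noun phrase follows and to the intransitive tree when nothing follows. Handing the parser one decoded sequence instead of every candidate tree per word is one plausible way such a mapping reduces lexical ambiguity.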

Place, publisher, year, edition, pages
Cambridge University Press, 2016. Vol. 22, no 2, 215-255 p.
National Category
Language Technology (Computational Linguistics)
Research subject
Computer Science with specialization in Human-Computer Interaction; Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-248953
DOI: 10.1017/S1351324915000017
ISI: 000370862900003
OAI: oai:DiVA.org:uu-248953
DiVA: diva2:801400
Available from: 2015-04-09
Created: 2015-04-09
Last updated: 2017-12-04
Bibliographically approved

Authority records

Basirat, Ali
Nivre, Joakim