Uppsala University Publications
Dahlqvist, Bengt
Publications (10 of 59)
Dahlqvist, B. (2010). Sökbarhet i digitaliserade dokument: Metoder och överväganden. Uppsala: Uppsala universitet
2010 (Swedish). Report (Other academic)
Place, publisher, year, edition, pages
Uppsala: Uppsala universitet, 2010. p. 13
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:uu:diva-296823 (URN)
Available from: 2016-06-20 Created: 2016-06-20 Last updated: 2018-01-10
Megyesi, B., Dahlqvist, B., Csató, É. Á. & Nivre, J. (2010). The English-Swedish-Turkish Parallel Treebank. In: Proceedings of Language Resources and Evaluation (LREC 2010). Paper presented at LREC 2010, 17-23 May 2010, Valletta, Malta.
2010 (English). In: Proceedings of Language Resources and Evaluation (LREC 2010), 2010. Conference paper, Published paper (Refereed)
Abstract [en]

We describe a syntactically annotated parallel corpus containing languages that are in part typologically different, namely English, Swedish and Turkish. The corpus consists of approximately 300,000 tokens in Swedish, 160,000 in Turkish and 150,000 in English, and contains both fiction and technical documents. We build the corpus using the Uplug toolkit for automatic structural markup, such as tokenization and sentence segmentation, as well as for sentence and word alignment. In addition, we use basic language resource kits for the linguistic analysis of the languages involved. The annotation is carried out on various layers, from morphological and part-of-speech analysis to dependency structures. The tools used for linguistic annotation, e.g. the HunPos tagger and MaltParser, are freely available data-driven resources, trained on existing corpora and treebanks for each language. The parallel treebank is used in teaching and linguistic research to study the relationship between the structurally different languages. To support study of the treebank, several tools have been developed for visualizing the annotation and alignment and for searching for linguistic patterns.

Keywords
treebank, parallel corpus, language resource, trädbank, parallell korpus, språkresurs
National Category
Language Technology (Computational Linguistics); Specific Languages
Research subject
Computational Linguistics; Turkic languages
Identifiers
urn:nbn:se:uu:diva-121758 (URN)
Conference
LREC 2010, 17-23 May 2010, Valletta, Malta
Available from: 2010-03-31 Created: 2010-03-29 Last updated: 2018-01-12. Bibliographically approved
Lindström, J., Brun, A. & Dahlqvist, B. (2009). OCR-läsning av äldre källmaterial: Vad kan (och bör) man göra?. Uppsala: Uppsala universitet
2009 (Swedish). Report (Other academic)
Place, publisher, year, edition, pages
Uppsala: Uppsala universitet, 2009. p. 22
Keywords
OCR, historical texts
National Category
Language Technology (Computational Linguistics)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-296821 (URN)
Available from: 2016-06-20 Created: 2016-06-20 Last updated: 2018-10-25
Saxena, A., Megyesi, B., Csató, É. Á. & Dahlqvist, B. (2009). Using Parallel Corpora in Teaching and Research: The Swedish-Hindi-English and Swedish-Turkish-English Parallel Corpora. In: Anju Saxena, Åke Viberg (Ed.), Multilingualism: proceedings of the 23rd Scandinavian Conference of Linguistics : Uppsala University, 1-3 October 2008. Paper presented at 23rd Scandinavian Conference of Linguistics. Uppsala: Acta Universitatis Upsaliensis
2009 (English). In: Multilingualism: proceedings of the 23rd Scandinavian Conference of Linguistics : Uppsala University, 1-3 October 2008 / [ed] Anju Saxena, Åke Viberg, Uppsala: Acta Universitatis Upsaliensis, 2009. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2009
Series
Studia Linguistica Upsaliensia, ISSN 1652-1366 ; 8
Keywords
corpus linguistics, parallel corpora, computer-assisted language learning
National Category
General Language Studies and Linguistics
Research subject
Linguistics
Identifiers
urn:nbn:se:uu:diva-99534 (URN); 978-91-554-7594-9 (ISBN)
Conference
23rd Scandinavian Conference of Linguistics
Available from: 2009-03-16 Created: 2009-03-16 Last updated: 2018-01-13. Bibliographically approved
Nivre, J., Megyesi, B., Gustafson-Capková, S., Salomonsson, F. & Dahlqvist, B. (2008). Cultivating a Swedish Treebank. In: Nivre, Joakim; Dahllöf, Mats; Megyesi, Beáta (Ed.), Resourceful Language Technology. A Festschrift in Honor of Anna Sågvall Hein (pp. 111-120). Acta Universitatis Upsaliensis
2008 (English). In: Resourceful Language Technology. A Festschrift in Honor of Anna Sågvall Hein / [ed] Nivre, Joakim; Dahllöf, Mats; Megyesi, Beáta, Acta Universitatis Upsaliensis, 2008, p. 111-120. Chapter in book (Other academic)
Place, publisher, year, edition, pages
Acta Universitatis Upsaliensis, 2008
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-277173 (URN)
Available from: 2016-02-17 Created: 2016-02-17 Last updated: 2018-01-10
Megyesi, B., Dahlqvist, B., Pettersson, E., Gustafson-Capkova, S. & Nivre, J. (2008). Supporting Research Environment for Less Explored Languages: A Case Study of Swedish and Turkish. In: Nivre, Joakim, Dahllöf, Mats, Megyesi, Beáta (Ed.), Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein (pp. 96-110). Uppsala: Uppsala universitet
2008 (English). In: Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein / [ed] Nivre, Joakim, Dahllöf, Mats, Megyesi, Beáta, Uppsala: Uppsala universitet, 2008, p. 96-110. Chapter in book (Other academic)
Place, publisher, year, edition, pages
Uppsala: Uppsala universitet, 2008
Series
Studia Linguistica Upsaliensia, ISSN 1652-1366 ; 7
Keywords
parallel corpora, parallel treebank
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:uu:diva-87590 (URN); 978-91-554-7226-9 (ISBN)
Projects
Supporting Research Environment for Less Explored Languages
Available from: 2008-12-31 Created: 2008-12-31 Last updated: 2018-01-13
Megyesi, B., Csató, É., Dahlqvist, B., Gustafson-Capková, S., Nivre, J., Pettersson, E. & Sågvall Hein, A. (2008). Supporting Research Environment for Swedish and Turkish. Uppsala: Dept of Linguistics and Philology
2008 (English). Report (Other (popular science, discussion, etc.))
Place, publisher, year, edition, pages
Uppsala: Dept of Linguistics and Philology, 2008. p. 17
National Category
Language Technology (Computational Linguistics); Specific Languages
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-99535 (URN)
Available from: 2009-03-16 Created: 2009-03-16 Last updated: 2018-01-13. Bibliographically approved
Megyesi, B., Dahlqvist, B., Pettersson, E. & Nivre, J. (2008). Swedish-Turkish Parallel Treebank. In: Proceedings of the Sixth International Language Resources and Evaluation (LREC'08). Paper presented at LREC'08, May 28-30, Marrakech, Morocco. Paris: European Language Resources Association (ELRA)
2008 (English). In: Proceedings of the Sixth International Language Resources and Evaluation (LREC'08), Paris: European Language Resources Association (ELRA), 2008. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we describe our work on building a parallel treebank for a less studied and typologically dissimilar language pair, namely Swedish and Turkish. The treebank is a balanced, syntactically annotated corpus containing both fiction and technical documents. In total, it consists of approximately 160,000 tokens in Swedish and 145,000 in Turkish. The texts are linguistically annotated on several layers, from part-of-speech tags and morphological features to dependency annotation. Each layer is processed automatically using basic language resources for the languages involved. The sentences and words are aligned, and partly manually corrected. We create the treebank by reusing and adjusting existing tools for automatic annotation and alignment, and for their correction and visualization. The treebank was developed within the project Supporting research environment for minor languages, which aims to create representative language resources for language pairs that differ in language structure. Therefore, effort is put into developing a general method for formatting and annotation, as well as into using tools that can easily be applied to other language pairs.

Place, publisher, year, edition, pages
Paris: European Language Resources Association (ELRA), 2008
Keywords
parallel treebanks, corpus linguistics
National Category
Language Technology (Computational Linguistics)
Identifiers
urn:nbn:se:uu:diva-87592 (URN)
Conference
LREC'08, May 28-30, Marrakech, Morocco
Available from: 2008-12-31 Created: 2008-12-31 Last updated: 2018-01-13
Dahlqvist, B. & Nordenfors, M. (2008). Using the Text Processing Tool Textin to Examine Developmental Aspects of School Texts. In: Nivre, Joakim, Dahllöf, Mats, Megyesi, Beáta (Ed.), Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein (pp. 61-76). Uppsala: Uppsala universitet
2008 (English). In: Resourceful Language Technology: Festschrift in Honor of Anna Sågvall Hein / [ed] Nivre, Joakim, Dahllöf, Mats, Megyesi, Beáta, Uppsala: Uppsala universitet, 2008, p. 61-76. Chapter in book (Other academic)
Abstract [en]

The purpose of this article is first to give a brief presentation of the functions of the web-based text processing tool Textin 1.2, and then to illustrate these functions by putting the program to use within a research project in progress that concerns developmental aspects of texts written by Swedish pupils during school years 5 to 9. The text begins with a brief description of Textin's main functions, and then moves on to previous research on school texts where computational linguistic methods either were used or could have been used had the technology been accessible at the time. The article then continues with a presentation of the results that Textin delivers, and ends with a discussion of these findings.

Place, publisher, year, edition, pages
Uppsala: Uppsala universitet, 2008
Series
Studia Linguistica Upsaliensia, ISSN 1652-1366 ; 7
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-89240 (URN); 978-91-554-7226-9 (ISBN)
Available from: 2009-02-10 Created: 2009-02-10 Last updated: 2018-01-13. Bibliographically approved
Dahlqvist, B. & Megyesi, B. (2007). Changing the tokenization in Talbanken to SUC2.0. Department of Linguistics and Philology, Uppsala University.
2007 (English). Report (Other academic)
Place, publisher, year, edition, pages
Department of Linguistics and Philology, Uppsala University, 2007
National Category
Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
urn:nbn:se:uu:diva-13202 (URN)
Available from: 2008-01-21 Created: 2008-01-21 Last updated: 2018-01-12