Publications from Uppsala University
1 - 24 of 24
  • 1.
    Allassonniere-Tang, Marc
    et al.
    Univ Paris, CNRS, MNHN, EA UMR 7206, Paris, France.
    Lundgren, Olof
    Lund Univ, Lund, Sweden.
    Robbers, Maja
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Cronhamn, Sandra
    Lund Univ, Lund, Sweden.
    Larsson, Filip
    Lund Univ, Lund, Sweden.
    Her, One-Soon
    Tunghai Univ, Taichung, Taiwan.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Carling, Gerd
    Lund Univ, Lund, Sweden.
    Expansion by migration and diffusion by contact is a source to the global diversity of linguistic nominal categorization systems. 2021. In: Humanities and Social Sciences Communications, E-ISSN 2662-9992, Vol. 8, no 1, article id 331. Article in journal (Refereed)
    Abstract [en]

    Languages of diverse structures and different families tend to share common patterns if they are spoken in geographic proximity. This convergence is often explained by horizontal diffusibility, which is typically ascribed to language contact. In such a scenario, speakers of two or more languages interact and influence each other's languages, and in this interaction, more grammaticalized features tend to be more resistant to diffusion compared to features of more lexical content. An alternative explanation is vertical heritability: languages in proximity often share genealogical descent. Here, we suggest that the geographic distribution of features globally can be explained by two major pathways, which are generally not distinguished within quantitative typological models: feature diffusion and language expansion. The first pathway corresponds to the contact scenario described above, while the second occurs when speakers of genetically related languages migrate. We take the worldwide distribution of nominal classification systems (grammatical gender, noun class, and classifier) as a case study to show that more grammaticalized systems, such as gender, and less grammaticalized systems, such as classifiers, are almost equally widespread, but the former spread more by language expansion historically, whereas the latter spread more by feature diffusion. Our results indicate that quantitative models measuring the areal diffusibility and stability of linguistic features are likely to be affected by language expansion that occurs by historical coincidence. We anticipate that our findings will support studies of language diversity in a more sophisticated way, with relevance to other parts of language, such as phonology.

  • 2.
    Barbieri, Chiara
    et al.
    Univ Zurich, Dept Evolutionary Biol & Environm Studies, CH-8057 Zurich, Switzerland.;Univ Zurich, Ctr Interdisciplinary Study Language Evolut, CH-8050 Zurich, Switzerland.;Max Planck Inst Evolutionary Anthropol, Dept Linguist & Cultural Evolut, D-04103 Leipzig, Germany.
    Blasi, Damian E.
    Max Planck Inst Evolutionary Anthropol, Dept Linguist & Cultural Evolut, D-04103 Leipzig, Germany.;Harvard Univ, Dept Human Evolutionary Biol, Cambridge, MA 02134 USA.;Yale Univ, Human Relat Area Files, New Haven, CT 06511 USA.
    Arango-Isaza, Epifania
    Univ Zurich, Dept Evolutionary Biol & Environm Studies, CH-8057 Zurich, Switzerland.;Univ Zurich, Ctr Interdisciplinary Study Language Evolut, CH-8050 Zurich, Switzerland.
    Sotiropoulos, Alexandros G.
    Univ Zurich, Dept Plant & Microbial Biol, CH-8008 Zurich, Switzerland.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Wichmann, Soren
    Univ Kiel, Cluster Excellence ROOTS, D-24118 Kiel, Germany.
    Greenhill, Simon J.
    Max Planck Inst Evolutionary Anthropol, Dept Linguist & Cultural Evolut, D-04103 Leipzig, Germany.;Univ Auckland, Sch Biol Sci, Auckland 1010, New Zealand.
    Gray, Russell D.
    Max Planck Inst Evolutionary Anthropol, Dept Linguist & Cultural Evolut, D-04103 Leipzig, Germany.
    Forkel, Robert
    Max Planck Inst Evolutionary Anthropol, Dept Linguist & Cultural Evolut, D-04103 Leipzig, Germany.
    Bickel, Balthasar
    Univ Zurich, Ctr Interdisciplinary Study Language Evolut, CH-8050 Zurich, Switzerland.;Univ Zurich, Dept Comparat Language Sci, CH-8050 Zurich, Switzerland.
    Shimizu, Kentaro K.
    Univ Zurich, Dept Evolutionary Biol & Environm Studies, CH-8057 Zurich, Switzerland.;Univ Zurich, Ctr Interdisciplinary Study Language Evolut, CH-8050 Zurich, Switzerland.;Yokohama City Univ, Kihara Inst Biol Res, Yokohama, Kanagawa 2440813, Japan.
    A global analysis of matches and mismatches between human genetic and linguistic histories. 2022. In: Proceedings of the National Academy of Sciences of the United States of America, ISSN 0027-8424, E-ISSN 1091-6490, Vol. 119, no 47, article id 2122084119. Article in journal (Refereed)
    Abstract [en]

    Human history is written in both our genes and our languages. The extent to which our biological and linguistic histories are congruent has been the subject of considerable debate, with clear examples of both matches and mismatches. To disentangle the patterns of demographic and cultural transmission, we need a global systematic assessment of matches and mismatches. Here, we assemble a genomic database (GeLaTo, or Genes and Languages Together) specifically curated to investigate genetic and linguistic diversity worldwide. We find that most populations in GeLaTo that speak languages of the same language family (i.e., that descend from the same ancestor language) are also genetically highly similar. However, we also identify nearly 20% mismatches in populations genetically close to linguistically unrelated groups. These mismatches, which occur within the time depth of known linguistic relatedness up to about 10,000 y, are scattered around the world, suggesting that they are a regular outcome in human history. Most mismatches result from populations shifting to the language of a neighboring population that is genetically different because of independent demographic histories. In line with the regularity of such shifts, we find that only half of the language families in GeLaTo are genetically more cohesive than expected under spatial autocorrelations. Moreover, the genetic and linguistic divergence times of population pairs match only rarely, with Indo-European standing out as the family with most matches in our sample. Together, our database and findings pave the way for systematically disentangling demographic and cultural history and for quantifying processes of shifts in language and social identities on a global scale.

  • 3. Castermans, Thom
    et al.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Speckmann, Bettina
    Verbeek, Kevin
    Westenberg, Michel
    GlottoVis: Visualizing Language Endangerment and Documentation. 2017. In: VIS4DH’17: 2nd Workshop on Visualization for the Digital Humanities / [ed] Collins, Christopher; Correll, Michael; El-Assady, Mennatallah; Jänicke, Stefan; Keim, Daniel; Wrisley, David, Phoenix, Arizona: IEEE, 2017, p. 1-5. Chapter in book (Other academic)
  • 4.
    Forkel, Robert
    et al.
    Max Planck Inst Evolutionary Anthropol, Dept Linguist & Cultural Evolut, Leipzig, Germany.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Glottocodes: Identifiers linking families, languages and dialects to comprehensive reference information. 2022. In: Semantic Web, ISSN 1570-0844, E-ISSN 2210-4968, Vol. 13, no 6, p. 917-924. Article in journal (Refereed)
    Abstract [en]

    Glottocodes constitute the backbone identification system for the language, dialect and family inventory Glottolog (https://glottolog.org). In this paper, we summarize the motivation and history behind the system of glottocodes and describe the principles and practices of data curation, technical infrastructure and update/version-tracking systematics. Since our understanding of the target domain - the dialects, languages and language families of the entire world - is continually evolving, changes and updates are relatively common. The resulting data is assessed in terms of the FAIR (Findable, Accessible, Interoperable, Reusable) Guiding Principles for scientific data management and stewardship. As such the glottocode-system responds to an important challenge in the realm of Linguistic Linked Data with numerous NLP applications.

  • 5. Forkel, Robert
    et al.
    List, Johann-Mattis
    Greenhill, Simon J.
    Rzymski, Christoph
    Bank, Sebastian
    Cysouw, Michael
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Haspelmath, Martin
    Kaiping, Gereon A.
    Gray, Russell D.
    Cross-Linguistic Data Formats, advancing data sharing and re-use in comparative linguistics. 2018. In: Scientific Data, E-ISSN 2052-4463, Vol. 5, article id 180205. Article in journal (Refereed)
    Abstract [en]

    The amount of available digital data for the languages of the world is constantly increasing. Unfortunately, most of the digital data are provided in a large variety of formats and therefore not amenable for comparison and re-use. The Cross-Linguistic Data Formats initiative proposes new standards for two basic types of data in historical and typological language comparison (word lists, structural datasets) and a framework to incorporate more data types (e.g. parallel texts, and dictionaries). The new specification for cross-linguistic data formats comes along with a software package for validation and manipulation, a basic ontology which links to more general frameworks, and usage examples of best practices.

  • 6. Gijn, Rik van
    et al.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Kerke, Simon van de
    Krasnoukhova, Olga
    Muysken, Pieter
    Linguistic Areas, Linguistic Convergence and River Systems in South America. 2017. In: Handbook of Areal Linguistics / [ed] Hickey, Raymond, Cambridge: Cambridge University Press, 2017, p. 964-996. Chapter in book (Other academic)
  • 7.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    A Survey of African Languages. 2018. In: The Languages and Linguistics of Africa / [ed] Tom Güldemann, Berlin: Mouton de Gruyter, 2018, p. 1-57. Chapter in book (Other academic)
  • 8.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Language Isolates in the New Guinea region. 2017. In: Language Isolates / [ed] Campbell, Lyle; Smith, Alex; Dougherty, Thomas, London: Routledge, 2017, p. 287-321. Chapter in book (Other academic)
  • 9. Hammarström, Harald
    Unsupervised learning of morphology: Survey, model, algorithm and experiments. 2007. Licentiate thesis, comprehensive summary (Other academic)
  • 10.
    Hammarström, Harald
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Castermans, Thom
    TU Eindhoven, Eindhoven, Netherlands.
    Forkel, Robert
    Max Planck Inst Sci Human Hist, Jena, Germany.
    Verbeek, Kevin
    TU Eindhoven, Eindhoven, Netherlands.
    Westenberg, Michel A.
    TU Eindhoven, Eindhoven, Netherlands.
    Speckmann, Bettina
    TU Eindhoven, Eindhoven, Netherlands.
    Simultaneous Visualization of Language Endangerment and Language Description. 2018. In: Language Documentation & Conservation, E-ISSN 1934-5275, Vol. 12, p. 359-392. Article in journal (Refereed)
    Abstract [en]

    The world harbors a diversity of some 6,500 mutually unintelligible languages. As has been increasingly observed by linguists, many minority languages are becoming endangered and will be lost forever if not documented. Urgently indeed, many efforts are being launched to document and describe languages. This undertaking naturally has the priority toward the most endangered and least described languages. For the first time, we combine world-wide databases on language description (Glottolog) and language endangerment (ElCat, Ethnologue, UNESCO) and provide two online interfaces, GlottoScope and GlottoVis, to visualize these together. The interfaces are capable of browsing, filtering, zooming, basic statistics, and different ways of combining the two measures on a world map background. GlottoVis provides advanced techniques for combining cluttered dots on a map. With the tools and databases described we seek to increase the overall knowledge of the actual state of language endangerment and description worldwide.

  • 11.
    Hammarström, Harald
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Rönchen, Philipp
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Elgh, Erik
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Wiklund, Tilo
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Mathematics, Applied Mathematics and Statistics. Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    On computational historical linguistics in the 21st century. 2019. In: Theoretical Linguistics, ISSN 0301-4428, E-ISSN 1613-4060, Vol. 45, no 3-4, p. 233-245. Article in journal (Other academic)
  • 12.
    Hammarström, Harald
    et al.
    Department of Computing Science, Chalmers University, Gothenburg.
    Thornell, Christina
    Department of African Languages, Gothenburg University, Gothenburg.
    Petzell, Malin
    Department of African Languages, Gothenburg University, Gothenburg.
    Westerlund, Torbjörn
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Bootstrapping Language Description: The case of Mpiemo (Bantu A, Central African Republic). 2008. Conference paper (Refereed)
    Abstract [en]

    Linguists have long been producing grammatical descriptions of yet undescribed languages. This is a time-consuming process, which has already adapted to improved technology for recording and storage. We present here a novel application of NLP techniques to bootstrap analysis of collected data and speed up manual selection work. To be more precise, we argue that unsupervised induction of morphology and part-of-speech analysis from raw text data is mature enough to produce useful results. Experiments with Latent Semantic Analysis were less fruitful. We exemplify this on Mpiemo, a so-far essentially undescribed Bantu language of the Central African Republic, for which raw text data was available.

  • 13.
    Hammarström, Harald
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Virk, Shafqat Mumtaz
    Forsberg, Markus
    Poor Man’s OCR Post-Correction: Unsupervised Recognition of Variant Spelling Applied to a Multilingual Document Collection. 2017. In: Proceedings of the Digital Access to Textual Cultural Heritage (DATeCH) conference, Göttingen: ACM, 2017, p. 71-75. Chapter in book (Other academic)
  • 14.
    Her, One-Soon
    et al.
    Tunghai Univ, Dept Foreign Languages & Literature, Taichung, Taiwan.;Natl Chengchi Univ, Grad Inst Linguist, Taipei, Taiwan.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Allassonniere-Tang, Marc
    Univ Paris City, Lab Ecol Anthropol, CNRS, MNHN, Paris, France.
    Defining numeral classifiers and identifying classifier languages of the world. 2022. In: Linguistics Vanguard, E-ISSN 2199-174X, Vol. 8, no 1, p. 151-164. Article in journal (Refereed)
    Abstract [en]

    This paper presents a precise definition of numeral classifiers, steps to identify a numeral classifier language, and a database of 3,338 languages, of which 723 languages have been identified as having a numeral classifier system. The database, named World Atlas of Classifier Languages (WACL), has been systematically constructed over the last 10 years via a manual survey of relevant literature and also an automatic scan of digitized grammars followed by manual checking. The open-access release of WACL is thus a significant contribution to linguistic research in providing (i) a precise definition and examples of how to identify numeral classifiers in language data and (ii) the largest dataset of numeral classifier languages in the world. As such it offers researchers a rich and stable data source for conducting typological, quantitative, and phylogenetic analyses on numeral classifiers. The database will also be expanded with additional features relating to numeral classifiers in the future in order to allow more fine-grained analyses.

  • 15.
    Kalyan, Siva
    et al.
    Australian Natl Univ, Coll Asia & Pacific, Sch Culture Hist & Language, Dept Linguist, 9 Fellows Rd, Acton, ACT 2601, Australia.
    Francois, Alexandre
    Australian Natl Univ, Canberra, ACT, Australia;CNRS, LaTTiCe, ENS, Paris, France;PSL, ENS, Paris, France;USPC, Paris 3, Paris, France.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Problems With, And Alternatives To, The Tree Model In Historical Linguistics. 2019. In: Journal of Historical Linguistics, ISSN 2210-2116, E-ISSN 2210-2124, Vol. 9, no 1, p. 1-8. Article in journal (Other academic)
  • 16. Manova, Stela
    et al.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Kastner, Itamar
    Nie, Yining
    What is in a morpheme?: Theoretical, experimental and computational approaches to the relation of meaning and form in morphology. 2020. In: Word Structure, ISSN 1750-1245, E-ISSN 1755-2036, Vol. 13, no 1, p. 1-21. Article in journal (Refereed)
  • 17. Pawley, Andrew
    et al.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    The Trans New Guinea family. 2017. In: Papuan Languages and Linguistics / [ed] Palmer, Bill, Berlin: De Gruyter Mouton, 2017, p. 21-195. Chapter in book (Other academic)
  • 18.
    Robbers, Maja
    et al.
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    A computer-assisted quantitative typology of spatial deictic coding strategies. 2019. Conference paper (Refereed)
  • 19.
    Seifart, Frank
    et al.
    CNRS, Paris, France; Université de Lyon, Lyon, France; University of Amsterdam, Amsterdam, Netherlands; University of Cologne, Cologne, Germany.
    Evans, Nicholas
    ARC Centre of Excellence for the Dynamics of Language, The Australian National University, Canberra, ACT, Australia.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. Max Planck Inst Sci Human Hist, Jena, Germany.
    Levinson, Stephen C.
    Max Planck Institute for Psycholinguistics, Nijmegen, Netherlands.
    Language documentation twenty-five years on. 2018. In: Language, ISSN 0097-8507, E-ISSN 1535-0665, Vol. 94, no 4, p. E324-E345. Article in journal (Refereed)
    Abstract [en]

    This discussion note reviews responses of the linguistics profession to the grave issues of language endangerment identified a quarter of a century ago in the journal Language by Krauss, Hale, England, Craig, and others (Hale et al. 1992). Two and a half decades of worldwide research not only have given us a much more accurate picture of the number, phylogeny, and typological variety of the world's languages, but they have also seen the development of a wide range of new approaches, conceptual and technological, to the problem of documenting them. We review these approaches and the manifold discoveries they have unearthed about the enormous variety of linguistic structures. The reach of our knowledge has increased by about 15% of the world's languages, especially in terms of digitally archived material, with about 500 languages now reasonably documented thanks to such major programs as DoBeS, ELDP, and DEL. But linguists are still falling behind in the race to document the planet's rapidly dwindling linguistic diversity, with around 35-42% of the world's languages still substantially undocumented, and in certain countries (such as the US) the call by Krauss (1992) for a significant professional realignment toward language documentation has only been heeded in a few institutions. Apart from the need for an intensified documentarist push in the face of accelerating language loss, we argue that existing language documentation efforts need to do much more to focus on crosslinguistically comparable data sets, sociolinguistic context, semantics, and interpretation of text material, and on methods for bridging the 'transcription bottleneck', which is creating a huge gap between the amount we can record and the amount in our transcribed corpora.

  • 20. Seifart, Frank
    et al.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Language Isolates in South America. 2017. In: Language Isolates / [ed] Campbell, Lyle; Smith, Alex; Dougherty, Thomas, London: Routledge, 2017, p. 260-286. Chapter in book (Other academic)
  • 21. Virk, Shafqat Mumtaz
    et al.
    Borin, Lars
    Saxena, Anju
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Automatic extraction of typological linguistic features from descriptive grammars. 2017. In: Text, Speech, and Dialogue: 20th International Conference, TSD 2017, Prague, Czech Republic, August 27-31, 2017, Proceedings / [ed] Ekštein, Kamil; Matoušek, Václav, Springer, 2017, Vol. 10415, p. 111-119. Chapter in book (Refereed)
    Abstract [en]

    The present paper describes experiments on automatically extracting typological linguistic features of natural languages from traditional written descriptive grammars. The feature-extraction task has high potential value in typological, genealogical, historical, and other related areas of linguistics that make use of databases of structural features of languages. Until now, extraction of such features from grammars has been done manually, which is highly time- and labor-consuming and becomes prohibitive when extended to the thousands of languages for which linguistic descriptions are available. The system we describe here starts from semantically parsed text over which a set of rules is applied in order to extract feature values. We evaluate the system’s performance on the manually curated Grambank database as the gold standard and report the first measures of precision and recall for this problem.

  • 22.
    Virk, Shafqat Mumtaz
    et al.
    Univ Gothenburg, Dept Swedish, Sprakbanken Text, Gothenburg, Sweden.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Forsberg, Markus
    Univ Gothenburg, Dept Swedish, Sprakbanken Text, Gothenburg, Sweden.
    Wichmann, Soren
    Leiden Univ, Ctr Linguist, Leiden, Netherlands.;Kazan Fed Univ, Lab Quantitat Linguist, Kazan, Russia.;Beijing Language Univ, Beijing Adv Innovat Ctr Language Resources, Beijing, Peoples R China.
    The DReaM Corpus: A Multilingual Annotated Corpus of Grammars for the World's Languages. 2020. In: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020) / [ed] Calzolari, N; Bechet, F; Blache, P; Choukri, K; Cieri, C; Declerck, T; Goggi, S; Isahara, H; Maegaard, B; Mariani, J; Mazo, H; Moreno, A; Odijk, J; Piperidis, S, European Language Resources Association, 2020, p. 878-884. Conference paper (Refereed)
    Abstract [en]

    There exist as many as 7000 natural languages in the world, and a huge number of documents describing those languages have been produced over the years. Most of those documents are in paper format. Any attempts to use modern computational techniques and tools to process those documents will require them to be digitized first. In this paper, we report a multilingual digitized version of thousands of such documents searchable through some well-established corpus infrastructures. The corpus is annotated with various meta, word, and text level attributes to make searching and analysis easier and more useful.

  • 23.
    Wichmann, Sören
    et al.
    Leiden Univ, Ctr Linguist, Leiden, Netherlands;Kazan Fed Univ, Lab Quantitat Linguist, Kazan, Russia;Beijing Language Univ, Beijing, Peoples R China.
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Methods for calculating walking distances. 2020. In: Physica A: Statistical Mechanics and its Applications, ISSN 0378-4371, E-ISSN 1873-2119, Vol. 540, article id 122890. Article in journal (Refereed)
    Abstract [en]

    In many scientific disciplines it is often necessary to refer to geographical travel distances. While online services can provide such distances, they fail for larger distances or for distances between points not connected by roads, and they do not allow for the calculation of many distances. Here we describe two novel methods of measuring travel distances which overcome these problems. Both use waypoints of populated places from the geonames.org database. The more efficient and accurate of the two uses the Dijkstra algorithm to find the shortest path through a Delaunay graph of neighbouring populated places.
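
    The shortest-path step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The place names, coordinates, and edge list below are hypothetical; the Delaunay triangulation is assumed to have been computed already and is written out by hand as a list of neighbour pairs; and great-circle (haversine) distance is an assumed choice of edge weight.

```python
import heapq
from math import radians, sin, cos, asin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(radians, a)
    lat2, lon2 = map(radians, b)
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(h))


def shortest_path_km(places, edges, source, target):
    """Dijkstra's algorithm over an undirected graph of neighbouring places.

    places: dict mapping name -> (lat, lon)
    edges:  iterable of (name, name) neighbour pairs, e.g. Delaunay edges
    Returns (distance_km, path); (inf, []) if target is unreachable.
    """
    graph = {name: [] for name in places}
    for u, v in edges:
        w = haversine_km(places[u], places[v])
        graph[u].append((v, w))
        graph[v].append((u, w))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return float("inf"), []
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical waypoints (lat, lon) and hand-written neighbour edges.
places = {"A": (0.0, 0.0), "B": (0.0, 1.0), "C": (1.0, 1.0), "D": (0.0, 2.0)}
edges = [("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")]
distance, path = shortest_path_km(places, edges, "A", "D")
```

    In this toy graph the route along the equator through B is shorter than the detour through C, so the returned path is A, B, D.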

  • 24. Zariquiey, Roberto
    et al.
    Arakaki, Mónica
    Vera, Javier
    Torres-Orihuela, Guido
    Cuba-Raime, Claret
    Barrientos, Carlos
    García, Aracelli
    Ingunza, Adriano
    Hammarström, Harald
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Linking endangerment databases and descriptive linguistics: an assessment of the use of terms relating to language endangerment in grammars. 2022. In: Language Documentation & Conservation, E-ISSN 1934-5275, Vol. 16, p. 290-318. Article in journal (Refereed)
    Abstract [en]

    The world harbours a diversity of some 6,500 mutually unintelligible languages. As has been increasingly observed by linguists, many minority languages are becoming endangered and will be lost forever if not documented. The increased urgency has led to the development of several global endangerment databases and a more fine-grained understanding of the language endangerment progression as well as its possible reversal. In the present paper, we explore the terminological correlates of this development as found in the descriptive linguistic literature, using a corpus of over 10,000 digitized grammatical descriptions. Comparing this with existing endangerment databases, we find that simply counting terms related to endangerment does signal endangerment, but the degree of endangerment is more difficult to assess from grammatical descriptions. The label endangered seems to be an umbrella term that covers different situations ranging from moribund languages with fewer than ten speakers to minority languages with several thousand speakers. For many languages considered endangered in existing databases, explicit terms to this effect cannot be found in their descriptions. The discrepancy is due to incompleteness of the search-term set, gaps in the literature, and projected rather than observed information in the databases. Our explorations illustrate the potential for database curation assisted by computational searches both to maintain accuracy of the databases and to investigate assumed language endangerment. Future work includes a larger cloud of search terms, usage of term frequencies, and prescreening of descriptive literature for the existence of a relevant section. From the perspective of descriptive linguistics, this study calls for a more careful correlation between the language endangerment indexes, as developed in the global endangerment databases, and the treatment of the endangerment status of individual languages in descriptive grammars.
