Sparse Parallel Training of Hierarchical Dirichlet Process Topic Models
Imperial College London, London, England.
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Social Sciences, Department of Statistics; Aalto University, Espoo, Finland. ORCID iD: 0000-0002-0296-2719
Ericsson AB, Stockholm, Sweden; Linköping University, Linköping, Sweden.
2020 (English). In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, 2020, p. 2925-2934. Conference paper, Published paper (Refereed)
Abstract [en]

To scale non-parametric extensions of probabilistic topic models such as latent Dirichlet allocation to larger data sets, practitioners rely increasingly on parallel and distributed systems. In this work, we study data-parallel training for the hierarchical Dirichlet process (HDP) topic model. Based upon a representation of certain conditional distributions within an HDP, we propose a doubly sparse data-parallel sampler for the HDP topic model. This sampler utilizes all available sources of sparsity found in natural language, an important way to make computation efficient. We benchmark our method on a well-known corpus (PubMed) with 8 million documents and 768 million tokens, training on a single multi-core machine in under four days.
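
The sampler itself is not reproduced in this record, but the general data-parallel pattern the abstract refers to can be illustrated with a minimal sketch: partition the documents across workers, let each worker resample its local topic assignments against a snapshot of the global topic-word counts, and merge the count deltas after each sweep. The sketch below is not the authors' doubly sparse HDP sampler; it uses a fixed number of topics (an LDA-style collapsed Gibbs sampler), and the toy corpus, topic count, and Dirichlet priors are illustrative assumptions.

import numpy as np
from multiprocessing import Pool

K, V = 4, 50             # assumed number of topics and vocabulary size (illustrative)
ALPHA, BETA = 0.1, 0.01  # assumed symmetric Dirichlet priors

def resample_shard(args):
    """One collapsed-Gibbs sweep over a shard of documents, using a stale
    snapshot of the global topic-word counts; returns the new assignments
    and the count deltas to merge back."""
    docs, z_shard, topic_word, topic_sum = args
    rng = np.random.default_rng()
    delta = np.zeros_like(topic_word)
    for d, doc in enumerate(docs):
        doc_topic = np.bincount(z_shard[d], minlength=K)
        for i, w in enumerate(doc):
            k_old = z_shard[d][i]
            doc_topic[k_old] -= 1
            # p(z = k | rest) up to a constant, against the stale global counts
            p = (doc_topic + ALPHA) * (topic_word[:, w] + BETA) / (topic_sum + V * BETA)
            k_new = rng.choice(K, p=p / p.sum())
            z_shard[d][i] = k_new
            doc_topic[k_new] += 1
            delta[k_old, w] -= 1
            delta[k_new, w] += 1
    return z_shard, delta

def parallel_gibbs(docs, n_iter=20, n_workers=4):
    rng = np.random.default_rng(0)
    z = [rng.integers(K, size=len(doc)) for doc in docs]
    topic_word = np.zeros((K, V))
    for doc, zd in zip(docs, z):
        for w, k in zip(doc, zd):
            topic_word[k, w] += 1
    shards = np.array_split(np.arange(len(docs)), n_workers)
    with Pool(n_workers) as pool:
        for _ in range(n_iter):
            topic_sum = topic_word.sum(axis=1)
            jobs = [([docs[i] for i in s], [z[i] for i in s], topic_word, topic_sum)
                    for s in shards]
            for s, (z_new, delta) in zip(shards, pool.map(resample_shard, jobs)):
                for i, zd in zip(s, z_new):
                    z[i] = zd
                topic_word += delta  # synchronize the global counts after each sweep
    return topic_word

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    toy_docs = [rng.integers(V, size=rng.integers(10, 30)) for _ in range(40)]
    print(parallel_gibbs(toy_docs, n_iter=5).shape)  # -> (4, 50)

Within each sweep the workers read stale global counts and only their own document-level counts exactly, which is what makes the update embarrassingly parallel at the cost of an approximation; the paper's contribution is to combine this style of data parallelism with the HDP's non-parametric topic count and with sparsity in the count matrices, neither of which this sketch attempts.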

Place, publisher, year, edition, pages
Association for Computational Linguistics, 2020. p. 2925-2934
National Category
Natural Language Processing
Identifiers
URN: urn:nbn:se:uu:diva-497501
DOI: 10.18653/v1/2020.emnlp-main.234
ISI: 000855160703009
ISBN: 978-1-952148-60-6 (print)
OAI: oai:DiVA.org:uu-497501
DiVA, id: diva2:1740690
Conference
Conference on Empirical Methods in Natural Language Processing (EMNLP), November 16-20, 2020, held online
Funder
Swedish Research Council, 201805170
Swedish Research Council, 201806063
Academy of Finland, 298742
Academy of Finland, 313122
Available from: 2023-03-01 Created: 2023-03-01 Last updated: 2025-02-07 Bibliographically approved

Open Access in DiVA

No full text in DiVA

Other links

Publisher's full text: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)

Search in DiVA

By author/editor
Magnusson, Måns
By organisation
Department of Statistics
Natural Language Processing
