Online inference of topics: Implementation of the topic model Latent Dirichlet Allocation using an online variational Bayes inference algorithm to sort news articles
Independent thesis Advanced level (professional degree), 20 credits / 30 HE credits
Student thesis
The client of the project has problems with complex queries and noise when querying its stream of five million news articles per day. This results in much manual work when sorting and pruning the search results of its queries. Instead of using direct text matching, the project's approach was to use a topic model to describe articles in terms of the topics they cover, and to use this information to sort the articles.
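As an illustration of sorting by topic content rather than by direct text matching, the sketch below ranks articles by how close their topic proportions are to a query's topic profile. The Hellinger distance, the function names `hellinger` and `sort_by_topic`, and the three-topic example values are all assumptions chosen for illustration; the abstract does not say which similarity measure the project used.

```python
from math import sqrt


def hellinger(p, q):
    """Hellinger distance between two discrete distributions (0 = identical, 1 = disjoint)."""
    return sqrt(0.5 * sum((sqrt(a) - sqrt(b)) ** 2 for a, b in zip(p, q)))


def sort_by_topic(articles, query_topics):
    """Return article ids ordered from most to least similar topic mix (hypothetical helper)."""
    return sorted(articles, key=lambda aid: hellinger(articles[aid], query_topics))


# Hypothetical topic proportions over (sports, politics, economy)
articles = {
    "a1": [0.8, 0.1, 0.1],
    "a2": [0.1, 0.7, 0.2],
    "a3": [0.3, 0.3, 0.4],
}
query = [0.9, 0.05, 0.05]  # a mostly-sports information need
print(sort_by_topic(articles, query))  # ['a1', 'a3', 'a2']
```

Because the ranking uses only topic proportions, an article can rank highly for a query even when it shares few exact words with it, which is the property the text-matching approach lacks.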
An online version of the topic model Latent Dirichlet Allocation was implemented using online variational Bayes inference to handle streamed data. With 100 topic dimensions, topics such as sports and politics emerged during training on a simulated stream of 1.7 million articles. These topics were used to sort articles based on context. The implementation was found accurate enough to be useful for the client, as well as fast and stable enough to be a feasible solution to the problem.
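To make the core technique concrete, the following Python sketch shows one way to implement online variational Bayes for LDA, in the form popularized by Hoffman, Blei and Bach (2010). Each minibatch from the stream gets a local E-step that fits per-article topic parameters, followed by a stochastic update of the global topic-word parameters with a decaying step size rho_t = (tau0 + t)^(-kappa). It is a sketch, not the thesis implementation: the class name `OnlineLDA`, the hyperparameter defaults, the assumed total stream length `corpus_size`, and the toy six-word vocabulary are all hypothetical. It requires NumPy.

```python
import numpy as np


def digamma(x):
    """Digamma via the recurrence psi(x) = psi(x + 1) - 1/x plus an asymptotic
    series; written out here so the sketch does not need SciPy."""
    shape = np.shape(x)
    x = np.array(x, dtype=float, ndmin=1)  # copy, so the caller's data is untouched
    result = np.zeros_like(x)
    small = x < 6.0
    while small.any():  # shift small arguments up until the series is accurate
        result[small] -= 1.0 / x[small]
        x[small] += 1.0
        small = x < 6.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 * (1 / 252 - inv2 * (1 / 240 - inv2 / 132))))
    result += np.log(x) - 0.5 / x - series
    return result.reshape(shape)


class OnlineLDA:
    """Online variational Bayes for LDA, one minibatch of documents at a time.

    Documents are (word_ids, counts) pairs with unique word ids per document.
    corpus_size is an assumed estimate of the total stream length; all
    hyperparameter defaults are illustrative, not taken from the thesis.
    """

    def __init__(self, n_topics, vocab_size, corpus_size, alpha=0.1, eta=0.01,
                 tau0=64.0, kappa=0.7, seed=0):
        self.K, self.D = n_topics, corpus_size
        self.alpha, self.eta, self.tau0, self.kappa = alpha, eta, tau0, kappa
        rng = np.random.default_rng(seed)
        # Variational Dirichlet parameters of the K topic-word distributions
        self.lam = rng.gamma(100.0, 1.0 / 100.0, (n_topics, vocab_size))
        self.t = 0  # number of minibatches processed so far

    def e_step(self, docs, max_iter=100, tol=1e-3):
        """Fit per-document topic parameters gamma with the topics held fixed."""
        exp_elog_beta = np.exp(digamma(self.lam)
                               - digamma(self.lam.sum(axis=1))[:, None])
        gamma = np.ones((len(docs), self.K))
        sstats = np.zeros_like(self.lam)
        for d, (ids, cts) in enumerate(docs):
            ids, cts = np.asarray(ids), np.asarray(cts, dtype=float)
            g = gamma[d]
            exp_elog_theta = np.exp(digamma(g) - digamma(g.sum()))
            beta_d = exp_elog_beta[:, ids]                  # K x N_d
            phinorm = exp_elog_theta @ beta_d + 1e-100      # normaliser of phi
            for _ in range(max_iter):
                last = g
                g = self.alpha + exp_elog_theta * ((cts / phinorm) @ beta_d.T)
                exp_elog_theta = np.exp(digamma(g) - digamma(g.sum()))
                phinorm = exp_elog_theta @ beta_d + 1e-100
                if np.mean(np.abs(g - last)) < tol:
                    break
            gamma[d] = g
            # Expected word-topic counts, still missing the exp(E[log beta]) factor
            sstats[:, ids] += np.outer(exp_elog_theta, cts / phinorm)
        return gamma, sstats * exp_elog_beta

    def update(self, docs):
        """Process one minibatch from the stream and return its gamma values."""
        gamma, sstats = self.e_step(docs)
        rho = (self.tau0 + self.t) ** (-self.kappa)          # decaying step size
        lam_hat = self.eta + (self.D / len(docs)) * sstats   # batch scaled to corpus size
        self.lam = (1.0 - rho) * self.lam + rho * lam_hat
        self.t += 1
        return gamma


# Toy stream: word ids 0-2 stand for "sports" words, 3-5 for "politics" words.
stream = [
    [([0, 1, 2], [3, 2, 1]), ([3, 4, 5], [2, 3, 1])],
    [([0, 2], [2, 2]), ([4, 5], [3, 2])],
]
model = OnlineLDA(n_topics=2, vocab_size=6, corpus_size=4)
for batch in stream:
    gamma = model.update(batch)
theta = gamma / gamma.sum(axis=1, keepdims=True)  # topic proportions per article
```

Scaling the minibatch statistics by `corpus_size / len(docs)` treats each minibatch as if it were the whole corpus, which is what allows the topics to be refined from a stream without revisiting earlier articles. The per-article proportions `theta` are the kind of topic description that a sorting step can then consume.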
UPTEC F, ISSN 1401-5757; 14010
Computer and Information Science
Identifiers
URN: urn:nbn:se:uu:diva-222429
OAI: oai:DiVA.org:uu-222429
DiVA: diva2:712454
The Loop54 Group AB
Master Programme in Engineering Physics
Nyberg, Tomas
Risch, Tore