Targeted Topic Modeling for Levantine Arabic
2020 (English)
Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE credits
Student thesis
Abstract [en]
Topic models for focused analysis aim to capture topics within the limiting scope of a targeted aspect (which could be thought of as some inner topic within a certain domain). To serve their analytic purposes, topics are expected to be semantically coherent and closely aligned with human intuition. This in itself poses a major challenge for the more common topic modeling algorithms, which, in a broader sense, perform a full analysis that covers all aspects and themes within a collection of texts. The paper attempts to construct a viable focused-analysis topic model which learns topics from Twitter data written in a closely related group of non-standardized varieties of Arabic widely spoken in the Levant region (i.e., Levantine Arabic). Results are compared to a baseline model as well as to another targeted topic model designed precisely to serve the purpose of focused analysis. Judged overall, the model is capable of adequately capturing topics containing terms which fall within the scope of the targeted aspect. Nevertheless, it fails to produce human-friendly and semantically coherent topics: several topics contained a number of intruding terms, while others contained terms that, although still relevant to the targeted aspect, were thrown together seemingly at random.
Place, publisher, year, edition, pages
2020. p. 46
Keywords [en]
Topic Model, Focused Analysis, Targeted Aspect, Levantine Arabic
National Category
Language Technology (Computational Linguistics)
Identifiers
URN: urn:nbn:se:uu:diva-412975
OAI: oai:DiVA.org:uu-412975
DiVA, id: diva2:1439483
Subject / course
Language Technology
Educational program
Master Programme in Language Technology
Presentation
2020-06-04, Via Zoom - Online Seminar, Uppsala, 10:54 (English)
Supervisors
Examiners
Available from: 2020-06-12 Created: 2020-06-12 Last updated: 2020-06-12
Bibliographically approved