Computational Analysis of Swedish Newspapers Using Topic Detection and Sentiment Analysis
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2021 (English). Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

Newspapers may report on the same event, such as a sporting event or a political statement, but since they most likely differ in presentation, are the content and underlying message of the articles actually the same? A human can read two separate articles and determine whether they touch on similar subjects and whether they approach the subject in a positive or negative way. If this comparison were to be performed over several thousand articles, a computer would be the preferred method. However, a computer needs to be trained to understand the topics of the articles in order to detect them and make the comparison.

The two goals of this project are to find and identify topics within articles extracted from Swedish newspapers, and to perform sentiment analysis on the most similar topic pairs.

This project presents a Python 3 implementation that extracts textual data from Swedish newspapers, identifies and assigns topics to those articles, and performs sentiment analysis on articles based on their topics and day of publication. Web scraping was used to extract the text from each article. Topic detection was performed with non-negative matrix factorisation. TextBlob was used to determine each article's polarity and emotional state.

Both goals were accomplished. The method used to extract textual data was successful, and topics for each article were successfully identified. The topic detection and sentiment analysis proved to be mostly correct when the most similar article pairs between the newspapers were manually inspected. The results were presented with dumbbell plots for the most similar article pairs. These plots show each pair's polarity and subjectivity scores and were therefore used to manually analyse the actual similarity between these articles as well as their sentiment structure. However, the results are deemed too unreliable to draw any significant conclusion about the sentiment difference and similarity between the newspapers. This is because of the absence of a proper implementation of Swedish part-of-speech tagging and lemmatization, which was noticed too late in the development process to be corrected. These changes are nevertheless discussed and reflected upon in order to gain insight into how the implemented solution could have been improved.
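The topic detection step named in the abstract, non-negative matrix factorisation, can be illustrated with a minimal sketch. This is not the thesis implementation: it is a pure-Python version of the standard multiplicative-update rule, and the toy headlines, the function names (`tokenize`, `term_document_matrix`, `nmf`) and the choice of two topics are all assumptions made for illustration. A real pipeline would more likely rely on an established library and on proper Swedish preprocessing.

```python
import random

# Made-up toy headlines (assumption; not data from the thesis).
DOCS = [
    "the team won the football match",
    "football fans cheer as the team scores",
    "the minister gave a political statement",
    "parliament debates the political statement",
]

def tokenize(text):
    # Very naive tokeniser: lowercase, split on whitespace, trim punctuation.
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]

def term_document_matrix(docs):
    # Rows are documents, columns are vocabulary terms, entries are raw counts.
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: j for j, w in enumerate(vocab)}
    V = [[0.0] * len(vocab) for _ in docs]
    for i, d in enumerate(docs):
        for w in tokenize(d):
            V[i][index[w]] += 1.0
    return V, vocab

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def frobenius_error(V, W, H):
    # Frobenius norm of V - WH, the reconstruction error of the factorisation.
    WH = matmul(W, H)
    return sum((v - x) ** 2 for rv, rx in zip(V, WH) for v, x in zip(rv, rx)) ** 0.5

def nmf(V, k, iterations=200, seed=0, eps=1e-9):
    # Factorise V (n x m) into W (n x k) and H (k x m) with non-negative entries,
    # using multiplicative updates for the Frobenius objective.
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, V)
        den = matmul(matmul(Wt, W), H)
        H = [[H[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(V, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
    return W, H

V, vocab = term_document_matrix(DOCS)
W, H = nmf(V, k=2)

# Each row of H weights the vocabulary for one topic; show its strongest terms.
for t, row in enumerate(H):
    top = sorted(range(len(vocab)), key=lambda j: row[j], reverse=True)[:3]
    print("topic", t, [vocab[j] for j in top])

# Each row of W weights the topics for one document; assign the strongest one.
for doc, weights in zip(DOCS, W):
    print(weights.index(max(weights)), doc)
```

In this sketch each article is assigned the topic with the largest weight in its row of `W`. The missing lemmatization that the abstract identifies as a weakness would show up here as inflected forms of one word occupying separate columns of the term-document matrix.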

Place, publisher, year, edition, pages
2021. p. 25
Series
IT ; 21016
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-440104
OAI: oai:DiVA.org:uu-440104
DiVA, id: diva2:1544289
Educational program
Bachelor Programme in Computer Science
Supervisors
Examiners
Available from: 2021-04-14. Created: 2021-04-14. Last updated: 2021-04-14. Bibliographically approved.

Open Access in DiVA

fulltext (2531 kB), 555 downloads
File information
File name: FULLTEXT01.pdf
File size: 2531 kB
Checksum (SHA-512):
b7b1dce14e49186f75762828841729fedd044e487145acff3742309bb107f1e5eb5078b93cb98feb1b45f69d3accf89b9d9e3c93e40602cfe34d15aa77edfe29
Type: fulltext
Mimetype: application/pdf
