Uppsala University Publications
Automating Text Summarization With Machine Learning: Extraction of End Results From Result Reports
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2019 (English). Independent thesis Basic level (degree of Bachelor), 10 credits / 15 HE credits. Student thesis.
Abstract [en]

The Swedish International Development Cooperation Agency (SIDA) handles the financial aid Sweden gives to partner organisations for development activities. When an activity has ended, the partner organisations write reports on what they accomplished and how the grant was used. There are currently about 120,000 reports in SIDA's database, and it is hard to get an overview of the information they contain, both because of the large number of reports and because the reports do not follow a standardized format.

The reports contain result stories, which explain how the grant was used. The goal of our project was to automate the summarization of these stories in a clear, easy-to-understand way using machine learning and natural language processing.
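The record does not say which summarization technique the prototype uses. As one illustration only, the sketch below implements a simple frequency-based extractive summarizer, a classic non-machine-learning baseline and not necessarily the authors' method: each sentence is scored by the average whole-story frequency of its non-stop words, and the top-scoring sentences are returned in their original order. The stop-word list, the sample story, and the names `split_sentences`, `tokenize` and `summarize` are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a fuller, domain-tuned list.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "with",
    "was", "were", "is", "are", "be", "been", "by", "as", "at", "that",
    "this", "it", "its", "from", "we", "our", "their", "they",
}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(text: str) -> list[str]:
    """Lower-cased alphabetic words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def summarize(text: str, max_sentences: int = 2) -> str:
    """Frequency-based extractive summary: keep the best-scoring sentences in original order."""
    sentences = split_sentences(text)
    freq = Counter(tokenize(text))  # word frequencies over the whole story

    def score(sentence: str) -> float:
        # Average frequency of the sentence's content words (0 if it has none).
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by score; sorted() is stable, so ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])  # restore original order for readability
    return " ".join(sentences[i] for i in keep)


story = (
    "The project trained 200 teachers in rural schools. Teachers reported better attendance. "
    "The grant also funded new school books. Training of teachers continued after the project ended."
)
print(summarize(story, max_sentences=2))
```

Averaging rather than summing the word scores keeps long sentences from winning on length alone. A production version would need a more robust sentence splitter, since this regex also breaks after abbreviations such as "e.g.".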

More specifically, we created a prototype of a service that generates a word cloud of relevant words from the result stories, with each word sized according to its number of occurrences. When the user clicks on a word, they get a list of all the activities whose result stories contain the word, sorted by how often the word occurs. When the user clicks on an activity in the list, they get a summary of the result story in its report.
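The record does not describe how the prototype counts words or builds the per-word activity list. A minimal sketch, assuming each activity's result stories are available as plain text keyed by a hypothetical activity ID, with a small hand-written stop-word list and no stemming or lemmatization: word totals drive the cloud's font sizes, and an inverted index (word to per-activity counts) answers the "click on a word" query. All function names, the linear `font_size` scaling, and the sample data are illustrative assumptions.

```python
import re
from collections import Counter

# Hypothetical stop-word list; no stemming, so "well" and "wells" count separately.
STOP_WORDS = {
    "the", "a", "an", "and", "or", "of", "to", "in", "on", "for", "with",
    "was", "were", "is", "are", "be", "been", "by", "as", "at", "that",
    "this", "it", "its", "from", "we", "our", "their", "they",
}


def word_counts(text: str) -> Counter:
    """Count lower-cased alphabetic words, ignoring stop words."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS)


def cloud_weights(stories: dict[str, str], top_n: int = 50) -> dict[str, int]:
    """Total occurrences of each word across all result stories (top_n most common)."""
    total = Counter()
    for text in stories.values():
        total.update(word_counts(text))
    return dict(total.most_common(top_n))


def font_size(count: int, max_count: int, min_px: int = 12, max_px: int = 64) -> int:
    """Scale a word's font size linearly with its occurrence count."""
    return round(min_px + (max_px - min_px) * count / max_count)


def build_index(stories: dict[str, str]) -> dict[str, dict[str, int]]:
    """Inverted index: word -> {activity_id: occurrences in that activity's stories}."""
    index: dict[str, dict[str, int]] = {}
    for activity_id, text in stories.items():
        for word, n in word_counts(text).items():
            index.setdefault(word, {})[activity_id] = n
    return index


def activities_for(index: dict[str, dict[str, int]], word: str) -> list[str]:
    """Activity IDs whose stories contain `word`, most occurrences first."""
    hits = index.get(word.lower(), {})
    return sorted(hits, key=hits.get, reverse=True)


# Hypothetical sample data: one result-story text per activity.
stories = {
    "A1": "Clean water wells were built. The wells serve 3 villages.",
    "A2": "Teachers were trained. A well was repaired at the school.",
    "A3": "Wells and latrines were built. Wells were tested, and more wells were dug for water.",
}
weights = cloud_weights(stories, top_n=5)
biggest = max(weights.values())
print({w: font_size(n, biggest) for w, n in weights.items()})
print(activities_for(build_index(stories), "wells"))  # ['A3', 'A1']
```

Building the index once means each click is a dictionary lookup plus a sort over only the activities that mention the word, rather than a rescan of all stored reports.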

Abstract [sv]

The Swedish International Development Cooperation Agency handles the financial support Sweden gives to countries and organisations for development activities. The receiving partner organisations then write a report on the activity: what was achieved and how the money was used. There are currently about 120,000 reports in SIDA's database, but it is difficult to get a summary of the information in the partner organisations' reports. This is due both to the large number of reports and to the fact that the reports do not follow a standardized template.

The reports contain result stories, which explain how the aid has been used. The goal of our project was to automate the summarization of these result stories in a clear, easy-to-understand way using machine learning and natural language processing.

More specifically, we created a prototype of a service that generates a word cloud containing relevant words from the result stories, in which the most common words are the largest. When the user clicks on a word, a list appears of the activities that contain the word, sorted by how often the word occurs in the report's result stories. When the user clicks on an activity, a summarizing text of the result story from the report is shown.

Place, publisher, year, edition, pages
2019.
Series
Independent Project in Computer and Information Engineering
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-399925
OAI: oai:DiVA.org:uu-399925
DiVA, id: diva2:1379374
Educational program
Master of Science Programme in Information Technology Engineering
Available from: 2020-03-12
Created: 2019-12-17
Last updated: 2020-03-12
Bibliographically approved
