Uppsala University Publications (uu.se)
Anomaly detection with Machine learning: Quality assurance of statistical data in the Aid community
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science.
2015 (English). Independent thesis, Advanced level (professional degree), 20 credits / 30 HE credits. Student thesis.
Abstract [en]

The overall purpose of this study was to find a way to identify incorrect data in Sida's statistics about its contributions. A contribution is the financial support given by Sida to a project. The goal was to build an algorithm, based on supervised classification methods from machine learning, that determines whether a contribution is at risk of being inaccurately coded. A thorough data analysis was carried out in order to train a model to find hidden patterns in the data. Descriptive features containing important information about the contributions were selected and used for this task. These included keywords retrieved from the descriptions of the contributions. Two machine learning methods, AdaBoost and Support Vector Machines, were tested on ten classification models. Each model was evaluated on its accuracy in predicting the correct class of the target variable. A misclassified component was more likely to be incorrectly coded and was therefore treated as an anomaly. The AdaBoost method performed better and more consistently on the majority of the models. Six classification models built with the AdaBoost method were combined into one final ensemble classifier. This classifier was verified on new, unseen data, and an anomaly score was calculated for each component. The higher the score, the higher the risk of the component being anomalous. The result was a ranked list in which the most anomalous components were prioritized for further investigation by staff at Sida.
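The final scoring and ranking step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how the anomaly score was computed, so the formula below is an assumption: the score of a component is taken to be the fraction of ensemble models whose prediction disagrees with the recorded code. The component IDs, target names (`target_a`, `target_b`, `target_c`), and code values are hypothetical, and the per-model predictions are given directly rather than produced by trained AdaBoost classifiers.

```python
from typing import Dict, List, Tuple

# One mapping per component: target variable name -> code value.
Codes = Dict[str, str]


def anomaly_score(recorded: Codes, predicted: Codes) -> float:
    """Fraction of models whose prediction disagrees with the recorded code.

    Assumed scoring rule (not taken from the thesis): each target variable
    has one model, and a disagreement counts as one misclassification.
    """
    if not predicted:
        return 0.0
    disagreements = sum(
        1 for target, value in predicted.items() if recorded.get(target) != value
    )
    return disagreements / len(predicted)


def rank_components(
    recorded: Dict[str, Codes], predicted: Dict[str, Codes]
) -> List[Tuple[str, float]]:
    """Return (component_id, score) pairs, most anomalous first."""
    scores = [
        (cid, anomaly_score(recorded[cid], predicted[cid])) for cid in predicted
    ]
    # Sort by descending score; break ties by component ID for stable output.
    return sorted(scores, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical recorded codes for three components.
recorded = {
    "C1": {"target_a": "x", "target_b": "y", "target_c": "z"},
    "C2": {"target_a": "x", "target_b": "q", "target_c": "w"},
    "C3": {"target_a": "p", "target_b": "y", "target_c": "z"},
}

# Hypothetical predictions from three per-target models.
predicted = {
    "C1": {"target_a": "x", "target_b": "y", "target_c": "z"},
    "C2": {"target_a": "x", "target_b": "y", "target_c": "z"},
    "C3": {"target_a": "x", "target_b": "y", "target_c": "z"},
}

ranked = rank_components(recorded, predicted)
for component_id, score in ranked:
    print(f"{component_id}: {score:.2f}")
```

In this toy data, component C2 disagrees with two of three models and therefore heads the list. A weighted variant, for example one using each model's validation accuracy as a weight, would be an equally plausible reading of the abstract.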

Place, publisher, year, edition, pages
2015. 53 p.
Series
UPTEC STS, ISSN 1650-8319 ; 15014
Keyword [en]
Anomaly detection, Machine learning
National Category
Engineering and Technology
Identifiers
URN: urn:nbn:se:uu:diva-260380
OAI: oai:DiVA.org:uu-260380
DiVA: diva2:846985
External cooperation
SIDA
Educational program
Systems in Technology and Society Programme
Supervisors
Examiners
Available from: 2015-09-03. Created: 2015-08-18. Last updated: 2015-09-03. Bibliographically approved.

Open Access in DiVA

Anomaly detection with Machine learning (9184 kB), 336 downloads
File information
File name: FULLTEXT01.pdf
File size: 9184 kB
Checksum (SHA-512): bb356495c30a6e576ddd42bc2bca07e57c9d54b839f1fe7eabc698929029ed2e4385d9c8811f0a28ba86c8e93a4dfa617cab1b7d9769274eb4af838a883285d2
Type: fulltext
Mimetype: application/pdf

Total: 336 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.

Total: 797 hits