Multi-domain alias matching using machine learning
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computing Science. (Security)
Swedish Defence Research Agency (FOI), Stockholm, Sweden.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Swedish Defence Research Agency (FOI), Stockholm, Sweden. (Security)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. (Security)
2016 (English). In: Proc. 3rd European Network Intelligence Conference, IEEE, 2016, pp. 77-84. Conference paper (Refereed)
Abstract [en]

We describe a methodology for linking aliases belonging to the same individual based on a user's writing style (stylometric features extracted from the user-generated content) and her time patterns (time-based features extracted from the publishing times of the user-generated content). While most previous research on social media identity linkage relies on matching usernames, our methodology can also be used for users who actively try to choose dissimilar usernames when creating their aliases. In our experiments on a discussion forum dataset and a Twitter dataset, we evaluate the performance of three different classifiers. We then use the best-performing classifier (AdaBoost) to evaluate how well the approach works on different datasets with different feature sets. Experiments show that combining stylometric and time-based features yields good results on our synthetic datasets, and a small-scale evaluation on real-world blog data confirms these results, yielding a precision of over 95%. The use of emotion-related and Twitter-related features has no significant impact on the results.
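
To make the two feature families in the abstract concrete, here is a minimal Python sketch of how a per-alias feature vector might be assembled and how two aliases might be compared. It is an illustration only, not the authors' implementation: the function-word list, the hour-of-day histogram, the absolute-difference pair encoding, and all function names are assumptions introduced here, since this record does not specify the actual feature set.

```python
from collections import Counter
from datetime import datetime

# Hypothetical, very small function-word list (an assumption for illustration;
# the feature set used in the paper is not given in this record).
FUNCTION_WORDS = ("the", "and", "of", "to", "i", "but")


def stylometric_features(posts):
    """Relative frequency of each function word over all posts of one alias."""
    words = [w.strip(".,!?;:").lower() for p in posts for w in p.split()]
    counts = Counter(words)
    total = max(len(words), 1)  # avoid division by zero for empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def time_features(timestamps):
    """Share of posts published in each hour of the day (24 values)."""
    hours = Counter(t.hour for t in timestamps)
    total = max(len(timestamps), 1)
    return [hours[h] / total for h in range(24)]


def alias_vector(posts, timestamps):
    """Concatenate stylometric and time-based features for one alias."""
    return stylometric_features(posts) + time_features(timestamps)


def pair_features(a, b):
    """Absolute per-feature difference between two alias vectors."""
    return [abs(x - y) for x, y in zip(a, b)]
```

Under these assumptions, the output of `pair_features` for each pair of aliases, labelled as same-author or different-author, is the kind of input a binary classifier such as AdaBoost could be trained on.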

Place, publisher, year, edition, pages
IEEE, 2016, pp. 77-84.
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:uu:diva-306944
DOI: 10.1109/ENIC.2016.019
ISI: 000399097600011
ISBN: 9781509034550 (electronic)
OAI: oai:DiVA.org:uu-306944
DiVA: diva2:1044820
Conference
ENIC 2016, September 5–7, Wroclaw, Poland
Available from: 2017-02-02. Created: 2016-11-07. Last updated: 2017-05-16. Bibliographically approved.

Open Access in DiVA

fulltext (256 kB), 33 downloads
File information
File name: FULLTEXT01.pdf
File size: 256 kB
Checksum (SHA-512): 26272dd0de610b7e6f894db092d90d80b1165344d798bee55972f184e9b36e3ff0e33e42ee5cacfa6fea78246fb309b17f0e4e5f3964012964f06eef38423547
Type: fulltext
Mimetype: application/pdf

By author/editor
Ashcroft, Michael; Kaati, Lisa; Shrestha, Amendra