Uppsala University Publications
Gender Classification with Data Independent Features in Multiple Languages
Swedish Defence Research Agency (FOI), Stockholm, Sweden.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computer Systems. Swedish Defence Research Agency (FOI), Stockholm, Sweden.
Swedish Defence Research Agency (FOI), Stockholm, Sweden.
2017 (English). In: 2017 European Intelligence and Security Informatics Conference (EISIC) / [ed] Brynielsson, J, IEEE, 2017, p. 54-60. Conference paper, Published paper (Refereed)
Abstract [en]

Gender classification is a well-researched problem, and state-of-the-art implementations achieve an accuracy of over 85%. However, most previous work has focused on gender classification of texts written in English, and in many cases the results cannot be transferred to other datasets, because the features used to train the machine learning models depend on the data. In this work, we investigate the possibility of classifying the gender of an author in five different languages: English, Swedish, French, Spanish, and Russian. We use the features of the word-counting program Linguistic Inquiry and Word Count (LIWC), which have the benefit of being independent of the dataset. Our results show that by using machine learning with features from LIWC, we can obtain accuracies of 79% and 73%, depending on the language. We also show some interesting differences in how the genders use certain categories in different languages.
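LIWC-style features are category frequencies: each text is scored by the percentage of its words that match each dictionary category, so the set of features is fixed by the dictionary rather than by the training data. The Python sketch below illustrates that idea under stated assumptions. The category names and word lists in `MINI_DICTIONARY` and the helpers `tokenize`, `matches`, `liwc_style_features`, and `to_vector` are hypothetical illustrations; they are not the real LIWC lexicon and not the authors' implementation.

```python
import re
from typing import Dict, List

# HYPOTHETICAL mini-dictionary in the spirit of LIWC: category -> word patterns.
# A trailing "*" matches any word beginning with that stem (e.g. "friend*").
# These entries are illustrative only; they are NOT the real LIWC lexicon.
MINI_DICTIONARY: Dict[str, List[str]] = {
    "pronoun_i": ["i", "me", "my", "mine", "myself"],
    "posemo": ["happ*", "love*", "nice", "good"],
    "negemo": ["sad*", "hate*", "angry", "bad"],
    "social": ["friend*", "family", "talk*", "we"],
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and return runs of letters (Latin, Cyrillic, etc.)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def matches(word: str, pattern: str) -> bool:
    """Exact match, or prefix match when the pattern ends with '*'."""
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern


def liwc_style_features(text: str, dictionary: Dict[str, List[str]]) -> Dict[str, float]:
    """Percentage of the text's words that fall into each dictionary category.

    The feature set is determined by the dictionary, not by any training data,
    so every text maps to the same set of features.
    """
    words = tokenize(text)
    total = len(words)
    features: Dict[str, float] = {}
    for category, patterns in dictionary.items():
        # Count each word at most once per category.
        hits = sum(1 for w in words if any(matches(w, p) for p in patterns))
        features[category] = 100.0 * hits / total if total else 0.0
    return features


def to_vector(features: Dict[str, float], dictionary: Dict[str, List[str]]) -> List[float]:
    """Order features by category name so every text yields the same layout."""
    return [features[c] for c in sorted(dictionary)]


if __name__ == "__main__":
    sample = "I love my friends and my family. We talk a lot."
    feats = liwc_style_features(sample, MINI_DICTIONARY)
    print(to_vector(feats, MINI_DICTIONARY))
```

Because the vector layout depends only on the dictionary, texts from any corpus produce comparable fixed-length inputs for a downstream classifier; a translated dictionary would play the same role for each of the other languages.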

Place, publisher, year, edition, pages
IEEE, 2017. p. 54-60
Series
European Intelligence and Security Informatics Conference, ISSN 2572-3723
Keywords [en]
Blogs, Pragmatics, Psychology, Social network services, Internet, Dictionaries, Machine learning
National Category
General Language Studies and Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-351175
DOI: 10.1109/EISIC.2017.16
ISI: 000425928200007
ISBN: 978-1-5386-2385-5 (electronic)
OAI: oai:DiVA.org:uu-351175
DiVA, id: diva2:1209821
Conference
European Intelligence and Security Informatics Conference (EISIC), 11-13 September 2017, Athens, Greece
Available from: 2018-05-24. Created: 2018-05-24. Last updated: 2018-05-24. Bibliographically approved.
Authority records

Kaati, Lisa