Automatic prediction of gender, political affiliation, and age in Swedish politicians from the wording of their speeches: A comparative study of classifiability
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. (datorlingvistik)
2012 (English). In: Literary & Linguistic Computing, ISSN 0268-1145, E-ISSN 1477-4615, Vol. 27, no. 2, pp. 139-153. Article in journal (Refereed), Published
Abstract [en]

The present study explores automatic classification of Swedish politicians and their speeches into classes based on personal traits (gender, age, and political affiliation) as a means for measuring and analyzing how these traits influence language use. Support Vector Machines classified 200-word passages, represented by binary bag-of-word-forms vectors. Different feature selections were tried. The performance of the classifiers was assessed using test data from authors unseen in the training data. Author-level predictions derived from twenty-one text-level predictions reached an accuracy rate of 81.2% for gender, 89.4% for political affiliation, and 78.9% for age. Classification concerning each basic distinction was applied to general populations of politicians and to cohorts defined by the other classes. The outcomes suggest that the extent to which these personal traits are expressed in language use varies considerably among the different cohorts and that different traits affect different layers of the vocabulary. The accuracy rates for gender classification were higher for the right-wing and older cohorts than for the opposite ones. Age prediction gave higher accuracy for the right-wing cohort. Political classification gave the highest accuracy rates when all forms were included in the feature sets, whereas feature sets restricted to verbs or function words gave the highest scores for gender prediction, and the lowest ones for political classification.
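
The representation and aggregation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: whitespace tokenization, the helper names, and the majority-vote rule for turning text-level predictions into an author-level prediction are all assumptions, since the abstract only states that author-level predictions were "derived from" twenty-one text-level ones. The classifier itself (a Support Vector Machine in the study) is left out; it would be trained on the binary vectors produced here.

```python
from collections import Counter

PASSAGE_LEN = 200  # words per passage, as stated in the abstract


def split_passages(text, size=PASSAGE_LEN):
    """Cut a speech into consecutive passages of exactly `size` word forms.

    Whitespace tokenization is an assumption; a trailing remainder
    shorter than `size` is dropped.
    """
    words = text.split()
    return [words[i:i + size] for i in range(0, len(words) - size + 1, size)]


def binary_vector(passage, vocabulary):
    """Binary bag of word forms: 1 if the form occurs in the passage, else 0.

    Occurrence counts are deliberately ignored.
    """
    present = set(passage)
    return [1 if form in present else 0 for form in vocabulary]


def author_prediction(text_predictions):
    """Combine text-level labels into one author-level label.

    Majority voting is an assumed rule, not one confirmed by the abstract.
    """
    return Counter(text_predictions).most_common(1)[0][0]
```

With an odd number of passages per author, such as twenty-one, a majority vote over a two-class distinction cannot end in a tie, which is one plausible reason for choosing an odd count.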

Place, publisher, year, edition, pages
Oxford, 2012. Vol. 27, no 2, 139-153 p.
National Category
General Language Studies and Linguistics; Language Technology (Computational Linguistics)
Research subject
Computational Linguistics; Linguistics
URN: urn:nbn:se:uu:diva-176261
DOI: 10.1093/llc/fqs010
ISI: 000304199900002
OAI: oai:DiVA.org:uu-176261
DiVA: diva2:534679
Available from: 2012-06-18. Created: 2012-06-18. Last updated: 2016-02-29. Bibliographically approved.

By author/editor
Dahllöf, Mats