Automatic prediction of gender, political affiliation, and age in Swedish politicians from the wording of their speeches: A comparative study of classifiability
2012 (English). In: Literary & Linguistic Computing, ISSN 0268-1145, E-ISSN 1477-4615, Vol. 27, no 2, 139-153 p.
Article in journal (Refereed). Published
The present study explores automatic classification of Swedish politicians and their speeches into classes based on personal traits (gender, age, and political affiliation) as a means of measuring and analyzing how these traits influence language use. Support Vector Machines classified 200-word passages, represented by binary bag-of-word-forms vectors. Different feature selections were tried. The performance of the classifiers was assessed using test data from authors unseen in the training data. Author-level predictions derived from twenty-one text-level predictions reached an accuracy rate of 81.2% for gender, 89.4% for political affiliation, and 78.9% for age. Classification concerning each basic distinction was applied to general populations of politicians and to cohorts defined by the other classes. The outcomes suggest that the extent to which these personal traits are expressed in language use varies considerably among the different cohorts and that different traits affect different layers of the vocabulary. The accuracy rates for gender classification were higher for the right-wing and older cohorts than for the opposite ones. Age prediction gave higher accuracy for the right-wing cohort. Political classification gave the highest accuracy rates when all word forms were included in the feature sets, whereas feature sets restricted to verbs or function words gave the highest scores for gender prediction and the lowest ones for political classification.
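The representation and aggregation steps mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the study's implementation: the function names are hypothetical, the passages are assumed to be consecutive and non-overlapping, and the author-level decision is assumed to be a simple majority vote over the text-level labels, which the abstract does not specify. The Support Vector Machine itself is left out; any classifier that maps a binary vector to a label could supply the text-level predictions.

```python
from collections import Counter


def split_into_passages(tokens, size=200):
    """Cut a token list into consecutive passages of `size` tokens.

    Assumption: passages do not overlap and a shorter trailing
    remainder is discarded.
    """
    return [tokens[i:i + size] for i in range(0, len(tokens) - size + 1, size)]


def binary_vector(passage, vocabulary):
    """Binary bag-of-word-forms: 1 if the form occurs in the passage, else 0.

    Frequencies are ignored; only presence or absence is recorded.
    """
    present = set(passage)
    return [1 if form in present else 0 for form in vocabulary]


def author_prediction(text_level_labels):
    """Derive one author-level label from several text-level labels.

    Assumption: plain majority vote. With an odd number of votes
    (such as twenty-one) and two classes, a tie cannot occur.
    """
    return Counter(text_level_labels).most_common(1)[0][0]


# Hypothetical usage with placeholder tokens and labels.
tokens = ["w%d" % i for i in range(450)]
passages = split_into_passages(tokens)          # two passages of 200 tokens
vector = binary_vector(passages[0], ["w0", "w399", "w5"])
label = author_prediction(["F"] * 11 + ["M"] * 10)
```

Under these assumptions, a single author's speeches would be cut into passages, each passage classified separately, and the twenty-one resulting labels reduced to one prediction for the author.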
Place, publisher, year, edition, pages
Oxford, 2012. Vol. 27, no 2, 139-153 p.
General Language Studies and Linguistics; Language Technology (Computational Linguistics)
Research subject Computational Linguistics; Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-176261
DOI: 10.1093/llc/fqs010
ISI: 000304199900002
OAI: oai:DiVA.org:uu-176261
DiVA: diva2:534679