uu.seUppsala University Publications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
Experiments on Large Scale Document Visualization using Image-based Word Clouds
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.ORCID iD: 0000-0002-4405-6888
2015 (English)Report (Other academic)
Abstract [en]

In this paper, we introduce image-based word clouds as a novel tool for a quick and aesthetic overviews of common words in collections of digitized text manuscripts. While OCR can be used to enable summaries and search functionality to printed modern text, historical and handwritten documents remains a challenge. By segmenting and counting word images, without applying manual transcription or OCR, we have developed a method that can produce word- or tag clouds from document collections. Our new tool is not limited to any specific kind of text. We make further contributions in ways of stop-word removal, class based feature weighting and visualization. An evaluation of the proposed tool includes comparisons with ground truth word clouds on handwritten marriage licenses from the 17th century and the George Washington database of handwritten letters, from the 18th century. Our experiments show that image-based word clouds capture the same information, albeit approximately, as the regular word clouds based on text data.

Place, publisher, year, edition, pages
2015.
Series
Technical report / Department of Information Technology, Uppsala University, ISSN 1404-3203 ; 2015-022
Keyword [en]
clustering, image-based, word cloud, feature weighting
National Category
Language Technology (Computational Linguistics)
Research subject
Computerized Image Processing
Identifiers
URN: urn:nbn:se:uu:diva-259164OAI: oai:DiVA.org:uu-259164DiVA: diva2:843537
Projects
q2bq2b_vr2012
Funder
Swedish Research Council, 2012-5743
Available from: 2015-07-19 Created: 2015-07-29 Last updated: 2016-06-09

Open Access in DiVA

No full text

Other links

http://www.it.uu.se/research/publications/reports/2015-022/

Authority records BETA

Wilkinson, TomasBrun, Anders

Search in DiVA

By author/editor
Wilkinson, TomasBrun, Anders
By organisation
Division of Visual Information and InteractionComputerized Image Analysis and Human-Computer Interaction
Language Technology (Computational Linguistics)

Search outside of DiVA

GoogleGoogle Scholar

urn-nbn

Altmetric score

urn-nbn
Total: 464 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf