Uppsala University Publications
TexT – Text extractor tool for handwritten document transcription and annotation
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction. ORCID iD: 0000-0003-1054-2754
Uppsala University, University Library.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction. ORCID iD: 0000-0003-4480-3158
2018 (English). In: Digital Libraries and Multimedia Archives, Springer, 2018, p. 81-92. Conference paper, Published paper (Refereed)
Abstract [en]

This paper presents a framework for semi-automatic transcription of large-scale historical handwritten documents and proposes a simple, user-friendly text extractor tool, TexT, for transcription. The proposed approach enables quick and easy transcription of text using a computer-assisted interactive technique. The algorithm finds multiple occurrences of the marked text on the fly using a word spotting system. TexT can also annotate handwritten text on the fly, automatically generating ground-truth labels and dynamically adjusting and correcting user-generated bounding-box annotations so that the word is tightly enclosed. The user can view the document and the found words either in their original form or with background noise removed, which makes the transcription results easier to inspect. The effectiveness of TexT is demonstrated on an archival manuscript collection from a well-known, publicly available dataset.
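The abstract does not say how TexT corrects a user-drawn bounding box or removes background noise. As a purely illustrative sketch, and not the authors' implementation, one simple way to do both is to separate dark ink from light paper with Otsu's global threshold, whiten everything brighter than that threshold, and then grow and shrink the box to the ink pixels. The function names (`otsu_threshold`, `remove_background`, `adjust_box`), the (top, left, bottom, right) box convention and the dark-ink-on-light-paper assumption are all hypothetical.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]          # hypothetical: 8-bit grayscale, 0 = black ink, 255 = white paper
Box = Tuple[int, int, int, int]  # hypothetical: (top, left, bottom, right); bottom/right exclusive


def otsu_threshold(gray: Image) -> int:
    """Otsu's method: pick the threshold t maximizing between-class variance
    (up to a constant factor), with class 0 = pixels <= t (ink) and
    class 1 = pixels > t (background)."""
    hist = [0] * 256
    for row in gray:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(i * n for i, n in enumerate(hist))
    w0, sum0 = 0, 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w0 += hist[t]
        if w0 == 0:
            continue  # no dark pixels yet
        w1 = total - w0
        if w1 == 0:
            break  # no bright pixels left
        sum0 += t * hist[t]
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between_var = w0 * w1 * (mean0 - mean1) ** 2
        if between_var > best_var:
            best_t, best_var = t, between_var
    return best_t


def remove_background(gray: Image, threshold: int) -> Image:
    """Whiten every pixel brighter than the threshold, keeping only ink pixels."""
    return [[v if v <= threshold else 255 for v in row] for row in gray]


def _ink_in_row(gray: Image, r: int, c0: int, c1: int, t: int) -> bool:
    return any(gray[r][c] <= t for c in range(c0, c1))


def _ink_in_col(gray: Image, c: int, r0: int, r1: int, t: int) -> bool:
    return any(gray[r][c] <= t for r in range(r0, r1))


def adjust_box(gray: Image, box: Box, threshold: int) -> Optional[Box]:
    """Grow the box while ink touches its border from outside, then shrink it
    to the tight bounding box of the ink inside. Returns None if no ink is found."""
    height, width = len(gray), len(gray[0])
    top, left, bottom, right = box
    grown = True
    while grown:  # terminates: each side only moves outward, up to the image edge
        grown = False
        if top > 0 and _ink_in_row(gray, top - 1, left, right, threshold):
            top -= 1
            grown = True
        if bottom < height and _ink_in_row(gray, bottom, left, right, threshold):
            bottom += 1
            grown = True
        if left > 0 and _ink_in_col(gray, left - 1, top, bottom, threshold):
            left -= 1
            grown = True
        if right < width and _ink_in_col(gray, right, top, bottom, threshold):
            right += 1
            grown = True
    rows = [r for r in range(top, bottom) if _ink_in_row(gray, r, left, right, threshold)]
    if not rows:
        return None
    cols = [c for c in range(left, right) if _ink_in_col(gray, c, top, bottom, threshold)]
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)
```

For example, in a 5×6 image whose only ink occupies rows 1 to 2 and columns 1 to 3, a user box (1, 1, 3, 3) that clips the last ink column is extended to (1, 1, 3, 4), and an overly loose box (0, 0, 5, 6) is shrunk to the same result. Because growth stops only at a blank row or column, a neighbouring word that touches the box would also be absorbed; a practical tool would need to bound the growth.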

Place, publisher, year, edition, pages
Springer, 2018. p. 81-92
Series
Communications in Computer and Information Science ; 806
Keywords [en]
Handwritten text recognition, Transcription Annotation, TexT, Word spotting, Historical documents
National Category
Computer and Information Sciences
Research subject
Computerized Image Processing
Identifiers
URN: urn:nbn:se:uu:diva-343160
DOI: 10.1007/978-3-319-73165-0_8
ISI: 000434481000008
ISBN: 978-3-319-73164-3 (print)
ISBN: 978-3-319-73165-0 (electronic)
OAI: oai:DiVA.org:uu-343160
DiVA, id: diva2:1185594
Conference
14th Italian Research Conference on Digital Libraries (IRCDL) 2018, January 25–26, Udine, Italy
Projects
eSSENCE
Funder
Riksbankens Jubileumsfond, NHS14-2068:1
eSSENCE - An eScience Collaboration
Available from: 2017-12-21; Created: 2018-02-26; Last updated: 2018-10-22; Bibliographically approved

Authors
Hast, Anders; Cullhed, Per; Vats, Ekta