Towards a Generic Unsupervised Method for Transcription of Encoded Manuscripts
Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona, Bellaterra, Spain.
Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona, Bellaterra, Spain.
Computer Vision Center, Computer Science Department, Universitat Autònoma de Barcelona, Bellaterra, Spain.
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology. ORCID iD: 0000-0002-4838-6518
2019 (English). In: Proceedings of the 3rd International Conference on Digital Access to Textual Cultural Heritage: DATeCH2019, New York: ACM, 2019. Conference paper, Published paper (Refereed)
Abstract [en]

Historical ciphers, a special type of manuscript, contain encrypted information that is important for the interpretation of our history. The first step towards decipherment is to transcribe the images, either manually or by automatic image processing techniques. Despite the improvements in handwritten text recognition (HTR) brought by deep learning methodologies, the need for labelled training data remains an important limitation. Because ciphers often draw their symbol sets from various alphabets and include unique symbols for which no transcription scheme is available, these supervised HTR techniques are not suitable for transcribing ciphers. In this paper we propose an unsupervised method for transcribing encrypted manuscripts based on clustering and label propagation, a technique that has been successfully applied to community detection in networks. We analyze the performance on ciphers with various symbol sets, and discuss the advantages and drawbacks compared to supervised HTR methods.
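
The abstract names label propagation as used for community detection but gives no detail, so the following is only a rough, generic sketch of that idea and not the authors' implementation. It assumes each segmented symbol has already been reduced to a feature vector (toy 2D points here), links symbols whose vectors lie within a hypothetical `radius`, and lets every node repeatedly adopt the most frequent label among its neighbours. The function names, the distance rule, and the tie-breaking rule are all illustrative assumptions.

```python
from collections import Counter
from math import dist


def build_graph(points, radius):
    """Link every pair of points closer than `radius` (an assumed similarity rule)."""
    neighbours = {i: set() for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) < radius:
                neighbours[i].add(j)
                neighbours[j].add(i)
    return neighbours


def propagate_labels(neighbours, max_iters=100):
    """Generic label propagation: each node takes its neighbours' majority label."""
    # Every node starts in its own community.
    labels = {node: node for node in neighbours}
    for _ in range(max_iters):
        changed = False
        # Asynchronous updates in a fixed order keep the sketch deterministic.
        for node in sorted(neighbours):
            if not neighbours[node]:
                continue  # isolated nodes keep their own label
            counts = Counter(labels[m] for m in neighbours[node])
            best = max(counts.values())
            # Ties are broken by the smallest label (an arbitrary choice).
            new_label = min(lab for lab, c in counts.items() if c == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


# Two well-separated groups of toy "symbol descriptors".
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
labels = propagate_labels(build_graph(points, radius=2.0))
print(labels)
```

In this toy setting the nodes of each group end up sharing one label, which plays the role of a cluster identifier that could then be mapped to a transcription symbol.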

Place, publisher, year, edition, pages
New York: ACM, 2019.
Keywords [en]
Handwritten text recognition, Encoded manuscripts, Unsupervised methods.
National Category
Computer Sciences; Language Technology (Computational Linguistics)
Research subject
Computational Linguistics
Identifiers
URN: urn:nbn:se:uu:diva-385925
DOI: 10.1145/3322905.3322920
ISBN: 978-1-4503-7194-0 (print)
OAI: oai:DiVA.org:uu-385925
DiVA, id: diva2:1326474
Conference
the 3rd International Conference on Digital Access to Textual Cultural Heritage
Projects
DECRYPT
Funder
Swedish Research Council, 2018-06074
Available from: 2019-06-18 Created: 2019-06-18 Last updated: 2019-08-16 Bibliographically approved

Authority records BETA

Megyesi, Beáta