Publications from Uppsala University
Document Image Processing for Handwritten Text Recognition: Deep Learning-based Transliteration of Astrid Lindgren’s Stenographic Manuscripts
Heil, Raphaela
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
ORCID iD: 0000-0002-5010-9149
2023 (English). Doctoral thesis, comprehensive summary (Other academic)
Description
Abstract [en]

Document image processing and handwritten text recognition have been applied to a variety of materials, scripts, and languages, both modern and historic. They are crucial building blocks in the on-going digitisation efforts of archives, where they aid in preserving archival materials and foster knowledge sharing. The latter is especially facilitated by making document contents available to interested readers who may have little to no practice in, for example, reading a specific script type, and might therefore face challenges in accessing the material.  

The first part of this dissertation focuses on reducing editorial artefacts in manuscripts, specifically struck-through words. The main goal is to identify struck-through words and remove as much of the strikethrough as possible in order to regain access to the original word. This step can serve both as preprocessing that aids human annotators and readers and as a stage in computerised pipelines, such as handwritten text recognition. Two deep learning-based approaches, exploring paired and unpaired data settings, are examined and compared. Furthermore, an approach for generating synthetic strikethrough data, for example for training and testing purposes, and three novel datasets are presented.
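As a rough illustration of what generating synthetic strikethrough data can look like, the following Python sketch draws a single, roughly horizontal stroke across a binary word image. It is not the generation procedure from the thesis: the function name `add_strikethrough`, the image representation (rows of 0/1 values) and the stroke placement between 30% and 70% of the image height are all assumptions made for this example.

```python
import random


def add_strikethrough(image, thickness=1, rng=None):
    """Return a copy of `image` with one straight stroke drawn across it.

    `image` is a list of rows holding 0 (background) or 1 (ink).
    The stroke shape and placement are illustrative assumptions.
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    struck = [row[:] for row in image]  # leave the clean image untouched

    # Pick start and end heights near the vertical centre of the word.
    y_start = rng.uniform(0.3, 0.7) * (height - 1)
    y_end = rng.uniform(0.3, 0.7) * (height - 1)

    for x in range(width):
        # Linear interpolation between the two stroke end points.
        t = x / (width - 1) if width > 1 else 0.0
        y = round(y_start + t * (y_end - y_start))
        for dy in range(thickness):
            struck[min(height - 1, y + dy)][x] = 1
    return struck


clean = [[0] * 20 for _ in range(10)]
struck = add_strikethrough(clean, thickness=2, rng=random.Random(0))
```

Keeping the clean image alongside the struck-through copy yields a paired sample; discarding the pairing gives the unpaired setting.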

The second part of this dissertation is centred around applying handwritten text recognition to the stenographic manuscripts of Swedish children's book author Astrid Lindgren (1907-2002). Manually transliterating stenography, also known as shorthand, requires special domain knowledge of the script itself. Therefore, the main focus of this part is to reduce the required manual work, aiming to increase the accessibility of the material. In this regard, a baseline for handwritten text recognition of Swedish stenography is established. Two approaches for improving upon this baseline are examined. Firstly, a variety of data augmentation techniques commonly used in handwritten text recognition are studied. Secondly, different target sequence encoding methods, which aim to approximate diplomatic transcriptions, are investigated. The latter, in combination with a pre-training approach, significantly improves the recognition performance. In addition to the two presented studies, the novel LION dataset is published, consisting of excerpts from Astrid Lindgren's stenographic manuscripts.

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2023, p. 87
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 2294
Series
Skrifter utgivna av Svenska barnboksinstitutet, ISSN 0347-5387 ; 166
Keywords [en]
document image processing, handwritten text recognition, stenography, strikethrough
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
URN: urn:nbn:se:uu:diva-509138
ISBN: 978-91-513-1873-8 (print)
OAI: oai:DiVA.org:uu-509138
DiVA, id: diva2:1788213
Public defence
2023-10-04, Room 101121, Ångströmlaboratoriet, Lägerhyddsvägen 1, Uppsala, 09:15 (English)
Opponent
Supervisors
Available from: 2023-09-11 Created: 2023-08-16 Last updated: 2023-09-11
List of papers
1. Strikethrough Removal from Handwritten Words Using CycleGANs
2021 (English). In: Document Analysis and Recognition -- ICDAR 2021 / [ed] Lladós J., Lopresti D., Uchida S., Springer, 2021, Vol. 12824, p. 572-586
Conference paper, Published paper (Refereed)
Abstract [en]

Obtaining the original, clean forms of struck-through handwritten words can be of interest to literary scholars, focusing on tasks such as genetic criticism. In addition to this, replacing struck-through words can also have a positive impact on text recognition tasks. This work presents a novel unsupervised approach for strikethrough removal from handwritten words, employing cycle-consistent generative adversarial networks (CycleGANs). The removal performance is improved upon by extending the network with an attribute-guided approach. Furthermore, two new datasets, a synthetic multi-writer set, based on the IAM database, and a genuine single-writer dataset, are introduced for the training and evaluation of the models. The experimental results demonstrate the efficacy of the proposed method, where the examined attribute-guided models achieve F1 scores above 0.8 on the synthetic test set, improving upon the performance of the regular CycleGAN. Despite being trained exclusively on the synthetic dataset, the examined models even produce convincing cleaned images for genuine struck-through words. 
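The abstract reports F1 scores but does not state how they are computed. The Python sketch below assumes a pixel-wise F1 score over the ink pixels of two binarised images of equal size (the cleaned output and the ground truth); the function name `pixel_f1` and the 0/1 image representation are assumptions of this example, not details taken from the paper.

```python
def pixel_f1(predicted, target):
    """F1 score over ink pixels of two equally sized binary images (1 = ink)."""
    tp = fp = fn = 0
    for pred_row, target_row in zip(predicted, target):
        for p, t in zip(pred_row, target_row):
            if p and t:
                tp += 1  # ink predicted where ink is expected
            elif p and not t:
                fp += 1  # leftover stroke or spurious ink
            elif t and not p:
                fn += 1  # original ink that was erased
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


ground_truth = [[0, 1, 1, 0],
                [0, 1, 1, 0]]
cleaned = [[0, 1, 1, 1],
           [0, 1, 0, 0]]
score = pixel_f1(cleaned, ground_truth)  # 3 true positives, 1 false positive, 1 false negative
```

Under this assumption, a score above 0.8 means that most ink pixels of the original word are recovered while few strikethrough pixels remain.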

Place, publisher, year, edition, pages
Springer, 2021
Keywords
Strikethrough removal, CycleGAN, Handwritten words, Document image processing
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-455889 (URN)
10.1007/978-3-030-86337-1_38 (DOI)
000711880100038 ()
Conference
International Conference on Document Analysis and Recognition (ICDAR)
Funder
Riksbankens Jubileumsfond, P19-0103:1
Swedish Research Council, 2018-05973
Available from: 2021-10-12 Created: 2021-10-12 Last updated: 2023-09-05
Bibliographically approved
2. Paired Image to Image Translation for Strikethrough Removal from Handwritten Words
2022 (English). In: Document Analysis Systems, DAS 2022 / [ed] Uchida S., Barney E., Eglin V., Springer Nature, 2022, Vol. 13237, p. 309-322
Conference paper, Published paper (Refereed)
Abstract [en]

Transcribing struck-through, handwritten words, for example for the purpose of genetic criticism, can pose a challenge to both humans and machines, due to the obstructive properties of the superimposed strokes. This paper investigates the use of paired image to image translation approaches to remove strikethrough strokes from handwritten words. Four different neural network architectures are examined, ranging from a few simple convolutional layers to deeper ones, employing Dense blocks. Experimental results, obtained from one synthetic and one genuine paired strikethrough dataset, confirm that the proposed paired models outperform the CycleGAN-based state of the art, while using less than a sixth of the trainable parameters.

Place, publisher, year, edition, pages
Springer Nature, 2022
Series
Lecture Notes in Computer Science, ISSN 0302-9743, E-ISSN 1611-3349
Keywords
Strikethrough removal, Paired image to image translation, Handwritten words, Document image processing
National Category
Computer Vision and Robotics (Autonomous Systems)
Identifiers
urn:nbn:se:uu:diva-488232 (URN)
10.1007/978-3-031-06555-2_21 (DOI)
000870314500021 ()
978-3-031-06555-2 (ISBN)
978-3-031-06554-5 (ISBN)
Conference
15th IAPR International Workshop on Document Analysis Systems (DAS), May 22-25, 2022, La Rochelle Univ, La Rochelle, France
Funder
Swedish Research Council, 2018-05973
Riksbankens Jubileumsfond, P19-0103:1
Available from: 2022-11-14 Created: 2022-11-14 Last updated: 2023-09-05
Bibliographically approved
3. A Study of Augmentation Methods for Handwritten Stenography Recognition
2023 (English). Conference paper, Published paper (Refereed)
Abstract [en]

One of the factors limiting the performance of handwritten text recognition (HTR) for stenography is the small amount of annotated training data. To alleviate the problem of data scarcity, modern HTR methods often employ data augmentation. However, due to specifics of the stenographic script, such settings may not be directly applicable for stenography recognition. In this work, we study 22 classical augmentation techniques, most of which are commonly used for HTR of other scripts, such as Latin handwriting. Through extensive experiments, we identify a group of augmentations, including for example contained ranges of random rotation, shifts and scaling, that are beneficial to the use case of stenography recognition. Furthermore, a number of augmentation approaches, leading to a decrease in recognition performance, are identified. Our results are supported by statistical hypothesis testing. A link to the source code is provided in the paper.
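To make the idea of constrained random rotation, shift and scaling concrete, the following Python sketch samples one affine transform from small parameter ranges and applies it to a point in normalised image coordinates. The function names and the default ranges (2 degrees of rotation, 5% shift, 0.9 to 1.1 scaling) are hypothetical values chosen for this example; the abstract does not state the ranges used in the study.

```python
import math
import random


def sample_affine(rng, max_rotation_deg=2.0, max_shift=0.05, scale_range=(0.9, 1.1)):
    """Sample a 2x3 affine matrix combining rotation, uniform scaling and shift.

    All default ranges are illustrative assumptions, not values from the paper.
    """
    angle = math.radians(rng.uniform(-max_rotation_deg, max_rotation_deg))
    scale = rng.uniform(*scale_range)
    dx = rng.uniform(-max_shift, max_shift)
    dy = rng.uniform(-max_shift, max_shift)
    cos_a = math.cos(angle) * scale
    sin_a = math.sin(angle) * scale
    return [[cos_a, -sin_a, dx],
            [sin_a, cos_a, dy]]


def apply_affine(matrix, point):
    """Map an (x, y) point through a 2x3 affine matrix."""
    x, y = point
    (a, b, c), (d, e, f) = matrix
    return (a * x + b * y + c, d * x + e * y + f)


rng = random.Random(42)
matrix = sample_affine(rng)
moved = apply_affine(matrix, (0.5, 0.5))
```

Keeping the ranges narrow reflects the concern raised in the abstract: in a script where properties such as size and position of a stroke may carry meaning, a large distortion can change what a symbol denotes rather than merely vary its appearance.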

National Category
Computer Sciences
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-497025 (URN)10.1007/978-3-031-36616-1_11 (DOI)
Conference
IbPRIA 2023: 11th Iberian Conference on Pattern Recognition and Image Analysis
Available from: 2023-02-22 Created: 2023-02-22 Last updated: 2023-09-05
4. Handwritten Stenography Recognition and the LION Dataset
(English). In:
Article in journal (Refereed), Submitted
Abstract [en]

Purpose: In this paper, we establish a baseline for handwritten stenography recognition, using the novel LION dataset, and investigate the impact of including selected aspects of stenographic theory into the recognition process. We make the LION dataset publicly available with the aim of encouraging future research in handwritten stenography recognition.

Methods: A state-of-the-art text recognition model is trained to establish a baseline. Stenographic domain knowledge is integrated by applying four different encoding methods that transform the target sequence into representations, which approximate selected aspects of the writing system. Results are further improved by integrating a pre-training scheme, based on synthetic data.

Results: The baseline model achieves an average test character error rate (CER) of 29.81% and a word error rate (WER) of 55.14%. Test error rates are reduced significantly by combining stenography-specific target sequence encodings with pre-training and fine-tuning, yielding CERs in the range of 24.5% to 26% and WERs in the range of 44.8% to 48.2%.

Conclusion: The obtained results demonstrate the challenging nature of stenography recognition. Integrating stenography-specific knowledge, in conjunction with pre-training and fine-tuning on synthetic data, yields considerable improvements. Together with our precursor study on the subject, this is the first work to apply modern handwritten text recognition to stenography. The dataset and our code are publicly available via Zenodo.
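For readers unfamiliar with the reported metrics, the Python sketch below computes CER and WER in their common form: the Levenshtein edit distance between reference and hypothesis, divided by the reference length, counted in characters or words respectively. How the paper aggregates the rates over the test set is not stated in the abstract, so this per-line version and the function names are assumptions of the example.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two sequences (strings or lists of words)."""
    previous = list(range(len(hypothesis) + 1))
    for i, ref_item in enumerate(reference, start=1):
        current = [i]
        for j, hyp_item in enumerate(hypothesis, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def cer(reference, hypothesis):
    """Character error rate; assumes a non-empty reference string."""
    return edit_distance(reference, hypothesis) / len(reference)


def wer(reference, hypothesis):
    """Word error rate on whitespace-separated words; assumes a non-empty reference."""
    reference_words = reference.split()
    return edit_distance(reference_words, hypothesis.split()) / len(reference_words)


line_cer = cer("ett litet lejon", "ett litet lejan")  # 1 wrong character out of 15
line_wer = wer("ett litet lejon", "ett litet lejan")  # 1 wrong word out of 3
```

The example also shows why WER is typically much higher than CER: a single wrong character makes the whole word count as an error.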

National Category
Computer Sciences
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-497026 (URN)
Available from: 2023-02-22 Created: 2023-02-22 Last updated: 2023-09-05

Open Access in DiVA

UUThesis_R-Heil-2023 (1532 kB), 751 downloads
File information
File name: FULLTEXT01.pdf
File size: 1532 kB
Checksum (SHA-512): 83cb1bd3ad0be24af0028b0ceba88632409ab731ec0ae8d5062ef940e315c1ff233c2106fc8c98b3dd493ef2f17c1d03dae6c3e9914270f913c909300d3089c6
Type: fulltext
Mimetype: application/pdf

Authority records

Heil, Raphaela

Total: 754 downloads
The number of downloads is the sum of all downloads of full texts. It may include, e.g., previous versions that are no longer available.
Total: 1065 hits