This paper presents a framework for semi-automatic transcription of large-scale historical handwritten documents and proposes TexT, a simple, user-friendly text extraction tool for transcription. The proposed approach provides quick and easy transcription of text using a computer-assisted interactive technique. The algorithm finds multiple occurrences of the marked text on-the-fly using a word spotting system. TexT is also capable of on-the-fly annotation of handwritten text with automatic generation of ground truth labels, and of dynamically adjusting and correcting user-generated bounding box annotations so that each word is perfectly encapsulated. The user can view the document and the found words in their original form or with background noise removed for easier visualization of the transcription results. The effectiveness of TexT is demonstrated on an archival manuscript collection from a well-known publicly available dataset.
Handwritten text recognition is a daunting task due to the complex characteristics of handwritten letters. Deep learning based methods have achieved significant advances in recognizing challenging handwritten texts because of their ability to learn and accurately classify intricate patterns. However, deep learning has some limitations, such as the lack of a well-defined mathematical model and its black-box learning mechanism, which pose challenges. This paper aims to go beyond black-box learning and proposes a novel learning framework called Embedded Prototype Subspace Classification, based on the well-known subspace method, to recognise handwritten letters in a fast and efficient manner. The effectiveness of the proposed framework is empirically evaluated on popular datasets using standard evaluation measures.
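To make the underlying idea of the subspace method concrete, the following is a minimal, illustrative sketch (not the paper's actual Embedded Prototype Subspace Classification implementation): each class is represented by a one-dimensional subspace spanned by its normalised mean prototype, and a sample is assigned to the class whose subspace captures the largest squared projection. A full subspace classifier would keep several principal components per class; the single-prototype choice here is a simplifying assumption.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def normalise(v):
    n = math.sqrt(dot(v, v))
    return [a / n for a in v]

def fit_subspaces(samples_by_class):
    """One-dimensional subspace per class: the normalised mean vector.
    (A full subspace method would retain several principal components.)"""
    bases = {}
    for label, samples in samples_by_class.items():
        mean = [sum(col) / len(samples) for col in zip(*samples)]
        bases[label] = normalise(mean)
    return bases

def classify(x, bases):
    """Assign x to the class whose subspace captures the largest
    squared projection of x."""
    return max(bases, key=lambda c: dot(x, bases[c]) ** 2)
```

For example, with two toy classes of 2-D feature vectors, a point lying near a class's prototype direction is assigned to that class.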
Transcription of large-scale historical handwritten document images is a tedious task. Machine learning techniques, such as deep learning, are popularly used for quick transcription, but often require a substantial amount of pre-transcribed word examples for training. Instead of line-by-line word transcription, this paper proposes a simple training-free gamification strategy where all occurrences of each arbitrarily selected word are transcribed at once, using an intelligent user interface implemented in this work. The proposed approach offers fast and user-friendly semi-automatic transcription that allows multiple users to work on the same document collection simultaneously.
Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval, or word spotting, is popularly used for information retrieval and digitization of historical handwritten documents. However, the performance of word spotting algorithms depends heavily on feature detection and representation methods. Although there exist popular feature descriptors such as Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), the invariant properties of these descriptors amplify the noise in degraded document images, rendering them more sensitive to the noise and complex characteristics of historical manuscripts. Therefore, an efficient and relaxed feature descriptor is required, as handwritten words across different documents are indeed similar, but not identical. This paper introduces a Radial Line Fourier (RLF) descriptor for handwritten word representation, with a short feature vector of 32 dimensions. A segmentation-free and training-free handwritten word spotting method is studied herein that relies on the proposed RLF descriptor, takes into account different keypoint representations, and uses a simple preconditioner-based feature matching algorithm. The effectiveness of the RLF descriptor for segmentation-free handwritten word spotting is empirically evaluated on well-known historical handwritten datasets using standard evaluation measures.
This paper presents an approach towards word recognition based on embedded prototype subspace classification. The purpose of this paper is three-fold. Firstly, a new dataset for word recognition is presented, which is extracted from the Esposalles database consisting of the Barcelona cathedral marriage records. Secondly, different clustering techniques are evaluated for Embedded Prototype Subspace Classifiers. The dataset, containing 30 different classes of words, is heavily imbalanced, and some word classes are very similar, which renders the classification task rather challenging. For ease of use, no stratified sampling is done in advance, and the impact of different data splits is evaluated for different clustering techniques. It will be demonstrated that the original clustering technique, based on scaling the bandwidth, has to be adjusted for this new dataset. Thirdly, an algorithm is therefore proposed that finds k clusters, striving to obtain a certain number of feature points in each cluster, rather than finding clusters by scaling Silverman's rule of thumb. Furthermore, Self-Organising Maps are also evaluated as both a clustering and an embedding technique.
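The idea of targeting a cluster size rather than a bandwidth can be sketched as follows. This is an illustrative approximation, not the paper's algorithm: it simply derives k from the desired number of points per cluster and runs a plain k-means (implemented here in pure Python); the actual proposed method and the k-means choice are assumptions for illustration.

```python
import math
import random

def kmeans(points, k, iters=50, seed=0):
    """Plain k-means on a list of equal-length feature vectors."""
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # Assign each point to its nearest centre (squared Euclidean).
            i = min(range(k),
                    key=lambda j: sum((a - b) ** 2
                                      for a, b in zip(p, centres[j])))
            clusters[i].append(p)
        for j, c in enumerate(clusters):
            if c:  # keep the old centre if a cluster went empty
                centres[j] = [sum(col) / len(c) for col in zip(*c)]
    return clusters

def cluster_by_target_size(points, target_size):
    """Choose k so that each cluster holds roughly `target_size` points,
    instead of deriving a kernel bandwidth from Silverman's rule."""
    k = max(1, math.ceil(len(points) / target_size))
    return kmeans(points, k)
```

With 20 feature points and a target size of 10, this yields k = 2 clusters that together cover all points.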
Transcribing struck-through handwritten words, for example for the purpose of genetic criticism, can pose a challenge to both humans and machines, due to the obstructive properties of the superimposed strokes. This paper investigates the use of paired image-to-image translation approaches to remove strikethrough strokes from handwritten words. Four different neural network architectures are examined, ranging from a few simple convolutional layers to deeper ones employing Dense blocks. Experimental results, obtained from one synthetic and one genuine paired strikethrough dataset, confirm that the proposed paired models outperform the CycleGAN-based state of the art, while using less than a sixth of the trainable parameters.
Obtaining the original, clean forms of struck-through handwritten words can be of interest to literary scholars, focusing on tasks such as genetic criticism. In addition to this, replacing struck-through words can also have a positive impact on text recognition tasks. This work presents a novel unsupervised approach for strikethrough removal from handwritten words, employing cycle-consistent generative adversarial networks (CycleGANs). The removal performance is improved upon by extending the network with an attribute-guided approach. Furthermore, two new datasets, a synthetic multi-writer set, based on the IAM database, and a genuine single-writer dataset, are introduced for the training and evaluation of the models. The experimental results demonstrate the efficacy of the proposed method, where the examined attribute-guided models achieve F1 scores above 0.8 on the synthetic test set, improving upon the performance of the regular CycleGAN. Despite being trained exclusively on the synthetic dataset, the examined models even produce convincing cleaned images for genuine struck-through words.
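The cycle-consistency constraint at the heart of the CycleGAN approach can be written out in a few lines. The sketch below is a simplified, framework-free illustration (images are flattened to lists of pixel values, and the generators are placeholder callables); the actual models in the paper are deep networks trained with additional adversarial and attribute-guidance terms.

```python
def l1(x, y):
    """Mean absolute difference between two equally sized 'images',
    flattened to lists of pixel values."""
    return sum(abs(a - b) for a, b in zip(x, y)) / len(x)

def cycle_consistency_loss(x, y, G, F, lam=10.0):
    """CycleGAN cycle loss: translating x -> G(x) -> F(G(x)) should
    recover x, and y -> F(y) -> G(F(y)) should recover y. Here G would
    map struck-through words to clean ones and F the reverse."""
    return lam * (l1(F(G(x)), x) + l1(G(F(y)), y))
```

For identity generators the loss is exactly zero, which is the fixed point the cycle term pulls the trained generators towards.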
Word spotting is popularly used for digitisation and transcription of historical handwritten documents. Recently, deep learning based methods have dominated the current state-of-the-art in learning-based word spotting. However, deep learning architectures such as Convolutional Neural Networks (CNNs) require a large amount of training data, and suffer from translation invariance. Capsule Networks (CapsNet) have recently been introduced as a data-efficient alternative to CNNs. This work explores the applicability of CapsNets for segmentation-based word spotting, and is the first such effort in the Handwritten Text Recognition (HTR) community to the best of the authors' knowledge. The effectiveness of CapsNets will be empirically evaluated on well-known historical handwritten datasets using standard evaluation measures. The impact of varying amounts of training data on the recognition performance will be investigated, along with a comparison with the state-of-the-art methods.
This work proposes an attention-based sequence-to-sequence model for handwritten word recognition and explores transfer learning for data-efficient training of HTR systems. To overcome training data scarcity, this work leverages models pre-trained on scene text images as a starting point towards tailoring the handwriting recognition models. ResNet feature extraction and bidirectional LSTM-based sequence modeling stages together form an encoder. The prediction stage consists of a decoder and a content-based attention mechanism. The effectiveness of the proposed end-to-end HTR system has been empirically evaluated on a novel multi-writer dataset Imgur5K and the IAM dataset. The experimental results evaluate the performance of the HTR framework, further supported by an in-depth analysis of the error cases. Source code and pre-trained models are available on GitHub (https://github.com/dmitrijsk/AttentionHTR).
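One decoding step of a content-based attention mechanism can be sketched as follows. This is an illustrative, framework-free reduction: the decoder state scores each encoder feature, the scores are softmax-normalised into attention weights, and the weighted sum forms the context vector. The dot-product scoring used here is a simplifying assumption; content-based attention is often realised with a small learned network instead.

```python
import math

def softmax(scores):
    m = max(scores)  # shift for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def content_attention(query, features):
    """One attention step: score each encoder feature vector against the
    decoder state (query), normalise with softmax, and return the
    weighted-sum context vector together with the attention weights."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(len(features[0]))]
    return context, weights
```

The weights always sum to one, and a query aligned with one encoder position receives the largest weight there, which is what lets the decoder focus on one region of the word image per output character.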