Uppsala University Publications
Learning based Word Search and Visualisation for Historical Manuscript Images
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
ORCID iD: 0000-0002-6783-1744
2019 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

Today, work with historical manuscripts is done almost exclusively by hand, both by researchers in the humanities and by laypeople mapping out their personal genealogy. This is highly time-consuming: it is not uncommon to spend months on a single volume of a few hundred pages. The last few decades have seen an ongoing effort to digitise manuscripts, both for preservation purposes and to increase accessibility. Digitisation also enables the use of methods and algorithms from Image Analysis and Machine Learning, which have great potential both to make existing work more efficient and to create new methodologies for manuscript-based research.

The first part of this thesis focuses on Word Spotting, the task of searching for a given text query in a manuscript collection. This can be broken down into two tasks: detecting where the words are located on the page, and then ranking the detected words according to their similarity to the search query. We propose Deep Learning models that perform these tasks, first separately and then simultaneously, and use them to successfully search a large manuscript collection of over a hundred thousand pages.
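As a rough illustration of the ranking step, the sketch below assumes that each detected word region has already been mapped to a fixed-length embedding vector by some learned model, and ranks the regions by cosine similarity to the query's embedding. The function names, the region identifiers, the toy vectors and the choice of cosine similarity are assumptions for illustration, not the thesis's implementation.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def rank_regions(query_emb: Sequence[float],
                 region_embs: Dict[str, Sequence[float]]) -> List[Tuple[str, float]]:
    """Rank detected regions by decreasing similarity to the query embedding."""
    scored = [(rid, cosine(query_emb, emb)) for rid, emb in region_embs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored


# Hypothetical embeddings for three detected regions and one text query.
regions = {"p1_w3": [0.9, 0.1, 0.0], "p1_w7": [0.0, 1.0, 0.0], "p2_w1": [0.7, 0.7, 0.0]}
query = [1.0, 0.0, 0.0]
print([rid for rid, _ in rank_regions(query, regions)])  # ['p1_w3', 'p2_w1', 'p1_w7']
```

In a real system the query embedding would itself come from the query string (query-by-string), and the search would typically run over many thousands of regions, where an approximate nearest-neighbour index may be preferable to a full sort.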

A limiting factor in applying learning-based methods to historical manuscript images is the cost, and consequent scarcity, of the annotated data needed to train machine learning models. We propose several ways to mitigate this problem: generating synthetic data, augmenting existing data to get more value from it, and learning from pre-existing, partially annotated data that was previously unusable.

In the second part, a method for visualising manuscript collections, called the Image-based Word Cloud, is proposed. Much like its text-based counterpart, it arranges the most representative words in a collection into a cloud, where the size of each word is proportional to its frequency of occurrence. This gives the user a single-image overview of a manuscript collection, regardless of its size. We further propose a way to estimate a manuscript's production date, which can give historians context that is crucial for correctly interpreting the contents of a manuscript.
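A minimal sketch of the size rule, assuming the word images have already been grouped into clusters that each carry some label (for example via word spotting), and assuming a linear scale relative to the most frequent word. Placement of the images in the cloud is not shown. The names `cloud_font_sizes`, `top_k` and `max_size` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def cloud_font_sizes(cluster_labels: Iterable[str], top_k: int = 50,
                     max_size: float = 72.0) -> Dict[str, float]:
    """Map the top_k most frequent clusters to display sizes proportional to frequency.

    The most frequent cluster gets max_size; every other size scales linearly with its count.
    """
    counts = Counter(cluster_labels)
    top = counts.most_common(top_k)
    if not top:
        return {}
    max_count = top[0][1]
    return {label: max_size * count / max_count for label, count in top}


# Hypothetical cluster labels, one per detected word image.
labels = ["court"] * 4 + ["farm"] * 2 + ["bread"]
print(cloud_font_sizes(labels))  # {'court': 72.0, 'farm': 36.0, 'bread': 18.0}
```

Restricting the cloud to the `top_k` clusters is what keeps the overview a single image regardless of collection size; the rendered size of each word image would then be scaled to its computed value.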

Place, publisher, year, edition, pages
Uppsala: Acta Universitatis Upsaliensis, 2019, p. 82
Series
Digital Comprehensive Summaries of Uppsala Dissertations from the Faculty of Science and Technology, ISSN 1651-6214 ; 1798
Keywords [en]
Word Spotting, Convolutional Neural Networks, Deep Learning, Region Proposals, Historical Manuscripts, Computer Vision, Image Analysis, Visualisation, Document Analysis
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
URN: urn:nbn:se:uu:diva-381308
ISBN: 978-91-513-0633-9 (print)
OAI: oai:DiVA.org:uu-381308
DiVA, id: diva2:1303103
Public defence
2019-06-04, TLS (Tidskriftläsesalen), Carolina Rediviva, Dag Hammarskjölds väg 1, Uppsala, 10:15 (English)
Opponent
Supervisors
Funder
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2019-05-13 Created: 2019-04-08 Last updated: 2019-06-18
List of papers
1. Bootstrapping Weakly Supervised Segmentation-free Word Spotting through HMM-based Alignment
(English) Manuscript (preprint) (Other academic)
Abstract [en]

Recent work in word spotting in handwritten documents has yielded impressive results. Yet this progress has largely been made by supervised learning systems that depend on manually annotated data, which makes deployment to new collections a significant effort. In this paper, we propose an approach that uses transcriptions without bounding box annotations to train segmentation-free word spotting models, given a model partially trained with full annotations. This is done through an alignment procedure based on hidden Markov models, which creates a tentative mapping between word region proposals and the transcriptions and thereby automatically produces additional weakly annotated training data. Using as little as 1% or 10% of the fully annotated training sets for partial convergence, we automatically annotate the remaining training data and successfully train on it. Across all datasets, our approach comes within a few percentage points of mAP of a model trained with full ground truth for the entire training set. We believe that this will be a significant advance towards a more general use of word spotting, since digital transcriptions already exist for parts of many collections of interest.
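The alignment idea can be illustrated with a simplified, hypothetical stand-in: a Viterbi-style dynamic program that assigns each transcription word, in order, to a distinct region proposal in reading order, maximising the summed log-likelihood scores. This is not the paper's HMM. The score matrix, the strictly left-to-right constraint and the function name `align_monotonic` are assumptions for illustration.

```python
from typing import List, Sequence

NEG_INF = float("-inf")


def align_monotonic(score: Sequence[Sequence[float]]) -> List[int]:
    """Assign transcription word i to proposal j_i with j_0 < j_1 < ..., maximising sum(score[i][j_i]).

    score[i][j] is an assumed log-likelihood that transcription word i is depicted by
    proposal j, with proposals sorted in reading order. Requires len(score) <= len(score[0]).
    """
    if not score:
        return []
    n, m = len(score), len(score[0])
    if n > m:
        raise ValueError("more transcription words than proposals")
    # best[i][j]: best total for words 0..i with word i placed at proposal j
    best = [[NEG_INF] * m for _ in range(n)]
    back = [[-1] * m for _ in range(n)]
    best[0] = list(score[0])
    for i in range(1, n):
        run_max, run_arg = NEG_INF, -1  # running max of best[i-1][k] over k < j
        for j in range(m):
            if j > 0 and best[i - 1][j - 1] > run_max:
                run_max, run_arg = best[i - 1][j - 1], j - 1
            if run_arg >= 0:
                best[i][j] = run_max + score[i][j]
                back[i][j] = run_arg
    # Trace back from the best position of the last word.
    j = max(range(m), key=lambda k: best[n - 1][k])
    path = [j]
    for i in range(n - 1, 0, -1):
        j = back[i][j]
        path.append(j)
    return path[::-1]


# Two transcription words, three proposals (hypothetical scores).
scores = [[0.0, -5.0, -1.0],
          [-5.0, -5.0, 0.0]]
print(align_monotonic(scores))  # [0, 2]
```

The resulting word-to-proposal pairs would play the role of the "weakly annotated" training examples; a real system would also need to handle proposals that match no word and words with no good proposal, which this sketch ignores.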

Keywords
weakly supervised, segmentation-free word spotting, convolutional neural network, hidden Markov model
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-381304 (URN)
Projects
q2b
Funder
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2019-04-07 Created: 2019-04-07 Last updated: 2019-04-08
2. Neural Word Search in Historical Manuscript Collections
(English) Manuscript (preprint) (Other academic)
Abstract [en]

We address the problem of segmenting and retrieving word images in collections of historical manuscripts given a text query, commonly referred to as "word spotting". To this end, we first propose an end-to-end trainable model based on deep neural networks, which we dub Ctrl-F-Net. The model simultaneously generates region proposals and embeds them into a word embedding space, wherein a search is performed. We further introduce a simplified version called Ctrl-F-Mini, which is faster with similar performance but is limited to more easily segmented manuscripts. We evaluate both models on common benchmark datasets and surpass the previous state of the art. Finally, in collaboration with historians, we employ Ctrl-F-Net to search a large manuscript collection of over 100 thousand pages, written across two centuries. With only 11 training pages, we enable large-scale data collection in manuscript-based historical research, speeding up data collection and increasing the number of manuscripts processed by orders of magnitude. Given the time-consuming manual work required to study old manuscripts in the humanities, quick and robust tools for word spotting have the potential to revolutionise domains like history, religion and language.

Keywords
Word spotting, Historical Manuscripts, Deep Convolutional Neural Network, Region Proposals
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-381306 (URN)
Projects
q2b
Funder
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2019-04-07 Created: 2019-04-07 Last updated: 2019-04-08
3. Neural Ctrl-F: Segmentation-free query-by-string word spotting in handwritten manuscript collections
2017 (English). In: 2017 IEEE International Conference on Computer Vision (ICCV), IEEE, 2017, p. 4443-4452. Conference paper, Published paper (Refereed)
Abstract [en]

In this paper, we approach the problem of segmentation-free query-by-string word spotting for handwritten documents. In other words, we use methods inspired by computer vision and machine learning to search for words in large collections of digitized manuscripts. In particular, we are interested in historical handwritten texts, which are often far more challenging than modern printed documents. This task is important, as it provides people with a way to quickly find what they are looking for in large collections that are tedious and difficult to read manually. To this end, we introduce an end-to-end trainable model based on deep neural networks that we call Ctrl-F-Net. Given a full manuscript page, the model simultaneously generates region proposals and embeds these into a distributed word embedding space, where searches are performed. We evaluate the model on common benchmarks for handwritten word spotting, outperforming the previous state-of-the-art segmentation-free approaches by a large margin, and in some cases even segmentation-based approaches. One interesting real-life application of our approach is helping historians find and count specific words in court records that are related to women's sustenance activities and division of labor. We provide promising preliminary experiments that validate our method on this task.
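Evaluating a segmentation-free model requires deciding when a retrieved region counts as a hit. The sketch below assumes a common convention: a retrieval is relevant if its intersection over union (IoU) with a not-yet-matched ground-truth box of the query word reaches a threshold (0.5 here), and average precision (AP) is computed over the ranked list; mAP is then the mean AP over all queries. The threshold, the `(x1, y1, x2, y2)` box format and the function names are assumptions, not taken from the paper.

```python
from typing import Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1 and y2 > y1


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(ranked: Sequence[Box], gt: Sequence[Box], thresh: float = 0.5) -> float:
    """AP of a ranked retrieval list; each ground-truth box can be matched at most once."""
    if not gt:
        return 0.0
    matched = [False] * len(gt)
    hits, precision_sum = 0, 0.0
    for rank, box in enumerate(ranked, start=1):
        # Match to the best-overlapping unmatched ground-truth box at or above the threshold.
        best_k, best_iou = -1, thresh
        for k, g in enumerate(gt):
            overlap = iou(box, g)
            if not matched[k] and overlap >= best_iou:
                best_k, best_iou = k, overlap
        if best_k >= 0:
            matched[best_k] = True
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / len(gt)


# Hypothetical query with two ground-truth occurrences and three retrieved regions.
gt_boxes = [(0, 0, 10, 10), (20, 0, 30, 10)]
retrieved = [(0, 0, 10, 10), (50, 0, 60, 10), (21, 0, 31, 10)]
print(round(average_precision(retrieved, gt_boxes), 4))  # 0.8333
```

Preventing a ground-truth box from being matched twice means duplicate detections of the same word are counted as false positives, which penalises a model that returns overlapping proposals for one occurrence.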

Place, publisher, year, edition, pages
IEEE, 2017
Series
IEEE International Conference on Computer Vision, E-ISSN 1550-5499
Keywords
Segmentation-free Word Spotting, Deep Learning, Convolutional Neural Network, Query-by-String
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-335926 (URN)
10.1109/ICCV.2017.475 (DOI)
000425498404054
978-1-5386-1032-9 (ISBN)
Conference
16th IEEE International Conference on Computer Vision (ICCV), Venice, Italy, October 22-29, 2017
Projects
q2b
Funder
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2017-12-11 Created: 2017-12-11 Last updated: 2019-04-08
Bibliographically approved
4. Visualizing document image collections using image-based word clouds
2015 (English). In: Advances in Visual Computing: 11th International Symposium, ISVC 2015, Las Vegas, NV, USA, December 14-16, 2015, Proceedings, Part I / [ed] Bebis, G; Boyle, R; Parvin, B; Koracin, D; Pavlidis, I; Feris, R; McGraw, T; Elendt, M; Kopper, R; Ragan, E; Ye, Z; Weber, G, Springer, 2015, p. 297-306. Conference paper, Published paper (Refereed)
Place, publisher, year, edition, pages
Springer, 2015
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9474
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-272193 (URN)
10.1007/978-3-319-27857-5_27 (DOI)
000376400300027
9783319278568 (ISBN)
9783319278575 (ISBN)
Conference
ISVC 2015, December 14–16, Las Vegas, NV
Projects
q2b
Funder
Swedish Research Council, 2012-5743
Available from: 2015-12-18 Created: 2016-01-12 Last updated: 2019-04-08
Bibliographically approved
5. A novel word segmentation method based on object detection and deep learning
2015 (English). In: Advances in Visual Computing: 11th International Symposium, ISVC 2015, Las Vegas, NV, USA, December 14-16, 2015, Proceedings, Part I / [ed] Bebis, G; Boyle, R; Parvin, B; Koracin, D; Pavlidis, I; Feris, R; McGraw, T; Elendt, M; Kopper, R; Ragan, E; Ye, Z; Weber, G, Springer, 2015, p. 231-240. Conference paper, Published paper (Refereed)
Abstract [en]

The segmentation of individual words is a crucial step in several data mining methods for historical handwritten documents. Examples of applications include visual searching for query words (word spotting) and character-by-character text recognition. In this paper, we present a novel method for word segmentation that is adapted from recent advances in computer vision, deep learning and generic object detection. Our method has several distinctive capabilities, and it has found practical use in our current research project: it can easily be trained for different kinds of historical documents, uses full gray scale information, and requires neither binarization as pre-processing nor prior segmentation of individual text lines. We evaluate its performance using established error metrics, previously used in competitions for word segmentation, and demonstrate its usefulness for a 15th century handwritten document.

Place, publisher, year, edition, pages
Springer, 2015
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 9474
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-272181 (URN)
10.1007/978-3-319-27857-5_21 (DOI)
000376400300021
9783319278568 (ISBN)
9783319278575 (ISBN)
Conference
ISVC 2015, December 14–16, Las Vegas, NV
Projects
q2b
Funder
Swedish Research Council, 2012-5743
Available from: 2015-12-18 Created: 2016-01-12 Last updated: 2019-04-08
Bibliographically approved
6. Semantic and Verbatim Word Spotting using Deep Neural Networks
2016 (English). In: Proceedings of 2016 15th International Conference on Frontiers in Handwriting Recognition (ICFHR), 2016, p. 307-312. Conference paper, Published paper (Refereed)
Abstract [en]

In the last few years, deep convolutional neural networks have become ubiquitous in computer vision, achieving state-of-the-art results on problems such as object detection, semantic segmentation, and image captioning. However, they have not yet been widely investigated in the document analysis community. In this paper, we present a word spotting system based on convolutional neural networks. We train a network to extract a powerful image representation, which we then embed into a word embedding space. This allows us to perform word spotting using both query-by-string and query-by-example in a variety of word embedding spaces, both learned and handcrafted, for verbatim as well as semantic word spotting. Our novel approach is versatile, and the evaluation shows that it outperforms the previous state of the art for word spotting on standard datasets.
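As an example of a handcrafted word embedding, the sketch below builds a Pyramidal Histogram of Characters (PHOC), a representation widely used in the word spotting literature. The abstract does not name the handcrafted embeddings used, so PHOC itself, the pyramid levels (1, 2, 3) and the lowercase-plus-digits alphabet are assumptions here. Each level splits the word into equal regions, and a character is assigned to a region if at least half of its normalised extent falls inside that region.

```python
import string
from typing import List, Sequence

ALPHABET = string.ascii_lowercase + string.digits  # assumed character set (36 symbols)


def phoc(word: str, levels: Sequence[int] = (1, 2, 3), alphabet: str = ALPHABET) -> List[int]:
    """Binary PHOC vector: one alphabet-sized histogram per region, concatenated over levels."""
    word = word.lower()
    n = len(word)
    index = {ch: k for k, ch in enumerate(alphabet)}
    vec: List[int] = []
    for level in levels:
        for r in range(level):
            region = [0] * len(alphabet)
            r_lo, r_hi = r / level, (r + 1) / level
            for i, ch in enumerate(word):
                if ch not in index:
                    continue  # characters outside the alphabet are ignored
                c_lo, c_hi = i / n, (i + 1) / n  # normalised extent of character i
                overlap = min(c_hi, r_hi) - max(c_lo, r_lo)
                if overlap >= 0.5 * (c_hi - c_lo):
                    region[index[ch]] = 1
            vec.extend(region)
    return vec


v = phoc("ab")
print(len(v), sum(v))  # 216 6
```

For query-by-string search, the PHOC of the query string could be compared with vectors that a network predicts from word images; because the embedding is computed directly from the string, any query can be encoded, including words never seen during training.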

Series
International Conference on Handwriting Recognition, ISSN 2167-6445
Keywords
handwritten word spotting, convolutional neural networks, deep learning, word embeddings
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-306667 (URN)
10.1109/ICFHR.2016.60 (DOI)
000400052400056
978-1-5090-0981-7 (ISBN)
Conference
15th International Conference on Frontiers in Handwriting Recognition (ICFHR), October 23-26, 2016, Shenzhen, China.
Projects
q2b
Funder
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2016-11-01 Created: 2016-11-01 Last updated: 2019-04-08
7. Historical Manuscript Production Date Estimation using Deep Convolutional Neural Networks
2016 (English). Conference paper, Published paper (Refereed)
Abstract [en]

Deep learning has thus far not been used for dating pre-modern handwritten documents. In this paper, we propose ways of using deep convolutional neural networks (CNNs) to estimate production dates for such manuscripts. In our approach, a CNN can be used either directly for estimating the production date or as a feature learning framework for other regression techniques. We explore the feature learning approach using Gaussian Process regression and Support Vector Regression. The evaluation is performed on a unique, large dataset of over 10,000 medieval charters from the Swedish collection Svenskt Diplomatariums huvudkartotek (SDHK). We show that deep learning is applicable to the task of dating documents and that its performance is, on average, comparable to that of a human expert.

Place, publisher, year, edition, pages
IEEE, 2016
Series
International Conference on Handwriting Recognition, ISSN 2167-6445
Keywords
Document analysis, Manuscripts, Document dating, Digital Humanities
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-306685 (URN)
10.1109/ICFHR.2016.114 (DOI)
000400052400039
978-1-5090-0981-7 (ISBN)
Conference
International Conference on Frontiers in Handwriting Recognition (ICFHR), October 23-26, 2016, Shenzhen, China.
Projects
q2b, q2b_vr2012
Funder
Swedish Research Council, 2012-5743
Riksbankens Jubileumsfond, NHS14-2068:1
Available from: 2016-11-01 Created: 2016-11-01 Last updated: 2019-04-08
8. CalligraphyNet: Augmenting handwriting generation with quill based stroke width
2019 (English). Manuscript (preprint) (Other academic)
Abstract [en]

Realistic handwritten document generation garners a lot of interest from the document research community for its ability to generate annotated data. In this approach, we use GAN-based stroke width enrichment and style-transfer-based refinement of the generated data, which together result in realistic-looking handwritten document images. The GAN part of the data augmentation transfers the stroke variation introduced by a writing instrument onto images rendered from trajectories created by tracking coordinates along the stylus movement. In the data augmentation block, the coordinates from the stylus movement are augmented with the learned stroke width variations. An RNN model is then trained to learn the variation along the movement of the stylus, together with the stroke variations, corresponding to an input sequence of characters. This model is then used to generate images of words or sentences given an input character string. A document image created in this way is used as a mask to transfer the style variations of the ink and the parchment. The generated image can capture the color content of the ink and parchment, which is useful for creating annotated data.

National Category
Computer Systems
Research subject
Computerized Image Processing
Identifiers
urn:nbn:se:uu:diva-379633 (URN)
Conference
26th IEEE International Conference on Image Processing
Note

Currently under review

Available from: 2019-03-19 Created: 2019-03-19 Last updated: 2019-04-08

Author

Wilkinson, Tomas
