uu.se Uppsala University Publications
  • 1.
    Wahlberg, Fredrik
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Interpreting the Script: Image Analysis and Machine Learning for Quantitative Studies of Pre-modern Manuscripts. 2017. Doctoral thesis, comprehensive summary (Other academic)
    Abstract [en]

    The humanities have for a long time been a collection of fields that have not gained from the advancements in computational power predicted by Moore's law. Fields like medicine, biology, physics, chemistry, geology and economics have all developed quantitative tools that take advantage of the exponential increase of processing power over time. Recent advances in computerized pattern recognition, in combination with a rapid digitization of historical document collections around the world, are about to change this.

    The first part of this dissertation focuses on constructing a full system for finding handwritten words in historical manuscripts. A novel segmentation algorithm is presented, capable of finding and separating text lines in pre-modern manuscripts.  Text recognition is performed by translating the image data of the text lines into sequences of numbers, called features. Commonly used features are analysed and evaluated on manuscript sources from the Uppsala University library Carolina Rediviva and the US Library of Congress.  Decoding the text in the vast number of photographed manuscripts from our libraries makes computational linguistics and social network analysis directly applicable to historical sources. Hence, text recognition is considered a key technology for the future of computerized research methods in the humanities.

    The second part of this thesis addresses digital palaeography, using a computer's superior capacity for endlessly performing measurements on ink stroke shapes. Objective criteria of character shapes only partly capture what a palaeographer uses for assessing similarity. The palaeographer often gets a feel for the scribe's style. This is, however, hard to quantify. A method for identifying the scribal hands of a pre-modern copy of the revelations of Saint Bridget of Sweden, using semi-supervised learning, is presented. Methods for production year estimation are presented and evaluated on a collection of close to 11000 medieval charters. The production dates are estimated using a Gaussian process, where the uncertainty is inferred together with the most likely production year.

    In summary, this dissertation presents several novel methods related to image analysis and machine learning. In combination with recent advances in the field, they enable efficient computational analysis of very large collections of historical documents.

    List of papers
    1. Data Mining Medieval Documents by Word Spotting
    2011 (English). In: Proceedings of the 2011 Workshop on Historical Document Imaging and Processing, New York: ACM, 2011, p. 75-82. Conference paper, Published paper (Refereed)
    Abstract [en]

    This paper presents novel results for word spotting based on dynamic time warping applied to medieval manuscripts in Latin and Old Swedish. A target word is marked by a user, and the method automatically finds similar word forms in the document by matching them against the target. The method automatically identifies pages and lines. We show that our method improves accuracy compared to earlier proposals for this kind of handwriting. An advantage of the new method is that it performs matching within a text line without presupposing that the difficult problem of segmenting the text line into individual words has been solved. We evaluate our word spotting implementation on two medieval manuscripts representing two script types. We also show that it can be useful by helping a user find words in a manuscript and present graphs of word statistics as a function of page number.
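
    The abstract above names dynamic time warping (DTW) as the matching technique. As a rough illustration only, and not the authors' implementation, a textbook DTW distance between two sequences of feature vectors can be sketched as follows. The Euclidean local cost, the function name `dtw_distance` and the toy sequences are assumptions made for this example.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook DTW distance between two sequences of feature vectors."""

    def cost(u, v):
        # Assumed local cost: Euclidean distance between two feature vectors.
        return sum((x - y) ** 2 for x, y in zip(u, v)) ** 0.5

    n, m = len(a), len(b)
    # d[i][j] holds the cheapest accumulated cost of aligning a[:i] with b[:j].
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = cost(a[i - 1], b[j - 1]) + min(
                d[i - 1][j],      # a advances, b repeats
                d[i][j - 1],      # b advances, a repeats
                d[i - 1][j - 1],  # both advance
            )
    return d[n][m]


# Toy one-dimensional feature sequences (hypothetical values).
target = [(0.0,), (1.0,), (2.0,), (1.0,)]
stretched = [(0.0,), (1.0,), (1.0,), (2.0,), (2.0,), (1.0,)]
other = [(2.0,), (2.0,), (0.0,), (0.0,)]

same_shape = dtw_distance(target, stretched)
different_shape = dtw_distance(target, other)
```

    Because DTW allows elements to be repeated, the stretched copy of the target aligns at zero cost, while a differently shaped sequence does not. This tolerance to local stretching is what makes the technique attractive for handwriting of varying width.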

    Place, publisher, year, edition, pages
    New York: ACM, 2011
    National Category
    Humanities and the Arts Natural Sciences Language Technology (Computational Linguistics)
    Research subject
    Computational Linguistics; Computerized Image Processing
    Identifiers
    urn:nbn:se:uu:diva-162428 (URN); 10.1145/2037342.2037355 (DOI); 978-1-4503-0916-5 (ISBN)
    Conference
    Workshop on Historical Document Imaging and Processing, 16-17 Sep 2011, Beijing, China
    Available from: 2011-11-30 Created: 2011-11-30 Last updated: 2018-01-12 Bibliographically approved
    2. Graph Based Line Segmentation on Cluttered Handwritten Manuscripts
    2012 (English). In: Proceedings of the 21st International Conference on Pattern Recognition, 2012, IEEE, 2012, p. 1570-1573. Conference paper, Published paper (Refereed)
    Abstract [en]

    We propose a two-phase line segmentation method for handwritten pre-modern densely written manuscripts. The proposed method combines the robustness of projection based methods with the flexibility of graph based methods. The result is a set of cut-outs of the image, each containing one text line. Overlapping characters, help lines and degradation can create foreground elements spanning several lines that are hard to separate. We treat the problem of finding a cut through the text line separation as a graph optimization problem, which allows for flexible separation of entangled components. The proposed method has been tested on two medieval sources with satisfying results. A comparison to similar methods, using standard metrics, is presented.
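
    The abstract does not specify the graph formulation, so the following is only a simplified stand-in for the idea of finding a cut between two text lines as an optimization problem. It assumes a small cost grid (1 for ink, 0 for background) and uses dynamic programming to find the cheapest left-to-right path that may move at most one row per column. The function name `min_cost_cut` and the grid are hypothetical.

```python
def min_cost_cut(cost):
    """Return (total_cost, rows) for the cheapest left-to-right path in a grid."""
    n_rows, n_cols = len(cost), len(cost[0])
    # acc[r][c]: cheapest cost of any path that ends in cell (r, c).
    acc = [[0] * n_cols for _ in range(n_rows)]
    # back[r][c]: row of the predecessor cell in column c - 1.
    back = [[0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        acc[r][0] = cost[r][0]
    for c in range(1, n_cols):
        for r in range(n_rows):
            # A path may stay in the same row or move one row up or down.
            candidates = [p for p in (r - 1, r, r + 1) if 0 <= p < n_rows]
            best = min(candidates, key=lambda p: acc[p][c - 1])
            acc[r][c] = cost[r][c] + acc[best][c - 1]
            back[r][c] = best
    end = min(range(n_rows), key=lambda r: acc[r][n_cols - 1])
    total = acc[end][n_cols - 1]
    # Walk the predecessor links back to the first column.
    rows = [end]
    for c in range(n_cols - 1, 0, -1):
        rows.append(back[rows[-1]][c])
    rows.reverse()
    return total, rows


# Hypothetical gap between two text lines; a descender (row 1, column 2)
# reaches into the gap and forces the cut to bend around it.
grid = [
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
total, rows = min_cost_cut(grid)
```

    In this toy grid the only ink-free cut has to dip from row 1 to row 2 in the middle column, which illustrates how a path-based cut can separate entangled components where a straight horizontal cut cannot.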

    Place, publisher, year, edition, pages
    IEEE, 2012
    National Category
    Computer Vision and Robotics (Autonomous Systems)
    Identifiers
    urn:nbn:se:uu:diva-188588 (URN); 978-1-4673-2216-4 (ISBN)
    Conference
    21st International Conference on Pattern Recognition (ICPR), 2012
    Available from: 2012-12-17 Created: 2012-12-17 Last updated: 2018-01-11 Bibliographically approved
    3. Feature Weight Optimization and Pruning in Historical Text Recognition
    2013 (English). In: Advances of Visual Computing: 9th International Symposium, ISVC 2013, Rethymnon, Crete, Greece, July 29-31, 2013. Proceedings, Part II / [ed] George Bebis, Springer Berlin/Heidelberg, 2013, p. 98-107. Conference paper, Published paper (Refereed)
    Abstract [en]

    In handwritten text recognition, "sliding window" feature extraction represents the visual information contained in written text as feature vector sequences. In this paper, we explore the parameter space of feature weights in search of optimal weights and feature selection using the coordinate descent method. We report a gain of about 5% AUC performance. We use a public dataset for evaluation and also discuss the effects and limitations of "word pruning," a technique in word spotting that is commonly used to boost performance and save computational time.
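
    To make the coordinate descent idea concrete, here is a minimal sketch that adjusts one feature weight at a time and keeps a change only if a score improves. It is not the procedure from the paper: the score function below is a toy stand-in for a retrieval measure such as AUC, and the step-halving rule, function names and starting weights are assumptions.

```python
def coordinate_descent(score, weights, step=0.5, rounds=20):
    """Maximise score(weights) by changing one weight at a time."""
    w = list(weights)
    best = score(w)
    for _ in range(rounds):
        improved = False
        for i in range(len(w)):
            for delta in (step, -step):
                trial = list(w)
                # Weights are kept non-negative; a weight of zero
                # effectively removes that feature (feature selection).
                trial[i] = max(0.0, trial[i] + delta)
                s = score(trial)
                if s > best:
                    w, best, improved = trial, s, True
        if not improved:
            step /= 2  # refine the search once no single move helps
    return w, best


def toy_score(w):
    # Hypothetical objective with its maximum at weights (2, 0, 1).
    return -((w[0] - 2.0) ** 2 + (w[1] - 0.0) ** 2 + (w[2] - 1.0) ** 2)


tuned, tuned_score = coordinate_descent(toy_score, [1.0, 1.0, 1.0])
```

    In this toy run the second weight is driven to zero, which is how a weight search of this kind can double as feature selection.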

    Place, publisher, year, edition, pages
    Springer Berlin/Heidelberg, 2013
    Series
    Lecture Notes in Computer Science, ISSN 0302-9743 ; 8034
    Keywords
    handwritten text recognition
    National Category
    Computer Vision and Robotics (Autonomous Systems)
    Research subject
    Computerized Image Analysis; Computerized Image Processing
    Identifiers
    urn:nbn:se:uu:diva-212536 (URN); 10.1007/978-3-642-41939-3_10 (DOI); 000335169000010 (); 978-3-642-41939-3 (ISBN); 978-3-642-41938-6 (ISBN)
    Conference
    9th International Symposium, ISVC 2013, July 29-31, 2013, Rethymnon, Crete, Greece
    Projects
    From Quill to Bytes; q2b; q2b_vr2012
    Funder
    Swedish Research Council, 2012-5743
    Available from: 2013-12-11 Created: 2013-12-11 Last updated: 2018-01-11 Bibliographically approved
    4. Feature space denoising improves word spotting
    2013 (English). In: Proc. 2nd International Workshop on Historical Document Imaging and Processing, New York: ACM Press, 2013, p. 59-66. Conference paper, Published paper (Refereed)
    Abstract [en]

    Some of the sliding window features commonly used in off-line handwritten text recognition are inherently noisy or sensitive to image noise. In this paper, we investigate the effects of several de-noising filters applied in the feature space and not in the image domain. The purpose is to target the intrinsic noise of these features, stemming from the complex shapes of handwritten characters. This noise is present even if the image has been captured without any kind of artefacts or noise. An evaluation, using a public database, is presented showing that the recognition of word-spotting can be improved considerably by using de-noising filters in the feature space.
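
    The abstract does not say which de-noising filters were evaluated. As one plausible example of filtering in the feature space rather than in the image, the sketch below runs a median filter over each feature dimension along the sliding-window axis. The choice of a median filter, the window radius and the toy feature sequence are all assumptions.

```python
from statistics import median


def median_filter_features(seq, radius=1):
    """Median-filter every feature dimension along the window position axis."""
    n = len(seq)
    dims = len(seq[0])
    out = []
    for t in range(n):
        # The window is clipped at both ends of the sequence.
        lo, hi = max(0, t - radius), min(n, t + radius + 1)
        out.append(
            tuple(median(seq[k][d] for k in range(lo, hi)) for d in range(dims))
        )
    return out


# Hypothetical two-dimensional feature sequence with one outlier at t = 2.
features = [(1.0, 5.0), (1.0, 5.0), (9.0, 5.0), (1.0, 6.0), (1.0, 6.0)]
filtered = median_filter_features(features)
```

    The isolated spike in the first dimension is suppressed, while the step in the second dimension is preserved. That edge-preserving behaviour is the usual reason for choosing a median over a mean filter.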

    Place, publisher, year, edition, pages
    New York: ACM Press, 2013
    Keywords
    OCR, handwritten text recognition, filtering
    National Category
    Computer Vision and Robotics (Autonomous Systems)
    Research subject
    Computerized Image Processing
    Identifiers
    urn:nbn:se:uu:diva-206930 (URN); 10.1145/2501115.2501118 (DOI); 978-1-4503-2115-0 (ISBN)
    Conference
    2nd International Workshop on Historical Document Imaging and Processing
    Projects
    q2b; q2b_vr2012
    Funder
    Swedish Research Council, 2012-5743
    Available from: 2013-09-06 Created: 2013-09-06 Last updated: 2018-01-11 Bibliographically approved
    5. Spotting words in medieval manuscripts
    2014 (English). In: Studia Neophilologica, ISSN 0039-3274, E-ISSN 1651-2308, Vol. 86, p. 171-186. Article in journal (Refereed) Published
    Abstract [en]

    This article discusses the technology of handwritten text recognition (HTR) as a tool for the analysis of historical handwritten documents. We give a broad overview of this field of research, but the focus is on the use of a method called 'word spotting' for finding words directly and automatically in scanned images of manuscript pages. We illustrate and evaluate this method by applying it to a medieval manuscript. Word spotting uses digital image analysis to represent stretches of writing as sequences of numerical features. These are intended to capture the linguistically significant aspects of the visual shape of the writing. Two potential words can then be compared mathematically and their degree of similarity assigned a value. Our version of this method gives a false positive rate of about 30% when the true positive rate is close to 100%, for an application where we search for very frequent short words in a 16th-century Old Swedish cursiva recentior manuscript. Word spotting would be of use, e.g., to researchers who want to explore the content of manuscripts when editions or other transcriptions are unavailable.
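
    The true and false positive rates quoted above depend on where the similarity threshold is placed. The following sketch shows how the two rates are computed for one threshold. The distances and labels are invented for illustration and are not taken from the article; `rates` is a hypothetical helper name.

```python
def rates(distances, is_match, threshold):
    """True and false positive rates when accepting distance <= threshold."""
    tp = sum(1 for d, m in zip(distances, is_match) if m and d <= threshold)
    fp = sum(1 for d, m in zip(distances, is_match) if not m and d <= threshold)
    positives = sum(is_match)
    negatives = len(is_match) - positives
    return tp / positives, fp / negatives


# Invented distances between a query word and eight candidate words,
# with ground truth saying whether each candidate really is the query word.
distances = [0.1, 0.2, 0.3, 0.35, 0.5, 0.6, 0.7, 0.9]
is_match = [True, True, False, True, False, False, False, False]

tpr, fpr = rates(distances, is_match, threshold=0.4)
```

    With this toy data, a threshold loose enough to recover every true occurrence also lets through one of the five non-matching candidates. This is the same trade-off the article reports at a larger scale.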

    National Category
    Computer and Information Sciences General Language Studies and Linguistics Language Technology (Computational Linguistics)
    Research subject
    Computational Linguistics
    Identifiers
    urn:nbn:se:uu:diva-227725 (URN); 10.1080/00393274.2013.871975 (DOI); 000335850200012 ()
    Available from: 2014-01-20 Created: 2014-06-30 Last updated: 2018-01-11 Bibliographically approved
    6. Scribal Attribution using a Novel 3-D Quill-Curvature Feature Histogram
    2014 (English). In: Proceedings International Conference on Frontiers in Handwriting Recognition (ICFHR), 2014, 2014. Conference paper, Published paper (Refereed)
    Abstract [en]

    In this paper, we propose a novel pipeline for automated scribal attribution based on the Quill feature: 1) We compensate the Quill feature histogram for pen changes and page warping. 2) We add curvature as a third dimension in the feature histogram, to better separate characteristics like loops and lines. 3) We also investigate the use of several dissimilarity measures between the feature histograms. 4) We propose and evaluate semi-supervised learning for classification, to reduce the need for labeled samples. Our evaluation is performed on 1104 pages from a 15th century Swedish manuscript. It was chosen because it represents a significant part of Swedish manuscripts of said period. Our results show that only a few percent of the material needs labelling for average precisions above 95%. Our novel curvature and registration extensions, together with semi-supervised learning, outperformed the current Quill feature.

    Keywords
    writer identification; semi-supervised learning; classification; historical manuscripts
    National Category
    Computer Sciences
    Research subject
    Computer Science; Computerized Image Processing
    Identifiers
    urn:nbn:se:uu:diva-238270 (URN)
    Conference
    The International Conference on Frontiers in Handwriting Recognition (ICFHR), September 1-4, 2014, Crete, Greece
    Projects
    q2b; q2b_vr2012
    Funder
    Swedish Research Council, 2012-5743
    Available from: 2014-12-11 Created: 2014-12-11 Last updated: 2018-05-03 Bibliographically approved
    7. Large scale style based dating of medieval manuscripts
    2015 (English). In: Proc. 3rd International Workshop on Historical Document Imaging and Processing, New York: ACM Press, 2015, p. 107-114. Conference paper, Published paper (Refereed)
    Place, publisher, year, edition, pages
    New York: ACM Press, 2015
    National Category
    Computer Vision and Robotics (Autonomous Systems)
    Research subject
    Computerized Image Processing
    Identifiers
    urn:nbn:se:uu:diva-261747 (URN); 10.1145/2809544.2809560 (DOI); 978-1-4503-3602-4 (ISBN)
    Conference
    HIP 2015, August 22, Nancy, France
    Available from: 2015-08-22 Created: 2015-09-03 Last updated: 2018-06-19 Bibliographically approved
    8. Large scale continuous dating of medieval scribes using a combined image and language model
    2016 (English). Conference paper, Published paper (Refereed)
    Abstract [en]

    Finding the production date of a pre-modern manuscript is commonly a long process in historical research, requiring days of work from highly specialised experts. In this paper, we present an automatic dating method based on modelling both the language and the image data. By creating a statistical model over the changes in the pen strokes and short character sequences in the transcribed text, a combination of multiple estimators gives a distribution over the time line for each manuscript. We have evaluated our estimation scheme on the medieval charter collection "Svenskt Diplomatariums huvudkartotek" (SDHK), including more than 5300 transcribed charters from the period 1135-1509. Our system is capable of achieving a median absolute error of 12 years, where the only human input is a transcription of the charter text. Since reading and transcribing the text is a skill that many researchers and students have, compared to the more specialized skill of dating medieval manuscripts based on palaeographical expertise, we find our novel approach suitable for helping individual researchers to date collections of manuscript pages. For larger collections, transcriptions could also be collected using crowdsourcing.
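
    For readers unfamiliar with the reported metric, the median absolute error is the median of the absolute differences between true and estimated years. A minimal sketch follows. The years are invented for illustration and are not SDHK data.

```python
from statistics import median


def median_absolute_error(true_years, estimated_years):
    """Median of |true - estimated| over all documents."""
    return median(abs(t - e) for t, e in zip(true_years, estimated_years))


# Invented example years, not taken from the charter collection.
true_years = [1350, 1402, 1480, 1299, 1455]
estimated_years = [1362, 1390, 1500, 1295, 1440]

mae = median_absolute_error(true_years, estimated_years)
```

    Unlike a mean, the median is not dominated by a few badly misdated documents, which is presumably why it is a natural summary for a dating task of this kind.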

    National Category
    Computer Vision and Robotics (Autonomous Systems)
    Research subject
    Computerized Image Processing
    Identifiers
    urn:nbn:se:uu:diva-294882 (URN); 10.1109/DAS.2016.71 (DOI); 000390411200009 ()
    Conference
    12th IAPR International Workshop on Document Analysis Systems (DAS), APR 11-14, 2016, Greece
    Projects
    q2b; q2b_vr2012
    Funder
    Swedish Research Council, 2012-5743
    Available from: 2016-05-30 Created: 2016-05-30 Last updated: 2018-05-04 Bibliographically approved
    9. Historical Manuscript Production Date Estimation using Deep Convolutional Neural Networks
    2016 (English). Conference paper, Published paper (Refereed)
    Abstract [en]

    Deep learning has thus far not been used for dating of pre-modern handwritten documents. In this paper, we propose ways of using deep convolutional neural networks (CNNs) to estimate production dates for such manuscripts. In our approach, a CNN can either be used directly for estimating the production date or as a feature learning framework for other regression techniques. We explore the feature learning approach using Gaussian Processes regression and Support Vector Regression. The evaluation is performed on a unique large dataset of over 10000 medieval charters from the Swedish collection Svenskt Diplomatariums huvudkartotek (SDHK). We show that deep learning is applicable to the task of dating documents and that the performance is on average comparable to that of a human expert.

    Place, publisher, year, edition, pages
    IEEE, 2016
    Series
    International Conference on Handwriting Recognition, ISSN 2167-6445
    Keywords
    Document analysis, Manuscripts, Document dating, Digital Humanities
    National Category
    Computer Vision and Robotics (Autonomous Systems)
    Research subject
    Computerized Image Processing
    Identifiers
    urn:nbn:se:uu:diva-306685 (URN); 10.1109/ICFHR.2016.114 (DOI); 000400052400039 (); 978-1-5090-0981-7 (ISBN)
    Conference
    International Conference on Frontiers in Handwriting Recognition (ICFHR), October 23-26, 2016, Shenzhen, China.
    Projects
    q2b; q2b_vr2012
    Funder
    Swedish Research Council, 2012-5743; Riksbankens Jubileumsfond, NHS14-2068:1
    Available from: 2016-11-01 Created: 2016-11-01 Last updated: 2018-04-04
  • 2.
    Wahlberg, Fredrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Brun, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Feature space de-noising for text recognition. 2014. In: Proceedings of SSBA, 2014, 2014. Conference paper (Other academic)
  • 3.
    Wahlberg, Fredrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Brun, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Feature space denoising improves word spotting. 2013. In: Proc. 2nd International Workshop on Historical Document Imaging and Processing, New York: ACM Press, 2013, p. 59-66. Conference paper (Refereed)
  • 4.
    Wahlberg, Fredrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Brun, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Feature Weight Optimization and Pruning in Historical Text Recognition. 2013. In: Advances of Visual Computing: 9th International Symposium, ISVC 2013, Rethymnon, Crete, Greece, July 29-31, 2013. Proceedings, Part II / [ed] George Bebis, Springer Berlin/Heidelberg, 2013, p. 98-107. Conference paper (Refereed)
  • 5.
    Wahlberg, Fredrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Brun, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Graph Based Line Segmentation on Cluttered Handwritten Manuscripts. 2012. In: Proceedings of the 21st International Conference on Pattern Recognition, 2012, IEEE, 2012, p. 1570-1573. Conference paper (Refereed)
  • 6.
    Wahlberg, Fredrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Brun, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Mårtensson, Lasse
    Writer identification using the Quill-Curvature feature in old manuscripts. 2015. In: Proceedings of SSBA, 2015, 2015. Conference paper (Other academic)
  • 7.
    Wahlberg, Fredrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Centre for Image Analysis. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Mårtensson, Lasse
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Scandinavian Languages.
    Brun, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Centre for Image Analysis. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Data Mining Medieval Documents by Word Spotting. 2011. In: Proceedings of the 2011 Workshop on Historical Document Imaging and Processing, New York: ACM, 2011, p. 75-82. Conference paper (Refereed)
  • 8.
    Wahlberg, Fredrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Mårtensson, Lasse
    Brun, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Spotting words in medieval manuscripts. 2014. In: Studia Neophilologica, ISSN 0039-3274, E-ISSN 1651-2308, Vol. 86, p. 171-186. Article in journal (Refereed)
  • 9.
    Wahlberg, Fredrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Dahllöf, Mats
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
    Mårtensson, Lasse
    Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Scandinavian Languages.
    Brun, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Word Spotting in Pre-Modern Manuscripts using Dynamic Time Warping. 2012. In: Proceedings of SSBA, 2012, 2012. Conference paper (Other academic)
  • 10.
    Wahlberg, Fredrik
    et al.
    Medvedev, Alexander
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Automatic control.
    Rosén, Olov
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Systems and Control. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Automatic control.
    A LEGO-Based Mobile Robotic Platform for Evaluation of Parallel Control and Estimation Algorithms. 2011. In: Proc. 50th Conference on Decision and Control, Piscataway, NJ: IEEE, 2011, p. 4548-4553. Conference paper (Refereed)
    Abstract [en]

    An inexpensive robotic system intended for educational use in parallel algorithms for embedded control and signal processing is described. The hardware platform comprises a state-of-the-art multi-core system in a wireless network with several mobile LEGO robots that collect data from their environment. The setup covers a broad range of real-time cooperative and parallel problems arising in sensor networks, robotics, surveillance and high-performance embedded applications. As an illustration, a bearings-only tracking problem, estimating both the mobile robots' positions and the position of a non-cooperating target by using parallel particle filtering, is solved on the proposed platform. In order to improve the estimation accuracy and to adjust to changes in the environment and movements of the target, a controller positioning the mobile robots is utilized.

  • 11.
    Wahlberg, Fredrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction.
    Mårtensson, Lasse
    Univ Gavle, Dept Business Studies, Gavle, Sweden.
    Brun, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Large scale continuous dating of medieval scribes using a combined image and language model. 2016. Conference paper (Refereed)
    Abstract [en]

    Finding the production date of a pre-modern manuscript is commonly a long process in historical research, requiring days of work from highly specialised experts. In this paper, we present an automatic dating method based on modelling both the language and the image data. By creating a statistical model over the changes in the pen strokes and short character sequences in the transcribed text, a combination of multiple estimators gives a distribution over the timeline for each manuscript. We have evaluated our estimation scheme on the medieval charter collection "Svenskt Diplomatariums huvudkartotek" (SDHK), including more than 5300 transcribed charters from the period 1135-1509. Our system is capable of achieving a median absolute error of 12 years, where the only human input is a transcription of the charter text. Since reading and transcribing the text is a skill that many researchers and students have, compared to the more specialized skill of dating medieval manuscripts based on palaeographical expertise, we find our novel approach suitable for helping individual researchers to date collections of manuscript pages. For larger collections, transcriptions could also be collected using crowd sourcing.

  • 12.
    Wahlberg, Fredrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Mårtensson, Lasse
    Högskolan i Gävle.
    Brun, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Large scale style based dating of medieval manuscripts. 2015. In: Proc. 3rd International Workshop on Historical Document Imaging and Processing, New York: ACM Press, 2015, p. 107-114. Conference paper (Refereed)
  • 13.
    Wahlberg, Fredrik
    et al.
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Mårtensson, Lasse
    Brun, Anders
    Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
    Scribal Attribution using a Novel 3-D Quill-Curvature Feature Histogram. 2014. In: Proceedings International Conference on Frontiers in Handwriting Recognition (ICFHR), 2014, 2014. Conference paper (Refereed)
    Abstract [en]

    In this paper, we propose a novel pipeline for automated scribal attribution based on the Quill feature: 1) We compensate the Quill feature histogram for pen changes and page warping. 2) We add curvature as a third dimension in the feature histogram, to better separate characteristics like loops and lines. 3) We also investigate the use of several dissimilarity measures between the feature histograms. 4) We propose and evaluate semi-supervised learning for classification, to reduce the need for labeled samples. Our evaluation is performed on 1104 pages from a 15th century Swedish manuscript. It was chosen because it represents a significant part of Swedish manuscripts of said period. Our results show that only a few percent of the material needs labelling for average precisions above 95%. Our novel curvature and registration extensions, together with semi-supervised learning, outperformed the current Quill feature.
