uu.seUppsala University Publications
Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
PDNet: Semantic segmentation integrated with a primal-dual network for document binarization
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction. (Quill2Bytes)
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction.
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Division of Visual Information and Interaction. Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology, Computerized Image Analysis and Human-Computer Interaction. (Quill2Bytes)ORCID iD: 0000-0002-4405-6888
2019 (English)In: Pattern Recognition Letters, ISSN 0167-8655, E-ISSN 1872-7344, Vol. 121, p. 52-60Article in journal (Refereed) Published
Abstract [en]

Binarization of digital documents is the task of classifying each pixel in an image of the document as belonging to the background (parchment/paper) or foreground (text/ink). Historical documents are often subjected to degradations, that make the task challenging. In the current work a deep neural network architecture is proposed that combines a fully convolutional network with an unrolled primal-dual network that can be trained end-to-end to achieve state of the art binarization on four out of seven datasets. Document binarization is formulated as an energy minimization problem. A fully convolutional neural network is trained for semantic segmentation of pixels that provides labeling cost associated with each pixel. This cost estimate is refined along the edges to compensate for any over or under estimation of the foreground class using a primal-dual approach. We provide necessary overview on proximal operator that facilitates theoretical underpinning required to train a primal-dual network using a gradient descent algorithm. Numerical instabilities encountered due to the recurrent nature of primal-dual approach are handled. We provide experimental results on document binarization competition dataset along with network changes and hyperparameter tuning required for stability and performance of the network. The network when pre-trained on synthetic dataset performs better as per the competition metrics.

Place, publisher, year, edition, pages
2019. Vol. 121, p. 52-60
National Category
Computer Vision and Robotics (Autonomous Systems)
Research subject
Computerized Image Processing
Identifiers
URN: urn:nbn:se:uu:diva-366933DOI: 10.1016/j.patrec.2018.05.011ISI: 000459876700008OAI: oai:DiVA.org:uu-366933DiVA, id: diva2:1266074
Funder
Swedish Research Council, 2012-5743Riksbankens Jubileumsfond, NHS14-2068:1Available from: 2018-05-16 Created: 2018-11-27 Last updated: 2019-03-21Bibliographically approved
In thesis
1.
The record could not be found. The reason may be that the record is no longer available or you may have typed in a wrong id in the address field.

Open Access in DiVA

The full text will be freely available from 2020-05-17 16:13
Available from 2020-05-17 16:13

Other links

Publisher's full text

Authority records BETA

Ayyalasomayajula, Kalyan RamMalmberg, FilipBrun, Anders

Search in DiVA

By author/editor
Ayyalasomayajula, Kalyan RamMalmberg, FilipBrun, Anders
By organisation
Division of Visual Information and InteractionComputerized Image Analysis and Human-Computer Interaction
In the same journal
Pattern Recognition Letters
Computer Vision and Robotics (Autonomous Systems)

Search outside of DiVA

GoogleGoogle Scholar

doi
urn-nbn

Altmetric score

doi
urn-nbn
Total: 64 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf