The GeoMemories project aims at publishing on the Web and digitally preserving historical aerial photographs that are currently stored in physical form within the archives of the Aerofototeca Nazionale in Rome. We describe a system, available at http://www.geomemories.org, that lets users visualize the evolution of the Italian landscape throughout the last century. The Web portal allows comparison of recent satellite imagery with several layers of historical maps, obtained from the aerial photos through a complex workflow that merges them together. We present several case studies carried out in collaboration with geologists, historians and archaeologists, that illustrate the great potential of our system in different research fields. Experiments and advances in image processing technologies are envisaged as a key factor in solving the inherent issue of vast amounts of manual work, from georeferencing to mosaicking to analysis.
Comparing the staining patterns of paired antibodies raised against the same protein but targeting different epitopes provides quality control over the binding and over the antibodies' ability to identify the target protein correctly and exclusively. We present a method for automated quantification of immunostaining patterns for antibodies in breast tissue using the Human Protein Atlas database. In such tissue, the dark brown dye 3,3'-diaminobenzidine is used as an antibody-specific stain, whereas the blue dye hematoxylin is used as a counterstain. The proposed method is based on clustering and relative scaling of features following principal component analysis. Our method is able (1) to accurately segment and identify staining patterns and quantify the amount of staining and (2) to detect paired antibodies by correlating the segmentation results among different cases. Moreover, the method is simple, operating in a low-dimensional feature space, and computationally efficient, which makes it suitable for high-throughput processing of tissue microarrays.
Due to the complexity of biological tissue and variations in staining procedures, features that are based on the explicit extraction of properties from subglandular structures in tissue images may have difficulty generalizing well over an unrestricted set of images and staining variations. We circumvent this problem by an implicit representation that is both robust and highly descriptive, especially when combined with a multiple instance learning approach to image classification. The new feature method is able to describe tissue architecture based on glandular structure. It is based on statistically representing the relative distribution of tissue components around lumen regions, while preserving spatial and quantitative information, as a basis for diagnosing and analyzing different areas within an image. We demonstrate the efficacy of the method in extracting discriminative features for obtaining high classification rates for tubular formation in both healthy and cancerous tissue, which is an important component in Gleason and tubule-based Elston grading. The proposed method may be used for glandular classification, also in other tissue types, in addition to general applicability as a region-based feature descriptor in image analysis where the image represents a bag with a certain label (or grade) and the region-based feature vectors represent instances.
Circles are one of the basic drawing primitives for computers and while the naive way of setting up an equation for drawing circles is simple, implementing it in an efficient way using integer arithmetic has resulted in quite a few different algorithms. We present a short chronological overview of the most important publications of such digital circle generation algorithms. Bresenham is often assumed to have invented the first all integer circle algorithm. However, there were other algorithms published before his first official publication, which did not use floating point operations. Furthermore, we present both a 4- and an 8-connected all integer algorithm. Both of them proceed without any multiplication, using just one addition per iteration to compute the decision variable, which makes them more efficient than previously published algorithms.
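To make the decision-variable idea concrete, here is a minimal sketch of the classic 8-connected midpoint circle algorithm, using integer arithmetic only; it illustrates the general principle discussed above, not the paper's one-addition-per-iteration algorithms:

```python
def circle_points(xc, yc, r):
    """Classic 8-connected midpoint circle algorithm using only integers.

    One octant is computed; the other seven follow by symmetry."""
    points = set()
    x, y = 0, r
    d = 1 - r  # decision variable: is the midpoint inside or outside the circle?
    while x <= y:
        for px, py in ((x, y), (y, x), (-x, y), (-y, x),
                       (x, -y), (y, -x), (-x, -y), (-y, -x)):
            points.add((xc + px, yc + py))
        if d < 0:          # midpoint inside: step east
            d += 2 * x + 3
        else:              # midpoint outside: step south-east
            d += 2 * (x - y) + 5
            y -= 1
        x += 1
    return points
```

Note that even this classic formulation updates the decision variable with a multiplication by two (a shift) plus additions per step; the algorithms presented in the paper reduce the per-iteration cost further.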
The ability to learn from a single instance is unique to the human species, and one-shot learning algorithms try to mimic this special capability. On the other hand, despite the impressive performance of deep learning-based methods on various image classification problems, that performance often depends on having a huge number of annotated training samples per class. This fact is certainly a hindrance to deploying deep neural network-based systems in many real-life applications, such as face recognition. Furthermore, adding a new class to the system requires retraining the whole system from scratch. Nevertheless, the prowess of deep-learned features cannot be ignored. This research aims to combine the best of deep-learned features with a traditional one-shot learning framework. Results obtained on two publicly available datasets are very encouraging, achieving over 90% accuracy on 5-way one-shot tasks, and 84% on 50-way one-shot problems.
Logos and seals serve the purpose of authenticating a document and identifying its source. This strategy was also prevalent in the medieval period. Different algorithms exist for the detection of logos and seals in document images. A close look at the present state-of-the-art methods reveals that they are focused on detecting logos and seals in contemporary document images. However, such methods are likely to underperform when dealing with historical documents. This is because historical documents present additional challenges, such as extra noise, bleed-through effects, blurred foreground elements and low contrast. The proposed method frames logo and seal detection as an object detection problem. Using a deep learning technique, it counters the aforementioned problems and avoids the need for any pre-processing stage, such as layout analysis and/or binarization, in the system pipeline. The experiments were conducted on historical images from the 12th to the 16th century, and the results obtained were very encouraging for detecting logos in historical document images. To the best of our knowledge, this is the first attempt at logo detection in historical document images using an object detection-based approach.
One of the major difficulties in face recognition while comparing photographs of individuals of different ages is the influence of age progression on their facial features. As a person ages, the face undergoes many changes, such as geometrical changes, changes in facial hair, and the presence of glasses, among others. Although biometric markers like computed face feature vectors should ideally remain unchanged by such factors, face recognition becomes less reliable as the age range increases. Therefore, this investigation was carried out to examine how the use of Embedded Prototype Subspace Classifiers could improve face recognition accuracy when dealing with age-related variations using face feature vectors only.
Efficient face image retrieval, i.e. searching for existing photographs of a person in unlabelled photo collections using a query photo, is evaluated for a novel Consensus Ranking method that exploits the top $n$ results. The approach aims to maximise precision and recall over the retrieved photos, all ranked by similarity: ideally, all photos of the queried person are retrieved while images of other individuals are excluded. To achieve this, the method uses the top $n$ results as temporary queries, recalculates similarities, and combines the obtained ranked lists to produce a better overall ranking. The method includes a novel and reliable procedure for selecting $n$, which is evaluated on two datasets, and considers the impact of age variation in the datasets.
Word spotting uses a query word image to find all instances of that word among document images. The obtained list of words is ranked according to similarity to the query word. Ideally, any false positives should only occur at the end of that list. However, in reality they often occur higher up, which decreases the so-called mean average precision. It is shown how creating new ranked lists by re-scoring using the top $n$ occurrences in the original list, and then fusing the scores, can increase the mean average precision.
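The re-scoring idea can be sketched as follows; the function name, the generic similarity callback, and the simple score-averaging fusion are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def rerank(query, candidates, similarity, n=3):
    """Re-rank a retrieval list: use the top-n hits as temporary queries,
    re-score all candidates, and fuse the ranked lists by averaging scores.

    `similarity(a, b)` returns a scalar where higher means more similar."""
    base = np.array([similarity(query, c) for c in candidates])
    order = np.argsort(-base)          # best-first ranking from the original query
    fused = base.copy()
    for idx in order[:n]:              # each top hit becomes a temporary query
        temp = candidates[idx]
        fused += np.array([similarity(temp, c) for c in candidates])
    fused /= n + 1                     # average the n+1 score lists
    return [candidates[i] for i in np.argsort(-fused)]
```

The intuition is that a false positive may happen to resemble the query, but is unlikely to also resemble several independent true hits, so fusion pushes it down the list.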
The purpose of this paper is to describe and analyse in detail a Fourier-based handcrafted descriptor for word recognition. In particular, it is discussed how the variability in the results can be analysed and visualised. The efficiency of the descriptor is evaluated for use with embedded prototype subspace classifiers for handwritten word recognition; nonetheless, it can be used with any classifier for any purpose. A hierarchical composition of discrete semicircles in Fourier space is proposed, and it is shown how this compares to Gabor filters, which can be used to extract edges in an image. In comparison to Histogram of Oriented Gradients, the proposed feature descriptor performs better in this scenario. Compression using PCA turns out to both increase the F1-score and decrease the variability.
Any feature matching algorithm needs to be robust, producing few false positives, but also needs to be invariant to changes in rotation, illumination and scale. Several improvements are proposed to a previously published phase correlation based algorithm, which operates on local disc areas, using the Log Polar Transform to sample the disc neighborhood and the FFT to obtain the phase. It will be shown that the matching can be done directly in the frequency domain, using the Chi-squared distance, instead of computing the cross power spectrum. Moreover, it will be shown how combining these methods yields an algorithm that sorts out a majority of the false positives. The need for a peak-to-side-lobe ratio computation in order to cope with sub-pixel accuracy is discussed, as well as how the FFT of the periodic component can enhance the matching. The result is a robust local feature matcher that is able to cope with rotational, illumination and scale differences to a certain degree.
In recent academic literature, Sex and Gender have become synonyms, even though distinct definitions do exist. This gives rise to the question: which of the two are face image classifiers actually identifying? It will be argued and explained why CNN-based classifiers will generally identify gender, while feeding face recognition feature vectors into a neural network will tend to verify sex rather than gender. It is shown for the first time how state-of-the-art sex classification can be performed using Embedded Prototype Subspace Classifiers (EPSC), and also how the projection depth can be learned efficiently. The automatic gender classification produced by the InsightFace project is used as a baseline and compared to the results given by the EPSC, which takes the feature vectors produced by InsightFace as input. It turns out that the depth of projection needed is much larger for these face feature vectors than for, for example, classifying MNIST or similar data. Therefore, one important contribution is a simple method to determine the optimal depth for any kind of data. Furthermore, it is shown how the weights in the final layer can be set in order to make the choice of depth stable and independent of the kind of learning data. The resulting EPSC is extremely lightweight and yet very accurate, reaching over 98% accuracy for several datasets.
Spline filters are usually implemented in two steps: in the first step the basis coefficients are computed by deconvolving the sampled function with a factorized filter, and in the second step the sampled function is reconstructed. It will be shown how separable spline filters using different splines can be constructed with fixed kernels, requiring no inverse filtering. In particular, it is discussed how first and second order derivatives can be computed correctly using cubic or trigonometric splines by a double filtering approach, giving filters of length 7.
Spherical linear interpolation has a number of important applications in computer graphics. We show how spherical interpolation can be performed efficiently even in the case when the angle varies quadratically over the interval. The computation is fast since the implementation does not need to evaluate any trigonometric functions in the inner loop. Furthermore, no renormalization is necessary, and therefore it is a true spherical interpolation. This type of interpolation, with non-equal angle steps, should be useful for animation with accelerating or decelerating movements, or perhaps even in other types of applications.
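As a hedged illustration of how trigonometric calls can be kept out of the inner loop, here is a sketch for the simpler uniform-step case, using a Chebyshev-style recurrence for sin(ka) and cos(ka); the paper's quadratic-angle variant is not reproduced here:

```python
import math

def slerp_sequence(p, q, steps):
    """Spherical interpolation between unit vectors p and q in `steps`
    uniform angle increments. All trig functions are evaluated in the
    setup; the inner loop only multiplies and adds, via the recurrence
    f((k+1)*a) = 2*cos(a)*f(k*a) - f((k-1)*a) for f = sin and f = cos."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(p, q))))
    theta = math.acos(dot)
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    a = theta / steps
    two_cos_a = 2.0 * math.cos(a)
    s_prev, c_prev = 0.0, 1.0                 # sin(0), cos(0)
    s_cur, c_cur = math.sin(a), math.cos(a)   # sin(a), cos(a)
    out = [tuple(p)]
    for k in range(1, steps):
        w_q = s_cur / sin_t                   # sin(k*a) / sin(theta)
        w_p = c_cur - s_cur * cos_t / sin_t   # sin(theta - k*a) / sin(theta)
        out.append(tuple(w_p * pc + w_q * qc for pc, qc in zip(p, q)))
        s_prev, s_cur = s_cur, two_cos_a * s_cur - s_prev
        c_prev, c_cur = c_cur, two_cos_a * c_cur - c_prev
    out.append(tuple(q))
    return out
```

Because these are the exact slerp weights, every output already lies on the unit sphere and no renormalization step is needed, matching the property claimed above.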
This paper presents a framework for semi-automatic transcription of large-scale historical handwritten documents and proposes a simple user-friendly text extractor tool, TexT, for transcription. The proposed approach provides quick and easy transcription of text using a computer-assisted interactive technique. The algorithm finds multiple occurrences of the marked text on-the-fly using a word spotting system. TexT is also capable of on-the-fly annotation of handwritten text with automatic generation of ground truth labels, and of dynamic adjustment and correction of user-generated bounding box annotations so that each word is perfectly encapsulated. The user can view the document and the found words in their original form or with background noise removed for easier visualization of the transcription results. The effectiveness of TexT is demonstrated on an archival manuscript collection from a well-known publicly available dataset.
The automatic recognition of historical handwritten documents is still considered a challenging task. For this reason, word spotting emerges as a good alternative for making the information contained in these documents available to the user. Word spotting is defined as the task of retrieving all instances of a query word in a document collection, making it a useful tool for information retrieval. In this paper we propose a segmentation-free word spotting approach able to deal with large document collections. Our method is inspired by feature matching algorithms that have been applied to image matching and retrieval. Since handwritten words have different shapes, there is no exact transformation to be obtained. However, a sufficient degree of relaxation is achieved by using a Fourier-based descriptor and an alternative approach to RANSAC called PUMA. The proposed approach is evaluated on historical marriage records, achieving promising results.
Swedish eScience Education (SeSE) is a national graduate school in eScience in Sweden. It grew out of the collaboration between two major research initiatives in eScience, and the school has turned out to be very successful. It has made it possible for students at different universities to get access to education that is not normally available at their home universities. With SeSE, they get access to education from top experts within their respective fields. We argue why such a graduate school is important and how it differs from the training offered by many HPC centres in Europe. Furthermore, examples of courses and their structure are discussed, as well as lessons learned from SeSE and its two predecessors in Sweden.
Descriptors such as SURF and SIFT contain a framework for handling rotation and scale invariance, which is generally not needed when registration and stitching of images in microscopy is the focus. Instead, speed and efficiency are more important factors. We propose a descriptor, based on the idea of radial line integration, that performs very well by these criteria. The result is a descriptor that outperforms both SURF and SIFT when it comes to speed and the number of inliers, even for rather short descriptors.
Deep learning approaches suffer from the so-called interpretability problem and can therefore be very hard to visualise. Embedded Prototype Subspace Classifiers are one attempt in the field of explainable AI that is both fast and efficient, since it requires no repeated learning epochs and has no hidden layers. In this paper we investigate how ensembles, and cascades of ensembles, perform on some popular datasets. The focus is on handwritten data such as digits, letters and signs. It is shown how cascading can be efficiently implemented in order to both increase accuracy and speed up the classification.
Handwritten text recognition is a daunting task due to the complex characteristics of handwritten letters. Deep learning based methods have achieved significant advances in recognizing challenging handwritten texts because of their ability to learn and accurately classify intricate patterns. However, deep learning has some limitations, such as the lack of a well-defined mathematical model and a black-box learning mechanism, which pose challenges. This paper aims at going beyond black-box learning and proposes a novel learning framework called Embedded Prototype Subspace Classification, based on the well-known subspace method, to recognise handwritten letters in a fast and efficient manner. The effectiveness of the proposed framework is empirically evaluated on popular datasets using standard evaluation measures.
One disadvantage of RANSAC is that it is based on randomness and will therefore often yield a different set of inliers in each run, especially if the dataset contains a large number of outliers. A repeatable algorithm for finding both the matches and the homography is proposed, which in our case is used for image stitching, with the obtained points also used for georeferencing. This algorithm will yield the same set of matches every time and is therefore a useful tool when evaluating other algorithms involved and their parameters. Moreover, a refining step is proposed that finds the best matches depending on which geometric transformation is used, and which can also be utilized as a refining step for RANSAC.
Rotation invariance is an important property for any feature matching method, and it has been implemented in different ways for different methods. The Log Polar Transform has primarily been used for image registration, where it is applied after phase correlation, which in turn is applied to whole images or, in the case of template matching, to major parts of them, followed by an exhaustive search. We investigate how this transform can be used on local neighborhoods of features, and how phase correlation as well as normalized cross correlation can be applied to the result. Thus, the order is reversed, and we argue why it is important to do so. We demonstrate a common problem with the log polar transform and show that many implementations of it are not suitable for local feature detectors. We propose an implementation based on Gaussian filtering. We also show that phase correlation will generally perform better than normalized cross correlation. Both handle illumination differences well, but changes in scale are handled better by the phase correlation approach.
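A rough sketch of this reversed order of operations (log-polar sampling of a local neighborhood, then phase correlation) is given below; bilinear sampling is used here as a simplification of the Gaussian-filtered sampling proposed in the paper, and all parameter values are illustrative assumptions:

```python
import numpy as np

def log_polar_patch(img, cx, cy, radii=16, angles=32, r_max=20.0):
    """Sample a log-polar grid around a keypoint with bilinear interpolation.

    Rotation of the neighborhood becomes a cyclic shift along the angle
    axis, and scaling becomes a shift along the (log) radius axis."""
    r = np.exp(np.arange(radii) / (radii - 1) * np.log(r_max))
    t = np.arange(angles) * 2.0 * np.pi / angles
    out = np.zeros((radii, angles))
    for i, ri in enumerate(r):
        for j, tj in enumerate(t):
            x, y = cx + ri * np.cos(tj), cy + ri * np.sin(tj)
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            fx, fy = x - x0, y - y0
            out[i, j] = ((1 - fx) * (1 - fy) * img[y0, x0]
                         + fx * (1 - fy) * img[y0, x0 + 1]
                         + (1 - fx) * fy * img[y0 + 1, x0]
                         + fx * fy * img[y0 + 1, x0 + 1])
    return out

def phase_correlate(a, b):
    """Phase correlation of two equal-size patches; the correlation peak
    gives the shift, i.e. rotation/scale in log-polar coordinates."""
    spec = np.fft.fft2(a) * np.conj(np.fft.fft2(b))
    spec /= np.abs(spec) + 1e-12
    corr = np.real(np.fft.ifft2(spec))
    return np.unravel_index(np.argmax(corr), corr.shape)
```

Because the two patches are matched as a whole, the sharpness of the correlation peak can also serve as a match-quality score for rejecting false positives.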
We demonstrate with several examples how historical aerial photos can benefit from being viewed in stereo, and how this can be useful as a tool in digital heritage research. The main reason why stereo images are important is that they give a much better understanding of what is actually in the scene than single photos can. The key factor is the depth cue, which helps in understanding the content and adds the ability to distinguish between objects such as houses, trees and the ground, as well as to estimate the heights of objects. There are, however, still challenges, as well as possibilities, that will be discussed.
Transcription of large-scale historical handwritten document images is a tedious task. Machine learning techniques, such as deep learning, are popularly used for quick transcription, but often require a substantial amount of pre-transcribed word examples for training. Instead of line-by-line word transcription, this paper proposes a simple training-free gamification strategy where all occurrences of an arbitrarily selected word are transcribed at once, using an intelligent user interface implemented in this work. The proposed approach offers fast and user-friendly semi-automatic transcription that allows multiple users to work on the same document collection simultaneously.
Automatic recognition of historical handwritten manuscripts is a daunting task due to paper degradation over time. Recognition-free retrieval, or word spotting, is popularly used for information retrieval and the digitization of historical handwritten documents. However, the performance of word spotting algorithms depends heavily on the feature detection and representation methods. Although there exist popular feature descriptors such as the Scale Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF), the invariant properties of these descriptors amplify the noise in degraded document images, rendering them overly sensitive to the noise and complex characteristics of historical manuscripts. Therefore, an efficient and relaxed feature descriptor is required, as handwritten words across different documents are indeed similar, but not identical. This paper introduces a Radial Line Fourier (RLF) descriptor for handwritten word representation, with a short feature vector of 32 dimensions. A segmentation-free and training-free handwritten word spotting method is studied herein that relies on the proposed RLF descriptor, takes into account different keypoint representations, and uses a simple preconditioner-based feature matching algorithm. The effectiveness of the RLF descriptor for segmentation-free handwritten word spotting is empirically evaluated on well-known historical handwritten datasets using standard evaluation measures.
This paper presents an approach to word recognition based on embedded prototype subspace classification. The purpose of the paper is three-fold. Firstly, a new dataset for word recognition is presented, extracted from the Esposalles database consisting of the Barcelona cathedral marriage records. Secondly, different clustering techniques are evaluated for Embedded Prototype Subspace Classifiers. The dataset, containing 30 different classes of words, is heavily imbalanced, and some word classes are very similar, which renders the classification task rather challenging. For ease of use, no stratified sampling is done in advance, and the impact of different data splits is evaluated for different clustering techniques. It will be demonstrated that the original clustering technique, based on scaling the bandwidth, has to be adjusted for this new dataset. Thirdly, an algorithm is therefore proposed that finds k clusters by striving to obtain a certain number of feature points in each cluster, rather than finding a number of clusters based on scaling Silverman's rule of thumb. Furthermore, Self Organising Maps are evaluated as both a clustering and an embedding technique.