HistSearch: Implementation and Evaluation of a Web-based Tool for Automatic Information Extraction from Historical Text
2016 (English) In: Proceedings of the 3rd HistoInformatics Workshop, Krakow, Poland, 11 July 2016 / [ed] M. Düring, A. Jatowt, J. Preiser-Kapeller, A. van den Bosch, 2016. Conference paper (Refereed)
Due to a lack of NLP tools adapted to the task of analysing historical text, historians and other researchers in humanities often need to manually search through large volumes of text in order to find certain pieces of information of interest to their research. In this paper, we present a web-based tool for automatic information extraction from historical text, with the aim of facilitating this time-consuming process. We describe 1) the underlying architecture of the system, based on spelling normalisation succeeded by tagging and parsing using tools available for the modern language, 2) a prototypical graphical user interface used by the historians, and 3) a thorough manual evaluation of the tool performed by the actual users, i.e. the historians, when applied to the specific task of extracting and presenting verb phrases describing work in Early Modern Swedish text. The main contribution is the manual evaluation, which takes both quantitative and qualitative aspects into account, and is compared to automatic evaluation results. We show that spelling normalisation is successful for the task of tagging and lemmatisation, meaning that the words analysed as verbs by the tool are mostly considered as verbs by the historians as well. We also point out the further work needed for improving parsing and ranking performance, in order to make the tool really useful in the extraction process.
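The architecture summarised in the abstract (spelling normalisation, then tagging and parsing with modern-language tools, then verb phrase extraction) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the lookup tables, the tag labels, the function names, and the example sentence are hypothetical, and a simple "verb plus following words" heuristic stands in for the real parser.

```python
# Toy sketch of a normalise -> tag -> extract pipeline.
# All tables, names and tags below are hypothetical stand-ins,
# not the resources or tools used by the HistSearch system.

# Hypothetical mapping from historical spellings to modern spellings.
NORMALISATION = {"hafwer": "har", "kiöpt": "köpt"}

# Hypothetical modern-language lexicon: word -> part-of-speech label.
LEXICON = {"hon": "PN", "har": "VB", "köpt": "VB", "två": "RG", "oxar": "NN"}


def normalise(token):
    """Map a historical spelling to a modern one (unknown words pass through)."""
    lowered = token.lower()
    return NORMALISATION.get(lowered, lowered)


def tag(tokens):
    """Tag normalised tokens with the modern-language lexicon."""
    return [(t, LEXICON.get(t, "UNK")) for t in tokens]


def extract_verb_phrases(tagged):
    """Heuristic stand-in for parsing: each verb opens a phrase that
    extends over the following non-verb words."""
    phrases, current = [], None
    for word, pos in tagged:
        if pos == "VB":
            if current:
                phrases.append(" ".join(current))
            current = [word]
        elif current is not None:
            current.append(word)
    if current:
        phrases.append(" ".join(current))
    return phrases


def run_pipeline(sentence):
    """Normalise, tag, and extract verb phrases from one sentence."""
    tokens = [normalise(t) for t in sentence.split()]
    return extract_verb_phrases(tag(tokens))


print(run_pipeline("hon hafwer kiöpt två oxar"))
```

The point of the sketch is only the ordering of the steps: because normalisation runs first, the tagger needs to know modern spellings only, which is the design the abstract describes.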
Research subject Computational Linguistics; History
Identifiers URN: urn:nbn:se:uu:diva-305665, OAI: oai:DiVA.org:uu-305665, DiVA: diva2:1038910
3rd HistoInformatics Workshop, Krakow, Poland, 11 July 2016
Projects Gender and Work