HistSearch: Implementation and Evaluation of a Web-based Tool for Automatic Information Extraction from Historical Text
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Languages, Department of Linguistics and Philology.
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of History. (Gender and Work) ORCID iD: 0000-0002-5245-937X
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of History. (Gender and Work)
Uppsala University, Disciplinary Domain of Humanities and Social Sciences, Faculty of Arts, Department of History. (Gender and Work)
2016 (English). In: Proceedings of the 3rd HistoInformatics Workshop, Krakow, Poland, 11 July 2016 / [ed] M. Düring, A. Jatowt, J. Preiser-Kapeller, A. van den Bosch, 2016. Conference paper (Refereed)
Abstract [en]

Due to a lack of NLP tools adapted to the task of analysing historical text, historians and other researchers in humanities often need to manually search through large volumes of text in order to find certain pieces of information of interest to their research. In this paper, we present a web-based tool for automatic information extraction from historical text, with the aim of facilitating this time-consuming process. We describe 1) the underlying architecture of the system, based on spelling normalisation succeeded by tagging and parsing using tools available for the modern language, 2) a prototypical graphical user interface used by the historians, and 3) a thorough manual evaluation of the tool performed by the actual users, i.e. the historians, when applied to the specific task of extracting and presenting verb phrases describing work in Early Modern Swedish text. The main contribution is the manual evaluation, which takes both quantitative and qualitative aspects into account, and is compared to automatic evaluation results. We show that spelling normalisation is successful for the task of tagging and lemmatisation, meaning that the words analysed as verbs by the tool are mostly considered as verbs by the historians as well. We also point out the further work needed for improving parsing and ranking performance, in order to make the tool really useful in the extraction process.

Place, publisher, year, edition, pages
2016.
National Category
History
Research subject
Computational Linguistics; History
Identifiers
URN: urn:nbn:se:uu:diva-305665; OAI: oai:DiVA.org:uu-305665; DiVA: diva2:1038910
Conference
3rd HistoInformatics Workshop, Krakow, Poland, 11 July 2016
Projects
Gender and Work
Available from: 2016-10-20 Created: 2016-10-20 Last updated: 2016-10-20

Open Access in DiVA

No full text

Other links

http://ceur-ws.org/Vol-1632/paper_4.pdf

By author/editor
Pettersson, Eva; Lindström, Jonas; Jacobsson, Benny; Fiebranz, Rosemarie