Gerold Schneider
Institute of Computational Linguistics and Department of English, University of Zurich, Switzerland
Eva Pettersson
Department of Linguistics and Philology, Uppsala University, Sweden
Michael Percillier
Department of English, University of Mannheim, Germany
Published in: Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language
Linköping Electronic Conference Proceedings 133:8, p. 40-46
NEALT Proceedings Series 32:8, p. 40-46
Published: 2017-05-10
ISBN: 978-91-7685-503-4
ISSN: 1650-3686 (print), 1650-3740 (online)
To use existing natural language processing tools for analysing historical text, an important preprocessing step is spelling normalisation: converting the original spelling to present-day spelling before applying tools such as taggers and parsers. In this paper, we compare a probabilistic, language-independent approach to spelling normalisation, based on statistical machine translation (SMT) techniques, with a rule-based system that combines dictionary lookup with rules and non-probabilistic weights. The rule-based system reaches the best accuracy, up to 94% precision at 74% recall, while the SMT system yields improvements for each tested period.
