Conference article

Nefnir: A high accuracy lemmatizer for Icelandic

Svanhvít Ingólfsdóttir
Department of Computer Science, Reykjavik University, Iceland

Hrafn Loftsson
Department of Computer Science, Reykjavik University, Iceland

Jón Daðason
The Árni Magnússon Institute for Icelandic Studies, University of Iceland, Iceland

Kristín Bjarnadóttir
The Árni Magnússon Institute for Icelandic Studies, University of Iceland, Iceland

Published in: Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, Turku, Finland

Linköping Electronic Conference Proceedings 167:33, pp. 310-315

NEALT Proceedings Series 42:33, pp. 310-315

Published: 2019-10-02

ISBN: 978-91-7929-995-8

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Lemmatization, finding the basic morphological form of a word in a corpus, is an important step in many natural language processing tasks when working with morphologically rich languages. We describe and evaluate Nefnir, a new open-source lemmatizer for Icelandic. Nefnir uses suffix substitution rules, derived from a large morphological database, to lemmatize tagged text. Evaluation shows that Nefnir obtains an accuracy of 99.55% on correctly tagged text and 96.88% on text tagged with a PoS tagger.
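
To make the idea of suffix substitution rules concrete, the following is a minimal sketch in Python and not Nefnir's actual implementation. It assumes a rule maps a (suffix, tag) pair to a replacement suffix and that the longest matching suffix wins. The rule table, the tag names and the `lemmatize` function are hypothetical and hand-written for illustration; they are not taken from Nefnir or from the morphological database it uses.

```python
# Toy rule table: (word suffix, PoS tag) -> replacement suffix.
# Hand-written for illustration only; the tag names are hypothetical
# and do not follow any real Icelandic tagset.
RULES = {
    ("um", "noun-masc-pl-dat"): "ur",   # hestum -> hestur
    ("ar", "noun-masc-pl-nom"): "ur",   # hestar -> hestur
    ("", "noun-masc-sg-nom"): "",       # already in lemma form
}


def lemmatize(word, tag, rules=RULES):
    """Return the lemma of `word` given its tag.

    Tries suffixes from longest (the whole word) to shortest (the
    empty suffix) and applies the first rule that matches the tag.
    If no rule matches, the word is returned unchanged.
    """
    for start in range(len(word) + 1):
        suffix = word[start:]
        replacement = rules.get((suffix, tag))
        if replacement is not None:
            return word[:start] + replacement
    return word


if __name__ == "__main__":
    print(lemmatize("hestum", "noun-masc-pl-dat"))
```

Because the lookup is keyed on the tag as well as the suffix, a tagging error can select the wrong rule or no rule at all, which is one plausible reason accuracy would be lower on automatically tagged text than on correctly tagged text.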

Keywords

lemmatization, morphologically rich languages, morphological database
