Svanhvít Ingólfsdóttir
Department of Computer Science, Reykjavik University, Iceland
Sigurjón Þorsteinsson
Department of Computer Science, Reykjavik University, Iceland
Hrafn Loftsson
Department of Computer Science, Reykjavik University, Iceland
In: Proceedings of the 22nd Nordic Conference on Computational Linguistics (NoDaLiDa), September 30 - October 2, Turku, Finland
Linköping Electronic Conference Proceedings 167:42, pp. 363–369
NEALT Proceedings Series 42:42, pp. 363–369
Published: 2019-10-02
ISBN: 978-91-7929-995-8
ISSN: 1650-3686 (print), 1650-3740 (online)
We report on work in progress: annotating an Icelandic corpus for named entities (NEs) and using it to train a named entity recognizer based on a Bidirectional Long Short-Term Memory model. So far, we have annotated 7,538 NEs in the first 200,000 tokens of MIM-GOLD, a 1-million-token corpus originally developed as a gold standard for part-of-speech tagging. Our best-performing model, trained on this subset of MIM-GOLD and enriched with external word embeddings, obtains an overall F1 score of 81.3% when categorizing NEs into four categories: persons, locations, organizations and miscellaneous. These preliminary results are promising, especially given that 80% of MIM-GOLD has not yet been used for training.