Ryan Johnson
University of Tromsø, Norway
Lene Antonsen
University of Tromsø, Norway
Trond Trosterud
University of Tromsø, Norway
In: Proceedings of the 19th Nordic Conference of Computational Linguistics (NODALIDA 2013), May 22-24, 2013, Oslo University, Norway. NEALT Proceedings Series 16
Linköping Electronic Conference Proceedings 85:10, pp. 59-71
NEALT Proceedings Series 16:10, pp. 59-71
Published: 2013-05-17
ISBN: 978-91-7519-589-6
ISSN: 1650-3686 (print), 1650-3740 (online)
This article presents a novel way of combining finite-state transducers (FSTs) with electronic dictionaries, thereby creating efficient reading comprehension dictionaries. We compare a North Saami - Norwegian and a South Saami - Norwegian dictionary, both enriched with an FST, with existing, available dictionaries containing pre-generated paradigms, and show the advantages of our approach. Being more flexible, the FSTs may also adjust the dictionary to different contexts. The finite-state transducer analyses the word to be looked up, and the dictionary itself conducts the actual lookup. The FST part is crucial for morphology-rich languages, where as little as 10% of the wordforms in running text are actually lemma forms. If a compound or derived word, or a word with an enclitic particle, is not found in the dictionary, the FST will give the stems and derivation affixes of the wordform, and each of the stems will be given a separate translation. In this way, the coverage of the FST-dictionary will be far larger than that of an ordinary dictionary of the same size.
Lexicography; Computational Morphology; Orthographic Variation; Finite-state Transducers; Electronic Dictionaries
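The lookup flow described in the abstract (the FST analyses the wordform, the dictionary performs the lookup, and unlisted compounds are translated stem by stem) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the analyser is a hard-coded table standing in for a real transducer, the tag format (`+` between tags, `#` at compound boundaries) is simplified, and the dictionary entries are toy data, not taken from the article.

```python
# Toy stand-in for an FST analyser: wordform -> list of analysis strings.
# A real system would query a compiled transducer instead of this table.
# The analysis format is a simplified assumption, not the authors' format.
TOY_ANALYSES = {
    "viesus": ["viessu+N+Sg+Loc"],                   # inflected form
    "mánáidgirji": ["mánná+N+Cmp#girji+N+Sg+Nom"],   # compound
}

# Toy lemma dictionary (North Saami -> Norwegian), illustrative only.
TOY_DICTIONARY = {"viessu": "hus", "mánná": "barn", "girji": "bok"}


def analyse(wordform):
    """Return the analyses for a wordform (empty list if unknown)."""
    return TOY_ANALYSES.get(wordform, [])


def lookup(wordform):
    """Return (lemma, translation) pairs for a wordform.

    A direct dictionary hit wins. Otherwise each analysis is split at
    compound boundaries and every stem is translated separately.
    """
    if wordform in TOY_DICTIONARY:
        return [(wordform, TOY_DICTIONARY[wordform])]

    results = []
    for analysis in analyse(wordform):
        # The stem is the part before the first tag in each compound part.
        stems = [part.split("+", 1)[0] for part in analysis.split("#")]
        for stem in stems:
            if stem in TOY_DICTIONARY:
                pair = (stem, TOY_DICTIONARY[stem])
                if pair not in results:
                    results.append(pair)
    return results


if __name__ == "__main__":
    for word in ["girji", "viesus", "mánáidgirji", "xyz"]:
        print(word, lookup(word))
```

In this sketch the dictionary stores only lemmas, so inflected forms and compounds are covered without pre-generated paradigms, which is the point the abstract makes about coverage relative to dictionary size.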