Published: 2015-05-06
ISBN: 978-91-7519-098-3
ISSN: 1650-3686 (print), 1650-3740 (online)
In this paper we explore the idea of using verb valency information to improve verb phrase extraction from historical text. As a case study, we perform experiments on Early Modern Swedish data, but the approach could easily be transferred to other languages and/or time periods as well. We show that by using verb valency information in a post-processing step to the verb phrase extraction system, it is possible to remove improbable complements extracted by the parser and insert probable complements not extracted by the parser, leading to an increase in both precision and recall for the extracted complements.
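The post-processing idea described above can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon `VALENCY`, the `Complement` type, the complement labels, and the function `postprocess` are all hypothetical names introduced for illustration. The sketch assumes that each verb lemma maps to a set of licensed complement types and that the parser supplies both its extracted complements and a list of other candidate phrases near the verb.

```python
from dataclasses import dataclass

# Hypothetical valency lexicon: verb lemma -> complement types it licenses.
# The entries and labels are illustrative assumptions, not data from the paper.
VALENCY = {
    "köpa": {"direct_object"},   # "buy" is assumed to take a direct object
    "bo": {"prepositional"},     # "live" is assumed to take a prepositional complement
}


@dataclass(frozen=True)
class Complement:
    text: str
    kind: str  # e.g. "direct_object" or "prepositional"


def postprocess(verb, extracted, candidates):
    """Filter the parser's complements against the verb's valency frame,
    then add candidate phrases for licensed types that are still missing."""
    allowed = VALENCY.get(verb)
    if allowed is None:
        # Verb not in the lexicon: keep the parser output unchanged.
        return list(extracted)

    # Step 1: drop complements whose type the verb does not license.
    kept = [c for c in extracted if c.kind in allowed]

    # Step 2: insert the first candidate of each licensed type not yet covered.
    covered = {c.kind for c in kept}
    for cand in candidates:
        if cand.kind in allowed and cand.kind not in covered:
            kept.append(cand)
            covered.add(cand.kind)
    return kept


if __name__ == "__main__":
    parser_output = [Complement("i staden", "prepositional")]
    nearby = [Complement("en häst", "direct_object")]
    print(postprocess("köpa", parser_output, nearby))
```

In this toy example the prepositional phrase is dropped because the assumed frame for "köpa" does not license it, and the nearby direct object is inserted instead. A real system would presumably rank complements by probability rather than apply a hard filter.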