Conference article

Quote Extraction and Attribution from Norwegian Newspapers

Andrew Salway
Language and Language Technology Group, Uni Research, Bergen, Norway

Paul Meurer
Language and Language Technology Group, Uni Research, Bergen, Norway

Knut Hofland
Language and Language Technology Group, Uni Research, Bergen, Norway

Øystein Reigem
Language and Language Technology Group, Uni Research, Bergen, Norway

Published in: Proceedings of the 21st Nordic Conference on Computational Linguistics, NoDaLiDa, 22-24 May 2017, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 131:41, p. 293-297

NEALT Proceedings Series 29:41, p. 293-297

Published: 2017-05-08

ISBN: 978-91-7685-601-7

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We present ongoing work that, for the first time, seeks to extract and attribute politicians’ quotations from Norwegian Bokmål newspapers. Our method – using a statistical dependency parser, a few regular expressions and a look-up table – gives modest recall (a best of .570) but very high precision (.978) and attribution accuracy (.987) for a restricted set of speaker names. We suggest that this is already sufficient to support some kinds of important social science research, but also identify ways in which performance could be improved.
