Aleksi Vesanto
Turku NLP Group, Department of FT, University of Turku, Finland
Asko Nivala
Cultural History and Turku Institute for Advanced Studies, University of Turku, Finland
Heli Rantala
Cultural History, University of Turku, Finland
Tapio Salakoski
Turku NLP Group, Department of FT, University of Turku, Finland
Hannu Salmi
Cultural History, University of Turku, Finland
Filip Ginter
Turku NLP Group, Department of FT, University of Turku, Finland
Published in: Proceedings of the NoDaLiDa 2017 Workshop on Processing Historical Language
Linköping Electronic Conference Proceedings 133:10, pp. 54-58
NEALT Proceedings Series 32:10, pp. 54-58
Published: 2017-05-10
ISBN: 978-91-7685-503-4
ISSN: 1650-3686 (print), 1650-3740 (online)
We present results of text reuse detection on a corpus of scanned, OCR-recognized Finnish newspapers and journals published from 1771 to 1910. Our study draws on BLAST, software created for comparing and aligning biological sequences. We show different types of text reuse found in this corpus and compare our approach with Passim, text reuse detection software developed at Northeastern University in Boston.
Stephen F. Altschul, Warren Gish, Webb Miller, Eugene W. Myers, and David J. Lipman. 1990. Basic local alignment search tool. Journal of Molecular Biology, 215(3):403–410, Oct.
Ryan Cordell. 2015. Reprinting, Circulation, and the Network Author in Antebellum Newspapers. American Literary History, 27(3):417–445.
Kimmo Kettunen. 2016. Keep, change or delete? Setting up a low resource OCR post-correction framework for a digitized old Finnish newspaper collection. In D. Calvanese, D. De Nart, and C. Tasso, editors, Digital Libraries on the Move. IRCDL 2015. Communications in Computer and Information Science, volume 612. Springer, Cham.
Tuula Pääkkönen, Jukka Kervinen, Asko Nivala, Kimmo Kettunen, and Eetu Mäkelä. 2016. Exporting Finnish Digitized Historical Newspaper Contents for Offline Use. D-Lib Magazine, 22(7).
David A. Smith, Ryan Cordell, Elizabeth Maddock Dillon, Nick Stramp, and John Wilkerson. 2014. Detecting and modeling local text reuse. In Proceedings of the 14th ACM/IEEE-CS Joint Conference on Digital Libraries, JCDL ’14, pages 183–192, Piscataway, NJ, USA. IEEE Press.
David A. Smith, Ryan Cordell, and Abby Mullen. 2015. Computational Methods for Uncovering Reprinted Texts in Antebellum Newspapers. American Literary History, 27(3):E1–E15.
Aleksi Vesanto, Asko Nivala, Tapio Salakoski, Hannu Salmi, and Filip Ginter. 2017. A system for identifying and exploring text repetition in large historical document corpora. In Proceedings of NoDaLiDa 2017.