Published: 2017-05-23
ISBN: 978-91-7685-499-0
ISSN: 1650-3686 (print), 1650-3740 (online)
This paper offers a use case of the CLARIN research infrastructure (Hinrichs and Krauwer, 2014) from the fields of historical linguistics and the history of linguistics. Using large, electronically available corpora of historical English and German, it investigates differences in the terminology that the two languages use to refer to the people and the language of Albania. The search and data exploration tools available for the DTA and DWDS corpora as part of the CLARIN-D infrastructure (Hinrichs and Trippel, in press) make it possible to trace semantic change in the terminology under consideration. The paper concludes with a discussion of the broader implications of this use case for the use of historical corpora and for the functionality of query tools needed in digital humanities research.
Historical linguistics, history of linguistics, semantic variation and change, CLARIN-D tools for historical corpora, historical corpus analysis
[Bloomfield1914] Leonard Bloomfield. 1914. Introduction to the Study of Language. Henry Holt, New York.
[Bloomfield1933] Leonard Bloomfield. 1933. Language. Henry Holt, New York.
[Davies2012] Mark Davies. 2012. Expanding Horizons in Historical Linguistics with the 400 Million Word Corpus of Historical American English. Corpora, 7:121-157.
[Erdmann et al.2016] Alexander Erdmann, Christopher Brown, Brian Joseph, Mark Janse, Petra Ajaka, Micha Elsner and Marie-Catherine de Marneffe. 2016. Challenges and Solutions for Latin Named Entity Recognition. Proceedings of the Language Technologies for the Digital Humanities Workshop in conjunction with the 26th International Conference on Computational Linguistics (COLING-2016), December 2016.
[Geyken2007] Alexander Geyken. 2007. The DWDS Corpus: A Reference Corpus for the German Language of the 20th Century. C. Fellbaum ed. Collocations and Idioms: Linguistic, lexicographic, and computational aspects. Bloomsbury Academic, London. pp. 23-41.
[Geyken et al.2011] Alexander Geyken, Susanne Haaf, Bryan Jurish, Matthias Schulz, Jakob Steinmann, Christian Thomas, and Frank Wiegand. 2011. Das Deutsche Textarchiv: Vom historischen Korpus zum aktiven Archiv. S. Schomburg et al. eds. Digitale Wissenschaft. Stand und Entwicklung digital vernetzter Forschung in Deutschland. pp. 157-161.
[Haaf et al.2013] Susanne Haaf, Frank Wiegand, and Alexander Geyken. 2013. Measuring the Correctness of Double-Keying: Error Classification and Quality Control in a Large Corpus of TEI-Annotated Historical Text. Journal of the Text Encoding Initiative (jTEI) 4.
[Hinrichs and Krauwer 2014] Erhard Hinrichs and Steven Krauwer. 2014. The CLARIN Research Infrastructure: Resources and Tools for E-Humanities Scholars. Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC-2014), May 2014, pp. 1525-31.
[Hinrichs and Trippel in press] Erhard Hinrichs and Thorsten Trippel. In press. CLARIN-D: eine Forschungsinfrastruktur für die sprachbasierte Forschung in den Geistes- und Sozialwissenschaften. Bibliothek Forschung und Praxis; Vol. 41.1 (April 2017).
[Hock and Joseph1996] Hans Henrich Hock and Brian Joseph. 1996. Language History, Language Change, and Language Relationship. An Introduction to Historical and Comparative Linguistics. Mouton de Gruyter (2nd edn., 2009), Berlin.
[Jurish et al.2014] Bryan Jurish, Christian Thomas, and Frank Wiegand. 2014. Querying the Deutsches Textarchiv. In: U. Kruschwitz, F. Hopfgartner, and C. Gurrin eds.: Proceedings of the Workshop MindTheGap 2014: Beyond Single-Shot Text Queries: Bridging the Gap(s) between Research Communities (co-located with iConference 2014, Berlin, 4 March 2014), pp. 25-30.
[Jurish2015] Bryan Jurish. 2015. DiaCollo: On the Trail of Diachronic Collocations. K. De Smedt ed. Proceedings of the CLARIN Annual Conference 2015. Wroclaw, Poland, 15th-17th October, pp. 28-31.
[Libelt1828] Karol Libelt. 1828. Wyklady Humboldta na uniwersytecie Berlinskim: notaty prelekcyj tych po uczniu Jego Karolu Libelcie [= Nachschrift der ’Kosmos-Vorträge’ Alexander von Humboldts in der Berliner Universität, 3.11.1827-26.4.1828].
[Lieberman et al.2007] Erez Lieberman, Jean-Baptiste Michel, Joe Jackson, Tina Tang, and Martin Nowak. 2007. Quantifying the Evolutionary Dynamics of Language. Nature 449.
[Michel et al.2012] Jean-Baptiste Michel, Yuan Kui Shen, Aviva P. Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, and Erez Lieberman Aiden. 2012. Quantitative Analysis of Culture Using Millions of Digitized Books. Science, DOI: 10.1126/science.1199644.
[Mommsen1854] Theodor Mommsen. 1854. Römische Geschichte. Bd. 1: Bis zur Schlacht von Pydna. Leipzig, Germany.
[Rychlý2008] Pavel Rychlý. 2008. A Lexicographer-friendly Association Score. Proceedings of the Second Workshop on Recent Advances in Slavonic Natural Language Processing RASLAN 2008, pp. 6-9.
[Schiller et al.1995] Anne Schiller, Simone Teufel, and Christine Thielen. 1995. Vorläufige Guidelines für das Tagging deutscher Textcorpora mit STTS. Technical Report. Universität Stuttgart, Institut für maschinelle Sprachverarbeitung, and Seminar für Sprachwissenschaft, Universität Tübingen.
[Zhang2015] Sarah Zhang. 2015. The Pitfalls of using Google Ngram to study Language. Science 10.12.15.