Conference article

Converting the TüBa-D/Z Treebank of German to Universal Dependencies

Çağrı Çöltekin
Department of Linguistics, University of Tübingen, Germany

Ben Campbell
Department of Linguistics, University of Tübingen, Germany

Erhard Hinrichs
Department of Linguistics, University of Tübingen, Germany

Heike Telljohann
Department of Linguistics, University of Tübingen, Germany

Published in: Proceedings of the NoDaLiDa 2017 Workshop on Universal Dependencies, 22 May, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 135:4, p. 27-37

NEALT Proceedings Series 31:4, p. 27-37

Published: 2017-05-29

ISBN: 978-91-7685-501-0

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

This paper describes the conversion of TüBa-D/Z, one of the major German constituency treebanks, to Universal Dependencies. Besides the automatic conversion process, we describe manual annotation of a small part of the treebank based on the UD annotation scheme for the purposes of evaluating the automatic conversion. The automatic conversion shows fairly high agreement with the manual annotations.
