Conference article

Universal Dependencies Are Hard to Parse – Or Are They?

Ines Rehbein
Leibniz ScienceCampus, Institut für Deutsche Sprache Mannheim, Germany

Julius Steen
Leibniz ScienceCampus, Universität Heidelberg, Germany

Bich-Ngoc Do
Leibniz ScienceCampus, Universität Heidelberg, Germany

Anette Frank
Leibniz ScienceCampus, Universität Heidelberg, Germany


Published in: Proceedings of the Fourth International Conference on Dependency Linguistics (Depling 2017), September 18-20, 2017, Università di Pisa, Italy

Linköping Electronic Conference Proceedings 139:25, p. 218-228


Published: 2017-09-13

ISBN: 978-91-7685-467-9

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

Universal Dependencies (UD) annotations, despite their usefulness for cross-lingual tasks and semantic applications, are not optimised for statistical parsing. In this paper, we ask what exactly causes the decrease in parsing accuracy when training a parser on UD-style annotations, and whether the effect is similarly strong for all languages. We conduct a series of experiments in which we systematically modify individual annotation decisions taken in the UD scheme and show that this increases accuracy for most, but not all, languages. We show that the encoding in the UD scheme, in particular the decision to treat content words as heads, increases dependency length for nearly all treebanks and increases arc direction entropy for many languages, and we evaluate the effect this has on parsing accuracy.
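The two treebank properties the abstract refers to can be made concrete. Dependency length is the distance (in tokens) between a word and its head; arc direction entropy measures, per dependency relation, how unpredictably the head falls to the left or right of its dependent. The sketch below is illustrative only and is not the paper's implementation; the function names and the input conventions (1-based head indices with 0 for the root, as in the HEAD column of CoNLL-U) are assumptions for the example.

```python
import math
from collections import Counter

def mean_dependency_length(heads):
    """Mean absolute token distance between each word and its head.

    `heads` is a list of 1-based head indices, one per token,
    with 0 marking the root (which has no incoming surface arc).
    """
    lengths = [abs(h - i) for i, h in enumerate(heads, start=1) if h != 0]
    return sum(lengths) / len(lengths) if lengths else 0.0

def arc_direction_entropy(arcs):
    """Entropy (in bits) of arc direction, computed per relation label.

    `arcs` is an iterable of (relation, head_index, dependent_index)
    triples. A relation whose head always lies on the same side of its
    dependent has entropy 0; a 50/50 split yields 1 bit.
    """
    directions = {}
    for rel, head, dep in arcs:
        side = "head_left" if head < dep else "head_right"
        directions.setdefault(rel, Counter())[side] += 1
    entropies = {}
    for rel, counts in directions.items():
        total = sum(counts.values())
        entropies[rel] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropies
```

For instance, a three-token sentence with heads `[2, 0, 2]` has a mean dependency length of 1.0, and a relation attested once with the head on each side gets an entropy of 1.0 bit. Content-head encodings such as UD's tend to raise both numbers, since function words (e.g. prepositions marked with `case`) attach to a possibly distant content word rather than acting as local intermediate heads.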

