Published: 2013-05-17
ISBN: 978-91-7519-589-6
ISSN: 1650-3686 (print), 1650-3740 (online)
The identification of discourse units is an essential step in discourse parsing, the automatic construction of a discourse structure from a text. We present a rule-based algorithm to identify elementary discourse units (EDUs) in written Dutch text. In contrast to approaches that focus on determining segment boundaries, we identify complete discourse units, which is especially helpful for recognizing interrupted EDUs that contain embedded discourse units. We use syntactic and lexical information to decompose sentences into EDUs. Experimental results show that our algorithm for EDU identification performs well on texts of various genres.
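To make the difference between boundary detection and complete-unit identification concrete, the minimal Python sketch below contrasts the two representations on a hypothetical Dutch sentence ("The man who came yesterday has left"). The `EDU` class, the `segments_from_boundaries` helper, the example sentence and its split points are all illustrative assumptions. This is not the rule-based algorithm described here, and it performs no segmentation. It only shows how an interrupted EDU can be stored as a discontinuous set of token spans instead of as separate fragments.

```python
from dataclasses import dataclass


@dataclass
class EDU:
    """A discourse unit as one or more (start, end) token spans, end exclusive.

    Hypothetical representation: an interrupted EDU has more than one span.
    """
    spans: list[tuple[int, int]]

    def text(self, tokens: list[str]) -> str:
        # Join the tokens of all spans, skipping whatever is embedded in between.
        return " ".join(tok for start, end in self.spans for tok in tokens[start:end])

    def is_interrupted(self) -> bool:
        # A unit is interrupted if its tokens are not contiguous.
        return len(self.spans) > 1


def segments_from_boundaries(n_tokens: int, boundaries: list[int]) -> list[tuple[int, int]]:
    """Boundary-based view: cut the token sequence at each boundary index."""
    cuts = [0] + sorted(boundaries) + [n_tokens]
    return [(cuts[i], cuts[i + 1]) for i in range(len(cuts) - 1)]


# Hypothetical example (punctuation omitted): "The man who came yesterday has left."
tokens = ["De", "man", "die", "gisteren", "kwam", "is", "vertrokken"]

# Boundary view: three flat segments; the main clause is split into two fragments.
flat = segments_from_boundaries(len(tokens), [2, 5])
print([" ".join(tokens[s:e]) for s, e in flat])

# Complete-unit view: the main clause is one EDU with two spans,
# and the relative clause is a separate EDU embedded inside it.
main = EDU(spans=[(0, 2), (5, 7)])
embedded = EDU(spans=[(2, 5)])
print(main.text(tokens), main.is_interrupted())
print(embedded.text(tokens), embedded.is_interrupted())
```

In the boundary view, "De man" and "is vertrokken" are two unrelated segments. In the complete-unit view, they form a single EDU that is marked as interrupted, with the relative clause recorded as the unit embedded inside it.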
