Conference article

Predicting the attitude flow in dialogue based on multi-modal speech cues

Peter Juel Henrichsen
Copenhagen Business School, Copenhagen, Denmark

Jens Allwood
University of Gothenburg, Gothenburg, Sweden


In: NEALT Proceedings. Northern European Association for Language and Technology, 4th Nordic Symposium on Multimodal Communication, November 15-16, Gothenburg, Sweden

Linköping Electronic Conference Proceedings 93:7, pp. 47-53

NEALT Proceedings Series 21:7, pp. 47-53


Published: 2013-10-29

ISBN: 978-91-7519-461-5

ISSN: 1650-3686 (print), 1650-3740 (online)

Abstract

We present our experiments on attitude detection based on annotated multi-modal dialogue data. Our long-term goal is to establish a computational model able to predict the attitudinal patterns in human-human dialogue. We believe such prediction algorithms are useful tools in the pursuit of realistic discourse behavior in conversational agents and other intelligent man-machine interfaces. The present paper deals with two important subgoals in particular: how to establish a meaningful and consistent set of annotation categories for attitude annotation, and how to relate the annotation data to the recorded data (audio and video) in computational models of attitude prediction. We present our current results, including a recommended set of analytical annotation labels and a recommended setup for extracting linguistically meaningful data even from noisy audio and video signals.
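As an illustration of the second subgoal, relating time-aligned attitude annotations to the recorded audio, the sketch below shows one possible alignment-and-feature pipeline in Python. It is not the authors' implementation: the AttitudeSpan structure, the placeholder labels, the 0.5-second analysis window, and the RMS/zero-crossing features are assumptions standing in for the paper's annotation categories and multi-modal speech cues.

# Minimal sketch (assumptions noted above): align attitude annotations with
# crude per-window acoustic features so they can feed a standard classifier.
import numpy as np
from dataclasses import dataclass

@dataclass
class AttitudeSpan:
    start: float   # span start, in seconds
    end: float     # span end, in seconds
    label: str     # placeholder attitude category, e.g. "friendly"

def frame_features(signal, sr, win=0.5):
    """Cut the audio into fixed windows and compute two rough prosodic
    proxies per window: RMS energy and zero-crossing rate."""
    hop = int(win * sr)
    feats, times = [], []
    for i in range(0, len(signal) - hop, hop):
        frame = signal[i:i + hop]
        rms = float(np.sqrt(np.mean(frame ** 2)))
        zcr = float(np.mean(np.abs(np.diff(np.sign(frame)))) / 2)
        feats.append([rms, zcr])
        times.append(i / sr)
    return np.array(feats), np.array(times)

def label_frames(times, spans, win=0.5):
    """Give each window the label of the annotation span covering its
    midpoint, or None if the window is unannotated."""
    labels = []
    for t in times:
        mid = t + win / 2
        labels.append(next((s.label for s in spans if s.start <= mid < s.end), None))
    return labels

if __name__ == "__main__":
    sr = 16000
    audio = np.random.randn(sr * 10) * 0.1            # stand-in for a real recording
    spans = [AttitudeSpan(0.0, 4.0, "friendly"),       # stand-in annotations
             AttitudeSpan(4.0, 9.0, "reserved")]
    X, times = frame_features(audio, sr)
    y = label_frames(times, spans)
    # X (features) and y (attitude labels) can now be passed to any
    # off-the-shelf classifier to model the attitude flow over time.
    print(list(zip(times.round(2), y)))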

Keywords

attitude detection; prediction of attitude flow; attitude annotation; multimodal speech cues

References

Allwood, J., Cerrato, L., Jokinen, K., Navarretta, C. & Paggio, P. (2007). The MUMIN Coding Scheme for the Annotation of Feedback, Turn Management and Sequencing. In J. C. Martin et al. (eds.) Multimodal Corpora for Modelling Human Multimodal Behavior. Special Issue of the International Journal of Language Resources and Evaluation. Berlin: Springer.

Aylett, M. P. & Yamagishi, J. (2008). Combining Statistical Parametric Speech Synthesis and Unit-Selection for Automatic Voice Cloning. LangTech-2008, Rome.

Boersma, P. & Weenink, D. (2005). Praat: doing phonetics by computer (Version 4.3.01) [Computer program]. Retrieved from http://www.praat.org/

Henrichsen, P. J. (2012). Nature Identical Prosody: data-driven prosodic feature assignment for diphone synthesis. 4th Swedish Language Technology Conference (SLTC-2012), Lund.

Kipp, M. (2001). Anvil - a generic annotation tool for multimodal dialogue. In Proceedings of Eurospeech, pp. 1367-1370.

Navarretta, C., Ahlsén, E., Allwood, J., Paggio, P. & Jokinen, K. (2011). Creating Comparable Multimodal Corpora for Nordic Languages. Proceedings of the 18th Nordic Conference of Computational Linguistics, Riga, Latvia, May 11-13. NEALT, pp. 153-160. See http://dspace.utlib.ee/dspace/handle/10062/16955

Nivre, J. et al. (2001). Göteborg Transcription Standard (GTS) 6.4. University of Gothenburg, Department of Linguistics.

Nivre, J. et al. (2004). Modified Standard Orthography (MSO). University of Gothenburg, Department of Linguistics.

Oparin, I., Kiselev, V. & Talanov, A. (2008). Large Scale Russian Hybrid Unit Selection TTS. SLTC-08, Stockholm.
