Hao Zhang
School of Engineering, The University of Tokyo, Japan
Shin’ichi Warisawa
School of Engineering, The University of Tokyo, Japan
Ichiro Yamada
School of Engineering, The University of Tokyo, Japan
In: KEER2014. Proceedings of the 5th Kansei Engineering and Emotion Research International Conference, Linköping, Sweden, June 11-13
Linköping Electronic Conference Proceedings 100:4, pp. 39-49
Published: 2014-06-11
ISBN: 978-91-7519-276-5
ISSN: 1650-3686 (print), 1650-3740 (online)
This paper proposes a purely segment-level approach to speech emotion recognition that entirely abandons utterance-level features and instead focuses on better extracting the emotional information carried by a number of selected segments within each utterance. We designed two segment selection approaches, miSATIR and crSATIR, based on information theory and on correlation coefficients respectively, to choose the utterance segments whose features are extracted, thereby realizing the purely segment-level concept of the model. After clarifying the appropriate time interval for the segments, we established a model using the speech frames of the selected segments. Tests on a 50-person emotional speech database designed specifically for this research showed significant improvements in average accuracy (more than 20%) over existing approaches that use the information of entire utterances. Test results based on speech signals elicited by the International Affective Picture System (IAPS) database showed that the proposed method can also be used in emotion strength analyses.
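Only the abstract is reproduced above, so the Python sketch below is a non-authoritative illustration of the two segment-selection ideas it names: ranking candidate segments by mutual information with the emotion labels (miSATIR-like) or by Pearson correlation with an emotion-class centroid (crSATIR-like). The function names, feature shapes, scoring heuristics, and top-k rule are all assumptions for illustration, not the authors' implementation.

```python
# Hypothetical sketch of segment selection in the spirit of miSATIR
# (mutual information; cf. Shannon, 2001; Battiti, 1994) and crSATIR
# (Pearson correlation; cf. Pearson, 1895). Not the authors' code.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def rank_segments_by_mi(seg_feats, seg_labels, top_k):
    """miSATIR-like heuristic: prefer segments that stand out along
    label-informative feature dimensions.

    seg_feats: (n_segments, n_features) acoustic features per segment,
               e.g. MFCCs (Davis & Mermelstein, 1980).
    seg_labels: (n_segments,) emotion label inherited from each
                segment's utterance.
    """
    # Mutual information between each feature and the emotion labels.
    mi = mutual_info_classif(seg_feats, seg_labels, random_state=0)
    # Score a segment by its MI-weighted deviation from the corpus mean,
    # so segments differing most along informative features rank highest.
    deviation = np.abs(seg_feats - seg_feats.mean(axis=0))
    scores = deviation @ mi
    return np.argsort(scores)[::-1][:top_k]

def rank_segments_by_corr(seg_feats, seg_labels, top_k):
    """crSATIR-like heuristic: correlate each segment with the mean
    feature vector (centroid) of its emotion class."""
    scores = np.empty(len(seg_feats))
    for label in np.unique(seg_labels):
        idx = np.flatnonzero(seg_labels == label)
        centroid = seg_feats[idx].mean(axis=0)
        for i in idx:
            # Pearson correlation between segment and class centroid.
            scores[i] = np.corrcoef(seg_feats[i], centroid)[0, 1]
    return np.argsort(scores)[::-1][:top_k]

# Toy usage: 200 random segments, 12-dimensional features, 4 classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 12))
y = rng.integers(0, 4, size=200)
print(rank_segments_by_mi(X, y, top_k=10))
print(rank_segments_by_corr(X, y, top_k=10))
```

In the paper's pipeline, features from the retained segments would then feed a classifier; the scoring rules above are stand-ins for the actual miSATIR/crSATIR criteria, which are defined in the full text.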
Atal, B. S., & Hanauer, S. L. (1971). Speech analysis and synthesis by linear prediction of the speech wave. The Journal of the Acoustical Society of America, 50, 637.
Battiti, R. (1994). Using mutual information for selecting features in supervised neural net learning. IEEE Transactions on Neural Networks, 5(4), 537-550.
Chandaka, S., Chatterjee, A., & Munshi, S. (2009). Support vector machines employing cross-correlation for emotional speech recognition. Measurement, 42(4), 611-618.
Davis, S., & Mermelstein, P. (1980). Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Transactions on Acoustics, Speech, and Signal Processing, 28(4), 357-366.
Jaynes, E. T. (1957). Information theory and statistical mechanics. Physical Review, 106(4), 620.
Kim, E. H., Hyun, K. H., Kim, S. H., & Kwak, Y. K. (2009). Improved emotion recognition with a novel speaker-independent feature. IEEE/ASME Transactions on Mechatronics, 14(3), 317-325.
Lang, P. J., Bradley, M. M., & Cuthbert, B. N. (1999). International affective picture system (IAPS): Technical manual and affective ratings. Gainesville, FL: The Center for Research in Psychophysiology, University of Florida.
Morrison, D., Wang, R., & De Silva, L. C. (2007). Ensemble methods for spoken emotion recognition in call-centres. Speech Communication, 49(2), 98-112.
Pearson, K. (1895). Note on regression and inheritance in the case of two parents. Proceedings of the Royal Society of London, 58(347-352), 240-242.
Picard, R. W. (2000). Affective computing. Cambridge, MA: MIT Press.
Qi-Rong, M., & Zhan, Y.-z. (2010). A novel hierarchical speech emotion recognition method based on improved DDAGSVM. Computer Science and Information Systems/ComSIS, 7(1), 211-222.
Schuller, B., & Rigoll, G. (2006). Timing levels in segment-based speech emotion recognition. Paper presented at INTERSPEECH, Pittsburgh, Pennsylvania, USA.
Shannon, C. E. (2001). A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1), 3-55.
Specht, D. F. (1990). Probabilistic neural networks. Neural Networks, 3(1), 109-118.
Steuer, R., Kurths, J., Daub, C. O., Weise, J., & Selbig, J. (2002). The mutual information: detecting and evaluating dependencies between variables. Bioinformatics, 18(suppl 2), S231-S240.
Ververidis, D., & Kotropoulos, C. (2006). Emotional speech recognition: Resources, features, and methods. Speech Communication, 48(9), 1162-1181.
Yeh, J.-H., Pao, T.-L., Lin, C.-Y., Tsai, Y.-W., & Chen, Y.-T. (2011). Segment-based emotion recognition from continuous Mandarin Chinese speech. Computers in Human Behavior, 27(5), 1545-1552.
Yu, F. B. J. Y. Y., & Xu, D. (2007). Decision templates ensemble and diversity analysis for segment-based speech emotion recognition. Paper presented at ISKE 2007, San Diego, CA, USA.