Wavelet-Based Time-Frequency Representations for Automatic Recognition of Emotions from Speech

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
05.10.2016 - 07.10.2016 in Paderborn, Germany

Proceedings: Speech Communication

Pages: 5
Language: English
Type: PDF


Authors:
Vasquez-Correa, J. C.; Arias-Vergara, T.; Vargas-Bonilla, J. F. (Faculty of Engineering, Universidad de Antioquia UdeA, Calle 70 No. 52-21, Medellin, Colombia)
Orozco-Arroyave, J. R. (Faculty of Engineering, Universidad de Antioquia UdeA, Calle 70 No. 52-21, Medellin, Colombia & Pattern Recognition Lab., Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany)
Noeth, E. (Pattern Recognition Lab., Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany)

Abstract:
Interest in emotion recognition from speech has increased in the last decade. Emotion recognition can improve the quality of services and people's quality of life. One of the main problems in emotion recognition from speech is finding suitable features to represent the phenomenon. This paper proposes new features based on the energy content of wavelet-based time-frequency (TF) representations to model emotional speech. Three TF representations are considered: (1) the continuous wavelet transform, (2) the bionic wavelet transform, and (3) the synchro-squeezed wavelet transform. Classification is performed using GMM supervectors. Different classification problems are addressed, including high vs. low arousal, positive vs. negative valence, and multiple emotions. The results indicate that the proposed features are useful for classifying high vs. low arousal emotions, and that the features derived from the synchro-squeezed wavelet transform are more suitable than the other two approaches for modeling emotional speech.
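
As a rough illustration of the kind of features the abstract describes, the sketch below computes band-wise log energies from the continuous wavelet transform of a speech frame. This is not the authors' implementation: the choice of the PyWavelets complex Morlet wavelet ('cmor1.5-1.0'), the scale range, and the number of energy bands are assumptions made only for this example.

```python
# Minimal sketch (assumed parameters, not the authors' setup) of energy-based
# features from a continuous wavelet transform of a speech frame.
import numpy as np
import pywt


def cwt_energy_features(signal, fs, n_scales=64, n_bands=8, wavelet='cmor1.5-1.0'):
    """Return log energy per frequency band of the CWT scalogram."""
    scales = np.arange(1, n_scales + 1)
    coeffs, _ = pywt.cwt(signal, scales, wavelet, sampling_period=1.0 / fs)
    scalogram = np.abs(coeffs) ** 2               # energy per scale and time step
    # Group the scales into contiguous bands and sum the energy in each band
    bands = np.array_split(scalogram, n_bands, axis=0)
    energies = np.array([band.sum() for band in bands])
    return np.log(energies + 1e-10)               # log compression for stability


# Example: features for a 25 ms frame of 16 kHz speech (random placeholder data)
fs = 16000
frame = np.random.randn(int(0.025 * fs))
features = cwt_energy_features(frame, fs)
print(features.shape)  # (8,)
```

Feature vectors of this kind, computed per frame, would then feed the GMM-supervector classification stage mentioned in the abstract.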