On the Effects of Speaker Gender in Emotion Recognition Training Data

Conference: Speech Communication - 13th ITG Conference on Speech Communication
10.10.2018 - 12.10.2018 in Oldenburg, Germany

Proceedings: ITG-Fb. 282: Speech Communication

Pages: 5
Language: English
Type: PDF


Xu, Ziyi; Meyer, Patrick; Fingscheidt, Tim (Institute for Communications Technology, Technische Universität Braunschweig, Braunschweig, Germany)

Gender effects in automatic speech emotion recognition (SER) have been studied in the past. However, it is still somewhat unclear to what degree speaker gender influences SER. It is usually assumed that a “gender-dependent” emotion recognizer, trained on gender-dependent emotional speech data, performs better than a “gender-independent” one; for our SER model, we show this assumption to be wrong. In this paper, we use a state-of-the-art emotion recognizer model to investigate the effects of speaker gender in training and in test. As expected, the gender-specific SER models perform well under matched-gender test conditions. Training jointly on male and female speakers, with approximately half of the training data taken from each gender, performs about as well as separate gender-specific training with matched-gender testing, given the same total amount of training data. However, this holds only on average over male and female speakers: interestingly, we show that emotions in female voices are recognized better by a mixed-gender SER model than by a female SER model, which may indicate that female speakers express emotions with greater variety than male speakers.
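The training-set design described above can be illustrated with a minimal Python sketch. It assumes a corpus of utterance records tagged with speaker gender, and it follows one reading of the abstract: the mixed-gender set draws about half of a fixed budget from each gender, so all training sets have the same total size. The function name `build_training_sets`, the record layout, the `budget` parameter, and the `CONDITIONS` grid are hypothetical and not taken from the paper. The sketch only builds the data splits and the train/test grid; the emotion recognizer itself is out of scope.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical record layout (not from the paper): each utterance is a dict
# with at least a "gender" key set to "f" or "m".
Utterance = Dict[str, str]

# Train/test grid: every training condition is tested on female and on male
# test data; "matched" means the training and test genders coincide.
CONDITIONS: List[Tuple[str, str]] = [
    (train, test) for train in ("female", "male", "mixed") for test in ("female", "male")
]


def build_training_sets(
    corpus: List[Utterance], budget: int, seed: int = 0
) -> Dict[str, List[Utterance]]:
    """Draw female-only, male-only, and mixed-gender training sets of equal size.

    Assumption: the mixed set takes about half of `budget` from each gender,
    so all three sets contain the same total amount of training data.
    """
    rng = random.Random(seed)  # seeded for reproducible splits
    female = [u for u in corpus if u["gender"] == "f"]
    male = [u for u in corpus if u["gender"] == "m"]
    if len(female) < budget or len(male) < budget:
        raise ValueError("each gender needs at least `budget` utterances")
    n_female = budget // 2          # female share of the mixed set
    n_male = budget - n_female      # male share absorbs an odd remainder
    return {
        "female": rng.sample(female, budget),
        "male": rng.sample(male, budget),
        "mixed": rng.sample(female, n_female) + rng.sample(male, n_male),
    }


if __name__ == "__main__":
    # Toy corpus with 10 utterances per gender (placeholder data, no emotion labels).
    corpus = [{"id": f"f{i}", "gender": "f"} for i in range(10)]
    corpus += [{"id": f"m{i}", "gender": "m"} for i in range(10)]
    sets = build_training_sets(corpus, budget=6)
    print({name: len(data) for name, data in sets.items()})  # {'female': 6, 'male': 6, 'mixed': 6}
    print([c for c in CONDITIONS if c[0] == c[1]])  # [('female', 'female'), ('male', 'male')]
```

Keeping the budget fixed across all three sets is what makes the comparison fair in this sketch: any difference between the mixed-gender model and the gender-specific models then cannot be attributed to one of them simply seeing more data.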