Switching Linear Dynamic Models for Recognition of Emotionally Colored and Noisy Speech

Conference: Sprachkommunikation 2010 - 9. ITG-Fachtagung
10/06/2010 - 10/08/2010 in Bochum, Germany

Proceedings: Sprachkommunikation 2010

Pages: 4
Language: English
Type: PDF

Wöllmer, Martin; Klebert, Nikolaj; Schuller, Björn (Technische Universität München, Institute for Human-Machine Communication, Theresienstr. 90, 80333 München, Germany)

Model-based speech feature enhancement techniques have been shown to be a promising approach to increasing the robustness of automatic speech recognition (ASR) in noisy environments. Strategies that model speech with a Switching Linear Dynamic Model (SLDM) have been successfully applied to noisy speech recognition tasks, since they overcome the limitations of GMM- or HMM-based approaches. However, SLDM-based feature enhancement has so far only been investigated for the recognition of isolated words or relatively friendly scenarios such as connected digit recognition in the presence of additive noise using whole-word models (e.g. the AURORA task). To give an impression of the effectiveness of SLDM speech modeling for more challenging ASR applications, we evaluate SLDM feature enhancement for continuous recognition of spontaneous and emotionally colored speech in noise. As the back end, we use tied-state triphone models trained and evaluated on the SAL Corpus. Applying SLDM-based feature enhancement, we achieve an average relative performance gain of almost 20% across diverse noise settings.