Potentials for ASR based on Multiple Acoustic Models and Model Selection using Standard Speech Features

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
09/26/2012 - 09/28/2012 at Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4
Language: English
Type: PDF


Authors:
Winkler, Thomas (Fraunhofer IAIS, 53757 Sankt Augustin, Germany)
Stein, Daniel; Bardeli, Rolf; Schneider, Daniel; Köhler, Joachim (Fraunhofer IAIS, 53757 Sankt Augustin, Germany)

Abstract:
Acoustic modelling is a key issue for successful automatic speech recognition (ASR). ASR systems are usually adapted to a particular use case by training robust acoustic models on in-domain speech data recorded under conditions typical for that use case. Varying conditions therefore require either multi-conditional acoustic models or multiple acoustic models. In this work, we present a multi-model approach that copes with various acoustic conditions. For each utterance, the best-matching set of acoustic models is selected based on acoustic information from the same acoustic features and acoustic models that are used for ASR. Our initial experiments show that we achieve results comparable to a manual selection of the acoustic models, but that we are still slightly outperformed by multi-conditional models with a comparable number of mixtures. We further show that an ideal selection would indeed improve the results compared to multi-conditional models.
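
The abstract does not state which criterion is used to pick the model set for an utterance. As a rough illustration only, the sketch below assumes one plausible criterion: each condition is represented by a diagonal-covariance Gaussian mixture over the ASR feature vectors, and the condition with the highest average per-frame log-likelihood wins. The function names, the model names "clean" and "noisy", and all parameter values are hypothetical and are not taken from the paper.

```python
import numpy as np


def gmm_avg_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM.

    frames: (T, D) feature vectors, weights: (K,), means and variances: (K, D).
    """
    diff = frames[:, None, :] - means[None, :, :]                      # (T, K, D)
    log_norm = -0.5 * np.sum(np.log(2.0 * np.pi * variances), axis=1)  # (K,)
    log_comp = log_norm[None, :] - 0.5 * np.sum(
        diff ** 2 / variances[None, :, :], axis=2
    )                                                                  # (T, K)
    log_weighted = log_comp + np.log(weights)[None, :]
    # Numerically stable log-sum-exp over the mixture components.
    peak = log_weighted.max(axis=1, keepdims=True)
    frame_ll = peak[:, 0] + np.log(np.sum(np.exp(log_weighted - peak), axis=1))
    return float(frame_ll.mean())


def select_model(frames, models):
    """Return the best-scoring model name and the scores of all models.

    models maps a name to a (weights, means, variances) tuple.
    The selection rule (maximum average log-likelihood) is an assumption.
    """
    scores = {
        name: gmm_avg_loglik(frames, *params) for name, params in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy data: hypothetical 4-dimensional features and two single-component models.
rng = np.random.default_rng(0)
dim = 4
models = {
    "clean": (np.array([1.0]), np.zeros((1, dim)), np.ones((1, dim))),
    "noisy": (np.array([1.0]), np.full((1, dim), 3.0), np.ones((1, dim))),
}
utterance = rng.normal(loc=3.0, scale=1.0, size=(50, dim))
best, scores = select_model(utterance, models)
print(best, scores)
```

In a setup like this, the selected name would then decide which set of acoustic models the recognizer loads for that utterance. The "ideal selection" mentioned in the abstract would correspond to an oracle that always picks the set giving the lowest recognition error, which a likelihood-based rule like the one above is not guaranteed to match.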