Audio-Visual Speech Recognition for Uncertain Acoustical Observations

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
09/26/2012 - 09/28/2012 in Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4
Language: English
Type: PDF

Abdelaziz, Ahmed Hussen; Zeiler, Steffen; Kolossa, Dorothea (Digital Signal Processing Group, Institute of Communication Acoustics, Ruhr-Universität Bochum, 44801 Bochum, Germany)

speech recognition is still a challenging problem. To address this issue, so-called uncertainty-of-observation techniques can be used, either for audio-only or for audio-visual speech recognition. Among the many established uncertainty-of-observation strategies, two of the computationally least expensive are uncertainty decoding and modified imputation. Improvements over these two standard approaches are possible with a new technique that combines model-based speech estimation with dynamic variance compensation and carries little computational overhead. This new approach, significance decoding, has previously been applied only in unimodal speech recognition. In this paper, it is applied to coupled-HMM-based audio-visual speech recognition and is shown to clearly outperform both modified imputation and uncertainty decoding in handling acoustic uncertainty for both