Audio-Visual Speech Recognition for Uncertain Acoustical Observations

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
26-28 September 2012 in Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4
Language: English
Type: PDF

Abdelaziz, Ahmed Hussen; Zeiler, Steffen; Kolossa, Dorothea (Digital Signal Processing Group, Institute of Communication Acoustics, Ruhr-Universität Bochum, 44801 Bochum, Germany)

speech recognition is still a challenging problem. To address this issue, so-called uncertainty-of-observation techniques can be used for either audio-only or audio-visual speech recognition. Of the many established uncertainty-of-observation strategies, two of the computationally least expensive are uncertainty decoding and modified imputation. Improvements over these two standard approaches are possible with a new technique, significance decoding, which combines model-based speech estimation with dynamic variance compensation and carries little computational overhead. Significance decoding has previously been applied only in unimodal speech recognition. In this paper, it is applied to coupled-HMM-based audio-visual speech recognition, where it is shown to clearly outperform modified imputation and uncertainty decoding in handling acoustic uncertainty for both
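
The three strategies named in the abstract can be contrasted in a minimal sketch. It is not taken from the paper: it assumes a single scalar feature, one Gaussian per HMM state, and a Gaussian description of the uncertain observation (mean obs_mean, variance obs_var). The function names and the exact form of significance decoding shown here (likelihood evaluated at the posterior mean, with the posterior variance added to the model variance) are assumptions based on the abstract's description of "model-based speech estimation with dynamic variance compensation".

```python
import math


def log_gauss(x, mean, var):
    """Log density of a scalar Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def posterior(obs_mean, obs_var, state_mean, state_var):
    """Posterior mean and variance of the clean feature, given the uncertain
    observation and the state's Gaussian (product of two Gaussians)."""
    total = state_var + obs_var
    post_mean = (state_var * obs_mean + obs_var * state_mean) / total
    post_var = state_var * obs_var / total
    return post_mean, post_var


def uncertainty_decoding(obs_mean, obs_var, state_mean, state_var):
    # Keep the observed mean; widen the model variance by the observation variance.
    return log_gauss(obs_mean, state_mean, state_var + obs_var)


def modified_imputation(obs_mean, obs_var, state_mean, state_var):
    # Replace the observation by a state-dependent estimate; keep the model variance.
    post_mean, _ = posterior(obs_mean, obs_var, state_mean, state_var)
    return log_gauss(post_mean, state_mean, state_var)


def significance_decoding(obs_mean, obs_var, state_mean, state_var):
    # Assumed form: state-dependent estimate plus compensation by the posterior variance.
    post_mean, post_var = posterior(obs_mean, obs_var, state_mean, state_var)
    return log_gauss(post_mean, state_mean, state_var + post_var)


if __name__ == "__main__":
    # Hypothetical values: an unreliable observation far from the state mean.
    args = dict(obs_mean=3.0, obs_var=2.0, state_mean=0.0, state_var=1.0)
    for fn in (uncertainty_decoding, modified_imputation, significance_decoding):
        print(fn.__name__, fn(**args))
```

When obs_var is zero, all three scores reduce to the ordinary state log-likelihood of the observation, which is a quick sanity check on any implementation along these lines.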