Missing Feature Audiovisual Speech Recognition under Real-Time Constraints

Conference: Sprachkommunikation 2010 - 9. ITG-Fachtagung
10/06/2010 - 10/08/2010 in Bochum, Germany

Proceedings: Sprachkommunikation 2010

Pages: 4
Language: English
Type: PDF

Authors:
Kolossa, Dorothea; Astudillo, Ramón Fernandez; Zeiler, Steffen; Vorwerk, Alexander; Lerch, Dennis; Chong, Jike; Orglmeister, Reinhold
Affiliations: Electronics and Medical Signal Processing Group, TU Berlin, 10587 Berlin, Germany; Parallel Computing Laboratory, UC Berkeley, CA 94704, USA

Abstract:
Speech recognition under very noisy conditions can profit greatly from the addition of a second modality. This is of interest for a wide range of applications, but we focus on command-and-control tasks, where only a small vocabulary is needed but must be recognized correctly even at negative signal-to-noise ratios (SNRs). For this purpose, a robust audio front-end and a video front-end have been extended to provide real-valued and binary estimates of feature reliability, respectively. This information has been integrated into an audiovisual recognizer that performs robustly, without any additional parameter tuning, across a wide range of SNRs, speakers, and noise conditions. To make the method extensible to larger vocabularies and more complex models, the likelihood computation has been parallelized, yielding a measured speedup of 7.5x on an NVIDIA GT 285 processor compared to a sequential version running on a Core i7 processor at 2.67 GHz.
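
As a rough illustration of how binary reliability estimates can enter a likelihood computation, the sketch below scores one feature frame against several states while skipping features flagged as unreliable (marginalization). It is not taken from the paper: the diagonal-Gaussian state model, the function name `masked_log_likelihood`, and all parameter names are assumptions made for this example, and the abstract does not say which missing-feature technique the authors use.

```python
import math


def masked_log_likelihood(x, mask, means, variances):
    """Score one frame against every state, ignoring unreliable features.

    Assumed model (not from the paper): each state is a single
    diagonal-covariance Gaussian. `mask[d]` is 1 if feature d is
    considered reliable and 0 otherwise.
    """
    scores = []
    for mu, var in zip(means, variances):
        total = 0.0
        for x_d, m_d, mu_d, var_d in zip(x, mask, mu, var):
            if m_d:
                # Log density of a 1-D Gaussian for a reliable feature.
                total += -0.5 * (
                    math.log(2.0 * math.pi * var_d) + (x_d - mu_d) ** 2 / var_d
                )
            # Unreliable features add nothing: they are integrated out.
        scores.append(total)
    return scores


# Hypothetical two-state, two-feature example; the second feature is
# heavily corrupted and flagged as unreliable.
frame = [0.0, 100.0]
reliability = [1, 0]
state_means = [[0.0, 0.0], [1.0, 0.0]]
state_vars = [[1.0, 1.0], [1.0, 1.0]]
log_likes = masked_log_likelihood(frame, reliability, state_means, state_vars)
```

Each state's score depends only on the frame and that state's parameters, so the outer loop has no dependencies between iterations. That independence is the kind of structure a parallel implementation can exploit, although the parallelization scheme actually used in the paper is not described in the abstract.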