New Insights into Turbo-Decoding-Based AVSR with Dynamic Stream Weights

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
05.10.2016 - 07.10.2016 in Paderborn, Germany

Proceedings: ITG-Fb. 267: Speech Communication

Pages: 5
Language: English
Type: PDF


Gergen, Sebastian; Zeiler, Steffen; Kolossa, Dorothea (Cognitive Signal Processing Group, Institute of Communication Acoustics, Ruhr-Universität Bochum, Germany)
Abdelaziz, Ahmed Hussen (International Computer Science Institute, Berkeley, USA)

Automatic speech recognition (ASR) is an important component of many technical systems, enabling seamless and intuitive human-machine interaction. However, the accuracy of ASR drops considerably when speech signals are degraded, e.g., by noise or reverberation. Introducing a second signal stream that is not affected by these degradations, e.g., a video stream, makes ASR more robust against them. To incorporate information from different signal streams into an ASR system, different decoding strategies have been proposed; in particular, turbo-decoding (TD) based ASR with optimally weighted streams has yielded high recognition rates even in severe acoustic conditions. The development of an algorithm for estimating these optimal dynamic stream weights (DSWs) has opened up new research questions, e.g., regarding the characteristics of DSWs and the evaluation of the recognition accuracy over the iterations of TD, which are covered in this contribution.