Training Deep Neural Networks for Reverberation Robust Speech Recognition

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
10/05/2016 - 10/07/2016 at Paderborn, Germany

Proceedings: Speech Communication

Pages: 5
Language: English
Type: PDF


Authors:
Ritter, Marvin; Mueller, Markus; Stueker, Sebastian; Metze, Florian; Waibel, Alex (Institute for Anthropomatics, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany)

Abstract:
Recently, hybrid systems of deep neural networks (DNNs) and hidden Markov models (HMMs) have shown state-of-the-art results on various speech recognition tasks. The best results were achieved by training large neural networks (NNs) on huge data sets (≥ 2000 h [11, 16, 20]). The required training data is often generated using various data augmentation methods. We show that a simple approach using room impulse responses (RIRs) can be used to train systems that are more robust to reverberation. The method requires neither multiple microphones nor complex signal processing techniques. On a test set simulating large rooms, we show an improvement in word error rate (WER) from 59.7% down to 41.9%. For known lecture rooms with varying microphone positions, the approach can be used to train the system for that specific environment. We compare systems trained with RIRs from a single room, from multiple rooms, and from simulated rooms.
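The abstract does not spell out how the RIRs are applied. A common way to use RIRs for data augmentation is to convolve each clean training utterance with an impulse response drawn from a pool. The Python sketch that follows illustrates that idea under stated assumptions and is not the authors' pipeline. The helpers `synthetic_rir`, `reverberate`, `augment` and `relative_wer_reduction` are hypothetical names. `synthetic_rir` uses exponentially decaying Gaussian noise as a crude stand-in for measured or simulated RIRs (the paper's room simulation is not described here), and rescaling the output to the clean signal's RMS level is an assumption. The last helper checks the reported figures: going from 59.7% to 41.9% WER is a relative reduction of about 29.8%.

```python
import numpy as np


def synthetic_rir(fs: int, rt60: float, length_s: float, rng: np.random.Generator) -> np.ndarray:
    """Hypothetical stand-in for a measured/simulated room impulse response."""
    n = int(fs * length_s)
    t = np.arange(n) / fs
    # Energy decays by 60 dB over rt60 seconds, so the amplitude envelope is 10^(-3 t / rt60).
    envelope = 10.0 ** (-3.0 * t / rt60)
    # Diffuse tail modelled as decaying Gaussian noise (0.1 is an arbitrary, illustrative scale).
    rir = 0.1 * rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct-path impulse
    return rir / np.max(np.abs(rir))  # normalise peak to 1


def reverberate(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve clean speech with an RIR, keep the original length, restore the RMS level."""
    wet = np.convolve(clean, rir)[: len(clean)]
    clean_rms = np.sqrt(np.mean(clean ** 2))
    wet_rms = np.sqrt(np.mean(wet ** 2))
    return wet * (clean_rms / max(wet_rms, 1e-12))


def augment(utterances, rir_pool, rng: np.random.Generator):
    """Reverberate each utterance with an RIR drawn uniformly at random from the pool."""
    out = []
    for x in utterances:
        rir = rir_pool[rng.integers(len(rir_pool))]
        out.append(reverberate(x, rir))
    return out


def relative_wer_reduction(baseline: float, improved: float) -> float:
    """Relative WER reduction, e.g. 59.7% -> 41.9% WER."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    fs = 16000
    rng = np.random.default_rng(0)
    # Pool of RIRs with different reverberation times (values are illustrative only).
    pool = [synthetic_rir(fs, rt60, 1.0, rng) for rt60 in (0.3, 0.6, 0.9)]
    clean = rng.standard_normal(fs)  # 1 s of placeholder signal instead of real speech
    reverberant = augment([clean], pool, rng)
    print(len(reverberant[0]))  # → 16000
    print(f"{relative_wer_reduction(59.7, 41.9):.1%}")  # → 29.8%
```

Drawing a fresh RIR per utterance spreads the training data over many acoustic conditions. With a pool of RIRs measured in a single known lecture room, the same loop would instead specialize the model to that room, which corresponds to the single-room versus multiple-room comparison the abstract mentions.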