Training Deep Neural Networks for Reverberation Robust Speech Recognition

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation (12th ITG Conference on Speech Communication)
5-7 October 2016 in Paderborn, Germany

Proceedings: ITG-Fb. 267: Speech Communication

Pages: 5
Language: English
Type: PDF


Ritter, Marvin; Mueller, Markus; Stueker, Sebastian; Metze, Florian; Waibel, Alex (Institute for Anthropomatics, Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany)

Recently, hybrid systems of deep neural networks (DNNs) and hidden Markov models (HMMs) have shown state-of-the-art results on various speech recognition tasks. The best results were achieved by training large neural networks (NNs) on very large data sets (≥ 2000 h [11, 16, 20]). The required training data is often generated using various methods of data augmentation. We show that a simple approach using room impulse responses (RIRs) can be used to train systems that are more robust to reverberation. The method requires neither multiple microphones nor complex signal processing techniques. On a test set simulating large rooms, we show an improvement from 59.7% word error rate (WER) down to 41.9%. For known lecture rooms with varying microphone positions, the approach can be used to train the system for that specific environment. We compare systems trained with RIRs from a single room, from multiple rooms, and from simulated rooms.
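The core augmentation idea can be illustrated as follows. Clean training audio is convolved with an RIR so that it sounds as if it were recorded in a reverberant room. This is a minimal NumPy sketch under several assumptions, and it is not the authors' implementation. The helper names `synthetic_rir`, `reverberate`, and `augment` are hypothetical. The 16 kHz sample rate and the peak-level normalization are assumptions. Drawing one RIR at random per utterance is also an assumption. The "simulated room" is a crude stand-in: exponentially decaying Gaussian noise parameterized by a reverberation time (RT60), not a geometric room simulation. In this setup, training on one room, several rooms, or simulated rooms would differ only in which RIRs go into the pool.

```python
import numpy as np


def synthetic_rir(rt60: float, sample_rate: int = 16000, rng=None) -> np.ndarray:
    """Hypothetical stand-in for a simulated room (an assumption, not the
    paper's simulator): Gaussian noise with an exponential amplitude decay
    that reaches -60 dB after rt60 seconds."""
    rng = np.random.default_rng() if rng is None else rng
    n = int(rt60 * sample_rate)
    t = np.arange(n) / sample_rate
    # Amplitude exp(-3 ln(10) t / rt60) equals 10^-3 (i.e. -60 dB) at t = rt60.
    envelope = np.exp(-3.0 * np.log(10.0) * t / rt60)
    rir = rng.standard_normal(n) * envelope
    rir[0] = 1.0  # direct-path impulse at time zero
    return rir / np.max(np.abs(rir))  # peak-normalize the RIR


def reverberate(speech: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Convolve clean speech with an RIR, keeping the original length."""
    # Linear (not circular) convolution via zero-padded FFTs.
    n = len(speech) + len(rir) - 1
    nfft = 1 << (n - 1).bit_length()  # next power of two >= n
    spectrum = np.fft.rfft(speech, nfft) * np.fft.rfft(rir, nfft)
    wet = np.fft.irfft(spectrum, nfft)[: len(speech)]  # drop the reverb tail
    # Assumed normalization: match the peak level of the clean input.
    peak = np.max(np.abs(wet))
    if peak > 0:
        wet = wet * (np.max(np.abs(speech)) / peak)
    return wet


def augment(utterances, rir_pool, rng=None):
    """Reverberate each utterance with one RIR drawn at random from the pool
    (e.g. RIRs measured in one room, in several rooms, or simulated)."""
    rng = np.random.default_rng() if rng is None else rng
    out = []
    for utt in utterances:
        rir = rir_pool[rng.integers(len(rir_pool))]
        out.append(reverberate(utt, rir))
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sr = 16000
    # Placeholder one-second "utterances"; real use would load clean speech.
    clean = [rng.standard_normal(sr) for _ in range(3)]
    pool = [synthetic_rir(rt, sr, rng) for rt in (0.3, 0.6, 0.9)]
    reverberant = augment(clean, pool, rng)
    print([len(x) for x in reverberant])  # prints [16000, 16000, 16000]
```

Truncating the output to the input length keeps the frame-level alignments of the clean utterance usable for the reverberated copy. That is a common reason for this design choice, but it is an assumption here. Measured RIRs from a specific lecture room could be placed in `rir_pool` in place of the synthetic ones.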