Robust Online Multi-Channel Speech Recognition

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
10/05/2016 - 10/07/2016 in Paderborn, Germany

Proceedings: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Kitza, Markus; Zeyer, Albert; Schlueter, Ralf (Human Language Technology and Pattern Recognition, RWTH Aachen, 52074 Aachen, Germany)
Heymann, Jahn; Haeb-Umbach, Reinhold (Department of Communications Engineering Paderborn, Paderborn University, 33098 Paderborn, Germany)

Abstract:
In this paper, we present a system for robust online far-field multi-channel speech recognition that makes minimal assumptions about microphone configuration and target location. In the front-end, we employ an online-enabled Generalized Eigenvalue (GEV) beamformer together with a Long Short-Term Memory (LSTM) network that is used to robustly calculate the signal statistics necessary for the beamforming operation. After the multiple channels have been condensed into one, a Bidirectional Long Short-Term Memory (BLSTM) acoustic model is applied to a running window of input speech, which enables online decoding in combination with the beamforming front-end. To assess the performance of the system, we test it on the real evaluation set of the CHiME 3 data, where we achieve a Word Error Rate (WER) of 10.4%.
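
To make the beamforming step in the abstract more concrete, the following is a minimal sketch of a mask-based GEV beamformer for a single frequency bin, written with NumPy. It is an offline illustration under stated assumptions, not the online variant or the implementation described in the paper: the function names (`psd_matrix`, `gev_beamformer`, `apply_beamformer`), the diagonal-loading constant `eps`, and the use of externally supplied speech and noise masks (standing in for the LSTM output) are all hypothetical choices made for this example.

```python
import numpy as np


def psd_matrix(obs, mask):
    """Mask-weighted spatial covariance matrix for one frequency bin.

    obs:  complex array of shape (channels, frames)
    mask: real array in [0, 1] of shape (frames,)
    """
    weighted = obs * mask  # the mask broadcasts over the channel axis
    return weighted @ obs.conj().T / max(mask.sum(), 1e-10)


def gev_beamformer(phi_speech, phi_noise, eps=1e-6):
    """Principal generalized eigenvector of the pair (phi_speech, phi_noise).

    The returned vector w maximizes (w^H phi_speech w) / (w^H phi_noise w).
    """
    channels = phi_noise.shape[0]
    # Diagonal loading keeps the noise matrix well conditioned
    # (an assumption of this sketch, not taken from the paper).
    load = eps * np.trace(phi_noise).real / channels
    phi_noise = phi_noise + load * np.eye(channels)

    # Whiten with the Cholesky factor (phi_noise = L L^H) so that the
    # generalized problem becomes an ordinary Hermitian eigenproblem.
    chol_inv = np.linalg.inv(np.linalg.cholesky(phi_noise))
    whitened = chol_inv @ phi_speech @ chol_inv.conj().T
    whitened = (whitened + whitened.conj().T) / 2  # enforce Hermitian symmetry

    # eigh returns eigenvalues in ascending order, so the last column
    # belongs to the largest eigenvalue.
    _, vecs = np.linalg.eigh(whitened)
    return chol_inv.conj().T @ vecs[:, -1]


def apply_beamformer(w, obs):
    """Condense the channels to one: y[t] = w^H x[t]."""
    return w.conj() @ obs
```

In a full system, these functions would be evaluated independently for every frequency bin of the short-time Fourier transform, with the masks estimated by a neural network rather than given. How the statistics are accumulated over time in the online setting is not specified in the abstract and is left out of this sketch.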