Dereverberation Preprocessing and Training Data Adjustments for Robust Speech Recognition in Reverberant Environments
Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
09/26/2012 - 09/28/2012 in Braunschweig, Germany
Pages: 4
Language: English
Type: PDF
Schmid, Dominic; Thüne, Philipp; Kolossa, Dorothea; Enzner, Gerald (Institute of Communication Acoustics, Ruhr-Universität Bochum, 44780 Bochum, Germany)
The performance of automatic speech recognition (ASR) systems degrades severely in distant-recording situations, where the clean speech signal is distorted by reverberation and background noise. This paper evaluates and compares two strategies for improving ASR accuracy in such conditions. First, we examine different preprocessing approaches applied prior to ASR that aim to estimate the undistorted speech signal. In particular, we compare the large-vocabulary speech recognition results of a recently proposed multichannel dereverberation algorithm with the results obtained using the output of an ideal delay-and-sum beamformer (DSB) and a simulated late-reverberation suppression (LRS). As a second strategy, we consider an adjusted ASR training in which certain distortions are incorporated into the training data. By evaluating results for clean, matched, and mismatched training cases, we finally quantify the improvements that can be expected from dereverberation preprocessing and specifically adjusted training sets.
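For readers unfamiliar with the delay-and-sum beamformer named above, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes that the per-channel propagation delays are known exactly and are whole numbers of samples (one simple reading of "ideal"); the function name `delay_and_sum` and the toy signals are hypothetical and chosen only for illustration.

```python
import numpy as np

def delay_and_sum(channels, delays):
    """Average microphone channels after undoing known integer delays.

    channels: array of shape (n_channels, n_samples)
    delays:   non-negative integer delay (in samples) of each channel
              relative to the earliest one (assumed known here)
    """
    channels = np.asarray(channels, dtype=float)
    n_ch, n_samples = channels.shape
    out = np.zeros(n_samples)
    for x, d in zip(channels, delays):
        aligned = np.zeros(n_samples)
        # Advance the channel by d samples so the direct-path
        # components of all channels line up in time.
        aligned[: n_samples - d] = x[d:]
        out += aligned
    # Averaging keeps the aligned speech and attenuates
    # components that are not coherent across channels.
    return out / n_ch

# Toy example: one source reaching three microphones with different delays.
rng = np.random.default_rng(0)
source = rng.standard_normal(1000)
delays = [0, 3, 5]
mics = np.stack(
    [np.concatenate([np.zeros(d), source])[:1000] for d in delays]
)
enhanced = delay_and_sum(mics, delays)
```

In this noise-free toy case the output reproduces the source wherever all channels overlap. With real recordings the delays would have to be estimated, and fractional delays would require interpolation.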