Dereverberation Preprocessing and Training Data Adjustments for Robust Speech Recognition in Reverberant Environments

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
26-28 September 2012 in Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4, Language: English, Type: PDF

Schmid, Dominic; Thüne, Philipp; Kolossa, Dorothea; Enzner, Gerald (Institute of Communication Acoustics, Ruhr-Universität Bochum, 44780 Bochum, Germany)

The performance of automatic speech recognition (ASR) systems degrades severely in distant-recording situations, where the clean speech signal is distorted by reverberation and background noise. This paper evaluates and compares two strategies for improving ASR accuracy under such conditions. First, we examine different preprocessing approaches applied prior to ASR that aim to estimate the undistorted speech signal. In particular, we compare the large-vocabulary speech recognition results of a recently proposed multichannel dereverberation algorithm with the results obtained using the output of an ideal delay-and-sum beamformer (DSB) and a simulated late-reverberation suppression (LRS). As a second strategy, we consider an adjusted ASR training in which certain distortions are incorporated into the training data. By evaluating results for clean, matched, and mismatched training cases, we finally quantify the improvements that can be expected from dereverberation preprocessing and from specifically adjusted training sets.
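
To make the two strategies concrete, here is a minimal Python sketch of a delay-and-sum beamformer and of one simple way to build reverberant ("matched") training data. It is an illustration under stated assumptions, not the implementation used in the paper: the function names `delay_and_sum` and `reverberate` are hypothetical, the arrival delays are assumed to be known non-negative integers in samples (the "ideal" case), and the distortion is assumed to be modelled by convolution with a given room impulse response.

```python
import numpy as np


def delay_and_sum(channels, delays):
    """Average the microphone channels after time-aligning them.

    channels: array of shape (num_mics, num_samples).
    delays:   assumed-known arrival delay of each channel, as
              non-negative integer sample counts.
    """
    channels = np.asarray(channels, dtype=float)
    num_mics, num_samples = channels.shape
    out = np.zeros(num_samples)
    for x, d in zip(channels, delays):
        if d < 0:
            raise ValueError("delays must be non-negative")
        # Advance the channel by d samples so the direct-path
        # components line up; the tail is zero-padded.
        aligned = np.zeros(num_samples)
        aligned[:num_samples - d] = x[d:]
        out += aligned
    return out / num_mics


def reverberate(clean, rir):
    """Distort a clean utterance with a room impulse response (RIR).

    Assumption: reverberation is modelled as linear convolution;
    the result is truncated to the length of the clean signal.
    """
    clean = np.asarray(clean, dtype=float)
    rir = np.asarray(rir, dtype=float)
    return np.convolve(clean, rir)[:len(clean)]
```

In this sketch, `delay_and_sum` would serve as a preprocessing step before recognition, while `reverberate` would be applied to clean training utterances so that the training conditions resemble the test conditions.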