DFT-based Speech Enhancement for Robust Automatic Speech Recognition

Conference: Sprachkommunikation 2008 - 8. ITG-Fachtagung
8-10 October 2008 in Aachen, Germany

Proceedings: Sprachkommunikation 2008

Pages: 4
Language: English
Type: PDF

Breithaupt, Colin; Martin, Rainer (Institute for Communication Acoustics (IKA), Ruhr-Universität Bochum, 44780 Bochum)

In this paper we demonstrate that a DFT-based noise reduction preprocessor, combined with a signal-to-noise ratio (SNR) estimate obtained by cepstro-temporal smoothing (CTS), significantly increases the robustness of automatic speech recognition (ASR). An analysis of hidden Markov model (HMM) based ASR with mel-frequency cepstral coefficients (MFCCs) as speech features reveals that robustness depends not only on a reduced noise level but also on a low variance of the speech features. In low-SNR conditions, the estimation error variance of the widely used decision-directed SNR estimator leads to speech features with increased variance, which counteracts the benefit of the reduced noise level. In contrast, an SNR estimate based on CTS both reduces the noise level in the speech signal and keeps the variance of the clean speech estimate low. As a result, CTS-based SNR estimation makes it possible to employ DFT-based spectral estimators for robust ASR even in non-stationary noise.
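
For readers unfamiliar with the decision-directed SNR estimator that the abstract uses as its baseline, the following is a minimal sketch of its commonly cited textbook form, paired with a Wiener gain. It is not the authors' implementation and it does not cover CTS. The function name, the smoothing constant `alpha = 0.98`, the -25 dB floor `xi_min`, and the choice of a Wiener gain are all assumptions made for illustration.

```python
import numpy as np


def decision_directed_snr(noisy_power, noise_power, alpha=0.98, xi_min=10 ** (-25 / 10)):
    """Sketch of a decision-directed a priori SNR estimate (illustrative only).

    noisy_power: array (frames, bins) of noisy periodograms |Y|^2.
    noise_power: array (bins,) or (frames, bins) of noise power estimates.
    alpha, xi_min: assumed smoothing constant and SNR floor (hypothetical values).
    Returns the a priori SNR estimate and a Wiener gain, both (frames, bins).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.broadcast_to(
        np.asarray(noise_power, dtype=float), noisy_power.shape
    )

    xi = np.empty_like(noisy_power)
    gain = np.empty_like(noisy_power)
    # Clean-speech power estimate of the previous frame (zero before the first frame).
    prev_clean_power = np.zeros(noisy_power.shape[1])

    for l in range(noisy_power.shape[0]):
        # A posteriori SNR and its half-wave rectified maximum-likelihood term.
        gamma = noisy_power[l] / noise_power[l]
        ml_term = np.maximum(gamma - 1.0, 0.0)

        # Weighted sum of the previous clean-speech estimate and the ML term.
        # Simplification: the previous estimate is normalized by the current
        # frame's noise power.
        xi_l = alpha * prev_clean_power / noise_power[l] + (1.0 - alpha) * ml_term
        xi_l = np.maximum(xi_l, xi_min)

        # Wiener gain (assumed here; other spectral gain rules are possible).
        g = xi_l / (1.0 + xi_l)
        prev_clean_power = g ** 2 * noisy_power[l]

        xi[l] = xi_l
        gain[l] = g

    return xi, gain
```

With `alpha` close to one, the estimate in each bin depends heavily on the previous frame's output, so estimation errors feed back into later frames. This sketch only shows the recursion; it does not by itself reproduce the variance analysis summarized in the abstract.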