DFT-based Speech Enhancement for Robust Automatic Speech Recognition

Conference: Sprachkommunikation 2008 - 8. ITG-Fachtagung
8 October 2008 - 10 October 2008 in Aachen, Germany

Proceedings: Sprachkommunikation 2008

Pages: 4, Language: English, Type: PDF

Breithaupt, Colin; Martin, Rainer (Institute for Communication Acoustics (IKA), Ruhr-Universität Bochum, 44780 Bochum)

In this paper we demonstrate that a DFT-based noise reduction preprocessor, in conjunction with an estimate of the signal-to-noise ratio (SNR) obtained by cepstro-temporal smoothing (CTS), significantly increases the robustness of automatic speech recognition (ASR). An analysis of hidden Markov model (HMM) based ASR with Mel-frequency cepstral coefficients (MFCCs) as speech features reveals that robustness depends not only on a reduced noise level but also on a low variance of the speech features. In low-SNR conditions, the estimation error variance of the widely used decision-directed SNR estimator leads to speech features with increased variance, which counteracts the benefit of the reduced noise level. Unlike the decision-directed SNR estimator, an SNR estimate based on CTS is able both to reduce the noise level in a speech signal and to keep the variance of the clean speech estimate low. As a result, SNR estimation based on CTS allows DFT-based spectral estimators to be employed for robust ASR even in non-stationary noise.
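
For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of a decision-directed a priori SNR estimator for a single DFT bin. It is not the authors' implementation and does not show CTS. The function name, the Wiener gain used to form the clean-speech estimate, the smoothing factor `alpha = 0.98`, and the -25 dB floor `xi_min` are illustrative assumptions and are not taken from the paper.

```python
def decision_directed_snr(noisy_power, noise_power, alpha=0.98,
                          xi_min=10 ** (-25 / 10)):
    """Estimate the a priori SNR frame by frame for one DFT bin.

    noisy_power: sequence of |Y|^2 values, one per frame.
    noise_power: noise power estimate for this bin (assumed known and constant).
    alpha, xi_min: assumed smoothing factor and SNR floor (hypothetical values).
    """
    xi_estimates = []
    prev_clean_power = 0.0  # |S_hat|^2 from the previous frame
    for y_pow in noisy_power:
        gamma = y_pow / noise_power          # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)      # instantaneous SNR estimate
        # Weighted mix of the previous clean-speech estimate and the
        # instantaneous estimate; this is the decision-directed rule.
        xi = alpha * prev_clean_power / noise_power + (1.0 - alpha) * ml_term
        xi = max(xi, xi_min)
        gain = xi / (1.0 + xi)               # Wiener gain (assumption)
        prev_clean_power = gain ** 2 * y_pow
        xi_estimates.append(xi)
    return xi_estimates


if __name__ == "__main__":
    # Three frames of a bin dominated by speech (|Y|^2 = 101, noise = 1).
    print(decision_directed_snr([101.0, 101.0, 101.0], 1.0))
```

Because the recursion feeds each frame's clean-speech estimate into the next frame's SNR estimate, random fluctuations in the noisy periodogram propagate into the SNR estimate, and this matters most when the SNR is low. That is the estimation error variance the abstract refers to.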