Uncertainty Propagation for Speech Recognition using RASTA Features in Highly Nonstationary Noisy Environments

Conference: Sprachkommunikation 2008 - 8. ITG-Fachtagung
8 October 2008 - 10 October 2008 in Aachen, Germany

Proceedings: Sprachkommunikation 2008

Pages: 4
Language: English
Type: PDF

Astudillo, Ramón F.; Kolossa, Dorothea; Orglmeister, Reinhold (Fachgebiet Elektronik und medizinische Signalverarbeitung, TU Berlin, 10587 Berlin)

Speech enhancement in nonstationary noisy environments often leaves residual noise and speech distortions that degrade the performance of automatic speech recognition systems. In earlier work (R.F. Astudillo, D. Kolossa and R. Orglmeister, "Propagation of statistical information through non-linear feature extractions for robust speech recognition"), we showed that the effect of these inaccuracies can be attenuated by propagating a measure of uncertainty from the speech enhancement domain to the mel-cepstral domain of the recognition features. In this domain, unreliable features can be re-estimated by combining the uncertainty information with the speech recognizer parameters. In this paper, we extend this technique to a more complex feature extraction domain with a psychoacoustical basis: cepstral coefficients obtained from RelAtive SpecTrAl Perceptual Linear Prediction (RASTA-PLP) features. We particularly address the problem of dealing with the RASTA filter and the different non-linearities involved in the feature extraction while keeping computational costs low.
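
To give a feel for what propagating uncertainty through a temporal filter can involve, the following is a minimal sketch and not the method of the paper. It assumes each input frame is described by a mean and a variance, that frames are mutually independent, and that only per-frame output variances are needed (correlations between output frames are ignored). The function names, the truncation length of 64 and the input values are hypothetical. The filter coefficients are the commonly cited RASTA band-pass values (numerator 0.1 * [2, 1, 0, -1, -2], pole 0.98), used here as an assumption; the abstract does not state which coefficients the authors use.

```python
def impulse_response(b, pole, length):
    """First `length` samples of the impulse response of B(z) / (1 - pole * z^-1)."""
    h = []
    for n in range(length):
        feedforward = b[n] if n < len(b) else 0.0
        feedback = pole * h[n - 1] if n > 0 else 0.0
        h.append(feedforward + feedback)
    return h


def propagate_linear(means, variances, h):
    """Propagate per-frame means and variances through y[t] = sum_k h[k] * x[t-k].

    Assumes independent input frames, so variances add with squared weights.
    Frames before t = 0 are treated as exactly zero (zero mean, zero variance).
    """
    out_mean, out_var = [], []
    for t in range(len(means)):
        m, v = 0.0, 0.0
        for k in range(min(t + 1, len(h))):
            m += h[k] * means[t - k]
            v += h[k] ** 2 * variances[t - k]
        out_mean.append(m)
        out_var.append(v)
    return out_mean, out_var


# Assumed coefficients: commonly cited RASTA band-pass filter, not taken from the paper.
RASTA_B = [0.2, 0.1, 0.0, -0.1, -0.2]
RASTA_POLE = 0.98

# Truncate the (infinite) impulse response; 64 taps is an arbitrary choice.
h = impulse_response(RASTA_B, RASTA_POLE, 64)

# Hypothetical trajectory of one compressed spectral band: constant mean,
# with a single uncertain frame at index 2.
means = [1.0] * 10
variances = [0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

out_mean, out_var = propagate_linear(means, variances, h)
```

Because the filter has a pole close to one, the uncertainty of a single unreliable frame decays slowly and leaks into many later frames. The non-linear stages of the feature extraction are not linear in this sense and need a separate treatment, which this sketch does not attempt.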