Recognition of Noisy Speech by Starting the Likelihood Calculation at Voiced Segments

Conference: Speech Communication - 11. ITG-Fachtagung Sprachkommunikation
24.09.2014 - 26.09.2014 in Erlangen, Germany

Proceedings: Speech Communication

Pages: 4, Language: English, Type: PDF


Authors:
Hirsch, Hans-Guenter; Kremer, Frank (Institute for Pattern Recognition, Niederrhein University of Applied Sciences, 47805 Krefeld, Germany)

Abstract:
The performance of automatic speech recognition systems in noisy scenarios is still not comparable to the human ability to communicate and understand speech. Humans in noisy environments do not perceive and understand every fragment of speech; under extremely noisy conditions, they may understand only a few segments with a high signal-to-noise ratio (SNR). Derived from this observation, we investigate an alternative recognition approach that starts the recognition process at voiced segments, which usually have a high SNR. Furthermore, fragments with low SNR are ignored or taken into account to a lesser extent. We developed a method to detect voiced segments in distorted speech signals. Beginning at the detected positions, we calculate probabilities backward and forward in time with a modified type of subword HMM. Neglecting segments with low SNR requires modifying the usual likelihood calculation and extracting information on the probability of being in a certain HMM state during the recognition process. First results are presented for the recognition of isolated digits from the TIDigits corpus. The results can be seen as proof that this alternative recognition strategy is applicable in principle.
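The abstract does not spell out the modified likelihood recursion, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes a single left-to-right HMM with precomputed per-frame emission log-likelihoods, a hypothetical hard SNR threshold (`threshold_db`) that zeroes out the contribution of low-SNR frames, and the highest-SNR frame as a stand-in for a detected voiced segment. From that anchor frame, a max-product (Viterbi-style) score is computed forward in time to the last frame and backward in time to the first frame, once for every state the anchor could be aligned to. All function names are hypothetical.

```python
import numpy as np

NEG_INF = -np.inf


def weight_by_snr(log_b, snr_db, threshold_db=5.0):
    """Hypothetical hard 0/1 reliability weighting: emission log-likelihoods of
    frames whose estimated SNR is below the threshold are set to zero, so those
    frames no longer favour any state. Assumes finite log-likelihoods."""
    weights = (np.asarray(snr_db) >= threshold_db).astype(float)
    return log_b * weights[:, None]


def anchored_viterbi(log_a, log_b, anchor):
    """Best path score (start in state 0 at frame 0, end in the last state at
    frame T-1), computed outward from `anchor`. Returns (score, anchor_state)."""
    T, N = log_b.shape
    idx = np.arange(N)
    # fwd[s, j]: best score of frames anchor..t with the anchor in state s, frame t in state j.
    fwd = np.full((N, N), NEG_INF)
    fwd[idx, idx] = log_b[anchor]
    for t in range(anchor + 1, T):
        fwd = np.max(fwd[:, :, None] + log_a[None, :, :], axis=1) + log_b[t][None, :]
    # bwd[s, i]: best score of frames t..anchor-1 (plus transitions into the anchor)
    # with frame t in state i and the anchor in state s; going backward in time.
    bwd = np.full((N, N), NEG_INF)
    bwd[idx, idx] = 0.0
    for t in range(anchor - 1, -1, -1):
        bwd = np.max(log_a[None, :, :] + bwd[:, None, :], axis=2) + log_b[t][None, :]
    # Combine both halves: path must end in the last state and start in state 0.
    per_anchor_state = fwd[:, N - 1] + bwd[:, 0]
    return float(np.max(per_anchor_state)), int(np.argmax(per_anchor_state))


def viterbi(log_a, log_b):
    """Conventional left-to-right Viterbi score, used here only for comparison."""
    T, N = log_b.shape
    delta = np.full(N, NEG_INF)
    delta[0] = log_b[0, 0]
    for t in range(1, T):
        delta = np.max(delta[:, None] + log_a, axis=0) + log_b[t]
    return float(delta[N - 1])


# Toy 3-state left-to-right HMM with self-loops (probabilities chosen arbitrarily).
log_a = np.full((3, 3), NEG_INF)
for i in range(3):
    log_a[i, i] = np.log(0.6)
    if i + 1 < 3:
        log_a[i, i + 1] = np.log(0.4)
log_a[2, 2] = 0.0

rng = np.random.default_rng(0)
log_b = rng.normal(-5.0, 2.0, size=(12, 3))  # made-up emission log-likelihoods
snr_db = rng.uniform(-5.0, 20.0, size=12)    # made-up per-frame SNR estimates
anchor = int(np.argmax(snr_db))              # stand-in for a detected voiced segment
log_b_w = weight_by_snr(log_b, snr_db)
score, anchor_state = anchored_viterbi(log_a, log_b_w, anchor)
print(score, anchor_state, viterbi(log_a, log_b_w))
```

With an exact max-product recursion, the anchored score equals the conventional Viterbi score on the same weighted emissions, because every complete path passes through some state at the anchor frame; starting at the anchor only changes the order of computation. Any practical difference would come from what is done beyond this sketch, for example pruning hypotheses early from the reliable anchor or treating long low-SNR stretches differently, which the abstract does not detail.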