Uncertainty Decoding Using a Sampling Strategy Based on the Eigenvalue Decomposition

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
05.10.2016 - 07.10.2016 in Paderborn, Germany

Proceedings: ITG-Fb. 267: Speech Communication

Pages: 5
Language: English
Type: PDF

Huemmer, Christian; Stadter, Philipp; Kellermann, Walter (Multimedia Communications and Signal Processing, University of Erlangen-Nuremberg, 91058 Erlangen, Germany)

Uncertainty decoding combines a probabilistic distortion model with the acoustic model of a speech recognition system. For DNN-based acoustic models, this can be realized by drawing feature samples from an estimated probability distribution and averaging the resulting set of posterior likelihoods at the output of the DNN. Following this principle, we consider a probabilistic feature description in the logmelspec domain to model the front-end estimation errors produced by a coherence-based Wiener filter. As the main innovation with respect to previous work, we employ a sampling strategy based on the eigenvalue decomposition so that the cross-correlations between the acoustic features are captured, rather than neglected, as part of the uncertainty decoding scheme. Experimental results on the real recordings provided by the REVERB Challenge task highlight the effectiveness of this sampling strategy in improving the recognition accuracy of a DNN-HMM hybrid system.
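
The sampling-and-averaging principle described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a Gaussian feature distribution with a full covariance matrix, uses the standard construction of correlated samples from the eigenvalue decomposition of that covariance, and replaces the DNN with a hypothetical one-layer softmax model. All function and parameter names (`sample_via_eigendecomposition`, `toy_acoustic_model`, `averaged_posteriors`, `weights`, `bias`) are illustrative assumptions.

```python
import numpy as np


def sample_via_eigendecomposition(mean, cov, num_samples, rng):
    """Draw samples from N(mean, cov) using cov = U diag(lam) U^T.

    Assumption: the feature uncertainty is Gaussian with a symmetric,
    positive semi-definite covariance matrix `cov`.
    """
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigendecomposition for symmetric matrices
    eigvals = np.clip(eigvals, 0.0, None)    # guard against tiny negative round-off
    z = rng.standard_normal((num_samples, mean.shape[0]))  # uncorrelated unit-variance draws
    # Row-vector form of x = mean + U diag(sqrt(lam)) z, which restores the correlations.
    return mean + (z * np.sqrt(eigvals)) @ eigvecs.T


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def toy_acoustic_model(features, weights, bias):
    """Hypothetical stand-in for the DNN: one linear layer followed by softmax."""
    return softmax(features @ weights + bias)


def averaged_posteriors(mean, cov, weights, bias, num_samples=100, seed=0):
    """Average the model outputs over feature samples drawn from N(mean, cov)."""
    rng = np.random.default_rng(seed)
    samples = sample_via_eigendecomposition(mean, cov, num_samples, rng)
    return toy_acoustic_model(samples, weights, bias).mean(axis=0)
```

Using only the diagonal of `cov` in this sketch would correspond to neglecting the cross-correlations between features; passing the full matrix lets the off-diagonal terms shape the samples through the eigenvectors.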