Rank based Decoding for Improved DNN/HMM Hybrid Acoustic Models in the EML Transcription Platform

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
5 October 2016 - 7 October 2016 in Paderborn, Germany

Proceedings: ITG-Fb. 267: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Fischer, Volker; Kunzmann, Siegfried (EML European Media Laboratory GmbH, Schloss-Wolfsbrunnenweg 35, 69118 Heidelberg, Germany)

Abstract:
In this paper, we review the use of neural-network-based acoustic models in the EML Transcription Platform and describe some recent improvements to DNN/HMM hybrid acoustic modeling, including vocal tract length perturbation (VTLP) and the use of a DNN state prior for decoding. We investigate network adaptation techniques to overcome a noise-level mismatch between training and test data, and propose a robust, rank-based decoding method as an alternative to the standard softmax output layer. We examine its impact on word error rate for both original and adapted DNN/HMM models under several noise conditions. Initial results obtained on a publicly available data set suggest that a rank-based output layer can outperform the conventional softmax layer and is a powerful method when no adaptation data is available.
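
The abstract does not spell out how the rank based scores are computed, so the following is only a minimal sketch of the general idea, not the authors' method. It contrasts the conventional hybrid score (softmax posterior divided by a state prior, in the log domain) with a hypothetical score that depends only on the rank of each output activation within a frame. The function names, the `-log(rank)` scoring rule, and the example numbers are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one frame's output activations."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_log_likelihoods(logits, state_priors):
    """Conventional hybrid score per HMM state: log p(s|x) - log p(s)."""
    posteriors = softmax(logits)
    return [math.log(p) - math.log(q) for p, q in zip(posteriors, state_priors)]


def rank_scores(logits):
    """Hypothetical rank based score (an assumption, not the paper's formula).

    The state with the largest activation gets rank 1, the next rank 2, and
    so on; the score is -log(rank), so only the ordering of the activations
    matters, not their magnitudes.
    """
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    scores = [0.0] * len(logits)
    for rank, idx in enumerate(order, start=1):
        scores[idx] = -math.log(rank)
    return scores


# Illustrative activations for three HMM states and an assumed state prior.
frame_logits = [2.0, 0.5, -1.0]
priors = [0.5, 0.3, 0.2]

print(scaled_log_likelihoods(frame_logits, priors))
print(rank_scores(frame_logits))
# Rescaling the activations changes the softmax scores but not the ranks.
print(rank_scores([10.0 * z for z in frame_logits]))
```

Because a rank based score of this kind is unchanged by any order-preserving distortion of the activations, it gives one plausible intuition for why such an output layer might be less sensitive to a noise-level mismatch than softmax posteriors; whether the paper's method works this way cannot be determined from the abstract alone.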