DNN/CNN Acoustic Model Turbo Fusion for Phoneme Recognition

Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation (13th ITG Conference on Speech Communication)
October 10-12, 2018, in Oldenburg, Germany

Proceedings: ITG-Fb. 282: Speech Communication

Pages: 5
Language: English
Type: PDF


Authors:
Lohrenz, Timo; Li, Wei; Fingscheidt, Tim (Institute for Communications Technology, Technische Universität Braunschweig, Braunschweig, Germany)

Abstract:
This work presents phoneme recognition results for the fusion of individually trained DNN- and CNN-based acoustic models using the recently proposed turbo fusion approach. Our contribution is twofold: First, turbo fusion is applied to a multi-model task for the first time, creating the opportunity to perform information fusion in a single-modality, single-feature-representation task. Second, we propose a two-stage optimization procedure for the turbo fusion approach that includes both the choice of the optimal start component recognizer and the optimal stop iteration. Experiments on the classical TIMIT benchmark reveal two insights: combining DNN and CNN architectures yields performance gains, and among all modular reference fusion methods, turbo fusion performs best, achieving a competitive phoneme error rate of 18.8% on the TIMIT core test set.
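The abstract does not spell out how the start component recognizer and the stop iteration are chosen. The following is only a minimal sketch, assuming both choices are made by a joint grid search that minimizes the phoneme error rate (PER) on a development set. The helper names (edit_distance, phoneme_error_rate, select_start_and_stop) and the layout of the decoded hypotheses, a dict keyed by (start component, stop iteration), are hypothetical. PER is computed as the Levenshtein distance between reference and hypothesis phone sequences divided by the number of reference phones. Any phone-set folding used for TIMIT scoring is not modelled here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two phone sequences (unit costs)."""
    prev = list(range(len(hyp) + 1))  # distance from empty ref to each hyp prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)  # cur[0]: delete all i ref phones
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # deletion of r
                         cur[j - 1] + 1,          # insertion of h
                         prev[j - 1] + (r != h))  # substitution or match
        prev = cur
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Corpus-level PER in percent: total edit errors / total reference phones."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return 100.0 * errors / total


def select_start_and_stop(dev_refs, dev_hyps):
    """Hypothetical joint grid search (an assumption, not the paper's procedure).

    dev_hyps maps (start_component, stop_iteration) to the decoded
    development-set hypotheses; returns (start, stop, per) with the lowest PER.
    """
    best = None
    for (start, stop), hyps in dev_hyps.items():
        per = phoneme_error_rate(dev_refs, hyps)
        if best is None or per < best[2]:  # ties keep the first candidate seen
            best = (start, stop, per)
    return best


if __name__ == "__main__":
    # Toy development data; not results from the paper.
    refs = [["sil", "b", "ae", "t", "sil"]]
    hyps = {
        ("DNN", 1): [["sil", "b", "ae", "d", "sil"]],
        ("DNN", 2): [["sil", "b", "ae", "t", "sil"]],
        ("CNN", 1): [["sil", "p", "ae", "sil"]],
    }
    print(select_start_and_stop(refs, hyps))
```

With the toy data above, the candidate ("DNN", 2) decodes the reference exactly and is selected with a PER of 0.0. A sequential variant (fix the start component first, then tune the stop iteration) would fit the same interface, since only the iteration order over the candidates changes.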