DNN/CNN Acoustic Model Turbo Fusion for Phoneme Recognition

Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation
10/10/2018 - 10/12/2018 in Oldenburg, Germany

Proceedings: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Lohrenz, Timo; Li, Wei; Fingscheidt, Tim (Institute for Communications Technology, Technische Universität Braunschweig, Braunschweig, Germany)

Abstract:
This work presents phoneme recognition results for the fusion of DNN- and CNN-based acoustic models, each trained individually, using the recently proposed turbo fusion approach. Our contribution is twofold. First, turbo fusion is applied for the first time to a multi-model task, which makes it possible to perform information fusion in a single-modality, single-feature-representation task. Second, we propose a two-stage optimization procedure for the turbo fusion approach that selects both the optimal start component recognizer and the optimal stop iteration. Experiments on the classical TIMIT benchmark reveal two insights: combining DNN and CNN architectures yields performance gains, and among all modular reference fusion methods, turbo fusion performs best, achieving a competitive phoneme error rate of 18.8% on the TIMIT core test set.