Bilingual I-Vector Extractor for DNN Hybrid Acoustic Model Training in German Speech Recognition Systems

Conference: Speech Communication - 14th ITG Conference
29.09.2021 - 01.10.2021, online

Proceedings: ITG-Fb. 298: Speech Communication

Pages: 5, Language: English, Type: PDF

Wang, Yao; Gref, Michael; Walter, Oliver; Schmidt, Christoph (Fraunhofer Institute for Intelligent Analysis and Information Systems (IAIS), Sankt Augustin, Germany)

In recent research, i-vectors have been shown to be highly beneficial for speaker recognition and have been successfully applied in deep neural network (DNN) acoustic model (AM) training to improve the performance of automatic speech recognition (ASR). This paper describes our work in developing a bilingual i-vector extractor for training a German speech recognition system. A bilingual data set consisting of German and English speech data is used to train an i-vector extractor for a DNN hybrid acoustic model. The system is evaluated on different data sets. The results show that i-vector extractors trained with bilingual data can be used to improve the training of ASR models when monolingual data is insufficient. Additionally, using telephone speech as a case study, we show that training the i-vector extractor with data from this domain improves recognition.