Language Feature Vectors for Resource Constraint Speech Recognition

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
05.10.2016 - 07.10.2016 in Paderborn, Germany

Proceedings: ITG-Fb. 267: Speech Communication

Pages: 5
Language: English
Type: PDF


Mueller, Markus; Stueker, Sebastian; Waibel, Alex (Karlsruhe Institute of Technology, 76131 Karlsruhe, Germany)

Deep Neural Networks (DNNs) are a key element of state-of-the-art speech recognition systems. As a data-driven method, they require a significant amount of training data, but in some scenarios that amount of data is not available for a particular language. Building systems for such resource-constrained tasks requires special techniques. One common method is to train the acoustic model on data from multiple languages, but knowledge transfer between languages is limited. We try to mitigate these limitations by providing language information to DNNs in the form of Language Feature Vectors (LFVs). Similar to i-Vectors for speaker adaptation, LFVs enable DNNs to better capture and adapt to inter-language characteristics. Previous experiments have shown that providing LFVs to DNNs improves system performance. In this paper, we show that adding LFVs decreases the performance gap between mono- and multilingual systems.
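
The abstract does not state how LFVs are supplied to the network. One common way to provide auxiliary vectors such as i-Vectors is to append them to every acoustic input frame, and the sketch below assumes the same for LFVs. The function name `append_lfv`, the feature dimensions, and all numeric values are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from typing import List

Frame = List[float]


def append_lfv(frames: List[Frame], lfv: Frame) -> List[Frame]:
    """Append the same language feature vector to every acoustic frame.

    Assumption: the LFV is constant for the utterance and is simply
    concatenated to each frame before it is fed to the DNN.
    """
    # List concatenation builds new frames, so the inputs are not modified.
    return [frame + lfv for frame in frames]


# Hypothetical toy input: 3 frames of 4-dimensional acoustic features.
frames = [
    [0.1, 0.2, 0.3, 0.4],
    [0.5, 0.6, 0.7, 0.8],
    [0.9, 1.0, 1.1, 1.2],
]
# Hypothetical 2-dimensional LFV describing the utterance's language.
lfv = [0.25, -0.75]

augmented = append_lfv(frames, lfv)
print(len(augmented), len(augmented[0]))
```

Under this assumption the network's input layer grows by the LFV dimension, and the language information is available at every frame without changing the rest of the acoustic model.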