Growing a Deep Neural Network Acoustic Model with Singular Value Decomposition

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
05.10.2016 - 07.10.2016 in Paderborn, Germany

Proceedings: ITG-Fb. 267: Speech Communication

Pages: 5
Language: English
Type: PDF

Kilgour, Kevin; Tseyzer, Igor; Nguyen, Thai Son; Stueker, Sebastian; Waibel, Alex (Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Germany)

Singular Value Decomposition (SVD) allows the weight matrix connecting two layers in a deep neural network (DNN) to be decomposed into two smaller matrices. In this paper we show how SVD can be used to initialise a new layer between the two original layers. Using SVD restructuring we can improve the word error rate (WER) of DNN-based speech recognition systems while at the same time reducing their number of parameters. On a German test set this resulted in a WER improvement from 16.61% to 16.16%, while the number of parameters was reduced from 17.3 million to 14.55 million. When applied to an online real-time speech recognition system, the approach noticeably improved its real-time factor while also slightly reducing its WER.
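
The decomposition described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `svd_split`, the matrix size (400 x 300), the retained rank (64), and the choice to split the singular values evenly between the two factors are all assumptions made for illustration. The abstract does not say how the rank is chosen or whether the new layer has a nonlinearity, so the sketch treats it as a plain linear bottleneck.

```python
import numpy as np


def svd_split(weights, rank):
    """Factor an (n_in, n_out) weight matrix into two smaller matrices.

    Hypothetical helper: returns `first` with shape (n_in, rank) and
    `second` with shape (rank, n_out). Their product is the best
    rank-`rank` approximation of `weights` in the least-squares sense.
    """
    u, s, vt = np.linalg.svd(weights, full_matrices=False)
    # Keep the largest `rank` singular values and give each factor the
    # square root of them (an assumed convention, not taken from the paper).
    root = np.sqrt(s[:rank])
    first = u[:, :rank] * root           # scales the columns of U
    second = root[:, None] * vt[:rank]   # scales the rows of V^T
    return first, second


# Illustrative sizes only; a random matrix stands in for trained weights.
rng = np.random.default_rng(0)
weights = rng.standard_normal((400, 300))

first, second = svd_split(weights, rank=64)

params_before = weights.size               # n_in * n_out
params_after = first.size + second.size    # rank * (n_in + n_out)
rel_error = np.linalg.norm(weights - first @ second) / np.linalg.norm(weights)

print(params_before, params_after, rel_error)
```

Here `first` and `second` would serve as the initial weights into and out of a new layer with 64 units. The parameter count only drops when the rank is below n_in * n_out / (n_in + n_out), and any further training of the restructured network is outside the scope of this sketch.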