Parameter Optimization for CTC Acoustic Models in a Less-resourced Scenario: An Empirical Study

Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation
10-12 October 2018 in Oldenburg, Germany

Proceedings: ITG-Fb. 282: Speech Communication

Pages: 5
Language: English
Type: PDF

Mueller, Markus; Stueker, Sebastian; Waibel, Alex (Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany)

In this work, we study parameter configurations for acoustic models trained with the connectionist temporal classification (CTC) loss function. We varied the sizes of the hidden layers and of the mini-batches. We also compared optimization strategies, contrasting SGD with a static learning rate against newbob scheduling. We further evaluated multiple configurations for data augmentation. Because the acoustic features are extracted using overlapping windows, neighboring frames contain redundant information. By systematically omitting frames with different shifts, we generated multiple versions of each utterance, so that the network can be trained on "more" data. We evaluated the parameter configurations under two conditions. The first is a multilingual setup with 4 languages and 45h of data per language, which can be considered less-resourced in the regime of RNN/CTC acoustic models. The second is a very low-resource dataset, used to assess network configurations when less than 10h of data is available.
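
The frame-omission augmentation described in the abstract can be illustrated with a minimal sketch. It assumes that "omitting frames" means keeping every `stride`-th frame and that each "shift" is a different starting offset. The abstract does not state the stride or any further processing, so the function name `shifted_subsamples`, the parameter `stride`, and the toy feature values are hypothetical.

```python
from typing import List, Sequence


def shifted_subsamples(frames: Sequence, stride: int) -> List[list]:
    """Return `stride` subsampled copies of one utterance, one per start offset.

    Assumption (not from the paper): copy number `shift` keeps frames
    shift, shift + stride, shift + 2 * stride, ...
    """
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return [list(frames[shift::stride]) for shift in range(stride)]


# Toy utterance: 7 frames, each a 2-dimensional feature vector (made-up values).
utterance = [[float(t), float(t) * 0.5] for t in range(7)]

versions = shifted_subsamples(utterance, stride=3)
for shift, version in enumerate(versions):
    # Print the first feature of each kept frame, which equals its frame index.
    print(shift, [frame[0] for frame in version])
```

Under this assumption every original frame appears in exactly one of the versions, so no information is discarded across the augmented set, while each individual version is shorter by roughly a factor of `stride`.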