Parameter Optimization for CTC Acoustic Models in a Less-resourced Scenario: An Empirical Study

Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation
10/10/2018 - 10/12/2018 at Oldenburg, Germany

Proceedings: Speech Communication

Pages: 5
Language: English
Type: PDF


Authors:
Mueller, Markus; Stueker, Sebastian; Waibel, Alex (Institute for Anthropomatics and Robotics, Karlsruhe Institute of Technology, Karlsruhe, Germany)

Abstract:
In this work, we performed a study evaluating parameter configurations for acoustic models trained using the connectionist temporal classification (CTC) loss function. We varied the sizes of the hidden layers and mini-batches. We also compared optimizer strategies: SGD with a static learning rate versus SGD with newbob learning rate scheduling. In addition, we evaluated multiple configurations for data augmentation: as the acoustic features were extracted using overlapping windows, neighboring frames contain redundant information. By systematically omitting frames, and thus creating multiple versions of the same utterance, the network can be trained on "more" data. Applying this scheme, we generated multiple versions of each utterance using different frame shifts. We evaluated parameter configurations under two conditions: a multilingual setup with 4 languages and 45 h of data per language, which can be considered less-resourced in the regime of RNN/CTC acoustic models, and a very low-resource dataset to assess network configurations when less than 10 h of data are available.
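A minimal sketch of the frame-shift augmentation described in the abstract, assuming frame-level acoustic features stored as a NumPy array of shape (num_frames, feature_dim); the function name, the skip factor, and the example dimensions are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def frame_shift_versions(features: np.ndarray, skip: int = 2) -> list:
    """Create multiple versions of one utterance by systematically omitting frames.

    features: (num_frames, feature_dim) acoustic features extracted with
              overlapping windows, so neighboring frames carry redundant information.
    skip:     keep every `skip`-th frame; each start offset yields a different
              version of the same utterance.
    """
    # One shifted version per possible start offset: 0, 1, ..., skip - 1.
    return [features[offset::skip] for offset in range(skip)]


# Usage example: a 300-frame utterance with 40-dimensional features
# yields three shifted versions of 100 frames each when skip=3.
utt = np.random.randn(300, 40)
versions = frame_shift_versions(utt, skip=3)
print([v.shape for v in versions])  # [(100, 40), (100, 40), (100, 40)]
```

Each shifted version can then be presented to the network as an additional training utterance, which is how the scheme effectively enlarges the training set without collecting new audio.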