Balancing Gaussianity and sparseness in feature-space speaker adaptation for word prominence detection

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
10/05/2016 - 10/07/2016 in Paderborn, Germany

Proceedings: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Schnall, Andrea (TU Darmstadt - Control Methods and Robotics, Holzhofallee 38, 64295 Darmstadt, Germany)
Heckmann, Martin (Honda Research Institute Europe GmbH, Carl-Legien-Str. 30, 63073 Offenbach/Main, Germany)

Abstract:
Word prominence is an essential part of communication, for example to mark important information. Speaker variation, however, makes these cues difficult to extract. A common remedy is to adapt data from a novel speaker to a model trained on a larger pool of speakers. For detection with an SVM classifier, we developed an adaptation method based on the radial basis function of the SVM with a Gaussian regularization, which is derived from fMLLR. This method improved word prominence detection but, as is common for such approaches, it requires a sufficient amount of labeled data. In this paper we show that adding a further regularization term improves classification further and notably reduces the amount of data required. We also investigate how the weighting of the different terms influences performance and how it can be used to improve the results.
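
The interplay of the weighted terms can be illustrated with a minimal sketch. It is not the authors' formulation, which the abstract does not spell out. Everything below is an assumption made for illustration: an affine feature transform `A x + b` for the novel speaker, a hinge-loss data term on a fixed RBF-kernel SVM, a Gaussian term in the fMLLR-like form `0.5 * ||A x + b||^2 - log|det A|`, an L1 sparseness term on the deviation of the transform from the identity, and the hypothetical names `rbf_decision`, `adaptation_objective`, `w_gauss` and `w_sparse`.

```python
import numpy as np


def rbf_decision(X, support_vectors, dual_coefs, intercept, gamma):
    """Decision values of a fixed RBF-kernel SVM.

    dual_coefs are the signed dual coefficients (alpha_i * y_i).
    """
    # Squared Euclidean distance between every sample and every support vector.
    sq_dists = ((X[:, None, :] - support_vectors[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists) @ dual_coefs + intercept


def adaptation_objective(A, b, X, y, svm, w_gauss, w_sparse):
    """Illustrative (assumed) objective for feature-space speaker adaptation.

    A, b     : affine transform applied to the novel speaker's features
    X, y     : labeled adaptation data, with y in {-1, +1}
    svm      : tuple (support_vectors, dual_coefs, intercept, gamma)
    w_gauss  : weight of the Gaussian regularization term (assumed form)
    w_sparse : weight of the sparseness regularization term (assumed form)
    """
    Z = X @ A.T + b  # transformed features
    scores = rbf_decision(Z, *svm)

    # Data term: hinge loss of the fixed SVM on the transformed features.
    data_term = np.maximum(0.0, 1.0 - y * scores).mean()

    # Gaussian term: negative log-likelihood of the transformed features
    # under a standard normal, including the log-determinant of A.
    _, logabsdet = np.linalg.slogdet(A)
    gauss_term = 0.5 * (Z ** 2).sum(axis=1).mean() - logabsdet

    # Sparseness term: L1 norm of the deviation from the identity transform.
    sparse_term = np.abs(A - np.eye(A.shape[0])).sum() + np.abs(b).sum()

    return data_term + w_gauss * gauss_term + w_sparse * sparse_term
```

In this sketch the identity transform incurs no sparseness penalty, so raising `w_sparse` pulls the solution towards leaving the features unchanged, while `w_gauss` controls how strongly the transformed features are pushed towards a Gaussian shape. Minimizing the objective over `A` and `b` with any general-purpose optimizer would complete the example.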