Balancing Gaussianity and sparseness in feature-space speaker adaptation for word prominence detection

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
05.10.2016 - 07.10.2016 in Paderborn, Germany

Proceedings: ITG-Fb. 267: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Schnall, Andrea (TU Darmstadt - Control Methods and Robotics, Holzhofallee 38, 64295 Darmstadt, Germany)
Heckmann, Martin (Honda Research Institute Europe GmbH, Carl-Legien-Str. 30, 63073 Offenbach/Main, Germany)

Abstract:
Word prominence is an essential part of communication, used for example to highlight important information. Speaker variability, however, makes these cues difficult to extract. A common remedy is to adapt the data of a novel speaker to a model trained on a larger pool of speakers. For detection with an SVM classifier, we developed an adaptation method that is derived from fMLLR, operates on the radial basis function kernel of the SVM, and uses a Gaussian regularization. This method improved word prominence detection but, as is common, it requires a sufficient amount of labeled adaptation data. In this paper we show that adding a further regularization term improves classification further and notably reduces the amount of data required. We also investigate how the weighting of the different terms influences performance and how it can be tuned to improve the results.
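The abstract describes an objective that combines an SVM-based fit term with a Gaussian regularization and an additional, weighted regularization term. The sketch below illustrates that general structure only; it is not the authors' formulation. Everything concrete in it is an assumption: an fMLLR-style affine feature transform x' = A x + b, a fixed speaker-independent RBF-kernel SVM, a mean hinge loss as the fit term, a squared distance to the identity transform as the "Gaussian" term, an L1 norm as the sparseness term (suggested by the title), and the hypothetical names `adaptation_objective`, `lam_gauss` and `lam_sparse`.

```python
import numpy as np

def rbf_kernel(X, Y, gamma):
    # K[i, j] = exp(-gamma * ||X[i] - Y[j]||^2)
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-gamma * np.maximum(sq, 0.0))

def svm_decision(X, support_vectors, dual_coef, intercept, gamma):
    # f(x) = sum_i alpha_i y_i K(s_i, x) + b, where dual_coef[i] = alpha_i * y_i
    return rbf_kernel(X, support_vectors, gamma) @ dual_coef + intercept

def adaptation_objective(A, b, X, y, svm, lam_gauss, lam_sparse):
    """Hypothetical objective for an affine feature transform x' = A x + b.

    svm is a tuple (support_vectors, dual_coef, intercept, gamma) describing a
    fixed SVM trained on many speakers; y holds labels in {-1, +1}.
    """
    X_adapted = X @ A.T + b
    f = svm_decision(X_adapted, *svm)
    # Fit term (assumed): mean hinge loss of the fixed SVM on the transformed
    # labelled adaptation data of the novel speaker.
    data_term = np.mean(np.maximum(0.0, 1.0 - y * f))
    D = A - np.eye(A.shape[0])
    # "Gaussian" term (assumed): squared distance to the identity transform,
    # i.e. the negative log of an isotropic Gaussian prior on "no adaptation".
    gauss_term = np.sum(D**2) + np.sum(b**2)
    # Sparseness term (assumed): L1 norm, favouring transforms that change
    # only a few entries.
    sparse_term = np.sum(np.abs(D)) + np.sum(np.abs(b))
    return data_term + lam_gauss * gauss_term + lam_sparse * sparse_term
```

With A equal to the identity and b equal to zero, both regularizers vanish and only the fit term remains. Raising `lam_sparse` relative to `lam_gauss` shifts the balance toward transforms that modify few feature dimensions, which is one plausible reading of why fewer labeled samples might suffice. Actually fitting A and b (for example with a subgradient or proximal method) is not shown here.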