Automatic Estimation of the Triangular Vowel Space Area from i-Vectors

Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation
10 October 2018 - 12 October 2018, Oldenburg, Germany

Proceedings: ITG-Fb. 282: Speech Communication

Pages: 5
Language: English
Type: PDF

Tanuadji, Maureen; Stadtschnitzer, Michael; Bardeli, Rolf; Jaeger, Hagen (Fraunhofer IAIS, Schloss Birlinghoven, 53757 Sankt Augustin, Germany)

Parkinson’s Disease (PD) is a neurodegenerative disorder that gradually affects the patient’s neurological condition. In many cases the disease impairs the reliability of the articulatory system and the ability to pronounce vowels normally. One prominent measure of how well the articulatory system functions is the Vowel Space Area (VSA). The typical way to measure it, however, relies on manually annotated sustained vowel recordings or phonetically annotated speech utterances of a speaker, which are then analyzed. Since it is often desirable to measure the VSA directly from unlabeled natural speech, this paper proposes an automatic model-based system that estimates the triangular Vowel Space Area (tVSA) and the underlying corner vowel formant frequencies from the speech signals, without the need for phonetic or vowel transcriptions. i-Vectors are extracted from the signals as a characteristic representation of the speaker, and the speaker’s corner vowel formant frequencies are estimated from them by regression models. Two regression models, namely Deep Neural Networks (DNN) and Support Vector Regression (SVR), are investigated in this work. The proposed configuration employs the SVR model, which predicts the corner vowel formant frequencies of the test speakers with R² up to 0.56719 and ρ up to 0.76485.
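The abstract does not spell out how the tVSA follows from the estimated corner vowel formants. A minimal sketch of that final step is given here, assuming (as is common for the triangular VSA, but not stated above) that the corner vowels are /a/, /i/ and /u/ and that each is described by its first two formants F1 and F2 in Hz. The function name `triangular_vsa` and the example formant values are hypothetical and are not taken from the paper; the area itself is the standard shoelace formula for a triangle in the F1-F2 plane.

```python
def triangular_vsa(corners):
    """Area of the triangle spanned by three corner vowels in the F1-F2 plane.

    `corners` maps a vowel label to an (F1, F2) pair in Hz. The labels
    "a", "i", "u" are an assumption of this sketch, not of the paper.
    The result is in Hz^2.
    """
    f1_a, f2_a = corners["a"]
    f1_i, f2_i = corners["i"]
    f1_u, f2_u = corners["u"]
    # Shoelace formula for the area of a triangle given its three vertices.
    return 0.5 * abs(
        f1_i * (f2_a - f2_u)
        + f1_a * (f2_u - f2_i)
        + f1_u * (f2_i - f2_a)
    )


# Illustrative (hypothetical) formant values in Hz, not data from the paper.
example_corners = {
    "a": (700.0, 1200.0),
    "i": (300.0, 2300.0),
    "u": (300.0, 800.0),
}
example_area = triangular_vsa(example_corners)  # 300000.0 Hz^2
```

In a pipeline like the one described, the (F1, F2) pairs passed to such a function would be the regression outputs for a speaker rather than manually measured values. A smaller area corresponds to a more compressed vowel space.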