Voice and Speech Assessment From Telephone Recordings Using Prosodic Analysis Based on μ-Law-Companded Features

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation (12th ITG Conference on Speech Communication)
5-7 October 2016, Paderborn, Germany

Proceedings: ITG-Fb. 267: Speech Communication

Pages: 5; Language: English; Type: PDF


Authors:
Haderlein, Tino; Noeth, Elmar (Chair of Computer Science 5 (Pattern Recognition), Friedrich-Alexander-Universität Erlangen-Nürnberg, Martensstr. 3, 91058 Erlangen, Germany)
Schuetzenberger, Anne; Doellinger, Michael (Department of Phoniatrics and Pediatric Audiology, ENT Clinic, University Hospital Erlangen-Nürnberg, Bohlenplatz 21, 91054 Erlangen, Germany)

Abstract:
Objective assessment of voice and speech properties via telephone is desirable for rehabilitation purposes. Eighty-two patients who had undergone partial laryngectomy read a standardized text over the phone. Based on these recordings, five experienced raters perceptually assessed speech effort, match of breath and sense units, vocal tone, intelligibility, and overall voice quality. Objective evaluation was performed using the word accuracy and word correctness of a speech recognition system, together with a set of prosodic features. The speech recognition system used μ-law features, i.e. modified Mel-Frequency Cepstrum Coefficients (MFCCs). The prosodic features were computed from word hypotheses graphs produced by the speech recognizer. The human-machine correlations between these features and the perceptual ratings were slightly better for the system based on μ-law features than for the baseline MFCC system.
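Word correctness (WR) and word accuracy (WA) are commonly defined from a minimum-cost alignment of the recognized words against the reference text: with N reference words and S substitutions, D deletions and I insertions, WR = (N - S - D) / N and WA = (N - S - D - I) / N. The abstract does not say how the recognizer output was scored, so the following is only a minimal sketch, assuming a unit-cost Levenshtein alignment over word lists. The function names and the example sentences are hypothetical, not taken from the paper.

```python
from typing import List, Tuple


def align_counts(ref: List[str], hyp: List[str]) -> Tuple[int, int, int]:
    """Return (substitutions, deletions, insertions) of one minimum-cost
    unit-cost Levenshtein alignment of hyp against ref (assumed scoring)."""
    n, m = len(ref), len(hyp)
    # d[i][j] = minimal edit cost for aligning ref[:i] with hyp[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i
    for j in range(1, m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)

    # Backtrace: prefer match/substitution, then deletion, then insertion.
    s = dels = ins = 0
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            s += int(ref[i - 1] != hyp[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            dels += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return s, dels, ins


def word_correctness_accuracy(ref: List[str], hyp: List[str]) -> Tuple[float, float]:
    """Return (WR, WA) in percent for a single utterance."""
    n = len(ref)
    if n == 0:
        raise ValueError("reference must contain at least one word")
    s, dels, ins = align_counts(ref, hyp)
    correct = n - s - dels
    wr = 100.0 * correct / n          # insertions are ignored
    wa = 100.0 * (correct - ins) / n  # insertions are penalized; may be negative
    return wr, wa


if __name__ == "__main__":
    reference = "the quick brown fox".split()
    hypothesis = "the quack brown fox jumps".split()
    print(word_correctness_accuracy(reference, hypothesis))  # (75.0, 50.0)
```

In the example, one substitution and one insertion give WR = 75 % but WA = 50 %, because only WA penalizes the inserted word. WA depends only on the total edit cost, whereas WR can vary slightly with how ties between equal-cost alignments are broken.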