Deep Learning of Articulatory-Based Representations and Applications for Improving Dysarthric Speech Recognition

Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation
10-12 October 2018 in Oldenburg, Germany

Proceedings: ITG-Fb. 282: Speech Communication

Pages: 5 · Language: English · Type: PDF


Authors:
Xiong, Feifei; Barker, Jon (Speech and Hearing Group (SPandH), Dept. of Computer Science, University of Sheffield, Sheffield, UK)
Christensen, Heidi (Speech and Hearing Group (SPandH), Dept. of Computer Science, University of Sheffield, Sheffield, UK & Centre for Assistive Technology and Connected Healthcare (CATCH), University of Sheffield, Sheffield, UK)

Abstract:
Improving the accuracy of dysarthric speech recognition is a challenging research problem due to the high inter- and intra-speaker variability of disordered speech. In this work, we propose to augment conventional acoustic features with estimated articulatory-based representations to better model dysarthric speech variability in automatic speech recognition. To obtain the articulatory information, long short-term memory recurrent neural networks are employed to learn the acoustic-to-articulatory inverse mapping from a simulated articulatory database. Experimental results show that the estimated articulatory features provide consistent improvements for dysarthric speech recognition, with the largest gains observed for speakers with moderate and moderate-severe dysarthria.
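The pipeline the abstract describes — an LSTM that maps acoustic frames to articulatory trajectories, whose outputs are then concatenated with the acoustic features before recognition — can be sketched as below. This is a minimal illustration only: the single unidirectional layer, the untrained random weights, and all dimensions (13 MFCCs, 6 articulatory channels, 32 hidden units) are assumptions for the sketch, not the configuration used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class TinyLSTM:
    """Minimal single-layer LSTM for acoustic-to-articulatory inversion.
    Weights are random (untrained); in the paper's setting they would be
    learned from a simulated articulatory database."""

    def __init__(self, in_dim, hid_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        # One fused weight matrix for the input, forget, cell, output gates.
        self.W = rng.normal(0.0, 0.1, (4 * hid_dim, in_dim + hid_dim))
        self.b = np.zeros(4 * hid_dim)
        self.Wo = rng.normal(0.0, 0.1, (out_dim, hid_dim))  # output projection
        self.bo = np.zeros(out_dim)
        self.hid_dim = hid_dim

    def forward(self, X):
        """X: (T, in_dim) acoustic frames -> (T, out_dim) articulatory estimates."""
        H = self.hid_dim
        h = np.zeros(H)
        c = np.zeros(H)
        Y = np.empty((X.shape[0], self.bo.shape[0]))
        for t in range(X.shape[0]):
            z = self.W @ np.concatenate([X[t], h]) + self.b
            i = sigmoid(z[:H])          # input gate
            f = sigmoid(z[H:2 * H])     # forget gate
            g = np.tanh(z[2 * H:3 * H]) # candidate cell state
            o = sigmoid(z[3 * H:])      # output gate
            c = f * c + i * g
            h = o * np.tanh(c)
            Y[t] = self.Wo @ h + self.bo
        return Y

# Feature augmentation: append estimated articulatory features to acoustics,
# yielding the augmented representation fed to the recognizer.
T, n_mfcc, n_art = 50, 13, 6  # illustrative sizes
acoustic = np.random.default_rng(1).normal(size=(T, n_mfcc))
aai = TinyLSTM(n_mfcc, 32, n_art)
articulatory_hat = aai.forward(acoustic)
augmented = np.hstack([acoustic, articulatory_hat])
print(augmented.shape)  # (50, 19): 13 acoustic + 6 estimated articulatory dims
```

The key design point is that the articulatory stream is estimated rather than measured, so the same inversion network can be applied to any dysarthric speaker's acoustics at recognition time.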