Deep Learning of Articulatory-Based Representations and Applications for Improving Dysarthric Speech Recognition

Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation
10/10/2018 - 10/12/2018 at Oldenburg, Germany

Proceedings: Speech Communication

Pages: 5
Language: English
Type: PDF


Authors:
Xiong, Feifei; Barker, Jon (Speech and Hearing Group (SPandH), Dept. of Computer Science, University of Sheffield, Sheffield, UK)
Christensen, Heidi (Speech and Hearing Group (SPandH), Dept. of Computer Science, University of Sheffield, Sheffield, UK & Centre for Assistive Technology and Connected Healthcare (CATCH), University of Sheffield, Sheffield, UK)

Abstract:
Improving the accuracy of dysarthric speech recognition is a challenging research area due to the high inter- and intra-speaker variability of disordered speech. In this work, we propose using estimated articulatory-based representations to augment the conventional acoustic features, in order to better model the variability of dysarthric speech in automatic speech recognition. To obtain the articulatory information, long short-term memory recurrent neural networks are employed to learn the acoustic-to-articulatory inverse mapping from a simulated articulatory database. Experimental results show that the estimated articulatory features provide consistent improvements for dysarthric speech recognition, with the largest gains observed for speakers with moderate and moderate-to-severe dysarthria.
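
To illustrate the approach described in the abstract, the sketch below shows an LSTM-based acoustic-to-articulatory inversion model and the feature-augmentation step. This is a minimal, hypothetical example: the framework (PyTorch), the bidirectional configuration, and all dimensions (40-dimensional acoustic input, 12-dimensional articulatory output, hidden size 256) are illustrative assumptions, not the paper's actual setup; in the paper, such a network would be trained as a regression model on the simulated articulatory database.

```python
import torch
import torch.nn as nn

class AcousticToArticulatoryLSTM(nn.Module):
    """Hypothetical sketch of an LSTM acoustic-to-articulatory inversion
    model: maps per-frame acoustic features (e.g. MFCCs) to articulatory
    trajectories. Layer sizes and dimensions are assumptions for
    illustration, not the configuration reported in the paper."""

    def __init__(self, n_acoustic=40, n_articulatory=12, hidden=256, layers=2):
        super().__init__()
        self.lstm = nn.LSTM(n_acoustic, hidden, num_layers=layers,
                            batch_first=True, bidirectional=True)
        # Bidirectional LSTM doubles the hidden dimension.
        self.proj = nn.Linear(2 * hidden, n_articulatory)

    def forward(self, acoustic):
        # acoustic: (batch, frames, n_acoustic)
        hidden_states, _ = self.lstm(acoustic)
        return self.proj(hidden_states)  # (batch, frames, n_articulatory)

# Feature augmentation: concatenate the estimated articulatory features
# with the conventional acoustic features before acoustic modelling.
model = AcousticToArticulatoryLSTM()
acoustic = torch.randn(8, 200, 40)  # dummy batch of 8 utterances, 200 frames
with torch.no_grad():
    articulatory = model(acoustic)
augmented = torch.cat([acoustic, articulatory], dim=-1)  # (8, 200, 52)
```

Training the inversion network (e.g. with a mean-squared-error loss against the simulated articulatory trajectories) is omitted here; the augmented features would then be fed to the speech recognizer's acoustic model in place of the acoustic features alone.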