Deep Learning of Articulatory-Based Representations and Applications for Improving Dysarthric Speech Recognition
Conference: Speech Communication - 13th ITG-Fachtagung Sprachkommunikation
10.10.2018 - 12.10.2018 in Oldenburg, Germany
Proceedings: ITG-Fb. 282: Speech Communication
Pages: 5 · Language: English · Type: PDF
Xiong, Feifei; Barker, Jon (Speech and Hearing Group (SPandH), Dept. of Computer Science, University of Sheffield, Sheffield, UK)
Christensen, Heidi (Speech and Hearing Group (SPandH), Dept. of Computer Science, University of Sheffield, Sheffield, UK & Centre for Assistive Technology and Connected Healthcare (CATCH), University of Sheffield, Sheffield, UK)
Improving the accuracy of dysarthric speech recognition is a challenging research field due to the high inter- and intra-speaker variability in disordered speech. In this work, we propose to use estimated articulatory-based representations to augment the conventional acoustic features for better modeling of the dysarthric speech variability in automatic speech recognition. To obtain the articulatory information, long short-term memory recurrent neural networks are employed to learn the acoustic-to-articulatory inverse mapping based on a simulated articulatory database. Experimental results show that the estimated articulatory features provide consistent improvements for dysarthric speech recognition, with the largest gains observed for speakers with moderate and moderate-severe dysarthria.
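The abstract does not give implementation details, but the core idea (an LSTM maps acoustic frames to estimated articulatory trajectories, which are then concatenated with the acoustic features fed to the recognizer) can be sketched as follows. This is a minimal illustration only: the feature dimensions (13-dim MFCC-like inputs, 6 articulatory dimensions), the single-layer LSTM with a linear readout, and the random weights are all assumptions, not the authors' configuration.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_inversion(acoustic, W, U, b, V, c):
    """Run a single-layer LSTM over acoustic frames and project each hidden
    state to an articulatory estimate (acoustic-to-articulatory inversion)."""
    T, _ = acoustic.shape
    H = U.shape[1]
    h = np.zeros(H)       # hidden state
    cell = np.zeros(H)    # cell state
    out = []
    for t in range(T):
        z = W @ acoustic[t] + U @ h + b      # (4H,) gate pre-activations
        i, f, g, o = np.split(z, 4)          # input, forget, candidate, output
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        cell = f * cell + i * np.tanh(g)
        h = o * np.tanh(cell)
        out.append(V @ h + c)                # linear articulatory readout
    return np.stack(out)                     # (T, A) articulatory trajectory

# Hypothetical sizes: 13-dim acoustic frames, 32 LSTM units, 6 articulatory dims
D, H, A, T = 13, 32, 6, 50
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
V = rng.standard_normal((A, H)) * 0.1
c = np.zeros(A)

mfcc = rng.standard_normal((T, D))                 # stand-in acoustic features
artic = lstm_inversion(mfcc, W, U, b, V, c)        # estimated articulatory features
augmented = np.concatenate([mfcc, artic], axis=1)  # tandem features for the ASR front end
print(augmented.shape)  # (50, 19)
```

In the paper's setting the inversion network would be trained on a simulated articulatory database and the augmented features passed to the acoustic model; here the weights are random purely to show the data flow.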