Automatic Speech Recognition for Dementia Screening using ILSE-Interviews

Conference: Speech Communication - 14th ITG Conference
29.09.2021 - 01.10.2021, online

Proceedings: ITG-Fb. 298: Speech Communication

Pages: 5
Language: English
Type: PDF


Ablimit, Ayimnisagul; Schultz, Tanja (Cognitive Systems Lab, University of Bremen, Bremen, Germany)

Spoken language skills are strong biomarkers for detecting cognitive decline. Studies like the Interdisciplinary Longitudinal Study of Adult Development and Aging (ILSE) are of particular interest for quantifying the predictive power of biomarkers in terms of acoustic and linguistic features. ILSE consists of approximately 6,500 hours of interviews, of which only 10% have been manually transcribed. To extract linguistic features, we need to build a reliable automatic speech recognition (ASR) system that provides transcriptions. The ILSE corpus is challenging for ASR due to a combination of factors. In this study, we present our effort to overcome some of these challenges. We automatically segmented 45 minutes of interviews into shorter segments and time-aligned them. Using these segments, we developed an HMM-DNN-based ASR system and achieved a word error rate (WER) of 33.55%. Based on this system, we recreated the time alignments for the manual transcriptions and derived acoustic and linguistic features for classifier training. We applied the resulting system to dementia screening and achieved an unweighted average recall (UAR) of 0.867 for a three-class problem.
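The two metrics reported in the abstract, WER and UAR, can be illustrated with a minimal Python sketch. It uses the standard textbook definitions (word-level edit distance divided by reference length, and the mean of per-class recalls). It is not the authors' evaluation code: the function names `wer` and `uar`, the example sentences, and the class labels "A", "B", "C" are hypothetical placeholders, since the abstract does not name the three classes.

```python
from collections import defaultdict


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def uar(y_true, y_pred) -> float:
    """Unweighted average recall: mean of per-class recalls, so every
    class counts equally regardless of how many samples it has."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


# Hypothetical toy data, not taken from the ILSE corpus.
example_wer = wer("the cat sat on the mat", "the cat sit on mat")
example_uar = uar(["A", "A", "A", "B", "C", "C"],
                  ["A", "A", "B", "B", "C", "A"])
```

Because UAR averages recall per class, it is less sensitive to class imbalance than plain accuracy, which is one common reason for reporting it in multi-class screening tasks.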