Evaluation of Enhanced F0-Trajectories for Speech Detection and Classification in Acoustic Monitoring

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
October 5-7, 2016, Paderborn, Germany

Proceedings: ITG-Fb. 267: Speech Communication

Pages: 4
Language: English
Type: PDF

Kurth, Frank; Cornaggia-Urrigshardt, Alessia (Fraunhofer FKIE, Communication Systems, Fraunhoferstr. 20, 53343 Wachtberg, Germany)

We evaluate the performance of enhanced F0-features for robustly detecting speech segments in noisy acoustic monitoring recordings. F0-features are extracted from the spectrogram using the recently introduced shift autocorrelation (shift-ACF), followed by trajectory extraction. Speech detection is performed in two stages: a classification stage and a segment extraction stage. We systematically evaluate the shift-ACF features and the speech detection performance using (i) purely synthetic data, (ii) a mix of synthetic speech and real background noise, and (iii) real speech and real background noise. A review of their strengths and weaknesses shows that shift-ACF based F0-features outperform classical features in several scenarios.
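
The two-stage structure described in the abstract (classification, then segment extraction) can be illustrated with a minimal Python sketch. This is not the authors' method: the per-frame scores, the fixed threshold used as a stand-in classifier, and the names `classify_frames`, `extract_segments`, `min_len`, and `max_gap` are all hypothetical and chosen only for illustration. The shift-ACF feature extraction itself is not shown.

```python
from typing import List, Tuple


def classify_frames(scores: List[float], threshold: float = 0.5) -> List[bool]:
    """Stage 1 (stand-in): label a frame as speech if its score exceeds a threshold.

    The scores are assumed to be some per-frame F0-based feature value;
    a real system would use a trained classifier instead of a threshold.
    """
    return [s > threshold for s in scores]


def extract_segments(
    frame_is_speech: List[bool], min_len: int = 3, max_gap: int = 2
) -> List[Tuple[int, int]]:
    """Stage 2: turn per-frame labels into (start, end) frame ranges, end exclusive."""
    # Collect raw runs of consecutive speech frames.
    runs: List[Tuple[int, int]] = []
    start = None
    for i, flag in enumerate(frame_is_speech):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(frame_is_speech)))

    # Merge runs separated by at most max_gap non-speech frames.
    merged: List[Tuple[int, int]] = []
    for s, e in runs:
        if merged and s - merged[-1][1] <= max_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))

    # Discard segments shorter than min_len frames.
    return [(s, e) for s, e in merged if e - s >= min_len]


if __name__ == "__main__":
    # Made-up scores for 14 frames, for illustration only.
    scores = [0.1, 0.9, 0.8, 0.2, 0.7, 0.9, 0.6, 0.1, 0.0, 0.2, 0.1, 0.8, 0.3, 0.1]
    labels = classify_frames(scores)
    print(extract_segments(labels))
```

Merging short gaps before filtering by length keeps a speech segment intact when a few frames inside it are misclassified, while isolated single-frame detections are dropped.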