Improved Performance Measures for Voice Activity Detection

Conference: Speech Communication - 11. ITG-Fachtagung Sprachkommunikation
September 24-26, 2014, Erlangen, Germany

Proceedings: Speech Communication

Pages: 4
Language: English
Type: PDF

Authors:
Graf, Simon; Herbig, Tobias; Buck, Markus (Acoustic Speech Enhancement Research, Nuance Communications Deutschland GmbH, 89077 Ulm, Germany)
Graf, Simon; Schmidt, Gerhard (Digital Signal Processing and System Theory, Christian-Albrechts-Universitaet zu Kiel, 24143 Kiel, Germany)

Abstract:
Voice activity detection is an essential part of many speech processing algorithms. The requirements of the speech application determine the design of voice activity detection. Some applications need low-latency results whereas the accuracy of speech detection is more important for other applications. The performance is generally evaluated by Receiver Operating Characteristic (ROC) curves, which perform a static analysis averaged over speech and nonspeech segments, respectively. We adopt the ROC curves but evaluate them for specific speech classes, e.g., voiced or unvoiced speech, to describe the overall accuracy of speech detection. In addition, we present a new measure for the dynamic behavior that considers the delay and latency of speech on- and offset detection. Finally, we present a unified measure for both aspects. This measure may be used to find appropriate voice activity detection features for a given application. An automotive noise scenario is employed to demonstrate the measures as it contains stationary and non-stationary noise.
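
The abstract does not define the measures themselves, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes per-frame feature values and frame labels drawn from "voiced", "unvoiced", and "nonspeech". The function names `class_roc` and `onset_delay`, the label strings, and the toy data are hypothetical. The first function computes ROC points for one speech class at a time. The second is a simple stand-in for a dynamic measure: it counts the frames between a reference speech onset and the first detection.

```python
def class_roc(scores, labels, speech_class, thresholds):
    """Return (false_alarm_rate, detection_rate) pairs, one per threshold.

    The detection rate is computed only on frames labeled `speech_class`;
    the false-alarm rate is computed on frames labeled "nonspeech".
    """
    target = [s for s, l in zip(scores, labels) if l == speech_class]
    noise = [s for s, l in zip(scores, labels) if l == "nonspeech"]
    if not target or not noise:
        raise ValueError("need at least one frame of each kind")
    points = []
    for t in thresholds:
        detection_rate = sum(s >= t for s in target) / len(target)
        false_alarm_rate = sum(s >= t for s in noise) / len(noise)
        points.append((false_alarm_rate, detection_rate))
    return points


def onset_delay(decisions, reference_onset):
    """Frames between a reference onset and the first detection at or after it."""
    for i in range(reference_onset, len(decisions)):
        if decisions[i]:
            return i - reference_onset
    return None  # the onset was never detected


# Toy data (assumed, not from the paper): feature values and hand-made labels.
scores = [0.1, 0.2, 0.4, 0.9, 0.8, 0.3, 0.6, 0.1]
labels = ["nonspeech", "nonspeech", "unvoiced", "voiced",
          "voiced", "unvoiced", "voiced", "nonspeech"]

voiced_roc = class_roc(scores, labels, "voiced", [0.25, 0.5])
unvoiced_roc = class_roc(scores, labels, "unvoiced", [0.25, 0.5])

# Binary decisions at a fixed threshold; speech starts at frame 2 in the labels.
decisions = [s >= 0.5 for s in scores]
delay = onset_delay(decisions, reference_onset=2)
```

On this toy data the hypothetical feature separates voiced frames from nonspeech at both thresholds, but misses the unvoiced frames at the higher threshold and detects the onset one frame late. Effects of this kind are hidden when the rates are averaged over all speech frames.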