Improved Performance Measures for Voice Activity Detection

Conference: Speech Communication - 11. ITG-Fachtagung Sprachkommunikation
09/24/2014 - 09/26/2014 in Erlangen, Germany

Proceedings: Speech Communication

Pages: 4
Language: English
Type: PDF

Authors:
Graf, Simon; Herbig, Tobias; Buck, Markus (Acoustic Speech Enhancement Research, Nuance Communications Deutschland GmbH, 89077 Ulm, Germany)
Graf, Simon; Schmidt, Gerhard (Digital Signal Processing and System Theory, Christian-Albrechts-Universitaet zu Kiel, 24143 Kiel, Germany)

Abstract:
Voice activity detection is an essential part of many speech processing algorithms. The requirements of the speech application determine the design of voice activity detection. Some applications need low-latency results whereas the accuracy of speech detection is more important for other applications. The performance is generally evaluated by Receiver Operating Characteristic (ROC) curves, which perform a static analysis averaged over speech and nonspeech segments, respectively. We adopt the ROC curves but evaluate them for specific speech classes, e.g., voiced or unvoiced speech, to describe the overall accuracy of speech detection. In addition, we present a new measure for the dynamic behavior that considers the delay and latency of speech on- and offset detection. Finally, we present a unified measure for both aspects. This measure may be used to find appropriate voice activity detection features for a given application. An automotive noise scenario is employed to demonstrate the measures as it contains stationary and non-stationary noise.
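
The abstract does not state the formulas behind these measures, so the following is only a minimal sketch of the two ideas it describes: ROC points evaluated for one speech class at a time, and a simple delay for speech onset detection. It is not the authors' method. The function names (`roc_points`, `onset_delay`), the frame labels (`n` for nonspeech, `u` for unvoiced, `v` for voiced), the toy scores, and the thresholds are all assumptions made for illustration.

```python
import numpy as np


def roc_points(scores, is_target, is_nonspeech, thresholds):
    """Return one (false_alarm_rate, detection_rate) pair per threshold.

    The detection rate is averaged only over frames of the chosen speech
    class (is_target); the false-alarm rate is averaged over nonspeech frames.
    Both masks are assumed to select at least one frame.
    """
    scores = np.asarray(scores, dtype=float)
    is_target = np.asarray(is_target, dtype=bool)
    is_nonspeech = np.asarray(is_nonspeech, dtype=bool)
    points = []
    for threshold in thresholds:
        decision = scores >= threshold
        detection_rate = float(decision[is_target].mean())
        false_alarm_rate = float(decision[is_nonspeech].mean())
        points.append((false_alarm_rate, detection_rate))
    return points


def onset_delay(decision, is_speech):
    """Frames between the first labeled speech frame and the first detection
    at or after it. Returns None if there is no speech or no detection."""
    decision = np.asarray(decision, dtype=bool)
    is_speech = np.asarray(is_speech, dtype=bool)
    speech_frames = np.flatnonzero(is_speech)
    if speech_frames.size == 0:
        return None
    onset = speech_frames[0]
    hits = np.flatnonzero(decision[onset:])
    return int(hits[0]) if hits.size else None


# Toy data (hypothetical): one detector score and one class label per frame.
scores = np.array([0.1, 0.2, 0.4, 0.9, 0.8, 0.3, 0.1])
classes = np.array(["n", "n", "u", "v", "v", "u", "n"])
thresholds = [0.35, 0.5]

nonspeech = classes == "n"
voiced_roc = roc_points(scores, classes == "v", nonspeech, thresholds)
unvoiced_roc = roc_points(scores, classes == "u", nonspeech, thresholds)

# Onset delay of the thresholded detector at the stricter operating point.
delay = onset_delay(scores >= 0.5, ~nonspeech)
```

In this toy example the voiced frames are detected at both thresholds, while the unvoiced frames are partly or entirely missed, and the stricter threshold reports the speech onset one frame late. This is the kind of difference that a single ROC curve averaged over all speech frames would hide.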