An Analysis of MFCC-like Parametric Audio Features for Keyphrase Spotting Applications

Conference: Sprachkommunikation 2010 - 9. ITG-Fachtagung
10/06/2010 - 10/08/2010 at Bochum, Germany

Proceedings: Sprachkommunikation 2010

Pages: 4
Language: English
Type: PDF


Kurth, Frank; Zeddelmann, Dirk von (Fraunhofer-FKIE, Abteilung KOM, 53343 Wachtberg, Germany)

The design of suitable mid-level audio features is an important ingredient for many speech retrieval applications. In a recent contribution, Human Factor Cepstral Coefficients using Energy Normalized Statistics (HFCC-ENS) were shown to outperform equivalent MFCC-based features in an unsupervised keyphrase spotting application. In this paper, we propose a general framework of parametric audio features that includes MFCCs and HFCC-ENS as special cases. This setting allows us to investigate independently how particular changes in the temporal, frequency, and amplitude resolution of the resulting audio features affect the performance of the keyphrase spotting application. Furthermore, we are able to define sequences of filterbanks and associated audio features that correspond to a smooth transition, or morphing, from one type of feature to another. Here, we evaluate the keyphrase spotting performance for a sequence of audio features defined by a transition from MFCC to HFCC filterbanks.
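The abstract does not say how the transition between filterbanks is parameterised. As a rough illustration only, the sketch below assumes a single morphing parameter `alpha` that linearly interpolates the edges of triangular filters between an MFCC-style layout (edges at the neighbouring centre frequencies) and a simplified HFCC-style layout (edges at centre ± ERB, using the Moore-Glasberg bandwidth approximation). The function names, the default parameter values, and the symmetric-in-Hz HFCC simplification are all assumptions of this sketch, not the construction used in the paper.

```python
import math

def hz_to_mel(f_hz):
    """Common mel-scale formula."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def erb_hz(fc_hz):
    """Moore-Glasberg equivalent rectangular bandwidth, centre frequency in Hz."""
    return 6.23e-6 * fc_hz ** 2 + 93.39e-3 * fc_hz + 28.52

def morphed_edges(alpha, n_filters=20, sr=16000):
    """Return a (lower, centre, upper) triple in Hz for each filter.

    alpha = 0: MFCC-style edges (the neighbouring centre frequencies).
    alpha = 1: simplified HFCC-style edges (centre +/- ERB, symmetric in Hz);
               this symmetric form is an assumption of the sketch.
    """
    nyquist = sr / 2.0
    step = hz_to_mel(nyquist) / (n_filters + 1)
    # Points equally spaced on the mel scale, including both band limits.
    pts = [mel_to_hz(i * step) for i in range(n_filters + 2)]
    edges = []
    for k in range(1, n_filters + 1):
        c = pts[k]
        lo = (1.0 - alpha) * pts[k - 1] + alpha * (c - erb_hz(c))
        hi = (1.0 - alpha) * pts[k + 1] + alpha * (c + erb_hz(c))
        # Keep the edges inside the analysed band [0, sr/2].
        edges.append((max(lo, 0.0), c, min(hi, nyquist)))
    return edges

def morphed_filterbank(alpha, n_filters=20, n_fft=512, sr=16000):
    """Triangular filter weights sampled at the FFT bin frequencies 0..sr/2."""
    freqs = [b * sr / n_fft for b in range(n_fft // 2 + 1)]
    bank = []
    for lo, c, hi in morphed_edges(alpha, n_filters, sr):
        # Rising slope below the centre, falling slope above it, zero outside.
        row = [max(0.0, min((f - lo) / (c - lo), (hi - f) / (hi - c)))
               for f in freqs]
        bank.append(row)
    return bank

# A hypothetical morphing sequence from MFCC-style to HFCC-style filterbanks.
sequence = [morphed_filterbank(a) for a in (0.0, 0.25, 0.5, 0.75, 1.0)]
```

Under these assumptions, each filterbank in `sequence` could be applied to a magnitude spectrum in the usual way (weighted sum per filter, logarithm, DCT) to obtain one feature type per value of `alpha`.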