An Analysis of MFCC-like Parametric Audio Features for Keyphrase Spotting Applications

Conference: Sprachkommunikation 2010 - 9. ITG-Fachtagung
6 October 2010 - 8 October 2010 in Bochum, Germany

Proceedings: Sprachkommunikation 2010

Pages: 4
Language: English
Type: PDF


Kurth, Frank; Zeddelmann, Dirk von (Fraunhofer-FKIE, Abteilung KOM, 53343 Wachtberg, Germany)

The design of suitable mid-level audio features is an important ingredient for many speech retrieval applications. In a recent contribution, Human Factor Cepstral Coefficients using Energy Normalized Statistics (HFCC-ENS) were shown to outperform equivalent MFCC-based features in an unsupervised keyphrase spotting application. In this paper, we propose a general framework of parametric audio features which includes MFCCs and HFCC-ENS as special cases. This setting allows us to investigate independently how particular changes in the temporal, frequency, and amplitude resolution of the resulting audio features affect the performance of the keyphrase spotting application. Furthermore, we are able to define sequences of filterbanks and associated audio features which correspond to a smooth transition, or morphing, from one type of feature to another. Here, we evaluate the keyphrase spotting performance for a sequence of audio features defined by a transition from MFCC to HFCC filterbanks.
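
The abstract does not spell out how the filterbank transition is constructed, so the following is only a minimal sketch of one way such a morphing could look, not the authors' method. It assumes triangular filters with mel-spaced center frequencies, MFCC-style edges at the neighboring center frequencies, and a simplified HFCC-style filter that is symmetric in Hz with a half-width equal to the Moore-Glasberg ERB approximation. The function names, the parameter `t`, and all default values are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # Common mel-scale formula (one of several conventions in use).
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def erb_hz(fc_hz):
    # Moore-Glasberg ERB approximation; input in Hz, polynomial in kHz.
    f_khz = np.asarray(fc_hz, dtype=float) / 1000.0
    return 6.23 * f_khz**2 + 93.39 * f_khz + 28.52

def morphed_filterbank(t, n_filters=20, n_fft=512, sr=16000,
                       f_lo=0.0, f_hi=8000.0):
    """Triangular filterbank; t=0 gives MFCC-style, t=1 HFCC-style edges.

    Assumption: the morph is a linear interpolation of the filter edges.
    """
    # Center frequencies equally spaced on the mel scale (plus two end points).
    pts = mel_to_hz(np.linspace(hz_to_mel(f_lo), hz_to_mel(f_hi), n_filters + 2))
    centers = pts[1:-1]

    # MFCC-style: each triangle spans from the previous to the next center.
    lo_mfcc, hi_mfcc = pts[:-2], pts[2:]
    # Simplified HFCC-style: bandwidth decoupled from filter spacing.
    # A unit-height triangle with half-width ERB has area ERB.
    lo_hfcc, hi_hfcc = centers - erb_hz(centers), centers + erb_hz(centers)

    lo = (1.0 - t) * lo_mfcc + t * lo_hfcc
    hi = (1.0 - t) * hi_mfcc + t * hi_hfcc

    # Evaluate the triangles on the FFT bin frequencies.
    freqs = np.linspace(0.0, sr / 2.0, n_fft // 2 + 1)[None, :]
    rising = (freqs - lo[:, None]) / (centers - lo)[:, None]
    falling = (hi[:, None] - freqs) / (hi - centers)[:, None]
    return np.clip(np.minimum(rising, falling), 0.0, 1.0)

# A sequence of filterbanks stepping from MFCC-style to HFCC-style.
banks = [morphed_filterbank(t) for t in np.linspace(0.0, 1.0, 5)]
```

Each matrix in `banks` has one row per filter and one column per FFT bin, so it can be applied to a power spectrum with a matrix product before the usual logarithm and DCT steps. Interpolating the edges rather than the filter weights keeps every intermediate filter a single triangle, which is a design choice of this sketch.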