Exploring Nonnegative Matrix Factorization for Audio Classification: Application to Speaker Recognition

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
26.09.2012 - 28.09.2012 in Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4
Language: English
Type: PDF

Authors:
Joder, Cyril; Schuller, Björn (Institute for Human-Machine Communication, Technical University Munich, 80333 München, Germany)

Abstract:
In this paper, we investigate the use of Nonnegative Matrix Factorization (NMF) for feature extraction in the context of audio classification. NMF decomposes the spectrogram into nonnegative factors and has been successfully applied to audio source separation. It therefore has the potential to be robust to noise disturbances when used for feature calculation. We introduce two feature sets directly derived from the NMF decomposition. Experiments performed on an 8-class speaker recognition task with Support Vector Machines show that the proposed representations convey information complementary to the baseline MFCC features. Indeed, using only the NMF-based descriptors leads to results similar to those of the reference features, and combining the two representations yields a significant improvement in accuracy.
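
As a rough illustration of the decomposition the abstract refers to, the following minimal sketch factors a nonnegative magnitude spectrogram V into a basis matrix W and an activation matrix H, then derives a simple descriptor from H. The abstract does not specify the cost function, the rank, or how the two feature sets are built, so everything here is an assumption: the multiplicative updates for the Euclidean cost, the synthetic input, and the hypothetical helper `activation_features` are illustrative choices, not the authors' method.

```python
import numpy as np


def nmf(V, rank, n_iter=200, eps=1e-9, seed=0):
    """Approximate a nonnegative matrix V (freq x time) as W @ H.

    Uses multiplicative updates for the Euclidean cost (an assumed choice;
    the abstract does not say which NMF variant is used).
    """
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, rank)) + eps   # spectral basis vectors
    H = rng.random((rank, n_frames)) + eps  # time-varying activations
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H


def activation_features(H, eps=1e-9):
    """Hypothetical descriptor: mean activation per component, normalized to sum to 1."""
    mean_act = H.mean(axis=1)
    return mean_act / (mean_act.sum() + eps)


# Synthetic stand-in for a magnitude spectrogram (64 frequency bins, 100 frames).
rng = np.random.default_rng(1)
V = rng.random((64, 4)) @ rng.random((4, 100))

W, H = nmf(V, rank=4)
features = activation_features(H)
rel_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
print("relative reconstruction error:", rel_error)
print("activation-based feature vector:", features)
```

In a classification setting along the lines described above, such a vector could be concatenated with MFCC features before being passed to a classifier; how the combination is actually done in the paper is not stated in the abstract.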