Blind Estimation of the Number of Speech Sources in Reverberant Multisource Scenarios Based on Binaural Signals

Conference: IWAENC 2012 - International Workshop on Acoustic Signal Enhancement
4-6 September 2012, Aachen, Germany

Proceedings: IWAENC 2012

Pages: 4
Language: English
Type: PDF

May, Tobias; van de Par, Steven (University of Oldenburg, Institute of Physics, Acoustics Group, Oldenburg, Germany)

In this paper we present a new approach for estimating the number of active speech sources in the presence of interfering noise sources and reverberation. First, a binaural front-end is used to detect the spatial positions of all active sound sources, resulting in a binary mask for each candidate position. Then, each candidate position is characterized by a set of features. In addition to exploiting the overall spectral shape, a new set of mask-based features is proposed which aims at characterizing the pattern of the estimated binary mask. The decision stage for detecting a speech source is based on a support vector machine (SVM) classifier. A systematic analysis shows that the proposed algorithm is able to blindly determine the number and the corresponding spatial positions of speech sources in multisource scenarios and generalizes well to unknown acoustic conditions.

Index Terms: binaural processing, binary mask, computational auditory scene analysis (CASA)
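
The processing chain outlined in the abstract (one binary mask per candidate position, features describing the mask pattern, a per-candidate speech decision) can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration and is not taken from the paper: the two features (`density`, `mean_run_length`), the azimuth values, the toy masks, and in particular `toy_classifier`, a hand-written threshold rule that merely stands in for the trained SVM.

```python
from typing import Callable, Dict, List

# mask[frequency_channel][time_frame], entries are 0 or 1
Mask = List[List[int]]


def mask_features(mask: Mask) -> Dict[str, float]:
    """Two hypothetical descriptors of a binary mask pattern."""
    n_units = sum(len(row) for row in mask)
    active = sum(sum(row) for row in mask)
    # Count runs of consecutive active units along time in each channel.
    runs = 0
    for row in mask:
        prev = 0
        for unit in row:
            if unit == 1 and prev == 0:
                runs += 1
            prev = unit
    return {
        "density": active / n_units if n_units else 0.0,
        "mean_run_length": active / runs if runs else 0.0,
    }


def detect_speech_positions(
    masks: Dict[int, Mask],
    is_speech: Callable[[Dict[str, float]], bool],
) -> List[int]:
    """Return the candidate positions whose mask features are accepted."""
    return [
        position
        for position, mask in sorted(masks.items())
        if is_speech(mask_features(mask))
    ]


def toy_classifier(features: Dict[str, float]) -> bool:
    # Placeholder rule, NOT a trained SVM: the threshold is arbitrary.
    return features["mean_run_length"] >= 2.0


# Keys are hypothetical azimuths in degrees; masks are 2 channels x 4 frames.
candidates = {
    -30: [[1, 1, 1, 0], [0, 1, 1, 1]],
    45: [[1, 0, 1, 0], [0, 1, 0, 0]],
}

speech_positions = detect_speech_positions(candidates, toy_classifier)
print(len(speech_positions), speech_positions)  # prints: 1 [-30]
```

The estimated number of speech sources is simply the count of accepted candidate positions, which is why the sketch returns the positions themselves rather than only a number. In a real system the decision function would be a classifier trained on labeled masks, and the spectral-shape features mentioned in the abstract would be computed alongside the mask-based ones.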