2D Audio-Visual Localization in Home Environments using a Particle Filter

Conference: Sprachkommunikation (Speech Communication) - Contributions to the 10th ITG Symposium
26.09.2012 - 28.09.2012 in Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4, Language: English, Type: PDF


Gerlach, Stephan; Goetze, Stefan (Project group Hearing, Speech and Audio Technology, Fraunhofer Institute for Digital Media Technology (IDMT), 26129 Oldenburg, Germany)
Doclo, Simon (University of Oldenburg, Institute of Physics, Signal Processing Group, 26111 Oldenburg, Germany)

Multimodal algorithms have the advantage that the individual modalities can compensate for each other's weaknesses. We therefore propose a system that localizes concurrent speakers in two-dimensional (2D) space using a combined audio-visual localization algorithm. Acoustic source localization is performed by the multichannel cross-correlation coefficient (MCCC) algorithm, and visual localization by the SHORE™ video localization system (Sophisticated High-speed Object Recognition Engine, a trademark of Fraunhofer IIS, 91058 Erlangen, Germany). The multimodal fusion is performed by a particle filter with adaptations to the particle weighting. The proposed algorithm is evaluated in a home-environment living lab, focussing on the gains obtained from the complementary localization modalities.
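The fusion idea can be illustrated with a minimal particle filter sketch. It is not the authors' implementation: the abstract does not specify the weighting adaptations, so the sketch makes several assumptions. Each modality is assumed to deliver a bearing (direction of arrival) towards the speaker from a sensor at a known position, the likelihoods are assumed Gaussian, and a modality with no detection is simply skipped in the weighting. All names (`fusion_step`, `run_demo`, the room size, the sensor positions and the noise values) are hypothetical.

```python
import math
import random


def wrap_angle(angle):
    """Map an angle in radians to the interval (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def bearing(sensor, point):
    """Bearing in radians from a sensor position to a 2D point."""
    return math.atan2(point[1] - sensor[1], point[0] - sensor[0])


def gaussian_likelihood(error, sigma):
    """Unnormalized Gaussian likelihood of an angular error."""
    return math.exp(-0.5 * (error / sigma) ** 2)


def fusion_step(particles, audio_doa, video_doa, mic_pos, cam_pos, rng,
                motion_sigma=0.05, audio_sigma=0.15, video_sigma=0.05):
    """One predict / weight / resample cycle.

    audio_doa and video_doa are bearings in radians, or None when the
    modality produced no detection (assumed handling, not from the paper).
    """
    # Predict: random-walk motion model (an assumption of this sketch).
    moved = [(x + rng.gauss(0.0, motion_sigma), y + rng.gauss(0.0, motion_sigma))
             for x, y in particles]

    # Weight: multiply the likelihoods of the modalities that are available.
    weights = []
    for p in moved:
        w = 1.0
        if audio_doa is not None:
            err = wrap_angle(bearing(mic_pos, p) - audio_doa)
            w *= gaussian_likelihood(err, audio_sigma)
        if video_doa is not None:
            err = wrap_angle(bearing(cam_pos, p) - video_doa)
            w *= gaussian_likelihood(err, video_sigma)
        weights.append(w)

    # Guard against numerical underflow of all weights.
    if sum(weights) <= 0.0:
        weights = [1.0] * len(moved)

    # Resample: multinomial resampling proportional to the weights.
    return rng.choices(moved, weights=weights, k=len(moved))


def estimate(particles):
    """Position estimate as the mean of the particle cloud."""
    n = len(particles)
    return (sum(x for x, _ in particles) / n, sum(y for _, y in particles) / n)


def run_demo(seed=0, n_particles=2000, n_steps=30):
    """Track a static speaker in a hypothetical 4 m x 4 m room."""
    rng = random.Random(seed)
    mic_pos, cam_pos = (0.0, 0.0), (4.0, 0.0)  # assumed sensor positions
    speaker = (2.0, 3.0)                       # assumed true position
    particles = [(rng.uniform(0.0, 4.0), rng.uniform(0.0, 4.0))
                 for _ in range(n_particles)]
    for step in range(n_steps):
        audio = bearing(mic_pos, speaker) + rng.gauss(0.0, 0.05)
        # Simulate a missed face detection on every third frame.
        video = None if step % 3 == 0 else bearing(cam_pos, speaker) + rng.gauss(0.0, 0.02)
        particles = fusion_step(particles, audio, video, mic_pos, cam_pos, rng)
    return estimate(particles)


if __name__ == "__main__":
    print(run_demo())
```

In this sketch a single bearing constrains the speaker only to a ray, so neither sensor alone yields a 2D position. Multiplying both likelihoods concentrates the particles near the intersection of the two rays, and on frames where the video detection is missing the audio bearing alone keeps the cloud from drifting. This is one simple way the modalities can compensate for each other.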