2D Audio-Visual Localization in Home Environments using a Particle Filter

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
09/26/2012 - 09/28/2012 in Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4
Language: English
Type: PDF

Authors:
Gerlach, Stephan; Goetze, Stefan (Project group Hearing, Speech and Audio Technology, Fraunhofer Institute for Digital Media Technology (IDMT), 26129 Oldenburg, Germany)
Doclo, Simon (University of Oldenburg, Institute of Physics, Signal Processing Group, 26111 Oldenburg, Germany)

Abstract:
Multimodal algorithms benefit from the fact that the individual modalities can mutually compensate for each other's weaknesses. We therefore propose a system that localizes concurrent speakers in two-dimensional (2D) space using a combined audio-visual localization algorithm. Acoustic source localization is performed by the multichannel cross-correlation coefficient (MCCC) algorithm, and visual localization by the SHORE(TM) video localization system (Sophisticated High-speed Object Recognition Engine, a trademark of Fraunhofer IIS, 91058 Erlangen, Germany). The multimodal fusion is performed by a particle filter with adaptations to the particle weighting. The proposed algorithm is evaluated in a home-environment living lab, focusing on the gains obtainable from the complementary localization modalities.
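
The fusion idea described in the abstract can be illustrated with a minimal sketch of a generic bootstrap particle filter in 2D. This is not the authors' implementation: the abstract does not specify the adapted particle weighting, so the sketch simply multiplies one Gaussian likelihood per available modality. The function names, the random-walk motion model, the noise parameters, and the assumption that both the audio and the video front end deliver a 2D position estimate are all hypothetical choices made for illustration.

```python
import math
import random


def gaussian_likelihood(particle, measurement, sigma):
    """Unnormalized isotropic 2D Gaussian likelihood (illustrative assumption)."""
    dx = particle[0] - measurement[0]
    dy = particle[1] - measurement[1]
    return math.exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma))


def particle_filter_step(particles, audio_obs, video_obs, rng,
                         motion_sigma=0.05, audio_sigma=0.5, video_sigma=0.2):
    """One predict / weight / estimate / resample cycle.

    audio_obs and video_obs are (x, y) tuples or None when a modality
    delivers no detection in this frame. All sigmas are assumed values.
    """
    # Predict: random-walk motion model (assumption, not from the paper).
    predicted = [(x + rng.gauss(0.0, motion_sigma),
                  y + rng.gauss(0.0, motion_sigma)) for x, y in particles]

    # Weight: product of the likelihoods of the available modalities.
    # A missing modality contributes a factor of 1, so the other one
    # alone drives the update.
    weights = []
    for p in predicted:
        w = 1.0
        if audio_obs is not None:
            w *= gaussian_likelihood(p, audio_obs, audio_sigma)
        if video_obs is not None:
            w *= gaussian_likelihood(p, video_obs, video_sigma)
        weights.append(w)

    total = sum(weights)
    if total <= 0.0:
        # Degenerate case: fall back to uniform weights.
        weights = [1.0 / len(predicted)] * len(predicted)
    else:
        weights = [w / total for w in weights]

    # Estimate: weighted mean of the particle positions.
    estimate = (sum(w * p[0] for w, p in zip(weights, predicted)),
                sum(w * p[1] for w, p in zip(weights, predicted)))

    # Resample: multinomial resampling proportional to the weights.
    resampled = rng.choices(predicted, weights=weights, k=len(predicted))
    return resampled, estimate


# Usage: 500 particles spread over a hypothetical 5 m x 4 m room.
rng = random.Random(0)
particles = [(rng.uniform(0.0, 5.0), rng.uniform(0.0, 4.0)) for _ in range(500)]
for _ in range(10):
    particles, estimate = particle_filter_step(
        particles, audio_obs=(2.1, 1.4), video_obs=(2.0, 1.5), rng=rng)
```

Because the likelihoods are multiplied, the modality with the smaller assumed measurement noise dominates the estimate when both are present, while either modality can carry the filter alone when the other drops out (for example, a speaker outside the camera's field of view, or a silent person in view).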