Enhancement of Neural Embeddings for Speaker Identification in Ad-hoc Acoustic Sensor Networks and Multi-Speaker Scenarios

Conference: Speech Communication - 16th ITG Conference
September 24-26, 2025 in Berlin, Germany

Proceedings: ITG-Fb. 321: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Intek, Philipp; Becker, Luca; Koppelmann, Timm; Martin, Rainer

Abstract:
In ad-hoc acoustic sensor networks, nodes collaborate to improve the performance of downstream applications. In this work, our aim is to study and mitigate the effects of interfering speakers and room reverberation on a speaker identification task. To this end, we consider a multi-speaker scenario and leverage signals from a multitude of scattered sensors. To reduce the required bit-rate when sharing information between nodes, the method operates on neural embeddings directly and thus does not need to transmit raw audio signals between nodes in the enhancement step. The method uses embeddings from the latent space of an autoencoder and includes separate neural networks for reverberation and interference reduction. We evaluate the performance in the context of a speaker identification task and compare our baseline system with the proposed one using the enhanced embeddings. The results show that our method can cope with challenging acoustic conditions and brings significant improvements in terms of speaker identification performance compared to single-microphone approaches.