Speaker vs Noise Conditioning for Adaptive Speech Enhancement

Conference: Speech Communication - 16th ITG Conference
24.09.2025-26.09.2025 in Berlin, Germany

Proceedings: ITG-Fb. 321: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Triantafyllopoulos, Andreas; Tsangko, Iosif; Mueller, Michael; Schroeter, Hendrik; Schuller, Bjoern

Abstract:
Deep neural networks have shown improved noise attenuation and speech quality compared to traditional denoising algorithms. However, their heavy computational requirements make them unsuitable for the constraints of hearing aid devices. To that end, adaptive denoising, which introduces complementary information into the main denoising network, has emerged as an alternative that improves performance while partially offloading computation to a supporting module running externally. Adaptive denoising relies primarily either on speaker fingerprints, an approach also known as personalised speech enhancement, or on fingerprints of the background noise. Our work compares the two in a fair setting for the first time using the DeepFilterNet architecture. We also investigate the extent to which each can facilitate a reduction in the size of the main denoising model, showcasing the promise of contextual adaptation for reducing the workload of models running on the hearing aid.
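The abstract describes conditioning a main denoising network on an externally computed fingerprint embedding (speaker or noise). The paper does not specify the conditioning mechanism, so the following is only a minimal sketch of one common choice, feature-wise linear modulation (FiLM), where the embedding is projected to per-channel scale and shift terms that modulate the denoiser's hidden features; all dimensions and weight matrices here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def film_condition(features, embedding, W_gamma, W_beta):
    """FiLM-style conditioning: the external fingerprint embedding
    (speaker or noise) is projected to a per-channel scale (gamma)
    and shift (beta) that modulate the denoiser's hidden features."""
    gamma = embedding @ W_gamma   # shape: (channels,)
    beta = embedding @ W_beta     # shape: (channels,)
    return features * gamma + beta

# Hypothetical sizes: 64-dim fingerprint, 32 hidden channels, 100 time frames.
emb_dim, channels, frames = 64, 32, 100
embedding = rng.standard_normal(emb_dim)            # computed by external module
features = rng.standard_normal((frames, channels))  # denoiser hidden activations
W_gamma = 0.1 * rng.standard_normal((emb_dim, channels))
W_beta = 0.1 * rng.standard_normal((emb_dim, channels))

conditioned = film_condition(features, embedding, W_gamma, W_beta)
print(conditioned.shape)  # (100, 32)
```

In this setup only the small projection matrices and the embedding extractor sit outside the main model, which is what allows the on-device denoiser itself to stay small.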