A Maximum Entropy Information Bottleneck (MEIB) Regularization for Generative Speech Enhancement with HiFi-GAN

Conference: Speech Communication - 15th ITG Conference
20.09.2023-22.09.2023 in Aachen

doi:10.30420/456164055

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5 · Language: English · Type: PDF

Authors:
Sach, Marvin; Pirklbauer, Jan; Fingscheidt, Tim (Institute for Communications Technology, TU Braunschweig, Germany)
Fluyt, Kristoff; Tirry, Wouter (Goodix Technology (Belgium) BV, Leuven, Belgium)

Abstract:
Generative approaches to speech enhancement that use a vocoder to synthesize a clean speech estimate aim at solving the problem of residual noise occurring with typical mask-based spectral estimation approaches. The necessity to restrict the system’s knowledge to only clean speech and to prevent the possibility of noise reconstruction has recently motivated the introduction of a sparse autoencoder (AE) bottleneck using a pre-trained vector quantizer codebook. In our work, inspired by information bottleneck theory, we propose a maximum entropy information bottleneck (MEIB) regularization, which we derive for the deterministic AE with quantized bottleneck on time series. Furthermore, we introduce a feature-matching regularization encouraging noisy inputs and clean inputs to select the same vector quantizer symbols. The proposed methods significantly improve our denoising performance by 0.23 PESQ points and 0.06 DNSMOS points.
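The abstract does not state the loss formulas, but the two regularizers can be sketched from their descriptions: MEIB maximizes the entropy of the average codebook-usage distribution, and feature matching encourages noisy and clean inputs to select the same vector quantizer symbols. The following minimal numpy sketch is illustrative only; the soft-assignment formulation, temperature, and all function names are assumptions, not the authors' implementation.

```python
import numpy as np

def assignment_probs(z, codebook, temperature=1.0):
    """Soft assignment of encoder outputs z (N, D) to codebook entries (K, D).

    A softmax over negative squared distances is an assumed relaxation of
    the hard vector quantizer selection described in the abstract."""
    d2 = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (N, K)
    logits = -d2 / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

def meib_regularizer(p):
    """MEIB-style term: maximize entropy of the mean codebook usage.

    Returns the negative entropy, so minimizing this loss spreads usage
    across all codebook symbols (entropy is bounded by log K)."""
    usage = p.mean(axis=0)                              # (K,)
    entropy = -(usage * np.log(usage + 1e-12)).sum()
    return -entropy

def feature_matching_regularizer(p_noisy, p_clean):
    """Encourage noisy inputs to select the same symbols as clean inputs,
    here as a cross-entropy between the two soft assignments (assumed form)."""
    return -(p_clean * np.log(p_noisy + 1e-12)).sum(axis=1).mean()
```

In training, both terms would be added (with weighting factors) to the main reconstruction/adversarial objective of the HiFi-GAN-based enhancement system.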