Low-complexity Real-time Single-channel Speech Enhancement Based on Skip-GRUs

Conference: Speech Communication - 15th ITG Conference
20.09.2023-22.09.2023 in Aachen

doi:10.30420/456164035

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Sinha, Ragini (Fraunhofer Institute for Digital Media Technology IDMT, Oldenburg Branch for Hearing, Speech and Audio Technology, HSA, Germany)
Rollwage, Christian; Doclo, Simon (Fraunhofer Institute for Digital Media Technology IDMT, Oldenburg Branch for Hearing, Speech and Audio Technology, HSA, Germany & Dept. of Medical Physics and Acoustics and Cluster of Excellence Hearing4all, University of Oldenburg, Germany)

Abstract:
Recently, algorithms based on deep neural networks have significantly improved speech enhancement performance in terms of speech quality and intelligibility, for both offline and online processing. However, obtaining a low-complexity and resource-efficient system is still a challenge. In this paper, we focus on real-time single-channel speech enhancement systems that are both compact and resource-efficient during inference. We propose two systems, one applying a real-valued mask and the other a complex-valued mask. Both systems are based on the Skip-GRU architecture, which employs a skip connection between the GRU layers. Experimental results on reverberant noisy signals demonstrate significant advantages of the Skip-GRU architecture over the GRU architecture, and of a complex-valued mask over a real-valued mask. Moreover, the proposed Skip-GRU system with complex-valued masking achieves speech enhancement performance similar to that of the best-performing baseline system, but with significantly fewer parameters and lower computational complexity.
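To illustrate the difference between the two mask types named in the abstract, the following minimal sketch applies a real-valued and a complex-valued mask to complex time-frequency bins. It is not the authors' implementation: the function names, the bin values, and the mask values are hypothetical, and in the proposed systems the mask would be estimated by the network rather than written by hand.

```python
import cmath

def apply_real_mask(noisy_bins, mask):
    """Scale each complex bin by a real gain; the noisy phase is kept."""
    return [g * y for g, y in zip(mask, noisy_bins)]

def apply_complex_mask(noisy_bins, mask):
    """Multiply each complex bin by a complex gain; magnitude and phase both change."""
    return [m * y for m, y in zip(mask, noisy_bins)]

# Hypothetical values for one frame with two frequency bins.
noisy = [complex(1.0, 1.0), complex(0.0, 2.0)]

# Hypothetical real-valued mask: one gain per bin.
real_mask = [0.5, 0.25]

# Hypothetical complex-valued mask with the same magnitudes plus a phase rotation.
complex_mask = [cmath.rect(0.5, -cmath.pi / 4), cmath.rect(0.25, cmath.pi / 2)]

real_out = apply_real_mask(noisy, real_mask)
complex_out = apply_complex_mask(noisy, complex_mask)

for y, r, c in zip(noisy, real_out, complex_out):
    print("noisy phase:", cmath.phase(y),
          "| real-mask phase:", cmath.phase(r),
          "| complex-mask phase:", cmath.phase(c))
```

With a positive real-valued mask the output phase equals the noisy phase, so only the magnitude is modified. The complex-valued mask can also rotate the phase of each bin.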