Exploring Visualization Techniques for Interpretable Learning in Speech Enhancement Deep Neural Networks

Conference: Speech Communication - 15th ITG Conference
20.09.2023-22.09.2023 in Aachen

doi:10.30420/456164043

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Nustede, Eike J.; Anemueller, Joern (Carl von Ossietzky University Oldenburg, Computational Audition Group, Germany & Dept. med. Physics & Acoustics and Cluster of Excellence Hearing4all, Oldenburg, Germany)

Abstract:
Deep network interpretability research provides an avenue to gain insights into the underlying mechanisms of noise suppression and artefact removal in speech enhancement networks. Analyzing the networks’ internal representations and activation patterns allows identification of critical acoustic features and provides another approach to model optimization. In this paper, we contribute to the visualization of speech processing networks by leveraging the U-Network’s hierarchical, convolutional processing scheme to derive feature maps of audio samples at different levels of the network. Further, we adapt the activation maximization method to speech enhancement networks by regularizing the maximization process. The presented feature maps and filter activations follow a clear encoding/decoding scheme: the encoder decomposes the input into distinct acoustic features, and the decoder combines sparse features into an enhanced spectrogram.
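
The idea of regularized activation maximization mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a single linear convolutional unit as a toy stand-in for a network filter, a plain L2 penalty as the regularizer, and hypothetical names and values (`unit_activation`, `maximize_activation`, the 3x3 `kernel`, `lr`, `l2`). For a real U-Net the gradient would come from automatic differentiation rather than the closed form used here.

```python
import random


def unit_activation(patch, kernel):
    """Response of one linear convolutional unit over its receptive field."""
    return sum(
        k * x
        for k_row, x_row in zip(kernel, patch)
        for k, x in zip(k_row, x_row)
    )


def maximize_activation(kernel, steps=200, lr=0.1, l2=0.5, seed=0):
    """Gradient ascent on activation(x) - (l2 / 2) * ||x||^2 over the input patch x."""
    rng = random.Random(seed)
    # Start from low-amplitude noise (an assumed, commonly used initialization).
    patch = [[rng.gauss(0.0, 0.01) for _ in row] for row in kernel]
    for _ in range(steps):
        for i, k_row in enumerate(kernel):
            for j, k in enumerate(k_row):
                # For a linear unit d(activation)/dx_ij = k;
                # the L2 penalty contributes -l2 * x_ij and keeps x bounded.
                patch[i][j] += lr * (k - l2 * patch[i][j])
    return patch


# Hypothetical 3x3 kernel (rows: frequency, columns: time) that responds
# to energy increasing along the time axis.
kernel = [
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
    [-1.0, 0.0, 1.0],
]

preferred = maximize_activation(kernel)
for row in preferred:
    print(" ".join(f"{value:+.2f}" for value in row))
```

Without the penalty term the input would grow without bound. With it, the ascent in this toy case converges toward `kernel / l2`, so the optimized patch shows the pattern the unit prefers.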