On Source-Microphone Distance Estimation Using Convolutional Recurrent Neural Networks

Conference: Speech Communication - 14th ITG Conference
29.09.2021 - 01.10.2021, online

Proceedings: ITG-Fb. 298: Speech Communication

Pages: 5, Language: English, Type: PDF

Gburrek, Tobias; Schmalenstroeer, Joerg; Haeb-Umbach, Reinhold (Paderborn University, Department of Communications Engineering, Paderborn, Germany)

Several features computed from an audio signal have been shown to depend on the distance between the acoustic source and the receiver, but they are also heavily influenced by room characteristics and the microphone setup. Neural networks trained on signals representing a large variety of setups have been shown to deliver robust distance estimates from a coherent-to-diffuse power ratio (CDR) feature map at the input. Here we push their modeling capabilities further by additionally using the network as a feature extractor. It is shown that distance estimation based on short-time Fourier transform (STFT) features can achieve a smaller estimation error and can operate on shorter signal segments than the previous CDR-based estimator.
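
To illustrate what an STFT feature map at the network input might look like, the following is a minimal sketch in Python with NumPy. It is not the authors' implementation: the function name `stft_log_magnitude`, the frame length of 512 samples, the hop of 256 samples, the Hann window, the 16 kHz sampling rate, and the use of log-magnitudes are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def stft_log_magnitude(signal, frame_len=512, hop=256):
    """Return log-magnitude STFT features, shape (num_frames, frame_len // 2 + 1).

    Hypothetical helper: frame_len, hop, and the Hann window are assumed values,
    not parameters taken from the paper.
    """
    window = np.hanning(frame_len)
    # Number of complete frames that fit into the signal (no padding).
    num_frames = 1 + (len(signal) - frame_len) // hop
    # Slice the signal into overlapping, windowed frames.
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(num_frames)]
    )
    # One-sided FFT per frame; a small constant avoids log(0).
    spectrum = np.fft.rfft(frames, axis=1)
    return np.log(np.abs(spectrum) + 1e-8)


# Stand-in for a short recorded segment: 1 s of noise at an assumed 16 kHz.
rng = np.random.default_rng(0)
segment = rng.standard_normal(16000)
features = stft_log_magnitude(segment)
```

A time-frequency map of this kind could then be passed to a convolutional recurrent network, which would learn distance-related features itself instead of receiving a hand-crafted CDR map.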