On Source-Microphone Distance Estimation Using Convolutional Recurrent Neural Networks

Conference: Speech Communication - 14th ITG Conference
09/29/2021 - 10/01/2021, held online

Proceedings: ITG-Fb. 298: Speech Communication

Pages: 5
Language: English
Type: PDF


Gburrek, Tobias; Schmalenstroeer, Joerg; Haeb-Umbach, Reinhold (Paderborn University, Department of Communications Engineering, Paderborn, Germany)

Several features computed from an audio signal have been shown to depend on the distance between the acoustic source and the receiver, while at the same time being heavily influenced by the room characteristics and the microphone setup. While neural networks, if trained on signals representing a large variety of setups, have been shown to deliver robust distance estimates from a coherent-to-diffuse power ratio (CDR) feature map at the input, we here push their modeling capabilities further by additionally using the network as a feature extractor. It is shown that distance estimation based on short-time Fourier transform (STFT) features achieves a smaller estimation error and can operate on shorter signal segments than the previous CDR-based estimator.
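To illustrate the kind of network input the abstract refers to, the following is a minimal sketch of turning a short audio segment into a log-magnitude STFT feature map. The frame length, hop size, window, and normalization are assumptions for illustration; the paper's actual feature extraction pipeline may differ.

```python
import numpy as np

def stft_feature_map(signal, frame_len=512, hop=256):
    """Sketch: log-magnitude STFT feature map of shape (frames, bins).

    Hypothetical parameters; the paper's exact windowing and
    normalization are not reproduced here.
    """
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        spec = np.fft.rfft(frame)                 # one-sided spectrum
        frames.append(np.log(np.abs(spec) + 1e-8))  # log compression
    return np.stack(frames)  # (num_frames, frame_len // 2 + 1)

# Usage: a 0.5 s segment at 16 kHz yields a small time-frequency map
fs = 16000
t = np.arange(int(0.5 * fs)) / fs
x = np.sin(2 * np.pi * 440 * t)
feats = stft_feature_map(x)
print(feats.shape)  # (30, 257)
```

A map like this (or a CDR map of the same layout) would then be fed to the convolutional recurrent network, which regresses the source-microphone distance from the time-frequency structure.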