CRNN-based Multi-DOA Estimator: Comparing Classification and Regression

Conference: Speech Communication - 15th ITG Conference
20.09.2023-22.09.2023 in Aachen

doi:10.30420/456164030

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Cooreman, Pieter; Bohlender, Alexander; Madhu, Nilesh (IDLab, Department of Electronics and Information Systems, Ghent University - imec, Belgium)

Abstract:
Deep learning methods have greatly improved the localization of sound sources in adverse conditions. An important design consideration for such methods is the output representation. Direction of arrival (DOA) estimation can be interpreted as a classification problem, but performing a regression to estimate the DOAs as continuous values is also possible. Whereas classification and regression were previously compared for particular cases, such as frame-wise DOA estimation and single-source conditions, in this paper we study the more general localization of one or two concurrent sources with a convolutional recurrent neural network (CRNN). Our experiments show that the two approaches perform comparably in single-source scenarios. To address the ambiguity in the source-to-output assignment when multiple DOAs are estimated using regression, we consider permutation invariant training and angular sorting of the desired outputs. However, we find that classification is generally preferable in the multi-source case, especially for closely spaced sources.
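
The source-to-output ambiguity mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the loss function, the angular range, or the network outputs, so the mean absolute angular error, the azimuths in degrees, and the function names `angular_error`, `pit_loss`, and `sorted_loss` are all assumptions made for illustration.

```python
import itertools
import math


def angular_error(a_deg, b_deg):
    """Smallest absolute difference between two azimuths, in degrees (0 to 180)."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff)


def pit_loss(estimates, targets):
    """Permutation invariant loss (illustrative): mean angular error under
    the best assignment of targets to regression outputs."""
    best = math.inf
    for perm in itertools.permutations(targets):
        err = sum(angular_error(e, t) for e, t in zip(estimates, perm)) / len(estimates)
        best = min(best, err)
    return best


def sorted_loss(estimates, targets):
    """Angular sorting (illustrative): targets are put in ascending order,
    so output k is always compared with the k-th smallest target angle."""
    ordered = sorted(targets)
    return sum(angular_error(e, t) for e, t in zip(estimates, ordered)) / len(estimates)


# Hypothetical two-source example: the outputs are accurate but in swapped order.
estimates = [100.0, 32.0]
targets = [30.0, 100.0]

print(pit_loss(estimates, targets))     # 1.0: the swapped assignment is accepted
print(sorted_loss(estimates, targets))  # 69.0: the fixed order penalizes the swap
```

In this sketch, permutation invariant training scores the estimates against whichever assignment fits best, whereas angular sorting fixes the assignment in advance, so a model trained that way would have to learn to emit its outputs in ascending angular order.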