An Integrated Deep Clustering-Based System for Speaker Count Agnostic Speech Separation

Conference: Speech Communication - 14th ITG Conference
29.09.2021 - 01.10.2021, online

Proceedings: ITG-Fb. 298: Speech Communication

Pages: 5, Language: English, Type: PDF

Lemercier, Jean-Marie; Bartel, Leroy; Ditter, David; Gerkmann, Timo (Universität Hamburg, Hamburg, Germany)

This paper proposes to unify two deep-learning methods, CountNet and Deep Clustering, designed for speaker counting and speech separation, respectively, in order to perform speaker count agnostic speech separation. Two approaches are compared, in which the speaker count estimation and separation subnetworks are trained either separately or jointly. Training and evaluation are conducted on a tailored dataset, WSJ0-Kmix, which extends the WSJ0-2mix and WSJ0-3mix datasets to an arbitrary number of speakers. Results show that both systems are capable of separating up to four sources without prior information on the number of speakers. Furthermore, the joint approach performs similarly to its separate counterpart while using fewer parameters and simplifying the overall architecture.
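
To illustrate how a speaker count estimate can drive a clustering-based separator, the following is a minimal sketch rather than the authors' implementation. It assumes that a counting subnetwork has already produced a posterior over speaker counts (`count_probs`, where index i is taken to mean i + 1 speakers) and that an embedding subnetwork has produced one embedding vector per time-frequency bin (`embeddings`). The function names `kmeans` and `separate`, the array shapes, and the use of plain k-means with binary masks are all assumptions made for illustration.

```python
import numpy as np


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per row of `points`."""
    rng = np.random.default_rng(seed)
    # Initialise the centers with k distinct points drawn at random.
    centers = points[rng.choice(len(points), size=k, replace=False)].astype(float)
    labels = np.zeros(len(points), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every center: (N, k).
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:  # keep the old center if a cluster is empty
                centers[j] = members.mean(axis=0)
    return labels


def separate(embeddings, count_probs):
    """Cluster time-frequency embeddings into the estimated number of sources.

    embeddings:  (T, F, D) array, one D-dimensional embedding per bin (assumed).
    count_probs: (K_max,) array, assumed posterior over 1..K_max speakers.
    Returns the estimated count k and binary masks of shape (k, T, F).
    """
    n_frames, n_bins, dim = embeddings.shape
    k = int(np.argmax(count_probs)) + 1  # index i is assumed to mean i + 1 speakers
    labels = kmeans(embeddings.reshape(n_frames * n_bins, dim), k)
    masks = np.stack([(labels == j).reshape(n_frames, n_bins) for j in range(k)])
    return k, masks.astype(float)
```

Each mask would then be multiplied with the mixture spectrogram to obtain one source estimate. In this sketch the count estimate is the only link between the two stages, which corresponds most closely to the separately trained variant described above; a jointly trained system would share parameters between the two subnetworks.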