An Integrated Deep Clustering-Based System for Speaker Count Agnostic Speech Separation

Conference: Speech Communication - 14th ITG Conference
09/29/2021 - 10/01/2021, online

Proceedings: ITG-Fb. 298: Speech Communication

Pages: 5
Language: English
Type: PDF


Lemercier, Jean-Marie; Bartel, Leroy; Ditter, David; Gerkmann, Timo (Universität Hamburg, Hamburg, Germany)

This paper proposes to unify two deep-learning methods, CountNet and Deep Clustering, designed for speaker counting and speech separation respectively, in order to perform speaker count agnostic speech separation. Two approaches are compared, in which the speaker count estimation and separation subnetworks are trained either separately or jointly. Training and evaluation are conducted on a tailored dataset, WSJ0-Kmix, which extends the WSJ0-2mix and WSJ0-3mix datasets to an arbitrary number of speakers. Results show that both systems are capable of separating up to four sources without prior information on the number of speakers. Furthermore, the joint approach performs similarly to its separate counterpart while using fewer parameters and simplifying the overall architecture.
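As a rough illustration of the inference step the abstract describes, the sketch below shows how an estimated speaker count could drive the clustering of time-frequency embeddings into binary masks. It is not the authors' implementation: the two subnetworks are replaced by stand-ins (synthetic embeddings and a hand-written score vector), and the names `kmeans`, `separate`, and `count_logits` are hypothetical. It assumes only that the counting branch outputs one score per candidate speaker count and that the separation branch outputs one embedding per time-frequency bin.

```python
import numpy as np


def kmeans(points, k, n_iter=50):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    for _ in range(1, k):
        # Squared distance of every point to its nearest already-chosen center.
        d = np.min([np.sum((points - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(points[np.argmax(d)])
    centers = np.stack(centers).astype(float)

    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)

    # Final assignment against the last set of centers.
    dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)


def separate(embeddings, count_logits, tf_shape):
    """Cluster embeddings into as many binary masks as the estimated count.

    embeddings:   (T*F, D) array, one embedding per time-frequency bin
                  (assumed output of a separation network).
    count_logits: (K_max,) scores for 1..K_max speakers
                  (assumed output of a counting network).
    tf_shape:     (T, F) shape of the spectrogram.
    """
    k = int(np.argmax(count_logits)) + 1  # index 0 corresponds to one speaker
    labels = kmeans(embeddings, k)
    masks = np.stack([(labels == j).reshape(tf_shape) for j in range(k)])
    return k, masks


# Toy stand-in data: three "speakers", each bin dominated by exactly one.
rng = np.random.default_rng(0)
T, F, D = 4, 6, 3
true_speaker = np.arange(T * F) % 3
embeddings = np.eye(D)[true_speaker] + 0.05 * rng.standard_normal((T * F, D))
count_logits = np.array([0.1, 0.2, 2.0, 0.3])  # highest score at "3 speakers"

k, masks = separate(embeddings, count_logits, (T, F))
print(k, masks.shape)
```

Each mask would then be applied to the mixture spectrogram to recover one source. In this sketch the count is a hard decision taken before clustering, so an error in the count estimate propagates directly to the number of separated outputs.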