An Improved Neural Network Architecture for Target Speech Extraction

Conference: Speech Communication - 16th ITG Conference
09/24/2025 to 09/26/2025 in Berlin, Germany

Proceedings: ITG-Fb. 321: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Joos, David; Faubel, Friedrich; Jungclaussen, Jonas; Buck, Markus; Minker, Wolfgang

Abstract:
In this work, we present an improved architecture for neural-network-based target speech extraction. The main idea is a dedicated path in which the target speaker embedding is adapted to the input signal. The rationale behind this approach is to translate long-term characteristics of a pre-trained voice print into short-term information that can be exploited at the frame level. We show that this architecture outperforms the current state of the art, which applies a static transformation to the embedding independently of the input signal. In addition to exhaustive ablation studies that corroborate the proposed design, we provide a comparative evaluation against the SpeakerBeam approach. Separation performance is evaluated using objective metrics such as SI-SDR, PESQ and STOI. In addition, we report voice recognition results.
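The contrast between a static embedding transformation and an input-adapted embedding can be illustrated with a small NumPy sketch. This is not the authors' architecture: the single tanh layer, the weight matrices `W` and `U`, and the function names are hypothetical placeholders, and an element-wise product is assumed as the conditioning operation in both variants. The static variant transforms the speaker embedding once and applies the same vector to every frame; the adapted variant mixes the embedding with each frame's features, so the conditioning vector changes over time.

```python
import numpy as np

rng = np.random.default_rng(0)


def static_conditioning(frames, embedding, W):
    """Static style: the embedding is transformed once, independently of the
    input, and the same vector scales every frame element-wise."""
    e = W @ embedding                  # (D,) one vector for the whole utterance
    return frames * e[np.newaxis, :]   # (T, D) identical conditioning per frame


def adapted_conditioning(frames, embedding, W, U):
    """Hypothetical input-dependent path: the embedding is combined with each
    frame's features, yielding a frame-specific conditioning vector."""
    # (D,) from the embedding broadcasts against (T, D) from the frames
    e_t = np.tanh(embedding @ W.T + frames @ U.T)  # (T, D), varies over t
    return frames * e_t, e_t


# Toy sizes: T frames, D feature channels, E embedding dimensions (all assumed)
T, D, E = 4, 8, 16
frames = rng.standard_normal((T, D))      # stand-in for hidden activations
embedding = rng.standard_normal(E)        # stand-in for a pre-trained voice print
W = rng.standard_normal((D, E)) / np.sqrt(E)
U = rng.standard_normal((D, D)) / np.sqrt(D)

static_out = static_conditioning(frames, embedding, W)
adapted_out, e_t = adapted_conditioning(frames, embedding, W, U)
```

In a real extractor, `frames` would be hidden activations of the mask-estimation network and `W`, `U` would be learned; random values are used here only to show the shapes and the fact that the adapted conditioning vector differs from frame to frame.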