A Multi-Stage, Multi-Channel Processing System for Overlapping Speech Separation in a Real Scenario

Conference: Speech Communication - 11. ITG-Fachtagung Sprachkommunikation
24.09.2014 - 26.09.2014 in Erlangen, Germany

Proceedings: Speech Communication

Pages: 4
Language: English
Type: PDF

Authors:
Toroghi, Rahil Mahdian; Oualil, Youssef; Klakow, Dietrich (Spoken Language Systems, Saarland University, Saarbruecken, Germany)

Abstract:
This paper addresses the problem of separating overlapping speech in a noisy room using a microphone array. It proposes a multistage processing framework that separates the desired sources and reduces the corruptive effects of noise, reverberation, and interference. More specifically, 1) a beamformer separates the sources based on their differing locations, 2) a postfilter maximizes the output SNRs, and 3) a novel filter is derived to suppress, at each output, the terms that are coherent with the contrasting output. Finally, 4) the clean signal is estimated using a modified masking filter. Because a desired signal remains coherent across time frames, the mask is smoothed between frames to preserve this coherence and reduce musical noise. Experiments on the AMI-Wall Street Journal corpus show that the proposed method significantly improves speech quality, SNR, source-to-reverberation ratio, and naturalness compared with some blind source separation methods.
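
The inter-frame mask smoothing mentioned in step 4 can be illustrated with a minimal sketch. The abstract does not specify the paper's modified masking filter, so the code below is an assumption: it uses a generic first-order recursive average over frames, and the names `smooth_mask`, `apply_mask`, and the parameter `alpha` are hypothetical. It only shows how smoothing a time-frequency mask along the frame axis spreads an isolated mask value over neighbouring frames, which is the kind of isolated peak commonly associated with musical noise.

```python
import numpy as np


def smooth_mask(mask, alpha=0.5):
    """Recursively smooth a time-frequency mask along the frame axis.

    mask  : array of shape (n_frames, n_bins), values in [0, 1]
    alpha : smoothing factor in [0, 1) (hypothetical parameter, not from
            the paper); alpha = 0 leaves the mask unchanged
    """
    mask = np.asarray(mask, dtype=float)
    smoothed = np.empty_like(mask)
    smoothed[0] = mask[0]  # the first frame has no predecessor
    for t in range(1, mask.shape[0]):
        # Blend the previous smoothed frame with the current raw mask frame
        smoothed[t] = alpha * smoothed[t - 1] + (1.0 - alpha) * mask[t]
    return smoothed


def apply_mask(spectrogram, mask):
    """Apply a real-valued mask element-wise to a (complex) spectrogram."""
    return np.asarray(spectrogram) * mask


# Toy example: one frequency bin, a single isolated mask peak in frame 2
raw_mask = np.zeros((5, 1))
raw_mask[2, 0] = 1.0
smoothed_mask = smooth_mask(raw_mask, alpha=0.5)
```

A larger `alpha` gives a smoother mask at the cost of slower reaction to speech onsets; the value used here is arbitrary and chosen only for the toy example.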