Intelligibility Prediction of Speech Reconstructed From Its Magnitude or Phase

Conference: Speech Communication - 14th ITG Conference
29.09.2021 - 01.10.2021, online

Proceedings: ITG-Fb. 298: Speech Communication

Pages: 5
Language: English
Type: PDF

Peer, Tal; Gerkmann, Timo (Signal Processing (SP), Universität Hamburg, Germany)

While Fourier phase has long been considered unimportant for speech enhancement in the short-time Fourier domain, phase-aware speech processing has received increasing attention in recent years. Among other advances, it has been shown that with very short frames (< 2 ms), the phase spectrum carries enough information to allow an intelligible reconstruction from phase alone, whereas the magnitude spectrum is virtually devoid of information relevant to intelligibility. Because formal listening experiments are expensive and laborious, further research into very short frames requires reliable instrumental intelligibility measures. We present a study of two common measures, Short-Time Objective Intelligibility (STOI) and Extended STOI (ESTOI), and compare their performance for speech reconstruction from phase. Our results indicate that only ESTOI is able to predict the general trend of intelligibility across a wide range of frame lengths (including very short frames) for both phase-based and magnitude-based reconstructions.
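
To make the two reconstruction conditions concrete, the following is a minimal NumPy sketch of building a phase-only and a magnitude-only signal from a short-frame STFT. It is an illustration under stated assumptions, not necessarily the reconstruction procedure used in the study: the helper names (`stft`, `istft`, `phase_only`, `magnitude_only`), the Hann window, the 1 ms frame length, the hop size, the single-pass inverse transform, and the amplitude-modulated tone standing in for speech are all choices made here for illustration.

```python
import numpy as np


def stft(x, frame_len, hop):
    """Hann-windowed frames -> one-sided spectra, shape (num_frames, frame_len // 2 + 1)."""
    window = np.hanning(frame_len + 1)[:-1]  # periodic Hann window
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(num_frames)])
    return np.fft.rfft(frames * window, axis=1)


def istft(spec, frame_len, hop):
    """Weighted overlap-add synthesis, normalised by the summed squared window."""
    window = np.hanning(frame_len + 1)[:-1]
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * window
    length = (spec.shape[0] - 1) * hop + frame_len
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_len] += frame
        norm[i * hop:i * hop + frame_len] += window ** 2
    return out / np.maximum(norm, 1e-8)  # guard against division by zero at the edges


def phase_only(spec):
    """Keep the phase spectrum and set every magnitude to one."""
    return np.exp(1j * np.angle(spec))


def magnitude_only(spec, rng):
    """Keep the magnitude spectrum and replace the phase with uniform random values."""
    random_phase = rng.uniform(-np.pi, np.pi, size=spec.shape)
    return np.abs(spec) * np.exp(1j * random_phase)


# Assumed test signal: an amplitude-modulated tone standing in for speech.
fs = 16000
rng = np.random.default_rng(0)
t = np.arange(fs // 4) / fs
x = np.sin(2 * np.pi * 220 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))

frame_len, hop = 16, 2  # 16 samples at 16 kHz = 1 ms frames (assumed values)
spec = stft(x, frame_len, hop)
x_phase = istft(phase_only(spec), frame_len, hop)         # reconstruction from phase alone
x_mag = istft(magnitude_only(spec, rng), frame_len, hop)  # reconstruction from magnitude alone
```

An instrumental measure would then compare `x` against `x_phase` and `x_mag` at each frame length. One possible option is the third-party `pystoi` package, whose `stoi` function takes an `extended` flag to switch between STOI and ESTOI; whether this matches the implementations used in the study is not stated in the abstract.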