Intelligibility Prediction of Speech Reconstructed From Its Magnitude or Phase

Conference: Speech Communication - 14th ITG Conference
09/29/2021 - 10/01/2021, online

Proceedings: ITG-Fb. 298: Speech Communication

Pages: 5, Language: English, Type: PDF

Peer, Tal; Gerkmann, Timo (Signal Processing (SP), Universität Hamburg, Germany)

While Fourier phase has long been considered unimportant for speech enhancement in the short-time Fourier domain, phase-aware speech processing has received increasing attention in recent years. Among other advances, it has been shown that with very short frames (< 2 ms), the phase spectrum carries enough information to allow an intelligible reconstruction from phase alone, whereas the magnitude spectrum carries virtually no information relevant to intelligibility. Because formal listening experiments are expensive and laborious, further research into very short frames requires reliable instrumental intelligibility measures. We present a study of two common measures, Short-Time Objective Intelligibility (STOI) and Extended STOI (ESTOI), and compare their performance for speech reconstruction from phase. Our results indicate that only ESTOI is able to predict the general trend of intelligibility across a wide range of frame lengths (including very short frames) for both phase-based and magnitude-based reconstructions.
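
To make the two reconstruction conditions concrete, the following is a minimal Python sketch, not the authors' procedure. It assumes a one-shot resynthesis: the phase-only signal keeps the STFT phase and sets every magnitude to one, and the magnitude-only signal keeps the STFT magnitude and sets every phase to zero. The abstract does not specify the reconstruction algorithm, so a different (for example iterative) method may have been used. The function names, the Hann window, the 16 kHz sampling rate, and the 1 ms frame length are all illustrative assumptions.

```python
import numpy as np


def stft(x, frame_len, hop):
    """Split x into Hann-windowed frames and return their one-sided spectra."""
    window = np.hanning(frame_len + 1)[:-1]  # periodic Hann window
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.fft.rfft(frames, axis=1), window


def istft(spec, window, hop, length):
    """Resynthesize a signal by weighted overlap-add."""
    frame_len = len(window)
    frames = np.fft.irfft(spec, n=frame_len, axis=1) * window
    out = np.zeros(length)
    norm = np.zeros(length)
    for i, frame in enumerate(frames):
        start = i * hop
        out[start : start + frame_len] += frame
        norm[start : start + frame_len] += window**2
    # Guard against division by zero where the window sum vanishes.
    return out / np.maximum(norm, 1e-8)


def reconstruct(x, frame_len, hop, keep="phase"):
    """Resynthesize x from its STFT phase, its STFT magnitude, or both.

    keep="phase":     unit magnitude, original phase (assumed condition)
    keep="magnitude": original magnitude, zero phase (assumed condition)
    keep="both":      unmodified spectrum, for sanity checking
    """
    spec, window = stft(x, frame_len, hop)
    if keep == "phase":
        spec = np.exp(1j * np.angle(spec))
    elif keep == "magnitude":
        spec = np.abs(spec).astype(complex)
    elif keep != "both":
        raise ValueError(f"unknown keep mode: {keep!r}")
    return istft(spec, window, hop, len(x))


# Illustrative usage with a noise signal standing in for speech.
fs = 16000                 # assumed sampling rate in Hz
frame_len = fs // 1000     # 1 ms frames, i.e. 16 samples
hop = frame_len // 4       # 75 % overlap
signal = np.random.default_rng(0).standard_normal(fs // 10)

phase_only = reconstruct(signal, frame_len, hop, keep="phase")
magnitude_only = reconstruct(signal, frame_len, hop, keep="magnitude")
```

Each reconstructed signal could then be scored against the clean reference with an instrumental measure. One option, stated here as an assumption about tooling rather than as what the authors used, is a third-party STOI/ESTOI implementation such as the pystoi package.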