Predicting the quality of processed speech by combining modulation-based features and model trees

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
5-7 October 2016, Paderborn, Germany

Proceedings: ITG-Fb. 267: Speech Communication

Pages: 5
Language: English
Type: PDF


Authors:
Cauchi, Benjamin; Goetze, Stefan (Fraunhofer IDMT, Project Group Hearing, Speech and Audio Technology, Oldenburg, Germany & Cluster of Excellence Hearing4all, Oldenburg, Germany)
Santos, Joao F.; Falk, Tiago H. (INRS-EMT, University of Quebec, Montreal, QC, Canada)
Siedenburg, Kai; Doclo, Simon (University of Oldenburg, Dept. of Medical Physics and Acoustics, Oldenburg, Germany & Cluster of Excellence Hearing4all, Oldenburg, Germany)
Naylor, Patrick A. (Imperial College London, Dept. of Electrical Engineering, London, UK)

Abstract:
Many signal processing methods have been proposed to improve the quality of speech recorded in the presence of noise and reverberation. Evaluating these methods requires either perceptual measures, i.e. listening tests, or instrumental measures. Perceptual measures are typically more reliable but are costly and time-consuming, whereas instrumental measures may correlate poorly with the perceived speech quality. In this paper we propose to train an instrumental measure, combining modulation-based features and model trees, on perceptual scores obtained for a small corpus of speech data processed by a combination of beamforming and spectral postfiltering. For evaluation, the resulting measure is then applied to a larger corpus. Results show that using model trees to train the prediction function of an instrumental measure increases its correlation with perceptual scores.
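The abstract does not specify the tree-induction algorithm, the exact features, or the data, so the following Python sketch only illustrates the general idea of a model tree: a regression tree whose leaves hold linear models, loosely in the spirit of Quinlan's M5. Unlike M5, this simplified version picks splits by exhaustively minimizing the summed squared error of the leaf linear fits and does no pruning or smoothing. The three random feature columns stand in for modulation-based features, the piecewise-linear "perceptual score" is synthetic, and the helper names (`build_tree`, `predict_tree`, etc.) are hypothetical. Mirroring the setup described above, the model is trained on a small set and evaluated on a larger held-out set by Pearson correlation, compared against a single global linear fit.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear model with intercept; returns [b0, b1, ..., bd]."""
    A = np.hstack([np.ones((X.shape[0], 1)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return coef[0] + X @ coef[1:]


def sse(coef, X, y):
    r = y - predict_linear(coef, X)
    return float(r @ r)


def build_tree(X, y, max_depth=2, min_leaf=10):
    """Grow a model tree: internal nodes split on one feature, leaves hold linear models."""
    leaf_coef = fit_linear(X, y)
    node = {"leaf": True, "coef": leaf_coef}
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return node
    best_err, best = sse(leaf_coef, X, y), None
    for j in range(X.shape[1]):
        # Candidate thresholds: every distinct value except the smallest.
        for t in np.unique(X[:, j])[1:]:
            left = X[:, j] < t
            n_left = int(left.sum())
            if n_left < min_leaf or len(y) - n_left < min_leaf:
                continue
            cl = fit_linear(X[left], y[left])
            cr = fit_linear(X[~left], y[~left])
            err = sse(cl, X[left], y[left]) + sse(cr, X[~left], y[~left])
            if err < best_err:
                best_err, best = err, (j, t, left)
    if best is None:
        return node
    j, t, left = best
    return {"leaf": False, "feature": j, "threshold": t,
            "left": build_tree(X[left], y[left], max_depth - 1, min_leaf),
            "right": build_tree(X[~left], y[~left], max_depth - 1, min_leaf)}


def predict_tree(node, x):
    """Route one feature vector to its leaf and apply that leaf's linear model."""
    while not node["leaf"]:
        node = node["left"] if x[node["feature"]] < node["threshold"] else node["right"]
    return float(predict_linear(node["coef"], x[None, :])[0])


# Hypothetical data: 3 placeholder features and a piecewise-linear quality score.
rng = np.random.default_rng(0)
n = 400
X = rng.uniform(0.0, 1.0, size=(n, 3))
y = np.where(X[:, 0] < 0.5, 1.0 + 2.0 * X[:, 1], 4.0 - 1.5 * X[:, 2])
y = y + rng.normal(0.0, 0.1, n)

# Train on a small subset, evaluate on the larger remainder.
X_train, y_train, X_test, y_test = X[:100], y[:100], X[100:], y[100:]

lin = fit_linear(X_train, y_train)
tree = build_tree(X_train, y_train)

r_lin = np.corrcoef(predict_linear(lin, X_test), y_test)[0, 1]
r_tree = np.corrcoef([predict_tree(tree, x) for x in X_test], y_test)[0, 1]
print(f"global linear r = {r_lin:.3f}, model tree r = {r_tree:.3f}")
```

The design choice being illustrated is that a single linear mapping from features to scores cannot follow a relationship that changes across regions of the feature space, while a model tree partitions that space and fits a separate linear predictor in each region. With real data, the placeholder columns would be replaced by actual features and `y` by listening-test scores.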