Investigations on Offline Artificial Bandwidth Extension of Telephone Speech Databases

Conference: Sprachkommunikation 2010 - 9. ITG-Fachtagung
10/06/2010 - 10/08/2010 in Bochum, Germany

Proceedings: Sprachkommunikation 2010

Pages: 4
Language: English
Type: PDF

Bauer, Patrick; Jung, Marc-André; Fingscheidt, Tim (Institut für Nachrichtentechnik (IfN), Technische Universität Braunschweig, 38106 Braunschweig, Germany)

Automatic speech recognition (ASR) systems have to be trained on large speech databases. For telephony tasks, speech databases exist almost exclusively with a narrowband (NB) acoustic bandwidth. In the near future, more and more wideband (WB) speech codecs, such as the adaptive multi-rate wideband (AMR-WB) codec, will be deployed. This creates a demand for WB telephony speech databases for training WB acoustic models. ASR systems may benefit from WB telephony, which allows more demanding tasks such as large-vocabulary or spelling applications. Recording WB telephony speech databases, however, entails high effort in time and cost. Furthermore, only a few WB-capable mobile terminals are on the market yet, and appropriate network infrastructure is so far mostly available for testing purposes only. This paper presents an offline artificial bandwidth extension (ABWE) algorithm that enhances telephony speech databases for WB ASR training. By exploiting time-aligned phonetic transcriptions, the speech quality and intelligibility of state-of-the-art ABWE techniques can be significantly improved. Furthermore, artifacts such as the typical lisping effect are largely reduced. Experiments show that ABWE-trained WB ASR systems indeed achieve a lower word error rate than purely NB-trained (and NB-tested) systems.