WTIMIT: The TIMIT Speech Corpus Transmitted Over the 3G AMR Wideband Mobile Network
Konferenz: Sprachkommunikation 2010 - 9. ITG-Fachtagung
06.10.2010 - 08.10.2010 in Bochum, Deutschland
Tagungsband: Sprachkommunikation 2010
Seiten: 4Sprache: EnglischTyp: PDFPersönliche VDE-Mitglieder erhalten auf diesen Artikel 10% Rabatt
Bauer, Patrick; Scheler, David; Fingscheidt, Tim (Institute for Communications Technology, Technische Universität Braunschweig, 38106 Braunschweig, Germany)
In anticipation of upcoming mobile telephony services with higher speech quality, a wideband (50 Hz to 7 kHz) mobile telephony derivative of TIMIT has been recorded called WTIMIT. It opens up various scientific investigations; e.g., on speech quality and intelligibility, as well as on wideband upgrades of network-side interactive voice response (IVR) systems with retrained or bandwidthextended acoustic models for automatic speech recognition (ASR). Wideband telephony could enable network-side speech recognition applications such as remote dictation or spelling without the need of distributed speech recognition techniques. The WTIMIT corpus was transmitted via two prepared Nokia 6220 mobile phones over T-Mobile’s 3G wideband mobile network in The Hague, The Netherlands, employing the Adaptive Multirate Wideband (AMR-WB) speech codec. The paper mainly presents results of phoneme recognition experiments. It turns out that in the case of wideband telephony, server-side ASR should not be carried out by simply decimating received signals to 8 kHz and applying existent narrowband acoustic models. Nor do we recommend just simulating the AMR-WB codec for training of wideband acoustic models. Instead, realworld wideband telephony channel data (such as WTIMIT) provides the best training material for wideband IVR systems.