Natural vs. Synthesized Speech in Spoken Dialog Systems Research – Comparing the Performance of Recognition Results

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
09/26/2012 - 09/28/2012 at Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4
Language: English
Type: PDF


Authors:
Scheffler, Tatjana; Roller, Roland; Reithinger, Norbert (DFKI GmbH, Projektbüro Berlin, Alt-Moabit 91c, 10559 Berlin, Germany)
Kretzschmar, Florian; Möller, Sebastian (Deutsche Telekom Laboratories, TU Berlin, Ernst-Reuter-Platz 7, 10587 Berlin, Germany)

Abstract:
In this paper, we test the effect of using speech synthesis when interacting with a spoken dialog system (SDS). We use a user simulation to connect our speech synthesis to a real, state-of-the-art automatic speech recognition (ASR) component deployed in a working commercial SDS, accessed via a standard telephone line. In a series of experiments, we compare human-machine dialogs and their recognition scores with simulated dialogs that use synthesized speech. Our results show that a good text-to-speech (TTS) configuration rivals human speech both in recognition scores and in their variability. This makes a synthesized speech interface an attractive option for user simulation.
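The abstract does not say which recognition score is used, so the following is only a minimal sketch under an assumption: that each utterance's ASR result is scored by word error rate (WER, word-level edit distance divided by reference length), and that "variability" is summarized by the sample standard deviation across utterances. The helpers `word_error_rate` and `summarize` are hypothetical names for illustration, not part of the authors' setup, and the example strings are toy inputs, not data from the paper.

```python
from statistics import mean, stdev


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length (assumed metric)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[len(hyp)] / len(ref)


def summarize(pairs):
    """Mean and sample standard deviation of per-utterance WER (hypothetical helper)."""
    scores = [word_error_rate(ref, hyp) for ref, hyp in pairs]
    spread = stdev(scores) if len(scores) > 1 else 0.0
    return mean(scores), spread


if __name__ == "__main__":
    # Toy (reference, ASR hypothesis) pairs for two conditions; illustrative only.
    human = [("i want to pay my bill", "i want to pay my bill"),
             ("check my balance", "check the balance")]
    synthesized = [("i want to pay my bill", "i want to pay the bill"),
                   ("check my balance", "check my balance")]
    print("human:", summarize(human))
    print("synthesized:", summarize(synthesized))
```

Comparing the two (mean, spread) pairs mirrors the abstract's two criteria, score level and variability; a real evaluation would substitute whatever score the deployed ASR component actually reports.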