Natural vs. Synthesized Speech in Spoken Dialog Systems Research – Comparing the Performance of Recognition Results

Conference: Sprachkommunikation - Beiträge zur 10. ITG-Fachtagung
09/26/2012 - 09/28/2012 in Braunschweig, Germany

Proceedings: Sprachkommunikation

Pages: 4
Language: English
Type: PDF

Authors:
Scheffler, Tatjana; Roller, Roland; Reithinger, Norbert (DFKI GmbH, Projektbüro Berlin, Alt-Moabit 91c, 10559 Berlin, Germany)
Kretzschmar, Florian; Möller, Sebastian (Deutsche Telekom Laboratories, TU Berlin, Ernst-Reuter-Platz 7, 10587 Berlin, Germany)

Abstract:
In this paper, we test the effect of using speech synthesis when interacting with a spoken dialog system (SDS). We use a user simulation to connect our speech synthesis to a real, state-of-the-art automatic speech recognition (ASR) component deployed in a working commercial SDS via a standard telephone line. In a series of experiments, we compare human-machine dialogs and their recognition scores with simulated dialogs using synthesis. Our results show that a good text-to-speech synthesis configuration rivals human speech in both recognition scores and variability. This makes the speech interface in user simulation quite attractive.
