Breaking a Paradox: Applying VTLN to Residuals

Conference: Sprachkommunikation 2006 - ITG-Fachtagung
04/26/2006 - 04/28/2006 at Kiel, Germany

Proceedings: Sprachkommunikation 2006

Pages: 4
Language: English
Type: PDF


Sündermann, David (Universitat Politècnica de Catalunya, 08034 Barcelona, Spain)
Höge, Harald (Siemens Corporate Technology, 81739 Munich, Germany)
Fingscheidt, Tim (TU Braunschweig, 38106 Braunschweig, Germany)

Vocal tract length normalization (VTLN) is a speaker normalization technique that tries to compensate for the effect of speaker-dependent vocal tract lengths. According to the source-filter model of speech production, the vocal tract can be represented by features such as linear predictive or cepstral coefficients, whereas the residual is an estimate of the underlying excitation, i.e., the voice source. In speech recognition, VTLN is performed before the features are derived. Interestingly, recent studies on VTLN-based voice conversion suggest that applying VTLN to the residuals, which should contain hardly any vocal tract information, may result in a voice that is very similar to one generated in the conventional way, i.e., by applying VTLN to the original speech frames. In this paper, we discuss an objective experiment intended to confirm this observation.
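The abstract contrasts two places where VTLN can be applied: the original speech frame and its linear-prediction residual. The following is a minimal sketch, written in pure Python on a toy frame, that makes the two paths concrete. It inverse-filters a frame with its own LPC polynomial A(z) (autocorrelation method, Levinson-Durbin recursion) to obtain the residual, then applies the same frequency warping to the magnitude spectra of both signals. The piecewise-linear warping function, its knee at 0.875π, the warping factor α = 1.1, the LPC order of 10, the synthetic frame, and all function names are hypothetical choices for illustration. The abstract does not specify the warping function or the parameters used in the paper.

```python
import cmath
import math


def autocorr(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of a frame (zero outside the frame)."""
    n = len(x)
    return [sum(x[t] * x[t - lag] for t in range(lag, n)) for lag in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Return a = [1, a1, ..., ap] so that A(z) = sum_k a[k] z^-k whitens the frame."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient of stage i
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k  # prediction-error energy after stage i
    return a


def lpc_residual(frame, order=10):
    """Inverse-filter the frame with its own LPC polynomial: e[n] = sum_k a[k] x[n-k]."""
    a = levinson_durbin(autocorr(frame, order), order)
    return [sum(a[k] * frame[n - k] for k in range(order + 1) if n - k >= 0)
            for n in range(len(frame))]


def piecewise_linear_warp(w, alpha, knee=0.875):
    """Hypothetical VTLN warping on [0, pi]: slope alpha up to a knee frequency w0,
    then a straight line chosen so that pi maps onto pi."""
    w0 = knee * math.pi * min(1.0, 1.0 / alpha)
    if w <= w0:
        return alpha * w
    return alpha * w0 + (math.pi - alpha * w0) * (w - w0) / (math.pi - w0)


def magnitude_spectrum(x):
    """|DFT| at bins k = 0..N/2, i.e. normalized frequencies w = 2*pi*k/N in [0, pi]."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def vtln_warp_spectrum(mag, alpha):
    """Output magnitude at frequency w = input magnitude at warp(w), linearly interpolated."""
    m = len(mag) - 1  # index of the bin at w = pi
    out = []
    for k in range(m + 1):
        w = math.pi * k / m
        pos = min(max(piecewise_linear_warp(w, alpha) / math.pi * m, 0.0), float(m))
        lo = min(int(pos), m - 1)
        frac = pos - lo
        out.append((1.0 - frac) * mag[lo] + frac * mag[lo + 1])
    return out


def synthetic_frame(n=256, period=40):
    """Hypothetical toy 'voiced' frame: a pulse train driving one all-pole resonance."""
    y = [0.0] * n
    for t in range(n):
        pulse = 1.0 if t % period == 0 else 0.0
        y1 = y[t - 1] if t >= 1 else 0.0
        y2 = y[t - 2] if t >= 2 else 0.0
        y[t] = pulse + 1.3 * y1 - 0.8 * y2
    return y


def energy(x):
    return sum(v * v for v in x)


if __name__ == "__main__":
    frame = synthetic_frame()
    res = lpc_residual(frame, order=10)
    alpha = 1.1  # hypothetical warping factor
    warped_frame = vtln_warp_spectrum(magnitude_spectrum(frame), alpha)
    warped_res = vtln_warp_spectrum(magnitude_spectrum(res), alpha)
    print(f"frame energy {energy(frame):.2f}, residual energy {energy(res):.2f}")
    print(f"{len(warped_frame)} warped bins per spectrum")
```

In a complete VTLN-based conversion, the warped residual would still have to be brought back to the time domain (with some form of phase handling) and passed through the synthesis filter 1/A(z). This sketch stops at the magnitude spectra and is not the experiment described in the paper.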