Breaking a Paradox: Applying VTLN to Residuals

Conference: Sprachkommunikation 2006 - ITG-Fachtagung
26-28 April 2006, Kiel, Germany

Proceedings: Sprachkommunikation 2006

Pages: 4, Language: English, Type: PDF

Sündermann, David (Universitat Politècnica de Catalunya, 08034 Barcelona, Spain)
Höge, Harald (Siemens Corporate Technology, 81739 Munich, Germany)
Fingscheidt, Tim (TU Braunschweig, 38106 Braunschweig, Germany)

Vocal tract length normalization (VTLN) is a speaker normalization technique that tries to compensate for the effect of speaker-dependent vocal tract lengths. According to the source-filter model of speech production, the vocal tract can be represented by features such as linear predictive or cepstral coefficients, whereas the residual is an estimate of the underlying excitation, i.e., the voice source. In speech recognition, VTLN is performed before deriving the features. Interestingly, recent studies on VTLN-based voice conversion suggest that applying VTLN to the residuals, which should contain hardly any vocal tract information, may result in a voice that is very similar to one generated in the conventional way, i.e., by applying VTLN to the original speech frames. In this paper, we discuss an objective experiment designed to confirm this observation.
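
The two processing orders contrasted in the abstract can be sketched in Python with NumPy. This is only an illustration under stated assumptions, not the authors' implementation: the helper names (`lpc`, `residual`, `warp_frame`), the predictor order of 16, the single linear warping factor `alpha`, and the random stand-in frame are all hypothetical choices. The abstract does not specify the warping function or the features actually used.

```python
import numpy as np

def lpc(frame, order):
    """Autocorrelation-method LPC via Levinson-Durbin.
    Returns coefficients a[0..order] of the inverse filter, with a[0] = 1."""
    n = len(frame)
    r = np.array([np.dot(frame[:n - k], frame[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1:0:-1])
        k = -acc / err                      # reflection coefficient
        a_prev = a.copy()
        for j in range(1, i):
            a[j] = a_prev[j] + k * a_prev[i - j]
        a[i] = k
        err *= 1.0 - k * k                  # updated prediction error
    return a

def residual(frame, a):
    """Inverse-filter the frame: e[n] = sum_k a[k] * x[n-k]."""
    return np.convolve(frame, a)[:len(frame)]

def warp_frame(frame, alpha):
    """Assumed linear frequency warp: output bin k takes the spectrum at
    bin alpha * k. np.interp clamps positions beyond the last bin."""
    spec = np.fft.rfft(frame)
    bins = np.arange(len(spec), dtype=float)
    src = bins * alpha
    re = np.interp(src, bins, spec.real)
    im = np.interp(src, bins, spec.imag)
    return np.fft.irfft(re + 1j * im, n=len(frame))

rng = np.random.default_rng(0)
frame = rng.standard_normal(256)            # stand-in for one speech frame
a = lpc(frame, order=16)
res = residual(frame, a)

warped_frame = warp_frame(frame, alpha=1.1)     # conventional: warp the speech frame
warped_residual = warp_frame(res, alpha=1.1)    # alternative: warp the residual
```

An objective comparison of the kind the abstract mentions would then resynthesize speech from `warped_residual` and measure its distance to `warped_frame`. That step is omitted here because the abstract does not name the synthesis procedure or the distance measure.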