Non-stationary, filter-stable acoustic objects as atoms of voiced speech

Conference: Sprachkommunikation 2008 - 8. ITG-Fachtagung
10/08/2008 - 10/10/2008 at Aachen, Germany

Proceedings: Sprachkommunikation 2008

Pages: 4

Drepper, Friedhelm R. (Zentralinstitut für Elektronik, Forschungszentrum Jülich GmbH)
Schlüter, Ralf (Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University)

Pitch (and loudness) perception can be interpreted as the conscious part of a perception module that separates the phonological and linguistic time scales of speech signals from the phonetically relevant ones. In automatic speech processing, the time-scale separation problem is commonly addressed by assuming a frequency gap in the long-term spectrum, which is equivalent to assuming temporary stationarity. The present approach rests on the alternative hypothesis that the communication of voiced parts of speech relies on non-stationary acoustic objects, which support a reconstruction of empirical modes (speech formants) that is stable with respect to the filter centre frequency. These empirical modes are interpreted as topologically equivalent images of dominant oscillatory modes on the transmitter side and as acoustic correlates of spectral pitch perception. The topological invariants of the filter-stable empirical modes (formants) represent ideal cues for phoneme recognition.
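For contrast, the conventional assumption of temporary stationarity mentioned above amounts to analysing speech in short, overlapping frames within which the spectrum is treated as constant. The following is a minimal NumPy sketch of that conventional framing only, not of the authors' filter-stable approach. The function names, the 25 ms frame length, the 10 ms hop and the Hann window are typical values assumed here for illustration.

```python
import numpy as np


def frame_signal(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Split x into overlapping, Hann-windowed frames.

    Each frame is analysed as if the signal were stationary within it
    (the 'temporary stationarity' assumption). frame_ms and hop_ms are
    assumed, typical ASR values; they are not taken from the paper.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    if len(x) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(x) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx] * np.hanning(frame_len)


def short_time_spectra(x, fs, **kwargs):
    """Magnitude spectrum of each frame (one row per frame)."""
    frames = frame_signal(x, fs, **kwargs)
    return np.abs(np.fft.rfft(frames, axis=1))


if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs                 # 1 s of signal
    x = np.sin(2 * np.pi * 1000.0 * t)     # 1 kHz test tone
    S = short_time_spectra(x, fs)
    print(S.shape)                         # (98, 201)
```

With 16 kHz sampling, each 25 ms frame holds 400 samples, so the frequency resolution is 40 Hz per bin and a 1 kHz tone peaks in bin 25 of every frame. Any spectral change faster than the 10 ms hop is averaged away, which is the limitation the non-stationary, filter-stable view is meant to avoid.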