Gläser, Claudius; Heckmann, Martin; Joublin, Frank; Goerick, Christian (Honda Research Institute Europe, Carl-Legien-Strasse 30, 63073 Offenbach, Germany)
Although formant extraction has been investigated for a long time, it remains a challenging task. Particularly in real-world environments, where noise and echoes are detrimental to speech processing, existing methods for formant extraction yield unfavorable results. Here, we present a framework for formant tracking that is specifically tailored for application in such difficult settings. The keys to our method are, firstly, an auditory-inspired preprocessing stage that enhances formants in spectrograms and, secondly, a probabilistic scheme that estimates the joint distribution of formants. The latter in particular contributes to the robustness of our system, as it naturally accounts for the uncertainty inherent in the speech data. We demonstrate the favorable performance of our framework through a comprehensive evaluation on a publicly available database as well as in the form of an online system operating under real-world conditions.