Utilizing Slow Feature Analysis for Lipreading

Conference: Speech Communication - 13th ITG Conference on Speech Communication
October 10-12, 2018, Oldenburg, Germany

Proceedings: ITG-Fb. 282: Speech Communication

Pages: 5 · Language: English · Type: PDF


Freiwald, Jan; Karbasi, Mahdie; Zeiler, Steffen; Melchior, Jan; Kompella, Varun; Wiskott, Laurenz; Kolossa, Dorothea (Institute of Communication Acoustics and Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany)

While speech recognition has become highly robust in recent years, it remains a challenging task under very noisy or reverberant conditions. Augmenting speech recognition with lipreading from video input is hence a promising approach to making recognition more reliable. For this purpose, we consider slow feature analysis (SFA), an unsupervised machine learning method that finds the most slowly varying features in sequential input data. It can automatically extract temporally slow components of a video sequence, such as lip movements, while at the same time removing quickly changing components such as noise. In this work, we apply SFA as an initial feature extraction step for automatic lipreading. Performance is evaluated on a small-vocabulary lipreading task, in both the speaker-dependent and the speaker-independent case, showing that the features are competitive with the often highly successful combination of a discrete cosine transform and a linear discriminant analysis, while also offering good interpretability.
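To illustrate the core idea behind SFA, the following is a minimal sketch of *linear* SFA in NumPy, not the authors' implementation: the input is whitened, and the directions along which the temporal derivative has the smallest variance are taken as the slowest features. The function name `linear_sfa` and all parameters are illustrative assumptions.

```python
import numpy as np

def linear_sfa(x, n_components=2):
    """Linear slow feature analysis (illustrative sketch).

    Finds linear projections of the centered, whitened input whose
    temporal derivative has minimal variance, i.e. the slowest
    features, ordered slowest first.
    """
    # Center the signal
    x = x - x.mean(axis=0)
    # Whiten via eigendecomposition of the covariance matrix
    cov = np.cov(x, rowvar=False)
    eigval, eigvec = np.linalg.eigh(cov)
    keep = eigval > 1e-10          # drop near-zero-variance directions
    whitener = eigvec[:, keep] / np.sqrt(eigval[keep])
    z = x @ whitener               # whitened data, unit covariance
    # Covariance of temporal differences; slow directions correspond
    # to the smallest eigenvalues here
    dz = np.diff(z, axis=0)
    dval, dvec = np.linalg.eigh(np.cov(dz, rowvar=False))  # ascending
    w = whitener @ dvec[:, :n_components]
    return x @ w                   # slow features, slowest first
```

On a toy mixture of a slow and a fast sinusoid, the first extracted feature recovers the slow source (up to sign), which mirrors how SFA separates slow lip movements from fast-changing noise in video.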