Utilizing Slow Feature Analysis for Lipreading

Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation
10/10/2018 - 10/12/2018 at Oldenburg, Germany

Proceedings: Speech Communication

Pages: 5, Language: English, Type: PDF


Authors:
Freiwald, Jan; Karbasi, Mahdie; Zeiler, Steffen; Melchior, Jan; Kompella, Varun; Wiskott, Laurenz; Kolossa, Dorothea (Institute of Communication Acoustics and Institute for Neural Computation, Ruhr University Bochum, Bochum, Germany)

Abstract:
While speech recognition has become highly robust in recent years, it remains a challenging task under very noisy or reverberant conditions. Augmenting speech recognition with lipreading from video input is hence a promising approach to making it more reliable. For this purpose, we consider slow feature analysis (SFA), an unsupervised machine learning method that finds the most slowly varying features in sequential input data. It can automatically extract temporally slow features from a video sequence, such as lip movements, while at the same time removing quickly changing components such as noise. In this work, we apply SFA as an initial feature extraction step in automatic lipreading. The performance is evaluated on small-vocabulary lipreading, in both the speaker-dependent and speaker-independent case, showing that the features are competitive with the often highly successful combination of a discrete cosine transform and a linear discriminant analysis, while also offering good interpretability.
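
To illustrate the underlying idea, the sketch below implements plain linear SFA in Python/NumPy: whiten the input signal, then take the directions along which the frame-to-frame differences have the smallest variance. This is not the authors' pipeline (the paper applies SFA to video frames of the mouth region); the function name, dimensions, and toy signal are assumptions made purely for demonstration.

    import numpy as np

    def linear_sfa(x, n_features=10):
        """Minimal linear Slow Feature Analysis (illustrative sketch).

        x: array of shape (T, D), one D-dimensional observation per time step.
        Returns the n_features most slowly varying output signals, shape (T, n_features).
        """
        # 1. Center the data.
        x = x - x.mean(axis=0)

        # 2. Whiten: decorrelate and scale to unit variance.
        cov = np.cov(x, rowvar=False)
        eigval, eigvec = np.linalg.eigh(cov)
        whitener = eigvec / np.sqrt(eigval)        # columns scaled by 1/sqrt(eigenvalue)
        z = x @ whitener

        # 3. Approximate the temporal derivative by finite differences.
        z_dot = np.diff(z, axis=0)

        # 4. Directions with the smallest derivative variance are the slowest;
        #    eigh returns eigenvalues in ascending order, so take the first columns.
        dcov = np.cov(z_dot, rowvar=False)
        _, d_eigvec = np.linalg.eigh(dcov)
        w = d_eigvec[:, :n_features]

        # 5. Project the whitened signal onto the slow directions.
        return z @ w

    # Toy usage: a slow sinusoid mixed with fast noise is recovered as the slowest feature.
    t = np.linspace(0, 2 * np.pi, 500)
    data = np.column_stack([np.sin(t) + 0.1 * np.random.randn(500),
                            np.random.randn(500)])
    slow = linear_sfa(data, n_features=1)

In practice, SFA is often preceded by a nonlinear expansion of the input and applied hierarchically to image patches; the linear version above only shows the core optimization of minimizing the variance of the temporal derivative under a whitening constraint.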