A Bag-of-Audio-Words Approach for Snore Sounds’ Excitation Localisation

Conference: Speech Communication - 12. ITG-Fachtagung Sprachkommunikation
05.10.2016 - 07.10.2016 in Paderborn, Germany

Proceedings: ITG-Fb. 267: Speech Communication

Pages: 5
Language: English
Type: PDF


Schmitt, Maximilian; Pandit, Vedhas; Qian, Kun (Chair of Complex & Intelligent Systems, Universität Passau, Germany)
Janott, Christoph; Hemmert, Werner (Institute for Medical Engineering, Technische Universität München, Germany)
Heiser, Clemens (Department of Otorhinolaryngology/Head and Neck Surgery, Klinikum rechts der Isar, Technische Universität München, Germany)
Schuller, Bjoern (Chair of Complex & Intelligent Systems, Universität Passau, Germany & Machine Learning Group, Department of Computing, Imperial College London, UK)

Habitual snoring and Obstructive Sleep Apnea are serious conditions that can affect the health of the snorer. For targeted surgical treatment, it is crucial to identify the exact location of the vibration within the upper airways. In contrast to earlier work, we present the first unsupervised feature learning approach to this task, based on bags-of-audio-words. In this approach, we cluster feature values within a given time segment into acoustic ‘words’. The frequency of occurrence of each word is then represented in a histogram per sound chunk, which is used to classify between four excitation locations. In extensive test runs on snore sound data of 24 patients labelled by experts, we evaluated several feature sets as the basis for audio word creation. We find audio words based on wavelet features, formants, and MFCCs to be highly suited to the task, outperforming previous experiments on the same data set.
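
The bag-of-audio-words idea summarised above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the clustering method, the codebook size, or the frame-level features, so the plain k-means, the 8 audio words, and the 13-dimensional synthetic frames used here are assumptions chosen only for illustration, and all function names are hypothetical.

```python
import numpy as np


def assign_words(frames, codebook):
    """Return the index of the nearest codebook entry (audio word) per frame."""
    dists = np.linalg.norm(frames[:, None, :] - codebook[None, :, :], axis=2)
    return dists.argmin(axis=1)


def learn_codebook(frames, n_words, n_iter=20, seed=0):
    """Learn audio words with plain k-means (an assumed clustering method)."""
    rng = np.random.default_rng(seed)
    # Initialise the codebook with randomly chosen training frames.
    idx = rng.choice(len(frames), size=n_words, replace=False)
    codebook = frames[idx].copy()
    for _ in range(n_iter):
        labels = assign_words(frames, codebook)
        for k in range(n_words):
            members = frames[labels == k]
            if len(members) > 0:  # keep the old centre if a cluster is empty
                codebook[k] = members.mean(axis=0)
    return codebook


def bag_of_audio_words(frames, codebook):
    """Normalised histogram of audio-word occurrences for one sound chunk."""
    words = assign_words(frames, codebook)
    counts = np.bincount(words, minlength=len(codebook))
    return counts / counts.sum()


# Synthetic stand-in for frame-level features (not real snore data).
rng = np.random.default_rng(42)
train_frames = rng.normal(size=(500, 13))   # frames pooled over training chunks
codebook = learn_codebook(train_frames, n_words=8)

chunk_frames = rng.normal(size=(40, 13))    # frames of one sound chunk
hist = bag_of_audio_words(chunk_frames, codebook)
```

Each sound chunk is thus reduced to one fixed-length histogram, regardless of how many frames it contains. In a complete pipeline, these histograms would serve as input to a classifier that distinguishes the four excitation locations; that step is omitted here because the abstract does not specify it.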