Evaluation of the Pain Level from Speech: Introducing a Novel Pain Database and Benchmarks

Conference: Speech Communication - 13. ITG-Fachtagung Sprachkommunikation
10.10.2018 - 12.10.2018 in Oldenburg, Germany

Proceedings: Speech Communication

Pages: 5 | Language: English | Type: PDF


Authors:
Ren, Zhao; Cummins, Nicholas; Han, Jing (ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany)
Schnieder, Sebastian (Industrial Psychology, Hochschule für Medien, Kommunikation und Wirtschaft Berlin, Germany & Institute of Safety Technology, Human Factors and Diagnostics, University of Wuppertal, Germany)
Krajewski, Jarek (Institute of Safety Technology, Human Factors and Diagnostics, University of Wuppertal, Germany & Engineering Psychology, Rheinische Fachhochschule Cologne, Germany)
Schuller, Bjoern (ZD.B Chair of Embedded Intelligence for Health Care and Wellbeing, University of Augsburg, Germany & GLAM – Group on Language, Audio & Music, Imperial College London, UK)

Abstract:
In many clinical settings, the evaluation of pain is achieved through a manual diagnostic procedure relying heavily on verbal descriptions from the patient. Such procedures can be time-consuming, costly, and liable to subjective biases, and are therefore often inaccurate. The automatic evaluation of pain based on paralinguistic speech cues has the potential to improve the objectivity and accuracy of pain diagnosis. In this regard, we herein introduce a novel audiovisual pain database, the Duesseldorf Acute Pain Corpus, comprising 844 recordings from 80 subjects whose speech was recorded while they undertook a cold pressor pain induction paradigm. The database is split into speaker-independent training, development, and test sets for a three-class pain-level classification task, and we provide a comprehensive set of benchmark experimental results. The feature representations tested include functionals and bag-of-audio-words from three feature sets: the Computational Paralinguistics Challenge (ComParE) features, mel-frequency cepstral coefficients, and deep spectrum representations. We use support vector machines and long short-term memory recurrent neural networks (LSTM-RNNs) as the classifiers. The best result, an unweighted average recall of 42.7% on the test set, is obtained by the LSTM-RNN operating on the deep spectrum representations.
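To make the benchmark setup concrete, the following is a minimal sketch of one of the simpler pipelines the abstract describes: frame-level acoustic features summarised by functionals and fed to a support vector machine, scored with unweighted average recall (UAR). It is not the authors' actual implementation; the corpus is not distributed with this page, so the file lists, label encoding (three pain levels), and hyper-parameters below are placeholders, and MFCCs via librosa stand in for the full ComParE and deep spectrum feature sets.

```python
# Hypothetical sketch: MFCC functionals + linear SVM for three-class pain-level
# classification, evaluated with unweighted average recall (UAR).
# File paths, labels, and hyper-parameters are illustrative placeholders.
import numpy as np
import librosa
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.metrics import recall_score


def mfcc_functionals(wav_path, sr=16000, n_mfcc=13):
    """Summarise frame-level MFCCs with simple functionals (mean, std)."""
    y, sr = librosa.load(wav_path, sr=sr)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, frames)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1)])


def train_and_evaluate(train_files, train_labels, test_files, test_labels):
    # Extract one fixed-length functional vector per recording.
    X_train = np.vstack([mfcc_functionals(f) for f in train_files])
    X_test = np.vstack([mfcc_functionals(f) for f in test_files])

    # Standardise features on the training partition only (speaker-independent split).
    scaler = StandardScaler().fit(X_train)
    clf = SVC(kernel="linear", C=1.0)
    clf.fit(scaler.transform(X_train), train_labels)

    pred = clf.predict(scaler.transform(X_test))
    # UAR is the recall averaged over the three classes with equal weight,
    # i.e. macro-averaged recall in scikit-learn terms.
    return recall_score(test_labels, pred, average="macro")
```

A usage call would pass the training and test partition file lists with their pain-level labels (e.g. 0, 1, 2) and report the returned UAR; the LSTM-RNN on deep spectrum representations reported in the abstract replaces both the feature extractor and the classifier in this sketch.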