Unveiling Deep Speech Embeddings: Acoustic Insights into Hatespeech Detection
Conference: Speech Communication - 16th ITG Conference
24.09.2025-26.09.2025 in Berlin, Germany
Proceedings: ITG-Fb. 321: Speech Communication
Pages: 5
Language: English
Type: PDF
Authors:
Rammohan, Rathi Adarshi; Ren, Zhao; Swiderska, Aleksandra; Kuester, Dennis; Schultz, Tanja
Abstract:
Modern democratic societies face a growing challenge from hate speech, yet most research focuses on detecting hatetext (written content) and overlooks valuable acoustic cues. To explore the potential of hatespeech (spoken content), this study investigates hate detection using acoustic features and self-supervised speech embeddings on the HateMM dataset. We performed a layer-wise analysis of two fine-tuned speech models, Wav2Vec2.0 and HuBERT. A linear classifier trained on their embeddings yielded F1 scores of 0.8137 and 0.8106, respectively, indicating that the learned representations capture cues useful for hate detection. To further interpret these models, we applied canonical correlation analysis (CCA) to measure the similarity between hand-crafted acoustic features and the learned speech embeddings. Our findings highlight the role of energy-related features, encoded in the layers of the speech models, in distinguishing hatespeech from non-hatespeech.
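The layer-wise CCA comparison described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state which CCA variant or summary score was used, so taking the mean canonical correlation as a per-layer similarity score is an assumption here, as is the use of one pooled (utterance-level) embedding vector per clip. The function names `orthonormal_basis`, `cca_similarity` and `layerwise_cca` are hypothetical, and the demo uses synthetic data rather than HateMM.

```python
import numpy as np


def orthonormal_basis(a: np.ndarray, rtol: float = 1e-10) -> np.ndarray:
    """Return an orthonormal basis (as columns) for the column space of `a`."""
    u, s, _ = np.linalg.svd(a, full_matrices=False)
    if s.size == 0 or s.max() == 0.0:
        return u[:, :0]  # no usable (non-constant) directions
    # Keep only directions with non-negligible variance (drops rank deficiency).
    return u[:, s > rtol * s.max()]


def cca_similarity(x: np.ndarray, y: np.ndarray) -> float:
    """Mean canonical correlation between two views of the same n samples.

    x: (n_samples, p) hand-crafted acoustic features, e.g. energy descriptors
    y: (n_samples, q) pooled embeddings from one model layer (assumption)
    Assumes n_samples is much larger than p and q; otherwise CCA can report
    spuriously high correlations (PCA reduction of y first is a common remedy).
    """
    # Center each view: CCA operates on mean-free data.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    qx, qy = orthonormal_basis(x), orthonormal_basis(y)
    if qx.shape[1] == 0 or qy.shape[1] == 0:
        return 0.0
    # The canonical correlations are the singular values of qx^T qy.
    rho = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return float(np.clip(rho, 0.0, 1.0).mean())


def layerwise_cca(features: np.ndarray, layers: list) -> list:
    """Score each layer's embeddings against the same hand-crafted features."""
    return [cca_similarity(features, emb) for emb in layers]


# Synthetic stand-ins (hypothetical data, not HateMM):
rng = np.random.default_rng(0)
n = 500
energy = rng.normal(size=(n, 3))  # e.g. 3 energy-related descriptors per clip
# "Layer A" linearly encodes the energy features plus a little noise;
# "layer B" is unrelated noise.
layer_a = energy @ rng.normal(size=(3, 16)) + 0.1 * rng.normal(size=(n, 16))
layer_b = rng.normal(size=(n, 16))
scores = layerwise_cca(energy, [layer_a, layer_b])
print(scores)  # layer A should score far higher than layer B
```

Because CCA is invariant to invertible linear transformations of either view, a layer can score high even when it encodes the hand-crafted features in a rotated or rescaled form. This property makes it a reasonable tool for relating learned embeddings to interpretable acoustic descriptors.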

