On Feature Importance and Interpretability of Speaker Representations

Conference: Speech Communication - 15th ITG Conference
20.09.2023-22.09.2023 in Aachen

doi:10.30420/456164037

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Rautenberg, Frederik; Kuhlmann, Michael; Haeb-Umbach, Reinhold (Department of Communications Engineering, Paderborn University, Germany)
Wiechmann, Jana; Seebauer, Fritz; Wagner, Petra (Phonetics Work Group, Bielefeld University, Germany)

Abstract:
Unsupervised speech disentanglement aims at separating the fast-varying from the slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector that represents the slowly varying signal components, commonly called the speaker embedding vector. We ask which properties of a speaker's voice it captures, and we use the concept of Shapley values to investigate to what extent individual embedding vector components are responsible for them. Our findings show that certain speaker-specific acoustic-phonetic properties can be predicted fairly well from the speaker embedding, while the more abstract voice quality features we investigated cannot.
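
To illustrate the idea of attributing a prediction to individual embedding components with Shapley values, here is a minimal sketch in Python. It is not the method used in the paper: the abstract does not specify the predictor or the Shapley estimator, so the three-dimensional embedding, the linear predictor `predict_property`, its `WEIGHTS`, and the all-zero baseline are all hypothetical. The sketch enumerates every coalition exactly, which is only feasible for very small dimensions; real speaker embeddings would need an approximation.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values of predict(x) relative to predict(baseline).

    Components outside a coalition are replaced by their baseline value.
    The cost grows exponentially with len(x), so this is for toy sizes only.
    """
    n = len(x)
    values = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size (excluding component i).
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                without_i = [x[j] if j in coalition else baseline[j]
                             for j in range(n)]
                with_i = list(without_i)
                with_i[i] = x[i]
                # Marginal contribution of component i to this coalition.
                values[i] += weight * (predict(with_i) - predict(without_i))
    return values


# Hypothetical toy predictor of some acoustic-phonetic property.
# The second embedding component is deliberately ignored (weight 0).
WEIGHTS = [2.0, 0.0, -1.0]


def predict_property(embedding):
    return sum(w * e for w, e in zip(WEIGHTS, embedding)) + 0.5


embedding = [1.0, 3.0, 2.0]   # hypothetical speaker embedding
baseline = [0.0, 0.0, 0.0]    # hypothetical reference embedding
phi = shapley_values(predict_property, embedding, baseline)
print(phi)
```

For a linear predictor like this one, each Shapley value reduces to the weight times the deviation of that component from the baseline, so the ignored second component receives an attribution of (numerically) zero. The values also sum to the difference between the prediction for the embedding and the prediction for the baseline.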