On Feature Importance and Interpretability of Speaker Representations
                  Conference: Speech Communication - 15th ITG Conference
                  09/20/2023 - 09/22/2023 at Aachen              
doi:10.30420/456164037
Proceedings: ITG-Fb. 312: Speech Communication
Pages: 5
Language: English
Type: PDF
            Authors:
                          Rautenberg, Frederik; Kuhlmann, Michael; Haeb-Umbach, Reinhold (Department of Communications Engineering, Paderborn University, Germany)
                          Wiechmann, Jana; Seebauer, Fritz; Wagner, Petra (Phonetics Work Group, Bielefeld University, Germany)
                      
              Abstract:
              Unsupervised speech disentanglement aims at separating fast-varying from slowly varying components of a speech signal. In this contribution, we take a closer look at the embedding vector representing the slowly varying signal components, commonly named the speaker embedding vector. We ask which properties of a speaker’s voice are captured and investigate to what extent individual embedding vector components are responsible for them, using the concept of Shapley values. Our findings show that certain speaker-specific acoustic-phonetic properties can be predicted fairly well from the speaker embedding, while the more abstract voice quality features investigated cannot.
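
              As an illustration of the Shapley value concept mentioned in the abstract, the following minimal sketch computes exact Shapley values for the components of a toy embedding vector. It is not the authors' implementation: the 3-dimensional embedding, the linear predictor with its weights, the all-zero baseline, and the names `shapley_values` and `predict_with` are all hypothetical choices made for this example. Components outside a coalition are replaced by their baseline value, which is one common convention among several.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values by enumerating all coalitions.

    The cost grows as 2**n_features, so this is only feasible for
    very small n_features.
    """
    players = range(n_features)
    phi = [0.0] * n_features
    for i in players:
        others = [j for j in players if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of component i when joining coalition s
                phi[i] += weight * (value_fn(s | {i}) - value_fn(s))
    return phi


# Hypothetical toy setup (assumption, not from the paper): a linear
# predictor of some acoustic property from a 3-dimensional embedding.
weights = [2.0, -1.0, 0.0]
embedding = [0.5, 1.5, 3.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference value for "absent" components


def predict_with(coalition):
    """Predict using embedding values inside the coalition, baseline elsewhere."""
    x = [embedding[j] if j in coalition else baseline[j]
         for j in range(len(embedding))]
    return sum(w * v for w, v in zip(weights, x))


phi = shapley_values(predict_with, len(embedding))
```

              For a linear predictor, each attribution reduces to the weight times the component's deviation from the baseline, and the attributions sum to the difference between the full prediction and the baseline prediction. A component with zero weight receives zero attribution, which is the sense in which Shapley values indicate how much an individual embedding component contributes to a predicted property.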


