Investigating disentanglement of speaker identity and characteristics through user experience

Conference: Speech Communication - 15th ITG Conference
20-22 September 2023 in Aachen

doi:10.30420/456164046

Proceedings: ITG-Fb. 312: Speech Communication

Pages: 5
Language: English
Type: PDF

Authors:
Rallabandi, Sai Sirisha (Quality and Usability Lab, Technische Universität Berlin, Germany)
Moeller, Sebastian (Quality and Usability Lab, Technische Universität Berlin, Germany & Speech and Language Technology, Deutsches Forschungszentrum für Künstliche Intelligenz (DFKI), Berlin, Germany)

Abstract:
In this paper, we investigate the disentanglement of speaker-specific information in voice-converted female synthetic voices. We categorize this speaker-specific information into a) speaker identity and b) social speaker characteristics. We investigate the separability and interdependence of these two categories based on user experience, using five evaluation methods: a) speech quality, b) intelligibility, c) semantic differential scaling test, d) speaker similarity test, and e) characteristic similarity test. The analysis of the subjective results shows that intelligibility significantly affects the perceptions captured by the other evaluation methods. We also observed statistically significant differences between the speaker similarity and characteristic similarity tests.