Speaker Identification Via the Relation Network: a Meta-Learning Method

Conference: ICETIS 2022 - 7th International Conference on Electronic Technology and Information Science
01/21/2022 - 01/23/2022 at Harbin, China

Proceedings: ICETIS 2022

Pages: 7
Language: English
Type: PDF

Authors:
Li, Yarong; Ke, Xianxin; Ping, Hu (School of Mechanical and Electrical Engineering and Automation, Shanghai University, Shanghai, China)

Abstract:
Speaker identification (SI) can be categorized as closed-set or open-set according to the characteristics of the data set. In the open-set case, the label spaces of the training phase and the test phase are disjoint, which essentially makes it a few-shot learning problem. In this study, spectrogram features are obtained from raw audio data, and a convolution-based speaker identification model is trained end-to-end on these features within the framework of the relation network (RN), a meta-learning approach based on deep metric learning. Our approach is validated on the VoxCeleb2 dataset and compared with a model trained with prototypical network loss (PNL), which also leverages meta-learning for speaker identification; RN outperforms PNL and needs less data. In the absence of novel categories, the identification accuracy of the best-trained model reaches 85.44% with only 3 support samples per category and 71.98% with only 1 support sample. In addition to demonstrating the effectiveness of this method, we also suggest possible directions for improvement.
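
The few-shot setup described in the abstract can be illustrated with a minimal sketch of how a relation network might score one query utterance against a small support set. Everything below is an assumption made for illustration and is not taken from the paper: the `embed` function is a stand-in for the convolutional encoder, the relation module is a tiny untrained two-layer network with random weights, the speaker names and spectrograms are synthetic, and summing the support embeddings per speaker follows the usual relation network formulation rather than anything the abstract states.

```python
import math
import random


def embed(spectrogram):
    # Hypothetical stand-in for the convolutional encoder:
    # average each frequency bin over all time frames.
    n_frames = len(spectrogram)
    return [sum(col) / n_frames for col in zip(*spectrogram)]


def make_relation_module(in_dim, hidden_dim, rng):
    # Toy, untrained relation module: linear -> ReLU -> linear -> sigmoid.
    w1 = [[rng.uniform(-1, 1) for _ in range(in_dim)] for _ in range(hidden_dim)]
    w2 = [rng.uniform(-1, 1) for _ in range(hidden_dim)]

    def relation(pair):
        hidden = [max(0.0, sum(w * x for w, x in zip(row, pair))) for row in w1]
        z = sum(w * h for w, h in zip(w2, hidden))
        return 1.0 / (1.0 + math.exp(-z))  # relation score in [0, 1]

    return relation


def classify(support, query, relation):
    # support: dict mapping speaker -> list of K spectrograms (the K shots).
    q = embed(query)
    scores = {}
    for speaker, shots in support.items():
        # Element-wise sum of the K support embeddings for this speaker.
        class_vec = [sum(vals) for vals in zip(*(embed(s) for s in shots))]
        # Concatenate class and query representations, then score the pair.
        scores[speaker] = relation(class_vec + q)
    return max(scores, key=scores.get), scores


rng = random.Random(0)


def fake_spectrogram(n_frames=5, n_bins=4):
    # Synthetic data only; real input would be spectrograms of raw audio.
    return [[rng.random() for _ in range(n_bins)] for _ in range(n_frames)]


# A 3-way, 3-shot episode with hypothetical speaker labels.
support = {name: [fake_spectrogram() for _ in range(3)]
           for name in ("spk_a", "spk_b", "spk_c")}
query = fake_spectrogram()
relation = make_relation_module(in_dim=8, hidden_dim=6, rng=rng)
predicted, scores = classify(support, query, relation)
```

Because the weights are random, the predicted label carries no meaning here; the sketch only shows the data flow of one episode. In a trained system the encoder and relation module would be learned jointly so that pairs from the same speaker score higher than pairs from different speakers.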