The face encoder of speech portrait based on the ResNet

Conference: ISCTT 2022 - 7th International Conference on Information Science, Computer Technology and Transportation
05/27/2022 - 05/29/2022 at Xishuangbanna, China

Proceedings: ISCTT 2022

Pages: 7
Language: English
Type: PDF

Authors:
Wang, Yuanyuan; Bu, Fanliang; Zhang, Tengfei; Hu, Zhexin; Yao, Yutong (People's Public Security University of China, Beijing, China)

Abstract:
Speech portrait is a research focus in cross-modal recognition. To improve the accuracy and specificity of speech portraits, an improved speech-face portrait algorithm is proposed, comprising a speech feature encoder and a face feature encoder. For the face encoder, a facial feature extraction network is built on ResNet-50, and its parameters are optimized by training on the JAFFE and AVSpeech datasets. Visualization of the experimental results shows that the improved model extracts facial features well, encodes the main facial features, and effectively improves facial feature extraction performance.
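
As a rough illustration of what a ResNet-50 based face encoder produces, the sketch below traces feature-map shapes through the standard ResNet-50 layout down to the pooled feature vector. It is not the authors' network: the 224x224 input size and the helper names `conv_out` and `resnet50_shape_trace` are assumptions made for this example, and the abstract does not say how the paper modifies the backbone.

```python
# Illustrative only: shape trace through the standard ResNet-50 layout.
# The 224x224 input size is an assumption; the abstract does not state the
# input size or the modifications used in the paper.

def conv_out(size, kernel, stride, padding):
    """Output length along one spatial axis for a conv or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_shape_trace(height=224, width=224):
    """Return (stage, channels, height, width) tuples for standard ResNet-50."""
    trace = []

    # Stem: 7x7 convolution, stride 2, padding 3, 64 output channels.
    h, w = conv_out(height, 7, 2, 3), conv_out(width, 7, 2, 3)
    trace.append(("conv1", 64, h, w))

    # 3x3 max pooling, stride 2, padding 1.
    h, w = conv_out(h, 3, 2, 1), conv_out(w, 3, 2, 1)
    trace.append(("maxpool", 64, h, w))

    # Four bottleneck stages: (name, block count, output channels, stride).
    stages = [
        ("layer1", 3, 256, 1),
        ("layer2", 4, 512, 2),
        ("layer3", 6, 1024, 2),
        ("layer4", 3, 2048, 2),
    ]
    for name, _blocks, channels, stride in stages:
        # Only the first block of a stage changes the spatial size; it is
        # modeled here as a 3x3 window with padding 1 and the stage stride.
        h, w = conv_out(h, 3, stride, 1), conv_out(w, 3, stride, 1)
        trace.append((name, channels, h, w))

    # Global average pooling collapses each channel to a single value,
    # giving a 2048-dimensional feature vector per image.
    trace.append(("avgpool", 2048, 1, 1))
    return trace


if __name__ == "__main__":
    for stage, channels, h, w in resnet50_shape_trace():
        print(f"{stage:8s} {channels:5d} x {h:3d} x {w:3d}")
```

Under these assumptions, a face crop is reduced to a 2048-dimensional vector after global average pooling; a face encoder would typically use this vector, or a projection of it, as the facial feature representation.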