Continuous Sign Language Recognition Using Multiple Feature Points

Conference: ICETIS 2022 - 7th International Conference on Electronic Technology and Information Science
01/21/2022 - 01/23/2022 at Harbin, China

Proceedings: ICETIS 2022

Pages: 6
Language: English
Type: PDF

Authors:
Jin, Yanliang; Wu, Xiaowei; Yu, Xiaoqi (Key Laboratory of Specialty Fiber Optics and Optical Access Networks, Laboratory of Specialty Fiber Optics and Advanced Communication, Shanghai University, Shanghai, China)
Ni, Lan (Chinese Sign Language and Deaf Research Center, College of Literature, Shanghai University, Shanghai, China)

Abstract:
Most recent continuous sign language recognition methods take images as input, which leads to problems of partial occlusion and information redundancy. In addition, positional information and non-manual information such as facial expression have not been well exploited in continuous sign language recognition. To address these problems, this paper uses a human pose recognition model to identify key points of the body, hands and face. Based on these key points, a residual spatial temporal adaptive graph convolutional network (RST-AGCN) is designed to extract joint and bone features; it captures the location of the hands relative to other body parts and reduces over-smoothing of information. To transform features into sentences, this paper proposes an attention-based bidirectional GRU encoder-decoder network combined with a beam search strategy, and optimizes the mapping between features, glosses and sentences using a joint loss of connectionist temporal classification (CTC) and attention. Finally, the whole model is evaluated on the CSL corpus and the PHOENIX-2014 corpus, and the results indicate that the method contributes to continuous sign language recognition.
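
As an illustration of two ideas named in the abstract, the following minimal Python sketch derives bone features from key points and combines a CTC loss with an attention loss. It is not the authors' implementation. It assumes that a bone is the vector from a parent joint to its child joint and that the joint loss is a weighted sum of the two terms; the abstract states neither detail. The skeleton in `PARENTS`, the function names `bone_features` and `joint_loss`, the `weight` parameter and the pixel coordinates are all hypothetical.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]

# Hypothetical toy skeleton (child joint -> parent joint); not taken from the paper.
PARENTS: Dict[str, str] = {
    "neck": "nose",
    "right_shoulder": "neck",
    "right_elbow": "right_shoulder",
    "right_wrist": "right_elbow",
}


def bone_features(joints: Dict[str, Point]) -> Dict[str, Point]:
    """Return one bone vector per child joint: child position minus parent position.

    Assumption: this difference-based definition of a bone is a common choice
    for skeleton graphs, used here only for illustration.
    """
    bones: Dict[str, Point] = {}
    for child, parent in PARENTS.items():
        cx, cy = joints[child]
        px, py = joints[parent]
        bones[child] = (cx - px, cy - py)
    return bones


def joint_loss(ctc_loss: float, attention_loss: float, weight: float = 0.5) -> float:
    """Combine the two losses as weight * CTC + (1 - weight) * attention.

    Assumption: a convex combination with a hypothetical `weight` in [0, 1].
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be in [0, 1]")
    return weight * ctc_loss + (1.0 - weight) * attention_loss


# One frame of made-up key points in pixel coordinates (x, y).
frame: Dict[str, Point] = {
    "nose": (100.0, 40.0),
    "neck": (100.0, 70.0),
    "right_shoulder": (80.0, 76.0),
    "right_elbow": (70.0, 110.0),
    "right_wrist": (90.0, 60.0),
}
bones = bone_features(frame)
```

Because each bone is expressed relative to its parent joint, the wrist feature describes where the hand lies with respect to the arm rather than its absolute image position, which is one simple way to encode the relative location of the hands.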