A Three-stage Method for Face Representation Recognition and Convenient Video Conversion

Conference: CAIBDA 2022 - 2nd International Conference on Artificial Intelligence, Big Data and Algorithms
17-19 June 2022 in Nanjing, China

Proceedings: CAIBDA 2022

Pages: 9
Language: English
Type: PDF

Authors:
Ding, Haoyuan (Columbia University, New York, USA)

Abstract:
All face-centered applications, such as identity recognition and digital entertainment, involve image processing and face representation recognition. This paper proposes a three-stage method that predicts emotion, age, race and gender and covers the entire video conversion process. In the first stage, the input videos are split into images frame by frame, and all face images and their positions are saved. In the second stage, separate classification and regression models predict age, race and gender. To obtain consistent results for each person, these representations are recorded separately according to face position. In the last stage, the emotion predictions are incorporated to generate complete representations, which are displayed in the converted output videos. Our method is motivated by multi-label datasets. Facial image datasets are typically used for face detection and emotion recognition, while other information such as gender, age and race is ignored. In this paper, we collect all the information from several datasets to train the models. After face recognition, the cropped images are converted into aligned face images of the same size, which are then pooled according to their labels to increase the overall data volume. However, some of the information in a dataset can impair the results. Based on test accuracy and results on real videos, we determine which parts of each dataset are suitable for use. The classification and regression model structures are modified from VGG. In addition to the independent models, we also tested multi-output models with shared parameters. Expanding the data volume, applying a hybrid loss function and using snapshot ensembling all improve the results on real videos.
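
The second-stage idea of recording predictions per person according to face position can be illustrated with a minimal sketch. The abstract does not say how faces are matched across frames or how per-frame predictions are combined, so everything here is an assumption made for illustration: the function names `assign_track` and `aggregate`, the nearest-center matching rule with a pixel threshold `max_dist`, majority voting for the categorical labels, and averaging for age.

```python
from collections import Counter
from statistics import mean


def assign_track(tracks, center, max_dist=50.0):
    """Return the index of the nearest existing track, or start a new one.

    Hypothetical matching rule: a detection belongs to the track whose last
    known face center lies within max_dist pixels (Euclidean distance).
    """
    best, best_d = None, max_dist
    for i, track in enumerate(tracks):
        cx, cy = track["center"]
        d = ((cx - center[0]) ** 2 + (cy - center[1]) ** 2) ** 0.5
        if d <= best_d:
            best, best_d = i, d
    if best is None:
        tracks.append({"center": center, "ages": [], "genders": [], "races": []})
        best = len(tracks) - 1
    tracks[best]["center"] = center  # follow the face as it moves
    return best


def aggregate(detections, max_dist=50.0):
    """Combine per-frame predictions into one record per person.

    Assumed combination rule: mean for age (a regression output) and
    majority vote for gender and race (classification outputs).
    """
    tracks = []
    for det in detections:  # detections are expected in frame order
        i = assign_track(tracks, det["center"], max_dist)
        tracks[i]["ages"].append(det["age"])
        tracks[i]["genders"].append(det["gender"])
        tracks[i]["races"].append(det["race"])
    return [
        {
            "age": mean(t["ages"]),
            "gender": Counter(t["genders"]).most_common(1)[0][0],
            "race": Counter(t["races"]).most_common(1)[0][0],
        }
        for t in tracks
    ]


if __name__ == "__main__":
    # Made-up detections for two people over three frames.
    frames = [
        {"center": (100, 100), "age": 30, "gender": "F", "race": "A"},
        {"center": (400, 120), "age": 50, "gender": "M", "race": "B"},
        {"center": (104, 102), "age": 34, "gender": "F", "race": "A"},
        {"center": (398, 118), "age": 54, "gender": "F", "race": "B"},
        {"center": (108, 101), "age": 32, "gender": "M", "race": "A"},
        {"center": (401, 121), "age": 52, "gender": "M", "race": "B"},
    ]
    for person in aggregate(frames):
        print(person)
```

The sketch does not prevent two faces in the same frame from matching the same track, and it does not handle faces that leave and re-enter the picture. A real pipeline would need a more careful association step.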