Visual question answering model based on graph neural network

Conference: ICETIS 2022 - 7th International Conference on Electronic Technology and Information Science
21.01.2022 - 23.01.2022 in Harbin, China

Proceedings: ICETIS 2022

Pages: 5 · Language: English · Type: PDF

Authors:
Wu, Xiru; Chen, Nancong (College of Electronic Engineering and Automation, Guilin University of Electronic Technology, Guilin, Guangxi, China)

Abstract:
In recent years, multi-modal visual question answering (VQA), which fuses the visual features of an image with the textual features of a question, has attracted wide attention from researchers. However, traditional VQA models ignore the dynamic relationships between the semantic information of the two modalities and the rich spatial structure among different image regions. To address this, a multi-module VQA model based on a graph neural network is proposed, enabling the model to fully capture the dynamic interactions between objects in the visual scene and the contextual representation of the question text. The graph neural network learns the feature information and the structural information of the graph simultaneously, so that the textual information can be fully aligned with the image information, bridging the semantic gap between the two modalities. Experimental results on the VQA 2.0 dataset show that the proposed model significantly outperforms the comparison methods on the evaluation metrics and effectively improves the accuracy of visual question answering.
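The abstract describes the general pattern of such models: image regions become graph nodes, spatial relations become edges, a graph neural network propagates information between regions, and the pooled graph representation is fused with the question embedding. The following is a minimal numpy sketch of that pattern; it is not the authors' implementation, and all names (`gnn_vqa_step`, the adjacency layout, the elementwise fusion) are illustrative assumptions.

```python
import numpy as np

def gnn_vqa_step(node_feats, adj, question_vec, W_msg, W_fuse):
    """One message-passing round over image-region nodes, followed by
    question-conditioned fusion (illustrative sketch, not the paper's model)."""
    # Normalize adjacency so each node averages its neighbors' messages.
    deg = adj.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0
    norm_adj = adj / deg
    # Message passing: aggregate neighbor features, project, residual update.
    messages = norm_adj @ node_feats                   # (N, d)
    updated = np.tanh(messages @ W_msg + node_feats)   # (N, d)
    # Pool the region graph and fuse it with the question representation.
    pooled = updated.mean(axis=0)                      # (d,)
    fused = np.tanh((pooled * question_vec) @ W_fuse)  # (k,) answer scores
    return fused

rng = np.random.default_rng(0)
N, d, k = 4, 8, 5                      # regions, feature dim, answer classes
feats = rng.standard_normal((N, d))    # stand-in for region visual features
adj = np.array([[0, 1, 1, 0],          # toy spatial-relation graph
                [1, 0, 0, 1],
                [1, 0, 0, 1],
                [0, 1, 1, 0]], dtype=float)
q = rng.standard_normal(d)             # stand-in for the question embedding
logits = gnn_vqa_step(feats, adj, q,
                      rng.standard_normal((d, d)),
                      rng.standard_normal((d, k)))
print(logits.shape)  # one score per candidate answer
```

In a real model the region features would come from an object detector, the question vector from a text encoder, and the weights would be trained end-to-end; the sketch only shows how structure (the adjacency matrix) and features enter the same update, which is the property the abstract attributes to the GNN.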