News Topic Classification Algorithm Based on Feature Retention

Conference: CAIBDA 2022 - 2nd International Conference on Artificial Intelligence, Big Data and Algorithms
17.06.2022 - 19.06.2022 in Nanjing, China

Proceedings: CAIBDA 2022

Pages: 5
Language: English
Type: PDF

Authors:
Deng, Hongli; Chen, Qingqing; He, Lingling; Yang, Tao (School of Computer Science, China West Normal University, Nanchong, China)
He, Baolin (School of Electronic and Information Engineering, China West Normal University, Nanchong, China)

Abstract:
News topic classification helps users with differing needs quickly find and use the information they want. When a news article contains a large amount of irrelevant information, the word set produced by text segmentation may include noise words that carry no topic information. These noise words increase the dimensionality of the feature vector space and degrade classifier performance. To address these problems, this paper proposes a news Topic Classification Algorithm based on Key Feature Retention (TCA-KFR). First, TCA-KFR constructs a set of class choice words by analyzing word frequency according to the distribution of these words across news categories. Second, TCA-KFR applies a class choice word recognition algorithm to remove the class choice words that have no obvious effect on topic classification. Finally, TCA-KFR predicts news topic categories based on BERT. Experimental results show that TCA-KFR effectively retains the key features of the text. Compared with the BERT model alone, classification accuracy improves and training time is shortened.
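
The first two steps described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual recognition criterion, so the rule used here (keep a word only if a single category accounts for at least `min_share` of its occurrences and it appears at least `min_count` times) is an assumption for illustration, as are all function and parameter names. The final BERT classification step is not shown; the filtered tokens would simply be passed on to it.

```python
from collections import Counter, defaultdict


def build_class_word_counts(docs):
    """Count word frequencies per news category.

    docs: list of (tokens, label) pairs, already segmented into tokens.
    """
    counts = defaultdict(Counter)
    for tokens, label in docs:
        counts[label].update(tokens)
    return counts


def select_key_words(counts, min_share=0.6, min_count=2):
    """Hypothetical criterion: keep words concentrated in one category."""
    totals = Counter()
    for class_counts in counts.values():
        totals.update(class_counts)

    keep = set()
    for word, total in totals.items():
        if total < min_count:
            continue  # too rare to judge its distribution
        top = max(class_counts[word] for class_counts in counts.values())
        if top / total >= min_share:
            keep.add(word)
    return keep


def retain_key_features(tokens, keep):
    """Drop every token that is not in the retained word set."""
    return [t for t in tokens if t in keep]


# Toy corpus (invented for this example).
docs = [
    (["team", "wins", "the", "match"], "sports"),
    (["the", "team", "scores", "goal"], "sports"),
    (["market", "falls", "the", "stocks"], "finance"),
    (["stocks", "rise", "the", "market"], "finance"),
]
keep = select_key_words(build_class_word_counts(docs))
filtered = retain_key_features(["the", "market", "and", "stocks"], keep)
```

In this toy corpus the word "the" is spread evenly over both categories and is dropped, while "team", "market" and "stocks" are retained. Shorter filtered inputs are one plausible reason for the reduced training time the abstract reports, though the paper's own explanation may differ.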