Research on data imbalance classification based on oversampling method

Conference: CAIBDA 2022 - 2nd International Conference on Artificial Intelligence, Big Data and Algorithms
06/17/2022 - 06/19/2022 at Nanjing, China

Proceedings: CAIBDA 2022

Pages: 4
Language: English
Type: PDF

Authors:
Bai, Yuzhu; Feng, Haiwen (Shenyang University of Technology, Tiexi District, Shenyang, Liaoning, China)
Yu, Wei (Neusoft, Hunnan District, Shenyang, Liaoning, China)

Abstract:
Classifying imbalanced data often yields a low recognition rate for minority-class samples, a problem that arises in many fields. To address it, this paper proposes an ensemble Random Forest classification algorithm for imbalanced data based on oversampling. The original data set is preprocessed with feature scaling and feature selection, the downsampling and oversampling methods are combined, and Logistic Regression and Random Forest classifiers are used for classification. Finally, the two methods are compared. The experimental results show that the oversampling algorithm can improve the overall classification performance of the classifier.
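
The abstract does not say which oversampling variant the paper uses, so the following is only a minimal sketch of the general idea, assuming plain random oversampling with replacement. The function name `random_oversample`, the `seed` parameter, and the toy data are hypothetical and are not taken from the paper. It uses only the Python standard library and duplicates minority-class rows until every class has as many samples as the majority class.

```python
import random
from collections import Counter


def random_oversample(X, y, seed=0):
    """Hypothetical helper: duplicate randomly chosen minority-class rows
    until every class has as many samples as the largest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())  # size of the majority class

    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        # Rows of the original data that belong to this class
        rows = [x for x, lab in zip(X, y) if lab == label]
        # Draw (target - n) extra rows with replacement; zero for the majority class
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out


# Toy data (an assumption for illustration): 6 majority samples, 2 minority samples
X = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.6], [2.0], [2.1]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

In a full pipeline of the kind the abstract describes, resampling like this would typically be applied to the training split only, after feature scaling and feature selection, and the balanced data would then be passed to the Logistic Regression and Random Forest classifiers.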