Software defect prediction based on combined sampling and feature selection

Conference: ICMLCA 2021 - 2nd International Conference on Machine Learning and Computer Application
17-19 December 2021, Shenyang, China

Proceedings: ICMLCA 2021

Pages: 5
Language: English
Type: PDF

Authors:
Wang, Denglin; Xiong, Xiaohui (College of Information Engineering, Shanghai Maritime University, Shanghai, China)

Abstract:
To address class imbalance and high feature-space dimensionality in software defect prediction, a method based on combined sampling and feature selection is proposed, with ensemble learning applied to the classification model. First, in the data processing stage, the synthetic minority oversampling technique (SMOTE) is used to generate defective samples, and the edited nearest neighbor (ENN) rule is then used for data cleaning, which reduces the number of non-defective samples and yields a balanced data set. Next, a hybrid feature selection algorithm is applied to the balanced data set to remove features that are only weakly correlated with the class label, producing an optimized feature subset. Finally, the AdaBoost ensemble learning algorithm is used as the classifier to construct the software defect prediction model. Experiments on several imbalanced software defect data sets show that the proposed method can effectively improve prediction performance.
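
The three stages described in the abstract can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify the hybrid feature selection algorithm, so a simple Pearson-correlation filter stands in for it; the ENN step here removes misclassified samples from both classes; AdaBoost uses single-threshold stumps as weak learners; and the toy data, the function names, and all parameter values (`k`, `threshold`, `rounds`) are hypothetical.

```python
import math
import random

def dist(a, b):
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def knn(X, point, k, exclude):
    # Indices of the k rows of X closest to `point`, skipping row `exclude`.
    others = [j for j in range(len(X)) if j != exclude]
    return sorted(others, key=lambda j: dist(X[j], point))[:k]

def smote(X, y, k=3, seed=0):
    # Interpolate new defective (label 1) samples until both classes match in size.
    rng = random.Random(seed)
    minority = [row for row, label in zip(X, y) if label == 1]
    synthetic = []
    for _ in range(max(y.count(0) - len(minority), 0)):
        i = rng.randrange(len(minority))
        j = rng.choice(knn(minority, minority[i], k, exclude=i))
        gap = rng.random()
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return X + synthetic, y + [1] * len(synthetic)

def enn(X, y, k=3):
    # Keep a sample only if the majority of its k nearest neighbours share its label.
    keep = []
    for i in range(len(X)):
        votes = [y[j] for j in knn(X, X[i], k, exclude=i)]
        if (2 * sum(votes) > len(votes)) == (y[i] == 1):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]

def pearson(u, v):
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    su = math.sqrt(sum((a - mu) ** 2 for a in u))
    sv = math.sqrt(sum((b - mv) ** 2 for b in v))
    return cov / (su * sv) if su and sv else 0.0

def select_features(X, y, threshold=0.3):
    # Stand-in filter (assumption): keep columns with |Pearson r| >= threshold.
    return [f for f, col in enumerate(zip(*X)) if abs(pearson(col, y)) >= threshold]

def fit_adaboost(X, y, feats, rounds=5):
    # Discrete AdaBoost over threshold stumps; labels are mapped to -1/+1.
    t = [1 if label == 1 else -1 for label in y]
    w = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        best = None
        for f in feats:
            for thr in sorted({row[f] for row in X}):
                for sign in (1, -1):
                    pred = [sign if row[f] > thr else -sign for row in X]
                    err = sum(wi for wi, p, ti in zip(w, pred, t) if p != ti)
                    if best is None or err < best[0]:
                        best = (err, f, thr, sign, pred)
        err, f, thr, sign, pred = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # avoid division by zero / log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((alpha, f, thr, sign))
        w = [wi * math.exp(-alpha * ti * p) for wi, ti, p in zip(w, t, pred)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model

def predict(model, row):
    score = sum(a * (s if row[f] > thr else -s) for a, f, thr, s in model)
    return 1 if score > 0 else 0

# Toy data (hypothetical): feature 0 separates the classes, feature 1 does not.
X = [[i * 0.1, float(i % 3)] for i in range(12)]   # non-defective
X += [[5.3, 1.0]]                                  # noisy non-defective sample
X += [[5.0 + i * 0.2, 1.0] for i in range(4)]      # defective
y = [0] * 13 + [1] * 4

X_bal, y_bal = smote(X, y)                         # stage 1a: oversample
X_clean, y_clean = enn(X_bal, y_bal)               # stage 1b: clean
feats = select_features(X_clean, y_clean)          # stage 2: feature subset
model = fit_adaboost(X_clean, y_clean, feats)      # stage 3: classifier
print(len(X), len(X_bal), len(X_clean), feats)
```

On real defect data sets, library implementations such as `SMOTEENN` from imbalanced-learn and `AdaBoostClassifier` from scikit-learn would be the more practical choice; the hand-written versions above are only meant to make each step of the pipeline visible.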