Software defect prediction based on combined sampling and feature selection

Conference: ICMLCA 2021 - 2nd International Conference on Machine Learning and Computer Application
12/17/2021 - 12/19/2021 at Shenyang, China

Proceedings: ICMLCA 2021

Pages: 5

Language: English

Type: PDF


Authors:
Wang, Denglin; Xiong, Xiaohui (College of Information Engineering, Shanghai Maritime University, Shanghai, China)

Abstract:
To address class imbalance and the high dimensionality of the feature space in software defect prediction, this paper proposes a defect prediction method based on combined sampling and feature selection, with ensemble learning used to build the classification model. First, in the data preprocessing stage, the Synthetic Minority Oversampling Technique (SMOTE) generates synthetic defective samples, and the Edited Nearest Neighbor (ENN) rule is then applied for data cleaning, which reduces the number of non-defective samples and yields a balanced data set. Next, a hybrid feature selection algorithm is applied to the balanced data set to remove features that are only weakly correlated with the class label, producing an optimized feature subset. Finally, the AdaBoost ensemble learning algorithm is used as the classifier to construct the software defect prediction model. Experiments on several imbalanced software defect data sets show that the proposed method effectively improves prediction performance.
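The combined sampling stage described in the abstract can be sketched in plain Python as follows. This is a minimal illustration, not the authors' implementation. It assumes labels 1 (defective, minority) and 0 (non-defective, majority), Euclidean distance, k = 5 neighbours for SMOTE and k = 3 for ENN, and, following the abstract's wording, lets ENN remove only non-defective samples (a variant that cleans both classes is also common). All function names are hypothetical; in practice a library class such as `SMOTEENN` from imbalanced-learn would typically be used instead.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def nearest_indices(X, i, k):
    """Indices of the k samples in X closest to X[i], excluding i itself."""
    ranked = sorted((euclidean(X[i], X[j]), j) for j in range(len(X)) if j != i)
    return [j for _, j in ranked[:k]]


def smote(minority, n_new, k, rng):
    """Create n_new synthetic points on segments between minority neighbours.
    Needs at least two minority samples whenever n_new > 0."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        neighbour = minority[rng.choice(nearest_indices(minority, i, k))]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], neighbour)])
    return synthetic


def enn_clean(X, y, k, target_label):
    """Edited Nearest Neighbour: drop target_label samples outvoted by their k neighbours."""
    kept_X, kept_y = [], []
    for i, label in enumerate(y):
        if label == target_label:
            votes = [y[j] for j in nearest_indices(X, i, k)]
            if 2 * votes.count(label) < len(votes):
                continue  # most neighbours disagree: treat as noise or class overlap
        kept_X.append(X[i])
        kept_y.append(label)
    return kept_X, kept_y


def smote_enn(X, y, k_smote=5, k_enn=3, seed=0):
    """Oversample defective (1) samples with SMOTE, then clean non-defective (0) ones with ENN."""
    rng = random.Random(seed)
    minority = [x for x, label in zip(X, y) if label == 1]
    n_majority = sum(1 for label in y if label == 0)
    n_new = max(0, n_majority - len(minority))  # oversample up to the majority count
    synthetic = smote(minority, n_new, k_smote, rng)
    X_all = list(X) + synthetic
    y_all = list(y) + [1] * len(synthetic)
    return enn_clean(X_all, y_all, k_enn, target_label=0)
```

On a toy set with eight non-defective samples (one of them, `[5.5, 5.5]`, sitting inside the defective cluster) and three defective ones, SMOTE adds five synthetic defective samples and ENN then drops the overlapping non-defective point, leaving 7 non-defective and 8 defective samples.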
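The abstract does not specify which hybrid feature selection algorithm is used, so the following is only a stand-in under stated assumptions: a two-step filter that first keeps features whose absolute Pearson correlation with the class label reaches `min_relevance`, then discards any candidate that is highly correlated (at least `max_redundancy`) with a feature already selected. The thresholds 0.3 and 0.9 and the function names are hypothetical choices for illustration.

```python
import math


def pearson(a, b):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0  # a constant column carries no information
    return cov / math.sqrt(va * vb)


def select_features(X, y, min_relevance=0.3, max_redundancy=0.9):
    """Return sorted column indices of X kept by a relevance-plus-redundancy filter."""
    columns = [list(col) for col in zip(*X)]
    relevance = [abs(pearson(col, y)) for col in columns]
    # Visit features from most to least relevant to the class label.
    order = sorted(range(len(columns)), key=lambda j: relevance[j], reverse=True)
    selected = []
    for j in order:
        if relevance[j] < min_relevance:
            break  # every remaining feature is weakly correlated with the label
        if all(abs(pearson(columns[j], columns[s])) < max_redundancy for s in selected):
            selected.append(j)
    return sorted(selected)
```

With one informative feature, an exact duplicate of it scaled by two, and a feature uncorrelated with the label, only one of the first two survives: the duplicate is rejected as redundant and the third is rejected as irrelevant. A wrapper step (for example, evaluating candidate subsets with the downstream classifier) could be added after this filter to make the procedure hybrid in the filter-plus-wrapper sense.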
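The abstract names AdaBoost as the classifier but not its base learner or number of rounds. The sketch below assumes discrete AdaBoost with one-feature decision stumps and labels encoded as +1 (defective) and -1 (non-defective); the helper names and `n_rounds=20` are hypothetical. Each round fits the stump with the lowest weighted error, gives it a vote weight of 0.5 ln((1 - err) / err), and then increases the weights of the samples it misclassified so the next stump focuses on them. scikit-learn's `AdaBoostClassifier` offers a production version of this idea.

```python
import math


def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, polarity) of the best stump."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted(set(row[j] for row in X)):
            for polarity in (1, -1):
                err = sum(wi for row, label, wi in zip(X, y, w)
                          if (polarity if row[j] >= thr else -polarity) != label)
                if best is None or err < best[0]:
                    best = (err, j, thr, polarity)
    return best


def stump_predict(stump, row):
    """Predict +1 or -1 for one sample with a single threshold rule."""
    _, j, thr, polarity = stump
    return polarity if row[j] >= thr else -polarity


def adaboost_fit(X, y, n_rounds=20):
    """Discrete AdaBoost over decision stumps; y must contain only +1 and -1."""
    n = len(X)
    w = [1.0 / n] * n  # start with uniform sample weights
    ensemble = []
    for _ in range(n_rounds):
        stump = train_stump(X, y, w)
        if stump[0] >= 0.5:
            break  # weak learner is no better than chance
        err = max(stump[0], 1e-10)  # avoid division by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, stump))
        if stump[0] == 0.0:
            break  # training data already separated perfectly
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * label * stump_predict(stump, row))
             for wi, row, label in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble


def adaboost_predict(ensemble, row):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(stump, row) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1
```

Usage: after resampling and feature selection, map the labels 1/0 to +1/-1, call `adaboost_fit(X, y)`, and classify new modules with `adaboost_predict(model, row)`.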