Analysis of Ridesharing Vehicle Crash Severity Incorporating Data Imbalance Treatment

Conference: ISCTT 2021 - 6th International Conference on Information Science, Computer Technology and Transportation
11/26/2021 - 11/28/2021 at Xishuangbanna, China

Proceedings: ISCTT 2021

Pages: 5
Language: English
Type: PDF


Authors:
Zhao, Jianwei (School of Traffic and Transportation, Beijing Jiaotong University, Beijing, China)

Abstract:
Existing studies have proposed various methodologies for analyzing traffic crash injury severity. However, few of them have considered the class imbalance inherent in traffic crash data. To address this issue, this research introduces three data rebalancing techniques to treat the imbalance in traffic crash data: random over-sampling, random under-sampling, and the Synthetic Minority Over-sampling Technique combined with Edited Nearest Neighbors (SMOTE+ENN). Each of the three techniques is then combined with two classifiers, random forest and logistic regression. Together with two control groups (random forest and logistic regression without rebalancing), a total of eight models are developed. Because traffic crash data are imbalanced, this research uses the geometric mean (G-mean) to evaluate the models' classification performance. The models are trained on traffic crash data collected from the Chicago Data Portal. The results show that all three rebalancing techniques substantially improve the models' ability to handle imbalanced traffic crash data, and that random under-sampling handles the imbalance problem best. Among the eight models, random forest with random under-sampling achieves the highest geometric mean.
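The abstract does not include code, so the following is only a minimal, dependency-free Python sketch of two of the rebalancing techniques it names (random under-sampling and random over-sampling) and of the G-mean metric. It assumes G-mean is computed as the geometric mean of per-class recall, a common definition that reduces to sqrt(sensitivity × specificity) for two classes; the paper may define it differently. The function names and toy labels are hypothetical illustrations, not the author's implementation, and SMOTE+ENN is not reproduced here.

```python
import math
import random
from collections import Counter, defaultdict


def _group_by_class(X, y):
    """Collect feature rows per class label (hypothetical helper)."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    return by_class


def random_under_sample(X, y, seed=0):
    """Randomly drop rows (without replacement) so that every class keeps
    only as many rows as the smallest (minority) class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_min = min(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        for row in rng.sample(rows, n_min):
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def random_over_sample(X, y, seed=0):
    """Randomly duplicate rows (with replacement) so that every class grows
    to the size of the largest (majority) class."""
    rng = random.Random(seed)
    by_class = _group_by_class(X, y)
    n_max = max(len(rows) for rows in by_class.values())
    X_out, y_out = [], []
    for label, rows in by_class.items():
        extra = [rng.choice(rows) for _ in range(n_max - len(rows))]
        for row in rows + extra:
            X_out.append(row)
            y_out.append(label)
    return X_out, y_out


def geometric_mean(y_true, y_pred):
    """G-mean as the geometric mean of per-class recall (assumed definition).
    A class the model never predicts correctly drives the score to 0."""
    support = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    recalls = [correct[c] / support[c] for c in support]
    return math.prod(recalls) ** (1 / len(recalls))


if __name__ == "__main__":
    # Hypothetical toy labels: 8 non-injury crashes vs 2 injury crashes.
    X = [[i] for i in range(10)]
    y = ["no_injury"] * 8 + ["injury"] * 2
    _, y_under = random_under_sample(X, y)
    print(sorted(Counter(y_under).items()))  # [('injury', 2), ('no_injury', 2)]
    # A trivial model that always predicts the majority class.
    always_majority = ["no_injury"] * len(y)
    accuracy = sum(t == p for t, p in zip(y, always_majority)) / len(y)
    print(accuracy, geometric_mean(y, always_majority))  # 0.8 0.0
```

The toy run illustrates why an imbalance-aware metric matters: the majority-only predictor scores 80% accuracy yet a G-mean of 0, because it never identifies an injury crash. In this sketch, resampling is intended for the training split only, with the test split left at its natural class ratio so the G-mean reflects real-world imbalance; that split policy is an assumption, not something the abstract states. For practical work, the imbalanced-learn library provides `RandomUnderSampler`, `RandomOverSampler`, `SMOTEENN`, and `geometric_mean_score`, which can be paired with scikit-learn's `RandomForestClassifier` and `LogisticRegression` to mirror the eight-model design described above.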