New Construction of Ensemble Classifiers for Imbalanced Datasets
Yun Zhai, Da Ruan, Nan Ma and Bing An
To improve the predictive power of classifiers on imbalanced datasets, this paper presents an ensemble-based learning algorithm in the form of a new ensemble classifier model, the SVM-C5.0 ensemble classifier model (SCECM). The SCECM adopts a differentiated sampling rate algorithm based on an improved AdaBoost algorithm and further employs a unique classifier-selection strategy, a novel classifier integration approach, and an original classification decision-making method. Comparative experimental results show that the proposed approach improves performance on the minority class while preserving the ability to recognize examples from the majority classes.
Keywords: Data mining, classification, imbalanced datasets, heterogeneous classifier, differentiated sampling rate, ensemble model of classifiers.