A Comparative Review of SMOTE and ADASYN in Imbalanced Data Classification
2021 (English)
Independent thesis, Basic level (degree of Bachelor), 10 credits / 15 HE credits
Student thesis
Abstract [en]
This thesis compares the performance of two over-sampling techniques, SMOTE and ADASYN. The comparison is carried out on three imbalanced data sets using three different classification models and evaluation metrics, while varying how the data is pre-processed. The results show that both SMOTE and ADASYN improve classifier performance in most cases. SVM performs better with SMOTE than with ADASYN as the degree of class imbalance increases. Furthermore, both SMOTE and ADASYN increase the relative performance of the random forest as the degree of class imbalance grows. However, neither pre-processing method consistently outperforms the other in its contribution to performance as the degree of class imbalance varies.
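The evaluation metrics named in the keywords (sensitivity, F-measure, Matthews correlation coefficient) can all be computed from the counts of a binary confusion matrix. The following is a minimal sketch, not the thesis's own code: the function name `imbalance_metrics` and the example counts are hypothetical, and the F-measure is assumed to be the balanced F1 variant.

```python
import math


def imbalance_metrics(tp, fp, tn, fn):
    """Return (sensitivity, F1, MCC) from confusion-matrix counts.

    tp/fn count minority (positive) cases, tn/fp count majority (negative)
    cases. Undefined ratios (zero denominators) are returned as 0.0, which
    is a convention assumed here for illustration.
    """
    # Sensitivity (recall): share of actual positives that were detected.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # F1: harmonic mean of precision and sensitivity (beta = 1 assumed).
    pr_sum = precision + sensitivity
    f1 = 2 * precision * sensitivity / pr_sum if pr_sum else 0.0
    # MCC: correlation between predicted and actual labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return sensitivity, f1, mcc


# Hypothetical imbalanced example: 10 positives among 1000 observations.
sens, f1, mcc = imbalance_metrics(tp=8, fp=10, tn=980, fn=2)
print(f"sensitivity={sens:.3f}  F1={f1:.3f}  MCC={mcc:.3f}")
```

Accuracy in this hypothetical example would be 98.8%, yet F1 and MCC are far lower, which illustrates why metrics of this kind are preferred when classes are imbalanced.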
Place, publisher, year, edition, pages
2021. p. 42
Keywords [en]
Machine learning, supervised learning, classification, class imbalance, over-sampling, SMOTE, ADASYN, Sensitivity, F-measure, Matthews correlation coefficient
National Category
Probability Theory and Statistics
Identifiers
URN: urn:nbn:se:uu:diva-432162
OAI: oai:DiVA.org:uu-432162
DiVA, id: diva2:1519153
Subject / course
Statistics
Educational program
Bachelor Programme in Business and Economics
2021-01-26, 2021-01-18, 2021-01-26. Bibliographically approved