Machine Learning Shrewd Approach For An Imbalanced Dataset Conversion Samples

Authors

  • S. Ashraf, College of Internet of Things Engineering
  • T. Ahmed

Keywords:

Classification, Machine learning, SMOTE, Spread Subsampling, Class imbalance

Abstract


A dataset is imbalanced when at least one of its classes is substantially outnumbered by the others. A machine learning algorithm (classifier) trained on an imbalanced dataset predicts the frequently occurring majority class more often than the rarely occurring minority classes. Training on an imbalanced dataset therefore poses challenges for classifiers; however, applying suitable techniques to reduce class imbalance can enhance a classifier's performance. We take an imbalanced dataset from an educational context. We first examine the shortcomings of classifying an imbalanced dataset. We then apply data-level algorithms for class balancing and compare the performance of the classifiers. Classifier performance is measured using the information in the confusion matrices: accuracy, precision, recall, and F-measure. The results show that classification on an imbalanced dataset may produce high accuracy but low precision and recall for the minority class. The analysis confirms that both undersampling and oversampling are effective for balancing datasets; however, oversampling performs better.
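The gap between accuracy and minority-class recall described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the "pass"/"fail" labels, the 95/5 split, and the helper names `minority_metrics` and `random_resample` are hypothetical, and plain random resampling stands in for the SMOTE and Spread Subsampling methods named in the keywords.

```python
import random
from collections import Counter


def minority_metrics(y_true, y_pred, positive):
    """Accuracy, plus precision, recall and F-measure for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f_measure": f_measure}


def random_resample(labels, mode, seed=0):
    """Return row indices that balance the classes.

    mode="over" duplicates random rows of smaller classes up to the largest
    class size; mode="under" keeps a random subset of larger classes down to
    the smallest class size. (Illustrative stand-in, not SMOTE itself.)
    """
    if mode not in ("over", "under"):
        raise ValueError("mode must be 'over' or 'under'")
    rng = random.Random(seed)
    by_class = {}
    for i, label in enumerate(labels):
        by_class.setdefault(label, []).append(i)
    sizes = [len(idx) for idx in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    chosen = []
    for idx in by_class.values():
        if len(idx) >= target:
            chosen.extend(rng.sample(idx, target))  # subset, no replacement
        else:
            chosen.extend(idx + rng.choices(idx, k=target - len(idx)))  # duplicates
    return chosen


# Hypothetical toy labels: 95 majority ("pass") and 5 minority ("fail") rows.
y_true = ["pass"] * 95 + ["fail"] * 5
# A classifier that always predicts the majority class.
y_pred = ["pass"] * 100

baseline = minority_metrics(y_true, y_pred, positive="fail")
over_idx = random_resample(y_true, "over")
under_idx = random_resample(y_true, "under")

print(baseline)
print(Counter(y_true[i] for i in over_idx))
print(Counter(y_true[i] for i in under_idx))
```

In this toy setting the always-majority classifier reaches 95% accuracy while its recall on the minority class is zero, which is why the abstract reports precision, recall, and F-measure alongside accuracy. The resampling helper only shows how the class counts change; it does not reproduce the comparison reported in the paper.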

Author Biography

S. Ashraf, College of Internet of Things Engineering

Assistant Professor
College of Internet of Things Engineering, Hohai University, China

Published

2020-06-19

How to Cite

Ashraf, S., & Ahmed, T. (2020). Machine Learning Shrewd Approach For An Imbalanced Dataset Conversion Samples. Journal of Engineering and Technology (JET), 11(1), 1–22. Retrieved from https://jet.utem.edu.my/jet/article/view/5896