CURATING DATASETS TO ENHANCE SPYWARE CLASSIFICATION

JJCIT. 2025; 11(1): 1-15

doi: 10.5455/jjcit.71-1719026602

Mousumi Ahmed Mimi, Hu Ng, Timothy Tzen Vun Yap.

Abstract
Current methods for spyware classification are limited by the absence of well-structured datasets, especially datasets whose features capture directionality properties. This work explores the efficacy of directionality properties for classification, using features engineered from those in existing datasets. Two datasets are curated: Dataset A, which includes features extracted from single-directional packet flows only, and Dataset B, which includes features extracted from bi-directional packet flows. Classification with these features is performed using selected classifiers. SVM obtained the highest accuracy on Dataset A at 99.88%, while RF, DT, and XGBoost shared the highest accuracy on Dataset B at 99.24%. Compared with results from existing research work, the directional properties in these engineered features improve the accuracy of spyware classification.

Key words: Feature engineering, datasets curation, spyware classification, packet analysis
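
The distinction between single-directional and bi-directional packet flows can be illustrated with a minimal sketch. It is not the authors' curation pipeline: the packet tuples, the function names (`unidirectional_key`, `bidirectional_key`, `aggregate`), and the two aggregated statistics are all hypothetical, chosen only to show how the choice of flow key changes which packets are grouped together before features are computed.

```python
from collections import defaultdict

# Hypothetical packet records (assumed layout, not from the paper):
# (src_ip, src_port, dst_ip, dst_port, protocol, size_bytes)
packets = [
    ("10.0.0.5", 50000, "93.184.216.34", 443, "TCP", 120),
    ("93.184.216.34", 443, "10.0.0.5", 50000, "TCP", 1400),
    ("10.0.0.5", 50000, "93.184.216.34", 443, "TCP", 80),
]


def unidirectional_key(pkt):
    """Key that keeps direction: A->B and B->A are separate flows."""
    src_ip, src_port, dst_ip, dst_port, proto, _ = pkt
    return (src_ip, src_port, dst_ip, dst_port, proto)


def bidirectional_key(pkt):
    """Key that ignores direction: A->B and B->A share one flow."""
    src_ip, src_port, dst_ip, dst_port, proto, _ = pkt
    # Sorting the two endpoints makes the key order-independent.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return (a, b, proto)


def aggregate(pkts, key_fn):
    """Group packets by flow key and count packets and bytes per flow."""
    flows = defaultdict(lambda: {"packets": 0, "bytes": 0})
    for pkt in pkts:
        stats = flows[key_fn(pkt)]
        stats["packets"] += 1
        stats["bytes"] += pkt[5]
    return dict(flows)


uni_flows = aggregate(packets, unidirectional_key)
bi_flows = aggregate(packets, bidirectional_key)
```

With the three sample packets, the direction-preserving key yields two flows (one per direction), while the direction-agnostic key merges them into a single flow. Per-flow statistics such as these could then serve as classifier inputs.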