Android is a leading global mobile operating system, and its open-source nature has driven widespread use in critical daily activities such as banking, communication, entertainment, education, and healthcare. This makes Android a primary and attractive target for cyber threats. This paper introduces TAB-DROID, a novel malware detection framework that combines feature selection, feature compression, and classification techniques applied to real-world datasets. First, the Conditional Mutual Information Maximization (CMIM) and Joint Mutual Information (JMI) algorithms are applied in parallel for feature selection, with each algorithm independently selecting relevant features from the datasets. Next, product quantization (PQ) is applied separately to the outputs of CMIM and JMI to compress the selected features, reducing storage requirements and accelerating subsequent processing without compromising critical information. Finally, the Tabular Prior-data Fitted Network (TabPFN) classifier is integrated into both pipelines to perform the classification task. Under 5-fold cross-validation, the optimized CMIM-based pipeline achieves better detection performance than the JMI-based pipeline, with accuracy, AUC, precision, recall, and F1-score reaching 99.2%, 99.9%, 99.6%, 98.7%, and 99.2%, respectively. In addition, integrating PQ with CMIM reduces testing time by 44.4% and memory usage by 42.8%, highlighting the framework’s efficiency alongside its high detection accuracy. Compared with competing techniques, TAB-DROID improves accuracy by up to 1.52% and precision by up to 2.69%, while also reducing the feature space by 73%.
Key words: Android Malware, Malware Detection, Machine Learning, Conditional Mutual Information Maximization, Product Quantization, TabPFN Classifier.
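
To make the CMIM selection step concrete, the following is a minimal pure-Python sketch of the greedy CMIM criterion on discrete features. It is an illustration under stated assumptions, not the implementation used in TAB-DROID: the function names (`entropy`, `mutual_info`, `cond_mutual_info`, `cmim_select`) and the toy data are hypothetical, features are assumed to be discrete (for example, binary permission flags), and entropies are estimated from raw counts without smoothing.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy (in bits) of one or more discrete columns."""
    n = len(columns[0])
    counts = Counter(zip(*columns))
    return -sum((c / n) * log2(c / n) for c in counts.values())


def mutual_info(x, y):
    """I(X; Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x) + entropy(y) - entropy(x, y)


def cond_mutual_info(x, y, z):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z)."""
    return entropy(x, z) + entropy(y, z) - entropy(x, y, z) - entropy(z)


def cmim_select(features, labels, k):
    """Greedy CMIM: pick k feature indices.

    Each candidate's score is the minimum, over already selected features s,
    of I(feature; label | s); it starts at the plain I(feature; label).
    """
    scores = [mutual_info(col, labels) for col in features]
    remaining = set(range(len(features)))
    selected = []
    while remaining and len(selected) < k:
        # Ties are broken in favour of the lowest feature index.
        best = max(sorted(remaining), key=lambda i: scores[i])
        selected.append(best)
        remaining.remove(best)
        # A feature that is redundant given `best` has its score pushed toward 0.
        for i in remaining:
            cmi = cond_mutual_info(features[i], labels, features[best])
            scores[i] = min(scores[i], cmi)
    return selected


# Hypothetical toy data: 8 apps, label 1 = malware.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 1]  # strong feature with one mislabelled sample
f1 = list(f0)                  # exact duplicate of f0 (fully redundant)
f2 = [0, 0, 0, 1, 0, 0, 0, 0]  # weak alone, but flags the sample f0 gets wrong

print(cmim_select([f0, f1, f2], labels, k=2))
```

On this toy data the sketch selects `f0` first and then `f2`, skipping the duplicate `f1`: once `f0` is chosen, `f1` carries no additional information about the label, whereas `f2` resolves the one sample that `f0` misclassifies. This is the property that distinguishes CMIM from ranking features by plain mutual information.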