Enhancing Malware Classification: A Comparison of Filter and Embedded Feature Selection with a Novel Dataset

Durre Zehra Syeda, Mamoona Naveed Asghar

Research output: Chapter in Book or Conference Publication/Proceeding › Conference Publication › peer-review

Abstract

In recent years, machine learning has advanced remarkably, notably improving the efficiency of malware detection systems. However, the rapid growth in data dimensionality driven by advanced technologies calls for effective feature selection techniques. Feature selection is crucial for refining classifiers: it identifies the most informative features and reduces computational complexity. Because the many available feature selection algorithms each apply their own criteria, choosing the right technique for a specific dataset in a given domain is challenging. To tackle this challenge, a combination of Filter and Embedded feature selection methods has been employed. These methods integrate outcomes from multiple feature selection approaches, effectively mitigating the limitations of individual methods. This paper presents a comprehensive comparison between Filter-based techniques, such as chi-squared, Information Gain, and ANOVA-F, and Embedded techniques, such as Lasso, Random Forest, XGBoost, and Extra-Tree Classifier. Additionally, it explores API categorization using novel datasets. Experimental findings consistently identify Random Forest as the preferred choice, delivering high classification accuracy (98%), F-measure (97%), recall (95%), precision (100%), and AUC (98%), along with efficient feature reduction on malware classification datasets. Notably, all feature models place significant emphasis on APIs related to Kernel and System Management, Registry Operations, File System, and System Information.
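The Filter methods named in the abstract score each feature on its own, independently of any classifier. As a minimal sketch, assuming a binary matrix in which each column records whether a sample called a given Windows API, the Python below computes a 2x2 contingency chi-squared statistic and the information gain for each feature, then keeps the top k. The API names, toy labels, and helper functions (`chi2_score`, `information_gain`, `rank_features`) are hypothetical illustrations, not the authors' pipeline or dataset; library routines such as scikit-learn's `chi2` compute a different variant of the chi-squared score.

```python
import math
from collections import Counter

# Hypothetical toy data (NOT from the paper): 1 = the sample called the API, 0 = it did not.
# LABELS: 1 = malware, 0 = benign.
LABELS = [1, 1, 1, 1, 0, 0, 0, 0]
API_COLUMNS = {
    "RegSetValueExW": [1, 1, 1, 1, 0, 0, 0, 0],  # perfectly tracks the label
    "CreateFileW":    [1, 1, 1, 0, 1, 0, 0, 0],  # partially informative
    "MessageBoxW":    [1, 0, 1, 0, 1, 0, 1, 0],  # independent of the label
}


def chi2_score(feature, labels):
    """Chi-squared statistic of the 2x2 contingency table (feature value x class)."""
    n = len(labels)
    observed = Counter(zip(feature, labels))
    score = 0.0
    for f in (0, 1):
        row_total = observed[(f, 0)] + observed[(f, 1)]
        for y in (0, 1):
            col_total = observed[(0, y)] + observed[(1, y)]
            expected = row_total * col_total / n
            if expected > 0:  # skip empty rows/columns to avoid division by zero
                score += (observed[(f, y)] - expected) ** 2 / expected
    return score


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature, labels):
    """Information gain H(Y) - H(Y | X) for a discrete feature X."""
    n = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def rank_features(columns, labels, scorer, k):
    """Filter selection: score every feature independently, keep the k highest."""
    scores = {name: scorer(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


if __name__ == "__main__":
    for name, col in API_COLUMNS.items():
        print(f"{name:15s} chi2={chi2_score(col, LABELS):.2f} "
              f"IG={information_gain(col, LABELS):.3f}")
    print(rank_features(API_COLUMNS, LABELS, chi2_score, k=2))
```

Embedded methods such as Random Forest instead derive the ranking from a fitted model, for example the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`, so their scores depend on the classifier rather than on per-feature statistics alone.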

Original language: English
Title of host publication: 2023 Cyber Research Conference - Ireland, Cyber-RCI 2023
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9798350309522
Publication status: Published - 2023
Event: 2023 Cyber Research Conference - Ireland, Cyber-RCI 2023 - Letterkenny, Ireland
Duration: 24 Nov 2023 → …

Publication series

Name: 2023 Cyber Research Conference - Ireland, Cyber-RCI 2023

Conference

Conference: 2023 Cyber Research Conference - Ireland, Cyber-RCI 2023
Country/Territory: Ireland
City: Letterkenny
Period: 24/11/23 → …

Keywords

  • API categorisation
  • dataset generation
  • feature scoring
  • feature selection
  • machine learning models
  • malware classification
