ELC-PPW: Ensemble Learning and Classification (LC) by Positional Patterns Weights (PPW) of API Calls as Dynamic n-Grams for Malware Perception

G Bala Krishna (CVR College of Engineering, India); Radha Vedala (IDRBT, India); K Venu Gopala Rao (GNITS, India)

Malware threats continue to grow in volume and sophistication, and current anti-virus software is ineffective against new-generation malware. Ongoing developments in machine learning offer a promising alternative for countering virus attacks, including the detection of zero-day attacks. Some of the contemporary literature has explored applying machine learning algorithms to virus detection. The majority of these algorithms use n-gram characteristics of the executable (.exe) file code, where n is fixed to a constant value. Moreover, the calls considered in an n-gram form a sequential pattern. We argue that dynamic n-gram features, patterned by their positional frequency, can increase detection accuracy and reduce the misclassification rate. The volume and coherence of the training data are also critical factors influencing detection accuracy. If the volume of training data is high and the given records are not coherent, then the heuristics developed to assess the fitness of the test data are not optimal. In this regard, this manuscript proposes an ensemble classification approach whose learning and detection are also based on n-gram characteristics. However, the value of n is dynamic, and the n-grams are patterned by position instead of by sequence. In particular, the contribution of the manuscript is an Ensemble Learning and Classification strategy that uses cuckoo search as a binary classifier. The proposed ensemble model uses positional patterns of API calls with dynamic size as n-grams. Results from the execution of the program indicate a strong discrepancy between malicious software and benign software. The changes identified in classifier performance are evaluated in accordance with variations in malware prevalence.
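
The abstract does not specify how positional patterns or their weights are computed. The following is a minimal Python sketch of one possible reading, under assumptions that are not from the paper: an n-gram is keyed by its start position in an API call trace, n ranges over a hypothetical window `n_min..n_max`, and the weight of a positional n-gram is the fraction of traces in which it occurs at that position. The function names, parameters, and example API call names are illustrative only, and the sketch does not cover the ensemble or the cuckoo-search classifier.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumed representation: (start position in the trace, tuple of API call names).
PositionalGram = Tuple[int, Tuple[str, ...]]


def positional_ngrams(trace: List[str], n_min: int = 1, n_max: int = 3) -> List[PositionalGram]:
    """Return every n-gram of the trace for n in [n_min, n_max], tagged with its start position."""
    grams: List[PositionalGram] = []
    for n in range(n_min, n_max + 1):
        for start in range(len(trace) - n + 1):
            grams.append((start, tuple(trace[start:start + n])))
    return grams


def positional_weights(traces: List[List[str]], n_min: int = 1, n_max: int = 3) -> Dict[PositionalGram, float]:
    """Weight each positional n-gram by the fraction of traces that contain it (an assumed weighting)."""
    if not traces:
        return {}
    counts: Counter = Counter()
    for trace in traces:
        # A set ensures each trace contributes at most once per positional n-gram.
        counts.update(set(positional_ngrams(trace, n_min, n_max)))
    return {gram: count / len(traces) for gram, count in counts.items()}


# Illustrative traces; the call names are examples, not data from the paper.
example_traces = [
    ["CreateFileW", "WriteFile", "CloseHandle"],
    ["CreateFileW", "ReadFile", "CloseHandle"],
]
example_weights = positional_weights(example_traces, n_min=1, n_max=2)
```

In this sketch, a call that appears at the same position in every trace receives weight 1.0, while a pattern seen in only half of the traces receives 0.5. Weights computed separately for malicious and benign traces could then serve as features for a classifier.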

Journal: International Journal of Simulation: Systems, Science & Technology, Vol. 18

Published: Mar 31, 2017

DOI: 10.5013/IJSSST.a.18.01.13