Android Malware Detection using Machine Learning Algorithms on the Drebin Dataset. Full report here.
This homework applies a Support Vector Machine, a Random Forest classifier, and a Naïve Bayes classifier to Android malware detection. It uses the Drebin dataset, which contains details of the manifest files of 129,013 Android applications, of which 5,560 are labeled as malware. All three algorithms predicted well overall, with the Random Forest classifier giving the best results.
For each application, the number of feature properties occurring in each feature set was counted, giving a feature vector of size eight in which every entry is a count. Each file therefore has an input vector that looks like [2, 11, 5, 3, 7, 6, 11, 26], and the output is TRUE (malware) or FALSE (not malware).
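As a rough illustration of the counting step, the sketch below turns the lines of one application's feature file into an eight-entry count vector. It assumes each line has the form `prefix::value` and that the prefixes map to the eight feature sets as shown in `PREFIX_TO_SET`; both the mapping and the function name `count_features` are assumptions for illustration, not taken from scripts/data_extraction.py.

```python
# Assumed mapping from line prefix to feature-set index (0..7).
# This grouping is an assumption; check it against the dataset documentation.
PREFIX_TO_SET = {
    "feature": 0,           # hardware components
    "permission": 1,        # requested permissions
    "activity": 2,          # app components
    "service_receiver": 2,  # app components
    "provider": 2,          # app components
    "intent": 3,            # filtered intents
    "api_call": 4,          # restricted API calls
    "real_permission": 5,   # used permissions
    "call": 6,              # suspicious API calls
    "url": 7,               # network addresses
}


def count_features(lines):
    """Return a list of eight counts, one per feature set."""
    counts = [0] * 8
    for line in lines:
        prefix, sep, _value = line.strip().partition("::")
        # Skip blank lines and lines with an unknown or missing prefix.
        if sep and prefix in PREFIX_TO_SET:
            counts[PREFIX_TO_SET[prefix]] += 1
    return counts


# Hypothetical feature-file content for a single application.
sample = [
    "permission::android.permission.INTERNET",
    "permission::android.permission.SEND_SMS",
    "api_call::android/telephony/SmsManager;->sendTextMessage",
    "url::example.com",
    "",
]
vector = count_features(sample)
print(vector)
```

The resulting list can be paired with the TRUE/FALSE label to form one training row.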
- Download the files
- If the folders 'processed_data' and 'raw_data' do not exist in the project root directory, create them
- Download the dataset here and unpack it inside the raw_data folder (if you unpack it somewhere else, modify the path accordingly)
- Execute scripts/data_extraction.py first
- Then execute any of the ML model files (nb.py, rfc.py, and svm.py), in any order
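The folder-creation step above can be sketched as a small helper. The function name `ensure_data_dirs` is hypothetical and not part of the repository; it only creates 'processed_data' and 'raw_data' under the given root if they are missing.

```python
from pathlib import Path


def ensure_data_dirs(root):
    """Create 'processed_data' and 'raw_data' under root if they are missing."""
    root = Path(root)
    folders = []
    for name in ("processed_data", "raw_data"):
        folder = root / name
        # exist_ok=True makes the call safe to repeat on an existing folder.
        folder.mkdir(parents=True, exist_ok=True)
        folders.append(folder)
    return folders
```

Calling `ensure_data_dirs(".")` from the project root prepares both folders before the dataset is unpacked and scripts/data_extraction.py is run.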