Detection of Spyware by Mining Executable Files
Blekinge Institute of Technology, School of Computing.
2009 (English). Independent thesis, Advanced level (degree of Master (Two Years)). Student thesis.
Abstract [en]

Malicious programs have long been a serious threat to the confidentiality, integrity, and availability of systems. Much research has gone into detecting them, and two main approaches have emerged: signature-based detection and heuristic-based detection. These approaches perform well against known malicious programs but cannot catch new ones. Researchers have therefore looked for new ways of detecting malicious programs; the application of data mining and machine learning is one of them and has shown good results compared to other approaches. A new category of malicious programs, called spyware, has gained momentum. Spyware is particularly dangerous to the confidentiality of a user's private data, because it may collect that data and send it to a third party. Traditional techniques have not performed well in detecting spyware, so new detection methods are needed. Data mining and machine learning have shown promising results in detecting other malicious programs but have not yet been used for spyware detection. We therefore decided to employ data mining for the detection of spyware. We used a data set of 137 files, comprising 119 benign files and 18 spyware files. A theoretical taxonomy of spyware was created, but only two classes, Benign and Spyware, were used in the experiment. We developed an application, Binary Feature Extractor, which extracts features, called n-grams, of different sizes using common feature-based and frequency-based approaches. The number of features was reduced, and the reduced set was used to create an ARFF file, which served as input to WEKA for applying machine learning algorithms. The algorithms used in the experiment were J48, Random Forest, JRip, SMO, and Naive Bayes. Classifier performance was evaluated using 10-fold cross-validation and the area under the ROC curve. We performed experiments with three n-gram sizes: 4, 5, and 6.
The results show that the common feature-based extraction approach produced better results than the others. We achieved an overall accuracy of 90.5% with an n-gram size of 6 using the J48 classifier. The maximum area under the ROC curve was 83.3%, achieved with Random Forest.
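
The feature pipeline described in the abstract (byte n-grams, feature reduction, ARFF output) can be illustrated with a minimal Python sketch. This is not the thesis's Binary Feature Extractor: the function names, the "appears in at least `min_files` files" selection rule, and the binary presence encoding are all assumptions made for illustration, since the abstract does not specify how the common feature-based or frequency-based approaches were defined.

```python
from collections import Counter
from typing import List


def byte_ngrams(data: bytes, n: int) -> Counter:
    """Count every overlapping n-byte window in data, keyed by its hex string."""
    return Counter(data[i:i + n].hex() for i in range(len(data) - n + 1))


def common_features(files: List[bytes], n: int, min_files: int) -> List[str]:
    """Keep n-grams seen in at least min_files files (assumed selection rule)."""
    doc_freq: Counter = Counter()
    for data in files:
        # set(...) so each file contributes at most once per n-gram
        doc_freq.update(set(byte_ngrams(data, n)))
    return sorted(g for g, c in doc_freq.items() if c >= min_files)


def to_arff(files: List[bytes], labels: List[str],
            features: List[str], n: int) -> str:
    """Render one row per file with 0/1 presence flags plus the class label."""
    lines = ["@RELATION ngrams", ""]
    for g in features:
        lines.append(f"@ATTRIBUTE g_{g} {{0,1}}")
    lines.append("@ATTRIBUTE class {Benign,Spyware}")
    lines += ["", "@DATA"]
    for data, label in zip(files, labels):
        present = byte_ngrams(data, n)
        row = ["1" if g in present else "0" for g in features]
        lines.append(",".join(row + [label]))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy byte strings standing in for executables (hypothetical data).
    samples = [bytes([1, 2, 3, 4, 5]), bytes([1, 2, 3, 4, 9]), bytes([7] * 5)]
    classes = ["Benign", "Benign", "Spyware"]
    selected = common_features(samples, n=4, min_files=2)
    print(to_arff(samples, classes, selected, n=4))
```

The resulting text can be saved with an `.arff` extension and loaded into WEKA, where classifiers such as J48 can be evaluated with 10-fold cross-validation. A frequency-based variant would rank n-grams by the raw counts that `byte_ngrams` already returns, rather than by the number of files containing them.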

Place, publisher, year, edition, pages
2009. 52 p.
Keyword [en]
Spyware Detection, Data Mining, Machine Learning, Feature Extraction, WEKA, ARFF
National Category
Computer Science; Probability Theory and Statistics
URN: urn:nbn:se:bth-3095
Local ID: diva2:830393
Physics, Chemistry, Mathematics
Available from: 2015-04-22. Created: 2009-06-17. Last updated: 2015-06-30. Bibliographically approved.
