Change search
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf
An Evaluation of Machine Learning Approaches for Hierarchical Malware Classification
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
Blekinge Institute of Technology, Faculty of Computing, Department of Computer Science.
2019 (English)Independent thesis Advanced level (degree of Master (Two Years)), 20 credits / 30 HE creditsStudent thesis
Abstract [en]

With an evermore growing threat of new malware that keeps growing in both number and complexity, the necessity for improvement in automatic detection and classification of malware is increasing. The signature-based approaches used by several Anti-Virus companies struggle with the increasing amount of polymorphic malware. The polymorphic malware change some minor aspects of the code to be able to remain undetected. Malware classification using machine learning have been used to try to solve this issue in previous research. In the proposed work, different hierarchical machine learning approaches are implemented to conduct three experiments. The methods utilise a hierarchical structure in various ways to be able to get a better classification performance. A selection of hierarchical levels and machine learning models are used in the experiments to evaluate how the results are affected.

A data set is created, containing over 90000 different labelled malware samples. The proposed work also includes the creation of a labelling method that can be helpful for researchers in malware classification that needs labels for a created data set.The feature vector used contains 500 n-gram features and 3521 Import Address Table features. In the experiments for the proposed work, the thesis includes the testing of four machine learning models and three different amount of hierarchical levels. Stratified 5-fold cross validation is used in the proposed work to reduce bias and variance in the results.

The results from the classification approach shows it achieves the highest hF-score, using Random Forest (RF) as the machine learning model and having four hierarchical levels, which got an hF-score of 0.858228. To be able to compare the proposed work with other related work, pure-flat classification accuracy was generated. The highest generated accuracy score was 0.8512816, which was not the highest compared to other related work.

Place, publisher, year, edition, pages
2019. , p. 107
Keywords [en]
Machine Learning, Hierarchical Malware Classification, Static Malware Analysis, Mnemonic N-grams
National Category
Other Computer and Information Science
Identifiers
URN: urn:nbn:se:bth-18260OAI: oai:DiVA.org:bth-18260DiVA, id: diva2:1332898
External cooperation
Securelink
Subject / course
Degree Project in Master of Science in Engineering 30,0 hp
Educational program
DVACD Master of Science in Computer Security
Presentation
2019-06-05, J1280, Valhallavägen 1, Karlskrona, 09:00 (English)
Supervisors
Examiners
Available from: 2019-07-01 Created: 2019-06-28 Last updated: 2019-07-01Bibliographically approved

Open Access in DiVA

fulltext(3665 kB)152 downloads
File information
File name FULLTEXT01.pdfFile size 3665 kBChecksum SHA-512
4cfa323be381a4d69f6f5b8a95e1f4da0add80df6c06e8c17341f47157d93ef163f8d5233fbfee374507fe38fdf38ed962e882f31462f4b78bc16f05b2a2feeb
Type fulltextMimetype application/pdf

By organisation
Department of Computer Science
Other Computer and Information Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 152 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

urn-nbn

Altmetric score

urn-nbn
Total: 408 hits
CiteExportLink to record
Permanent link

Direct link
Cite
Citation style
  • apa
  • ieee
  • modern-language-association-8th-edition
  • vancouver
  • Other style
More styles
Language
  • de-DE
  • en-GB
  • en-US
  • fi-FI
  • nn-NO
  • nn-NB
  • sv-SE
  • Other locale
More languages
Output format
  • html
  • text
  • asciidoc
  • rtf