The Normalised Compression Distance as a File Fragment Classifier
Blekinge Institute of Technology, School of Computing. 2010 (English). In: Digital Investigation: The International Journal of Digital Forensics and Incident Response, ISSN 1742-2876, E-ISSN 1873-202X, Vol. 7, no. Suppl 1, p. S24-S31. Article in journal (Refereed), Published.
We have applied the generalised and universal distance measure NCD (Normalised Compression Distance) to the problem of determining the type of file fragments. To enable later comparison of the results, the algorithm was applied to fragments of a publicly available corpus of files. The NCD algorithm, in conjunction with k-nearest-neighbour classification (k ranging from one to ten), was applied to a random selection of circa 3000 512-byte file fragments from 28 different file types. This procedure was then repeated ten times. While the overall accuracy of the n-valued classification only improved on the prior probability of approximately 3.5%, reaching circa 32%–36%, the classifier reached accuracies of circa 70% for the most successful file types. A prototype of a file fragment classifier was then developed and evaluated on a new set of data (from the same corpus). Circa 3000 fragments were selected at random and the experiment was repeated five times. This prototype classifier remained successful at classifying individual file types, with accuracies ranging from only slightly lower than 70% for the best class down to accuracies similar to those in the prior experiment.
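The approach described in the abstract can be illustrated with a minimal sketch. It uses the standard NCD definition, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C(.) is the compressed length, together with a plain k-nearest-neighbour vote. The abstract does not name the compressor or describe the implementation, so the use of zlib and the helper names `compressed_len`, `ncd`, and `knn_classify` are assumptions made purely for illustration, not the authors' code.

```python
import zlib
from collections import Counter


def compressed_len(data: bytes) -> int:
    # Compressed size in bytes; zlib is an assumed stand-in compressor.
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    # Normalised Compression Distance between two byte strings.
    cx, cy = compressed_len(x), compressed_len(y)
    cxy = compressed_len(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def knn_classify(fragment: bytes, training, k: int = 1) -> str:
    # training: list of (label, fragment_bytes) pairs with known file types.
    # Rank the training fragments by NCD to the query and keep the k closest.
    nearest = sorted(training, key=lambda item: ncd(fragment, item[1]))[:k]
    # Majority vote over the labels of the k nearest fragments.
    votes = Counter(label for label, _ in nearest)
    return votes.most_common(1)[0][0]
```

In this sketch a query would be a 512-byte fragment and `training` a list of labelled 512-byte fragments drawn from the corpus. With 28 equally likely file types, random guessing gives 1/28, or about 3.6%, which corresponds to the prior probability quoted in the abstract.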
Place, publisher, year, edition, pages
Elsevier, 2010. Vol. 7, no. Suppl 1, p. S24-S31.
Identifiers
URN: urn:nbn:se:bth-7671; DOI: 10.1016/j.diin.2010.05.004; ISI: 000281010700004; Local ID: oai:bth.se:forskinfoB3816A81E8404C23C1257806003EA7EF; OAI: oai:DiVA.org:bth-7671; DiVA: diva2:835315