Change search
ReferencesLink to record
Permanent link

Direct link
Compositional Decompilation using LLVM IR
Uppsala University, Disciplinary Domain of Science and Technology, Mathematics and Computer Science, Department of Information Technology.
2015 (English)Independent thesis Basic level (degree of Bachelor), 20 HE creditsStudent thesis
Abstract [en]

Decompilation or reverse compilation is the process of translating low-level machine-readable code into high-level human-readable code. The problem is non-trivial due to the amount of information lost during compilation, but it can be divided into several smaller problems which may be solved independently. This report explores the feasibility of composing a decompilation pipeline from independent components, and the potential of exposing those components to the end-user. The components of the decompilation pipeline are conceptually grouped into three modules. Firstly, the front-end translates a source language (e.g. x86 assembly) into LLVM IR; a platform-independent low-level intermediate representation. Secondly, the middle-end structures the LLVM IR by identifying high-level control flow primitives (e.g. pre-test loops, 2-way conditionals). Lastly, the back-end translates the structured LLVM IR into a high-level target programming language (e.g. Go). The control flow analysis stage of the middle-end uses subgraph isomorphism search algorithms to locate control flow primitives in CFGs, both of which are described using Graphviz DOT files. The decompilation pipeline has been proven capable of recovering nested pre-test and post-test loops (e.g. while, do-while), and 1-way and 2-way conditionals (e.g. if, if-else) from LLVM IR. Furthermore, the data-driven design of the control flow analysis stage facilitates extensions to identify new control flow primitives. There is huge potential for future development. The Go output could be made more idiomatic by extending the post-processing stage, using components such as Grind by Russ Cox which moves variable declarations closer to their usage. The language-agnostic aspects of the design will be validated by implementing components in other languages; e.g. data flow analysis in Haskell. Additional back-ends (e.g. Python output) will be implemented to verify that the general decompilation tasks (e.g. control flow analysis, data flow analysis) are handled by the middle-end.

Place, publisher, year, edition, pages
2015. , 112 p.
Keyword [en]
binary analysis, composition, compositional, control flow analysis, decompilation, decompiler, golang, intermediate representation, LLVM IR, post-processing
National Category
Computer Science
URN: urn:nbn:se:uu:diva-279307OAI: diva2:911797
Educational program
Bachelor Programme in Computer Science
2015-04-30, Buckingham Building, University of Portsmouth, Buckingham Building, Lion Terrace, Portsmouth, PO1 3HE, United Kingdom, Portsmouth, 14:30 (English)

BSc dissertation written during an ERASMUS exchange from Uppsala University to the University of Portsmouth.

Available from: 2016-03-18 Created: 2016-02-29 Last updated: 2016-03-18Bibliographically approved

Open Access in DiVA

decompilation.pdf(3303 kB)107 downloads
File information
File name FULLTEXT01.pdfFile size 3303 kBChecksum SHA-512
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
Eklind, Robin
By organisation
Department of Information Technology
Computer Science

Search outside of DiVA

GoogleGoogle Scholar
Total: 107 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 276 hits
ReferencesLink to record
Permanent link

Direct link