Change search
ReferencesLink to record
Permanent link

Direct link
Sound Modular Extraction of Control Flow Graphs from Java Bytecode
KTH, School of Computer Science and Communication (CSC), Theoretical Computer Science, TCS.ORCID iD: 0000-0002-6468-1605
2012 (English)Licentiate thesis, monograph (Other academic)
Abstract [en]

Control flow graphs (CFGs) are abstract program models that preserve the control flow information. They have been widely utilized for many static analyses in the past decades. Unfortunately, previous studies about the CFG construction from modern languages, such as Java, have either neglected advanced features that influence the control flow, or do not provide a correctness argument. This is a bearable issue for some program analyses, but not for formal methods, where the soundness of CFGs is a mandatory condition for the verification of safety-critical properties. Moreover, when developing open systems, i.e., systems in which at least one component is missing, one may want to extract CFGs to verify the available components. Soundness is even harder to achieve in this scenario, because of the unknown inter-dependencies involving missing components. In this work we present two variants of a CFG extraction algorithm from Java bytecode considering precise exceptional flow, which are sound w.r.t to the JVM behavior. The first algorithm extracts CFGs from fully-provided (closed) programs only. It proceeds in two phases. Initially the Java bytecode is translated into a stack-less intermediate representation named BIR, which provides explicit representation of exceptions, and is more compact than the original bytecode. Next, we define the transformation from BIR to CFGs, which, among other features, considers the propagation of uncaught exceptions within method calls. We then establish its correctness: the behavior of the extracted CFGs is shown to be a sound over-approximation of the behavior of the original programs. Thus, temporal safety properties that hold for the CFGs also hold for the program. We prove this by suitably combining the properties of the two transformations with those of a previous idealized CFG extraction algorithm, whose correctness has been proven directly. The second variant of the algorithm is defined for open systems. We generalize the extraction algorithm for closed systems for a modular set-up, and resolve inter-dependencies involving missing components by using user-provided interfaces. We establish its correctness by defining a refinement relation between open systems, which constrains the instantiation of missing components. We prove that if the relation holds, then the CFGs extracted from the components of the original open system are sound over-approximations of the CFGs for the same components in the refined system. Thus, temporal safety properties that hold for an open system also hold for closed systems that refine it. We have implemented both algorithms as the ConFlEx tool. It uses Sawja, an external library for the static analysis of Java bytecode, to transform bytecode into BIR, and to resolve virtual method calls. We have extended Sawja to support open systems, and improved its exception type analysis. Experimental results have shown that the algorithm for closed systems generates more precise CFGs than the modular algorithm. This was expected, due to the heavy over-approximations the latter has to perform to be sound. Also, both algorithms are linear in the number of bytecode instructions. Therefore, ConFlEx is efficient for the extraction of CFGs from either open, or closed Java bytecode programs.

Place, publisher, year, edition, pages
Stockholm: KTH Royal Institute of Technology, 2012. , vii, 74 p.
Trita-CSC-A, ISSN 1653-5723 ; 2012:16
Keyword [en]
Software Verification, Static Analysis, Program Models, Compositional Verification
National Category
Computer Systems
URN: urn:nbn:se:kth:diva-105275ISBN: 978-91-7501-570-5OAI: diva2:570503
2012-12-12, D3, Lindstedtsvägen 5, KTH, Stockholm, 13:00 (English)

QC 20121122

Available from: 2012-11-22 Created: 2012-11-19 Last updated: 2012-11-22Bibliographically approved

Open Access in DiVA

lic_thesis-pedro_gomes(941 kB)601 downloads
File information
File name FULLTEXT01.pdfFile size 941 kBChecksum SHA-512
Type fulltextMimetype application/pdf

Search in DiVA

By author/editor
de Carvalho Gomes, Pedro
By organisation
Theoretical Computer Science, TCS
Computer Systems

Search outside of DiVA

GoogleGoogle Scholar
Total: 601 downloads
The number of downloads is the sum of all downloads of full texts. It may include eg previous versions that are now no longer available

Total: 1889 hits
ReferencesLink to record
Permanent link

Direct link