Verona is a concurrent object-oriented programming language that organises all the objects in a program into a forest of isolated regions. Memory is managed locally for each region, so programmers can control a program's memory use by adjusting how objects are partitioned into regions, and by setting each region's memory management strategy. A thread can only mutate (allocate, deallocate) objects within one active region---its "window of mutability". Memory management costs are localised to the active region, ensuring overheads can be predicted and controlled. Moving the mutability window between regions is explicit, so code can be executed wherever it is required, yet programs remain in control of memory use. An ownership type system based on reference capabilities enforces region isolation, controlling aliasing within and between regions, yet supporting objects moving between regions and threads. Data accesses never need expensive atomic operations, and are always thread-safe.
The actor paradigm supports the natural expression of concurrency. It has inspired the development of several actor-based languages, whose adoption depends, to a large extent, on the runtime characteristics (that is, the performance and scaling behaviour) of programs written in these languages.
This paper investigates the relative runtime characteristics of Akka, CAF and Pony, based on the Savina benchmarks. We observe that the scaling of many of the Savina benchmarks does not reflect their categorisation (into essentially sequential, concurrent and parallel), that many programs have similar runtime characteristics, and that their runtime behaviour may drastically change nature (e.g., go from essentially sequential to parallel) when some parameters are tweaked.
These observations lead us to propose a single benchmark program, designed so that, by tweaking a few knobs, we hope to be able to simulate most of the programs of the Savina suite.
The age of multi-core computers is upon us, yet current programming languages, typically designed for single-core computers and adapted post hoc for multi-cores, remain tied to the constraints of a sequential mindset and are thus in many ways inadequate. New programming language designs are required that break away from this old-fashioned mindset. To address this need, we have been developing a new programming language called Encore, in the context of the European Project UpScale. The paper presents a motivation for the Encore language, examples of its main constructs, several larger programs, a formalisation of its core, and a discussion of some future directions our work will take. The work is ongoing and we started more or less from scratch. That means that a lot of work has to be done, but also that we need not be tied to decisions made for sequential language designs. Any design decision can be made in favour of good performance and scalability. For this reason, Encore offers an interesting platform for future exploration into object-oriented parallel programming.
Aliasing is crucial for supporting useful implementation patterns, but it makes reasoning about programs difficult. To deal with this problem, numerous type-based aliasing control mechanisms have been proposed, expressing properties such as uniqueness. Uniqueness, however, is black-and-white: either a reference is unique or it can be arbitrarily aliased; and global: excluding aliases throughout the entire system, making code brittle to changing requirements. Disjointness domains, a new approach to alias control, address this problem by enabling more graduations between uniqueness and arbitrary reference sharing. They allow aliasing constraints to be expressed locally for a certain set of variables (either stack variables or fields), for instance that no aliasing occurs between variables within that set but may occur between such sets, or the opposite: that aliasing may occur within the set but not between different sets. A hierarchy of disjointness domains controls the flow of references through a program, helping the programmer reason about disjointness and enforce local alias invariants. The resulting system supports fine-grained control of aliasing between both variables and objects, making aliasing explicit to programmers, compilers, and tooling. This paper presents a formal account of disjointness domains along with examples. Disjointness domains provide novel means of expressing may-alias kinds of constraints, which may prove useful in compiler optimisation and verification.
Programming language design and runtime implementation require detailed knowledge about the programs that users want to implement. Acquiring this knowledge is hard, and there is little tool support to effectively estimate whether a proposed tradeoff actually makes sense in the context of real world applications. Ideally, knowledge about the behaviour of "typical" programs is 1) easily obtainable, 2) easily reproducible, and 3) easily sharable. We present Spencer, an open source web service and API framework for dynamic analysis of a continuously growing set of traces of standard program corpora. Users do not obtain traces on their own, but can instead send queries to the web service that will be executed on a set of program traces. Queries are built in terms of a set of query combinators that present a high level interface for working with trace data. Since the framework is high level, and there is a hosted collection of recorded traces, queries are easy to implement. Since the data sets are shared by the research community, results are reproducible. Since the actual queries run on one (or many) servers that provide analysis as a service, obtaining results is possible on commodity hardware. Data in Spencer is meant to be obtained once, and analysed often, making the overhead of data collection mostly irrelevant. This allows Spencer to collect more data than traditional tracing tools can afford within their performance budget. Results in Spencer are cached, making complicated analyses that build on cached primitive queries fast.
Expressive actor models combine aspects of functional programming into the pure actor model enriched with futures. Such functional features include first-class closures which can be passed between actors and chained on futures. Combined with mutable objects, this opens the door to race conditions. In some situations, closures may not be evaluated by the actor that created them yet may access fields or objects owned by that actor. In other situations, closures may be safely fired off to run as a separate task.
This paper discusses the problem of who can safely evaluate a closure to avoid race conditions, and presents the current solution to the problem adopted by the Encore language. The solution integrates with Encore's capability type system, which influences whether a closure is attached and must be evaluated by the creating actor, or whether it can be detached and evaluated independently of its creator.
Encore's current solution to this problem is not final or optimal. We conclude by discussing a number of open problems related to dealing with closures in the actor model.
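To make the hazard concrete, here is a small Java sketch of our own (not Encore code): a closure captures mutable state conceptually owned by its creating "actor" and is then evaluated on another thread. Without the attached/detached distinction described above, the two threads race on the captured field. The class and method names are invented for illustration.

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical illustration: an "actor" whose closure escapes to another thread.
class Counter {
    private int hits = 0;  // state conceptually owned by this actor

    // The closure below captures `this` and mutates `hits`.
    CompletableFuture<Void> detachedIncrement() {
        // Runs on a pool thread, concurrently with the owner: a data race on `hits`.
        return CompletableFuture.runAsync(() -> hits++);
    }

    void ownerIncrement() { hits++; }  // meanwhile, the owner also mutates `hits`

    int hits() { return hits; }
}

public class ClosureRace {
    public static void main(String[] args) {
        Counter c = new Counter();
        CompletableFuture<Void> f = c.detachedIncrement(); // detached evaluation
        c.ownerIncrement();                                // attached evaluation
        f.join();
        // May print 1 or 2 depending on interleaving; increments can be lost.
        System.out.println(c.hits());
    }
}
```

In Encore's terms, a capability type that marks the captured state as unshareable would force such a closure to stay attached to its creator rather than run detached as above.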
Any non-trivial concurrent system warrants synchronisation, regardless of the concurrency model. Actor-based concurrency serialises all computations in an actor through asynchronous message passing. In contrast, lock-based concurrency serialises some computations by following a lock-unlock protocol for accessing certain data. Both systems require sound reasoning about pointers and aliasing to exclude data-races. If actor isolation is broken, so is the single-thread-of-control abstraction. Similarly for locks, if a datum is accessible outside of the scope of the lock, the datum is not governed by the lock. In this paper we discuss how to balance aliasing and synchronisation. In previous work, we defined a type system that guarantees data-race freedom of actor-based concurrency and lock-based concurrency. This paper extends this work by introducing two programming constructs: one for decoupling isolation and synchronisation, and one for constructing higher-level atomicity guarantees from lower-level synchronisation. We focus predominantly on actors, and in particular the Encore programming language, but our ultimate goal is to define our constructs in such a way that they can be used both with locks and actors, given that combinations of both models occur frequently in actual systems. We discuss the design space, provide several formalisations of different semantics and discuss their properties, and connect them to case studies showing how our proposed constructs can be useful. We also report on an on-going implementation of our proposed constructs in Encore.
In concurrent systems, some form of synchronisation is typically needed to achieve data-race freedom, which is important for correctness and safety. In actor-based systems, messages are exchanged concurrently but executed sequentially by the receiving actor. By relying on isolation and non-sharing, an actor can access its own state without fear of data-races, and the internal behaviour of an actor can be reasoned about sequentially. However, actor isolation is sometimes too strong to express useful patterns. For example, letting the iterator of a data collection alias the internal structure of the collection allows a more efficient implementation than if each access requires going through the interface of the collection. With full isolation, in order to maintain sequential reasoning the iterator must be made part of the collection, which bloats the interface of the collection and means that a client must have access to the whole data collection in order to use the iterator. In this paper, we propose a programming language construct that enables a relaxation of isolation but without sacrificing sequential reasoning. We formalise the mechanism in a simple lambda calculus with actors and passive objects, and show how an actor may leak parts of its internal state while ensuring that any interaction with this data is still synchronised.
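The iterator example can be illustrated in plain Java (our own sketch, not the formalism of the paper): the iterator below aliases the collection's internal array directly, which is efficient but means the collection's state is no longer reachable only through its own interface.

```java
import java.util.Iterator;

// Illustrative only: an iterator that directly aliases the collection's internal storage.
public class IntBag implements Iterable<Integer> {
    private int[] elems;   // internal representation
    private int size;

    IntBag(int capacity) { elems = new int[capacity]; }

    void add(int v) { elems[size++] = v; }

    @Override
    public Iterator<Integer> iterator() {
        final int[] alias = elems;  // the internal array escapes the collection's interface
        final int n = size;
        return new Iterator<Integer>() {
            private int i = 0;
            public boolean hasNext() { return i < n; }
            public Integer next() { return alias[i++]; }
        };
    }

    public static void main(String[] args) {
        IntBag bag = new IntBag(4);
        bag.add(1); bag.add(2);
        for (int v : bag) System.out.println(v);  // efficient traversal, but relies on the alias
    }
}
```

If another thread mutates the bag while the iterator is live, the alias breaks sequential reasoning; the construct proposed in the paper aims to permit such internal aliases while keeping all interaction with them synchronised.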
Encore is a programming language that was developed between 2014 and 2019. Encore was designed following the principle of inversion of defaults: computations are concurrent (rather than sequential) by default; data is isolated (rather than freely sharable) by default. The language worked as a seedbed for a large number of research ideas aimed at making programming with active objects safe, expressive and efficient.
Encore allows active objects to share data but statically ensures the absence of data races and allows fully concurrent garbage collection. Active objects can synchronise using first-class futures, which are also used to delegate and coalesce computations across active objects. The type system also supports orchestration of intra-object parallelism, expressed using composable units of computation. Active objects which see a lot of traffic can turn themselves into passive objects protected by lock-free synchronisation mechanisms to avoid performance bottlenecks, while still facilitating safe sharing and concurrent garbage collection.
This paper gives an overview of these features of Encore, reflecting on lessons learned from trying to fit all of these research ideas into a single language.
KAPPA is a type system for safe concurrent object-oriented programming using reference capabilities. It uses a combination of static and dynamic techniques to guarantee data-race freedom, and, for a certain subset of the system, non-interference (and thereby deterministic parallelism). It combines many features from previous work on alias management, such as substructural types, regions, ownership types, and fractional permissions, and brings them together using a unified set of primitives.
In this extended abstract we show how KAPPA’s capabilities express variations of the aforementioned concepts, discuss the main insights from working with KAPPA, present the current status of the implementation of KAPPA in the context of the actor language Encore, and discuss ongoing and future work.
A linear reference is a reference guaranteed to be unaliased. This is a powerful property that simplifies reasoning about programs, but is also a property that is too strong for certain applications. For example, lock-free algorithms, which implement protocols to ensure safe concurrent access to data structures, are generally not typable with linear references as they involve sharing of mutable state.
This paper presents a type system with a relaxed notion of linearity that allows an unbounded number of aliases to an object as long as at most one alias at a time owns the right to access the contents of the object. This ownership can be transferred between aliases, but can never be duplicated. The resulting language is flexible enough to express several lock-free algorithms and at the same time powerful enough to guarantee the absence of data-races when accessing owned data. The language is formalised and proven sound, and is also available as a prototype implementation.
We present OOlong, an object calculus with interface inheritance, structured concurrency and locks. The goal of the calculus is extensibility and reuse. The semantics are therefore available in a version for LaTeX typesetting (written in Ott), a mechanised version for doing rigorous proofs in Coq, and a prototype interpreter (written in OCaml) for typechecking and running OOlong programs.
We present OOlong, an object calculus with interface inheritance, structured concurrency and locks. The goal of the calculus is extensibility and reuse. The semantics are therefore available in a version for LaTeX typesetting (written in Ott), and a mechanised version for doing rigorous proofs in Coq.
The proliferation of shared mutable state in object-oriented programming complicates software development as two seemingly unrelated operations may interact via an alias and produce unexpected results. In concurrent programming this manifests itself as data-races.
Concurrent object-oriented programming suffers from the fact that code that warrants synchronisation cannot easily be distinguished from code that does not. The burden is placed solely on the programmer to reason about alias freedom, sharing across threads and side-effects to deduce where and when to apply concurrency control, without inadvertently blocking parallelism.
This paper presents a reference capability approach to concurrent and parallel object-oriented programming where all uses of aliases are guaranteed to be data-race free. Locations' static types describe their possible sharing. Type information can express non-interfering deterministic parallelism without dynamic concurrency control, thread-locality, lock-based schemes, and guarded-by relations giving multi-object atomicity to nested data structures. Unification of capabilities and traits allows trait-reuse across multiple concurrency scenarios with minimal code duplication. The resulting system brings together features from a wide range of prior work in a unified way.
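As a minimal Java illustration of the underlying problem (our own example, not taken from the paper): two threads share aliases to the same mutable object and race on its field, which is exactly the kind of interaction the reference-capability types are designed to rule out statically.

```java
// Minimal data-race illustration: two threads mutate one object through aliases.
public class AliasRace {
    static class Cell { int value; }

    public static void main(String[] args) throws InterruptedException {
        Cell shared = new Cell();          // one object ...
        Cell alias = shared;               // ... two names for it

        Thread t1 = new Thread(() -> { for (int i = 0; i < 100_000; i++) shared.value++; });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 100_000; i++) alias.value++; });
        t1.start(); t2.start();
        t1.join(); t2.join();

        // Typically prints less than 200000: increments are lost to the race.
        System.out.println(shared.value);
    }
}
```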
The proliferation of shared mutable state in object-oriented programming complicates software development as two seemingly unrelated operations may interact via an alias and produce unexpected results. In concurrent programming this manifests itself as data-races.
Concurrent object-oriented programming further suffers from the fact that code that warrants synchronisation cannot easily be distinguished from code that does not. The burden is placed solely on the programmer to reason about alias freedom, sharing across threads and side-effects to deduce where and when to apply concurrency control, without inadvertently blocking parallelism.
This paper presents a reference capability approach to concurrent and parallel object-oriented programming where all uses of aliases are guaranteed to be data-race free. The static type of an alias describes its possible sharing without using explicit ownership or effect annotations. Type information can express non-interfering deterministic parallelism without dynamic concurrency control, thread-locality, lock-based schemes, and guarded-by relations giving multi-object atomicity to nested data structures. Unification of capabilities and traits allows trait-based reuse across multiple concurrency scenarios with minimal code duplication. The resulting system brings together features from a wide range of prior work in a unified way.
Linear references are guaranteed to be free from aliases. This is a strong property that simplifies reasoning about programs and enables powerful optimisations, but it is also a property that is too strong for many applications. Notably, lock-free algorithms, which implement protocols that ensure safe, non-blocking concurrent access to data structures, are generally not typable with linear references because they rely on aliasing to achieve lock-freedom.
This paper presents LOLCAT, a type system with a relaxed notion of linearity that allows an unbounded number of aliases to an object as long as at most one alias at a time owns the right to access the contents of the object. This ownership can be transferred between aliases, but can never be duplicated. LOLCAT types are powerful enough to type several lock-free data structures and give a compile-time guarantee of absence of data-races when accessing owned data. In particular, LOLCAT is able to assign types to the CAS (compare and swap) primitive that precisely describe how ownership is transferred across aliases, possibly across different threads. The paper introduces LOLCAT through a sound core procedural calculus, and shows how LOLCAT can be applied to three fundamental lock-free data structures. It also discusses a prototype implementation which integrates LOLCAT with an object-oriented programming language.
This extended abstract overviews work on a type system for lock-free programming based on compare-and-swap. The type system prevents atomicity violations in lock-free programs, where insertion and removal of objects from a linked structure would be subject to data-races breaking linearity of ownership. The type system has successfully been applied to a small number of lock-free data structures.
This paper presents LOLCAT, a type system with a relaxed notion of linearity that allows an unbounded number of aliases to an object as long as at most one alias at a time owns the right to access the contents of the object. This ownership can be transferred between aliases, but can never be duplicated. LOLCAT types are powerful enough to type several lock-free data structures and give a compile-time guarantee of absence of data-races when accessing owned data. In particular, LOLCAT is able to assign types to the CAS (compare and swap) primitive that precisely describe how ownership is transferred across aliases, possibly across different threads.
The paper introduces LOLCAT through a sound core procedural calculus, and shows how LOLCAT can be applied to three fundamental lock-free data structures. It also shows how LOLCAT can be used to implement synchronisation primitives like locks, and discusses a prototype implementation which integrates LOLCAT with an object-oriented programming language.
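As a rough Java analogue (not LOLCAT syntax), a Treiber-style lock-free stack illustrates the pattern the type system captures: many threads may hold aliases to a node, but only the thread whose compare-and-swap succeeds gains or relinquishes the right to treat the node as part of the stack.

```java
import java.util.concurrent.atomic.AtomicReference;

// Treiber stack: CAS decides which alias gains or gives up ownership of a node.
public class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>();

    public void push(T value) {
        Node<T> n = new Node<>(value);               // locally owned, unaliased so far
        Node<T> oldTop;
        do {
            oldTop = top.get();
            n.next = oldTop;
        } while (!top.compareAndSet(oldTop, n));     // success transfers n to the stack
    }

    public T pop() {
        Node<T> oldTop, newTop;
        do {
            oldTop = top.get();
            if (oldTop == null) return null;         // empty stack
            newTop = oldTop.next;
        } while (!top.compareAndSet(oldTop, newTop)); // success: the caller now owns oldTop
        return oldTop.value;
    }

    public static void main(String[] args) {
        TreiberStack<String> s = new TreiberStack<>();
        s.push("a"); s.push("b");
        System.out.println(s.pop() + " " + s.pop()); // b a
    }
}
```

The successful compareAndSet in pop is the point where ownership of oldTop's contents transfers to the calling thread, which is the transfer LOLCAT's typing of CAS makes explicit.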
Ownership type systems give a strong notion of separation between aggregates. Objects belonging to different owners cannot be aliased, and thus a mutating operation internal to one object is guaranteed to be invisible to another. This naturally facilitates reasoning about correctness on a local scale, but also proves beneficial for coarse-grained parallelism as noninterference between statements touching different objects is easily established. For fine-grained parallelism, ownership types fall short as owner-based disjointness only allows separation of the innards of different aggregates, which is very coarse-grained. Concretely: ownership types can reason about the disjointness of two different data structures, but cannot reason about the internal structure or disjointness within the data structure, without resorting to static and overly constraining measures. For similar reasons, ownership fails to determine internal disjointness of external pointers to objects that share a common owner. In this paper, we introduce the novel notion of refined ownership which overcomes these limitations by allowing precise local reasoning about a group of objects even though they belong to the same external owner. Using refined ownership, we can statically check determinism of parallel operations on tree-shaped substructures of a data structure, including operations on values external to the structure, without imposing any non-local alias restrictions.
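The kind of determinism argument refined ownership enables can be illustrated with a plain Java fork/join sketch (ours, not the paper's formal system): the two recursive tasks touch disjoint subtrees, so running them in parallel is deterministic; the point of the paper is to establish such disjointness statically rather than by convention.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

// Parallel in-place update of a binary tree: left and right subtrees are disjoint.
public class ParallelTree {
    static class Node {
        int value;
        Node left, right;
        Node(int v) { value = v; }
    }

    static class Increment extends RecursiveAction {
        private final Node node;
        Increment(Node node) { this.node = node; }

        @Override
        protected void compute() {
            if (node == null) return;
            node.value++;                            // local mutation
            Increment l = new Increment(node.left);  // disjoint subtree
            Increment r = new Increment(node.right); // disjoint subtree
            invokeAll(l, r);                         // safe to run in parallel
        }
    }

    public static void main(String[] args) {
        Node root = new Node(1);
        root.left = new Node(2);
        root.right = new Node(3);
        ForkJoinPool.commonPool().invoke(new Increment(root));
        System.out.println(root.value + " " + root.left.value + " " + root.right.value);
    }
}
```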
Expressing parallelism and coordination is central for modern concurrent programming. Many mechanisms exist for expressing both parallelism and coordination. However, the design decisions for these two mechanisms are tightly intertwined. We believe that the interdependence of these two mechanisms should be recognised and achieved through a single, powerful primitive. We are not the first to realise this: the prime example is actor model programming, where parallelism arises through fine-grained decomposition of a program’s state into actors that are able to execute independently in parallel. However, actor model programming has a serious pain point: updating multiple actors as a single atomic operation is a challenging task. We address this pain point by introducing a new concurrency paradigm: Behaviour-Oriented Concurrency (BoC). In BoC, we are revisiting the fundamental concept of a behaviour to provide a more transactional concurrency model. BoC enables asynchronously creating atomic and ordered units of work with exclusive access to a collection of independent resources. In this paper, we describe BoC informally in terms of examples, which demonstrate the advantages of exclusive access to several independent resources, as well as the need for ordering. We define it through a formal model. We demonstrate its practicality by implementing a C++ runtime. We argue its applicability through the Savina benchmark suite: benchmarks in this suite can be more compactly represented using BoC in place of Actors, and we observe comparable, if not better, performance.
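To show the pain point in familiar terms, here is a small hand-rolled Java emulation of ours (not the BoC runtime or its API): a "behaviour" that needs exclusive access to two independent resources has to acquire both before running its body, something BoC provides as an asynchronous, ordered, language-level primitive rather than as a manual locking protocol. All names below are invented for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hand-rolled emulation of atomic access to two resources; BoC offers this
// (plus asynchrony and ordering) as a built-in concurrency primitive.
public class TwoAccountTransfer {
    static final AtomicInteger IDS = new AtomicInteger();

    static class Account {
        final int id = IDS.getAndIncrement();     // total order used for lock acquisition
        final ReentrantLock lock = new ReentrantLock();
        long balance;
        Account(long balance) { this.balance = balance; }
    }

    // Acquire both locks in id order to avoid deadlock, then run the body atomically.
    static void when(Account a, Account b, Runnable body) {
        Account first = a.id < b.id ? a : b;
        Account second = first == a ? b : a;
        first.lock.lock();
        second.lock.lock();
        try { body.run(); } finally { second.lock.unlock(); first.lock.unlock(); }
    }

    public static void main(String[] args) {
        Account src = new Account(100), dst = new Account(0);
        when(src, dst, () -> { src.balance -= 40; dst.balance += 40; });
        System.out.println(src.balance + " " + dst.balance); // prints: 60 40
    }
}
```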
Active objects offer a structured approach to concurrency, encapsulating both unshared state and a thread of control. For efficient data transfer, data should be passed by reference whenever possible, but this introduces aliasing and undermines the validity of the active-object model. This paper proposes a minimal variant of ownership types that preserves the required race freedom invariant yet enables data transfer by reference between active objects (that is, without copying) in many cases, and a cheap clone operation where copying is necessary. Our approach is general and should be adaptable to several existing active object systems.
ORCA is a concurrent and parallel garbage collector for actor programs which does not require any stop-the-world (STW) steps or synchronisation mechanisms, and which has been designed to support zero-copy message passing and sharing of mutable data. ORCA is part of a runtime for actor-based languages, which was co-designed with the Pony programming language, and in particular with its data-race free type system. By co-designing an actor language with its runtime, it was possible to exploit certain language properties in order to optimise the performance of garbage collection. Namely, ORCA relies on the guaranteed absence of race conditions in order to avoid read/write barriers, and it leverages actor message passing for synchronisation among actors.
In this paper we briefly describe Pony and its type system. We use pseudo-code in order to introduce how ORCA allocates and deallocates objects, how it shares mutable data without requiring barriers upon data mutation, and how immutability can be used to further optimise garbage collection. Moreover, we discuss the advantages of co-designing an actor language with its runtime, and we demonstrate that ORCA can be implemented in a performant and scalable way through a set of micro-benchmarks, including a comparison with other well-known collectors.
Existing software cannot benefit from the revolutionary potential increases in computational power provided by manycore chips unless its design and code are polluted by an unprecedented amount of low-level, fine-grained concurrency detail. As a consequence, the advent of manycore chips threatens to make current mainstream programming approaches obsolete, and thereby jeopardises the benefits gained from the last 20 years of development in industrial software engineering. In this article we put forward an argument for a fundamental breakthrough in how parallelism and concurrency are integrated into the software of the future.
Concurrent programs often make use of futures, handles to the results of asynchronous operations. Futures provide means to communicate not yet computed results, and simplify the implementation of operations that synchronise on the result of such asynchronous operations. Futures can be characterised as implicit or explicit, depending on the typing discipline used to type them. Current future implementations suffer from "future proliferation", either at the type level or at run time. The former adds future type wrappers, which hinders subtype polymorphism and exposes the client to the internal asynchronous communication architecture. The latter increases latency by traversing nested future structures at run time. Many languages suffer from both kinds. Previous work offers partial solutions to the future proliferation problems; in this paper we show how these solutions can be integrated in an elegant and coherent way that is more expressive than either system in isolation. We describe our proposal formally, and state and prove its key properties, in two related calculi, based on the two possible families of future constructs (data-flow futures and control-flow futures). The former relies on static type information to avoid unwanted future creation, and the latter uses an algebraic data type with dynamic checks. We also discuss how to implement our new system efficiently.
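In Java terms (our own illustration, unrelated to the calculi above), type-level future proliferation looks as follows: forwarding an asynchronous result from inside an asynchronous operation yields a nested CompletableFuture<CompletableFuture<T>> unless the client explicitly flattens it.

```java
import java.util.concurrent.CompletableFuture;

// Type-level "future proliferation": nesting arises when an async operation
// returns another async result, and the client must flatten it explicitly.
public class FutureNesting {
    static CompletableFuture<Integer> fetchRemoteValue() {
        return CompletableFuture.supplyAsync(() -> 42);
    }

    public static void main(String[] args) {
        // thenApply wraps the inner future, exposing the nesting to the client:
        CompletableFuture<CompletableFuture<Integer>> nested =
            CompletableFuture.supplyAsync(() -> "key")
                             .thenApply(key -> fetchRemoteValue());

        // thenCompose flattens one level, restoring a control-flow-future view:
        CompletableFuture<Integer> flat =
            CompletableFuture.supplyAsync(() -> "key")
                             .thenCompose(key -> fetchRemoteValue());

        System.out.println(nested.join().join() + " " + flat.join());
    }
}
```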
Dynamic languages like Erlang, Clojure, JavaScript, and E adopted data-race freedom by design. To enforce data-race freedom, these languages either deep copy objects during actor (thread) communication or proxy back to their owning thread. We present Dala, a simple programming model that ensures data-race freedom while supporting efficient inter-thread communication. Dala is a dynamic, concurrent, capability-based language that relies on three core capabilities: immutable values can be shared freely; isolated mutable objects can be transferred between threads but not aliased; local objects can be aliased within their owning thread but not dereferenced by other threads. The addition of a fourth capability, unsafe, allows data races, and we show how data-race free programs interact with unsafe programs. We present a formal model of Dala, and prove data-race freedom and the dynamic gradual guarantee. These theorems guarantee data-race freedom when using safe capabilities and show that the addition of capabilities is semantics-preserving modulo permission and cast errors.
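A tiny Java sketch of ours (Dala itself is a dynamic language; this is not its semantics) conveys the intended runtime discipline: values tagged immutable may always cross threads, isolated values may be transferred provided the sender drops its alias, and local values must not be handed to another thread at all. All identifiers below are invented for illustration.

```java
// Illustrative runtime capability check in the spirit of Dala (not its actual rules).
public class Capabilities {
    enum Cap { IMM, ISO, LOCAL }

    static class Obj {
        final Cap cap;
        Obj(Cap cap) { this.cap = cap; }
    }

    // Simulate handing an object to another thread/actor.
    static Obj send(Obj o) {
        switch (o.cap) {
            case IMM:   return o;   // freely shareable
            case ISO:   return o;   // transferable; the sender must give up its alias
            case LOCAL: throw new IllegalStateException("local object escapes its owner");
            default:    throw new AssertionError();
        }
    }

    public static void main(String[] args) {
        send(new Obj(Cap.IMM));                       // ok
        Obj moved = send(new Obj(Cap.ISO));           // ok: ownership moves with the send
        try { send(new Obj(Cap.LOCAL)); }             // rejected at run time
        catch (IllegalStateException e) { System.out.println(e.getMessage()); }
        System.out.println("iso transferred: " + (moved != null));
    }
}
```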
This paper investigates methods for quantitatively assessing the importance and relative importance of concepts taught in a university program. This assessment has many uses, e.g., to aid program design and inventory, and to communicate what concepts a course may rely on at a given point in the program. We propose to perform this quantitative assessment in two steps: first, representing the university program as a suitable graph with courses and concepts as nodes and connections between courses and concepts as edges; second, quantitatively defining each concept's importance as its centrality as a node within the network. We thus perform two investigations, both leveraging a practical case, namely data collected from two engineering programs at two Swedish universities: a) how to represent university programs in terms of graphs (here called Courses-Concepts Graphs (CCG)), and b) how to reinterpret the most classical graph-theoretical node centrality indexes in the pedagogical terms of a concept centrality index.
On current architectures, optimising an application's performance often involves storing data according to access affinity (what is accessed together is stored together) rather than logical affinity (what belongs together logically stays together). Such low-level techniques lead to faster but more error-prone code, and end up tangling the program's logic with low-level data layout details.
Our vision, which we call SHAPES (Safe, High-level Abstractions for oPtimisation of mEmory cacheS), is that the layout of a data structure should be defined only once, upon instantiation, and the remainder of the code should be layout agnostic. This enables performance improvements while also guaranteeing memory safety, and supports the separation of program logic from low-level concerns. In this paper we investigate how this vision can be supported by extending a programming language.
We describe the core language features supporting this vision: classes can be customized to support different layouts, and layout information is carried around in types; the remaining source code is layout-unaware and the compiler emits layout-aware code. We then discuss our SHAPES implementation through a prototype library, which we also used for preliminary evaluations. Finally, we discuss how the core could be expanded so as to deliver SHAPES's full potential: the incorporation of compacting garbage collection, ad hoc polymorphism and late binding, synchronization of representations of different collections, support for dynamic change of representation, etc.
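One of the layouts SHAPES targets can be written by hand in Java (our illustration; SHAPES aims to have the compiler emit such code from layout-unaware source): the same logical collection of points stored one object per element versus split ("pooled") into parallel primitive arrays, where the latter gives dense, cache-friendly traversal of a single field.

```java
// Hand-written layouts; SHAPES aims to derive the second from code written like the first.
public class Layouts {
    // Array-of-objects: each Point is a separate heap object (pointer-chasing traversal).
    static class Point { double x, y; Point(double x, double y) { this.x = x; this.y = y; } }

    static double sumXObjects(Point[] pts) {
        double s = 0;
        for (Point p : pts) s += p.x;
        return s;
    }

    // Split ("structure-of-arrays") layout: all x coordinates are contiguous in memory.
    static class PointPool {
        final double[] xs, ys;
        PointPool(int n) { xs = new double[n]; ys = new double[n]; }
    }

    static double sumXPool(PointPool pool) {
        double s = 0;
        for (double x : pool.xs) s += x;   // streams one dense array: good spatial locality
        return s;
    }

    public static void main(String[] args) {
        Point[] pts = { new Point(1, 2), new Point(3, 4) };
        PointPool pool = new PointPool(2);
        pool.xs[0] = 1; pool.ys[0] = 2; pool.xs[1] = 3; pool.ys[1] = 4;
        System.out.println(sumXObjects(pts) + " " + sumXPool(pool));
    }
}
```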
Modern architectures require applications to make effective use of caches to achieve high performance and hide memory latency. This in turn requires careful consideration of placement of data in memory to exploit spatial locality, leverage hardware prefetching and conserve memory bandwidth. In unmanaged languages like C++, memory optimisations are common, but at the cost of losing object abstraction and memory safety. In managed languages like Java and C#, the abstract view of memory and proliferation of moving compacting garbage collection does not provide enough control over placement and layout. We have proposed SHAPES, a type-driven abstract placement specification that can be integrated with object-oriented languages to enable memory optimisations. SHAPES preserves both memory and object abstraction. In this paper, we formally specify the SHAPES semantics and describe its memory safety model.
In modern architectures, due to the huge gap between CPU performance and memory bandwidth, an application's performance highly depends on the speed at which the system is able to deliver data to operate on. The placement of data in memory affects the number of cache misses, and thus the overall speed of the application. To address this, pooling and splitting are two techniques that allow data to be grouped or split in memory, according to whether the data are usually accessed together or separately. However, these are either low-level optimisations or outside the control of the programmer. We propose OHMM, an object-oriented programming language that uses ownership-like annotations to express high-level constraints on how objects should be placed in memory. These annotations will allow the runtime to allocate objects using pooling and splitting, and thus lead to efficient data accesses. In this short paper, we explain OHMM through an example, show how the objects will be laid out, and informally argue the benefits in terms of cache performance.
Erlang is a functional programming language with structural type-checking. Opaque types are the only types with a nominal component, where their names are used for type-checking. Using opaque types for nominal typing is possible, but it limits the use of pattern-matching and deconstruction to the module where the type is defined. To distinguish types by name without imposing extra constraints, we introduce the new concept of nominal types for Erlang, together with a well-tested type-checking implementation in Dialyzer. We define a new syntax for declaring nominal types and a set of rules that specify how nominal types should be type-checked with respect to other nominal types and non-nominal types, which is designed to ensure backwards compatibility. Nominal type-checking is implemented on top of Dialyzer's structural type-checking logic. Through testing in the Erlang/OTP codebase, we show that nominal types can encode Erlang's opaque types, thereby improving Dialyzer's performance and maintainability.
This paper investigates methods for quantitatively examining the connectivity and knowledge flow in a university program considering courses and concepts included in the program. The proposed method is expected to be useful to aid program design and inventory, and for communicating what concepts a course may rely on at a given point in the program. As a first step, we represent the university program as a directed graph with courses and concepts as nodes and connections between courses and concepts as directed edges. Then, we investigate the connectivity and the flow through the graph in order to gain insights into the structure of the program. We thus perform two investigations based on data collected from an engineering program at a Swedish university: a) how to represent (parts of) the university program as a graph (here called Directed Courses-Concepts Graph (DCCG)), and b) how to use graph theory tools to analyse the coherence and structure of the program.
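As a concrete, purely hypothetical Java sketch of the first step, the graph can be represented with plain adjacency sets over course and concept names, and a simple measure such as degree centrality computed per node; the course and concept names below are invented for illustration and are not taken from the studied program.

```java
import java.util.*;

// Toy directed courses-concepts graph with a degree-based centrality measure.
// Course and concept names are invented for illustration.
public class ConceptCentrality {
    public static void main(String[] args) {
        Map<String, Set<String>> edges = new HashMap<>();
        // "course -> concept": the course teaches the concept;
        // "concept -> course": the course relies on the concept.
        addEdge(edges, "Programming1", "recursion");
        addEdge(edges, "Programming1", "arrays");
        addEdge(edges, "recursion", "Algorithms");
        addEdge(edges, "arrays", "Algorithms");
        addEdge(edges, "Algorithms", "graphs");
        addEdge(edges, "graphs", "Compilers");

        Map<String, Integer> inDegree = new HashMap<>();
        edges.values().forEach(ts -> ts.forEach(t -> inDegree.merge(t, 1, Integer::sum)));

        // Degree centrality: in-degree + out-degree per node.
        Set<String> nodes = new TreeSet<>(edges.keySet());
        nodes.addAll(inDegree.keySet());
        for (String n : nodes) {
            int deg = edges.getOrDefault(n, Set.of()).size() + inDegree.getOrDefault(n, 0);
            System.out.println(n + ": " + deg);
        }
    }

    static void addEdge(Map<String, Set<String>> g, String from, String to) {
        g.computeIfAbsent(from, k -> new HashSet<>()).add(to);
    }
}
```

Richer indexes (betweenness, closeness, PageRank) can replace the degree count without changing the graph representation.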
Jupyter notebooks have emerged as a standard tool for data science programming. Programs in Jupyter notebooks are different from typical programs as they are constructed as a collection of code snippets interleaved with text and visualisation. This allows interactive exploration, and snippets may be executed in different orders, which may give rise to different results due to side-effects between snippets. Previous studies have shown the presence of considerable code duplication – code clones – in sources of traditional programs, in both so-called systems programming languages and so-called scripting languages. In this paper we present the first large-scale study of code cloning in Jupyter notebooks. We analyse a corpus of 2.7 million Jupyter notebooks hosted on GitHub, representing 37 million individual snippets and 227 million lines of code. We study clones at the level of individual snippets, and study the extent to which snippets are recurring across multiple notebooks. We study both identical clones and approximate clones and conduct a small-scale ocular inspection of the most common clones. We find that code cloning is common in Jupyter notebooks: more than 70% of all code snippets are exact copies of other snippets (with possible differences in white spaces), and around 50% of all notebooks do not have any unique snippet but consist solely of snippets that are also found elsewhere. In notebooks written in Python, at least 80% of all snippets are approximate clones and the prevalence of code cloning is higher in Python than in other languages. We further find that clones between different repositories are far more common than clones within the same repository. However, the most common individual repository from which a Jupyter notebook contains clones is the repository in which the notebook itself resides.
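The notion of "exact copies with possible differences in white spaces" can be made concrete with a small Java sketch of ours (not the tooling used in the study): snippets are normalised by discarding whitespace and then grouped by the normalised text.

```java
import java.util.*;

// Group code snippets into exact-clone classes modulo whitespace (illustrative only).
public class SnippetClones {
    // Exact-clone comparison "with possible differences in white spaces":
    // normalise by dropping all whitespace characters.
    static String normalise(String snippet) {
        return snippet.replaceAll("\\s+", "");
    }

    public static void main(String[] args) {
        List<String> snippets = List.of(
            "import numpy as np\nnp.zeros(3)",
            "import numpy as np\n  np.zeros( 3 )",   // same as above modulo whitespace
            "print('hello')"
        );

        Map<String, List<Integer>> cloneClasses = new HashMap<>();
        for (int i = 0; i < snippets.size(); i++) {
            cloneClasses
                .computeIfAbsent(normalise(snippets.get(i)), k -> new ArrayList<>())
                .add(i);
        }
        // Two clone classes: {0, 1} and {2}.
        cloneClasses.values().forEach(group -> System.out.println("clone class: " + group));
    }
}
```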
The study of Java as a programming language for scientific computing is warranted by simpler, more extensible and more easily maintainable code. Previous work on refactoring a C++ scientific computing code base to follow best practices of object-oriented software development revealed a coupling between such practices and considerable slowdowns due to indirections introduced by abstractions. In this paper, we explore how Java's JIT compiler handles such abstraction-induced indirection using a typical scientific computing compute kernel extracted from a linear solver written in C++. We find that the computation times for large workloads on one machine can be on par for C++ and Java. However, for distributed computations, a better parallelisation strategy needs to be found for non-blocking communication. We also report on the impact on performance of common "gripes": garbage collection, array bounds checking, and dynamic binding.
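The kind of abstraction-induced indirection the study looks at can be sketched in a few lines of Java (our simplification, not the kernel from the paper): the same reduction written against a raw array and against an interface-typed wrapper, where the latter adds virtual calls and an extra dereference that the JIT may or may not eliminate.

```java
// The same kernel twice: raw arrays vs. an interface-based abstraction (illustrative).
public class DotKernel {
    interface Vector {
        int size();
        double get(int i);
    }

    static class ArrayVector implements Vector {
        private final double[] data;
        ArrayVector(double[] data) { this.data = data; }
        public int size() { return data.length; }
        public double get(int i) { return data[i]; }
    }

    // Direct version: plain array accesses, easy for the JIT to optimise.
    static double dotRaw(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[i];
        return s;
    }

    // Abstracted version: every element access goes through a virtual call.
    static double dotAbstract(Vector a, Vector b) {
        double s = 0;
        for (int i = 0; i < a.size(); i++) s += a.get(i) * b.get(i);
        return s;
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3}, y = {4, 5, 6};
        System.out.println(dotRaw(x, y) + " " + dotAbstract(new ArrayVector(x), new ArrayVector(y)));
    }
}
```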
Dala is a design for a concurrent dynamic object-oriented language. A key goal of Dala's design is to avoid data races, by ensuring threads do not share mutable state. In this paper we discuss our experience using the program verification tool Dafny to validate Dala's design. We explain how we modelled salient features of Dala in Dafny, and how Dafny did (or did not) assist our confidence in Dala's design.
Object placement impacts cache utilisation, which is itself critical for performance. Managed languages offer fewer tools than unmanaged languages for controlling object placement due to their abstract view of memory. On the other hand, managed languages often have garbage collectors (GC) that move objects as part of defragmentation. In the context of OpenJDK, the Hot-Cold Objects Segregation GC (HCSGC) added locality improvement on top of ZGC by piggybacking on its loaded-value-barrier based design. In addition to the open problem of tuning HCSGC, we identify a contradiction in two of its design goals, and propose LR, which addresses both problems. We implement LR on top of ZGC and compare it with the GCs in OpenJDK and with the best performing HCSGC configuration using DaCapo, JGraphT and SPECjbb2015. While using fewer resources, LR outperforms HCSGC in 18 configurations, matches its performance in 17, and regresses in 3.
How concurrent compacting collectors store and manage forwarding information is crucial for their performance.
In this paper, we propose CFW, a representation technique for forwarding information that guarantees that forwarding information for an entire heap can be stored in at most 3.14% of its size. By providing such a guarantee, we simplify the task of deploying programs with respect to memory needs. This is important given how memory is typically the dominating factor in the cost model for cloud-based deployments.
We explore the design space of our technique through a prototype implementation on top of ZGC. A rigorous performance evaluation shows promising results.
Immutable data structures are a powerful tool for building concurrent programs. They allow the sharing of data without the need for locks or other synchronisation mechanisms. This makes it much easier to reason about the correctness of the program. In this paper, we focus on what we call deep immutability from freeze, that is, objects are initially mutable, and then can be frozen, and from that point on the object and everything it refers to (transitively) can no longer be mutated. A key challenge with this form of immutability is "how to manage the memory of cyclic data structures?" The standard approach is to use a garbage collector (GC), or a back-up cycle detector. These approaches sacrifice the promptness of memory reclamation, and the determinism of memory usage. In this paper, we argue that memory underlying an immutable data structure can be efficiently managed using reference counting even in the presence of cycles, based on the observation that the cycles are themselves immutable. Our approach takes a classic algorithm for calculating strongly connected components (SCCs) and managing equivalence classes with union-find (UF), and combines them so that the liveness of each SCC can be tracked efficiently using only a single reference counter. The key observation is that since the graph is unchanging, we can calculate the SCCs once, in time that is almost linear in the size of the graph, and then use the result to reference count at the level of the SCCs. This gives precise reachability information, and does not require any backup mechanism to detect or handle cycles.
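A compressed Java sketch of the idea (ours; it omits the union-find machinery and the propagation of decrements between SCCs): once a structure is frozen, its strongly connected components are computed once with Tarjan's algorithm, and a single reference count is then maintained per SCC rather than per object.

```java
import java.util.*;

// Sketch: freeze-time SCC computation, then one reference count per SCC.
public class SccRefCount {
    final Map<Integer, List<Integer>> graph = new HashMap<>();   // object id -> references
    final Map<Integer, Integer> sccOf = new HashMap<>();         // object id -> SCC id
    final Map<Integer, Integer> rc = new HashMap<>();            // SCC id -> reference count

    // Tarjan bookkeeping; the graph never changes after freeze(), so this runs once.
    int index = 0;
    final Deque<Integer> stack = new ArrayDeque<>();
    final Map<Integer, Integer> idx = new HashMap<>(), low = new HashMap<>();
    final Set<Integer> onStack = new HashSet<>();

    void addEdge(int from, int to) {
        graph.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
        graph.computeIfAbsent(to, k -> new ArrayList<>());
    }

    void freeze() { for (int v : graph.keySet()) if (!idx.containsKey(v)) strongConnect(v); }

    void strongConnect(int v) {
        idx.put(v, index); low.put(v, index); index++;
        stack.push(v); onStack.add(v);
        for (int w : graph.get(v)) {
            if (!idx.containsKey(w)) { strongConnect(w); low.merge(v, low.get(w), Integer::min); }
            else if (onStack.contains(w)) low.merge(v, idx.get(w), Integer::min);
        }
        if (low.get(v).equals(idx.get(v))) {          // v is the root of an SCC
            int sccId = v, w;
            do { w = stack.pop(); onStack.remove(w); sccOf.put(w, sccId); } while (w != v);
            rc.put(sccId, 0);
        }
    }

    // External references count against the SCC, not the individual object.
    // (Propagating decrements to SCCs referenced by a dead SCC is omitted here.)
    void incRef(int obj) { rc.merge(sccOf.get(obj), 1, Integer::sum); }
    void decRef(int obj) {
        int scc = sccOf.get(obj);
        if (rc.merge(scc, -1, Integer::sum) == 0) System.out.println("SCC " + scc + " reclaimed");
    }

    public static void main(String[] args) {
        SccRefCount heap = new SccRefCount();
        heap.addEdge(1, 2); heap.addEdge(2, 1);   // an immutable cycle {1, 2}
        heap.addEdge(2, 3);                       // which references a separate object 3
        heap.freeze();
        heap.incRef(1); heap.decRef(2);           // the whole cycle shares one counter
    }
}
```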
Sustainable computing needs energy-efficient software. This paper explores the potential of leveraging the nature of software written in managed languages: increasing energy efficiency by changing a program’s memory management strategy without altering a single line of code. To this end, we perform comprehensive energy profiling of 35 Java applications across four benchmarks. In many cases, we find that it is possible to save energy by replacing the default G1 collector with another without sacrificing performance. Furthermore, potential energy savings can be even higher if performance regressions are permitted. Inspired by these results, we study what the most energy-efficient GCs are to help developers prune the search space for energy profiling at a low cost. Finally, we show that machine learning can be successfully applied to the problem of finding an energy-efficient GC configuration for an application, reducing the cost even further.
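For example, on HotSpot JVMs the collector is chosen purely through launch-time flags such as -XX:+UseG1GC, -XX:+UseParallelGC, -XX:+UseZGC or -XX:+UseSerialGC, so an energy experiment only varies the launch command; the small Java program below (ours, not part of the study's tooling) simply reports which collectors the running JVM ended up with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Report the collectors active in this JVM; the choice is made with -XX:+Use...GC
// flags at launch time, with no change to application code.
public class ShowGC {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName());
        }
    }
}
```

Running, say, java -XX:+UseZGC ShowGC and java -XX:+UseParallelGC ShowGC exercises two different collectors on unchanged application code, which is the setting the profiling in this paper exploits.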