IceCube is a one-gigaton instrument located at the geographic South Pole, designed to detect cosmic neutrinos, identify the particle nature of dark matter, and study high-energy neutrinos themselves. Simulation of the IceCube detector and processing of data require a significant amount of computational resources. This paper presents the first detailed description of IceProd, a lightweight distributed management system designed to meet these requirements. It is driven by a central database in order to manage mass production of simulations and analysis of data produced by the IceCube detector. IceProd runs as a separate layer on top of other middleware and can take advantage of a variety of computing resources, including grids and batch systems such as CREAM, HTCondor, and PBS. This is accomplished by a set of dedicated daemons that process job submission in a coordinated fashion through the use of middleware plugins that serve to abstract the details of job submission and job management from the framework. (C) 2014 Elsevier Inc. All rights reserved.
Deductive verification has been successful in verifying interesting properties of real-world programs. One notable gap is the limited support for floating-point reasoning. This is unfortunate, as floating-point arithmetic is particularly unintuitive to reason about due to rounding as well as the presence of the special values infinity and ‘Not a Number’ (NaN). In this article, we present the first floating-point support in a deductive verification tool for the Java programming language. Our support in the KeY verifier handles floating-point arithmetics, transcendental functions, and potentially rounding-type casts. We achieve this with a combination of delegation to external SMT solvers on the one hand, and KeY-internal, rule-based reasoning on the other hand, exploiting the complementary strengths of both worlds. We evaluate this integration on new benchmarks and show that this approach is powerful enough to prove the absence of floating-point special values—often a prerequisite for correct programs—as well as functional properties, for realistic benchmarks.
Representational Similarity Analysis (RSA) is a technique developed by neuroscientists for comparing activity patterns of different measurement modalities (e.g., fMRI, electrophysiology, behavior). As a framework, RSA has several advantages over existing approaches to interpretation of language encoders based on probing or diagnostic classification: namely, it does not require large training samples, is not prone to overfitting, and it enables a more transparent comparison between the representational geometries of different models and modalities. We demonstrate the utility of RSA by establishing a previously unknown correspondence between widely-employed pre-trained language encoders and human processing difficulty via eye-tracking data, showcasing its potential in the interpretability toolbox for neural models.
Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE, putting into question the importance of word order information. Somewhat counter-intuitively, some of these studies also report that position embeddings appear to be crucial for models' good performance with shuffled text. We probe these language models for word order information and investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order. We show this is in part due to a subtlety in how shuffling is implemented in previous work - before rather than after subword segmentation. Surprisingly, we find even Language models trained on text shuffled after subword segmentation retain some semblance of information about word order because of the statistical dependencies between sentence length and unigram probabilities. Finally, we show that beyond GLUE, a variety of language understanding tasks do require word order information, often to an extent that cannot be learned through fine-tuning.
We introduce the simple and efficient method of monotonic abstraction to prove safety properties for parameterized systems with linear topologies. A process in the system is a finite-state automaton, where the transitions are guarded by both local and global conditions. Processes may communicate via broadcast, rendez-vous and shared variables over finite domains. The method of monotonic abstraction derives an over-approximation of the induced transition system that allows the use of a simple class of regular expressions as a symbolic representation. Compared to traditional regular model checking methods, the analysis does not require the manipulation of transducers, and hence its simplicity and efficiency. We have implemented a prototype that works well on several mutual exclusion algorithms and cache coherence protocols
We consider parameterized verification of concurrent programs under the Total Store Order (TSO) semantics. A program consists of a set of processes that share a set of variables on which they can perform read and write operations. We show that the reachability problem for a system consisting of an arbitrary number of identical processes is PSPACE-complete. We prove that the complexity is reduced to polynomial time if the processes are not allowed to read the initial values of the variables in the memory. When the processes are allowed to perform atomic read-modify-write operations, the reachability problem has a non-primitive recursive complexity.
We extend the classical model of lossy channel systems by considering systems that operate on a finite set of variables ranging over an infinite data domain. Furthermore, each message inside a channel is equipped with a data item representing its value. Although we restrict the model by allowing the variables to be only tested for (dis-)equality, we show that the state reachability problem is undecidable. In light of this negative result, we consider bounded-phase reachability, where the processes are restricted to performing either send or receive operations during each phase. We show decidability of state reachability in this case by computing a symbolic encoding of the set of system configurations that are reachable from a given configuration.
Partial order reduction has traditionally been based on persistent sets, ample sets, stubborn sets, or variants thereof. Recently, we have presented a strengthening of this foundation, using source sets instead of persistent/ample/stubborn sets. Source sets subsume persistent sets and are often smaller than persistent sets. We introduced source sets as a basis for Dynamic Partial Order Reduction (DPOR), in a framework which assumes that processes are deterministic and that all program executions are finite. In this paper, show how to use source sets for partial order reduction in a framework which does not impose these restrictions. We also compare source sets with persistent sets, providing some insights into conditions under which source sets and persistent sets do or do not differ.
We present Probabilistic Total Store Ordering (PTSO) - a probabilistic extension of the classical TSO semantics. For a given (finite-state) program, the operational semantics of PTSO induces an infinite-state Markov chain. We resolve the inherent non-determinism due to process schedulings and memory updates according to given probability distributions. We provide a comprehensive set of results showing the decidability of several properties for PTSO, namely (i) Almost-Sure (Repeated) Reachability: whether a run, starting from a given initial configuration, almost surely visits (resp. almost surely repeatedly visits) a given set of target configurations. (ii) Almost-Never (Repeated) Reachability: whether a run from the initial configuration, almost never visits (resp. almost never repeatedly visits) the target. (iii) Approximate Quantitative (Repeated) Reachability: to approximate, up to an arbitrary degree of precision, the measure of runs that start from the initial configuration and (repeatedly) visit the target. (iv) Expected Average Cost: to approximate, up to an arbitrary degree of precision, the expected average cost of a run from the initial configuration to the target. We derive our results through a nontrivial combination of results from the classical theory of (infinite-state) Markov chains, the theories of decisive and eager Markov chains, specific techniques from combinatorics, as well as, decidability and complexity results for the classical (non-probabilistic) TSO semantics. As far as we know, this is the first work that considers probabilistic verification of programs running on weak memory models.
We consider the problem of safety verification, formalized as control-state reachability, for concurrent programs running on the Power architecture. Our main result shows that safety verification under Power is undecidable for programs with just two threads.
We consider the verification of concurrent programs and, in particular, the challenges that arise because modern platforms only guarantee weak semantics, i.e., semantics that are weaker than the classical Sequential Consistency (SC). We describe two architectural concepts that give rise to weak semantics, namely weak consistency and weak persistency. The former defines the order in which operations issued by a given process become visible to the rest of the processes. The latter prescribes the order in which data becomes persistent. To deal with the extra complexity in program behaviors that arises due to weak semantics, we propose translating the program verification problem under weak semantics to SC. The main principle is to augment the program with a set of (unbounded) data structures that guarantee the equivalence of the source program’s behavior under the weak semantics with the augmented program’s behavior under the SC semantics. Such an equivalence opens the door to leverage, albeit in a non-trivial manner, the rich set of techniques that we have developed over the years for program verification under the SC semantics. We illustrate the framework’s potential by considering the persistent version of the well-known Total Store Order semantics. We show that we can capture the program behaviors on such a platform using a finite set of unbounded monotone FIFO buffers. The use of monotone FIFO buffers allows the use of the well-structured-systems framework to prove the decidability of the reachability problem.
We address the problem of verifying the reachability problem in programs running under the formal model Px86 defined recently by Raad et al. in POPL'20 for the persistent Intel x86 architecture. We prove that this problem is decidable. To achieve that, we provide a new formal model that is equivalent to Px86 and that has the feature of being a well structured system. Deriving this new model is the result of a deep investigation of the properties of Px86 and the interplay of its components.
The verification of reachability properties for programs under weak memory models is a hard problem, even undecidable in some cases. The decidability of this problem has been investigated so far in the case of static programs where the number of threads does not change during execution. However, dynamic thread creation is crucial in asynchronous concurrent programming. In this paper, we address the decidability of the reachability problem for dynamic concurrent programs running under TSO. An important issue when considering a TSO model in this case is maintaining causality precedence between operations issued by threads and those issued by their children. We propose a general TSO model that respects causality and prove that the reachability problem for programs with dynamic creation of threads is decidable.
String-number conversion is an important class of constraints needed for the symbolic execution of string-manipulating programs. In particular solving string constraints with string-number conversion is necessary for the analysis of scripting languages such as JavaScript and Python, where string-number conversion is a part of the definition of the core semantics of these languages. However, solving this type of constraint is very challenging for the state-of-the-art solvers. We propose in this paper an approach that can efficiently support both string-number conversion and other common types of string constraints. Experimental results show that it significantly outperforms other state-of-the-art tools on benchmarks that involves string-number conversion.
Not-substring is currently among the least supported types of string constraints, and existing solvers use only relatively crude heuristics. Yet, not-substring occurs relatively often in practical examples and is useful in encoding other types of constraints. In this paper, we propose a systematic way to solve not-substring using based on flat abstraction. In this framework, the domain of string variables is restricted to flat languages and subsequently the whole constraints can be expressed as linear arithmetic formulae. We show that non-substring constraints can be flattened efficiently, and provide experimental evidence that the proposed solution for not-substring is competitive with the state of the art string solvers.
We address the separability problem for straight-line string constraints. The separability problem for languages of a class C by a class S asks: given two languages A and B in C, does there exist a language I in S separating A and B (i.e., I is a superset of A and disjoint from B)? The separability of string constraints is the same as the fundamental problem of interpolation for string constraints. We first show that regular separability of straight line string constraints is undecidable. Our second result is the decidability of the separability problem for straight-line string constraints by piece-wise testable languages, though the precise complexity is open. In our third result, we consider the positive fragment of piece-wise testable languages as a separator, and obtain an ExpSpace algorithm for the separability of a useful class of straight-line string constraints, and a Pspace-hardness result.
We present a preliminary formalization based on transition systems of decentralized contact tracing protocols for smart devices equipped with Bluetooth trasmitters. In our model the behaviour of individual users, via their app, is modelled as a timed automata with a local unbounded memory. Protocol configurations consist of the current state of a shared server and a finite set of local states containing the states of individual users. The transition system models the interaction between devices in the same physical location and between a sigle device and the shared server. In the paper we address different research directions concerning semi-automated verification based on automated reasoning tools of the considered class of protocols, theoretical issues related to the expressiveness of the resulting class of formal models, and data-driven analysis of the logs collected on the server as well as on user devices.
We present a new approach for stateless model checking (SMC) of multithreaded programs under Sequential Consistency (SC) semantics. To combat state-space explosion, SMC is often equipped with a partial-order reduction technique, which defines an equivalence on executions, and only needs to explore one execution in each equivalence class. Recently, it has been observed that the commonly used equivalence of Mazurkiewicz traces can be coarsened but still cover all program crashes and assertion violations. However, for this coarser equivalence, which preserves only the reads-from relation from writes to reads, there is no SMC algorithm which is (i) optimal in the sense that it explores precisely one execution in each reads-from equivalence class, and (ii) efficient in the sense that it spends polynomial effort per class. \end{inparaenum} We present the first SMC algorithm for SC that is both optimal and efficient in practice, meaning that it spends polynomial time per equivalence class on all programs that we have tried. This is achieved by a novel test that checks whether a given reads-from relation can arise in some execution. Our experimental results show that Nidhugg/rfsc, although slower than the fastest SMC tools in programs where tools happen to examine the same number of executions, always scales similarly or better than them, and outperforms them by an exponential factor in programs where the reads-from equivalence is coarser than the standard one. We also present two non-trivial use cases where the new equivalence is particularly effective, as well as the significant performance advantage that Nidhugg/rfsc offers compared to state-of-the-art SMC and systematic concurrency testing tools.
In this paper, we give a step by step introduction to the theory of well quasi-ordered transition systems. The framework combines two concepts, namely (i) transition systems which are monotonic wrt. a well-quasi ordering; and (ii) a scheme for symbolic backward reachability analysis. We describe several models with infinite-state spaces, which can be analyzed within the framework, e.g., Petri nets, lossy channel systems, timed automata, timed Petri nets, and multiset rewriting systems. We will also present better quasi-ordered transition systems which allow the design of efficient symbolic representations of infinite sets of states.
A formal model called database manipulating systems was introduced to model data-aware dynamic systems. Its semantics is given by an infinite labelled transition systems where a label can be an unbounded relational database. Reachability problem is undecidable over schemas consisting of either a binary relation or two unary relations. We study the reachability problem under schema restrictions and restrictions on the query language. We provide tight complexity bounds for different combinations of schema and query language, by reductions to/from standard formalism of infinite state systems such as Petri nets and counter systems. Our reductions throw light into the connections between these two seemingly unrelated models.
We address the verification of concurrent programs running under the release-acquire (RA) semantics. We show that the reachability problem is undecidable even in the case where the input program is finite-state. Given this undecidability, we follow the spirit of the work on context-bounded analysis for detecting bugs in programs under the classical SC model, and propose an under-approximate reachability analysis for the case of RA. To this end, we propose a novel notion, called view-switching, and provide a code-to-code translation from an input program under RA to a program under SC. This leads to a reduction, in polynomial time, of the bounded view-switching reachability problem under RA to the bounded context-switching problem under SC. We have implemented a prototype tool VBMC and tested it on a set of benchmarks, demonstrating that many bugs in programs can be found using a small number of view switches.
We address the problem of verifying safety properties of concurrent programs running over the Total Store Order (TSO) memory model. Known decision procedures for this model are based on complex encodings of store buffers as lossy channels. These procedures assume that the number of processes is fixed. However, it is important in general to prove the correctness of a system/algorithm in a parametric way with an arbitrarily large number of processes.
In this paper, we introduce an alternative (yet equivalent) semantics to the classical one for the TSO semantics that is more amenable to efficient algorithmic verification and for the extension to parametric verification. For that, we adopt a dual view where load buffers are used instead of store buffers. The flow of information is now from the memory to load buffers. We show that this new semantics allows (1) to simplify drastically the safety analysis under TSO, (2) to obtain a spectacular gain in efficiency and scalability compared to existing procedures, and (3) to extend easily the decision procedure to the parametric case, which allows obtaining a new decidability result, and more importantly, a verification algorithm that is more general and more efficient in practice than the one for bounded instances.
We introduce TRAU, an SMT solver for an expressive constraint language, including word equations, length constraints, context-free membership queries, and transducer constraints. The satisfiability problem for such a class of constraints is in general undecidable. The key idea behind TRAU is a technique called flattening, which searches for satisfying assignments that follow simple patterns. TRAU implements a Counter-Example Guided Abstraction Refinement (CEGAR) framework which contains both an under- and an over-approximation module. The approximations are refined in an automatic manner by information flow between the two modules. The technique implemented by TRAU can handle a rich class of string constraints and has better performance than state-of-the-art string solvers.
We present version 1.0 of the Norn SMT solver for string constraints. Norn is a solver for an expressive constraint language, including word equations, length constraints, and regular membership queries. As a feature distinguishing Norn from other SMT solvers, Norn is a decision procedure under the assumption of a set of acyclicity conditions on word equations, without any restrictions on the use of regular membership.
We consider the reachability problem for finite-state multi-threaded programs under the promising semantics (PS 2.0) of Lee et al., which captures most common program transformations. Since reachability is already known to be undecidable in the fragment of PS 2.0 with only release-acquire accesses (PS 2.0-ra), we consider the fragment with only relaxed accesses and promises (PS 2.0-rlx). We show that reachability under PS 2.0-rlx is undecidable in general and that it becomes decidable, albeit non-primitive recursive, if we bound the number of promises. Given these results, we consider a bounded version of the reachability problem. To this end, we bound both the number of promises and of "view-switches", i.e., the number of times the processes may switch their local views of the global memory. We provide a code-to-code translation from an input program under PS 2.0 (with relaxed and release-acquire memory accesses along with promises) to a program under SC, thereby reducing the bounded reachability problem under PS 2.0 to the bounded context-switching problem under SC. We have implemented a tool and tested it on a set of benchmarks, demonstrating that typical bugs in programs can be found with a small bound.
We present the first framework for efficient application of stateless model checking (SMC) to programs running under the relaxed memory model of POWER. The framework combines several contributions. The first contribution is that we develop a scheme for systematically deriving operational execution models from existing axiomatic ones. The scheme is such that the derived execution models are well suited for efficient SMC. We apply our scheme to the axiomatic model of POWER from [8]. Our main contribution is a technique for efficient SMC, called Relaxed Stateless Model Checking (RSMC), which systematically explores the possible inequivalent executions of a program. RSMC is suitable for execution models obtained using our scheme. We prove that RSMC is sound and optimal for the POWER memory model, in the sense that each complete program behavior is explored exactly once. We show the feasibility of our technique by providing an implementation for programs written in C/pthreads.
We describe at a high-level the main concepts in the Release-Acquire (RA) semantics that is part of the C11 language. Furthermore, we describe the ideas behind an optimal dynamic partial order reduction technique that can be used for systematic analysis of concurrent programs running under RA. This tutorial is based on the material presented in [5], which also contains the formal definitions of all the models, concepts, and algorithms.