IceCube is a one-gigaton instrument located at the geographic South Pole, designed to detect cosmic neutrinos, identify the particle nature of dark matter, and study high-energy neutrinos themselves. Simulation of the IceCube detector and processing of data require a significant amount of computational resources. This paper presents the first detailed description of IceProd, a lightweight distributed management system designed to meet these requirements. It is driven by a central database in order to manage mass production of simulations and analysis of data produced by the IceCube detector. IceProd runs as a separate layer on top of other middleware and can take advantage of a variety of computing resources, including grids and batch systems such as CREAM, HTCondor, and PBS. This is accomplished by a set of dedicated daemons that process job submission in a coordinated fashion through the use of middleware plugins that serve to abstract the details of job submission and job management from the framework.
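The plugin pattern described above can be sketched in a few lines. The following is an illustrative Python sketch, not IceProd's actual API: all class and method names here are hypothetical, and a real plugin would shell out to the batch system's submission tools.

```python
from abc import ABC, abstractmethod

class BatchPlugin(ABC):
    """Hypothetical middleware plugin interface: hides the details of
    one batch system (e.g. HTCondor, PBS) behind uniform methods."""

    @abstractmethod
    def submit(self, job_script: str) -> str:
        """Submit a job and return a backend-specific job id."""

    @abstractmethod
    def status(self, job_id: str) -> str:
        """Return a normalized status such as 'queued' or 'running'."""

class CondorPlugin(BatchPlugin):
    def submit(self, job_script: str) -> str:
        # A real plugin would invoke condor_submit here.
        return "condor-0001"

    def status(self, job_id: str) -> str:
        return "queued"

def run_production(plugin: BatchPlugin, scripts):
    # The framework sees only the abstract interface, so swapping
    # HTCondor for PBS means swapping the plugin, nothing else.
    return [plugin.submit(s) for s in scripts]

print(run_production(CondorPlugin(), ["sim_job.sh"]))
```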
Differentiation of oligodendroglia lineage cells in humans remains largely unclear. Oligodendrocyte progenitor cells (OPCs) are known to participate in remyelination by proliferating, migrating to the area of the lesion, and then differentiating into oligodendrocytes (OLs), which can remyelinate the affected axons. This has sparked an interest in OPCs, since cell transplantation could be a potential therapy for demyelinating diseases such as multiple sclerosis. However, remyelination is not their only relevant aspect: OPCs have been shown to comprise heterogeneous populations with different functions, such as participating in immunological processes or responses to injury.
Single-cell technologies have become a powerful tool for identifying unknown functions of OPCs and characterizing the evolution of the oligodendroglia lineage. In this project, we analysed single-nuclei data of human foetal brain samples. For most steps of this pipeline, we used the Scanpy toolbox. To mitigate batch effects in our data, we applied the Harmony algorithm, although the Harmony-corrected principal components still retained part of the batch bias. Leiden graph-based clustering resulted in a total of 19 clusters, 14 of which we were able to annotate successfully. Annotation combined differential expression analysis with literature markers from public datasets. We obtained a single OPC cluster in our data, but marker gene expression suggests that not all cells within this cluster are equally mature; some of them appear closer to commitment to an OL fate. This hypothesis would have to be confirmed by lineage inference analysis, which we could not include in this study. Finally, validation of our annotation with label transfer, performed in Seurat, gave mixed results depending on the dataset used. A possible explanation for these results is sensitivity to differences between plate-based and droplet-based technologies for library preparation before sequencing. OPCs were successfully transferred regardless of the dataset used, so we can be confident of their identity.
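The pipeline described above maps onto a handful of Scanpy calls. Below is a minimal sketch of the analysis steps named in the abstract; the input file name, batch key, and all parameter values are placeholders, not the study's actual settings.

```python
import scanpy as sc

adata = sc.read_h5ad("foetal_brain_nuclei.h5ad")  # placeholder path

# Standard preprocessing with the Scanpy toolbox.
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)
sc.pp.highly_variable_genes(adata, n_top_genes=2000)
sc.pp.pca(adata, n_comps=50)

# Batch correction with Harmony on the principal components.
sc.external.pp.harmony_integrate(adata, key="batch")

# Neighbourhood graph and Leiden clustering on the corrected space.
sc.pp.neighbors(adata, use_rep="X_pca_harmony")
sc.tl.leiden(adata, resolution=1.0)

# Differential expression per cluster to support manual annotation.
sc.tl.rank_genes_groups(adata, groupby="leiden", method="wilcoxon")
```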
The field of deep learning has become increasingly important for particle physics experiments, yielding a multitude of advances, predominantly in event classification and reconstruction tasks. Many of these applications have been adopted from other domains. However, data in the field of physics are unique in the context of machine learning, insofar as their generation process and the laws and symmetries they abide by are usually well understood. Most commonly used deep learning architectures fail to exploit this available information. In contrast, more traditional likelihood-based methods are capable of exploiting domain knowledge, but they are often limited by computational complexity.
In this contribution, a hybrid approach is presented that utilizes generative neural networks to approximate the likelihood, which may then be used in a traditional maximum-likelihood setting. Domain knowledge, such as invariances and detector characteristics, can easily be incorporated in this approach. The hybrid approach is illustrated by the example of event reconstruction in IceCube.
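The hybrid scheme reduces to a simple recipe: a learned log-likelihood is handed to an off-the-shelf maximizer. The toy sketch below replaces the generative network with an analytic Gaussian stand-in so that it runs self-contained; in the actual approach, `log_likelihood` would be the neural surrogate evaluated on detector data.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Stand-in for a generative network's output: a Gaussian log-likelihood
# whose location depends on the physics parameter theta.
def log_likelihood(theta, data):
    return -0.5 * np.sum((data - theta) ** 2)

# "Observed" event: samples around a true parameter value.
true_theta = 3.0
data = true_theta + rng.normal(size=100)

# Classical maximum-likelihood reconstruction on top of the surrogate.
result = minimize(lambda t: -log_likelihood(t[0], data), x0=[0.0])
print("reconstructed theta:", result.x[0])
```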
Deductive verification has been successful in verifying interesting properties of real-world programs. One notable gap is the limited support for floating-point reasoning. This is unfortunate, as floating-point arithmetic is particularly unintuitive to reason about due to rounding as well as the presence of the special values infinity and ‘Not a Number’ (NaN). In this article, we present the first floating-point support in a deductive verification tool for the Java programming language. Our support in the KeY verifier handles floating-point arithmetic, transcendental functions, and (potentially rounding) type casts. We achieve this with a combination of delegation to external SMT solvers on the one hand, and KeY-internal, rule-based reasoning on the other hand, exploiting the complementary strengths of both worlds. We evaluate this integration on new benchmarks and show that this approach is powerful enough to prove the absence of floating-point special values, which is often a prerequisite for correct programs, as well as functional properties, for realistic benchmarks.
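The behaviours that make floating-point reasoning unintuitive are easy to exhibit; the following Python snippet (IEEE-754 doubles, matching Java's `double` semantics) shows why proving the absence of special values is a meaningful property.

```python
# Rounding: common decimal literals are not exactly representable.
print(0.1 + 0.2 == 0.3)              # False

# NaN compares unequal to everything, including itself.
nan = float("nan")
print(nan == nan)                    # False

# Infinity arises from overflow, NaN from undefined operations.
print(1e308 * 10)                    # inf
print(float("inf") - float("inf"))   # nan
```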
Periodic signals can be modeled as a real wave with unknown period in cascade with a piecewise linear function. In this report, a recursive Gauss-Newton prediction error identification algorithm is introduced for joint estimation of the driving frequency and the parameters of the nonlinear output function, parameterized in a number of adaptively estimated grid points. The Cramér-Rao bound (CRB) is derived for the suggested algorithm. Numerical examples indicate that the suggested algorithm outperforms fixed-grid-point algorithms and can easily be modified to track both fundamental frequency variations and time-varying amplitude.
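As a rough illustration of the recursive Gauss-Newton prediction-error idea, the sketch below tracks only the amplitude, phase, and frequency of a noisy sinusoid, omitting the adaptively estimated grid-point nonlinearity of the actual algorithm; all constants are illustrative, and like any such scheme it needs a reasonable initial frequency guess, since the prediction-error cost is multi-modal in the frequency.

```python
import numpy as np

rng = np.random.default_rng(1)
n, omega_true = 1000, 0.2
t = np.arange(n)
y = 2.0 * np.sin(omega_true * t + 0.5) + 0.1 * rng.normal(size=n)

theta = np.array([1.0, 0.0, 0.19])   # [amplitude, phase, frequency]
P = np.eye(3) * 100.0                # inverse-information estimate
lam = 0.99                           # forgetting factor

for k in range(n):
    A, phi, w = theta
    arg = w * t[k] + phi
    # Gradient of the one-step prediction w.r.t. [A, phi, w].
    psi = np.array([np.sin(arg), A * np.cos(arg), A * t[k] * np.cos(arg)])
    eps = y[k] - A * np.sin(arg)
    P = (P - np.outer(P @ psi, psi @ P) / (lam + psi @ P @ psi)) / lam
    theta = theta + P @ psi * eps

print("estimated frequency:", theta[2])   # typically near 0.2
```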
Periodic signals can be modeled by means of second-order nonlinear ordinary differential equations (ODEs). The right-hand side function of the ODE is parameterized in terms of known basis functions. The least squares algorithm developed for estimating the coefficients of these basis functions gives biased estimates, especially at low signal-to-noise ratios. This is due to noise contributions to the periodic signal and its derivatives, which are evaluated using finite difference approximations. In this paper, an analysis of this bias is given.
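The source of the bias is easy to reproduce numerically: finite differencing amplifies measurement noise by a factor of order 1/h, so any quantity quadratic in the differentiated signal acquires a positive offset. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
h = 0.01
t = np.arange(0, 10, h)
clean = np.sin(t)
noisy = clean + 0.05 * rng.normal(size=t.size)

# Central finite-difference derivatives.
d_clean = np.gradient(clean, h)
d_noisy = np.gradient(noisy, h)

# The mean squared derivative is biased upward for the noisy signal:
# differencing scales the noise variance by O(1/h^2).
print(np.mean(d_clean**2))   # about 0.5 (the mean of cos^2)
print(np.mean(d_noisy**2))   # substantially larger
```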
Representational Similarity Analysis (RSA) is a technique developed by neuroscientists for comparing activity patterns of different measurement modalities (e.g., fMRI, electrophysiology, behavior). As a framework, RSA has several advantages over existing approaches to interpretation of language encoders based on probing or diagnostic classification: namely, it does not require large training samples, is not prone to overfitting, and it enables a more transparent comparison between the representational geometries of different models and modalities. We demonstrate the utility of RSA by establishing a previously unknown correspondence between widely-employed pre-trained language encoders and human processing difficulty via eye-tracking data, showcasing its potential in the interpretability toolbox for neural models.
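The core computation of RSA fits in a few lines: build a representational dissimilarity matrix (RDM) per modality and rank-correlate them. The sketch below uses random matrices as stand-ins for encoder states and eye-tracking features over the same stimuli.

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(3)

# Two modalities measured over the same 20 stimuli, e.g.
# encoder states (dim 300) and eye-tracking features (dim 5).
model_repr = rng.normal(size=(20, 300))
human_repr = rng.normal(size=(20, 5))

# Representational dissimilarity matrices, in condensed form:
# pairwise distances between stimuli within each modality.
rdm_model = pdist(model_repr, metric="correlation")
rdm_human = pdist(human_repr, metric="correlation")

# RSA score: rank correlation between the two RDMs.
rho, p = spearmanr(rdm_model, rdm_human)
print(f"RSA (Spearman rho) = {rho:.3f}, p = {p:.3f}")
```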
Recent studies have shown that language models pretrained and/or fine-tuned on randomly permuted sentences exhibit competitive performance on GLUE, putting into question the importance of word order information. Somewhat counter-intuitively, some of these studies also report that position embeddings appear to be crucial for models' good performance with shuffled text. We probe these language models for word order information and investigate what position embeddings learned from shuffled text encode, showing that these models retain information pertaining to the original, naturalistic word order. We show this is in part due to a subtlety in how shuffling is implemented in previous work: before rather than after subword segmentation. Surprisingly, we find that even language models trained on text shuffled after subword segmentation retain some semblance of information about word order because of statistical dependencies between sentence length and unigram probabilities. Finally, we show that beyond GLUE, a variety of language understanding tasks do require word order information, often to an extent that cannot be learned through fine-tuning.
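The segmentation subtlety can be made concrete with a toy example. The splitting rule below is a hypothetical stand-in for BPE: shuffling before segmentation keeps each word's subwords adjacent and in order, which itself leaks local word-order information, while shuffling after segmentation destroys that adjacency too.

```python
import random

random.seed(4)

def segment(word):
    # Hypothetical stand-in for subword segmentation such as BPE:
    # split long words into two pieces.
    if len(word) <= 4:
        return [word]
    mid = len(word) // 2
    return [word[:mid] + "@@", word[mid:]]

sentence = "the transformer retains positional information".split()

# Shuffle BEFORE segmentation: subwords of a word stay adjacent.
words = sentence[:]
random.shuffle(words)
print([p for w in words for p in segment(w)])

# Shuffle AFTER segmentation: subword adjacency is destroyed as well.
pieces = [p for w in sentence for p in segment(w)]
random.shuffle(pieces)
print(pieces)
```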
We introduce the simple and efficient method of monotonic abstraction to prove safety properties for parameterized systems with linear topologies. A process in the system is a finite-state automaton, where the transitions are guarded by both local and global conditions. Processes may communicate via broadcast, rendez-vous and shared variables over finite domains. The method of monotonic abstraction derives an over-approximation of the induced transition system that allows the use of a simple class of regular expressions as a symbolic representation. Compared to traditional regular model checking methods, the analysis does not require the manipulation of transducers, which accounts for its simplicity and efficiency. We have implemented a prototype that works well on several mutual exclusion algorithms and cache coherence protocols.
We consider parameterized verification of concurrent programs under the Total Store Order (TSO) semantics. A program consists of a set of processes that share a set of variables on which they can perform read and write operations. We show that the reachability problem for a system consisting of an arbitrary number of identical processes is PSPACE-complete. We prove that the complexity is reduced to polynomial time if the processes are not allowed to read the initial values of the variables in the memory. When the processes are allowed to perform atomic read-modify-write operations, the reachability problem has a non-primitive recursive complexity.
We consider symbolic verification for a class of parameterized systems, where a system consists of a linear array of processes, and where an action of a process may in general be guarded by both local conditions restricting the state of the process about to perform the action, and global conditions defining the context in which the action is enabled. Such actions are present, e.g., in idealized versions of mutual exclusion protocols, such as the bakery and ticket algorithms by Lamport, Burns’ protocol, Dijkstra’s algorithm, and Szymanski’s algorithm. The presence of both local and global conditions makes the parameterized versions of these protocols infeasible to analyze fully automatically using existing model checking methods for parameterized systems. In all these methods the actions are guarded only by local conditions involving the states of a finite set of processes. We perform verification using a standard symbolic reachability algorithm enhanced by an operation to accelerate the search of the state space. The acceleration operation computes the effect of an arbitrary number of applications of an action, rather than a single application. This is crucial for convergence of the analysis, e.g., when applying the algorithm to the above protocols. We illustrate the use of our method through an application to Szymanski’s algorithm.
We extend the classical model of lossy channel systems by considering systems that operate on a finite set of variables ranging over an infinite data domain. Furthermore, each message inside a channel is equipped with a data item representing its value. Although we restrict the model by allowing the variables to be only tested for (dis-)equality, we show that the state reachability problem is undecidable. In light of this negative result, we consider bounded-phase reachability, where the processes are restricted to performing either send or receive operations during each phase. We show decidability of state reachability in this case by computing a symbolic encoding of the set of system configurations that are reachable from a given configuration.
The goal of parameterized verification is to prove the correctness of a system specification regardless of the number of its components. The problem is of interest in several different areas: verification of hardware design, multithreaded programs, distributed systems, and communication protocols. The problem is undecidable in general. Solutions for restricted classes of systems and properties have been studied in areas like theorem proving, model checking, automata and logic, process algebra, and constraint solving. In this introduction to the special issue, dedicated to a selection of works from the Parameterized Verification workshop PV '14 and PV '15, we survey some of the works developed in this research area.
We propose a formal model of concurrent systems in which the history of a computation is explicitly represented as a collection of events that provide a view of a sequence of configurations. In our model events generated by transitions become part of the system configurations leading to operational semantics with historical data. This model allows us to formalize what is usually done in symbolic verification algorithms. Indeed, search algorithms often use meta-information, e.g., names of fired transitions, selected processes, etc., to reconstruct (error) traces from symbolic state exploration. The other interesting point of the proposed model is related to a possible new application of the theory of well-structured transition systems (wsts). In our setting wsts theory can be applied to formally extend the class of properties that can be verified using coverability to take into consideration (ordered and unordered) historical data. This can be done by using different types of representation of collections of events and by combining them with wsts by using closure properties of well-quasi orderings.
Partial order reduction has traditionally been based on persistent sets, ample sets, stubborn sets, or variants thereof. Recently, we have presented a strengthening of this foundation, using source sets instead of persistent/ample/stubborn sets. Source sets subsume persistent sets and are often smaller than persistent sets. We introduced source sets as a basis for Dynamic Partial Order Reduction (DPOR), in a framework which assumes that processes are deterministic and that all program executions are finite. In this paper, we show how to use source sets for partial order reduction in a framework which does not impose these restrictions. We also compare source sets with persistent sets, providing some insights into conditions under which source sets and persistent sets do or do not differ.
Stateless model checking is a powerful technique for program verification, which however suffers from an exponential growth in the number of explored executions. A successful technique for reducing this number, while still maintaining complete coverage, is Dynamic Partial Order Reduction (DPOR). We present a new DPOR algorithm, which is the first to be provably optimal in that it always explores the minimal number of executions. It is based on a novel class of sets, called source sets, which replace the role of persistent sets in previous algorithms. First, we show how to modify an existing DPOR algorithm to work with source sets, resulting in an efficient and simple-to-implement algorithm. Second, we extend this algorithm with a novel mechanism, called wakeup trees, that allows it to achieve optimality. We have implemented both algorithms in a stateless model checking tool for Erlang programs. Experiments show that source sets significantly increase the performance and that wakeup trees incur only a small overhead in both time and space.
Stateless model checking is a powerful method for program verification that, however, suffers from an exponential growth in the number of explored executions. A successful technique for reducing this number, while still maintaining complete coverage, is Dynamic Partial Order Reduction (DPOR), an algorithm originally introduced by Flanagan and Godefroid in 2005 and since then not only used as a point of reference but also extended by various researchers. In this article, we present a new DPOR algorithm, which is the first to be provably optimal in that it always explores the minimal number of executions. It is based on a novel class of sets, called source sets, that replace the role of persistent sets in previous algorithms. We begin by showing how to modify the original DPOR algorithm to work with source sets, resulting in an efficient and simple-to-implement algorithm, called source-DPOR. Subsequently, we enhance this algorithm with a novel mechanism, called wakeup trees, that allows the resulting algorithm, called optimal-DPOR, to achieve optimality. Both algorithms are then extended to computational models where processes may disable each other, for example, via locks. Finally, we discuss tradeoffs of the source- and optimal-DPOR algorithms and present programs that illustrate significant time and space performance differences between them. We have implemented both algorithms in a publicly available stateless model checking tool for Erlang programs, while the source-DPOR algorithm is at the core of a publicly available stateless model checking tool for C/pthread programs running on machines with relaxed memory models. Experiments show that source sets significantly increase the performance of stateless model checking compared to using the original DPOR algorithm and that wakeup trees incur only a small overhead in both time and space in practice.
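The equivalence that all of these algorithms exploit can be demonstrated by brute force on a toy program: two interleavings are equivalent precisely when they agree on the order of every pair of dependent events (same thread, or same variable with at least one write). The sketch below, with a made-up three-thread program, counts interleavings versus equivalence classes; an optimal DPOR algorithm explores exactly one execution per class.

```python
from itertools import permutations

# Toy program: each thread is a list of (variable, access) events.
threads = [[("x", "w"), ("y", "r")],
           [("y", "w")],
           [("z", "w")]]

events = [(t, i) for t, es in enumerate(threads) for i in range(len(es))]

def dependent(e1, e2):
    (t1, i1), (t2, i2) = e1, e2
    if t1 == t2:                                # program order
        return True
    (v1, a1), (v2, a2) = threads[t1][i1], threads[t2][i2]
    return v1 == v2 and "w" in (a1, a2)         # conflicting accesses

total, classes = 0, set()
for run in permutations(events):
    # Keep only interleavings respecting each thread's program order.
    if any(run.index((t, i)) > run.index((t, i + 1))
           for t, es in enumerate(threads) for i in range(len(es) - 1)):
        continue
    total += 1
    # Signature: the relative order of every dependent pair of events.
    classes.add(frozenset((a, b) for k, a in enumerate(run)
                          for b in run[k + 1:] if dependent(a, b)))

print(total, "interleavings,", len(classes), "equivalence classes")
```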
We present Probabilistic Total Store Ordering (PTSO) - a probabilistic extension of the classical TSO semantics. For a given (finite-state) program, the operational semantics of PTSO induces an infinite-state Markov chain. We resolve the inherent non-determinism due to process schedulings and memory updates according to given probability distributions. We provide a comprehensive set of results showing the decidability of several properties for PTSO, namely (i) Almost-Sure (Repeated) Reachability: whether a run, starting from a given initial configuration, almost surely visits (resp. almost surely repeatedly visits) a given set of target configurations. (ii) Almost-Never (Repeated) Reachability: whether a run from the initial configuration, almost never visits (resp. almost never repeatedly visits) the target. (iii) Approximate Quantitative (Repeated) Reachability: to approximate, up to an arbitrary degree of precision, the measure of runs that start from the initial configuration and (repeatedly) visit the target. (iv) Expected Average Cost: to approximate, up to an arbitrary degree of precision, the expected average cost of a run from the initial configuration to the target. We derive our results through a nontrivial combination of results from the classical theory of (infinite-state) Markov chains, the theories of decisive and eager Markov chains, specific techniques from combinatorics, as well as decidability and complexity results for the classical (non-probabilistic) TSO semantics. As far as we know, this is the first work that considers probabilistic verification of programs running on weak memory models.
We consider the problem of safety verification, formalized as control-state reachability, for concurrent programs running on the Power architecture. Our main result shows that safety verification under Power is undecidable for programs with just two threads.
We consider the verification of concurrent programs and, in particular, the challenges that arise because modern platforms only guarantee weak semantics, i.e., semantics that are weaker than the classical Sequential Consistency (SC). We describe two architectural concepts that give rise to weak semantics, namely weak consistency and weak persistency. The former defines the order in which operations issued by a given process become visible to the rest of the processes. The latter prescribes the order in which data becomes persistent. To deal with the extra complexity in program behaviors that arises due to weak semantics, we propose translating the program verification problem under weak semantics to SC. The main principle is to augment the program with a set of (unbounded) data structures that guarantee the equivalence of the source program’s behavior under the weak semantics with the augmented program’s behavior under the SC semantics. Such an equivalence opens the door to leverage, albeit in a non-trivial manner, the rich set of techniques that we have developed over the years for program verification under the SC semantics. We illustrate the framework’s potential by considering the persistent version of the well-known Total Store Order semantics. We show that we can capture the program behaviors on such a platform using a finite set of unbounded monotone FIFO buffers. The use of monotone FIFO buffers allows the use of the well-structured-systems framework to prove the decidability of the reachability problem.
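The buffer construction can be prototyped directly. The sketch below is a small explicit-state explorer (not the paper's actual encoding) for the classic store-buffering litmus test: each process gets a FIFO write buffer, reads check the local buffer before memory, and buffered writes drain to memory nondeterministically. The exploration finds the outcome r0 = r1 = 0, which is unreachable under SC.

```python
# Store-buffering litmus test under TSO, explored explicitly.
programs = [[("w", "x", 1), ("r", "y", "r0")],
            [("w", "y", 1), ("r", "x", "r1")]]

def explore():
    init = ({"x": 0, "y": 0}, ((), ()), (0, 0), {})
    stack, seen, outcomes = [init], set(), set()
    while stack:
        mem, bufs, pcs, regs = stack.pop()
        key = (tuple(sorted(mem.items())), bufs, pcs,
               tuple(sorted(regs.items())))
        if key in seen:
            continue
        seen.add(key)
        if all(pc == len(prog) for pc, prog in zip(pcs, programs)) \
                and not any(bufs):
            outcomes.add((regs["r0"], regs["r1"]))
            continue
        for p in (0, 1):
            if bufs[p]:
                # Drain: commit process p's oldest buffered write.
                (var, val), rest = bufs[p][0], bufs[p][1:]
                nb = list(bufs); nb[p] = rest
                stack.append(({**mem, var: val}, tuple(nb), pcs, regs))
            if pcs[p] < len(programs[p]):
                op, var, arg = programs[p][pcs[p]]
                npcs = list(pcs); npcs[p] += 1
                if op == "w":
                    # Writes go to the process's own FIFO buffer.
                    nb = list(bufs); nb[p] = bufs[p] + ((var, arg),)
                    stack.append((mem, tuple(nb), tuple(npcs), regs))
                else:
                    # Reads forward from the own buffer, else memory.
                    hits = [v for w, v in bufs[p] if w == var]
                    val = hits[-1] if hits else mem[var]
                    stack.append((mem, bufs, tuple(npcs),
                                  {**regs, arg: val}))
    return outcomes

print(sorted(explore()))   # (0, 0) appears: impossible under SC
```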
We address the reachability problem for programs running under the formal model Px86, defined recently by Raad et al. in POPL'20 for the persistent Intel x86 architecture. We prove that this problem is decidable. To achieve that, we provide a new formal model that is equivalent to Px86 and that has the feature of being a well-structured system. Deriving this new model is the result of a deep investigation of the properties of Px86 and the interplay of its components.
The full semantics of the Intel-x86 architecture was defined by Raad et al. in POPL 2022, extending the earlier formalization, which was based on the TSO memory model extended with persistency. This new semantics involves an intricate combination of the SC, TSO, and PSO models to account for the diverse features of the enlarged instruction set. In this paper, we investigate the reachability problem under this semantics, including both its consistency and persistency aspects, each of which requires reasoning about unbounded operation reorderings. Our first contribution is to show that reachability under this model can be reduced to reachability under a model without the persistency component. This is achieved by showing that the persistency semantics can be simulated by a finite-state protocol running in parallel with the program. Our second contribution is to prove that reachability under the consistency model of Intel-x86 (even without crashes and persistency) is undecidable. Undecidability is obtained as soon as one thread in the program is allowed to use both TSO variables and two PSO variables. The third contribution is showing that for any fixed bound on the alternation between TSO writes (write-backs) and PSO writes (non-temporal writes), the reachability problem is decidable. This defines a complete parameterized schema for under-approximate analysis that can be used for bug finding.
The verification of reachability properties for programs under weak memory models is a hard problem, even undecidable in some cases. The decidability of this problem has been investigated so far in the case of static programs where the number of threads does not change during execution. However, dynamic thread creation is crucial in asynchronous concurrent programming. In this paper, we address the decidability of the reachability problem for dynamic concurrent programs running under TSO. An important issue when considering a TSO model in this case is maintaining causality precedence between operations issued by threads and those issued by their children. We propose a general TSO model that respects causality and prove that the reachability problem for programs with dynamic creation of threads is decidable.
String-number conversion is an important class of constraints needed for the symbolic execution of string-manipulating programs. In particular, solving string constraints with string-number conversion is necessary for the analysis of scripting languages such as JavaScript and Python, where string-number conversion is part of the definition of the core semantics of these languages. However, solving this type of constraint is very challenging for state-of-the-art solvers. We propose in this paper an approach that can efficiently support both string-number conversion and other common types of string constraints. Experimental results show that it significantly outperforms other state-of-the-art tools on benchmarks that involve string-number conversion.
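For readers unfamiliar with the constraint class, the snippet below shows what a string-number conversion constraint looks like when expressed via the SMT-LIB `str.to_int` function through z3's Python bindings; this only illustrates the problem, not the paper's own solving approach.

```python
from z3 import String, Length, StrToInt, Solver, sat

s = String("s")
solver = Solver()

# A string whose numeric value is 42 but whose length is 4 forces
# leading zeros: a typical string-number conversion constraint.
solver.add(StrToInt(s) == 42, Length(s) == 4)

if solver.check() == sat:
    print(solver.model()[s])   # e.g. "0042"
```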
Not-substring is currently among the least supported types of string constraints, and existing solvers use only relatively crude heuristics. Yet, not-substring occurs relatively often in practical examples and is useful in encoding other types of constraints. In this paper, we propose a systematic way to solve not-substring based on flat abstraction. In this framework, the domain of string variables is restricted to flat languages, after which the whole constraint can be expressed as a linear arithmetic formula. We show that not-substring constraints can be flattened efficiently, and provide experimental evidence that the proposed solution for not-substring is competitive with state-of-the-art string solvers.
We address the separability problem for straight-line string constraints. The separability problem for languages of a class C by a class S asks: given two languages A and B in C, does there exist a language I in S separating A and B (i.e., I is a superset of A and disjoint from B)? The separability of string constraints is the same as the fundamental problem of interpolation for string constraints. We first show that regular separability of straight-line string constraints is undecidable. Our second result is the decidability of the separability problem for straight-line string constraints by piecewise testable languages, though the precise complexity is open. In our third result, we consider the positive fragment of piecewise testable languages as separators, and obtain an ExpSpace algorithm for the separability of a useful class of straight-line string constraints, together with a PSpace-hardness result.
We present a preliminary formalization, based on transition systems, of decentralized contact tracing protocols for smart devices equipped with Bluetooth transmitters. In our model the behaviour of an individual user, via their app, is modelled as a timed automaton with local unbounded memory. Protocol configurations consist of the current state of a shared server and a finite set of local states containing the states of individual users. The transition system models the interaction between devices in the same physical location and between a single device and the shared server. In the paper we address different research directions concerning semi-automated verification, based on automated reasoning tools, of the considered class of protocols; theoretical issues related to the expressiveness of the resulting class of formal models; and data-driven analysis of the logs collected on the server as well as on user devices.
We present a new approach for stateless model checking (SMC) of multithreaded programs under Sequential Consistency (SC) semantics. To combat state-space explosion, SMC is often equipped with a partial-order reduction technique, which defines an equivalence on executions, and only needs to explore one execution in each equivalence class. Recently, it has been observed that the commonly used equivalence of Mazurkiewicz traces can be coarsened but still cover all program crashes and assertion violations. However, for this coarser equivalence, which preserves only the reads-from relation from writes to reads, there is no SMC algorithm which is (i) optimal, in the sense that it explores precisely one execution in each reads-from equivalence class, and (ii) efficient, in the sense that it spends polynomial effort per class. We present the first SMC algorithm for SC that is both optimal and efficient in practice, meaning that it spends polynomial time per equivalence class on all programs that we have tried. This is achieved by a novel test that checks whether a given reads-from relation can arise in some execution. Our experimental results show that Nidhugg/rfsc, although slower than the fastest SMC tools on programs where the tools happen to examine the same number of executions, always scales similarly to or better than them, and outperforms them by an exponential factor on programs where the reads-from equivalence is coarser than the standard one. We also present two non-trivial use cases where the new equivalence is particularly effective, as well as the significant performance advantage that Nidhugg/rfsc offers compared to state-of-the-art SMC and systematic concurrency testing tools.
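Why the reads-from equivalence is coarser than the Mazurkiewicz one can be seen on three pairwise-conflicting events. In the toy sketch below, every ordering of two writes and one read on the same variable is its own Mazurkiewicz class, but orderings that let the read observe the same write collapse into one reads-from class.

```python
from itertools import permutations

# Two writes and one read on the same variable x; "init" models
# the initial value. All three events are pairwise dependent.
events = [("w1", "w"), ("w2", "w"), ("rd", "r")]

maz, rf = set(), set()
for run in permutations(events):
    # Mazurkiewicz signature: with all pairs dependent, the full order.
    maz.add(tuple(name for name, _ in run))
    # Reads-from signature: only which write the read observes.
    writes_before = [name for name, kind in run[:run.index(("rd", "r"))]
                     if kind == "w"]
    rf.add(writes_before[-1] if writes_before else "init")

print(len(maz), "Mazurkiewicz classes,", len(rf), "reads-from classes")
```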
We present a framework for efficient stateless model checking (SMC) of concurrent programs under three prominent models of causal consistency. Our approach is based on exploring traces under the program-order and reads-from relations. Our SMC algorithm is provably optimal in the sense that it explores each program-order and reads-from relation exactly once. We have implemented our framework in a tool called Conschecker. Experiments show that Conschecker performs well in detecting anomalies in classical distributed database benchmarks.
In this paper, we give a step-by-step introduction to the theory of well quasi-ordered transition systems. The framework combines two concepts, namely (i) transition systems that are monotonic with respect to a well-quasi ordering, and (ii) a scheme for symbolic backward reachability analysis. We describe several models with infinite state spaces that can be analyzed within the framework, e.g., Petri nets, lossy channel systems, timed automata, timed Petri nets, and multiset rewriting systems. We also present better-quasi-ordered transition systems, which allow the design of efficient symbolic representations of infinite sets of states.
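The backward reachability scheme is concise enough to sketch for Petri nets, the canonical instance: an upward-closed set of markings is represented by its finite set of minimal elements, predecessor bases are computed per transition, and Dickson's lemma (the well-quasi ordering) guarantees that the iteration reaches a fixpoint. The net below is a made-up two-place example.

```python
# Transitions as (pre, post) vectors over two places:
# t0 moves a token from place 0 to place 1; t1 duplicates a token in place 1.
transitions = [((1, 0), (0, 1)),
               ((0, 1), (0, 2))]

def pre_basis(m):
    """Minimal markings from which one transition step covers m."""
    return [tuple(max(pre[i], m[i] + pre[i] - post[i]) for i in range(len(m)))
            for pre, post in transitions]

def minimize(basis):
    """Keep only the componentwise-minimal elements."""
    return {b for b in basis
            if not any(c != b and all(x <= y for x, y in zip(c, b))
                       for c in basis)}

def coverable(init, target):
    basis = {target}
    while True:
        new = minimize(basis | {p for b in basis for p in pre_basis(b)})
        if new == basis:          # fixpoint; the wqo ensures termination
            return any(all(x >= y for x, y in zip(init, b)) for b in basis)
        basis = new

# Can we cover two tokens in place 1 from one token in place 0?
print(coverable((1, 0), (0, 2)))   # True: fire t0, then t1
```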
A formal model called database manipulating systems was introduced to model data-aware dynamic systems. Its semantics is given by an infinite labelled transition system where a label can be an unbounded relational database. The reachability problem is undecidable over schemas consisting of either a binary relation or two unary relations. We study the reachability problem under schema restrictions and restrictions on the query language. We provide tight complexity bounds for different combinations of schema and query language, by reductions to/from standard formalisms of infinite-state systems such as Petri nets and counter systems. Our reductions shed light on the connections between these two seemingly unrelated models.
We address the verification of concurrent programs running under the release-acquire (RA) semantics. We show that the reachability problem is undecidable even in the case where the input program is finite-state. Given this undecidability, we follow the spirit of the work on context-bounded analysis for detecting bugs in programs under the classical SC model, and propose an under-approximate reachability analysis for the case of RA. To this end, we propose a novel notion, called view-switching, and provide a code-to-code translation from an input program under RA to a program under SC. This leads to a reduction, in polynomial time, of the bounded view-switching reachability problem under RA to the bounded context-switching problem under SC. We have implemented a prototype tool VBMC and tested it on a set of benchmarks, demonstrating that many bugs in programs can be found using a small number of view switches.
We address the problem of verifying safety properties of concurrent programs running over the Total Store Order (TSO) memory model. Known decision procedures for this model are based on complex encodings of store buffers as lossy channels. These procedures assume that the number of processes is fixed. However, it is important in general to prove the correctness of a system/algorithm in a parametric way with an arbitrarily large number of processes.
In this paper, we introduce an alternative (yet equivalent) formulation of the classical TSO semantics that is more amenable to efficient algorithmic verification and to extension to parametric verification. For that, we adopt a dual view where load buffers are used instead of store buffers, so that the flow of information is from the memory to the load buffers. We show that this new semantics allows us (1) to drastically simplify the safety analysis under TSO, (2) to obtain a spectacular gain in efficiency and scalability compared to existing procedures, and (3) to easily extend the decision procedure to the parametric case, which yields a new decidability result and, more importantly, a verification algorithm that is more general and more efficient in practice than the one for bounded instances.