In the context of the project CESyMPA funded by Persyval-Lab, we organize a workshop. The programme is given below. The talks will take place at Verimag, room Turing. For details on how to come, see the lab webpage.
PRET DRAM controller: bank privatization for predictability and temporal isolation
Hard real-time embedded systems employ high-capacity memories such as Dynamic RAMs (DRAMs) to cope with increasing data and code sizes of modern designs. However, memory controller design has so far largely focused on improving average-case performance. As a consequence, the latency of memory accesses is unpredictable, which complicates the worst-case execution time analysis necessary for hard real-time embedded systems.
Our work introduces a novel DRAM controller design that is predictable and that significantly reduces worst-case access latencies. Instead of viewing the DRAM device as one resource that can only be shared as a whole, our approach views it as multiple resources that can be shared between one or more clients individually. We partition the physical address space following the internal structure of the DRAM device, i.e., its ranks and banks, and interleave accesses to the blocks of this partition. This eliminates contention for shared resources within the device, making accesses temporally predictable and temporally isolated. This paper describes our DRAM controller design and its integration with a precision-timed (PRET) architecture called PTARM. We present analytical bounds on the latency and throughput of the proposed controller, and confirm these via simulation.
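As a rough illustration of the bank-privatization idea in this abstract, the sketch below gives each client its own bank and serves the banks in a fixed time-interleaved order. It is a toy model under stated assumptions, not the PRET controller itself: the geometry constants and the names `map_private` and `slot_owner` are hypothetical, and ranks are left out for brevity.

```python
from typing import NamedTuple


class DramLocation(NamedTuple):
    bank: int
    row: int
    column: int


# Assumed geometry, chosen only for illustration.
NUM_BANKS = 4
ROWS_PER_BANK = 8192
COLUMNS_PER_ROW = 1024


def map_private(client: int, offset: int) -> DramLocation:
    """Map a client's private linear offset into the bank it owns.

    Client i only ever touches bank i, so no two clients contend
    for the same row buffer.
    """
    if not 0 <= client < NUM_BANKS:
        raise ValueError("one client per bank in this sketch")
    row, column = divmod(offset, COLUMNS_PER_ROW)
    if not 0 <= row < ROWS_PER_BANK:
        raise ValueError("offset outside the client's bank")
    return DramLocation(bank=client, row=row, column=column)


def slot_owner(slot: int) -> int:
    """Fixed round-robin interleaving: time slot k belongs to bank k mod NUM_BANKS."""
    return slot % NUM_BANKS
```

Because the slot assignment is static, a client waits at most `NUM_BANKS - 1` slots for its turn in this model, regardless of what the other clients do.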
Refinement of Worst-Case Execution Time Bounds by Graph Pruning
As real-time systems increase in complexity to provide more and more functionality and perform more demanding computations, the problem of statically analyzing the Worst-Case Execution Time bound (WCET) of real-time programs is becoming more and more time-consuming and imprecise.
The problem stems from the fact that, as program size increases, so does the number of potentially relevant program and hardware states to be considered during the WCET analysis. However, only a relatively small portion of the program actually contributes to the final WCET bound. Large parts of the program are thus irrelevant and are analyzed in vain. In the best case this only leads to increased analysis time. Very often, however, the analysis of irrelevant program parts interferes with the analysis of those program parts that turn out to be relevant.
We explore a novel technique based on graph pruning that promises to reduce the analysis overhead and, at the same time, increase the analysis’ precision. The basic idea is to eliminate those program parts from the analysis problem that are known to be irrelevant for the final WCET bound. This reduces the analysis overhead, since only a subset of the program and hardware states has to be tracked. Consequently, more aggressive analysis techniques can be applied to the smaller problem, effectively reducing the overestimation of the WCET. As a side-effect, interference from irrelevant program parts is eliminated, e.g., on addresses of memory accesses, on loop bounds, or on the cache or processor state.
First experiments using a commercial WCET analysis tool show that our approach is feasible in practice and leads to reductions of up to 12% when a standard IPET approach is used for the analysis.
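The pruning idea in this abstract can be sketched on a toy control-flow graph. The abstract does not say how irrelevant parts are identified, so the criterion below is an assumption made for illustration: a node is dropped when even the longest entry-to-exit path through it, computed with coarse per-node cost upper bounds, stays below a known lower bound (`threshold`) on the final result. The function name `prune` and all parameters are hypothetical, and the graph is assumed to be acyclic with every node lying on some entry-to-exit path.

```python
from functools import lru_cache


def prune(succs, cost_hi, entry, exit_, threshold):
    """Return the set of nodes that may still lie on the worst-case path.

    succs:     dict mapping each node to a list of successor nodes (a DAG)
    cost_hi:   dict mapping each node to a coarse upper bound on its cost
    threshold: assumed known lower bound on the longest path's length
    """
    # Build the predecessor relation from the successor relation.
    preds = {n: [] for n in succs}
    for n, targets in succs.items():
        for t in targets:
            preds[t].append(n)

    @lru_cache(maxsize=None)
    def longest_to_exit(n):
        if n == exit_:
            return cost_hi[n]
        return cost_hi[n] + max(longest_to_exit(s) for s in succs[n])

    @lru_cache(maxsize=None)
    def longest_from_entry(n):
        if n == entry:
            return cost_hi[n]
        return cost_hi[n] + max(longest_from_entry(p) for p in preds[n])

    keep = set()
    for n in succs:
        # Longest entry-to-exit path passing through n (n counted once).
        through = longest_from_entry(n) + longest_to_exit(n) - cost_hi[n]
        if through >= threshold:
            keep.add(n)
    return keep
```

For a diamond-shaped graph `A -> {B, C} -> D` with costs `A=1, B=10, C=2, D=1` and `threshold=8`, the longest path through `C` has length 4, so `C` is pruned and only `A`, `B`, and `D` remain for the more precise analysis.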
The case for programmable on-chip interconnect
There is a long-standing tradition of representing embedded applications as data-flow process network models, which make explicit the dichotomy between computations and communications. The mapping of such applications onto MPSoC architectures already considers in great detail the distribution and the scheduling of computations onto the CPUs. On the other hand, the Network-on-Chip (NoC) interconnect provides less control to the programmer, especially concerning the arbitration. This paper studies how more expressive NoC routers may help with application deployment, so as to harmonize the pace of data transfers with local data computations. More precisely, we allow for some limited programmability inside NoC routers, so that they can establish effective static scheduling and routing of data transmissions as demanded by the application. Router programs are the result of a general compilation process which targets the NoC and the individual cores altogether. The objective is to reduce NoC contention, improving speed and timing predictability. We consider the range of applications of such an approach and provide results on two of them (an embedded controller and an FFT).
CompSOC: A Mixed-Criticality Platform, Formalism, and Design Flow
Cyber-physical, embedded real-time systems often contain multiple concurrent applications that have different characteristics and requirements, and are often designed by different parties. As a result, a single system contains applications designed using different models of computation, and with different criticalities (e.g. real time, safety critical, adaptive, or not). CompSOC is a complete solution consisting of formalism (dataflow), software (microkernels, RTOS), hardware (multiprocessor, NOC, DRAM), and design flow (SDF3) that addresses this problem. In this presentation we describe in detail how CompSOC achieves this by being: a) composable: offering virtual execution platforms for independent design, verification, and execution of applications of mixed criticality; b) predictable: using the dataflow (CSDF/SADF) formalism as both programming and analysis model for real-time execution in a virtual execution platform.