Synthesis of certified compilers

Doctoral student: 
GOURDIN Léo
Date de soutenance: 
Tuesday, December 12, 2023
Supervisors: 
Name: 
BOULMÉ Sylvain
Laboratory: 
Verimag
Name: 
PÉTROT Frédéric
Laboratory: 
TIMA
Summary: 

Bugs in compilers, although relatively rare in very common production compilers, are particularly insidious: correct source code may be compiled into incorrect assembler code, and changes on the source code to better understand the error may make the bug disappear. In safety critical applications, regulation authorities may require traceability from source code to binary code which in practice excludes compiler optimizations, hence a very suboptimal use of the target processor. Moreover for such applications, using a faster, but more complex and potentially more erroneous, more heating processor, or even a multi-core processor, with concurrent programming problems, are not necessarily acceptable options. A solution is provided by the CompCert compiler: the traceability from the source code to the assembly code is ensured by a mathematical proof of refinement of programs through all the translation and optimization phases. This proof is itself formally verified by the Coq proof assistant. As an evidence of its industrial relevance, CompCert is currently on the market. As part of his thesis (still in progress at Verimag), Cyril Six extended CompCert to add the Kalray KVX processor as target, with instruction reordering - CompCert, basic, does not reorder instructions, which is penalizing on superscalar processors without dynamic reordering. It turns out that when adding a new target, there are quite tedious steps: defining the instruction sets of the different intermediate representations and their formal semantics, modeling the pipeline for the scheduling algorithm , etc., with information transcribed from the documentation. While this work has been carried out manually for a limited number of instructions, it would be extremely painful for extended instruction sets (either by SIMD instructions or by specific extensions intended to accelerate such or such type of calculation). Ideally, we would like a system that takes the description of the instruction set, the pipeline, etc. and would produce most of the instruction selection, instruction combinations, schedules etc. and of their evidence, or even produce a 'superoptimizer' capable of recombining complex blocks of instructions. As part of the RISC-V ecosystem, whose open and extensible architecture will allow a wide variety of processors, such a system would make it possible to efficiently produce compilers combining extreme specialization and extreme reliability. The thesis would therefore consist in designing such a system, if necessary using techniques such as modulo verified theory satisfiability.