loader image

Research and Publications

Optimized and formally-verified compilation for a VLIW processor

Software programs are used for many critical roles. A bug in those can have a devastatingcost, possibly leading to the loss of human lives. Such bugs are usually found at a source level(which can be ruled out with source-level verification methods), but they can also be inserted bythe compiler unknowingly. CompCert is the first compiler with a formal proof of correctness:compiled programs are proven to behave the same as their source programs. However, because ofthe challenges involved in proving compiler optimizations, CompCert only has a limited numberof them. As such, CompCert usually generates low-performance code compared to classicalcompilers such as GCC. While this may not significantly impact out-of-order architectures suchas x86, on in-order architectures, particularly on VLIW processors, the slowness is significant(code running half as fast as GCC -O2). On VLIW processors, the intra-level parallelism isexplicit and specified in the assembly code through “bundles” of instructions: the compiler mustbundlize instructions to achieve good performance.In this thesis, we identify, investigate, implement and formally verify several classical optimiza-tions missing in CompCert. We start by introducing a formal model for VLIW bundles executionon an interlocked core and generate those bundles through a postpass (after register allocation)scheduling. Then, we introduce a prepass (before register allocation) superblock scheduling,implementing static branch prediction and tail-duplication along the way. Finally, we furtherincrease the performance of our generated code by implementing loop unrolling, loop rotationand loop peeling – the latter being used for Loop-Invariant Code Motion. These transformationsare verified by translation validation, some of them with hash-consing to achieve reasonablecompilation time.We evaluate each introduced optimization on benchmarks, including Polybench and TACleBench,on the KV3 VLIW core, ARM Cortex A53, and RiscV “Rocket” core. Thanks to this work, ourversion of CompCert is now only 16% slower (respectively 12% slower and 30% slower) thanGCC -O2 on the KV3 (respectively ARM and RiscV), instead of 50% (respectively 38% and45%).