Performance optimizations for compiler-based error detection
The trend towards smaller transistor technologies and lower operating voltages stresses the hardware and makes transistors more susceptible to transient errors. In future systems, performance and power gains will come at the cost of unreliable areas on the chip. For this reason, there is an increased need for low-overhead, highly reliable error detection methodologies. In recent years, several techniques have been proposed. The majority of them are based on redundancy, which can be implemented at several levels (e.g., hardware, instruction, thread, or process). In instruction-level error detection approaches, the compiler replicates the instructions of the program and inserts checks wherever they are needed. The checks compare the results of the original and replicated instructions and decide whether or not an error has occurred. This type of error detection is more flexible than the hardware alternatives: it allows the programmer to choose the protected area of the program, and it can be applied without any hardware modifications. On the other hand, the replicated instructions and the checks cause a large slowdown, making software techniques less appealing.
In this thesis, we propose two techniques that aim at reducing the error detection overhead of compiler-based approaches and improving system performance without sacrificing fault coverage. The first technique, DRIFT, achieves this by decoupling the execution of the code (original and replicated) from the checks. The checks are compare and jump instructions. The jumps tend to make the code sequential and prevent the compiler from performing aggressive instruction scheduling optimizations. We call this phenomenon basic-block fragmentation. DRIFT reduces the impact of basic-block fragmentation by breaking the synchronized execute-check-confirm-execute cycle. In this way, DRIFT generates scheduler-friendly code with more instruction-level parallelism (ILP). As a result, it reduces the performance overhead down to 1.29× (on average) and outperforms the state of the art by up to 29.7% while retaining the same fault coverage.
The second technique, CASTED, focuses on reducing the impact of error detection overhead on single-chip scalable architectures that are composed of tightly-coupled cores. The proposed compiler methodology adaptively distributes the error detection overhead to the available resources across multiple cores, fully exploiting the abundant ILP of these architectures. CASTED adapts to a wide range of architecture configurations (issue width, inter-core communication). The results show that CASTED matches the performance of the best fixed state-of-the-art approach and often outperforms it, sometimes by as much as 21.2%, while maintaining the same fault coverage.
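To make the difference between synchronized and decoupled checking more concrete, the following is a minimal source-level sketch in Python. It is only an analogy: the actual techniques operate on compiler-generated instructions, and the names `FaultDetected`, `checked_sync`, `checked_decoupled`, and the `flip` parameter (which simulates a transient bit flip) are hypothetical and do not come from the thesis.

```python
class FaultDetected(Exception):
    """Raised when an original value and its replica disagree."""


def checked_sync(a, b, flip=None):
    """Synchronized scheme: each replicated step is followed by its check."""
    x, x_dup = a + b, a + b      # original and replicated computation
    if flip == "x":
        x ^= 1                   # simulated transient bit flip (assumption)
    if x != x_dup:               # check (compare + jump) right after the step
        raise FaultDetected("x")
    y, y_dup = x * 2, x_dup * 2
    if flip == "y":
        y ^= 1
    if y != y_dup:               # another check splits the code again
        raise FaultDetected("y")
    return y


def checked_decoupled(a, b, flip=None):
    """Decoupled scheme: original and replica run first, checks come later."""
    x, x_dup = a + b, a + b
    if flip == "x":
        x ^= 1
    y, y_dup = x * 2, x_dup * 2  # no branch between the two steps
    if flip == "y":
        y ^= 1
    if x != x_dup or y != y_dup: # checks grouped after the computation
        raise FaultDetected("x or y")
    return y
```

In the first function a branch follows every step, which mirrors the execute-check-confirm-execute cycle described above. In the second, the computation forms one straight-line region with the comparisons grouped at the end, which is the kind of code shape an instruction scheduler can reorder more freely. Both variants detect the same injected faults in this toy setting.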