Exploring the optimization space of multi-core architectures with OpenCL benchmarks
Open Computing Language (OpenCL) is an open standard for writing portable software for heterogeneous architectures such as Central Processing Units (CPUs) and Graphics Processing Units (GPUs). Programs written in OpenCL are functionally portable across architectures. However, because of architectural differences, OpenCL does not guarantee performance portability. As previous research shows, different architectures are sensitive to different optimization parameters. A parameter value that performs well on one architecture may perform poorly on another. In this thesis, the optimization space of multi-core architectures is explored by running OpenCL benchmarks. The benchmarks are run for all possible combinations of optimization parameters. Exploring the optimization space is not a trivial task, as several factors, such as the number of threads and the vectorization factor, affect performance. The range of values that each parameter can take is quite large. For example, the number of threads can vary from 1 to 225. Four different architectures are evaluated in this thesis. When all parameter combinations for all four architectures are considered, the optimization space is too large to explore within the time constraints of the project. Impossible combinations are therefore pruned to reduce the exploration space. Over 600,000 runs of the OpenCL benchmarks are executed to explore the pruned space exhaustively and to identify the optimal optimization parameters. In addition, the rationale for why a parameter value is the best on a particular architecture is investigated. The findings of the thesis could be used by developers to significantly improve the performance of their OpenCL applications. They could also be incorporated into a compiler for automatic optimization based on the target architecture.
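The exhaustive exploration with pruning described above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the parameter lists, the feasibility rules, and the names `is_feasible`, `explore`, and `run_benchmark` are hypothetical and are not taken from the thesis, which may use different parameters and different pruning criteria.

```python
from itertools import product

# Hypothetical parameter ranges (assumed for illustration only).
WORK_GROUP_SIZES = [1, 2, 4, 8, 16, 32, 64, 128, 256]
VECTOR_WIDTHS = [1, 2, 4, 8, 16]


def is_feasible(work_group_size, vector_width, problem_size, max_work_group_size):
    """Return False for combinations that cannot run (assumed pruning rules)."""
    # Assumed rule 1: the device caps the work-group size.
    if work_group_size > max_work_group_size:
        return False
    # Assumed rule 2: each work-item processes `vector_width` elements,
    # so the problem size must be a multiple of the vector width.
    if problem_size % vector_width != 0:
        return False
    # Assumed rule 3: the number of work-items must be a multiple of
    # the work-group size.
    work_items = problem_size // vector_width
    return work_items % work_group_size == 0


def explore(run_benchmark, problem_size, max_work_group_size):
    """Run every feasible combination and return (best combination, all results).

    `run_benchmark(work_group_size, vector_width)` is a caller-supplied,
    hypothetical function that returns a runtime (lower is better).
    """
    results = {}
    for wg, vw in product(WORK_GROUP_SIZES, VECTOR_WIDTHS):
        if not is_feasible(wg, vw, problem_size, max_work_group_size):
            continue  # pruned: skip combinations that cannot run
        results[(wg, vw)] = run_benchmark(wg, vw)
    if not results:
        raise ValueError("no feasible parameter combination")
    best = min(results, key=results.get)
    return best, results
```

In practice `run_benchmark` would launch the OpenCL kernel with the given configuration and return its measured runtime; here it is left to the caller so that the search logic stays independent of any particular OpenCL binding.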