Exploring the optimization space of multi-core architectures with OpenCL benchmarks
Date
2011
Author
Panickal, Deepak
Abstract
Open Computing Language (OpenCL) is an open standard for writing portable software for heterogeneous architectures such as Central Processing Units (CPUs) and
Graphics Processing Units (GPUs). Programs written in OpenCL are functionally portable
across architectures. However, due to architectural differences, OpenCL does not
guarantee performance portability. As previous research shows, different architectures
are sensitive to different optimization parameters. A parameter value that performs
well on one architecture might perform poorly on another.
In this thesis, the optimization space of multi-core architectures is explored by running OpenCL benchmarks. The benchmarks are run for all possible combinations of
optimization parameters. Exploring the optimization space is not a trivial task as there
are various factors, such as the number of threads, the vectorization factor, etc., which
impact the performance. The value range that each parameter takes is quite large. For
example, the number of threads can vary from 1 to 225. Four different architectures
are evaluated in this thesis. Considering all the parameter combinations for all
four architectures, the optimization space is prohibitively large to be explored within
the time constraints of the project. Impossible combinations are pruned to reduce the
exploration space.
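As a rough illustration of this kind of pruned exhaustive enumeration, the following Python sketch builds the Cartesian product of a few parameter ranges and discards combinations that could not be launched. The parameter names, their value ranges, and the two pruning rules (the local work-group size must divide the global size, and must not exceed a device limit) are assumptions chosen for illustration; they are not taken from the thesis.

```python
from itertools import product

# Hypothetical parameter ranges, chosen only for illustration.
SPACE = {
    "global_threads": [2**k for k in range(9)],  # 1, 2, 4, ..., 256
    "local_threads": [2**k for k in range(9)],   # 1, 2, 4, ..., 256
    "vector_width": [1, 2, 4, 8, 16],
}


def enumerate_space(space):
    """Yield every combination of parameter values as a dict."""
    names = list(space)
    for values in product(*(space[name] for name in names)):
        yield dict(zip(names, values))


def is_feasible(cfg, max_local=256):
    """Assumed pruning rules: reject configurations that cannot be launched."""
    # The work-group (local) size must evenly divide the global size.
    if cfg["global_threads"] % cfg["local_threads"] != 0:
        return False
    # The work-group size must not exceed the (assumed) device limit.
    if cfg["local_threads"] > max_local:
        return False
    return True


def pruned_space(space, max_local=256):
    """Return only the combinations that pass the feasibility checks."""
    return [cfg for cfg in enumerate_space(space) if is_feasible(cfg, max_local)]


all_cfgs = list(enumerate_space(SPACE))
kept = pruned_space(SPACE, max_local=64)
print(len(all_cfgs), "combinations before pruning,", len(kept), "after")
```

Each remaining configuration would then be passed to a benchmark run and timed; that step is omitted here because it depends on the benchmark harness.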
Over 600,000 runs of the OpenCL benchmarks are executed to exhaustively explore
this space and successfully identify the optimal optimization parameters. In addition,
the rationale for a parameter being the best on a particular architecture is investigated.
The findings of the thesis could be used by developers to significantly improve the
performance of their OpenCL applications. They could also be incorporated into a
compiler for automatic optimization based on the target architecture.