Improved Programming of GPU Architectures through Automated Data Allocation and Loop Restructuring

Conference: ARCS 2010 - 23rd International Conference on Architecture of Computing Systems
02/22/2010 - 02/23/2010 at Hannover, Germany

Proceedings: ARCS 2010

Pages: 8
Language: English
Type: PDF

Di Biagio, Andrea; Agosta, Giovanni (Dipartimento di Elettronica e Informazione, Politecnico di Milano)

The programmability of recent graphics processing unit (GPU) architectures has been the main factor driving the dramatic increase in interest in this class of architectures as low-cost accelerators for a wide range of high-performance applications. Current GPU programming models, such as OpenCL and CUDA, still expose too many architectural features, such as the memory hierarchy, to the programmer. We propose to raise the abstraction level of code by mapping some constructs of the well-known OpenMP parallel programming model onto the dominant CUDA GPU programming model. To this end, we are studying solutions for two main issues: the automated allocation of data on the GPU device memory hierarchy, and the translation of OpenMP parallel loops to CUDA kernels. We report some initial experimental results showing that the transformations are indeed promising.
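As a rough illustration of the loop-to-kernel idea mentioned in the abstract, the sketch below rewrites a simple OpenMP parallel loop as a kernel-shaped function indexed by block and thread. It is a CPU-only emulation in C++ under stated assumptions, not the authors' translation scheme: the SAXPY loop, the function names, and the sequential launcher are all hypothetical, and in real CUDA the three index parameters would correspond to blockIdx.x, blockDim.x and threadIdx.x.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Source form: an OpenMP parallel loop of the kind the abstract refers to.
// Without OpenMP support enabled, the pragma is ignored and the loop runs serially.
void saxpy_openmp(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const long n = static_cast<long>(x.size());
    #pragma omp parallel for
    for (long i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Kernel-shaped form: the loop body as a function of (block, thread) indices.
// The loop counter is recomputed from the indices instead of being iterated.
void saxpy_kernel_body(std::size_t block_idx, std::size_t block_dim,
                       std::size_t thread_idx, std::size_t n,
                       float a, const float* x, float* y) {
    const std::size_t i = block_idx * block_dim + thread_idx;
    if (i < n) {  // guard: the last block may be only partially filled
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical launcher that stands in for a CUDA kernel launch by visiting
// every (block, thread) pair sequentially on the CPU.
void saxpy_emulated_launch(float a, const std::vector<float>& x,
                           std::vector<float>& y,
                           std::size_t threads_per_block) {
    assert(threads_per_block > 0);
    assert(x.size() == y.size());
    const std::size_t n = x.size();
    // Round up so that every element is covered by some thread.
    const std::size_t blocks = (n + threads_per_block - 1) / threads_per_block;
    for (std::size_t b = 0; b < blocks; ++b) {
        for (std::size_t t = 0; t < threads_per_block; ++t) {
            saxpy_kernel_body(b, threads_per_block, t, n, a, x.data(), y.data());
        }
    }
}
```

Calling saxpy_openmp and saxpy_emulated_launch on copies of the same input should leave both output vectors identical. The sketch covers only the index mapping; the data allocation problem named in the abstract (choosing where in the device memory hierarchy each array lives) is not modeled here.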