Improved Programming of GPU Architectures through Automated Data Allocation and Loop Restructuring

Conference: ARCS 2010 - 23rd International Conference on Architecture of Computing Systems
22.02.2010 - 23.02.2010 in Hannover, Germany

Proceedings: ARCS 2010

Pages: 8; Language: English; Type: PDF

Biagio, Andrea Di; Agosta, Giovanni (Dipartimento di Elettronica e Informazione, Politecnico di Milano)

The programmability of recent graphics processing unit (GPU) architectures has been the main factor driving the dramatic increase in interest in this class of architectures as low-cost accelerators for a wide range of high-performance applications. Current GPU programming models, such as OpenCL and CUDA, still expose too many architectural features, such as the memory hierarchy, to the programmer. We propose to raise the abstraction level of code by mapping some constructs of the well-known OpenMP parallel programming model onto the dominant CUDA GPU programming model. To this end, we are studying solutions for two main issues: the automated allocation of data on the GPU device memory hierarchy, and the translation of OpenMP parallel loops to CUDA kernels. We report some initial experimental results showing that the transformations are indeed promising.
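
To make the idea of turning an OpenMP parallel loop into a kernel more concrete, the following is a minimal illustrative sketch, not the translation scheme used in the paper, which the abstract does not describe. It assumes a simple SAXPY loop and emulates on the CPU, in plain C++, how the loop's iteration space could be partitioned across a grid of blocks and threads in the style of a CUDA launch. The names saxpy_openmp, saxpy_kernel_body and saxpy_emulated_launch, as well as the grid-stride partitioning, are assumptions chosen for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Source form: an OpenMP work-sharing loop. The pragma is simply ignored
// when the compiler is not run with OpenMP enabled.
void saxpy_openmp(float a, const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(x.size());
    #pragma omp parallel for
    for (std::ptrdiff_t i = 0; i < n; ++i) {
        const std::size_t k = static_cast<std::size_t>(i);
        y[k] = a * x[k] + y[k];
    }
}

// Hypothetical "kernel body": the loop body executed by one logical thread.
// The thread handles iterations global_id, global_id + stride, ...
// (a grid-stride loop), so any grid size covers the whole iteration space.
void saxpy_kernel_body(std::size_t block_idx, std::size_t thread_idx,
                       std::size_t block_dim, std::size_t grid_dim,
                       float a, const std::vector<float>& x,
                       std::vector<float>& y) {
    const std::size_t global_id = block_idx * block_dim + thread_idx;
    const std::size_t stride = block_dim * grid_dim;
    for (std::size_t i = global_id; i < x.size(); i += stride) {
        y[i] = a * x[i] + y[i];
    }
}

// Sequential CPU emulation of launching grid_dim blocks of block_dim threads.
// On a real GPU these logical threads would run concurrently.
void saxpy_emulated_launch(std::size_t grid_dim, std::size_t block_dim,
                           float a, const std::vector<float>& x,
                           std::vector<float>& y) {
    assert(x.size() == y.size());
    for (std::size_t b = 0; b < grid_dim; ++b) {
        for (std::size_t t = 0; t < block_dim; ++t) {
            saxpy_kernel_body(b, t, block_dim, grid_dim, a, x, y);
        }
    }
}
```

Because every iteration is independent, each index is visited by exactly one logical thread, and both forms produce the same result. The sketch deliberately leaves out the other issue named in the abstract, the placement of data in the device memory hierarchy.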