Full-Overlapped Concurrent Kernels

Conference: ARCS 2015 - 28th International Conference on Architecture of Computing Systems
03/24/2015 - 03/27/2015 at Porto, Portugal

Proceedings: ARCS 2015

Pages: 8
Language: English
Type: PDF

Authors:
Valero-Lara, Pedro (CFD and Computational Technology Unit, Basque Center for Applied Mathematics (BCAM), Bilbao, Spain)
Pelayo, Fernando L. (Escuela Superior de Ingenieria Informática de Albacete, University of Castilla-La Mancha (UCLM), Albacete, Spain)

Abstract:
This work focuses on executing multiple kernels simultaneously on the same GPU device. We compare three of the best-known strategies for reaching that goal. The analysis shows that CPU-GPU communication generates a non-negligible overhead, which should be mitigated in order to improve the performance of such high-performance architectures. In this paper we present a "full-overlapped concurrent kernels" proposal that minimizes this overhead. It does so through a modular, distributed scheme in which the steps needed to compute concurrent kernels are assigned to a set of threads that execute in parallel. In this way, the main computational resources of current CPU-GPU platforms can be exploited efficiently in parallel, minimizing the overhead of memory transfers.
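The general idea of assigning the steps of each kernel to its own host thread can be illustrated with a minimal CUDA sketch. This is not the authors' implementation, which the abstract does not detail; it only assumes one common way of overlapping transfers and computation: each host thread owns a CUDA stream and issues an asynchronous host-to-device copy, a kernel launch, and a device-to-host copy on it. The names `scale`, `run_task`, and `run_overlapped` are hypothetical and chosen only for illustration.

```cuda
#include <cuda_runtime.h>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical kernel standing in for any independent workload.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// One host thread drives one task end to end on its own stream:
// upload, compute, download. Writes 1 to *ok on success, 0 otherwise.
static void run_task(int n, float factor, int* ok) {
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    cudaStream_t stream;
    float* host = nullptr;
    float* dev = nullptr;

    bool have_stream = (cudaStreamCreate(&stream) == cudaSuccess);
    cudaError_t err = have_stream ? cudaSuccess : cudaErrorUnknown;
    // Pinned host memory is required for copies to be truly asynchronous.
    if (err == cudaSuccess) err = cudaMallocHost(&host, bytes);
    if (err == cudaSuccess) err = cudaMalloc(&dev, bytes);

    if (err == cudaSuccess) {
        for (int i = 0; i < n; ++i) host[i] = 1.0f;
        cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream);
        scale<<<(n + 255) / 256, 256, 0, stream>>>(dev, n, factor);
        cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream);
        // Wait only for this task's stream, not for the whole device.
        err = cudaStreamSynchronize(stream);
    }

    bool good = (err == cudaSuccess);
    for (int i = 0; good && i < n; ++i) good = (host[i] == factor);

    if (dev) cudaFree(dev);
    if (host) cudaFreeHost(host);
    if (have_stream) cudaStreamDestroy(stream);
    *ok = good ? 1 : 0;
}

// Launches num_tasks host threads, each handling one kernel and its
// transfers. Returns true if every task produced the expected result.
bool run_overlapped(int num_tasks, int n) {
    std::vector<int> ok(num_tasks, 0);
    std::vector<std::thread> workers;
    for (int t = 0; t < num_tasks; ++t)
        workers.emplace_back(run_task, n, static_cast<float>(t + 2), &ok[t]);
    for (auto& w : workers) w.join();
    for (int flag : ok)
        if (!flag) return false;
    return true;
}
```

A call such as `run_overlapped(4, 1 << 16)` would run four independent tasks. Whether copies and kernels from different streams actually overlap depends on the device (for example, the number of copy engines) and on the resources each kernel occupies, so this sketch shows the structure rather than guaranteeing any particular speedup.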