Full-Overlapped Concurrent Kernels

Conference: ARCS 2015 - 28th International Conference on Architecture of Computing Systems
24.03.2015 - 27.03.2015 in Porto, Portugal

Proceedings: ARCS 2015

Pages: 8, Language: English, Type: PDF

Valero-Lara, Pedro (CFD and Computational Technology Unit, Basque Center for Applied Mathematics (BCAM), Bilbao, Spain)
Pelayo, Fernando L. (Escuela Superior de Ingenieria Informática de Albacete, University of Castilla-La Mancha (UCLM), Albacete, Spain)

This work focuses on executing multiple kernels simultaneously on the same GPU device. We compare three of the best-known strategies for reaching that goal. The analysis shows that CPU-GPU communication generates a non-negligible overhead, which should be mitigated to improve the performance of such high-performance architectures. In this paper we present a “full-overlapped concurrent kernels” proposal that minimizes this overhead. It does so through a modular distributed scheme in which the steps needed to compute concurrent kernels are assigned to a set of threads that execute in parallel. In this way, the main computational resources of current CPU-GPU platforms can be exploited efficiently in parallel, minimizing the overhead of memory transfers.
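
To illustrate why overlapping transfers with computation matters, the following is a minimal timing sketch, not the authors' implementation. It assumes a simplified model with one serial host-to-device copy engine, one serial device-to-host copy engine, and kernels that may run concurrently as soon as their input has arrived. The names `Job`, `serialized_makespan`, and `overlapped_makespan`, and all durations, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Job:
    """One kernel and its transfers; all times are hypothetical (ms)."""
    h2d: float     # host-to-device copy time
    kernel: float  # kernel execution time
    d2h: float     # device-to-host copy time


def serialized_makespan(jobs):
    """Total time when every step of every job runs one after another."""
    return sum(j.h2d + j.kernel + j.d2h for j in jobs)


def overlapped_makespan(jobs):
    """Total time under the assumed overlap model.

    Assumptions: uploads are serialized on one copy engine, downloads are
    serialized on another, and a kernel starts as soon as its upload ends
    and may run concurrently with other kernels.
    """
    h2d_free = 0.0  # time at which the upload engine becomes idle
    d2h_free = 0.0  # time at which the download engine becomes idle
    for j in jobs:
        h2d_done = h2d_free + j.h2d
        h2d_free = h2d_done
        kernel_done = h2d_done + j.kernel
        d2h_start = max(kernel_done, d2h_free)
        d2h_free = d2h_start + j.d2h
    return d2h_free


# Four identical hypothetical jobs: 2 ms upload, 5 ms kernel, 2 ms download.
jobs = [Job(2.0, 5.0, 2.0) for _ in range(4)]
print(serialized_makespan(jobs), overlapped_makespan(jobs))  # prints "36.0 15.0"
```

Under these assumed numbers the serialized schedule takes 36 ms while the overlapped one takes 15 ms; the gap comes entirely from hiding transfer time behind other work, which is the overhead the abstract refers to.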