Implementing a Data-Parallel Application with Low Data Locality on Multicore Processors

Conference: ARCS 2009 - 22nd International Conference on Architecture of Computing Systems
03/11/2009 at Delft, The Netherlands

Proceedings: ARCS 2009

Pages: 8
Language: English
Type: PDF

Authors:
Meiländer, Dominik; Schellmann, Maraike; Gorlatch, Sergei (University of Münster, Germany)

Abstract:
We introduce a novel implementation of a production-quality medical image reconstruction application on the Cell Broadband Engine (CBE) and compare it with an OpenMP implementation on a commodity quad-core processor. While all time-consuming parts of the application can be performed in a data-parallel manner and thus can be parallelized well on the CBE, non-local memory accesses, together with small local memory and restricted DMA transfer sizes, limit the performance of the CBE implementation enormously. Our implementation using two CBEs is therefore outperformed by the OpenMP implementation on a commodity quad-core processor by about 60%.