Implementing a Data-Parallel Application with Low Data Locality on Multicore Processors
Conference: ARCS 2009 - 22nd International Conference on Architecture of Computing Systems
03/11/2009 at Delft, The Netherlands
Proceedings: ARCS 2009
Pages: 8
Language: English
Type: PDF
Meiländer, Dominik; Schellmann, Maraike; Gorlatch, Sergei (University of Münster, Germany)
We introduce a novel implementation of a production-quality medical image reconstruction application on the Cell Broadband Engine (CBE) and compare it with an OpenMP implementation on a commodity quad-core processor. Although all time-consuming parts of the application are data-parallel and therefore parallelize well on the CBE, non-local memory accesses, combined with the small local memory and restricted DMA transfer sizes, severely limit the performance of the CBE implementation. As a result, the OpenMP implementation on a commodity quad-core processor outperforms our implementation on two CBEs by about 60%.