Implementing a Data-Parallel Application with Low Data Locality on Multicore Processors

Conference: ARCS 2009 - 22nd International Conference on Architecture of Computing Systems
11 March 2009 in Delft, The Netherlands

Proceedings: ARCS 2009

Pages: 8, Language: English, Type: PDF

Meiländer, Dominik; Schellmann, Maraike; Gorlatch, Sergei (University of Münster, Germany)

We introduce a novel implementation of a production-quality medical image reconstruction application on the Cell Broadband Engine (CBE) and compare it with an OpenMP implementation on a commodity quad-core processor. All time-consuming parts of the application are data-parallel and can therefore be parallelized well on the CBE. However, non-local memory accesses, together with the small local memory and restricted DMA transfer sizes, severely limit the performance of the CBE implementation. As a result, our implementation using two CBEs is outperformed by the OpenMP implementation on a commodity quad-core processor by approximately 60%.
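To make the combination of data parallelism and low data locality concrete, the following is a minimal sketch in C with OpenMP. It is not the authors' code: the function `gather_scale`, its parameters, and the index table `idx` are hypothetical names chosen for illustration. Each output element is computed independently, so the loop parallelizes trivially, but the input element it reads is selected by an index table, so reads can land anywhere in `src`. On a cache-based multicore the hardware fetches such scattered data on demand, whereas on an architecture with small software-managed local memory each scattered read would need an explicit transfer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel (illustration only, not taken from the paper):
 * dst[i] = scale * src[idx[i]] for every output element i.
 * Iterations are independent (data-parallel), but idx[] may point
 * anywhere in src, so the reads have low data locality. */
void gather_scale(float *dst, const float *src, const size_t *idx,
                  size_t n_out, size_t n_src, float scale)
{
    /* If the compiler is run without OpenMP support, this pragma is
     * ignored and the loop simply executes serially. */
    #pragma omp parallel for
    for (long i = 0; i < (long)n_out; ++i) {
        assert(idx[i] < n_src);          /* guard against out-of-range reads */
        dst[i] = scale * src[idx[i]];    /* scattered (non-local) read */
    }
}
```

Calling `gather_scale(dst, src, idx, n_out, n_src, 2.0f)` fills `dst` with doubled values of `src` in the order given by `idx`; the result does not depend on the number of threads because no two iterations write the same element.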