Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

Conference: ARCS 2014 - 27th International Conference on Architecture of Computing Systems
February 25-28, 2014, in Luebeck, Germany

Proceedings: ARCS 2014

Pages: 8; Language: English; Type: PDF


Authors:
Hofmann, Johannes (Chair of Computer Architecture, University Erlangen-Nuremberg, Germany)
Treibig, Jan; Hager, Georg; Wellein, Gerhard (Erlangen Regional Computing Center, University Erlangen-Nuremberg, Germany)

Abstract:
We examine the Xeon Phi, which is based on Intel’s Many Integrated Core (MIC) architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and ways to enable sensible data sharing between threads despite the lack of a shared last-level cache. Apart from parallelization, SIMD vectorization is critical for good performance on the Xeon Phi; we run various microbenchmarks to investigate the platform’s new set of vector instructions, with special emphasis on the newly introduced vector gather capability. We refine a previous performance model for the application and adapt it to the Xeon Phi in order to validate the performance of our optimized hand-written assembly implementation, as well as that of several auto-vectorization approaches.