Determination of the Optimum Degree of Redundancy for Fault-prone Many-Core Systems

Conference: Zuverlässigkeit und Entwurf - 6. GMM/GI/ITG-Fachtagung
09/25/2012 - 09/27/2012 in Bremen, Germany

Proceedings: Zuverlässigkeit und Entwurf

Pages: 8
Language: English
Type: PDF

Runge, Armin (University of Würzburg, Am Hubland, 97074 Würzburg, Germany)

Growing transistor integration capacity will place hundreds of processors on a single chip, and it will also make these systems inherently susceptible to errors. Various redundancy techniques can restore reliability, but they involve significant overhead, so identifying the optimal degree of redundancy is an important objective. In this paper we focus on core-level redundancy and checkpointing with rollback-recovery. We introduce a model that determines the degree of spatial and temporal redundancy that minimizes the expected execution time. We further show that in several cases the minimal expected execution time is achieved only by combining both techniques, spatial and temporal redundancy, simultaneously.
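
The trade-off described in the abstract can be illustrated with a small Python sketch. It is not the model from the paper, which the abstract does not specify; it is a toy model built entirely on assumptions made here for illustration: the work is perfectly divisible across groups of r replicated cores, each core fails independently at a constant rate lam, a segment of length tau followed by a checkpoint of cost ckpt_cost must be repeated only if all r replicas of a group fail during it, and a failed segment is detected at its end. All function and parameter names are hypothetical.

```python
import math


def expected_time(work, cores, r, tau, ckpt_cost, lam):
    """Expected run time under the toy model (all assumptions, see text).

    work      -- total amount of work, in time units on a single core
    cores     -- number of cores on the chip
    r         -- replicas per group (degree of spatial redundancy)
    tau       -- useful work between two checkpoints (temporal redundancy)
    ckpt_cost -- time needed to write one checkpoint
    lam       -- failure rate of a single core
    """
    groups = cores // r                 # groups working in parallel
    segment = tau + ckpt_cost           # wall-clock length of one attempt
    # Probability that every replica of a group fails within one segment.
    q = (1.0 - math.exp(-lam * segment)) ** r
    if q >= 1.0:
        return math.inf                 # the segment can never complete
    segments = work / (groups * tau)    # segments each group has to finish
    # Attempts per segment are geometric with success probability 1 - q.
    return segments * segment / (1.0 - q)


def best_configuration(work, cores, ckpt_cost, lam, taus):
    """Grid search over replication degree and checkpoint interval."""
    candidates = [
        (expected_time(work, cores, r, tau, ckpt_cost, lam), r, tau)
        for r in range(1, cores + 1)
        if cores % r == 0               # only group sizes that divide evenly
        for tau in taus
    ]
    return min(candidates)              # (expected time, r, tau)


if __name__ == "__main__":
    t, r, tau = best_configuration(
        work=1000.0, cores=8, ckpt_cost=1.0, lam=0.01, taus=[5, 10, 20, 50]
    )
    print(f"expected time {t:.1f} with r={r}, tau={tau}")
```

In this sketch a larger r reduces the number of repeated segments but leaves fewer groups to share the work, while a smaller tau shortens repeated segments but adds checkpoint overhead. Which combination minimizes the expected time depends on the chosen parameters.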