Reliability Modeling of Fault-tolerant Storage System - Covering MDS-Codes and Regenerating Codes

Conference: ARCS 2013 - 26th International Conference on Architecture of Computing Systems 2013
02/19/2013 - 02/22/2013 at Prague, Czech Republic

Proceedings: ARCS 2013

Pages: 6
Language: English
Type: PDF

Authors:
Sobe, Peter (Dresden University of Applied Sciences, Faculty of Computer Science and Mathematics, Germany)

Abstract:
Fault-tolerant storage systems tolerate device failures through data redundancy provided by erasure-tolerant codes. Devices can fail and be replaced, and their data can be recovered with a decoding operation. For classical MDS codes (such as Reed-Solomon codes), this decoding operation is costly and takes considerable time because all data elements from a large number of devices must be read and transferred. This high cost is incurred regardless of whether a single device or many devices have failed. Another class of codes recently discovered for storage systems, regenerating codes, allows a faster decoding operation for a single device, while recovery from multiple failed devices remains costly. For a single-device repair (regeneration), the cost reduction is achieved by reducing the amount of data that is transferred. Under the assumption that a single failure occurs more frequently than multiple concurrent failures, a faster repair benefits the overall reliability of the storage system. Following this motivation, we show how to compare MDS codes and regenerating codes using a reliability model based on Markov chains.
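The comparison described in the abstract can be illustrated with a minimal sketch. It is not the model from the paper: it assumes a simple birth-death Markov chain in which state i counts failed devices, every working device fails independently at a constant rate, and a repair moves the system from state i to state i-1. The function name mttdl, its parameters, and all numeric rates are hypothetical and chosen only for illustration. The sketch computes the mean time to data loss (MTTDL) by solving the linear system for the expected time to absorption.

```python
def mttdl(n, tolerated, fail_rate, repair_rates):
    """Mean time to data loss for a birth-death Markov chain (illustrative model).

    State i = number of failed devices, 0..tolerated. State tolerated+1 is
    absorbing (data loss). repair_rates[i-1] is the rate from state i to i-1.
    """
    m = tolerated + 1  # number of transient states
    # Expected absorption times t satisfy A * t = 1, with A = -Q on transient states.
    a = [[0.0] * m for _ in range(m)]
    b = [1.0] * m
    for i in range(m):
        lam = (n - i) * fail_rate                    # next device failure
        mu = repair_rates[i - 1] if i > 0 else 0.0   # repair of one device
        a[i][i] = lam + mu
        if i + 1 < m:
            a[i][i + 1] = -lam
        if i > 0:
            a[i][i - 1] = -mu
    # Gaussian elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, m):
            factor = a[r][col] / a[col][col]
            for c in range(col, m):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    t = [0.0] * m
    for i in reversed(range(m)):
        s = sum(a[i][j] * t[j] for j in range(i + 1, m))
        t[i] = (b[i] - s) / a[i][i]
    return t[0]  # expected time to data loss starting with all devices working


# Hypothetical parameters: 6 devices, 2 tolerated failures, rates per hour.
N, F, LAM = 6, 2, 1e-4
MU_SLOW, MU_FAST = 0.1, 0.4  # assumed repair rates, not taken from the paper

# MDS code: every repair is a full decode, so the slow rate applies in every state.
mds = mttdl(N, F, LAM, [MU_SLOW] * F)
# Regenerating code: fast repair for a single failure, slow repair otherwise.
regen = mttdl(N, F, LAM, [MU_FAST] + [MU_SLOW] * (F - 1))
```

Under these assumed rates, the regenerating-code chain yields a larger MTTDL than the MDS chain, because only the single-failure repair rate differs and it is faster. A fuller model would also have to account for the possibility that multi-failure recovery is slower for regenerating codes than for MDS codes.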