Reliability Modeling of Fault-tolerant Storage System - Covering MDS-Codes and Regenerating Codes

Conference: ARCS 2013 - 26th International Conference on Architecture of Computing Systems 2013
19.02.2013 - 22.02.2013 in Prague, Czech Republic

Proceedings: ARCS 2013

Pages: 6, Language: English, Type: PDF

Authors:
Sobe, Peter (Dresden University of Applied Sciences, Faculty of Computer Science and Mathematics, Germany)

Abstract:
Fault-tolerant storage systems tolerate device failures through data redundancy provided by erasure-tolerant codes. Devices can fail and be replaced, and their data can be recovered with a decoding operation. For classical MDS codes (such as Reed-Solomon codes), this decoding operation is costly and takes considerable time because all data elements from a large number of devices must be read and transferred. This high cost is independent of whether a single device or many devices failed. Another class of codes recently discovered for storage systems, regenerating codes, allows a faster decoding operation for a single device, along with a costly recovery operation when multiple devices have failed. For a single-device repair (regeneration), the cost reduction is achieved by reducing the amount of data that is transferred. Under the assumption that a single failure occurs more frequently than multiple concurrent failures, a faster repair is beneficial for the overall reliability of the storage system. Following this motivation, we show how to compare MDS codes and regenerating codes using a reliability model based on Markov chains.
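The kind of comparison described in the abstract can be illustrated with a small continuous-time Markov chain. The following is only a minimal sketch under stated assumptions, not the model from the paper: state i counts failed devices, each working device fails at rate `lam`, one device is repaired at a time at a state-dependent rate, and the state with m+1 failures is absorbing (data loss). The function name `mttdl`, the parameter names, and all rate values are hypothetical.

```python
def mttdl(n, m, lam, repair_rates):
    """Mean time to data loss, starting with all n devices working.

    Assumed model (illustrative only): state i = number of failed devices,
    i -> i+1 at rate (n - i) * lam, i -> i-1 at rate repair_rates[i - 1],
    and state m + 1 is absorbing. repair_rates must have length m.
    """
    size = m + 1  # transient states 0..m
    A = [[0.0] * size for _ in range(size)]
    rhs = [1.0] * size
    # Expected absorption times T_i satisfy:
    # (fail_i + rep_i) T_i - fail_i T_{i+1} - rep_i T_{i-1} = 1, T_{m+1} = 0
    for i in range(size):
        fail = (n - i) * lam
        rep = repair_rates[i - 1] if i > 0 else 0.0
        A[i][i] = fail + rep
        if i + 1 < size:
            A[i][i + 1] = -fail
        if i > 0:
            A[i][i - 1] = -rep
    # Gaussian elimination with partial pivoting
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for r in range(col + 1, size):
            f = A[r][col] / A[col][col]
            for c in range(col, size):
                A[r][c] -= f * A[col][c]
            rhs[r] -= f * rhs[col]
    # Back substitution
    T = [0.0] * size
    for i in reversed(range(size)):
        s = rhs[i] - sum(A[i][j] * T[j] for j in range(i + 1, size))
        T[i] = s / A[i][i]
    return T[0]


# Hypothetical rates per hour: same repair speed in every state (MDS-like)
# versus faster single-device repair (regenerating-like).
mds_like = mttdl(n=10, m=2, lam=1e-4, repair_rates=[0.1, 0.1])
regen_like = mttdl(n=10, m=2, lam=1e-4, repair_rates=[0.5, 0.1])
```

In this sketch, raising only the single-failure repair rate increases the mean time to data loss, which matches the motivation given in the abstract. Whether a regenerating code wins overall depends on how much slower its multi-failure recovery is, which the assumed rates here do not capture.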