Implementation of Recoverable Distributed Shared Memory by Logging Writes

Sundarrajan Kanthadai and Jennifer L. Welch

Distributed shared memory (DSM), by avoiding the programming complexities of message passing, has become a convenient model to work with. However, the benefits of these systems can be achieved only if the whole system behaves like a failure-free system. Many algorithms proposed for implementing reliable DSM require processes to take a checkpoint whenever there is a data transfer, resulting in heavy overhead during failure-free execution. We present an algorithm that provides recoverable DSM for sequential consistency, in which the checkpoint interval can be tailored to balance the cost of checkpointing against the savings in recovery obtained by checkpointing often. Compared with previous recovery techniques that use logging, the algorithm reduces both the logging and the message overheads. It can tolerate up to n faults, where n is the number of processes, and can be used in environments where the cost of synchronizing checkpoints is high.