Alain Vaucher, Philippe Schwaller, et al.
AMLD EPFL 2022
Backward error recovery, based on checkpointing and rollback, is often used to implement fault tolerance in multicomputer systems. During failure-free operation, process states are saved at regular intervals; after a fault is detected, the system is rolled back to a previously saved state. Four classes of techniques can be distinguished: semiautomatic techniques, message logging, coordinated checkpointing, and hybrid techniques. The authors survey these alternatives and discuss the overhead each may involve, helping users choose the most suitable checkpointing and rollback technique for a given system and application.
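The basic save-and-restore cycle can be sketched in a single process. This is a minimal illustration that assumes the process state fits in a dictionary and that faults are detected externally. `CheckpointedProcess`, its `interval` parameter, and the `increment` update are hypothetical names. The sketch does not model the message logging or cross-process coordination that the surveyed multicomputer techniques address.

```python
import copy


class CheckpointedProcess:
    """Hypothetical process that periodically saves its state and can roll back."""

    def __init__(self, state, interval):
        self.state = state
        self.interval = interval  # assumption: save a checkpoint every `interval` steps
        self.steps = 0
        # The initial state serves as the first checkpoint.
        self._saved = copy.deepcopy(state)
        self._saved_steps = 0

    def checkpoint(self):
        # Failure-free operation: store a full, independent copy of the current state.
        self._saved = copy.deepcopy(self.state)
        self._saved_steps = self.steps

    def rollback(self):
        # After a fault is detected: restore the most recently saved state.
        self.state = copy.deepcopy(self._saved)
        self.steps = self._saved_steps

    def step(self, update):
        # Apply one unit of work, then checkpoint if the interval has elapsed.
        update(self.state)
        self.steps += 1
        if self.steps % self.interval == 0:
            self.checkpoint()


def increment(state):
    # Hypothetical unit of work.
    state["counter"] += 1


proc = CheckpointedProcess({"counter": 0}, interval=3)
for _ in range(5):
    proc.step(increment)  # a checkpoint is taken after step 3

proc.state["counter"] = -999  # simulate corrupted state that triggers recovery
proc.rollback()
print(proc.steps, proc.state)  # 3 {'counter': 3}
```

Rolling back discards the work done after the last checkpoint (steps 4 and 5 here), so the process must redo it. A shorter interval reduces this lost work but saves state more often. This is the kind of overhead trade-off the abstract refers to.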
A. Nagarajan, S. Mukherjee, et al.
Journal of Applied Mechanics, Transactions ASME
D. Edelstein
MRS Spring 1998
Peter Nirmalraj, Damien Thompson, et al.
Nature Materials