A Scalable Double In-memory Checkpoint and Restart Scheme towards Exascale
Workshop on Fault-Tolerance for HPC at Extreme Scale (FTXS) 2012
Publication Type: Talk
Repository URL:
Download:
[PPT]
Summary
This talk described recent progress in optimizing the in-memory checkpoint/restart fault-tolerance scheme so that it scales to 64K cores of a Blue Gene/P machine.