Robust Non-Intrusive Record-Replay with Processor Extraction
Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging at ISSTA (PADTAD) 2010
Publication Type: Paper
Repository URL: 201003_RecordReplay
Abstract
With the advent of increasingly large parallel machines, debugging
is becoming more and more challenging. In particular, applications
at this scale tend to behave non-deterministically, leading to race
condition bugs. Furthermore, gaining access to these large machines
for long debugging sessions is generally infeasible. In this paper,
we present a 3-step algorithm to perform what we call "processor
extraction": a procedure to record the execution of a set of
processors from a parallel application, and replay any of them in a
controlled environment. Our technique generates very low
interference in the recorded program thanks to the separation
between non-determinism elimination and detailed processor
recording. In order to improve robustness and accuracy, we further
augment our algorithm with a self-correction mechanism.
TextRef
Filippo Gioachin, Gengbin Zheng, and Laxmikant V. Kalé, "Robust Non-Intrusive Record-Replay with Processor Extraction", in Proceedings of the Workshop on Parallel and Distributed Systems: Testing, Analysis, and Debugging (PADTAD - VIII), 2010