
Debugging divergence

Robert O'Callahan edited this page Oct 31, 2023 · 3 revisions

Debugging divergence is hard. Here are some things to try.

  • In a failed replay, try following the emergency debugger instructions to get a stack trace. Check whether execution is somewhere suspicious, e.g. in code that might be manipulating memory shared with processes outside the trace.

If you can reproduce the bug by re-recording:

  • Try using Intel PT to check for control flow divergence; see below.
  • Try using memory checksums to identify changes in memory values before the divergence was detected.

Intel PT

Make sure that the "max locked memory" limit (the value reported by ulimit -l) is very high (e.g. 1073741824 KB). Then record with Intel PT data collection enabled, e.g.

rr record --intel-pt ls

This captures the full tracee control flow in Intel PT's compressed trace format. If rr crashes with an error about overflowing buffers, try increasing PT_PERF_AUX_SIZE in PerfCounters.cc. If you run out of memory, try reducing it.
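
Before recording, it can help to confirm the prerequisites. The following is a minimal sketch, not part of rr: it compares the current locked-memory limit against the example value given earlier and checks whether the kernel exposes an Intel PT PMU. The variable names are hypothetical, and the sysfs path is an assumption about how the perf subsystem exposes Intel PT on your kernel.

```shell
#!/bin/sh
# Pre-flight checks before `rr record --intel-pt` (illustrative sketch).
required_kb=1073741824   # example limit from this page, in KB

current=$(ulimit -l)     # prints a number (KB) or "unlimited"
if [ "$current" = "unlimited" ]; then
  locked_ok=yes
elif [ "$current" -ge "$required_kb" ]; then
  locked_ok=yes
else
  locked_ok=no
fi
echo "max locked memory: $current (sufficient: $locked_ok)"

# Assumption: the kernel exposes the Intel PT PMU at this sysfs path.
if [ -d /sys/bus/event_source/devices/intel_pt ]; then
  pt_available=yes
else
  pt_available=no
fi
echo "Intel PT PMU present: $pt_available"
```

If the limit is reported as insufficient, raising it with ulimit -l in the shell that runs rr may work, provided the hard limit allows it. Otherwise the system-wide limits configuration needs to change.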

To check the recorded control flow during replay, install libipt and build rr with cmake -Dintel_pt_decoding=TRUE. Then run

rr replay --intel-pt-start-checking-event=1 -a

This captures tracee control flow during replay and checks it (starting at event 1) against the control flow during recording. The instructions at the first divergence in control flow will be reported.
