To make profiling a single cluster easier, we should implement a toolkit to replay a cluster. My preliminary idea is that this toolkit works in two phases:
- Dump the cluster args and the compiler input IR in protobuf format in `disc_launch_op`. Users can specify the iteration to dump with an environment variable and then find the dump message in the logs, as in the following example:
Launch the training jobs with some environment variables:
Then users can find the replay logs with the `grep` command after a period of time:
- Run the executable program `disc_replay_main` under the nvprof profiler toolkit.

TODOs:
- `disc_replay_main` executable program.
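The dump-then-replay flow described above can be illustrated with a minimal Python sketch. It is not the proposed implementation: the environment variable name `REPLAY_DUMP_ITERATION`, the `ClusterRecorder` class, and the `run_program` stand-in are all hypothetical, and JSON is used in place of protobuf to keep the example self-contained.

```python
import json
import os
import tempfile

# Hypothetical variable name, chosen for this sketch only.
DUMP_ITERATION_ENV = "REPLAY_DUMP_ITERATION"

# Stand-in for a compiled cluster: maps a program name to a Python callable.
PROGRAMS = {"sum": sum, "max": max}


def run_program(program, args):
    """Execute the stand-in program on its arguments."""
    return PROGRAMS[program](args)


class ClusterRecorder:
    """Counts launches and dumps the inputs of one chosen iteration."""

    def __init__(self, dump_dir):
        self.dump_dir = dump_dir
        self.iteration = 0
        raw = os.environ.get(DUMP_ITERATION_ENV)
        # No dump happens when the variable is unset.
        self.target = int(raw) if raw is not None else None

    def launch(self, program, args):
        """Run one launch; return (result, dump path or None)."""
        self.iteration += 1
        path = None
        if self.target == self.iteration:
            path = os.path.join(self.dump_dir, f"cluster_iter{self.iteration}.json")
            with open(path, "w") as f:
                # JSON stands in for the protobuf message here.
                json.dump({"program": program, "args": args}, f)
            # A log line that can later be found with grep.
            print(f"[replay] dumped cluster inputs to {path}")
        return run_program(program, args), path


def replay(path):
    """Reload a dump and re-run the same program on the same inputs."""
    with open(path) as f:
        record = json.load(f)
    return run_program(record["program"], record["args"])


if __name__ == "__main__":
    os.environ[DUMP_ITERATION_ENV] = "2"
    with tempfile.TemporaryDirectory() as dump_dir:
        recorder = ClusterRecorder(dump_dir)
        recorder.launch("sum", [1, 2, 3])
        result, dump_path = recorder.launch("sum", [4, 5, 6])
        assert replay(dump_path) == result
```

Because the replay step needs only the dump file, it can run as a standalone process, which is what makes it convenient to wrap in a profiler.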