CUDA Memory Profile Analyzer #9860
Conversation
this file needs a copyright header
nemo/core/classes/modelPT.py (Outdated)
@@ -204,6 +206,8 @@ def __init__(self, cfg: DictConfig, trainer: Trainer = None):

        # Setup nsys profiling if it has been enabled in the model config
        self._setup_profiling()
        # real accurate _batch_idx. We found that the `batch_idx` in `on_train_batch_start` and `on_train_batch_end` has a bug.
Hi, can you expand what bug was found?
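For context, one common workaround for an unreliable hook-provided batch index is to keep a counter that the model increments itself. A minimal standalone sketch (the hook names mirror Lightning's, but this is an illustration, not the PR's actual implementation):

```python
class BatchCounter:
    """Tracks a monotonically increasing batch index independently of the
    `batch_idx` argument passed into the training hooks (hypothetical sketch)."""

    def __init__(self):
        self._real_batch_idx = 0

    def on_train_batch_start(self, batch_idx):
        # Ignore the hook-provided batch_idx and trust our own counter.
        return self._real_batch_idx

    def on_train_batch_end(self, batch_idx):
        self._real_batch_idx += 1


counter = BatchCounter()
for hook_idx in [0, 0, 1]:  # the hook may repeat or misreport indices
    counter.on_train_batch_start(hook_idx)
    counter.on_train_batch_end(hook_idx)
print(counter._real_batch_idx)  # 3
```

The counter reflects how many batches actually completed, regardless of what the hook argument reports.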
nemo/core/classes/modelPT.py (Outdated)
@@ -49,6 +49,8 @@
from nemo.utils.debug_hook import register_debug_hooks
from nemo.utils.exceptions import NeMoBaseException
from nemo.utils.get_rank import get_rank, is_global_rank_zero
# from nemo.utils.memory_profile_analyzer import peak_memory_analysis_activation, peak_memory_analysis_weight, peak_memory_analysis_oom
is the commented line needed?
    logging.info(f"===== Memory Profile Analysis: OOM ======")
    peak_memory_analysis(self._memory_profile_snapshot_file_oom, self._memory_profile_analysis_path, 'oom', self._memory_profile_rank)
else:
    raise Exception(f"Snapshot file not found: {self._memory_profile_snapshot_file_oom}")
Maybe move this after line 1833, `torch.cuda.memory._dump_snapshot`?
    return

# Call the analysis function
if self._memory_profile_analysis_enabled:
Do you need this `if`? I would assume `_memory_profile_analysis_enabled` does not change value, and on line 1880 you already check whether it's true or not.
nemo/core/classes/modelPT.py (Outdated)
    logging.info(f"====== Memory Profile Analysis: Weight ======")
    peak_memory_analysis(self._memory_profile_snapshot_file_weight, self._memory_profile_analysis_path, 'weight', self._memory_profile_rank)
else:
    raise Exception(f"Snapshot file not found: {self._memory_profile_snapshot_file_weight}")
nemo/core/classes/modelPT.py (Outdated)
if batch_idx >= self._memory_profile_start_step and get_rank() == self._memory_profile_rank:
    logging.info("====== Start CUDA memory profiling ======")
    torch.cuda.memory._record_memory_history(max_entries=100000)
if self._real_batch_idx == self._memory_profile_start_step and get_rank() == self._memory_profile_rank:
Why is it `self._real_batch_idx == self._memory_profile_start_step` instead of `self._real_batch_idx >= self._memory_profile_start_step`?
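The reviewer's concern can be illustrated with a toy loop: if the hook is never invoked with an index exactly equal to the start step (e.g. after resuming training mid-run), `==` silently never fires, while `>=` guarded by a "started" flag still arms exactly once. Names below are illustrative, not the PR's code.

```python
def should_start(real_batch_idx, start_step, already_started):
    # `>=` plus a flag: arms at the first step at-or-past start_step,
    # and the flag prevents re-arming on every later step.
    return not already_started and real_batch_idx >= start_step

started_eq = started_ge = False
for idx in [0, 1, 3, 4]:  # step 2 (== start_step) never shows up
    if not started_eq and idx == 2:
        started_eq = True
    if should_start(idx, 2, started_ge):
        started_ge = True
print(started_eq, started_ge)  # False True
```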
nemo/core/classes/modelPT.py
Outdated
logging.info("====== End nsys profiling ======") | ||
torch.cuda.cudart().cudaProfilerStop() | ||
self._nsys_profile_complete = True | ||
|
||
if hasattr(self, '_memory_profile_enabled'): | ||
if self._memory_profile_enabled and not self._memory_profile_complete: | ||
if batch_idx >= self._memory_profile_end_step and get_rank() == self._memory_profile_rank: | ||
logging.info("====== End CUDA memory profiling ======") | ||
if self._real_batch_idx == self._memory_profile_end_step and get_rank() == self._memory_profile_rank: |
)
torch.cuda.memory._record_memory_history(enabled=None)
self._memory_profile_complete = True
# Call the analysis function
if self._memory_profile_analysis_enabled and self._memory_profile_complete:
Same as previously: `self._memory_profile_analysis_enabled` should already be true here, due to the check on line 1890, right?
    Add `\n` in between each frame for readability.
    """
    # Prune Frames
    after_prune_frames = [prune_frames(x[3]) for x in alive_memory]
what's x[3]? can you add a comment?
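One way to make `x[3]` self-documenting is to name the tuple fields. The field order below is a guess at the layout of an `alive_memory` entry (the PR's actual convention may differ), so treat it as a hypothetical sketch:

```python
from collections import namedtuple

# Hypothetical field order; adjust to match the actual alive_memory tuples.
AliveBlock = namedtuple("AliveBlock", ["time_us", "addr", "size", "frames"])

alive_memory = [
    (100, 0x7F00, 512, ["frame_a", "frame_b"]),
    (200, 0x7F80, 1024, ["frame_c"]),
]

# Instead of the opaque `x[3]`, unpack into named fields:
frames_per_block = [AliveBlock(*x).frames for x in alive_memory]
print(frames_per_block)  # [['frame_a', 'frame_b'], ['frame_c']]
```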
# ===== Function: for two time points, check the corresponding alive memory, and compare them to see: what's new, what's gone, what's unchanged.
def compare_alive_memory(tracker, time_us_1, time_us_2):
can you add a couple tests for this?
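A test for this could feed two hand-built alive-memory sets and check the new/gone/unchanged classification. `compare_alive_sets` below is a simplified stand-in for the PR's `compare_alive_memory` (keyed only by address), written so the assertions are runnable:

```python
def compare_alive_sets(alive_1, alive_2):
    """Classify blocks by address into new / gone / unchanged
    (simplified stand-in, not the PR's actual function)."""
    addrs_1 = {addr for (addr, _size) in alive_1}
    addrs_2 = {addr for (addr, _size) in alive_2}
    return {
        "new": sorted(addrs_2 - addrs_1),
        "gone": sorted(addrs_1 - addrs_2),
        "unchanged": sorted(addrs_1 & addrs_2),
    }

def test_compare_alive_sets():
    t1 = [(0x10, 256), (0x20, 512)]
    t2 = [(0x20, 512), (0x30, 1024)]
    result = compare_alive_sets(t1, t2)
    assert result == {"new": [0x30], "gone": [0x10], "unchanged": [0x20]}

test_compare_alive_sets()
```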
        return frame
    return None

def alloc_memory_timeline(trace):
If this finds the maximum/minimum allocated memory and the corresponding timestamps, it would be helpful if that were reflected in the name.
for idx, timepoint in enumerate(trace):
    (time_us, addr, action, size, frames, stream) = read_tp(timepoint)

    if (action == "alloc"):
You don't need the parentheses around the condition.
def record_alloc_memory_timeline(trace):
This looks very similar to `alloc_memory_timeline`; can you refactor to reduce the duplicate code?
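One hypothetical shape for that refactor: a single pass over the trace that tracks running allocated bytes and exposes a callback, so both `alloc_memory_timeline` and `record_alloc_memory_timeline` become thin wrappers. The 6-tuple layout mirrors `read_tp` above; the free-action name is an assumption.

```python
def walk_memory_timeline(trace, on_event=None):
    """Shared single pass: tracks running allocated bytes and the peak,
    invoking on_event per timepoint (hypothetical refactor sketch)."""
    allocated = 0
    peak_bytes, peak_time = 0, None
    for (time_us, addr, action, size, frames, stream) in trace:
        if action == "alloc":
            allocated += size
        elif action == "free_completed":  # assumed free-action name
            allocated -= size
        if allocated > peak_bytes:
            peak_bytes, peak_time = allocated, time_us
        if on_event:
            on_event(time_us, action, allocated)
    return peak_bytes, peak_time

trace = [
    (1, 0xA, "alloc", 100, [], 0),
    (2, 0xB, "alloc", 300, [], 0),
    (3, 0xA, "free_completed", 100, [], 0),
]
print(walk_memory_timeline(trace))  # (400, 2)
```

The "record every step" variant would just pass an `on_event` that appends to a list, instead of duplicating the loop.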
Thanks, just a few minor comments; this looks great overall.
One more request @tonyjie: can you rebase onto the latest main & use --signoff on your commits and push again? Otherwise CI won't run.
Commits (Signed-off-by: tonyjie <jl4257@cornell.edu>):
…l analysis the memory at the global peak of the trace, and generate CSV files
…activation.
…ing the setup
…ch_version; fix other minor issues based on review
This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.
This PR was closed because it has been inactive for 7 days since being marked as stale.
This PR seems to have slipped through. Should we merge it? @ericharper @titu1994
What does this PR do ?
The memory profiler dumps a pickle file, one for weight, one for activation. The user can load the file in the below page: https://pytorch.org/memory_viz
With `analysis_enabled: True`, the memory profile analyzer will generate two csv files each for weight/activation/OOM. The output csv files include:
- alive_memory_weight.csv
- group_by_alloc_frames_weight.csv
- alive_memory_memory.csv
- group_by_alloc_frames_memory.csv
- alive_memory_oom.csv
- group_by_alloc_frames_oom.csv
Changelog
- Fix the `batch_idx` mismatch issue.
- Increase `max_entries`: the previous value is too small, which makes the generated snapshot easily truncated.
Usage
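For the Usage section, a hypothetical config fragment inferred from the attribute names seen in the diff (`_memory_profile_enabled`, `_memory_profile_start_step`, etc.); the actual YAML schema is not shown in this thread, so every key here is a guess:

```python
# Hypothetical memory-profile config, mirroring the attribute names in the PR.
memory_profile_cfg = {
    "enabled": True,           # turn CUDA memory profiling on
    "start_step": 10,          # step at which _record_memory_history starts
    "end_step": 12,            # step at which the snapshot is dumped
    "rank": 0,                 # only this rank records and dumps
    "analysis_enabled": True,  # also emit the alive_memory_*.csv files
}
print(sorted(memory_profile_cfg))
```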
GitHub Actions CI
The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.
The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".
Before your PR is "Ready for review"
Pre checks:
PR Type:
If you haven't finished some of the above items you can still open a "Draft" PR.
Who can review?
Anyone in the community is free to review the PR once the checks have passed.
The Contributor guidelines contain specific people who can review PRs to various areas.
Additional Information