enable memory tracker metrics for npu #27280

Merged
merged 1 commit into from
Nov 6, 2023


@ji-huazhong (Contributor) commented on Nov 4, 2023

What does this PR do?

As per the title: this enables the `Trainer` memory-tracker metrics on NPU devices.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline,
    Pull Request section?
  • Was this discussed/approved via a GitHub issue or the forum? Please add a link
    to it if that's the case.
  • Did you make sure to update the documentation with your changes? Here are the
    documentation guidelines, and
    here are tips on formatting docstrings.
  • Did you write any new necessary tests?

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

cc @amyeroberts

@ji-huazhong (Contributor, Author)

Verified with the following test case.

# spec.py
import torch
import torch_npu
# Users can add additional imports here

# Specify the device name (e.g. 'cuda', 'cpu')
DEVICE_NAME = 'npu:0'

# Specify device-specific backends to dispatch to.
# If not specified (i.e., `None`), will fall back to 'default' in `testing_utils.py`
MANUAL_SEED_FN = torch.npu.manual_seed
EMPTY_CACHE_FN = torch.npu.empty_cache
DEVICE_COUNT_FN = torch.npu.device_count

(mem) [root@localhost mem]# RUN_SLOW=1 TRANSFORMERS_TEST_BACKEND="torch_npu" TRANSFORMERS_TEST_DEVICE="npu:0" TRANSFORMERS_TEST_DEVICE_SPEC="spec.py" python -m pytest -sv tests/trainer/test_trainer.py::TrainerIntegrationTest::test_mem_metrics
============================= test session starts ==============================
platform linux -- Python 3.8.18, pytest-7.4.3, pluggy-1.3.0 -- /root/miniconda3/envs/mem/bin/python
cachedir: .pytest_cache
rootdir: /home/w00668292/mem
configfile: setup.cfg
collected 1 item

  0%|          | 0/24 [00:00<?, ?it/s]
Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'train_runtime': 0.9881, 'train_samples_per_second': 194.31, 'train_steps_per_second': 24.289, 'train_loss': 10.069827397664389, 'init_mem_cpu_alloc_delta': 0, 'init_mem_gpu_alloc_delta': 1024, 'init_mem_cpu_peaked_delta': 0, 'init_mem_gpu_peaked_delta': 0, 'train_mem_cpu_alloc_delta': 151519232, 'train_mem_gpu_alloc_delta': 4096, 'train_mem_cpu_peaked_delta': 0, 'train_mem_gpu_peaked_delta': 4608, 'before_init_mem_cpu': 1883963392, 'before_init_mem_gpu': 0, 'epoch': 3.0}
100%|██████████| 24/24 [00:01<00:00, 16.16it/s]
100%|██████████| 8/8 [00:00<00:00, 78.41it/s]
100%|██████████| 8/8 [00:00<00:00, 525.37it/s]
  0%|          | 0/24 [00:00<?, ?it/s]
Could not estimate the number of tokens of the input, floating-point operations will not be computed
{'train_runtime': 0.0769, 'train_samples_per_second': 2496.571, 'train_steps_per_second': 312.071, 'train_loss': 10.069827397664389, 'epoch': 3.0}
100%|██████████| 24/24 [00:00<00:00, 312.19it/s]
100%|██████████| 8/8 [00:00<00:00, 1256.67it/s]
100%|██████████| 8/8 [00:00<00:00, 1005.98it/s]
PASSED

=============================== warnings summary ===============================
../../../root/miniconda3/envs/mem/lib/python3.8/site-packages/_pytest/config/__init__.py:1373
  /root/miniconda3/envs/mem/lib/python3.8/site-packages/_pytest/config/__init__.py:1373: PytestConfigWarning: Unknown config option: doctest_glob

    self._warn_or_fail_if_strict(f"Unknown config option: {key}\n")

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
======================== 1 passed, 1 warning in 19.46s =========================
(mem) [root@localhost mem]#
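
The `*_mem_gpu_*` entries in the first run are the memory-tracker metrics this PR enables on NPU; note that they keep the `gpu` name even on an NPU device. As a minimal sketch, assuming the tracker uses the same bookkeeping on NPU as on CUDA, each stage's metrics can be derived from three byte counters: allocated memory at the start of the stage, at the end, and the peak in between. On NPU those counters would presumably come from torch_npu's CUDA-style `torch.npu.memory_allocated()` and `torch.npu.max_memory_allocated()`; that is an assumption here, so the hypothetical `memory_stage_metrics` below only does the arithmetic on plain integers.

```python
from typing import Dict


def memory_stage_metrics(
    stage: str, begin: int, end: int, peak: int, device: str = "gpu"
) -> Dict[str, int]:
    """Build alloc/peaked delta metrics for one tracked stage (e.g. "init", "train").

    begin, end and peak are byte counts. Illustrative helper, not the Trainer's own code.
    """
    return {
        # Memory still held at the end of the stage, relative to its start (may be negative).
        f"{stage}_mem_{device}_alloc_delta": end - begin,
        # Extra memory used transiently above the end-of-stage level, clamped at 0.
        f"{stage}_mem_{device}_peaked_delta": max(0, peak - end),
    }


# Hypothetical counters, not taken from the run above.
print(memory_stage_metrics("train", begin=1_000, end=5_000, peak=8_000))
# {'train_mem_gpu_alloc_delta': 4000, 'train_mem_gpu_peaked_delta': 3000}
```

Under that reading, `'train_mem_gpu_alloc_delta': 4096` means the NPU held 4,096 more bytes after training than before it, and `'train_mem_gpu_peaked_delta': 4608` means usage briefly rose 4,608 bytes above the end-of-training level.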

@amyeroberts (Collaborator) left a comment

LGTM - thanks for adding this!

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint.

@amyeroberts amyeroberts merged commit 1ffc4de into huggingface:main Nov 6, 2023
3 checks passed
@ji-huazhong ji-huazhong deleted the mem-tracker branch November 7, 2023 08:44
EduardoPach pushed a commit to EduardoPach/transformers that referenced this pull request Nov 19, 2023

3 participants