[BUG] (flops_profiler) Duplicate registration check for start_time_hook is not working #5432

Closed
threewayhandshake opened this issue Apr 18, 2024 · 1 comment · Fixed by #6668
Assignees: loadams
Labels: bug (Something isn't working), windows (Questions or PRs relating to running DeepSpeed on Windows)

threewayhandshake commented Apr 18, 2024

Describe the bug
I get AttributeError: Can't pickle local object 'FlopsProfiler.start_profile.<locals>.register_module_hooks.<locals>.start_time_hook' when I call torch.save on a model that has previously been passed to get_model_profile.
I checked the flops_profiler code and found that the duplicate-registration guard is written as if not hasattr(module, "__start_time_hook_handle"): when it should be if not hasattr(module, "__start_time_hook_handle__"): (the two trailing underscores are missing), so the check never matches.
After correcting the attribute name, the error no longer occurs.
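To illustrate the mechanism described above, here is a minimal pure-Python sketch, not the DeepSpeed code itself. FakeModule, start_profile, end_profile, run_profile_cycle and can_pickle are hypothetical stand-ins written for this example; only the two attribute names come from the issue. It assumes the profiler stores the handle under the name with trailing underscores and later removes only that stored handle.

```python
import pickle


class FakeModule:
    """Hypothetical stand-in for nn.Module: hooks live in a dict keyed by id."""

    def __init__(self):
        self.pre_hooks = {}
        self._next_id = 0

    def register_forward_pre_hook(self, hook):
        hook_id = self._next_id
        self._next_id += 1
        self.pre_hooks[hook_id] = hook
        return hook_id  # plays the role of a removable handle


def start_profile(modules, guard_name):
    """Register a timing hook on each module, guarded by `guard_name`."""

    def start_time_hook(module, inputs):  # local function, as in the traceback
        return None

    for module in modules:
        # The handle is always stored WITH the trailing underscores.
        # If guard_name differs, the guard never sees it and registers again.
        if not hasattr(module, guard_name):
            handle = module.register_forward_pre_hook(start_time_hook)
            module.__start_time_hook_handle__ = handle


def end_profile(modules):
    """Remove only the hook whose handle is stored on the module."""
    for module in modules:
        if hasattr(module, "__start_time_hook_handle__"):
            module.pre_hooks.pop(module.__start_time_hook_handle__, None)
            del module.__start_time_hook_handle__


def run_profile_cycle(guard_name):
    shared = FakeModule()
    # The same instance is visited twice, as when one submodule is reused.
    modules = [shared, shared]
    start_profile(modules, guard_name)
    end_profile(modules)
    return shared


def can_pickle(obj):
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError):
        return False


buggy = run_profile_cycle("__start_time_hook_handle")    # guard name from the issue
fixed = run_profile_cycle("__start_time_hook_handle__")  # corrected guard name

print(len(buggy.pre_hooks), can_pickle(list(buggy.pre_hooks.values())))  # 1 False
print(len(fixed.pre_hooks), can_pickle(list(fixed.pre_hooks.values())))  # 0 True
```

With the mismatched guard, the shared module receives two hooks but only the last handle is remembered, so one local function is left behind and pickling fails. With the corrected guard, the second visit is skipped and nothing is left over.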

To Reproduce
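Run get_model_profile and then torch.save on a model in which the same module instance appears in several places. In the code below, the default argument nn.Identity() is evaluated only once, so both Foo instances share the same submodule.
My reproduction code is below.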

import io
import torch
import torch.nn as nn
from deepspeed.profiling.flops_profiler import get_model_profile

class Foo(nn.Module):
    def __init__(self, submodule=nn.Identity()):
        super().__init__()
        self.submodule = submodule

    def forward(self, x):
        return x

model = nn.Sequential(Foo(), Foo())
get_model_profile(model, args=(1,), print_profile=False)
with io.BytesIO() as buf:
    torch.save(model, buf)

Expected behavior
torch.save after get_model_profile completes without raising AttributeError: Can't pickle local object...
(or: no module still contains start_time_hook in its _forward_pre_hooks)
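The second form of the expected behavior can be checked with a small helper like the sketch below. modules_with_pre_hooks is a hypothetical name, not part of DeepSpeed or PyTorch. It is duck-typed: it only assumes the model exposes named_modules() and that each module carries a _forward_pre_hooks mapping, as the issue text implies for nn.Module. The stand-in objects at the bottom exist only so that the block runs without PyTorch installed.

```python
from types import SimpleNamespace


def modules_with_pre_hooks(model):
    """Return {module_name: [hook names]} for modules that still hold forward-pre hooks.

    Assumes `model.named_modules()` yields (name, module) pairs and that each
    module has a `_forward_pre_hooks` mapping (a private attribute, so this is
    a debugging aid rather than a stable API).
    """
    leftovers = {}
    for name, module in model.named_modules():
        hooks = getattr(module, "_forward_pre_hooks", {})
        if hooks:
            leftovers[name] = [
                getattr(hook, "__qualname__", repr(hook)) for hook in hooks.values()
            ]
    return leftovers


# Stand-in objects (hypothetical) so the sketch runs without PyTorch.
clean = SimpleNamespace(_forward_pre_hooks={})
dirty = SimpleNamespace(_forward_pre_hooks={0: print})
fake_model = SimpleNamespace(
    named_modules=lambda: [("0", clean), ("1.submodule", dirty)]
)

print(modules_with_pre_hooks(fake_model))  # {'1.submodule': ['print']}
```

Calling the helper on the real model right after get_model_profile should return an empty dict once the guard is fixed; any entry whose hook name ends in start_time_hook points at a module that would break torch.save.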

ds_report output
ds_report could not complete due to AttributeError: module 'os' has no attribute 'statvfs'
I installed deepspeed-0.12.7+40342055-py3-none-any.whl into Python 3.12 on Windows 10

System info (please complete the following information):

Collecting environment information...
PyTorch version: 2.2.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Home
GCC version: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 8.0.60
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4060 Ti
Nvidia driver version: 546.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3696
DeviceID=CPU0
Family=198
L2CacheSize=1536
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3696
Name=Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.2.2+cu121
[conda] Could not collect
threewayhandshake added the bug and training labels Apr 18, 2024
loadams added the windows label and removed the training label Oct 25, 2024
loadams self-assigned this Oct 25, 2024

loadams commented Oct 25, 2024

Hi @threewayhandshake - could you confirm whether you are still hitting this on the current released DeepSpeed wheel that is built specifically for Windows?

I'll look into the fix you proposed and create a PR; we can continue the discussion there if this is still relevant.
