[BUG] (flops_profiler) Duplicate registration check for start_time_hook is not working #5432

threewayhandshake · 2024-04-18T03:05:40Z

Describe the bug
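I get AttributeError: Can't pickle local object 'FlopsProfiler.start_profile.<locals>.register_module_hooks.<locals>.start_time_hook' when I run torch.save on a model that has already been profiled with get_model_profile.
I checked the flops_profiler code. In register_module_hooks, the duplicate-registration check reads if not hasattr(module, "__start_time_hook_handle"): but it should read if not hasattr(module, "__start_time_hook_handle__"):. The handle is stored under the name with trailing underscores, so the misspelled check never finds it. A module that is visited more than once therefore gets an extra start_time_hook, and cleanup removes only the hook whose handle was stored last.
After correcting the check, the error no longer occurs.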
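To illustrate why the misspelled guard matters, here is a minimal pure-Python sketch. HookedModule, register_start_time_hook, remove_start_time_hook, hooks_left and the fixed flag are hypothetical stand-ins, not DeepSpeed or PyTorch APIs; they only mimic a forward-pre-hook registry with removable handles. Because the buggy check tests an attribute name that is never assigned, a second registration adds a second hook and overwrites the stored handle, so cleanup can only remove one of the two.

```python
class HookedModule:
    """Hypothetical stand-in for a module's forward-pre-hook registry (not nn.Module)."""

    def __init__(self):
        self.forward_pre_hooks = {}  # hook id -> hook function
        self._next_id = 0

    def register_forward_pre_hook(self, hook):
        hook_id = self._next_id
        self._next_id += 1
        self.forward_pre_hooks[hook_id] = hook
        # Return a callable that removes exactly this hook, like a removable handle.
        return lambda: self.forward_pre_hooks.pop(hook_id, None)


def register_start_time_hook(module, fixed):
    def start_time_hook(mod, inputs):  # a local function, like the profiler's hook
        pass

    # The buggy guard tests a name that is never assigned, so it always passes;
    # the fixed guard tests the attribute that actually stores the handle.
    guard_name = "__start_time_hook_handle__" if fixed else "__start_time_hook_handle"
    if not hasattr(module, guard_name):
        module.__start_time_hook_handle__ = module.register_forward_pre_hook(start_time_hook)


def remove_start_time_hook(module):
    # Cleanup can only reach the handle stored last; earlier handles were overwritten.
    if hasattr(module, "__start_time_hook_handle__"):
        module.__start_time_hook_handle__()
        del module.__start_time_hook_handle__


def hooks_left(fixed):
    shared = HookedModule()
    # A shared module is visited twice during registration and twice during cleanup.
    for _ in range(2):
        register_start_time_hook(shared, fixed)
    for _ in range(2):
        remove_start_time_hook(shared)
    return len(shared.forward_pre_hooks)


for fixed in (False, True):
    print(f"fixed={fixed}: {hooks_left(fixed)} hook(s) left")
```

With the buggy guard one hook survives cleanup; with the corrected guard none does.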

To Reproduce
Run get_model_profile and then torch.save on a model in which the same module instance appears in several places.
My reproduction code:

import io
import torch
import torch.nn as nn
from deepspeed.profiling.flops_profiler import get_model_profile

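class Foo(nn.Module):
    # The default argument is evaluated once, so every Foo() built without an
    # explicit submodule shares the same nn.Identity instance.
    def __init__(self, submodule=nn.Identity()):
        super().__init__()
        self.submodule = submodule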

    def forward(self, x):
        return x

model = nn.Sequential(Foo(), Foo())
get_model_profile(model, args=(1,), print_profile=False)
# Raises AttributeError: Can't pickle local object ... in the environment below.
with io.BytesIO() as buf:
    torch.save(model, buf)
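The reproduction depends on Foo's default argument: it is evaluated once, so both Foo() instances hold the same nn.Identity. A recursive, apply-style traversal that calls a function on every child of every parent, without recording which objects it has already seen, reaches that shared submodule once per parent. The sketch below uses a hypothetical Node class (not nn.Module) to count visits; it assumes the profiler registers hooks through a traversal that behaves like this, which is how PyTorch's nn.Module.apply recurses into children.

```python
from collections import Counter


class Node:
    """Hypothetical stand-in for a module that owns child modules (not nn.Module)."""

    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

    def apply(self, fn):
        # Recurse into each child, then call fn on self; nothing records which
        # objects were already visited, so a shared child is reached once per parent.
        for child in self.children:
            child.apply(fn)
        fn(self)
        return self


def count_visits(root):
    visits = Counter()
    root.apply(lambda node: visits.update([node.name]))
    return visits


shared = Node("identity")  # one object referenced by two parents
tree = Node("sequential", [Node("foo0", [shared]), Node("foo1", [shared])])
print(count_visits(tree)["identity"])  # → 2
```

A traversal that deduplicates by object identity (in PyTorch, model.modules() yields each module once) would visit the shared submodule only once; the fix proposed above keeps the traversal and instead makes the registration guard work.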

Expected behavior
torch.save succeeds after get_model_profile, without raising AttributeError: Can't pickle local object...
(equivalently, once profiling ends, no module still has start_time_hook in its _forward_pre_hooks)
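The expected behavior can be phrased as a check on the hook registry: every hook left after profiling must be picklable, because torch.save pickles the whole model object, hook dictionaries included. The pure-Python sketch below shows why a leftover start_time_hook fails that check: a function defined inside another function is a "local object" that pickle cannot serialize by reference. make_start_time_hook, module_level_hook and find_unpicklable are hypothetical helpers; the exact exception type may vary between Python versions, so both AttributeError and pickle.PicklingError are caught.

```python
import pickle


def make_start_time_hook():
    def start_time_hook(module, inputs):  # nested, so pickle sees a local object
        pass

    return start_time_hook


def module_level_hook(module, inputs):  # importable by name, so picklable
    pass


def find_unpicklable(hooks):
    """Return the keys of hooks that pickle cannot serialize."""
    bad = []
    for key, hook in hooks.items():
        try:
            pickle.dumps(hook)
        except (AttributeError, pickle.PicklingError):
            bad.append(key)
    return bad


leftover_hooks = {0: make_start_time_hook(), 1: module_level_hook}
print(find_unpicklable(leftover_hooks))
```

Run as a script, only the nested hook (key 0) is reported, mirroring the error in this issue.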

ds_report output
ds_report could not complete because of AttributeError: module 'os' has no attribute 'statvfs'.
I installed deepspeed-0.12.7+40342055-py3-none-any.whl into Python 3.12 on Windows 10.

System info:

Collecting environment information...
PyTorch version: 2.2.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A

OS: Microsoft Windows 10 Home
GCC version: (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: N/A

Python version: 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)] (64-bit runtime)
Python platform: Windows-10-10.0.19045-SP0
Is CUDA available: True
CUDA runtime version: 8.0.60
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4060 Ti
Nvidia driver version: 546.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture=9
CurrentClockSpeed=3696
DeviceID=CPU0
Family=198
L2CacheSize=1536
L2CacheSpeed=
Manufacturer=GenuineIntel
MaxClockSpeed=3696
Name=Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
ProcessorType=3
Revision=

Versions of relevant libraries:
[pip3] numpy==1.26.4
[pip3] torch==2.2.2+cu121
[conda] Could not collect


loadams · 2024-10-25T17:01:21Z

Hi @threewayhandshake - could you confirm whether you are still hitting this with the current released DeepSpeed wheel that is built specifically for Windows?

I'll look at your proposed fix and open a PR; if this is still relevant, we can discuss it there as well.


threewayhandshake added bug Something isn't working training labels Apr 18, 2024

loadams added windows Questions or PRs relating to running DeepSpeed on Windows and removed training labels Oct 25, 2024

loadams self-assigned this Oct 25, 2024

loadams mentioned this issue Oct 25, 2024

Update profiler registration check #6668

Merged

github-merge-queue bot pushed a commit that referenced this issue Oct 25, 2024

Update profiler registration check (#6668)

54903e0

Resolves #5432.

loadams closed this as completed in #6668 Oct 25, 2024
