VRAM increasing slightly every step until I run out. #86

Open
dillfrescott opened this issue Oct 13, 2024 · 10 comments
@dillfrescott

import torch
from PIL import Image
from pyramid_dit import PyramidDiTForVideoGeneration
from diffusers.utils import load_image, export_to_video

torch.cuda.set_device(0)
model_dtype, torch_dtype = 'bf16', torch.bfloat16   # Use bf16 (fp16 is not supported yet)

model = PyramidDiTForVideoGeneration(
    'PATH',                                         # The downloaded checkpoint dir
    model_dtype,
    model_variant='diffusion_transformer_768p',     # 'diffusion_transformer_384p'
)

model.vae.enable_tiling()
#model.vae.to("cuda")
#model.dit.to("cuda")
#model.text_encoder.to("cuda")

prompt = "A dog walking on the beach."

with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate(
        prompt=prompt,
        num_inference_steps=[20, 20, 20],
        video_num_inference_steps=[10, 10, 10],
        height=768,     
        width=1280,
        temp=31,                    # temp=16: 5s, temp=31: 10s
        guidance_scale=9.0,         # The guidance for the first frame, set it to 7 for 384p variant
        video_guidance_scale=5.0,   # The guidance for the other video latent
        output_type="pil",
        cpu_offloading=True,
        save_memory=True,           # If you have enough GPU memory, set it to `False` to improve vae decoding speed
    )

export_to_video(frames, "./text_to_video_sample.mp4", fps=24)

The VRAM usage starts off low but gradually increases every step until it hits the limit. I have 24 GB of VRAM.
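For what it's worth, a quick way to watch the allocation grow between calls is something like the sketch below. It only uses PyTorch's built-in CUDA memory counters, nothing from Pyramid-Flow itself, so where exactly you call it is up to you:

import torch

def log_vram(tag: str) -> None:
    # Current, reserved, and peak allocation on the active GPU, in GiB
    allocated = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    peak = torch.cuda.max_memory_allocated() / 2**30
    print(f"[{tag}] allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB, peak={peak:.2f} GiB")

log_vram("before generate")
# ... model.generate(...) ...
log_vram("after generate")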

@dillfrescott
Author

Even using app.py, it hits step 14 and starts flooding into shared RAM, which halts progress.

@Ednaordinary
Contributor

I'm unable to reproduce this using your shared script and the latest commit. Make sure your repo is up to date, as the fix for this was merged just yesterday. Disabling system memory fallback (I believe you can do this in the NVIDIA Control Panel, though I don't use Windows) will also fix it, since the GPU will then recognize it's time to deallocate when it reaches the limit.

@dillfrescott
Author

Okay thank you!

@dillfrescott
Author

I think that was my issue. I disabled the system memory fallback and it seems to be helping.

@dillfrescott
Author

Never mind. Now it says:

Traceback (most recent call last):
  File "text.py", line 23, in <module>
    frames = model.generate(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\pyramid_dit_for_video_gen_pipeline.py", line 703, in generate
    intermed_latents = self.generate_one_unit(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\pyramid_dit_for_video_gen_pipeline.py", line 285, in generate_one_unit
    noise_pred = self.dit(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_pyramid_mmdit.py", line 479, in forward
    encoder_hidden_states, hidden_states = block(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_mmdit_block.py", line 640, in forward
    attn_output, context_attn_output = self.attn(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_mmdit_block.py", line 548, in forward
    hidden_states, encoder_hidden_states = self.var_len_attn(
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_mmdit_block.py", line 308, in __call__
    stage_hidden_states = F.scaled_dot_product_attention(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.68 GiB. GPU 0 has a total capacty of 23.99 GiB of which 12.48 GiB is free. Of the allocated memory 6.10 GiB is allocated by PyTorch, and 3.78 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
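(As a possible workaround, the error message points at PYTORCH_CUDA_ALLOC_CONF. Setting it before PyTorch initializes CUDA would look roughly like this; the specific value is just an example, not a confirmed fix for this issue:)

import os

# Must be set before the first CUDA allocation (easiest: before importing torch).
# max_split_size_mb is the knob the error message mentions for fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch
# ... rest of the script as before ...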

@dillfrescott dillfrescott reopened this Oct 13, 2024
@dillfrescott
Author

I don't understand why it's still doing this.

@Ednaordinary
Contributor

Hmm. Do you have the temp, resolution, or something else set very high? It shouldn't be allocating 14 GB with the script you provided in the original post.

@dillfrescott
Author


I tried app.py as well and it got to step 17 and crashed (OOM). I have not modified anything.

@dillfrescott
Author

I set temp to 31 because I want a 10-second video, but everything else is the default.
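For reference, the lower-memory configuration that the comments in the original script point at (the 384p checkpoint with temp=16 and guidance 7) would look roughly like this; the 640x384 resolution is my assumption for that variant:

model = PyramidDiTForVideoGeneration(
    'PATH',                                         # same downloaded checkpoint dir
    model_dtype,
    model_variant='diffusion_transformer_384p',     # smaller checkpoint
)
model.vae.enable_tiling()

with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate(
        prompt=prompt,
        num_inference_steps=[20, 20, 20],
        video_num_inference_steps=[10, 10, 10],
        height=384,                 # assumed resolution for the 384p variant
        width=640,
        temp=16,                    # 5-second clip instead of 10
        guidance_scale=7.0,         # per the script comment for the 384p variant
        video_guidance_scale=5.0,
        output_type="pil",
        cpu_offloading=True,
        save_memory=True,
    )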

@agronholm

I get a crash after step 6. Radeon 7900XTX (24 GB VRAM). 80 GB system RAM.

torch.OutOfMemoryError: HIP out of memory. Tried to allocate 8.46 GiB. GPU 0 has a total capacity of 23.98 GiB of which 7.98 GiB is free. Of the allocated memory 14.00 GiB is allocated by PyTorch, and 1.64 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_HIP_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
