VRAM increasing slightly every step until I run out. #86

Open
dillfrescott opened this issue Oct 13, 2024 · 10 comments
@dillfrescott

import torch
from PIL import Image
from pyramid_dit import PyramidDiTForVideoGeneration
from diffusers.utils import load_image, export_to_video

torch.cuda.set_device(0)
model_dtype, torch_dtype = 'bf16', torch.bfloat16   # Use bf16 (fp16 is not supported yet)

model = PyramidDiTForVideoGeneration(
    'PATH',                                         # The downloaded checkpoint dir
    model_dtype,
    model_variant='diffusion_transformer_768p',     # 'diffusion_transformer_384p'
)

model.vae.enable_tiling()
#model.vae.to("cuda")
#model.dit.to("cuda")
#model.text_encoder.to("cuda")

prompt = "A dog walking on the beach."

with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate(
        prompt=prompt,
        num_inference_steps=[20, 20, 20],
        video_num_inference_steps=[10, 10, 10],
        height=768,     
        width=1280,
        temp=31,                    # temp=16: 5s, temp=31: 10s
        guidance_scale=9.0,         # The guidance for the first frame, set it to 7 for 384p variant
        video_guidance_scale=5.0,   # The guidance for the other video latent
        output_type="pil",
        cpu_offloading=True,
        save_memory=True,           # If you have enough GPU memory, set it to `False` to improve vae decoding speed
    )

export_to_video(frames, "./text_to_video_sample.mp4", fps=24)

The VRAM usage starts off low but gradually increases every step until it hits the limit. I have 24 GB of VRAM.
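For what it's worth, a quick way to watch the allocation grow between calls is something like the sketch below. It only uses PyTorch's built-in CUDA memory counters, nothing from Pyramid-Flow itself, so where exactly you call it is up to you:

import torch

def log_vram(tag: str) -> None:
    # Current, reserved, and peak allocation on the active GPU, in GiB
    allocated = torch.cuda.memory_allocated() / 2**30
    reserved = torch.cuda.memory_reserved() / 2**30
    peak = torch.cuda.max_memory_allocated() / 2**30
    print(f"[{tag}] allocated={allocated:.2f} GiB, reserved={reserved:.2f} GiB, peak={peak:.2f} GiB")

log_vram("before generate")
# ... model.generate(...) ...
log_vram("after generate")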

@dillfrescott
Author

Even using app.py, it hits step 14 and starts flooding into shared RAM, which halts progress.

@Ednaordinary
Contributor

I'm unable to reproduce this using your shared script and the latest commit. Make sure your repo is up to date, as the fix for this was merged just yesterday. Disabling system memory fallback (I believe you can do this in the NVIDIA Control Panel, though I don't use Windows) will also fix it, since the GPU will then recognize it's time to deallocate when it reaches the limit.

@dillfrescott
Author

Okay thank you!

@dillfrescott
Author

I think that was my issue. I disabled the system memory fallback and it seems to be helping.

@dillfrescott
Author

Never mind. Now it says:

Traceback (most recent call last):
  File "text.py", line 23, in <module>
    frames = model.generate(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\pyramid_dit_for_video_gen_pipeline.py", line 703, in generate
    intermed_latents = self.generate_one_unit(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\pyramid_dit_for_video_gen_pipeline.py", line 285, in generate_one_unit
    noise_pred = self.dit(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_pyramid_mmdit.py", line 479, in forward
    encoder_hidden_states, hidden_states = block(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_mmdit_block.py", line 640, in forward
    attn_output, context_attn_output = self.attn(
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\Users\cross\miniconda3\envs\pyramid\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_mmdit_block.py", line 548, in forward
    hidden_states, encoder_hidden_states = self.var_len_attn(
  File "C:\Users\cross\Downloads\Pyramid-Flow\pyramid_dit\modeling_mmdit_block.py", line 308, in __call__
    stage_hidden_states = F.scaled_dot_product_attention(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 13.68 GiB. GPU 0 has a total capacty of 23.99 GiB of which 12.48 GiB is free. Of the allocated memory 6.10 GiB is allocated by PyTorch, and 3.78 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
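(As a possible workaround, the error message points at PYTORCH_CUDA_ALLOC_CONF. Setting it before PyTorch initializes CUDA would look roughly like this; the specific value is just an example, not a confirmed fix for this issue:)

import os

# Must be set before the first CUDA allocation (easiest: before importing torch).
# max_split_size_mb is the knob the error message mentions for fragmentation.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

import torch
# ... rest of the script as before ...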

@dillfrescott dillfrescott reopened this Oct 13, 2024
@dillfrescott
Author

I don't understand why it's still doing this.

@Ednaordinary
Contributor

Hmm. Do you have the temp, resolution, or something else set very high? It shouldn't be allocating 14 GB with the script you provided in the original post.

@dillfrescott
Author


I tried app.py as well and it got to step 17 and crashed (OOM). I have not modified anything.

@dillfrescott
Author

I set temp to 31 because I want a 10-second video, but everything else is the default.
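For reference, the lower-memory configuration that the comments in the original script point at (the 384p checkpoint with temp=16 and guidance 7) would look roughly like this; the 640x384 resolution is my assumption for that variant:

model = PyramidDiTForVideoGeneration(
    'PATH',                                         # same downloaded checkpoint dir
    model_dtype,
    model_variant='diffusion_transformer_384p',     # smaller checkpoint
)
model.vae.enable_tiling()

with torch.no_grad(), torch.cuda.amp.autocast(enabled=True, dtype=torch_dtype):
    frames = model.generate(
        prompt=prompt,
        num_inference_steps=[20, 20, 20],
        video_num_inference_steps=[10, 10, 10],
        height=384,                 # assumed resolution for the 384p variant
        width=640,
        temp=16,                    # 5-second clip instead of 10
        guidance_scale=7.0,         # per the script comment for the 384p variant
        video_guidance_scale=5.0,
        output_type="pil",
        cpu_offloading=True,
        save_memory=True,
    )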

@agronholm

I get a crash after step 6. Radeon 7900XTX (24 GB VRAM). 80 GB system RAM.

torch.OutOfMemoryError: HIP out of memory. Tried to allocate 8.46 GiB. GPU 0 has a total capacity of 23.98 GiB of which 7.98 GiB is free. Of the allocated memory 14.00 GiB is allocated by PyTorch, and 1.64 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_HIP_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)
