CUDA OOM #47

Closed
zhengbi-yong opened this issue Nov 27, 2024 · 8 comments

Comments

@zhengbi-yong

When I run python demo.py --input demo_data/lady-running --output_dir demo_tmp --seq_name lady-running
I get a torch.OutOfMemoryError:

(monst3r) sisyphus@sisyphus-dual4090 ~/Projects/monst3r main ± python demo.py --input demo_data/lady-running --output_dir demo_tmp --seq_name lady-running
/home/sisyphus/Projects/monst3r/dust3r/cloud_opt/base_opt.py:399: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  @torch.cuda.amp.autocast(enabled=False)
... loading model from checkpoints/MonST3R_PO-TA-S-W_ViTLarge_BaseDecoder_512_dpt.pth
/home/sisyphus/Projects/monst3r/dust3r/model.py:29: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  ckpt = torch.load(model_path, map_location='cpu')
instantiating : AsymmetricCroCo3DStereo(pos_embed='RoPE100', patch_embed_cls='PatchEmbedDust3R', img_size=(512, 512), head_type='dpt', output_mode='pts3d', depth_mode=('exp', -inf, inf), conf_mode=('exp', 1, inf), enc_embed_dim=1024, enc_depth=24, enc_num_heads=16, dec_embed_dim=768, dec_depth=12, dec_num_heads=12, freeze='encoder', landscape_only=False)
Freezing encoder parameters
<All keys matched successfully>
Outputting stuff in demo_tmp
>> Loading a list of 65 items
 - Adding demo_data/lady-running/00000.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00001.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00002.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00003.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00004.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00005.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00006.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00007.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00008.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00009.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00010.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00011.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00012.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00013.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00014.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00015.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00016.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00017.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00018.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00019.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00020.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00021.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00022.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00023.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00024.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00025.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00026.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00027.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00028.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00029.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00030.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00031.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00032.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00033.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00034.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00035.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00036.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00037.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00038.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00039.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00040.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00041.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00042.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00043.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00044.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00045.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00046.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00047.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00048.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00049.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00050.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00051.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00052.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00053.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00054.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00055.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00056.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00057.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00058.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00059.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00060.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00061.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00062.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00063.jpg with resolution 854x480 --> 512x288
 - Adding demo_data/lady-running/00064.jpg with resolution 854x480 --> 512x288
 (Found 65 images)
>> Inference with model on 600 image pairs
  0%|                     | 0/600 [00:00<?, ?it/s]/home/sisyphus/Projects/monst3r/dust3r/inference.py:70: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=bool(use_amp)):
/home/sisyphus/Projects/monst3r/dust3r/model.py:209: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
/home/sisyphus/Projects/monst3r/dust3r/inference.py:74: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(enabled=False):
100%|███████████| 600/600 [00:25<00:00, 23.93it/s]
precomputing flow...
/home/sisyphus/Projects/monst3r/third_party/raft.py:64: FutureWarning: You are using `torch.load` with `weights_only=False` (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for `weights_only` will be flipped to `True`. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via `torch.serialization.add_safe_globals`. We recommend you start setting `weights_only=True` for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
  state_dict = torch.load(args.model)
Loaded pretrained RAFT model from third_party/RAFT/models/Tartan-C-T-TSKH-spring540x960-M.pth
  0%|                      | 0/50 [00:00<?, ?it/s]/home/sisyphus/anaconda3/envs/monst3r/lib/python3.11/site-packages/torch/functional.py:534: UserWarning: torch.meshgrid: in an upcoming release, it will be required to pass the indexing argument. (Triggered internally at /opt/conda/conda-bld/pytorch_1729647429097/work/aten/src/ATen/native/TensorShape.cpp:3595.)
  return _VF.meshgrid(tensors, **kwargs)  # type: ignore[attr-defined]
100%|█████████████| 50/50 [00:22<00:00,  2.26it/s]
flow precomputed
100%|███████████| 300/300 [00:11<00:00, 25.06it/s]
propagate in video: 100%|| 65/65 [00:01<00:00, 36
propagate in video: 100%|| 65/65 [00:01<00:00, 37
 init edge (36*,41*) score=np.float64(75.08370971679688)
 init edge (36,39*) score=np.float64(72.14512634277344)
 init edge (36,43*) score=np.float64(71.0331802368164)
 init edge (31*,36) score=np.float64(69.88085174560547)
 init edge (35*,36) score=np.float64(69.50623321533203)
 init edge (33*,36) score=np.float64(68.75778198242188)
 init edge (35,34*) score=np.float64(68.12627410888672)
 init edge (35,32*) score=np.float64(67.92353057861328)
 init edge (36,45*) score=np.float64(65.36090087890625)
 init edge (28*,35) score=np.float64(64.32600402832031)
 init edge (29*,36) score=np.float64(63.106346130371094)
 init edge (27*,36) score=np.float64(62.209922790527344)
 init edge (27,30*) score=np.float64(59.32157897949219)
 init edge (26*,31) score=np.float64(54.36586380004883)
 init edge (39,48*) score=np.float64(54.24318313598633)
 init edge (43,52*) score=np.float64(51.04729080200195)
 init edge (28,25*) score=np.float64(50.92348098754883)
 init edge (27,24*) score=np.float64(50.71531295776367)
 init edge (41,50*) score=np.float64(48.895809173583984)
 init edge (59*,52) score=np.float64(47.21723175048828)
 init edge (59,64*) score=np.float64(45.50202560424805)
 init edge (24,23*) score=np.float64(43.378780364990234)
 init edge (27,22*) score=np.float64(42.379398345947266)
 init edge (21*,24) score=np.float64(38.36738967895508)
 init edge (27,20*) score=np.float64(34.135589599609375)
 init edge (19*,24) score=np.float64(32.27302169799805)
 init edge (18*,19) score=np.float64(28.75078582763672)
 init edge (15*,22) score=np.float64(26.890655517578125)
 init edge (17*,18) score=np.float64(25.864116668701172)
 init edge (16*,23) score=np.float64(24.763416290283203)
 init edge (13*,18) score=np.float64(24.73274040222168)
 init edge (10*,15) score=np.float64(20.788616180419922)
 init edge (11*,16) score=np.float64(19.95123863220215)
 init edge (6*,11) score=np.float64(19.225000381469727)
 init edge (1*,10) score=np.float64(18.201091766357422)
 init edge (4*,11) score=np.float64(17.85396957397461)
 init edge (35,42*) score=np.float64(74.03719329833984)
 init edge (35,40*) score=np.float64(72.90853881835938)
 init edge (35,38*) score=np.float64(72.0782241821289)
 init edge (35,44*) score=np.float64(70.10966491699219)
 init edge (37*,40) score=np.float64(67.8114013671875)
 init edge (37,46*) score=np.float64(61.162193298339844)
 init edge (42,47*) score=np.float64(53.49862289428711)
 init edge (42,51*) score=np.float64(52.294639587402344)
 init edge (40,49*) score=np.float64(51.81222152709961)
 init edge (56*,59) score=np.float64(50.528812408447266)
 init edge (51,60*) score=np.float64(48.431884765625)
 init edge (46,55*) score=np.float64(47.660011291503906)
 init edge (56,61*) score=np.float64(47.41448974609375)
 init edge (59,54*) score=np.float64(47.35690689086914)
 init edge (54,57*) score=np.float64(45.83323669433594)
 init edge (54,63*) score=np.float64(45.600563049316406)
 init edge (14*,17) score=np.float64(28.082401275634766)
 init edge (12*,17) score=np.float64(26.88484764099121)
 init edge (5*,10) score=np.float64(20.78989601135254)
 init edge (2*,11) score=np.float64(20.3303165435791)
 init edge (2,9*) score=np.float64(19.79378318786621)
 init edge (7*,12) score=np.float64(19.537267684936523)
 init edge (3*,12) score=np.float64(19.106111526489258)
 init edge (5,8*) score=np.float64(18.861652374267578)
 init edge (0*,9) score=np.float64(16.9484920501709)
 init edge (53*,60) score=np.float64(50.2554817199707)
 init edge (55,58*) score=np.float64(48.09707260131836)
 init edge (53,62*) score=np.float64(47.796112060546875)
flow loss: 9.280871391296387
 init loss = 0.14837846159934998
Global alignement - optimizing for:
['pw_poses', 'im_depthmaps', 'im_poses', 'im_focals']
 10%| | 30/300 [00:04<00:38,  6.97it/s, lr=0.00913
Traceback (most recent call last):
  File "/home/sisyphus/Projects/monst3r/demo.py", line 338, in <module>
    scene, outfile, imgs = recon_fun(
                           ^^^^^^^^^^
  File "/home/sisyphus/Projects/monst3r/demo.py", line 124, in get_reconstructed_scene
    loss = scene.compute_global_alignment(init='mst', niter=niter, schedule=schedule, lr=lr)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sisyphus/anaconda3/envs/monst3r/lib/python3.11/site-packages/torch/amp/autocast_mode.py", line 44, in decorate_autocast
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/sisyphus/Projects/monst3r/dust3r/cloud_opt/base_opt.py", line 414, in compute_global_alignment
    return global_alignment_loop(self, **kw)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sisyphus/Projects/monst3r/dust3r/cloud_opt/base_opt.py", line 479, in global_alignment_loop
    loss, lr = global_alignment_iter(net, bar.n, niter, lr_base, lr_min, optimizer, schedule, 
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sisyphus/Projects/monst3r/dust3r/cloud_opt/base_opt.py", line 511, in global_alignment_iter
    loss = net(epoch=cur_iter)
           ^^^^^^^^^^^^^^^^^^^
  File "/home/sisyphus/anaconda3/envs/monst3r/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sisyphus/anaconda3/envs/monst3r/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sisyphus/Projects/monst3r/dust3r/cloud_opt/optimizer.py", line 520, in forward
    ego_flow_1_2, _ = self.depth_wrapper(R1, T1, R2, T2, disp_1, K_2, inv_K_1)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sisyphus/anaconda3/envs/monst3r/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1736, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sisyphus/anaconda3/envs/monst3r/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1747, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sisyphus/Projects/monst3r/dust3r/utils/goem_opt.py", line 526, in forward
    return warp_by_disp(src_R, src_t, tgt_R, tgt_t, K, src_disp, self.coord, inv_K, debug_mode, use_depth)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/sisyphus/Projects/monst3r/dust3r/utils/goem_opt.py", line 233, in warp_by_disp
    tgt_coord = torch.matmul(H_mat, coord) + flat_disp * \
                                             ^^^^^^^^^^^^^
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 1014.00 MiB. GPU 0 has a total capacity of 23.54 GiB of which 89.12 MiB is free. Including non-PyTorch memory, this process has 22.29 GiB memory in use. Of the allocated memory 20.25 GiB is allocated by PyTorch, and 1.53 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation.  See documentation for Memory Management  (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

I am using an RTX 4090.

@lxin98

lxin98 commented Nov 27, 2024

Same issue here. I am still trying to figure out how to disable the flow_loss. If only 9 images are used, the problem does not occur.

@Junyi42
Owner

Junyi42 commented Nov 27, 2024

Hi @lxin98,

To disable the flow_loss, you can set flow_loss_weight=0 in the following line:

monst3r/demo.py

Line 357 in ce88f01

flow_loss_weight=0.01,
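
As a rough illustration of why a zero weight switches the term off, here is a minimal sketch of a weighted loss sum. The function `total_loss` and its arguments are hypothetical and not taken from the MonST3R code; only the name `flow_loss_weight` and its default of 0.01 come from the line above. The sketch assumes that a zero weight lets the flow term be skipped entirely, which is what would save memory.

```python
def total_loss(alignment_loss: float, flow_loss: float,
               flow_loss_weight: float = 0.01) -> float:
    """Combine an alignment loss with a weighted flow loss (hypothetical helper)."""
    if flow_loss_weight == 0:
        # Assumption: with a zero weight the flow term can be skipped
        # altogether, so nothing related to it needs to be kept in memory.
        return alignment_loss
    return alignment_loss + flow_loss_weight * flow_loss


# Values of the same order as the "init loss" and "flow loss" lines in the log.
print(total_loss(0.148, 9.28))                      # flow term included
print(total_loss(0.148, 9.28, flow_loss_weight=0))  # flow term disabled
```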

Currently, running the "lady-running" sequence with the default setup requires 33 GB of VRAM. Therefore, CUDA OOM is expected on hardware with 24 GB of VRAM (e.g., an RTX 4090).
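
To make the gap concrete, the following sketch compares the stated requirement with the capacity reported in the traceback. The helper names are hypothetical, and the "33G" figure is taken at face value and compared directly with the 23.54 GiB from the error message, which is an assumption about units. `torch.cuda.mem_get_info` is a real PyTorch call, but it is only used inside a function that is not executed here, so the arithmetic runs without a GPU.

```python
def vram_shortfall_gb(required_gb: float, available_gb: float) -> float:
    """Return how much memory is missing (0.0 if the job fits)."""
    return max(0.0, required_gb - available_gb)


def query_total_vram_gb(device: int = 0) -> float:
    """Total memory of a CUDA device in GiB, read via torch.cuda.mem_get_info."""
    import torch  # imported lazily so the arithmetic above works without PyTorch

    _free_bytes, total_bytes = torch.cuda.mem_get_info(device)
    return total_bytes / 1024**3


# Numbers from this thread: 33 GB required, 23.54 GiB total capacity
# reported in the traceback (units assumed comparable for a rough estimate).
missing = vram_shortfall_gb(33.0, 23.54)
print(f"short by about {missing:.2f} GB")
```

On a machine with a CUDA device, `query_total_vram_gb()` could replace the hard-coded capacity.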

These are the tricks to overcome OOM issue:

Hope this helps!

Best

@zhengbi-yong
Author

Thank you. I disabled the flow_loss, and it worked.

@lxin98

lxin98 commented Nov 29, 2024

Thank you for your excellent work and assistance @Junyi42

@npmhung

npmhung commented Dec 12, 2024

I tried turning off the flow-based loss. However, with it turned off, the result is much worse for the lady-running demo. The model can no longer detect the lady as a moving object. The dynamic mask is completely black, which indicates that the whole scene is treated as static.

Note: I only used the first 30 frames of the video.

@lxin98

lxin98 commented Dec 13, 2024

@npmhung
That's right, I've also encountered this situation

@Junyi42
Owner

Junyi42 commented Dec 26, 2024

Hi,

I have added an implementation of memory-efficient global alignment in #59, which has less of an effect on performance.

Hope this helps.

@YunjieYu
Contributor

YunjieYu commented Mar 5, 2025

@npmhung @lxin98 @zhengbi-yong Hi everyone, I just submitted a merge request for window-wise optimization (#72), which is waiting for the author's review. With it, one can directly optimize a long video with a larger number of frames and obtain the expected results. Please enjoy these changes!
