
How to Train: torch.cuda.OutOfMemoryError: CUDA out of memory #144

Open

Shaggy0815 opened this issue Aug 18, 2024 · 0 comments

Shaggy0815 commented Aug 18, 2024

Hi all,
I am following the instructions to install PixArt-Sigma on my local Linux server. At step 1.3, "You are ready to train!", I now get the following error:

python3 -m torch.distributed.launch --nproc_per_node=1 --master_port=12345 \
          train_scripts/train.py \
          configs/pixart_sigma_config/PixArt_sigma_xl2_img512_internalms.py \
          --load-from output/pretrained_models/PixArt-Sigma-XL-2-512-MS.pth \
          --work-dir output/your_first_pixart-exp \
          --debug
/home/test/.local/lib/python3.10/site-packages/torch/distributed/launch.py:181: FutureWarning: The module torch.distributed.launch is deprecated
and will be removed in future. Use torchrun.
Note that --use-env is set by default in torchrun.
If your script expects `--local-rank` argument to be set, please
change it to read from `os.environ['LOCAL_RANK']` instead. See 
https://pytorch.org/docs/stable/distributed.html#launch-utility for 
further instructions

  warnings.warn(
/home/test/.local/lib/python3.10/site-packages/mmcv/__init__.py:20: UserWarning: On January 1, 2023, MMCV will release v2.0.0, in which it will remove components related to the training process and add a data transformation module. In addition, it will rename the package names mmcv to mmcv-lite and mmcv-full to mmcv. See https://github.com/open-mmlab/mmcv/blob/master/docs/en/compatibility.md for more details.
  warnings.warn(
2024-08-18 10:48:52,906 - PixArt - INFO - Distributed environment: MULTI_GPU  Backend: nccl
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda:0

Mixed precision type: fp16

2024-08-18 10:48:52,941 - PixArt - INFO - Config: 
data_root = 'pixart-sigma-toy-dataset'
data = dict(
    type='InternalDataMSSigma',
    root='InternData',
    image_list_json=['data_info.json'],
    transform='default_train',
    load_vae_feat=False,
    load_t5_feat=False)
image_size = 512
train_batch_size = 2
eval_batch_size = 16
use_fsdp = False
valid_num = 0
fp32_attention = True
model = 'PixArtMS_XL_2'
aspect_ratio_type = 'ASPECT_RATIO_512'
multi_scale = True
pe_interpolation = 1.0
qk_norm = False
kv_compress = False
kv_compress_config = dict(sampling=None, scale_factor=1, kv_compress_layer=[])
num_workers = 10
train_sampling_steps = 1000
visualize = True
deterministic_validation = False
eval_sampling_steps = 500
model_max_length = 300
lora_rank = 4
num_epochs = 10
gradient_accumulation_steps = 1
grad_checkpointing = True
gradient_clip = 0.01
gc_step = 1
auto_lr = dict(rule='sqrt')
validation_prompts = [
    'dog',
    'portrait photo of a girl, photograph, highly detailed face, depth of field',
    'Self-portrait oil painting, a beautiful cyborg with golden hair, 8k',
    'Astronaut in a jungle, cold color palette, muted colors, detailed, 8k',
    'A photo of beautiful mountain with realistic sunset and blue lake, highly detailed, masterpiece'
]
optimizer = dict(
    type='CAMEWrapper',
    lr=2e-05,
    weight_decay=0.0,
    eps=(1e-30, 1e-16),
    betas=(0.9, 0.999, 0.9999))
lr_schedule = 'constant'
lr_schedule_args = dict(num_warmup_steps=1000)
save_image_epochs = 1
save_model_epochs = 5
save_model_steps = 2500
sample_posterior = True
mixed_precision = 'fp16'
scale_factor = 0.13025
ema_rate = 0.9999
tensorboard_mox_interval = 50
log_interval = 1
cfg_scale = 4
mask_type = 'null'
num_group_tokens = 0
mask_loss_coef = 0.0
load_mask_index = False
vae_pretrained = 'output/pretrained_models/pixart_sigma_sdxlvae_T5_diffusers/vae'
load_from = 'output/pretrained_models/PixArt-Sigma-XL-2-512-MS.pth'
resume_from = None
snr_loss = False
real_prompt_ratio = 0.5
class_dropout_prob = 0.1
work_dir = 'output/your_first_pixart-exp'
s3_work_dir = None
micro_condition = False
seed = 43
skip_step = 0
loss_type = 'huber'
huber_c = 0.001
num_ddim_timesteps = 50
w_max = 15.0
w_min = 3.0
ema_decay = 0.95
image_list_json = ['data_info.json']

2024-08-18 10:48:52,941 - PixArt - INFO - World_size: 1, seed: 43
2024-08-18 10:48:52,941 - PixArt - INFO - Initializing: DDP for training
You are using the default legacy behaviour of the <class 'transformers.models.t5.tokenization_t5.T5Tokenizer'>. This is expected, and simply means that the `legacy` (previous) behavior will be used so nothing changes for you. If you want to use the new behaviour, set `legacy=False`. This should only be set if you understand what it means, and thouroughly read the reason why this was added as explained in https://github.com/huggingface/transformers/pull/24565
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████| 2/2 [00:12<00:00,  6.28s/it]
Traceback (most recent call last):
  File "/home/test/AI/PixArt-sigma/train_scripts/train.py", line 359, in <module>
    args.pipeline_load_from, subfolder="text_encoder", torch_dtype=torch.float16).to(accelerator.device)
  File "/home/test/.local/lib/python3.10/site-packages/transformers/modeling_utils.py", line 2460, in to
    return super().to(*args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/home/test/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/test/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/home/test/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 4 more times]
  File "/home/test/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/home/test/.local/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 80.00 MiB (GPU 0; 7.78 GiB total capacity; 7.17 GiB already allocated; 69.38 MiB free; 7.18 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 8550) of binary: /usr/bin/python3
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/test/.local/lib/python3.10/site-packages/torch/distributed/launch.py", line 196, in <module>
    main()
  File "/home/test/.local/lib/python3.10/site-packages/torch/distributed/launch.py", line 192, in main
    launch(args)
  File "/home/test/.local/lib/python3.10/site-packages/torch/distributed/launch.py", line 177, in launch
    run(args)
  File "/home/test/.local/lib/python3.10/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/home/test/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/home/test/.local/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError: 
============================================================
train_scripts/train.py FAILED
------------------------------------------------------------
Failures:
  <NO_OTHER_FAILURES>
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2024-08-18_10:49:19
  host      : test-MD34764
  rank      : 0 (local_rank: 0)
  exitcode  : 1 (pid: 8550)
  error_file: <N/A>
  traceback : To enable traceback see: https://pytorch.org/docs/stable/elastic/errors.html
============================================================

The GPU is an NVIDIA RTX 2060 Super with 8 GB of VRAM.
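(For reference, here is a minimal check, using only plain PyTorch and nothing PixArt-specific, to see how much of those 8 GB is actually free on the device before training starts; other processes such as the desktop session may already be holding part of it.)

import torch

# Query free and total memory on the current CUDA device, in bytes.
free, total = torch.cuda.mem_get_info()
print(f"free: {free / 2**30:.2f} GiB, total: {total / 2**30:.2f} GiB")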

I understand that there is an issue with memory allocation; however, I am not sure how to resolve it, and in the first place (at least in my mind) it should not happen anyway :)
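(As a side note, the allocator tweak that the error message itself suggests can be tried by setting PYTORCH_CUDA_ALLOC_CONF before torch is first imported, or by exporting it in the shell before launching. This is only a sketch: it mitigates fragmentation but does not add capacity, and the value 128 below is just an illustrative guess, not a recommendation from the repo.)

import os

# Must be set before the first "import torch" in the process.
# 128 MiB is an arbitrary example value for max_split_size_mb.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

import torch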

Can anyone help, please?
Thanks a lot in advance.
