CUDA out of memory #9
Comments
Try setting model_max_length to a smaller value and see if that helps.
Setting model_max_length to 10 still runs out of memory. How did you get this running on a 24G card?
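For context on why shrinking model_max_length is the usual first fix: a per-example attention mask (and the attention-score matrix) grows with the square of the sequence length, so halving the length cuts that term by 4x. A minimal arithmetic sketch (the 2-byte bf16 entry size is an assumption):

```python
# How a seq_len x seq_len per-example attention mask grows with sequence length.
def mask_bytes(seq_len: int, dtype_bytes: int = 2) -> int:
    """Bytes for one seq_len x seq_len mask, assuming dtype_bytes per entry."""
    return seq_len * seq_len * dtype_bytes

for length in (512, 2048, 8192):
    print(f"{length:5d} tokens -> {mask_bytes(length) / 2**20:.1f} MiB per example")
# Doubling seq_len quadruples this term.
```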
The environment matches requirements.txt, except that bitsandbytes-0.41.3 is used.
Running bash fintune_lora_llama3_8B_chat.sh on a single 80G H100 fails with CUDA out of memory.
The full log is below. What could be causing this?
[2024-10-22 12:37:30,959] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/transformers/deepspeed.py:23: FutureWarning: transformers.deepspeed module is deprecated and will be removed in a future version. Please import deepspeed modules directly from transformers.integrations
warnings.warn(
[2024-10-22 12:37:32,556] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2024-10-22 12:37:32,556] [INFO] [comm.py:594:init_distributed] cdb=None
[2024-10-22 12:37:32,556] [INFO] [comm.py:625:init_distributed] Initializing TorchBackend in DeepSpeed with backend nccl
quantization_config: None
[2024-10-22 12:37:39,510] [INFO] [partition_parameters.py:453:exit] finished initializing model with 8.03B parameters
Loading checkpoint shards: 100%|██████████| 4/4 [00:06<00:00, 1.70s/it]
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
['o_proj', 'k_proj', 'q_proj', 'v_proj']
None
trainable params: 54,525,952 || all params: 8,084,787,200 || trainable%: 0.6744265575722265
Loading data...
Formatting inputs...Skip in lazy mode
/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/accelerate/accelerator.py:432: FutureWarning: Passing the following arguments to `Accelerator` is deprecated and will be removed in version 1.0 of Accelerate: dict_keys(['dispatch_batches', 'split_batches', 'even_batches', 'use_seedable_sampler']). Please pass an `accelerate.DataLoaderConfiguration` instead: dataloader_config = DataLoaderConfiguration(dispatch_batches=None, split_batches=False, even_batches=True, use_seedable_sampler=True)
warnings.warn(
Detected kernel version 3.10.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
[WARNING] cpu_adam cuda is missing or is incompatible with installed torch, only cpu ops can be compiled!
Using /home/haojk/.cache/torch_extensions/py38_cu118 as PyTorch extensions root...
Emitting ninja build file /home/haojk/.cache/torch_extensions/py38_cu118/cpu_adam/build.ninja...
Building extension module cpu_adam...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module cpu_adam...
Time to load cpu_adam op: 0.6103599071502686 seconds
Using /home/haojk/.cache/torch_extensions/py38_cu118 as PyTorch extensions root...
Emitting ninja build file /home/haojk/.cache/torch_extensions/py38_cu118/utils/build.ninja...
Building extension module utils...
Allowing ninja to set a default number of workers... (overridable by setting the environment variable MAX_JOBS=N)
ninja: no work to do.
Loading extension module utils...
Time to load utils op: 0.3512094020843506 seconds
Parameter Offload: Total persistent parameters: 266240 in 65 params
Using /home/haojk/.cache/torch_extensions/py38_cu118 as PyTorch extensions root...
No modifications detected for re-loaded extension module utils, skipping build step...
Loading extension module utils...
Time to load utils op: 0.00031566619873046875 seconds
0%|          | 0/3300 [00:00<?, ?it/s]
Traceback (most recent call last):
File "../finetune_llama3.py", line 434, in <module>
train()
File "../finetune_llama3.py", line 427, in train
trainer.train()
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/transformers/trainer.py", line 1624, in train
return inner_training_loop(
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/transformers/trainer.py", line 1961, in _inner_training_loop
tr_loss_step = self.training_step(model, inputs)
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/transformers/trainer.py", line 2902, in training_step
loss = self.compute_loss(model, inputs)
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/transformers/trainer.py", line 2925, in compute_loss
outputs = model(**inputs)
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
return forward_call(*args, **kwargs)
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/deepspeed/utils/nvtx.py", line 15, in wrapped_fn
ret_val = func(*args, **kwargs)
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/deepspeed/runtime/engine.py", line 1736, in forward
loss = self.module(*inputs, **kwargs)
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/peft/peft_model.py", line 918, in forward
return self.base_model(
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
return self.model.forward(*args, **kwargs)
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1176, in forward
outputs = self.model(
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 993, in forward
causal_mask = self._update_causal_mask(attention_mask, inputs_embeds)
File "/home/haojk/miniconda3/envs/guangke/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1074, in _update_causal_mask
causal_mask = self.causal_mask[None, None, :, :].repeat(batch_size, 1, 1, 1).to(dtype) * min_dtype
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 32.00 GiB (GPU 0; 79.11 GiB total capacity; 50.06 GiB already allocated; 20.25 GiB free; 52.69 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
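For what it's worth, the 32.00 GiB figure is consistent with the `repeat(batch_size, 1, 1, 1)` call in the traceback materializing a full-context causal mask for every batch element. One combination that reproduces the number exactly, assuming an 8192x8192 mask (Llama 3's context window), float32 entries, and an effective batch of 128 — all three values are assumptions, not read from the log:

```python
# Back-of-envelope check on the 32.00 GiB allocation from the traceback.
# Assumed (hypothetical) values: 8192-token mask, float32 entries, batch 128.
seq_len = 8192
dtype_bytes = 4
batch_size = 128

total_bytes = batch_size * seq_len * seq_len * dtype_bytes
print(total_bytes / 2**30, "GiB")  # 32.0 GiB, matching the error message
```

Since `_update_causal_mask` here builds the mask from `self.causal_mask` (a buffer sized by the model's context window, per the traceback) rather than from the actual input length, this would also explain why lowering model_max_length alone did not help. Shrinking the batch size, or upgrading transformers (if I recall correctly, later releases removed this static mask buffer), would be the usual workarounds.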