Thanks for open-sourcing the code! I was trying to run the LongBench eval script as shown below, and I seem to run out of VRAM on my A6000. For the passkey, LongBench, and PG-19 perplexity evaluations, what hardware did you use?
The README says "Kernels and end-to-end efficiency are evaluated on NVIDIA Ada6000 and RTX4090 GPUs with CUDA version of 12.4": does this apply only to the kernel efficiency evaluation, separate from the accuracy evaluation? Further, are the kernels solely for performance evaluation, or are they actually integrated into the framework end-to-end?
(quest) ya255@abdelfattah-compute-02:~/projects/DoubleSparse/quest/evaluation/LongBench$ CUDA_VISIBLE_DEVICES=0 python -u pred.py --model longchat-v1.5-7b-32k --task gov_report --quest --token_budget 512 --chunk_size 16
Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /scratch/ya255/huggingface/token
Login successful
use FlashAttention
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00, 3.53s/it]
/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
0%|█▋ | 1/200 [00:57<3:10:55, 57.57s/it]
Traceback (most recent call last):
File "/home/ya255/projects/DoubleSparse/quest/evaluation/LongBench/pred.py", line 333, in <module>
preds = get_pred(
File "/home/ya255/projects/DoubleSparse/quest/evaluation/LongBench/pred.py", line 176, in get_pred
output = model(
File "/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 825, in forward
logits = logits.float()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.25 GiB. GPU 0 has a total capacty of 47.44 GiB of which 383.38 MiB is free. Including non-PyTorch memory, this process has 47.05 GiB memory in use. Of the allocated memory 35.73 GiB is allocated by PyTorch, and 11.01 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
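Incidentally, the 2.25 GiB allocation in the traceback is about what you would expect from `logits.float()` materializing an fp32 copy of the full-sequence logits. A back-of-envelope check (the sequence length below is inferred to match the log line, not read from it; Llama's 32000-token vocabulary is the only other assumption):

```python
# Estimate the extra memory needed when `logits.float()` casts the
# fp16 logits tensor of shape [1, seq_len, vocab_size] to fp32.
vocab_size = 32_000      # Llama tokenizer vocabulary size
seq_len = 18_874         # inferred: the length consistent with the 2.25 GiB log line
bytes_per_fp32 = 4

extra_gib = vocab_size * seq_len * bytes_per_fp32 / 2**30
print(f"fp32 logits copy: {extra_gib:.2f} GiB")  # fp32 logits copy: 2.25 GiB
```

So the OOM seems to come from the dense lm_head output over the whole prompt rather than from the attention kernels themselves, which would explain why shorter prompts, or the `max_split_size_mb` allocator hint the error message itself suggests, might get past it.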
Thank you!