GPU VRAM Requirements for experiments #10

Open
akhauriyash opened this issue Sep 21, 2024 · 0 comments

Hello,

Thanks for open-sourcing the code! I was trying to run the LongBench eval script as shown below, and I seem to run out of VRAM on my A6000. For the passkey, LongBench, and PG-19 perplexity evaluations, what hardware did you use?

Regarding "Kernels and end-to-end efficiency are evaluated on NVIDIA Ada6000 and RTX4090 GPUs with CUDA version of 12.4": does this apply only to the kernel efficiency evaluation, separate from the accuracy evaluation? Also, are the kernels used solely for performance evaluation, or are they actually integrated into the framework end-to-end?

(quest) ya255@abdelfattah-compute-02:~/projects/DoubleSparse/quest/evaluation/LongBench$ CUDA_VISIBLE_DEVICES=0 python -u pred.py --model longchat-v1.5-7b-32k --task gov_report --quest --token_budget 512 --chunk_size 16
Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /scratch/ya255/huggingface/token
Login successful
use FlashAttention
Loading checkpoint shards: 100%|██████████| 2/2 [00:07<00:00,  3.53s/it]
/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
  warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
  0%|          | 1/200 [00:57<3:10:55, 57.57s/it]
Traceback (most recent call last):
  File "/home/ya255/projects/DoubleSparse/quest/evaluation/LongBench/pred.py", line 333, in <module>
    preds = get_pred(
  File "/home/ya255/projects/DoubleSparse/quest/evaluation/LongBench/pred.py", line 176, in get_pred
    output = model(
  File "/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 825, in forward
    logits = logits.float()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.25 GiB. GPU 0 has a total capacty of 47.44 GiB of which 383.38 MiB is free. Including non-PyTorch memory, this process has 47.05 GiB memory in use. Of the allocated memory 35.73 GiB is allocated by PyTorch, and 11.01 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
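
For reference, the only knob I can think to try from the error message itself is the allocator setting it mentions. A minimal sketch of the same command with PyTorch's PYTORCH_CUDA_ALLOC_CONF set to limit fragmentation (this only helps if memory is fragmented; it won't help if the model weights plus the 32k-context activations genuinely exceed 48 GiB):

# Same command as above, with the allocator hint from the error message applied.
# max_split_size_mb is a standard PYTORCH_CUDA_ALLOC_CONF option; 128 is an
# arbitrary starting value, not a value suggested by this repo.
CUDA_VISIBLE_DEVICES=0 PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 \
    python -u pred.py --model longchat-v1.5-7b-32k --task gov_report \
    --quest --token_budget 512 --chunk_size 16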

Thank you!
