Thanks for open-sourcing the code! I was trying to run the LongBench eval script as shown below, and I seem to run out of VRAM on my A6000. For the passkey, LongBench, and PG-19 perplexity evaluations, what hardware did you use?
The README says "Kernels and end-to-end efficiency are evaluated on NVIDIA Ada6000 and RTX4090 GPUs with CUDA version of 12.4": does this apply only to the kernel efficiency evaluation, separate from the accuracy evaluation? Further, are the kernels solely for performance evaluation, or are they actually integrated into the framework end-to-end?
(quest) ya255@abdelfattah-compute-02:~/projects/DoubleSparse/quest/evaluation/LongBench$ CUDA_VISIBLE_DEVICES=0 python -u pred.py --model longchat-v1.5-7b-32k --task gov_report --quest --token_budget 512 --chunk_size 16
Token will not been saved to git credential helper. Pass `add_to_git_credential=True` if you want to set the git credential as well.
Token is valid (permission: fineGrained).
Your token has been saved to /scratch/ya255/huggingface/token
Login successful
use FlashAttention
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:07<00:00, 3.53s/it]
/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/huggingface_hub/repocard.py:105: UserWarning: Repo card metadata block was not found. Setting CardData to empty.
warnings.warn("Repo card metadata block was not found. Setting CardData to empty.")
0%|█▋ | 1/200 [00:57<3:10:55, 57.57s/it]
Traceback (most recent call last):
File "/home/ya255/projects/DoubleSparse/quest/evaluation/LongBench/pred.py", line 333, in <module>
preds = get_pred(
File "/home/ya255/projects/DoubleSparse/quest/evaluation/LongBench/pred.py", line 176, in get_pred
output = model(
File "/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ya255/.conda/envs/quest/lib/python3.10/site-packages/transformers/models/llama/modeling_llama.py", line 825, in forward
logits = logits.float()
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 2.25 GiB. GPU 0 has a total capacty of 47.44 GiB of which 383.38 MiB is free. Including non-PyTorch memory, this process has 47.05 GiB memory in use. Of the allocated memory 35.73 GiB is allocated by PyTorch, and 11.01 GiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
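Incidentally, the 2.25 GiB allocation in the traceback is about what you would expect from `logits.float()` materializing an fp32 copy of the full-sequence logits. A back-of-envelope check (the sequence length below is inferred to match the log line, not read from it; Llama's 32000-token vocabulary is the only other assumption):

```python
# Estimate the extra memory needed when `logits.float()` casts the
# fp16 logits tensor of shape [1, seq_len, vocab_size] to fp32.
vocab_size = 32_000      # Llama tokenizer vocabulary size
seq_len = 18_874         # inferred: the length consistent with the 2.25 GiB log line
bytes_per_fp32 = 4

extra_gib = vocab_size * seq_len * bytes_per_fp32 / 2**30
print(f"fp32 logits copy: {extra_gib:.2f} GiB")  # fp32 logits copy: 2.25 GiB
```

So the OOM seems to come from the dense lm_head output over the whole prompt rather than from the attention kernels themselves, which would explain why shorter prompts, or the `max_split_size_mb` allocator hint the error message itself suggests, might get past it.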
Thank you!