Though I went through the example in the README smoothly, I got stuck when reproducing the work on Qwen2-7B-Instruct. So far I have three questions about the gradient step:

1. Qwen2-7B-Instruct defaults max_position_embeddings to 32768. How should I set the seqlen and maxseqlen parameters: seqlen=32768 & maxseqlen=131072 (or whatever value I want to extend to), or seqlen=32768 & maxseqlen=32768?
2. When running run-fisher.py with the model loaded via from_pretrained(..., device_map="auto", ...), it crashes with CUDA OOM. Without device_map="auto" it runs on only one GPU, slowly but without OOM. What is the right way to run run-fisher.py on multiple GPUs?
3. With device_map="auto", "balanced_low_0", or "balanced", I hit several errors of the form "param A is on cuda:0 but param B is on cuda:1" (at code like A * B, or func(A)). Am I using it incorrectly, or does multi-GPU support need further work for this variety of models?
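For question 3, the usual workaround for these "tensors on different devices" errors (this is a generic PyTorch pattern, not something from this repo; the helper name is my own) is to move one operand onto the other's device right before the op:

```python
import torch

def safe_mul(a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
    # Move b onto a's device so the elementwise product never mixes
    # cuda:0 and cuda:1 operands; .to() is a no-op if b is already there.
    return a * b.to(a.device)
```

The same pattern applies to the func(A) case: move the input to the module's device first. But patching every call site by hand is fragile, hence my question about the intended multi-GPU path.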
The following is my script for running run-fisher.py on 8x A100 GPUs.
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python run-fisher_ori.py --model_name_or_path ../models/Qwen2-7B-Instruct --output_dir ../fisher --dataset ../datasets/wikitext --seqlen 32768 --maxseqlen 32768 --num_examples 16
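For question 2, would capping per-GPU memory be the intended mitigation? A minimal sketch of what I mean, assuming an 8x A100-80GB node; the helper name and the 60GiB headroom figure are my own guesses, not from the repo:

```python
def fisher_load_kwargs(num_gpus: int, per_gpu: str = "60GiB") -> dict:
    # Kwargs for AutoModelForCausalLM.from_pretrained: shard the model with
    # device_map="auto" but reserve headroom on each GPU for the large
    # activation/gradient memory of a 32768-token backward pass.
    return {
        "device_map": "auto",
        "max_memory": {i: per_gpu for i in range(num_gpus)},
    }

# model = AutoModelForCausalLM.from_pretrained(model_path, **fisher_load_kwargs(8))
```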
Thanks again for this wonderful work; any help would be appreciated.