
Problems when reproducing the method on Qwen2-7b-instruct #15

jiangshimiao opened this issue Sep 14, 2024 · 3 comments

@jiangshimiao

Hi all, and thanks for this wonderful work.

Though I can go through the example in the README smoothly, I got stuck when reproducing the work on Qwen2-7B-Instruct. So far I have three questions about the gradient step, as follows:

  1. Qwen2-7B-Instruct defaults max_position_embeddings to 32768. How should I set the seqlen and maxseqlen parameters? seqlen=32768 & maxseqlen=131072 (some value that I want to extend to), or seqlen=32768 & maxseqlen=32768?
  2. When running run-fisher.py with the model loaded via from_pretrained(..., device_map="auto", ...), it crashes with CUDA OOM, while without device_map="auto" it runs on only one GPU, slowly but without OOM. What is the right way to run run-fisher.py on multiple GPUs? (A sketch of my loading pattern follows this list.)
  3. When setting device_map="auto", "balanced_low_0", or "balanced", I hit several errors along the lines of param A being "on cuda:0" while param B is "on cuda:1" (in code such as A * B, or func(A)). Am I using it the wrong way, or does it need further work to support a variety of models? (A toy sketch of the workaround follows my script below.)
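
For context, here is a minimal sketch of the multi-GPU loading pattern I am referring to in question 2; the dtype, test input, and backward pass are just placeholders standing in for what run-fisher.py actually does:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "../models/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,   # placeholder dtype
    device_map="auto",           # shard layers across all visible GPUs
)

inputs = tokenizer("test input", return_tensors="pt")
# for a sharded model, inputs should sit on the device of the first
# (embedding) layer; model.device reports it
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model(**inputs, labels=inputs["input_ids"])
out.loss.backward()              # the gradient step needs backward too
```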

The following is the script I use to run run-fisher.py on 8x A100 GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python run-fisher_ori.py --model_name_or_path ../models/Qwen2-7B-Instruct --output_dir ../fisher --dataset ../datasets/wikitext --seqlen 32768 --maxseqlen 32768 --num_examples 16
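
To make question 3 concrete, this is a toy sketch of the device-alignment workaround I mean; A and B here stand in for whatever parameters the traceback names:

```python
import torch

# simulate two tensors that ended up on different shards
A = torch.randn(4, 4, device="cuda:0")
second = "cuda:1" if torch.cuda.device_count() > 1 else "cuda:0"
B = torch.randn(4, 4, device=second)

# align devices right before the op that mixes them
if A.device != B.device:
    B = B.to(A.device)   # move B to wherever A lives
result = A * B           # both operands now share a device
```

Patching every such site by hand works, but it is tedious, which is why I ask whether the script needs further model-specific work.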

@shahaamirbader

Any update on this?

@jiangshimiao (Author)

> Any update on this?

Not yet. Waiting for the official answer.

@md-hassan commented Nov 23, 2024

Facing this exact same issue with Llama 3.1. Did anyone solve it?
