
Problems when reproducing the method on Qwen2-7b-instruct #15

jiangshimiao opened this issue Sep 14, 2024 · 3 comments

@jiangshimiao

Hi all, and thanks for this wonderful work.

Though I can go through the example in the README smoothly, I got stuck when reproducing the work on Qwen2-7B-Instruct. So far I have three questions about the gradient step, as follows:

  1. Qwen2-7B-Instruct defaults max_position_embeddings to 32768. How should I set the seqlen and maxseqlen parameters? seqlen=32768 & maxseqlen=131072 (some value that I want to extend to), or seqlen=32768 & maxseqlen=32768?
  2. When running run-fisher.py with the model loaded via from_pretrained(..., device_map="auto", ...), it crashes with CUDA OOM, while without device_map="auto" it runs on only one GPU, slowly but without OOM. What is the right way to run run-fisher.py on multiple GPUs? (A sketch of my loading pattern follows this list.)
  3. When setting device_map="auto", "balanced_low_0", or "balanced", I hit several errors along the lines of param A being "on cuda:0" while param B is "on cuda:1" (in code such as A * B, or func(A)). Am I using it the wrong way, or does it need further work to support a variety of models? (A toy sketch of the workaround follows my script below.)
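
For context, here is a minimal sketch of the multi-GPU loading pattern I am referring to in question 2; the dtype, test input, and backward pass are just placeholders standing in for what run-fisher.py actually does:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "../models/Qwen2-7B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,   # placeholder dtype
    device_map="auto",           # shard layers across all visible GPUs
)

inputs = tokenizer("test input", return_tensors="pt")
# for a sharded model, inputs should sit on the device of the first
# (embedding) layer; model.device reports it
inputs = {k: v.to(model.device) for k, v in inputs.items()}
out = model(**inputs, labels=inputs["input_ids"])
out.loss.backward()              # the gradient step needs backward too
```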

The following is the script I use to run run-fisher.py on 8x A100 GPUs:
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python run-fisher_ori.py --model_name_or_path ../models/Qwen2-7B-Instruct --output_dir ../fisher --dataset ../datasets/wikitext --seqlen 32768 --maxseqlen 32768 --num_examples 16
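
To make question 3 concrete, this is a toy sketch of the device-alignment workaround I mean; A and B here stand in for whatever parameters the traceback names:

```python
import torch

# simulate two tensors that ended up on different shards
A = torch.randn(4, 4, device="cuda:0")
second = "cuda:1" if torch.cuda.device_count() > 1 else "cuda:0"
B = torch.randn(4, 4, device=second)

# align devices right before the op that mixes them
if A.device != B.device:
    B = B.to(A.device)   # move B to wherever A lives
result = A * B           # both operands now share a device
```

Patching every such site by hand works, but it is tedious, which is why I ask whether the script needs further model-specific work.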

@shahaamirbader

Any update on this?

@jiangshimiao (Author)

> Any update on this?

Not yet. Waiting for the official answer.

@md-hassan commented Nov 23, 2024

Facing this exact same issue with Llama 3.1. Did anyone solve it?
