
"ppo_micro_batch_size" missing in examples/grpo_trainer/run_qwen2-7b_seq_balance.sh #145

Closed
deter3 opened this issue Jan 27, 2025 · 6 comments

@deter3

deter3 commented Jan 27, 2025

I believe the config in examples/grpo_trainer/run_qwen2-7b_seq_balance.sh is missing one line, which is

actor_rollout_ref.actor.ppo_micro_batch_size=64 \

Please update. With this line added, it can be run on 8×A100.

Currently, we get the error "unsupported operand type(s) for %: 'int' and 'NoneType'" when "ppo_micro_batch_size" is missing.
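For reference, the failure boils down to a divisibility check hitting a None value; a minimal sketch of the failure mode (hypothetical names, not verl's actual code):

# Minimal sketch of the failure mode (hypothetical names, not verl's actual code).
# When ppo_micro_batch_size is omitted, the config value stays None, and a
# divisibility check on it raises TypeError instead of a clear config error.

def check_divisible(ppo_mini_batch_size: int, ppo_micro_batch_size):
    # ppo_micro_batch_size is None when the flag is missing from the launch script
    assert ppo_mini_batch_size % ppo_micro_batch_size == 0

check_divisible(256, None)
# TypeError: unsupported operand type(s) for %: 'int' and 'NoneType'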

Thanks.

@PeterSH6
Collaborator

PeterSH6 commented Jan 27, 2025

@deter3 This line is intentionally missing. When we use seq_balance, we don't need to tune ppo_micro_batch_size anymore.

Instead, we should tune ppo_max_token_len_per_gpu. You can refer to the performance tuning guide for more information: https://github.com/volcengine/verl/pull/142/files

Your quick fix will not affect the actual training; it just satisfies this misplaced assertion.
The error is related to another issue that is fixed in #141.

We'll merge them shortly.
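To give a rough idea of what dynamic batch sizing does, here is a simplified sketch of token-budget packing (an illustration, not verl's implementation): micro-batches are formed by a per-GPU token budget instead of a fixed sample count.

# Simplified sketch of dynamic (token-budget) micro-batching; not verl's code.
# Instead of a fixed ppo_micro_batch_size, sequences are packed greedily so
# each micro-batch stays under ppo_max_token_len_per_gpu tokens.

def pack_by_token_budget(seq_lens, max_token_len_per_gpu=24000):
    micro_batches, current, current_tokens = [], [], 0
    for idx, n_tokens in enumerate(seq_lens):
        if current and current_tokens + n_tokens > max_token_len_per_gpu:
            micro_batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n_tokens
    if current:
        micro_batches.append(current)
    return micro_batches

# With max_prompt_length=512 and max_response_length=1024, a sequence is at
# most 1536 tokens, so a 24000-token budget fits about 15 sequences per GPU.
print(pack_by_token_budget([1536] * 64))  # -> batches of 15, 15, 15, 15, 4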

@deter3
Author

deter3 commented Jan 27, 2025

Thanks for the prompt reply. I will give it a try then. Quite a steep learning curve :)

@PeterSH6
Collaborator

PeterSH6 commented Jan 27, 2025

It’s never easy to achieve optimal performance :)

@deter3
Author

deter3 commented Jan 27, 2025

I kept trying all kinds of combinations, and the instructions and errors are quite confusing. Below is the one working on 8×A100 80GB. @PeterSH6, is it good to go? The ppo_micro_batch_size line is the one added relative to the example config.

set -x

export VLLM_ATTENTION_BACKEND=XFORMERS

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    actor_rollout_ref.model.path=Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=64 \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24000 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='qwen2_7b_function_rm_kl1e-3' \
    +trainer.val_before_train=False \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=100 \
    trainer.test_freq=10 \
    trainer.total_epochs=15 $@

@PeterSH6
Collaborator

PeterSH6 commented Jan 27, 2025

You don't need that line. Can you pull the latest main and try again?

@deter3
Author

deter3 commented Jan 27, 2025

On commit 695bdbb, 8×A100 80GB works with the config below, which is the same as examples/grpo_trainer/run_qwen2-7b_seq_balance.sh.

Great thanks to @PeterSH6, who is still working during the Chinese New Year holiday. We should all blame the DeepSeek R1 paper!!!!

set -x

export VLLM_ATTENTION_BACKEND=XFORMERS

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    actor_rollout_ref.model.path=Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24000 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='qwen2_7b_function_rm_kl1e-3' \
    +trainer.val_before_train=False \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 $@

deter3 closed this as completed Jan 27, 2025