
"ppo_micro_batch_size" missing in examples/grpo_trainer/run_qwen2-7b_seq_balance.sh #145

Closed
deter3 opened this issue Jan 27, 2025 · 6 comments

@deter3

deter3 commented Jan 27, 2025

I believe the config in examples/grpo_trainer/run_qwen2-7b_seq_balance.sh is missing one line, which is

actor_rollout_ref.actor.ppo_micro_batch_size=64 \

Please update. With this line added, it can be run on 8×A100.

Currently, we get the error "unsupported operand type(s) for %: 'int' and 'NoneType'" when "ppo_micro_batch_size" is missing.
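For reference, the failure boils down to a divisibility check hitting a None value; a minimal sketch of the failure mode (hypothetical names, not verl's actual code):

# Minimal sketch of the failure mode (hypothetical names, not verl's actual code).
# When ppo_micro_batch_size is omitted, the config value stays None, and a
# divisibility check on it raises TypeError instead of a clear config error.

def check_divisible(ppo_mini_batch_size: int, ppo_micro_batch_size):
    # ppo_micro_batch_size is None when the flag is missing from the launch script
    assert ppo_mini_batch_size % ppo_micro_batch_size == 0

check_divisible(256, None)
# TypeError: unsupported operand type(s) for %: 'int' and 'NoneType'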

Thanks.

@PeterSH6
Collaborator

PeterSH6 commented Jan 27, 2025

@deter3 This line is intentionally missing. When we use seq_balance, we don't need to tune ppo_micro_batch_size anymore.

Instead, we should tune ppo_max_token_len_per_gpu. You can refer to the performance tuning guide for more information: https://github.com/volcengine/verl/pull/142/files

Your quick fix will not affect the actual training; it just satisfies this misplaced assertion.
The error is related to another issue that is fixed in #141.

We'll merge them shortly.
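To give a rough idea of what dynamic batch sizing does, here is a simplified sketch of token-budget packing (an illustration, not verl's implementation): micro-batches are formed by a per-GPU token budget instead of a fixed sample count.

# Simplified sketch of dynamic (token-budget) micro-batching; not verl's code.
# Instead of a fixed ppo_micro_batch_size, sequences are packed greedily so
# each micro-batch stays under ppo_max_token_len_per_gpu tokens.

def pack_by_token_budget(seq_lens, max_token_len_per_gpu=24000):
    micro_batches, current, current_tokens = [], [], 0
    for idx, n_tokens in enumerate(seq_lens):
        if current and current_tokens + n_tokens > max_token_len_per_gpu:
            micro_batches.append(current)
            current, current_tokens = [], 0
        current.append(idx)
        current_tokens += n_tokens
    if current:
        micro_batches.append(current)
    return micro_batches

# With max_prompt_length=512 and max_response_length=1024, a sequence is at
# most 1536 tokens, so a 24000-token budget fits about 15 sequences per GPU.
print(pack_by_token_budget([1536] * 64))  # -> batches of 15, 15, 15, 15, 4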

@deter3
Author

deter3 commented Jan 27, 2025

Thanks for the prompt reply. I will give it a try then. Quite a steep learning curve :)

@PeterSH6
Collaborator

PeterSH6 commented Jan 27, 2025

It’s never easy to achieve optimal performance :)

@deter3
Author

deter3 commented Jan 27, 2025

I kept trying all kinds of combinations, and the instructions and errors are quite confusing. Below is the one working on 8×A100 80GB. @PeterSH6, is it good to go? The ppo_micro_batch_size line is the one added relative to the example config.

set -x

export VLLM_ATTENTION_BACKEND=XFORMERS

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    actor_rollout_ref.model.path=Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.ppo_micro_batch_size=64 \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24000 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='qwen2_7b_function_rm_kl1e-3' \
    +trainer.val_before_train=False \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=100 \
    trainer.test_freq=10 \
    trainer.total_epochs=15 $@

@PeterSH6
Collaborator

PeterSH6 commented Jan 27, 2025

You don't need that line. Can you pull the latest main and try again?

@deter3
Author

deter3 commented Jan 27, 2025

On commit 695bdbb, 8×A100 80GB works with the config below, which is the same as examples/grpo_trainer/run_qwen2-7b_seq_balance.sh.

Great thanks to @PeterSH6, who is still working during the Chinese New Year holiday. We should all blame the DeepSeek R1 paper!!!!

set -x

export VLLM_ATTENTION_BACKEND=XFORMERS

python3 -m verl.trainer.main_ppo \
    algorithm.adv_estimator=grpo \
    data.train_files=$HOME/data/gsm8k/train.parquet \
    data.val_files=$HOME/data/gsm8k/test.parquet \
    data.train_batch_size=1024 \
    data.val_batch_size=1312 \
    data.max_prompt_length=512 \
    data.max_response_length=1024 \
    actor_rollout_ref.model.path=Qwen/Qwen2-7B-Instruct \
    actor_rollout_ref.actor.optim.lr=1e-6 \
    actor_rollout_ref.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_mini_batch_size=256 \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24000 \
    actor_rollout_ref.actor.use_kl_loss=True \
    actor_rollout_ref.actor.kl_loss_coef=0.001 \
    actor_rollout_ref.actor.kl_loss_type=low_var_kl \
    actor_rollout_ref.model.enable_gradient_checkpointing=True \
    actor_rollout_ref.actor.fsdp_config.param_offload=False \
    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
    actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
    actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
    actor_rollout_ref.rollout.name=vllm \
    actor_rollout_ref.rollout.gpu_memory_utilization=0.6 \
    actor_rollout_ref.rollout.n=5 \
    actor_rollout_ref.ref.fsdp_config.param_offload=True \
    algorithm.kl_ctrl.kl_coef=0.001 \
    trainer.critic_warmup=0 \
    trainer.logger=['console','wandb'] \
    trainer.project_name='verl_grpo_example_gsm8k' \
    trainer.experiment_name='qwen2_7b_function_rm_kl1e-3' \
    +trainer.val_before_train=False \
    trainer.n_gpus_per_node=8 \
    trainer.nnodes=1 \
    trainer.save_freq=-1 \
    trainer.test_freq=5 \
    trainer.total_epochs=15 $@

deter3 closed this as completed Jan 27, 2025