[misc] feat: support offload parameter and optimizer during rollout (#284)

- Fixed FSDP1 model offload.
- With `actor_rollout_ref.actor.fsdp_config.param_offload=True` and
  `actor_rollout_ref.actor.fsdp_config.optimizer_offload=True`, the rollout
  `gpu_memory_utilization` can be raised to 0.9.
- With actor, critic, and reference offload all enabled, only one model copy
  resides in GPU memory at a time, so `micro_batch_size_per_gpu` or
  `ppo_max_token_len_per_gpu` can be increased further.

**Specifically:**
- During rollout, only the rollout model and KVCache are in GPU memory.
- While the critic computes values, only the critic model stays in GPU
  memory; its optimizer and other model states are held in CPU main memory.
- During the actor update, the actor model and its optimizer are on GPU,
  while the reference model, critic model, and critic optimizer are offloaded
  to CPU.
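As a rough sketch of what this offloading amounts to at the PyTorch level (the helper names below are hypothetical, not verl's actual API; real FSDP offload additionally handles flattened parameter shards and pinned host memory):

```python
import torch

def offload_model_to_cpu(model: torch.nn.Module) -> None:
    # Move every parameter shard to CPU so its GPU copy can be reclaimed.
    for p in model.parameters():
        p.data = p.data.to("cpu", non_blocking=True)
    torch.cuda.empty_cache()  # return the freed blocks to the allocator

def load_model_to_gpu(model: torch.nn.Module, device: str = "cuda") -> None:
    for p in model.parameters():
        p.data = p.data.to(device, non_blocking=True)

def offload_optimizer_to_cpu(optimizer: torch.optim.Optimizer) -> None:
    # Adam-style state (exp_avg, exp_avg_sq) is roughly twice the model size,
    # so parking it in CPU main memory frees a large chunk of GPU memory.
    for group in optimizer.param_groups:
        for p in group["params"]:
            for k, v in optimizer.state.get(p, {}).items():
                if isinstance(v, torch.Tensor):
                    optimizer.state[p][k] = v.to("cpu", non_blocking=True)

if __name__ == "__main__" and torch.cuda.is_available():
    model = torch.nn.Linear(8, 8).cuda()
    opt = torch.optim.AdamW(model.parameters())
    model(torch.randn(2, 8, device="cuda")).sum().backward()
    opt.step()                     # populate optimizer state on GPU
    offload_model_to_cpu(model)    # rollout phase: GPU left to vLLM + KVCache
    offload_optimizer_to_cpu(opt)
    load_model_to_gpu(model)       # actor update: bring the actor back
```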
PeterSH6 authored Feb 17, 2025
1 parent f0e5bdf commit 9db5232
Showing 30 changed files with 65 additions and 130 deletions.

docs/examples/config.rst (1 change: 0 additions & 1 deletion)
@@ -93,7 +93,6 @@ Actor/Rollout/Reference Policy
         # transformer_layer_cls_to_wrap: None
         min_num_params: 0
       param_offload: False
-      grad_offload: False
       optimizer_offload: False
       fsdp_size: -1
   ref:

docs/examples/gsm8k_example.rst (2 changes: 0 additions & 2 deletions)
@@ -139,7 +139,6 @@ The script of run_deepseek7b_llm.sh
     actor_rollout_ref.actor.ppo_mini_batch_size=256 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=32 \
@@ -154,7 +153,6 @@ The script of run_deepseek7b_llm.sh
     critic.model.enable_gradient_checkpointing=True \
     critic.ppo_micro_batch_size_per_gpu=32 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     algorithm.kl_ctrl.kl_coef=0.001 \
     trainer.critic_warmup=0 \

examples/grpo_trainer/run_deepseek7b_llm.sh (1 change: 0 additions & 1 deletion)
@@ -18,7 +18,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.kl_loss_type=low_var_kl \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=160 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \

examples/grpo_trainer/run_deepseek7b_llm_seq_balance.sh (1 change: 0 additions & 1 deletion)
@@ -19,7 +19,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.kl_loss_type=low_var_kl \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
     actor_rollout_ref.rollout.name=vllm \

examples/grpo_trainer/run_qwen2-7b.sh (1 change: 0 additions & 1 deletion)
@@ -20,7 +20,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.kl_loss_type=low_var_kl \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=160 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \

examples/grpo_trainer/run_qwen2-7b_seq_balance.sh (1 change: 0 additions & 1 deletion)
@@ -21,7 +21,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.kl_loss_type=low_var_kl \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
     actor_rollout_ref.rollout.name=vllm \

examples/ppo_trainer/run_deepseek7b_llm.sh (2 changes: 0 additions & 2 deletions)
@@ -13,7 +13,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_mini_batch_size=256 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=32 \
@@ -28,7 +27,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=True \
     critic.ppo_micro_batch_size_per_gpu=32 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     algorithm.kl_ctrl.kl_coef=0.001 \
     trainer.critic_warmup=0 \

examples/ppo_trainer/run_deepseek7b_llm_sp2.sh (2 changes: 0 additions & 2 deletions)
@@ -15,7 +15,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=64 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
@@ -30,7 +29,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=True \
     critic.ppo_micro_batch_size_per_gpu=64 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     algorithm.kl_ctrl.kl_coef=0.001 \
     trainer.critic_warmup=0 \

examples/ppo_trainer/run_gemma.sh (2 changes: 0 additions & 2 deletions)
@@ -13,7 +13,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_mini_batch_size=128 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=4 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
@@ -27,7 +26,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=False \
     critic.ppo_micro_batch_size_per_gpu=4 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     algorithm.kl_ctrl.kl_coef=0.001 \
     trainer.critic_warmup=0 \

examples/ppo_trainer/run_qwen2-7b.sh (2 changes: 0 additions & 2 deletions)
@@ -22,7 +22,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=32 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
@@ -36,7 +35,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=True \
     critic.ppo_micro_batch_size_per_gpu=32 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     algorithm.kl_ctrl.kl_coef=0.001 \
     trainer.critic_warmup=0 \

examples/ppo_trainer/run_qwen2-7b_rm.sh (2 changes: 0 additions & 2 deletions)
@@ -38,7 +38,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=1 \
@@ -53,7 +52,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=True \
     critic.ppo_micro_batch_size_per_gpu=32 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     reward_model.enable=True \
     reward_model.model.path="$HOME/models/FsfairX-LLaMA3-RM-v0.1" \

examples/ppo_trainer/run_qwen2-7b_rm_seq_balance.sh (2 changes: 0 additions & 2 deletions)
@@ -24,7 +24,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.use_dynamic_bsz=True \
     actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24000 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
     actor_rollout_ref.rollout.name=vllm \
@@ -39,7 +38,6 @@ python3 -m verl.trainer.main_ppo \
     critic.use_dynamic_bsz=True \
     critic.ppo_max_token_len_per_gpu=98304 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     reward_model.enable=True \
     reward_model.model.path=sfairXC/FsfairX-LLaMA3-RM-v0.1 \

examples/ppo_trainer/run_qwen2-7b_seq_balance.sh (2 changes: 0 additions & 2 deletions)
@@ -23,7 +23,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.use_dynamic_bsz=True \
     actor_rollout_ref.actor.ppo_max_token_len_per_gpu=24000 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
     actor_rollout_ref.rollout.name=vllm \
@@ -37,7 +36,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=True \
     critic.ppo_max_token_len_per_gpu=98304 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     algorithm.kl_ctrl.kl_coef=0.001 \
     trainer.critic_warmup=0 \

examples/ppo_trainer/run_qwen2.5-32b.sh (2 changes: 0 additions & 2 deletions)
@@ -23,7 +23,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
@@ -37,7 +36,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=False \
     critic.ppo_micro_batch_size_per_gpu=8 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     algorithm.kl_ctrl.kl_coef=0.0001 \
     trainer.critic_warmup=0 \

examples/remax_trainer/run_qwen2.5-3b_seq_balance.sh (1 change: 0 additions & 1 deletion)
@@ -24,7 +24,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.kl_loss_type=low_var_kl \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
     actor_rollout_ref.rollout.name=vllm \

examples/remax_trainer/run_qwen2.5-7b_seq_balance.sh (1 change: 0 additions & 1 deletion)
@@ -24,7 +24,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.kl_loss_type=low_var_kl \
     actor_rollout_ref.model.enable_gradient_checkpointing=True \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
     actor_rollout_ref.rollout.name=vllm \

examples/split_placement/config/ppo_trainer_split.yaml (2 changes: 0 additions & 2 deletions)
@@ -46,7 +46,6 @@ actor_rollout_ref:
         # transformer_layer_cls_to_wrap: None
         min_num_params: 0
       param_offload: False
-      grad_offload: False
       optimizer_offload: False
       fsdp_size: -1
   ref:
@@ -105,7 +104,6 @@ critic:
     use_remove_padding: False
     fsdp_config:
       param_offload: False
-      grad_offload: False
       optimizer_offload: False
       wrap_policy:
         # transformer_layer_cls_to_wrap: None

examples/split_placement/run_deepseek7b_llm.sh (2 changes: 0 additions & 2 deletions)
@@ -12,7 +12,6 @@ python3 main_ppo_split.py \
     actor_rollout_ref.actor.ppo_mini_batch_size=256 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=8 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=8 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=4 \
@@ -25,7 +24,6 @@ python3 main_ppo_split.py \
     critic.model.enable_gradient_checkpointing=False \
     critic.ppo_micro_batch_size_per_gpu=8 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     algorithm.kl_ctrl.kl_coef=0.001 \
     trainer.critic_warmup=0 \

tests/e2e/run_qwen_gsm8k_function_rm.sh (2 changes: 0 additions & 2 deletions)
@@ -15,7 +15,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_mini_batch_size=256 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
@@ -29,7 +28,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=False \
     critic.ppo_micro_batch_size_per_gpu=4 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     algorithm.kl_ctrl.kl_coef=0.001 \
     trainer.critic_warmup=0 \

tests/e2e/run_qwen_gsm8k_function_rm_grpo.sh (1 change: 0 additions & 1 deletion)
@@ -15,7 +15,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_mini_batch_size=256 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \

tests/e2e/run_qwen_gsm8k_function_rm_no_rmpad.sh (2 changes: 0 additions & 2 deletions)
@@ -15,7 +15,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_mini_batch_size=256 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
@@ -29,7 +28,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=False \
     critic.ppo_micro_batch_size_per_gpu=4 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     algorithm.kl_ctrl.kl_coef=0.001 \
     trainer.critic_warmup=0 \

tests/e2e/run_qwen_gsm8k_function_rm_remax.sh (1 change: 0 additions & 1 deletion)
@@ -15,7 +15,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_mini_batch_size=256 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \

tests/e2e/run_qwen_gsm8k_model_rm.sh (2 changes: 0 additions & 2 deletions)
@@ -17,7 +17,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_mini_batch_size=256 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
@@ -32,7 +31,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=False \
     critic.ppo_micro_batch_size_per_gpu=4 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     reward_model.enable=True \
     reward_model.model.path=Qwen/Qwen2.5-0.5B \

tests/e2e/run_qwen_gsm8k_model_rm_liger_kernel.sh (2 changes: 0 additions & 2 deletions)
@@ -18,7 +18,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_mini_batch_size=256 \
     actor_rollout_ref.actor.ppo_micro_batch_size=32 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size=128 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
@@ -33,7 +32,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=False \
     critic.ppo_micro_batch_size=32 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     reward_model.enable=True \
     reward_model.model.path=Qwen/Qwen2.5-0.5B \

tests/e2e/run_qwen_gsm8k_model_rm_no_rmpad.sh (2 changes: 0 additions & 2 deletions)
@@ -17,7 +17,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_mini_batch_size=256 \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
@@ -32,7 +31,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=False \
     critic.ppo_micro_batch_size_per_gpu=4 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     reward_model.enable=True \
     reward_model.model.path=Qwen/Qwen2.5-0.5B \

tests/e2e/run_qwen_gsm8k_model_rm_seq_balance.sh (2 changes: 0 additions & 2 deletions)
@@ -18,7 +18,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.use_dynamic_bsz=True \
     actor_rollout_ref.actor.ppo_max_token_len_per_gpu=12000 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.rollout.tensor_model_parallel_size=2 \
     actor_rollout_ref.rollout.name=vllm \
@@ -34,7 +33,6 @@ python3 -m verl.trainer.main_ppo \
     critic.use_dynamic_bsz=True \
     critic.ppo_max_token_len_per_gpu=98304 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     reward_model.enable=True \
     reward_model.model.path=Qwen/Qwen2.5-0.5B \

tests/e2e/run_qwen_gsm8k_model_rm_ulysses.sh (2 changes: 0 additions & 2 deletions)
@@ -18,7 +18,6 @@ python3 -m verl.trainer.main_ppo \
     actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=4 \
     actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
     actor_rollout_ref.actor.fsdp_config.param_offload=False \
-    actor_rollout_ref.actor.fsdp_config.grad_offload=False \
     actor_rollout_ref.actor.fsdp_config.optimizer_offload=False \
     actor_rollout_ref.actor.fsdp_config.fsdp_size=-1 \
     actor_rollout_ref.rollout.log_prob_micro_batch_size_per_gpu=16 \
@@ -35,7 +34,6 @@ python3 -m verl.trainer.main_ppo \
     critic.model.enable_gradient_checkpointing=False \
     critic.ppo_micro_batch_size_per_gpu=4 \
     critic.model.fsdp_config.param_offload=False \
-    critic.model.fsdp_config.grad_offload=False \
     critic.model.fsdp_config.optimizer_offload=False \
     critic.model.fsdp_config.fsdp_size=-1 \
     reward_model.enable=True \

verl/trainer/config/ppo_trainer.yaml (2 changes: 0 additions & 2 deletions)
@@ -46,7 +46,6 @@ actor_rollout_ref:
         # transformer_layer_cls_to_wrap: None
         min_num_params: 0
       param_offload: False
-      grad_offload: False
       optimizer_offload: False
       fsdp_size: -1
   ref:
@@ -105,7 +104,6 @@ critic:
     use_remove_padding: False
    fsdp_config:
       param_offload: False
-      grad_offload: False
       optimizer_offload: False
       wrap_policy:
         # transformer_layer_cls_to_wrap: None

Diffs for the remaining 2 changed files are not shown.
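Reading the switches above at worker setup is then presumably a small amount of glue. A minimal, self-contained sketch assuming OmegaConf (which the Hydra config above is built on); the `offload_*` helpers referenced in the comments are the hypothetical ones sketched earlier, not verl's actual API:

```python
from omegaconf import OmegaConf

# Stub mirroring the fsdp_config block from ppo_trainer.yaml above.
cfg = OmegaConf.create("""
actor_rollout_ref:
  actor:
    fsdp_config:
      param_offload: True
      optimizer_offload: True
""")

fsdp_cfg = cfg.actor_rollout_ref.actor.fsdp_config
if fsdp_cfg.param_offload:
    # would call offload_model_to_cpu(actor_module) before rollout
    print("param offload enabled")
if fsdp_cfg.optimizer_offload:
    # would call offload_optimizer_to_cpu(actor_optimizer) before rollout
    print("optimizer offload enabled")
```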
