v0.2 release
Highlights
New algorithms and features
- GRPO (a usage sketch for the new algorithms follows this list)
- ReMax
- REINFORCE++
- Checkpoint manager for FSDP backend
- Sandbox for reward verification and scoring in PRIME
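The new algorithms plug into the existing PPO trainer through configuration. A minimal sketch of switching the advantage estimator is shown below; the entry point (`verl.trainer.main_ppo`), the estimator names, and the placeholder paths are assumptions, so check the algorithm documentation for the exact values shipped in v0.2.

```bash
# Sketch: selecting one of the new algorithms via a Hydra-style override.
# The entry point, estimator names, and placeholder paths are assumptions;
# consult the v0.2 docs for the exact values.
python3 -m verl.trainer.main_ppo \
    data.train_files=/path/to/train.parquet \
    data.val_files=/path/to/test.parquet \
    actor_rollout_ref.model.path=/path/to/base_model \
    algorithm.adv_estimator=grpo   # or: remax, reinforce_plus_plus
```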
Performance optimizations:
- Remove padding tokens (i.e., sequence packing). A significant throughput increase is expected for Llama, Mistral, Gemma, and Qwen2 transformer models (see the combined sketch after this list). Documentation
actor_rollout_ref.model.use_remove_padding=True
critic.model.use_remove_padding=True
- Dynamic batch size. Significant throughput increase for variable-length sequences. Documentation and example
actor_rollout_ref.actor.ppo_max_token_len_per_gpu
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu
critic.ppo_max_token_len_per_gpu
critic.forward_micro_batch_size_per_gpu
reward_model.forward_micro_batch_size_per_gpu
- Sequence parallelism for long-context training. Documentation and example
actor_rollout_ref.actor.ulysses_sequence_parallel_size
critic.ulysses_sequence_parallel_size
reward_model.ulysses_sequence_parallel_size
- vLLM v0.7+ integration (preview). For the Qwen2 PPO example, rollout time is reduced by 25% compared to vLLM v0.6.3, and by 45% when the CUDA graph is enabled. Documentation
actor_rollout_ref.rollout.enforce_eager=False
actor_rollout_ref.rollout.free_cache_engine=False
- Liger-kernel integration for SFT (see the SFT sketch below). Documentation
model.use_liger=True
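All of the performance features above are opt-in configuration overrides, so they compose on a single trainer invocation. The following is a rough sketch, assuming the PPO entry point is `verl.trainer.main_ppo` and using placeholder token budgets and parallel sizes; only the override keys themselves come from the lists above.

```bash
# Sketch: combining the v0.2 performance features on one PPO run.
# The entry point and the numeric values are assumptions; tune them for your hardware.
python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.use_remove_padding=True \
    critic.model.use_remove_padding=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=16384 \
    critic.ppo_max_token_len_per_gpu=16384 \
    actor_rollout_ref.actor.ulysses_sequence_parallel_size=2 \
    critic.ulysses_sequence_parallel_size=2 \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False
```

The last two flags apply to the vLLM v0.7+ preview: keeping the CUDA graph (`enforce_eager=False`) and the cache engine (`free_cache_engine=False`) enabled is what yields the larger rollout speedup quoted above.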
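For SFT, the Liger-kernel integration is likewise a single flag. The sketch below assumes the FSDP SFT trainer module name and a torchrun launch, both of which should be checked against the SFT documentation; `model.use_liger=True` is the documented switch.

```bash
# Sketch: enabling Liger kernels for SFT. The trainer module name and launch
# command are assumptions; model.use_liger=True is the flag documented above.
torchrun --nproc_per_node=8 -m verl.trainer.fsdp_sft_trainer \
    model.use_liger=True   # plus your usual data/model overrides
```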
Changelog
New Features
- Algorithm Support:
- Performance Improvements:
  - Enabled dynamic batch size support (#118).
  - Added meta device initialization and parallel load for FSDP to avoid OOMs during init (#123).
  - Improved gradient accumulation in sequence balance (#141).
  - Added ref/RM offload support (#121).
  - Added LoRA support for SFT (#127).
  - Added support for rmpad/data-packing in FSDP with transformers (#91).
  - Added Liger kernel integration (#133).
- Experiment Tracking:
Bug Fixes
- Critical Fixes:
- Code Fixes:
Improvements
- Performance:
- Miscellaneous:
  - Added option to log validation generations to wandb (#177).
Deprecations and Breaking Changes
- Breaking Changes:
Contributors
A big thank you to all the contributors who made this release possible:
@zhanluxianshen @xingyaoww @fzyzcjy @emergenz @openhands-agent @ZSL98 @YSLIU627 @ZefanW @corbt @jaysonfrancis @hiyouga @Jiayi-Pan @hongpeng-guo @eltociear @chujiezheng @PanAndy @zwhe99 @pcmoritz @huiyeruzhou @VPeterV @uygnef @zhiqi-0 @ExtremeViscent @liziniu @nch0w @Cppowboy @TonyLianLong @4332001876 @tyler-romero @ShaohonChen @kinman0224 @willem-bd @bebetterest @WeiXiongUST @dignfei
A PyPI package will be available soon! Please let us know on GitHub if there's a problem extending an RL training recipe based on the pip-installed version of verl.
Full Changelog: v0.1...v0.2