
Releases: volcengine/verl

v0.2.0.post2

21 Feb 14:32
fb53278

What's Changed

  • Fixed installation issues.
  • Fixed the remove padding flags in the gemma example.

Full Changelog: v0.2...v0.2.0.post2

v0.2 release

15 Feb 15:18
828df7e

Highlights

New algorithms and features: GRPO (#124), REINFORCE++ (#228), and ReMax (#234); see the Changelog below for details.

Performance optimization:

  • Remove padding tokens (i.e., sequence packing). Significant throughput increases are expected for Llama, Mistral, Gemma, and Qwen2 transformer models. See the documentation for details; a combined command-line sketch follows this list.
actor_rollout_ref.model.use_remove_padding=True
critic.model.use_remove_padding=True
  • Dynamic batch size. Significant throughput increases for variable-length sequences. See the documentation and example for details.
actor_rollout_ref.actor.ppo_max_token_len_per_gpu
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu
critic.ppo_max_token_len_per_gpu
critic.forward_micro_batch_size_per_gpu
reward_model.forward_micro_batch_size_per_gpu
  • Ulysses sequence parallelism for long sequences.
actor_rollout_ref.actor.ulysses_sequence_parallel_size
critic.ulysses_sequence_parallel_size
reward_model.ulysses_sequence_parallel_size
  • vLLM v0.7+ integration (preview). For the Qwen2 PPO example, rollout time drops by 25% compared with vLLM v0.6.3, and by 45% when CUDA graph is enabled. See the documentation for details.
actor_rollout_ref.rollout.enforce_eager=False
actor_rollout_ref.rollout.free_cache_engine=False
  • Liger kernel integration (#133).
model.use_liger=True
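
The keys above are Hydra-style command-line overrides for the trainer entry point. A minimal sketch of enabling sequence packing, dynamic batch size, and the vLLM preview in one launch command (the verl.trainer.main_ppo entry point, the use_dynamic_bsz switch, and the 16384 token budgets are assumptions for illustration; all other required data/model arguments are omitted):

python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.use_remove_padding=True \
    critic.model.use_remove_padding=True \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=16384 \
    actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=16384 \
    critic.ppo_max_token_len_per_gpu=16384 \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False

See the documentation pages referenced above for the full argument list and recommended values for your hardware.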

Changelog

New Features

  1. Algorithm Support:

    • Added support for GRPO algorithm (#124).
    • Implemented REINFORCE++ algorithm (#228).
    • Added ReMax algorithm (#234).
  2. Performance Improvements:

    • Enabled dynamic batch size support (#118).
    • Added meta device initialization and parallel load for FSDP to avoid OOMs during init (#123).
    • Improved gradient accumulation in sequence balance (#141).
    • Added ref/RM offload support (#121).
    • Added LoRA support for SFT (#127).
    • Supported rmpad/data packing in FSDP with transformers (#91).
    • Integrated the Liger kernel (#133).
  3. Experiment Tracking:

    • Integrated SwanLab for experiment tracking with online/offline mode and local dashboard support (#218).
    • Added Mlflow support (#74).

Bug Fixes

  1. Critical Fixes:

    • Fixed checkpoint save with existing directories (#174).
    • Fixed incorrect response_attention_mask in vLLM rollout (#213).
    • Fixed gradient accumulation loss value (#102).
    • Fixed reward model issues with TokenClassification models (#99).
  2. Code Fixes:

    • Fixed redundant non_zero_mask (#152).
    • Fixed validation dp_size (#90).
    • Fixed response_mask index (#60).

Improvements

  1. Performance:

    • Improved memory efficiency in logprobs_from_logits_v2 (#220).
    • Enabled multiprocess dataloader in SFT trainer (#122).
    • Added MFU calculation support (#117).
  2. Miscellaneous:

    • Added option to log validation generations to wandb (#177).

Deprecations and Breaking Changes

  1. Breaking Changes:
    • Changed micro_batch_size to micro_batch_size_per_gpu (#136); a before/after sketch follows this list.
    • Removed @ray.remote on workers to allow inheritance (#61).
    • Refactored old_log_prob into a separate function (#129).
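
For the micro_batch_size rename above, existing launch overrides need updating; as the new name suggests, the value is now specified per GPU. A minimal before/after sketch using the actor key as an illustrative example (the critic and reward model keys follow the same pattern; the numbers assume 4 GPUs and are illustrative only):

actor_rollout_ref.actor.ppo_micro_batch_size=8            (old key)
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2    (new key; the value is per GPU, e.g. 8 samples spread over 4 GPUs)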

Contributors

A big thank you to all the contributors who made this release possible:
@zhanluxianshen @xingyaoww @fzyzcjy @emergenz @openhands-agent @ZSL98 @YSLIU627 @ZefanW @corbt @jaysonfrancis @hiyouga @Jiayi-Pan @hongpeng-guo @eltociear @chujiezheng @PanAndy @zwhe99 @pcmoritz @huiyeruzhou @VPeterV @uygnef @zhiqi-0 @ExtremeViscent @liziniu @nch0w @Cppowboy @TonyLianLong @4332001876 @tyler-romero @ShaohonChen @kinman0224 @willem-bd @bebetterest @WeiXiongUST @dignfei


The PyPI package will be available soon! Please let us know on GitHub if there's a problem extending RL training recipes based on the pip-installed version of verl.

Full Changelog: v0.1...v0.2

v0.1

11 Dec 16:14

What's Changed

  • [misc] feat: update tutorial for opensource version by @PeterSH6 in #4
  • [misc] fix: vllm gpu executor issue when world_size is 1 and typo in doc by @PeterSH6 in #9
  • [ci] feat: add test files for ray hybrid programming model by @PeterSH6 in #23
  • [chore] remove unnecessary updating of _worker_names by @kevin85421 in #19
  • [misc] feat: add gemma example for small scale debug and fix gradient checkpoint in critic by @PeterSH6 in #27
  • [misc] fix issue in hf_weight_loader and fix typo in doc by @PeterSH6 in #30
  • [ci] test lint ci and lint tests dir by @PeterSH6 in #28
  • [example] fix: fix math circular dependency by @eric-haibin-lin in #31
  • [example] fix: make wandb optional dependency. allow extra args in existing scripts by @eric-haibin-lin in #32
  • [docs] feat: add related publications by @eric-haibin-lin in #35
  • [tokenizer] feat: support tokenizers whose pad_token_id is none by @eric-haibin-lin in #36
  • [rollout] feat: support vLLM v0.6.3 and fix hf rollout import issue by @PeterSH6 in #33
  • [distro] feat: add docker support by @eric-haibin-lin in #41
  • [example] add a split placement tutorial by @PeterSH6 in #43
  • [doc] add a new quickstart section by @PeterSH6 in #44
  • [BREAKING][core] move single_controller into verl directory by @PeterSH6 in #45

Full Changelog: v0.1rc...v0.1

v0.1rc

01 Nov 05:10
53bb5d2

What's Changed

  • [init] feat: first commit for open source
  • [doc] feat: fix typo and delete deprecated config element by @PeterSH6 in #2
  • [misc] fix: resolve pypi missing directory by @PeterSH6 in #3

Credit To

@PeterSH6 @vermouth1992 @zw0610 @wuxibin89 @YipZLF @namizzz @pengyanghua @eric-haibin-lin @Meteorix and others in Seed Foundation MLSys Team