
Releases: volcengine/verl

v0.2.0.post2

21 Feb 14:32
fb53278

What's Changed

  • Fixed installation issues.
  • Fixed the remove padding flags in the gemma example.

Full Changelog: v0.2...v0.2.0.post2

v0.2 release

15 Feb 15:18
828df7e

Highlights

New algorithms and features: GRPO (#124), REINFORCE++ (#228), and ReMax (#234); see the Changelog below for details.

Performance optimization:

  • Remove padding tokens (i.e., sequence packing). Significant throughput increases are expected for Llama, Mistral, Gemma, and Qwen2 transformer models. See the documentation for details; a combined command-line sketch follows this list.
actor_rollout_ref.model.use_remove_padding=True
critic.model.use_remove_padding=True
  • Dynamic batch size. Significant throughput increases for variable-length sequences. See the documentation and example for details.
actor_rollout_ref.actor.ppo_max_token_len_per_gpu
actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu
actor_rollout_ref.ref.log_prob_max_token_len_per_gpu
critic.ppo_max_token_len_per_gpu
critic.forward_micro_batch_size_per_gpu
reward_model.forward_micro_batch_size_per_gpu
  • Ulysses sequence parallelism for long sequences.
actor_rollout_ref.actor.ulysses_sequence_parallel_size
critic.ulysses_sequence_parallel_size
reward_model.ulysses_sequence_parallel_size
  • vLLM v0.7+ integration (preview). For the Qwen2 PPO example, rollout time drops by 25% compared with vLLM v0.6.3, and by 45% when CUDA graph is enabled. See the documentation for details.
actor_rollout_ref.rollout.enforce_eager=False
actor_rollout_ref.rollout.free_cache_engine=False
  • Liger kernel integration (#133).
model.use_liger=True
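
The keys above are Hydra-style command-line overrides for the trainer entry point. A minimal sketch of enabling sequence packing, dynamic batch size, and the vLLM preview in one launch command (the verl.trainer.main_ppo entry point, the use_dynamic_bsz switch, and the 16384 token budgets are assumptions for illustration; all other required data/model arguments are omitted):

python3 -m verl.trainer.main_ppo \
    actor_rollout_ref.model.use_remove_padding=True \
    critic.model.use_remove_padding=True \
    actor_rollout_ref.actor.use_dynamic_bsz=True \
    actor_rollout_ref.actor.ppo_max_token_len_per_gpu=16384 \
    actor_rollout_ref.rollout.log_prob_max_token_len_per_gpu=16384 \
    critic.ppo_max_token_len_per_gpu=16384 \
    actor_rollout_ref.rollout.enforce_eager=False \
    actor_rollout_ref.rollout.free_cache_engine=False

See the documentation pages referenced above for the full argument list and recommended values for your hardware.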

Changelog

New Features

  1. Algorithm Support:

    • Added support for GRPO algorithm (#124).
    • Implemented REINFORCE++ algorithm (#228).
    • Added ReMax algorithm (#234).
  2. Performance Improvements:

    • Enabled dynamic batch size support (#118).
    • Added meta device initialization and parallel load for FSDP to avoid OOMs during init (#123).
    • Improved gradient accumulation in sequence balance (#141).
    • Added ref/RM offload support (#121).
    • Added LoRA support for SFT (#127).
    • Supported rmpad/data packing in FSDP with transformers (#91).
    • Integrated the Liger kernel (#133).
  3. Experiment Tracking:

    • Integrated SwanLab for experiment tracking with online/offline mode and local dashboard support (#218).
    • Added Mlflow support (#74).

Bug Fixes

  1. Critical Fixes:

    • Fixed checkpoint save with existing directories (#174).
    • Fixed incorrect response_attention_mask in vLLM rollout (#213).
    • Fixed gradient accumulation loss value (#102).
    • Fixed reward model issues with TokenClassification models (#99).
  2. Code Fixes:

    • Fixed redundant non_zero_mask (#152).
    • Fixed validation dp_size (#90).
    • Fixed response_mask index (#60).

Improvements

  1. Performance:

    • Improved memory efficiency in logprobs_from_logits_v2 (#220).
    • Enabled multiprocess dataloader in SFT trainer (#122).
    • Added MFU calculation support (#117).
  2. Miscellaneous:

    • Added option to log validation generations to wandb (#177).

Deprecations and Breaking Changes

  1. Breaking Changes:
    • Changed micro_batch_size to micro_batch_size_per_gpu (#136); a before/after sketch follows this list.
    • Removed @ray.remote on workers to allow inheritance (#61).
    • Refactored old_log_prob into a separate function (#129).
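
For the micro_batch_size rename above, existing launch overrides need updating; as the new name suggests, the value is now specified per GPU. A minimal before/after sketch using the actor key as an illustrative example (the critic and reward model keys follow the same pattern; the numbers assume 4 GPUs and are illustrative only):

actor_rollout_ref.actor.ppo_micro_batch_size=8            (old key)
actor_rollout_ref.actor.ppo_micro_batch_size_per_gpu=2    (new key; the value is per GPU, e.g. 8 samples spread over 4 GPUs)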

Contributors

A big thank you to all the contributors who made this release possible:
@zhanluxianshen @xingyaoww @fzyzcjy @emergenz @openhands-agent @ZSL98 @YSLIU627 @ZefanW @corbt @jaysonfrancis @hiyouga @Jiayi-Pan @hongpeng-guo @eltociear @chujiezheng @PanAndy @zwhe99 @pcmoritz @huiyeruzhou @VPeterV @uygnef @zhiqi-0 @ExtremeViscent @liziniu @nch0w @Cppowboy @TonyLianLong @4332001876 @tyler-romero @ShaohonChen @kinman0224 @willem-bd @bebetterest @WeiXiongUST @dignfei


The PyPI package will be available soon! Please let us know on GitHub if there's a problem extending RL training recipes based on the pip-installed version of verl.

Full Changelog: v0.1...v0.2

v0.1

11 Dec 16:14

What's Changed

  • [misc] feat: update tutorial for opensource version by @PeterSH6 in #4
  • [misc] fix: vllm gpu executor issue when world_size is 1 and typo in doc by @PeterSH6 in #9
  • [ci] feat: add test files for ray hybrid programming model by @PeterSH6 in #23
  • [chore] remove unnecessary updating of _worker_names by @kevin85421 in #19
  • [misc] feat: add gemma example for small scale debug and fix gradient checkpoint in critic by @PeterSH6 in #27
  • [misc] fix issue in hf_weight_loader and fix typo in doc by @PeterSH6 in #30
  • [ci] test lint ci and lint tests dir by @PeterSH6 in #28
  • [example] fix: fix math circular dependency by @eric-haibin-lin in #31
  • [example] fix: make wandb optional dependency. allow extra args in existing scripts by @eric-haibin-lin in #32
  • [docs] feat: add related publications by @eric-haibin-lin in #35
  • [tokenizer] feat: support tokenizers whose pad_token_id is none by @eric-haibin-lin in #36
  • [rollout] feat: support vLLM v0.6.3 and fix hf rollout import issue by @PeterSH6 in #33
  • [distro] feat: add docker support by @eric-haibin-lin in #41
  • [example] add a split placement tutorial by @PeterSH6 in #43
  • [doc] add a new quickstart section by @PeterSH6 in #44
  • [BREAKING][core] move single_controller into verl directory by @PeterSH6 in #45

Full Changelog: v0.1rc...v0.1

v0.1rc

01 Nov 05:10
53bb5d2

What's Changed

  • [init] feat: first commit for open source
  • [doc] feat: fix typo and delete deprecated config element by @PeterSH6 in #2
  • [misc] fix: resolve pypi missing directory by @PeterSH6 in #3

Credit To

@PeterSH6 @vermouth1992 @zw0610 @wuxibin89 @YipZLF @namizzz @pengyanghua @eric-haibin-lin @Meteorix and others in Seed Foundation MLSys Team