[fix] Improve the params template for generation #351

BearBiscuit05 · 2025-02-23T11:17:44Z

Fix issue #331.

vermouth1992 · 2025-02-23T11:36:44Z

Could you add a test for Qwen 0.5B generation to protect this functionality?

BearBiscuit05 · 2025-02-23T11:49:28Z

Sure. I tested with Qwen 0.5B on a single machine. Which subdirectory of the "test" directory should I add the test to?

vermouth1992 · 2025-02-23T11:55:38Z

Could you create a new folder named "generation" under test, add a bash script in it that runs Qwen 0.5B generation, and call that script from https://github.com/volcengine/verl/blob/main/.github/workflows/vllm.yml#L49 as a new test item? Thanks!

BearBiscuit05 · 2025-02-23T13:21:43Z

Running with 1 GPU works fine, but with nproc_per_node > 1 it fails with the error "Duplicate GPU detected: rank 0 and rank 1 both on CUDA device 31000". I'm not sure whether this is a parameter configuration issue or a hardware problem. Could you help me find the root cause?
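NCCL raises this error when two ranks in the same communicator resolve to the same physical GPU (the number in the message appears to be the device's PCI bus ID). The sketch below shows the kind of check involved, in plain Python: given a hypothetical mapping from each rank to the device it ended up on, it reports every pair of ranks that collide. The function name and input format are illustrative assumptions, not part of NCCL or verl.

```python
from collections import defaultdict
from itertools import combinations


def find_duplicate_devices(rank_to_device: dict[int, str]) -> list[tuple[int, int, str]]:
    """Return (rank_a, rank_b, device) for every pair of ranks sharing a device.

    rank_to_device is a hypothetical mapping from process rank to the
    physical device it resolved to (for example a PCI bus ID string).
    """
    ranks_by_device: dict[str, list[int]] = defaultdict(list)
    for rank, device in sorted(rank_to_device.items()):
        ranks_by_device[device].append(rank)

    duplicates = []
    for device, ranks in ranks_by_device.items():
        # Every pair of ranks placed on the same device is a collision.
        for a, b in combinations(ranks, 2):
            duplicates.append((a, b, device))
    return duplicates


# Two ranks that both landed on bus ID 31000, as in the error above.
print(find_duplicate_devices({0: "31000", 1: "31000"}))  # [(0, 1, '31000')]
```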

vermouth1992 · 2025-02-23T14:09:32Z

Could you check your Ray version? And can you run normal PPO training successfully?

BearBiscuit05 · 2025-02-23T14:36:59Z

My Ray version is 2.10, and PPO runs successfully on 2 × A100, so I think it may be a parameter problem. I will look into it tomorrow.

vermouth1992 · 2025-02-23T14:53:13Z

You can either set max_colocate_count to 1 (https://github.com/volcengine/verl/blob/main/verl/single_controller/ray/base.py#L55) or upgrade Ray to the latest version to resolve this problem.
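One way to see why max_colocate_count=1 can help: assuming, as we read the linked base.py, that each worker requests 1 / max_colocate_count of a GPU, a scheduler is free to pack several fractional workers onto one device, whereas whole-GPU requests force one worker per device. The toy first-fit packer below illustrates only that arithmetic. It is a hypothetical function, not Ray's actual scheduler, and the value 5 is just an example of a count greater than 1.

```python
from fractions import Fraction


def first_fit(num_workers: int, max_colocate_count: int, num_gpus: int) -> list[int]:
    """Toy model: place workers on GPUs first-fit and return each worker's GPU index.

    Each worker is assumed to request 1 / max_colocate_count of a GPU.
    This is an illustration, not how Ray actually schedules actors.
    """
    request = Fraction(1, max_colocate_count)
    free = [Fraction(1)] * num_gpus  # remaining capacity per GPU
    placement = []
    for _ in range(num_workers):
        for gpu, capacity in enumerate(free):
            if capacity >= request:
                free[gpu] -= request
                placement.append(gpu)
                break
        else:
            raise RuntimeError("not enough GPU capacity for all workers")
    return placement


# Fractional requests (example count of 5): both ranks fit on GPU 0.
print(first_fit(num_workers=2, max_colocate_count=5, num_gpus=2))  # [0, 0]
# Whole-GPU requests: each rank gets its own device.
print(first_fit(num_workers=2, max_colocate_count=1, num_gpus=2))  # [0, 1]
```

Exact fractions are used so that, for example, five requests of 1/5 fill a GPU exactly instead of leaving floating-point residue.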

BearBiscuit05 · 2025-02-23T15:19:13Z

That's great! I successfully ran generation on multiple GPUs with TP>1. Should the test script also set TP>1?
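On the TP>1 question: with tensor parallelism, each group of tp GPUs serves one model replica, so the TP size has to divide the total number of GPUs, and the remaining factor is the data-parallel size. The helper below is a hypothetical sketch of that check, not verl's code; a real setup can add further constraints (vLLM, for instance, also requires the model's attention heads to split evenly across TP ranks).

```python
def data_parallel_size(world_size: int, tp_size: int) -> int:
    """Return how many TP groups (model replicas) fit in world_size GPUs.

    Hypothetical helper: raises ValueError if tp_size does not divide world_size.
    """
    if tp_size < 1 or world_size % tp_size != 0:
        raise ValueError(f"tp_size={tp_size} must evenly divide world_size={world_size}")
    return world_size // tp_size


# On 2 GPUs, TP=2 gives one replica spanning both GPUs.
print(data_parallel_size(world_size=2, tp_size=2))  # 1
# TP=1 gives two independent replicas (data parallelism only).
print(data_parallel_size(world_size=2, tp_size=1))  # 2
```

With two GPUs, TP=2 exercises the tensor-parallel path, while TP=1 only covers running two replicas side by side.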

[fix] Improve the params template for generation

4b3df79

[ci] feat: add ci for generation

cb24a3c
