Add option to log validation generations to wandb #177

Merged (1 commit) on Feb 9, 2025


corbt
Contributor

@corbt corbt commented Jan 31, 2025

Motivation

Often the summary of average/max/min reward is not enough information, and it's helpful to look at some real-world generations to see how the model's actual behavior is changing over time. This can be particularly helpful for debugging issues like the generation being cut off before reasoning finishes.

Change

This PR introduces a new trainer.val_generations_to_log_to_wandb config value, with a default of 0. If set to a number larger than 0, it logs that number of inputs/outputs/scores each time the validation set is generated and scored. It uses a wandb Table (https://docs.wandb.ai/guides/track/log/log-tables/) to do so, adding a single row for each validation run.

I chose to log the data in this format because it lets a user easily see how the outputs for a given input change over time by reading down a column.
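For readers who want a feel for the one-row-per-validation-run layout, here is a minimal sketch. It is not the code from this PR: the function names build_validation_row and log_row_to_wandb, the "val/generations" key, and the fixed-seed sampling are all assumptions made for illustration. Only wandb.Table(columns=..., data=...) and wandb.log(..., step=...) are real wandb calls, and they require wandb.init() to have been called first.

```python
import random


def build_validation_row(step, inputs, outputs, scores, n, seed=42):
    """Build (columns, row) for one validation run with n sampled examples.

    Hypothetical helper: sorts by input and shuffles with a fixed seed so the
    same inputs land in the same columns on every validation run.
    """
    samples = sorted(zip(inputs, outputs, scores), key=lambda s: s[0])
    random.Random(seed).shuffle(samples)
    samples = samples[:n]

    columns = ["step"]
    row = [step]
    for i, (inp, out, score) in enumerate(samples, start=1):
        columns += [f"input_{i}", f"output_{i}", f"score_{i}"]
        row += [inp, out, score]
    return columns, row


def log_row_to_wandb(columns, previous_rows, row, step):
    """Log all rows so far as a fresh table and return the updated row list.

    Assumes wandb.init() was already called. The table key is an assumption.
    """
    import wandb  # imported lazily so the helper above works without wandb

    rows = previous_rows + [row]
    table = wandb.Table(columns=columns, data=rows)
    wandb.log({"val/generations": table}, step=step)
    return rows
```

With n=2, each validation run produces a row shaped like [step, input_1, output_1, score_1, input_2, output_2, score_2], so reading down the output_1 column shows how the response to the same prompt changes across training steps.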

Screenshot

[Image: Screenshot 2025-01-31 at 8 02 47 AM, https://github.com/user-attachments/assets/f2ec0079-8464-4735-ad63-d71f349f4332]

Note: if there's already another way to accomplish this easily let me know! I was surprised not to find a way to see sample generations because I find that quite useful, so let me know if I'm missing something.

@vermouth1992
Collaborator

@PeterSH6 Shall we unify the config in e2e ci?

@PeterSH6
Collaborator

PeterSH6 commented Feb 1, 2025

> @PeterSH6 Shall we unify the config in e2e ci?

Yes, I think it's necessary

@vermouth1992
Collaborator

Hi @corbt,

Could you rebase onto main? That should fix the CI. This feature is important for case studies!

@corbt
Contributor Author

corbt commented Feb 6, 2025

Sure thing, rebased!

@PeterSH6
Collaborator

PeterSH6 commented Feb 6, 2025

Hi @corbt,

Could you add the val_generations_to_log_to_wandb: 0 config to the ppo_megatron_trainer.yaml to support the megatron backend?

@PeterSH6
Collaborator

PeterSH6 commented Feb 9, 2025

Merging this first. Will fix the Megatron CI in the next PR.

@PeterSH6 PeterSH6 merged commit d0725a6 into volcengine:main Feb 9, 2025
10 of 11 checks passed
as12138 pushed a commit to as12138/verl that referenced this pull request Feb 20, 2025