Add option to log validation generations to wandb #177

corbt · 2025-01-31T16:05:13Z

Motivation

Often the summary of average/max/min reward is not enough information, and it's helpful to look at some real-world generations to see how the model's actual behavior is changing over time. This can be particularly helpful for debugging issues like the generation being cut off before reasoning finishes.

Change

This PR introduces a new trainer.val_generations_to_log_to_wandb config value, with a default of 0. If set to a number larger than 0, it logs that number of inputs/outputs/scores each time the validation set is generated and scored. It uses a wandb Table (https://docs.wandb.ai/guides/track/log/log-tables/) to do so, adding a single row for each validation set run.

I chose to log the data in this format because it lets a user easily see how the outputs for a given input change over time by reading down a single column.
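As a rough illustration of the one-row-per-validation-run layout described above, here is a minimal sketch. It is not the code from this PR: the helper name build_validation_row, the column names, the choice to sort samples by input text, and the "val/generations" key in the trailing comment are all assumptions made for the example.

```python
def build_validation_row(step, samples, n):
    """Flatten up to n (input, output, score) samples into one wide table row.

    Hypothetical helper for illustration; not the PR's actual code.
    """
    # Sort by input text so the same prompts land in the same columns on
    # every validation run (an assumption made for this sketch).
    chosen = sorted(samples, key=lambda s: s[0])[:n]
    columns, row = ["step"], [step]
    for i, (inp, out, score) in enumerate(chosen, start=1):
        columns += [f"input_{i}", f"output_{i}", f"score_{i}"]
        row += [inp, out, score]
    return columns, row


# Two validation runs over the same prompts; rows accumulate across runs.
rows = []
for step, samples in [
    (0, [("2+2=?", "5", 0.0), ("1+1=?", "2", 1.0)]),
    (100, [("2+2=?", "4", 1.0), ("1+1=?", "2", 1.0)]),
]:
    columns, row = build_validation_row(step, samples, n=2)
    rows.append(row)

# With wandb installed and a run initialised, the accumulated rows could be
# logged as a table (the key name here is made up):
#   wandb.log({"val/generations": wandb.Table(columns=columns, data=rows)}, step=step)
```

Because each run contributes one row and each sample keeps a fixed set of columns, reading down output_2 in this toy data shows the answer to "2+2=?" changing from "5" at step 0 to "4" at step 100.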

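Screenshot

[Image: Screenshot 2025-01-31 at 8 02 47 AM, https://github.com/user-attachments/assets/f2ec0079-8464-4735-ad63-d71f349f4332]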

Note: if there's already another way to accomplish this easily let me know! I was surprised not to find a way to see sample generations because I find that quite useful, so let me know if I'm missing something.

vermouth1992 · 2025-02-01T13:07:21Z

@PeterSH6 Shall we unify the config in e2e ci?

PeterSH6 · 2025-02-01T15:11:56Z

> @PeterSH6 Shall we unify the config in e2e ci?

Yes, I think it's necessary.

vermouth1992 · 2025-02-05T10:28:43Z

Hi @corbt,

Could you rebase onto main? That should fix the CI. This feature is important for case studies!

corbt · 2025-02-06T00:34:57Z

Sure thing, rebased!

PeterSH6 · 2025-02-06T15:38:24Z

Hi @corbt,

Could you add the val_generations_to_log_to_wandb: 0 config to ppo_megatron_trainer.yaml to support the Megatron backend?

PeterSH6 · 2025-02-09T13:41:45Z

Merged first. I'll fix the Megatron CI in the next PR.


corbt force-pushed the main branch from d68d481 to 2d977c3 on January 31, 2025 16:07

Commit 96c4bd6: Log validation generations to wandb

corbt force-pushed the main branch from 49fa6bc to 96c4bd6 on February 6, 2025 00:34

PeterSH6 merged commit d0725a6 into volcengine:main Feb 9, 2025
10 of 11 checks passed
