Releases: allenai/reward-bench
v0.1.3 -- Tons of CLI logging improvements!
The rewardbench CLI can now be run on any instruction dataset with fancy logging of scores.
This means rewardbench can be used to quickly throw together a rejection sampling pipeline once you are given generations.
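As a rough illustration of the rejection sampling idea, the sketch below keeps the highest-scoring completion for each prompt from a list of already-scored generations. It is not code from rewardbench: the `best_of_n` helper and the `prompt`, `completion`, and `score` field names are hypothetical and only assume that each logged row carries a prompt, a completion, and a scalar reward.

```python
def best_of_n(rows):
    """Keep the highest-scoring completion for each prompt.

    `rows` is a list of dicts with hypothetical keys
    "prompt", "completion", and "score" (assumed layout, not
    the actual schema written by the rewardbench CLI).
    """
    best = {}
    for row in rows:
        prompt = row["prompt"]
        # Replace the stored row only if this one scores strictly higher.
        if prompt not in best or row["score"] > best[prompt]["score"]:
            best[prompt] = row
    return list(best.values())


# Made-up scores purely for illustration.
rows = [
    {"prompt": "p1", "completion": "a", "score": 0.2},
    {"prompt": "p1", "completion": "b", "score": 0.9},
    {"prompt": "p2", "completion": "c", "score": -1.0},
    {"prompt": "p2", "completion": "d", "score": -3.5},
]
selected = best_of_n(rows)
```

Here `selected` holds one row per prompt, which could then be written out as a filtered dataset.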
Specifically, I think this type of logging is really great for evaluation. It’s something wandb does for training; with the CLI, you pass one arg that will save:
- All the scores, input text, etc. to HuggingFace
- The command used to launch the eval
- The current python env for reproducibility
Examples are in the readme: https://github.com/allenai/reward-bench?tab=readme-ov-file#logging
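To make the reproducibility items above concrete, here is a minimal sketch of how a launch command and the current Python environment could be captured using only the standard library. This is an assumption about the general technique, not how rewardbench implements it: `capture_run_metadata` and the dictionary keys are hypothetical names, and the example command line is a placeholder.

```python
import shlex
import sys
from importlib import metadata


def capture_run_metadata(argv=None):
    """Collect the launch command and installed packages (hypothetical helper)."""
    argv = sys.argv if argv is None else argv
    # List installed distributions as "name==version", similar to `pip freeze`.
    packages = sorted(
        f"{dist.metadata['Name']}=={dist.version}"
        for dist in metadata.distributions()
        if dist.metadata["Name"]  # skip distributions with broken metadata
    )
    return {
        "command": shlex.join(argv),  # shell-quoted so it can be re-run
        "python_version": sys.version.split()[0],
        "packages": packages,
    }


# Placeholder command line, not a real rewardbench invocation.
run_info = capture_run_metadata(["my-eval", "--dataset", "my data"])
```

A dictionary like `run_info` could then be serialized to JSON and stored next to the scores.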
What's Changed
- Clean, minor fixes, and release 0.1.2 by @natolambert in #139
- Fix DPO prompts by @natolambert in #142
- New super secret models by @natolambert in #141
- Minor fixes, new dockerfile, new models by @natolambert in #144
- Fix llama3 quantization for DPO models by @natolambert in #145
- Fix small bugs by @natolambert in #148
- Add GRM classes by @YangRui2015 in #151
- New models + dockerfile by @natolambert in #152
- Add Claude 3.5 Sonnet by @natolambert in #153
- fix padding for GRM class by @YangRui2015 in #154
- Add bfloat16 support natively by @natolambert in #155
- Add generative models by @natolambert in #156
- Add InternLM2 RMs by @natolambert in #157
- Bump generative models by @natolambert in #160
- added offsetbias execute prompt and judgement process code by @sanghyuk-choi in #159
- small gen pr by @natolambert in #161
- Bos fix by @natolambert in #166
- Add automatic Beaker Images by @natolambert in #167
- Small bumps by @natolambert in #168
- Add attn_implementation support by @chrisliu298 in #170
- Fixes in run_generative, new models by @natolambert in #171
- fix vllm version by @natolambert in #172
- Delete training by @natolambert in #174
- Mirror change from leaderboard by @natolambert in #175
- Add models by @natolambert in #179
- Add o1 and other model by @natolambert in #181
- Support loading model from wandb by @vwxyzjn in #184
- add_con-j_support_code by @YeZiyi1998 in #183
- Bump requirements and generative improvements by @natolambert in #190
- Support upload metadata to hf by @vwxyzjn in #188
- Bump Cuda version by @natolambert in #191
- Typo and VLLM generalization by @natolambert in #192
- Add better logging and functionality with instructions to CLI by @natolambert in #193
- Tweak ArmorRM implementation, add args to CLI by @natolambert in #194
New Contributors
- @YangRui2015 made their first contribution in #151
- @sanghyuk-choi made their first contribution in #159
- @chrisliu298 made their first contribution in #170
- @vwxyzjn made their first contribution in #184
- @YeZiyi1998 made their first contribution in #183
Full Changelog: v0.1.2...v0.1.3