Data accompanying the paper "Generative Verifiers: Reward Modeling as Next-Token Prediction".
See our website for an overview of GenRM.
GSM8K data: Copyright (c) 2021 OpenAI, available under the MIT License.
Please cite our paper as:
@article{zhang2024genrm,
  title={Generative verifiers: Reward modeling as next-token prediction},
  author={Zhang, Lunjun and Hosseini, Arian and Bansal, Hritik and Kazemi, Mehran and Kumar, Aviral and Agarwal, Rishabh},
  journal={arXiv preprint arXiv:2408.15240},
  year={2024}
}