### **Enhancement: Support for `extra_info` in Reward Calculation** (#266)

#### **Summary**

This update enhances the reward computation process by introducing an additional `extra_info` parameter, allowing users to pass in more contextual information when calculating rewards and improving flexibility across different datasets.

#### **Changes Made**

- **Updated `_default_compute_score`** to accept an `extra_info` argument (usage sketches follow at the end of this description):

  ```python
  def _default_compute_score(data_source, solution_str, ground_truth, extra_info):
  ```

- **Modified the reward manager (`naive.py`)** to pass `extra_info` from `data_item.non_tensor_batch` to `compute_score`:

  ```python
  extra_info = data_item.non_tensor_batch['extra_info']
  score = self.compute_score(
      data_source=data_source,
      solution_str=sequences_str,
      ground_truth=ground_truth,
      extra_info=extra_info,
  )
  ```

#### **Why This Change?**

- Some datasets require additional context beyond `data_source`, `solution_str`, and `ground_truth` for accurate reward computation.
- The new `extra_info` field allows users to pass custom metadata, ideally in dictionary form, as specified in the [official documentation](https://verl.readthedocs.io/en/latest/preparation/prepare_data.html).
- This change maintains compatibility with existing dataset processing scripts, as they already include the `extra_info` field.

#### **Impact**

- **Improved flexibility**: Users can now pass additional contextual information, making reward computation more adaptable to different datasets.
- **Backward compatibility**: Since all example datasets already include `extra_info`, this update should integrate seamlessly.
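For context, `extra_info` originates in the dataset itself. Here is a sketch of a single row in the layout the example preprocessing scripts produce, assuming the schema described in the linked data-preparation docs; the concrete values are made up for illustration:

```python
# One dataset row, assuming the schema from the data-preparation docs.
# The field values here are illustrative, not taken from a real file.
row = {
    "data_source": "openai/gsm8k",
    "prompt": [{"role": "user", "content": "Natalia sold 48 clips ..."}],
    "ability": "math",
    "reward_model": {"style": "rule", "ground_truth": "72"},
    "extra_info": {"split": "train", "index": 0},  # custom metadata dict
}
```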
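With the new signature, a user-supplied scoring function can read that metadata. Below is a minimal sketch; the exact-match rule and the partial-credit policy keyed on `extra_info["split"]` are purely hypothetical, not the library's default behavior:

```python
def compute_score(data_source, solution_str, ground_truth, extra_info):
    """Illustrative scorer: exact match, plus metadata-aware partial credit."""
    extra_info = extra_info or {}  # tolerate rows without metadata
    if solution_str.strip() == ground_truth.strip():
        return 1.0
    # Hypothetical use of the metadata: grant partial credit on train-split
    # items when the ground truth appears anywhere in the solution string.
    if extra_info.get("split") == "train" and ground_truth in solution_str:
        return 0.5
    return 0.0
```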
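One note on backward compatibility: the patch indexes `data_item.non_tensor_batch['extra_info']` directly, which assumes the column is present. For datasets written before this field existed, a defensive fallback would be one option (a sketch, not part of this change):

```python
# Fall back to None when a row predates the 'extra_info' column.
extra_info = data_item.non_tensor_batch.get('extra_info', None)
```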
Let me know if any modifications are needed!