Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Reward model #1271

Merged
merged 10 commits into from
Dec 12, 2024
Merged

feat: Reward model #1271

merged 10 commits into from
Dec 12, 2024

Conversation

Asher-hss
Copy link
Collaborator

Description

#889

Motivation and Context

#889

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of example)

Implemented Tasks

  • Subtask 1
  • Subtask 2
  • Subtask 3

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.

@Asher-hss Asher-hss self-assigned this Dec 4, 2024
@Asher-hss Asher-hss requested a review from Wendong-Fan December 4, 2024 10:36
@Wendong-Fan Wendong-Fan changed the title Reward model feat: Reward model Dec 4, 2024
@Wendong-Fan Wendong-Fan added the Model Related to backend models label Dec 4, 2024
@Wendong-Fan Wendong-Fan added this to the Sprint 18 milestone Dec 4, 2024
@Wendong-Fan Wendong-Fan linked an issue Dec 4, 2024 that may be closed by this pull request
2 tasks
if logprobs
else {}
)
except (KeyError, IndexError):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a log here?

Suggested change
except (KeyError, IndexError):
except (KeyError, IndexError):
logging.error(f"Error parsing scores: {e}")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think we shouldn't add the error msg into score dict, better raise error

camel/models/reward/evaluator.py Show resolved Hide resolved
Copy link
Collaborator

@koch3092 koch3092 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @Asher-hss , left some comments below.

camel/models/reward/base_reward_model.py Show resolved Hide resolved
camel/models/reward/nemetro_model.py Show resolved Hide resolved
Copy link
Member

@Wendong-Fan Wendong-Fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Asher-hss , as mentioned in the issue, we also want to support skywork (Seq. Classifier), could you also add this model?

@Asher-hss
Copy link
Collaborator Author

Thanks @Asher-hss , as mentioned in the issue, we also want to support skywork (Seq. Classifier), could you also add this model?

Thanks wendong,I plan to update this model today.

Copy link
Member

@Wendong-Fan Wendong-Fan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Asher-hss ! Left some comments below and added one more commit here 58d7278 based on the review, feel free to review the change, I think we can add the skywork model in another PR to let this PR merged first, but please add unit test first before the merge, thanks!

camel/models/reward/base_reward_model.py Outdated Show resolved Hide resolved
camel/models/reward/evaluator.py Outdated Show resolved Hide resolved
camel/models/reward/evaluator.py Outdated Show resolved Hide resolved
camel/models/reward/evaluator.py Show resolved Hide resolved
camel/models/reward/evaluator.py Outdated Show resolved Hide resolved
)

@api_keys_required("NVIDIA_API_KEY")
def evaluate(self, messages: List[OpenAIMessage]) -> Dict[str, float]:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

set messages type as List[Dict[str, str]] would suit for more general use case

camel/models/reward/nemetro_model.py Outdated Show resolved Hide resolved
@Wendong-Fan Wendong-Fan merged commit b3a2fde into master Dec 12, 2024
4 of 6 checks passed
@Wendong-Fan Wendong-Fan deleted the reward_model branch December 12, 2024 19:34
@lightaime
Copy link
Member

The model is called [nemotron](https://build.nvidia.com/nvidia/nemotron-4-340b-instruct) not nemetrocan you fix that asap?

@Asher-hss
Copy link
Collaborator Author

The model is called [nemotron](https://build.nvidia.com/nvidia/nemotron-4-340b-instruct) not nemetrocan you fix that asap?

I create a new pr to fix it now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Model Related to backend models
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

[Feature Request] Reward models
5 participants