Add Cohere Command R7B, replace older Command R+ handler #835
Conversation
Co-authored-by: yxuansu <suyx1201@163.com> Co-authored-by: Jozef Mokry <jozef@cohere.com>
Hi Cohere,

Congrats on the release, and thanks for contributing the really nice tool-calling interface for interacting with the Command models! We have one question before onboarding R7B onto the leaderboard: we typically don't deprecate a model unless its next-generation alternative has been released. With that in mind, would the v2 client be able to support the full tool-usage capability of Command R+ and Command R, now or in the future? We do like to see models of different sizes on the leaderboard for better comparison :)

Thank you.
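As background on the v2 client mentioned in this question, here is a minimal sketch of a tool call through the Cohere Python SDK's v2 interface; the model name, tool definition, and exact response fields are assumptions for illustration and are not taken from this PR.

```python
import cohere

# Assumed usage of the Cohere v2 Python client; the model name and the tool
# schema below are illustrative only.
co = cohere.ClientV2(api_key="YOUR_API_KEY")

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Gets the current weather for a given city.",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    }
]

response = co.chat(
    model="command-r7b-12-2024",
    messages=[{"role": "user", "content": "What's the weather in Toronto?"}],
    tools=tools,
)

# The model responds with structured tool calls rather than free text.
for call in response.message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```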
Thanks @Fanjia-Yan! All of our models support the newer version, so I've added
This is +5.1% over the leaderboard score, and the increase mainly comes from the multihop category. Looking into it, I think part of the reason is that there are bugs in the replaced handler which lowered the overall score. For example, the base handler adds a holdout function message with role/content keys, but the replaced handler expects them as role/message, which would explain the 10% increase in the missed-function category. There may be other sources of the increase, too.

Another change is the cost per 1K functions: I see an increase of around $2.30 per 1K functions. Is it possible that the cost increased because the multi-turn categories now succeed in more cases?
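To make the role/content vs role/message mismatch concrete, here is a minimal sketch; the helper name parse_message and the holdout prompt text are hypothetical and not taken from the BFCL handlers.

```python
# Hypothetical sketch of the key mismatch described above; parse_message and
# the prompt text are illustrative, not copied from the BFCL codebase.

# What the base handler produces for the holdout function prompt:
holdout_entry = {
    "role": "user",
    "content": "The following functions are also available: ...",
}

def parse_message(entry: dict) -> str:
    """Buggy reader that expects {'role': ..., 'message': ...} entries."""
    # Because the holdout entry stores its text under 'content', this returns
    # '' and the model never sees the holdout function descriptions.
    return entry.get("message", "")

print(repr(parse_message(holdout_entry)))  # '' -> holdout functions silently dropped
```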
Thank you Harry! That addresses all my concerns. And yes, a higher cost per 1K functions typically means the model makes more attempts and stops early less often. @HuanzhiMao will review the code change and merge it in.
LGTM. Thank you for the PR, and congratulations on the new model launch, @harry-cohere!
Greetings! It's been a while since our last contribution to BFCL, and the new versions and recent improvements are great to see.
This PR adds our latest model, released on Friday. I've also replaced the handler for our older models, since that simplifies the code within the BFCL framework.
When I run this PR against 3245d9, I get the following results (without REST category sanity checks):
I notice some variance between runs and haven't yet pinned down where it comes from; I'm confident, however, that the model regularly achieves the higher score.
Thanks in advance for reviewing this PR and I look forward to seeing us on the leaderboard!