[BFCL Dataset Revamp 3/n] Live Dataset Fix (Multiple) (#739)
This PR continues #737, a 2-week initiative to **re-scrutinize** V3
dataset issues with several objectives:

- Eliminate Ground Truth mismatches against user questions.
- Polish ambiguous prompts with unclear user intent to eliminate
biased judgement and saturation.

Follow-up PRs will be rolled out daily, one category at a time.

---------

Co-authored-by: Huanzhi (Hans) Mao <huanzhimao@gmail.com>
Fanjia-Yan and HuanzhiMao authored Nov 14, 2024
1 parent 34fc3fa commit 79384e7
Showing 5 changed files with 80 additions and 79 deletions.
1 change: 1 addition & 0 deletions berkeley-function-call-leaderboard/CHANGELOG.md
@@ -2,6 +2,7 @@

All notable changes to the Berkeley Function Calling Leaderboard will be documented in this file.

- [Nov 13, 2024] [#737](https://github.com/ShishirPatil/gorilla/pull/737), [#739](https://github.com/ShishirPatil/gorilla/pull/739), [#740](https://github.com/ShishirPatil/gorilla/pull/740): Bug fix in the dataset and possible answers for the live and multi-turn categories.
- [Nov 8, 2024] [#720](https://github.com/ShishirPatil/gorilla/pull/720): Add new model `BitAgent/GoGoAgent` to the leaderboard.
- [Oct 30, 2024] [#725](https://github.com/ShishirPatil/gorilla/pull/725), [#733](https://github.com/ShishirPatil/gorilla/pull/733): Update evaluation metric for multi-turn categories:
- Introduce a new response-based checker, which works alongside with the existing state-based checker.
