
Add evaluator weighted_alpaca_eval_gpt-4o-mini-2024-07-18 #401

Merged
merged 2 commits into from
Aug 26, 2024

Conversation

tongyx361
Copy link
Contributor

@tongyx361 tongyx361 commented Aug 24, 2024

Evaluator weighted_alpaca_eval_gpt-4o-mini-2024-07-18

weighted_alpaca_eval_gpt-4o-mini-2024-07-18 is the same as weighted_alpaca_eval_gpt4_turbo, except that it uses gpt-4o-mini-2024-07-18 as the judge model.

gpt-4o-mini-2024-07-18 is:

  • much cheaper than gpt-4-1106-preview, and
  • even cheaper and much faster than Llama-3-70B, while weighted_alpaca_eval_gpt-4o-mini-2024-07-18 achieves human agreement comparable to Llama-3-70B's.
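For context, the "weighted" family of evaluators derives a continuous preference from the judge's token log-probabilities rather than a hard 0/1 choice. A minimal sketch of that idea, assuming the judge is asked to answer with token "1" or "2" and returns log-probabilities for each (the function name and token labels here are illustrative, not the actual alpaca_eval implementation):

```python
import math

def weighted_preference(logprobs: dict[str, float]) -> float:
    """Continuous preference for output 2 from judge token log-probabilities.

    `logprobs` maps candidate answer tokens ("1", "2") to log-probabilities.
    Returns the probability mass on "2", renormalized over the two tokens,
    so the score lies in [0, 1] instead of being a hard binary vote.
    """
    p1 = math.exp(logprobs.get("1", float("-inf")))
    p2 = math.exp(logprobs.get("2", float("-inf")))
    return p2 / (p1 + p2)
```

Averaging these soft preferences over a dataset gives a smoother win rate than majority voting, which is part of why the weighted evaluators correlate well with humans even with a small judge model.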

Deficiencies in contribution instructions

I found a bug in calculating prices for new OpenAI APIs and fixed it for gpt-4o-mini-2024-07-18.
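The fix is conceptually simple: annotation cost must use per-model prompt and completion prices, which differ for the newer APIs. A hedged sketch of that calculation (the table values are the published per-million-token prices at the time of writing; verify them against OpenAI's pricing page, and note the function and table names here are illustrative):

```python
# Per-million-token prices in USD; check OpenAI's pricing page for current values.
PRICES_PER_1M = {
    "gpt-4o-mini-2024-07-18": {"prompt": 0.15, "completion": 0.60},
    "gpt-4-1106-preview": {"prompt": 10.00, "completion": 30.00},
}

def annotation_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Cost in USD of one annotation call, split by prompt vs. completion tokens."""
    price = PRICES_PER_1M[model]
    return (prompt_tokens * price["prompt"]
            + completion_tokens * price["completion"]) / 1_000_000
```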

I suggest adding instructions about modifying src/alpaca_eval/decoders when contributing evaluators.

Besides, I am not sure if I should commit files under the following directories:

  1. src/alpaca_eval/leaderboards/data_AlpacaEval_2
  2. src/alpaca_eval/metrics/weights
  3. results/${model}/weighted_alpaca_eval_gpt-4o-mini-2024-07-18
  4. docs/data_AlpacaEval_2

For now, I have kept the commit minimal. I would love to help improve the documentation if needed.

@YannDubs
Copy link
Collaborator

Great, thanks @tongyx361!

If you evaluated some models using gpt-4o-mini, it would be great to also push:

src/alpaca_eval/leaderboards/data_AlpacaEval_2
src/alpaca_eval/metrics/weights
results/${model}/weighted_alpaca_eval_gpt-4o-mini-2024-07-18

That way others can see the evaluations you ran! I'll merge as-is for now, given that it's already useful.

@YannDubs YannDubs merged commit 9136c7f into tatsu-lab:main Aug 26, 2024
1 check passed
LLM-Alignment-sh pushed a commit to LLM-Alignment-sh/alpaca_eval that referenced this pull request Aug 28, 2024

* Add evaluator weighted_alpaca_eval_gpt-4o-mini-2024-07-18

* Fix price for gpt-4o-mini-2024-07-18
@TingchenFu
Copy link

There seems to be a typo in src/alpaca_eval/leaderboards/evaluators/evaluators_leaderboard.csv: it lists the human agreement for gpt-4o as only 0.33.
