Add evaluator weighted_alpaca_eval_gpt-4o-mini-2024-07-18 #401

tongyx361 · 2024-08-24T04:58:38Z

Evaluator `weighted_alpaca_eval_gpt-4o-mini-2024-07-18`

`weighted_alpaca_eval_gpt-4o-mini-2024-07-18` is identical to `weighted_alpaca_eval_gpt4_turbo` except that it uses `gpt-4o-mini-2024-07-18` as the annotator model.

`gpt-4o-mini-2024-07-18` is

- much cheaper than `gpt-4-1106-preview`, and
- even cheaper and much faster than Llama-3-70B.

Yet `weighted_alpaca_eval_gpt-4o-mini-2024-07-18` achieves human correlation comparable to that of the Llama-3-70B evaluator.

Deficiencies in contribution instructions

I found a bug in how prices are calculated for the newer OpenAI API models and fixed it for `gpt-4o-mini-2024-07-18`.
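The thread does not describe the price bug itself, so the following is only a minimal sketch of per-request cost accounting, assuming prompt and completion tokens are billed separately at per-1M-token rates. The `PRICES_PER_1M_TOKENS` table and the `request_cost` function are hypothetical, not taken from `src/alpaca_eval/decoders`, and the hard-coded prices are assumed mid-2024 OpenAI list prices for these two models.

```python
# Hypothetical sketch, NOT the alpaca_eval implementation.
# Assumed prices in USD per 1M tokens (OpenAI list prices, mid-2024).
PRICES_PER_1M_TOKENS = {
    "gpt-4o-mini-2024-07-18": {"prompt": 0.15, "completion": 0.60},
    "gpt-4-1106-preview": {"prompt": 10.00, "completion": 30.00},
}


def request_cost(model_name: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Return the USD cost of one request, pricing prompt and completion tokens separately."""
    prices = PRICES_PER_1M_TOKENS[model_name]
    total = prompt_tokens * prices["prompt"] + completion_tokens * prices["completion"]
    # Prices are quoted per 1M tokens, so scale down to a per-token rate.
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request size: 2,000 prompt tokens and 1 completion token.
    for name in PRICES_PER_1M_TOKENS:
        print(f"{name}: ${request_cost(name, 2000, 1):.6f}")
```

Keeping prompt and completion rates as separate entries matters because the two are priced differently; applying a single rate to all tokens would misstate the cost.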

I suggest adding instructions about modifying `src/alpaca_eval/decoders` when contributing evaluators.

Also, I am not sure whether I should commit files under the following directories:

- `src/alpaca_eval/leaderboards/data_AlpacaEval_2`
- `src/alpaca_eval/metrics/weights`
- `results/${model}/weighted_alpaca_eval_gpt-4o-mini-2024-07-18`
- `docs/data_AlpacaEval_2`

For now, I have kept the commit minimal. I would love to help improve the documentation if needed.

YannDubs · 2024-08-26T21:30:33Z

Great, thanks @tongyx361!

If you evaluated some models using gpt-4o-mini, it would be great to also push the following so that others can see the evaluations you ran:

- `src/alpaca_eval/leaderboards/data_AlpacaEval_2`
- `src/alpaca_eval/metrics/weights`
- `results/${model}/weighted_alpaca_eval_gpt-4o-mini-2024-07-18`

I'll merge as is for now, since it is already useful.


TingchenFu · 2025-01-15T12:58:05Z

There seems to be a typo in `src/alpaca_eval/leaderboards/evaluators/evaluators_leaderboard.csv`: it lists the human agreement for gpt-4o as only 0.33.

Add evaluator weighted_alpaca_eval_gpt-4o-mini-2024-07-18 (bb9d03a)

tongyx361 closed this Aug 24, 2024

tongyx361 reopened this Aug 24, 2024

tongyx361 force-pushed the main branch from 3bd2098 to 6fb6d12 on August 24, 2024, 05:56

Fix price for gpt-4o-mini-2024-07-18 (68ad170)

tongyx361 force-pushed the main branch from 6fb6d12 to 68ad170 on August 24, 2024, 05:58

YannDubs merged commit 9136c7f into tatsu-lab:main Aug 26, 2024
1 check passed

LLM-Alignment-sh pushed a commit to LLM-Alignment-sh/alpaca_eval that referenced this pull request Aug 28, 2024

Add evaluator weighted_alpaca_eval_gpt-4o-mini-2024-07-18 (tatsu-lab#401) (04466c8)
* Add evaluator weighted_alpaca_eval_gpt-4o-mini-2024-07-18
* Fix price for gpt-4o-mini-2024-07-18
