Add evaluator weighted_alpaca_eval_gpt-4o-mini-2024-07-18 #401
Evaluator weighted_alpaca_eval_gpt-4o-mini-2024-07-18

weighted_alpaca_eval_gpt-4o-mini-2024-07-18 is the same as weighted_alpaca_eval_gpt4_turbo except that it uses gpt-4o-mini-2024-07-18. gpt-4o-mini-2024-07-18 is [...] gpt-4-1106-preview, and weighted_alpaca_eval_gpt-4o-mini-2024-07-18 has comparable human correlation with Llama-3-70B.

Deficiencies in contribution instructions
I found a bug in the price calculation for new OpenAI APIs and fixed it for gpt-4o-mini-2024-07-18. I suggest adding instructions about modifying src/alpaca_eval/decoders when contributing evaluators.

Also, I am not sure whether I should commit files under the following directories:
src/alpaca_eval/leaderboards/data_AlpacaEval_2
src/alpaca_eval/metrics/weights
results/${model}/weighted_alpaca_eval_gpt-4o-mini-2024-07-18
docs/data_AlpacaEval_2
For now, I have kept the commit minimal. I would be glad to help improve the documentation if needed.
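
For readers unfamiliar with the price calculation mentioned above, here is a minimal, hypothetical sketch of a per-token price lookup. It is not the code in src/alpaca_eval/decoders and does not reproduce the actual bug or fix. The names PRICE_PER_TOKEN, get_price_per_token and estimate_cost are made up for illustration, and the prices are placeholders, not real OpenAI rates. It shows one pitfall a lookup like this has to avoid: a new model name such as gpt-4o-mini-2024-07-18 also starts with gpt-4, so the most specific entry has to win.

```python
from typing import Optional

# Hypothetical price table in USD per token.
# The values are placeholders for illustration, NOT real OpenAI prices.
PRICE_PER_TOKEN = {
    "gpt-4": 3.0 / 1_000_000,
    "gpt-4o": 2.0 / 1_000_000,
    "gpt-4o-mini": 1.0 / 1_000_000,
}


def get_price_per_token(model_name: str, override: Optional[float] = None) -> float:
    """Return the per-token price for model_name.

    An explicit override takes precedence. Otherwise the longest table key
    that is a prefix of model_name is used, so that a dated name like
    "gpt-4o-mini-2024-07-18" resolves to "gpt-4o-mini" rather than "gpt-4".
    """
    if override is not None:
        return float(override)
    matches = [prefix for prefix in PRICE_PER_TOKEN if model_name.startswith(prefix)]
    if not matches:
        # Fail loudly instead of silently charging some default price.
        raise ValueError(f"No price registered for model {model_name!r}")
    return PRICE_PER_TOKEN[max(matches, key=len)]


def estimate_cost(model_name: str, n_tokens: int) -> float:
    """Estimate the cost in USD of processing n_tokens with model_name."""
    return n_tokens * get_price_per_token(model_name)
```

With a table like this, adding an evaluator that uses a new model would also require a new price entry, which is the kind of step the contribution instructions could call out.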