Questeval #40
Conversation
Waiting for SAFEval update to finish.
+ requirements with QuestEval
@padipadou: Thanks, looks mostly good to me! I'd have two points:
One small note: I made some changes to […]
Hello @tuetschek, about unit tests: great indeed, I will add them this week (before or after the merge request is accepted, up to you). Tell me what you think! :)
# Conflicts: # run_metrics.py
…erent from repo name in case of git install requirement. 2 cases:
- no package name: infer it from the repo name.
- a package name is given via the special keyword "#egg" because it differs from the repo name.
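For illustration, a minimal sketch of the two requirement styles this commit distinguishes (the repository URLs and the package name below are made up, not the actual QuestEval/SAFEval entries):

```
# Case 1: no explicit package name; it has to be inferred from the repo name
git+https://github.com/example-org/mypackage.git

# Case 2: the package name differs from the repo name, so it is given explicitly via #egg
git+https://github.com/example-org/some-repo.git#egg=mypackage
```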
For now:
Thanks for all this work! I wanted to chime in wrt the eval config file: could we somehow use this file to determine the task? (You will have to add QuestEval here as well, btw, to get it rendered on the website.) In the same vein, @tuetschek, we could get rid of the […]. I also just noticed that our test outputs don't have a gem_id field.
Thanks for the quick and clear answers!!
…nto questeval # Conflicts: # README.md # gem_metrics/__init__.py
Hi @padipadou, thank you for all the updates! My comment wrt gem_id was more targeted at @tuetschek, since this is functionality we need independently of metrics. I think it definitely makes sense to refactor the current setup of the framework to just look up a task in the eval_config, where it can have a specified language + high-level task.
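As a rough sketch of that lookup, assuming eval_config.json maps dataset names to a language and a high-level task (the schema, the `lookup_task` helper, and the example dataset key below are illustrative assumptions, not the actual GEM config format):

```python
import json
from typing import Dict


def lookup_task(dataset_name: str, config_path: str = "eval_config.json") -> Dict[str, str]:
    """Return the {"language": ..., "task": ...} entry configured for a dataset, if any.

    Assumes a flat mapping {"<dataset>": {"language": ..., "task": ...}} purely
    for illustration; the real eval_config.json may be structured differently.
    """
    with open(config_path) as f:
        config = json.load(f)
    return config.get(dataset_name, {})


if __name__ == "__main__":
    entry = lookup_task("totto")            # "totto" is just an example dataset key
    task = entry.get("task", "text2text")   # fall back to a generic setting
    language = entry.get("language", "en")
    print(task, language)
```

With something like this, a metric such as QuestEval could pick its underlying models from the config instead of requiring the task to be repeated in every output file.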
@padipadou: Thanks, looking good! Re. […]
Yes, the gem-id argument is unrelated; I just noticed that our test examples do not include them, hence my initial comment. Wrt the config file: you can find a version at https://github.com/GEM-benchmark/GEM-benchmark.github.io/blob/main/web/results/eval_config.json
Thanks! I'll open a new issue regarding the config file.
Adding a new metric, QuestEval, which uses the source and/or references to evaluate models.
This metric needs to know which task the model is being evaluated on; if the task is not specified in the data, QuestEval falls back to a general model, which might not fit well (e.g. textual QA models don't work well on tables).
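A minimal sketch of that fallback behaviour, assuming a small set of task names with dedicated QuestEval models (the task strings and the `pick_questeval_task` helper are illustrative assumptions, not the actual QuestEval or gem_metrics API):

```python
from typing import Optional

# Assumed set of tasks with dedicated QuestEval models; the real list may differ.
SUPPORTED_TASKS = {"summarization", "data2text", "simplification"}


def pick_questeval_task(data_task: Optional[str]) -> str:
    """Use the task declared in the data when QuestEval has a dedicated model for it;
    otherwise fall back to a generic text-to-text setting."""
    if data_task in SUPPORTED_TASKS:
        return data_task
    # e.g. a textual QA model does not transfer well to table inputs,
    # so an unspecified or unknown task falls back to the general model
    return "text2text"
```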
minor typos in README