Ar support for MBZUAI-arabic-mmlu #209
Hi, thanks for your PR!
In the meantime, please make sure that the styling is correct :)
Hi! I think once you update the PR to the new format for metrics, prompts and functions, we'll be good to go!
prompt_function=mbzuai_arabic_mmlu,
suite=["community"],
hf_repo="MBZUAI/ArabicMMLU",
hf_subset="default",
default is the same as the test subset
Thanks @clefourrier for the tag, and thanks dear @bakrianoo for your valuable contribution 🤗 I have one remark to make, related to the comment I left above:
I agree with the comment. If you can set up your dataset to have different splits for few-shot examples, it will avoid context contamination. You also need to run the linters to pass the code quality checks.
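To illustrate the suggestion above, here is a minimal sketch of carving a disjoint few-shot pool out of a single split so that no evaluated question can also appear as an in-context example. The function name `split_fewshot_and_eval`, its parameters, and the example records are hypothetical and are not part of lighteval or of the ArabicMMLU dataset; this only shows the general idea in plain Python.

```python
import random


def split_fewshot_and_eval(examples, n_fewshot_pool, seed=0):
    """Split one list of examples into a few-shot pool and an evaluation set.

    Hypothetical helper (not a lighteval API): the two returned lists never
    share an example, so few-shot demonstrations cannot leak test questions.
    """
    if not 0 <= n_fewshot_pool <= len(examples):
        raise ValueError("n_fewshot_pool must be between 0 and len(examples)")

    # Shuffle indices with a fixed seed so the split is reproducible.
    indices = list(range(len(examples)))
    random.Random(seed).shuffle(indices)

    pool_indices = set(indices[:n_fewshot_pool])
    fewshot_pool = [examples[i] for i in indices[:n_fewshot_pool]]
    # Keep the evaluation set in its original order.
    eval_set = [ex for i, ex in enumerate(examples) if i not in pool_indices]
    return fewshot_pool, eval_set


if __name__ == "__main__":
    # Toy records standing in for dataset rows (assumed shape).
    rows = [{"id": i, "question": f"q{i}"} for i in range(10)]
    pool, eval_rows = split_fewshot_and_eval(rows, n_fewshot_pool=3, seed=1)
    print(len(pool), len(eval_rows))
```

The two lists could then be published as separate splits of a derived dataset, which is the licensing question raised in the next comment.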
Thank you @alielfilali01 for your comment. I am wondering whether creating a new dataset would violate the license of the original dataset! I need to confirm with the dataset authors, or we can follow the zero-shot suggestion. What do you think?
@bakrianoo, if it's not Apache 2.0, for example, then let's open a discussion in the repo and see if the authors would help with creating the dataset themselves. Please feel free to do it, and if there is no response then I can reach out directly to one or two of the authors... What do you think?
Sure. I will start the discussion there. Thank you @alielfilali01 for your interest.
Hi! Feel free to tell us when this is updated!
I think it's solved by PR #338. If you want to use native Arabic letters as option anchors, use:
as the formulation for the task. cc @clefourrier
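As a rough illustration of what "native Arabic letters as option anchors" means, the sketch below labels multiple-choice options with Arabic letters instead of A/B/C/D. Everything here is hypothetical: `ARABIC_ANCHORS` and `format_question` are made-up names, the anchor letters are an assumption, and this is not how lighteval or PR #338 implements the formulation.

```python
# Hypothetical anchor letters; the ones used by the actual task may differ.
ARABIC_ANCHORS = ["أ", "ب", "ج", "د", "هـ"]


def format_question(question, choices, anchors=ARABIC_ANCHORS):
    """Render a multiple-choice question with one anchor letter per option."""
    if len(choices) > len(anchors):
        raise ValueError("more choices than available anchor letters")

    lines = [question]
    # Pair each option with its anchor, in order.
    for anchor, choice in zip(anchors, choices):
        lines.append(f"{anchor}. {choice}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_question("ما هي عاصمة مصر؟", ["القاهرة", "الرياض", "بغداد", "تونس"]))
```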
@hynky1999 can you please mention the exact name from the PR you just mentioned? The MMLU here is different from the other MMLUs and I couldn't find it in the list from PR #338. Also, this PR is stale now, so feel free to close it, since I'm planning to merge this version of MMLU in an upcoming PR, maybe next week, alongside other new Arabic tasks.
Sorry, just saw you mentioned "mmlu_ara_mcf", my bad.
So many commits are because of how we were merging the PRs hhhhh
What tasks are you planning to add btw?
Some new benchmarks we got from some colleagues and partners and convinced them to make them public 😁
Hey @hynky1999, sorry I couldn't get back to you yesterday. Well, I saw the Arabic MMLU task and it seems good. I'm just not sure about the instruction, if you can provide more details on that. Also I saw the In my upcoming PR I will be adding 3 Arabic MMLU datasets, including this one, as part of the community suite; then we can run them both and see if the implementation actually affects the evals (which it shouldn't, but I want to try it anyway 😅). For clarity, here are the upcoming MMLU datasets:
PS: "mmlu_okapi_ar" is already part of the community suite
Hey @clefourrier, I've spoken with @bakrianoo, so please feel free to close this PR. @bakrianoo maybe you can confirm here
Since @alielfilali01 is working on including this in another PR, I am going to close it.
@alielfilali01 Re instructions: Re other MMLUs:
Last note, do you think you could use the multilingual templates?
@hynky1999 Actually I added the in-house translated MMLU about a week before OpenAI released MMMLU, and you can imagine how much effort it took to convince the team internally to release it 😅. Also I thought it's gonna be helpful to test how the translation quality impacts model performance. And also just leave it to the community to decide which one they want to use... Note: mmlu_mt is different from mmlu_okapi_ar. The first was translated using a translation engine, while okapi used gpt-3.5 (I guess)
Add Support for ArabicMMLU Evaluation Task
Overview
This PR introduces a new evaluation task for Arabic large language models (LLMs) using the ArabicMMLU dataset, as detailed in the paper "ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic". The ArabicMMLU dataset provides a benchmark for evaluating LLM performance across a wide range of tasks in the Arabic language.
Related Work
Notes
This contribution aligns with the ongoing efforts to expand the capabilities of lightEval in supporting diverse languages and tasks. Feedback and suggestions are welcome!