
Ar support for MBZUAI-arabic-mmlu #209

Closed
bakrianoo wants to merge 11 commits

Conversation

@bakrianoo (Author)

Add Support for ArabicMMLU Evaluation Task

Overview

This PR introduces a new evaluation task for Arabic large language models (LLMs) using the ArabicMMLU dataset, as detailed in the paper "ArabicMMLU: Assessing Massive Multitask Language Understanding in Arabic". The dataset provides a comprehensive benchmark for evaluating LLM performance across a wide range of tasks in Arabic.

Related Work

Notes

This contribution aligns with ongoing efforts to expand lighteval's support for diverse languages and tasks. Feedback and suggestions are welcome!

@clefourrier (Member)

Hi, thanks for your PR!
FYI, we have a small backlog of PRs to go through, so it might take about a week for us to address this one.

@clefourrier (Member)

In the meantime, please make sure that the styling is correct :)

@clefourrier (Member)

@NathanHB It could be worth waiting for #214 (the last PR of the series above) before editing this one to fit the new format.

@clefourrier (Member)

Hi! I think once you update the PR to the new format for metrics, prompts, and functions, we'll be good to go!
Also tagging @alielfilali01, since he was the author of the original arabic_evals file (these tasks are behind the Arabic LLM leaderboard), to get his opinion too.

prompt_function=mbzuai_arabic_mmlu,
suite=["community"],
hf_repo="MBZUAI/ArabicMMLU",
hf_subset="default",
@alielfilali01 (Contributor) commented on the diff:

default is the same as the test subset
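For context, the prompt_function referenced in the excerpt above maps one dataset row to a lighteval Doc. Below is a minimal sketch of what mbzuai_arabic_mmlu could look like; the column names ("Question", "Option 1"…"Option 5", "Answer Key") are assumptions about the MBZUAI/ArabicMMLU schema, not verified here:

from lighteval.tasks.requests import Doc

# Letters used as answer anchors; ArabicMMLU questions have up to five options.
LETTERS = ["A", "B", "C", "D", "E"]

def mbzuai_arabic_mmlu(line, task_name: str = None) -> Doc:
    # Keep only the options actually present in this row (assumed column names).
    choices = [line[f"Option {i}"] for i in range(1, 6) if line.get(f"Option {i}")]
    query = line["Question"] + "\n" + "\n".join(
        f"{letter}. {choice}" for letter, choice in zip(LETTERS, choices)
    )
    return Doc(
        task_name=task_name,
        query=query,
        choices=LETTERS[: len(choices)],  # the model answers with a letter
        gold_index=LETTERS.index(line["Answer Key"]),
    )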

@alielfilali01 (Contributor)

Thanks @clefourrier for the tag, and thank you dear @bakrianoo for your valuable contribution 🤗

I have one remark related to the comment I left above:
hf_subset="default" will load the default subset, which is also the test subset used for both the eval and the few shots!
The solution for me would be to drop the eval subset and never use few shots in this benchmark, OR to make a custom version of this dataset with separate test and val subsets!?

@clefourrier (Member)

I agree with the comment - if you can set up your dataset to have a different split for few-shot examples, it will avoid context contamination. You also need to run the linters to pass the code quality checks.
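For illustration, a sketch of how the config could draw few shots from a separate split, assuming lighteval's LightevalTaskConfig fields (recent versions) and a hypothetical dataset layout with distinct test and val splits:

from lighteval.metrics.metrics import Metrics
from lighteval.tasks.lighteval_task import LightevalTaskConfig

task = LightevalTaskConfig(
    name="mbzuai_arabic_mmlu",
    prompt_function=mbzuai_arabic_mmlu,
    suite=["community"],
    hf_repo="MBZUAI/ArabicMMLU",
    hf_subset="default",
    # Assumed split layout: the point is that few shots come from a
    # split other than the one being evaluated.
    hf_avail_splits=["test", "val"],
    evaluation_splits=["test"],
    few_shots_split="val",
    few_shots_select="sequential",
    metric=[Metrics.loglikelihood_acc_norm],
)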

@bakrianoo (Author)

Thank you @alielfilali01 for your comment. I am wondering whether creating a new dataset would violate the license of the original dataset!

I need to confirm with the dataset authors, or we can follow the zero-shot suggestion.

What do you think?

@alielfilali01 (Contributor)

@bakrianoo, if it's not Apache 2.0, for example, then let's open a discussion in the dataset repo and see whether the authors would help by creating the dataset themselves. Please feel free to do it, and if there's no response then I can reach out directly to one or two of the authors... What do you think?

@bakrianoo (Author)

Sure. I will start the discussion there. Thank you @alielfilali01 for your interest.

@clefourrier (Member)

Hi! Feel free to tell us when this is updated!

@hynky1999 (Collaborator) commented Oct 2, 2024

I think this is solved by PR #338. See mmlu_ara_mcf.

If you want to use native Arabic letters as the option anchors, use:

formulation=MCFFormulation("NativeLetters")

as the formulation for the task.
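For example, roughly like this (the adapter keys and import paths follow the multilingual templates; treat the exact names as approximate):

from lighteval.tasks.templates.multichoice import get_mcq_prompt_function
from lighteval.tasks.templates.utils.formulation import MCFFormulation
from lighteval.utils.language import Language

prompt_fn = get_mcq_prompt_function(
    Language.ARABIC,
    # Adapter mapping a dataset row to the keys the template expects;
    # the row's column names here are assumptions.
    lambda line: {
        "question": line["question"],
        "choices": line["choices"],
        "gold_idx": line["answer"],
    },
    formulation=MCFFormulation("NativeLetters"),  # Arabic letters as anchors
)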

cc @clefourrier

@alielfilali01 (Contributor)

> I think this is solved by PR #338. See mmlu_ara_mcf.
> cc @clefourrier

@hynky1999 can you please mention the exact task name from the PR you just mentioned? The MMLU here is different from the other MMLUs, and I couldn't find it in the list from PR #338. Also, this PR is stale now, so feel free to close it: I'm planning to merge this version of MMLU in an upcoming PR, maybe next week, alongside other new Arabic tasks.

@alielfilali01 (Contributor)


Sorry, I just saw you mentioned "mmlu_ara_mcf", my bad.
I see so many commits in the PR 😅 I will need to open it from my laptop. I'll get back to you on it by tomorrow at the latest, so I can plan to remove it from my PR if all is good.
Thanks man for taking care of it.

@hynky1999 (Collaborator)

The many commits are because of how we were merging the PRs hhhhh

@hynky1999 (Collaborator)

What tasks are you planning to add, btw?

@alielfilali01 (Contributor)

> What tasks are you planning to add, btw?

Some new benchmarks we got from colleagues and partners, whom we convinced to make them public 😁

@alielfilali01 (Contributor)

Hey @hynky1999, sorry I couldn't get back to you yesterday. Well, I saw the Arabic MMLU task and it seems good. I'm just not sure about the instruction; could you provide more details on that? Also, I saw the hf_repo is yazeed7/ArabicMMLU, whereas the official release is MBZUAI/ArabicMMLU; please change that if you can.

In my upcoming PR I will be adding 3 Arabic MMLU datasets, including this one, as part of the community suite. Then we can run both and see whether the implementation actually affects the evals (it shouldn't, but I want to try it anyway 😅).

For clarity, here are the upcoming MMLU datasets:

  • arabic_mmlu_mt: machine translated
  • arabic_mmlu_ht: our in-house human translation
  • arabic_mmmlu: OpenAI's human translation
  • arabic_mmlu: the one we discuss here (MBZUAI/ArabicMMLU)

PS: "mmlu_okapi_ar" is already part of the community suite.

@alielfilali01 (Contributor)

Hey @clefourrier, I've spoken with @bakrianoo, so please feel free to close this PR. @bakrianoo, maybe you can confirm here.

@bakrianoo (Author)

Since @alielfilali01 is working on including this in another PR, I am going to close it.
Thank you all for your support.

bakrianoo closed this on Oct 4, 2024
@hynky1999 (Collaborator) commented Oct 4, 2024

@alielfilali01
Sure, I will switch it. The reason I used the other repo is that, previously, you were either missing a dedicated few-shot split or it was annoying to access the subsets separately (I don't remember exactly which). Now it looks good 👍

Re instructions:
The design of the templates (which is what all the multilingual evals in that file use) is heavily based on the OLMES paper. Secondly, since it's a bit hard to create a global instruction for all multilingual tasks, I decided not to use instructions for any task. We ran several ablations with this setting and did not find it to be a reason why models fail to solve the tasks. As the OLMES paper notes, the question/answer pair is sufficient to guide the model on what to do.
In theory we could have something like an instruction registry and create a generic instruction for each task.
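A hypothetical sketch of that registry idea, purely illustrative (nothing like this exists in lighteval today):

# Map (task family, language) to a generic instruction, falling back to
# no instruction, which matches the current behaviour.
INSTRUCTION_REGISTRY: dict[tuple[str, str], str] = {
    ("mmlu", "en"): "Answer the following multiple-choice question.",
    # ("mmlu", "ar"): a per-language instruction would go here
}

def get_instruction(task_family: str, language: str) -> str:
    return INSTRUCTION_REGISTRY.get((task_family, language), "")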

Re other MMLUs:

  • This PR should be adding the OpenAI MMLUs: Misc-multilingual tasks #339
  • arabic_mmlu_mt (is this the Okapi one? or different?)
  • arabic_mmlu_ht (I don't see the point of adding the above if you have an in-house translation)

Last note: do you think you could use the multilingual templates?

@alielfilali01 (Contributor)

@hynky1999 Actually, I added the in-house translated MMLU about a week before OpenAI released MMMLU, and you can imagine how much effort it took to convince the team internally to release it 😅. Also, I thought it would be helpful to test how translation quality impacts model performance, and to just leave it to the community to decide which one they want to use...

Note: mmlu_mt is different from mmlu_okapi_ar. The first was translated using a translation engine, while Okapi used GPT-3.5 (I guess).
