AraDICE task config file (#2507)
* added aradice

* Added ArabicMMLU Lev Configs

* added ArabicMMLU egy configs

* Added boolq configs

* Added cultural bench configs

* added openbookqa configs

* Added PiQA configs

* added winogrande configs

* Added truthfulQA configs

* Added aradice group config

* Remove deleted files from repository

* modified arabicmmlu configs

* modified metadata versions

* fixed formatting using ruff

* added aradice tasks information

* pre-commit

* Updated openbookqa utils

* fixed formatting on obqa

---------

Co-authored-by: Basel Mousi <bmousi@hbku.edu.qa>
Co-authored-by: Baber <baber@hey.com>
3 people authored Dec 24, 2024
1 parent b86aa21 commit 932e8f9
Showing 133 changed files with 2,205 additions and 0 deletions.
1 change: 1 addition & 0 deletions lm_eval/tasks/README.md
@@ -14,6 +14,7 @@
| [arabic_leaderboard_complete](arabic_leaderboard_complete/README.md) | A full version of the tasks in the Open Arabic LLM Leaderboard, focusing on the evaluation of models that reflect the characteristics of Arabic language understanding and comprehension, culture, and heritage. Note that some of these tasks are machine-translated. | Arabic (Some MT) |
| [arabic_leaderboard_light](arabic_leaderboard_light/README.md) | A light version of the tasks in the Open Arabic LLM Leaderboard (i.e., 10% samples of the test set in the original benchmarks), focusing on the evaluation of models that reflect the characteristics of Arabic language understanding and comprehension, culture, and heritage. Note that some of these tasks are machine-translated. | Arabic (Some MT) |
| [arabicmmlu](arabicmmlu/README.md) | Localized Arabic version of MMLU with multiple-choice questions from 40 subjects. | Arabic |
| [AraDICE](aradice/README.md) | A collection of multiple tasks carefully designed to evaluate dialectal and cultural capabilities in large language models (LLMs). | Arabic |
| [arc](arc/README.md) | Tasks involving complex reasoning over a diverse set of questions. | English |
| [arithmetic](arithmetic/README.md) | Tasks involving numerical computations and arithmetic reasoning. | English |
| [asdiv](asdiv/README.md) | Tasks involving arithmetic and mathematical reasoning challenges. | English |
12 changes: 12 additions & 0 deletions lm_eval/tasks/aradice/ArabicMMLU/EGY/AraDiCE_ArabicMMLU.yaml
@@ -0,0 +1,12 @@
group: AraDiCE_ArabicMMLU_egy
task:
- AraDiCE_ArabicMMLU_humanities_egy
- AraDiCE_ArabicMMLU_language_egy
- AraDiCE_ArabicMMLU_social-science_egy
- AraDiCE_ArabicMMLU_stem_egy
- AraDiCE_ArabicMMLU_other_egy
aggregate_metric_list:
  - metric: acc
    weight_by_size: True
  - metric: acc_norm
    weight_by_size: True
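The `weight_by_size: True` entries in the group config above request a size-weighted (micro) average of the subtask scores rather than a plain mean. The sketch below illustrates that arithmetic only; the function names, scores, and sizes are hypothetical and are not taken from the harness or from any real results.

```python
def size_weighted_mean(scores, sizes):
    """Micro-average: weight each subtask score by its number of examples."""
    if len(scores) != len(sizes):
        raise ValueError("scores and sizes must have the same length")
    total = sum(sizes)
    if total == 0:
        raise ValueError("total size must be positive")
    return sum(score * size for score, size in zip(scores, sizes)) / total


def unweighted_mean(scores):
    """Macro-average: every subtask counts equally, regardless of size."""
    return sum(scores) / len(scores)


# Hypothetical accuracies and example counts for two subtasks.
scores = [0.5, 1.0]
sizes = [300, 100]

micro = size_weighted_mean(scores, sizes)  # (0.5 * 300 + 1.0 * 100) / 400
macro = unweighted_mean(scores)            # (0.5 + 1.0) / 2
print(micro, macro)
```

With these made-up numbers the larger subtask pulls the weighted score toward 0.5, which is the practical difference between the two settings.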
@@ -0,0 +1,10 @@
"dataset_name": "high_humanities_history"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_high_humanities_history_egy"
"task_alias": "high humanities history"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_humanities_islamic-studies"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_high_humanities_islamic-studies_egy"
"task_alias": "high humanities islamic-studies"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_humanities_philosophy"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_high_humanities_philosophy_egy"
"task_alias": "high humanities philosophy"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_language_arabic-language"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_high_language_arabic-language_egy"
"task_alias": "high language arabic-language"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_social-science_civics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_high_social-science_civics_egy"
"task_alias": "high social-science civics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_social-science_economics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_high_social-science_economics_egy"
"task_alias": "high social-science economics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_social-science_geography"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_high_social-science_geography_egy"
"task_alias": "high social-science geography"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_stem_biology"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_high_stem_biology_egy"
"task_alias": "high stem biology"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_stem_computer-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_high_stem_computer-science_egy"
"task_alias": "high stem computer-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "high_stem_physics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_high_stem_physics_egy"
"task_alias": "high stem physics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_humanities_history"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_middle_humanities_history_egy"
"task_alias": "middle humanities history"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_humanities_islamic-studies"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_middle_humanities_islamic-studies_egy"
"task_alias": "middle humanities islamic-studies"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_language_arabic-language"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_middle_language_arabic-language_egy"
"task_alias": "middle language arabic-language"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_other_general-knowledge"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_other_egy"
"task": "AraDiCE_ArabicMMLU_middle_other_general-knowledge_egy"
"task_alias": "middle other general-knowledge"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_social-science_civics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_middle_social-science_civics_egy"
"task_alias": "middle social-science civics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_social-science_economics"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_middle_social-science_economics_egy"
"task_alias": "middle social-science economics"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_social-science_geography"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_middle_social-science_geography_egy"
"task_alias": "middle social-science geography"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_social-science_social-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_middle_social-science_social-science_egy"
"task_alias": "middle social-science social-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_stem_computer-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_middle_stem_computer-science_egy"
"task_alias": "middle stem computer-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "middle_stem_natural-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_middle_stem_natural-science_egy"
"task_alias": "middle stem natural-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "na_humanities_islamic-studies"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_na_humanities_islamic-studies_egy"
"task_alias": "na humanities islamic-studies"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "na_language_arabic-language-general"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_na_language_arabic-language-general_egy"
"task_alias": "na language arabic-language-general"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "na_language_arabic-language-grammar"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_na_language_arabic-language-grammar_egy"
"task_alias": "na language arabic-language-grammar"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "na_other_driving-test"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_other_egy"
"task": "AraDiCE_ArabicMMLU_na_other_driving-test_egy"
"task_alias": "na other driving-test"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "na_other_general-knowledge"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_other_egy"
"task": "AraDiCE_ArabicMMLU_na_other_general-knowledge_egy"
"task_alias": "na other general-knowledge"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_humanities_history"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_primary_humanities_history_egy"
"task_alias": "primary humanities history"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_humanities_islamic-studies"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_humanities_egy"
"task": "AraDiCE_ArabicMMLU_primary_humanities_islamic-studies_egy"
"task_alias": "primary humanities islamic-studies"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_language_arabic-language"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_language_egy"
"task": "AraDiCE_ArabicMMLU_primary_language_arabic-language_egy"
"task_alias": "primary language arabic-language"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_other_general-knowledge"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_other_egy"
"task": "AraDiCE_ArabicMMLU_primary_other_general-knowledge_egy"
"task_alias": "primary other general-knowledge"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_social-science_geography"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_primary_social-science_geography_egy"
"task_alias": "primary social-science geography"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_social-science_social-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_social-science_egy"
"task": "AraDiCE_ArabicMMLU_primary_social-science_social-science_egy"
"task_alias": "primary social-science social-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_stem_computer-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_primary_stem_computer-science_egy"
"task_alias": "primary stem computer-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_stem_math"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_primary_stem_math_egy"
"task_alias": "primary stem math"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
@@ -0,0 +1,10 @@
"dataset_name": "primary_stem_natural-science"
"description": ""
"fewshot_split": !!null "null"
"include": "_default_template_yaml"
"tag": "AraDiCE_ArabicMMLU_stem_egy"
"task": "AraDiCE_ArabicMMLU_primary_stem_natural-science_egy"
"task_alias": "primary stem natural-science"
"test_split": "test"
"training_split": !!null "null"
"validation_split": !!null "null"
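The per-subject configs above all follow one pattern: `dataset_name` has the shape `<level>_<category>_<subject>`, the `tag` is built from the category, the `task` from the full name, and `task_alias` replaces underscores with spaces. The sketch below reproduces that pattern as observed in this diff; `build_config` is a hypothetical helper for illustration and is not claimed to be the script the authors used to produce these files.

```python
def build_config(dataset_name, dialect="egy"):
    """Build one subject config dict from a name like 'high_stem_physics'.

    Assumes the '<level>_<category>_<subject>' shape seen in the diff;
    categories and subjects use hyphens internally, never underscores.
    """
    parts = dataset_name.split("_", 2)
    if len(parts) != 3:
        raise ValueError(f"unexpected dataset_name: {dataset_name!r}")
    _level, category, _subject = parts
    return {
        "dataset_name": dataset_name,
        "description": "",
        "fewshot_split": None,  # rendered as !!null "null" in the YAML files
        "include": "_default_template_yaml",
        "tag": f"AraDiCE_ArabicMMLU_{category}_{dialect}",
        "task": f"AraDiCE_ArabicMMLU_{dataset_name}_{dialect}",
        "task_alias": dataset_name.replace("_", " "),
        "test_split": "test",
        "training_split": None,
        "validation_split": None,
    }


cfg = build_config("primary_stem_natural-science")
for key, value in cfg.items():
    print(f"{key}: {value!r}")
```

Deriving the tag from the category is what lets the group config list only five tag names instead of every individual subject task.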