Task config #289

hynky1999 · 2024-09-04T23:23:58Z

What does this implement/fix? Explain your changes.

This PR adds the following new args to the task config:

hf_revision (allows locking to certain commit)
hf_filter (allows filtering the datasets)

Sets defaults as early as possible for the lighteval task config
Removes unused args from the task class:

      self.save_queries: bool = False
      self.logfile_name: Optional[Path] = None
      self.is_main_process: bool = False

Improved typings
I noticed that the following args are unused. Maybe we should create an issue to either implement that functionality or remove them?

    frozen: bool = False
    output_regex: Optional[str] = None
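
For illustration, here is a minimal sketch of how the two new args could behave. It is not the code from this PR: `TaskConfigSketch`, `apply_filter`, the `Row`/`Splits` aliases, and the `"abc123"` revision are hypothetical, and plain lists of dicts stand in for the real dataset objects. In the actual change the revision is presumably forwarded to `load_dataset` and the filter to `dataset.filter(...)`; both are left out here so the example runs offline.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

Row = dict[str, Any]
Splits = dict[str, list[Row]]


@dataclass
class TaskConfigSketch:
    """Hypothetical, trimmed-down stand-in for the real task config."""

    # Commit hash, tag, or branch used to pin the dataset version.
    hf_revision: Optional[str] = None
    # Predicate over a single row; rows for which it returns True are kept.
    hf_filter: Optional[Callable[[Row], bool]] = None


def apply_filter(splits: Splits, config: TaskConfigSketch) -> Splits:
    """Apply config.hf_filter to every split; return the input as-is if no filter is set."""
    if config.hf_filter is None:
        return splits
    return {
        name: [row for row in rows if config.hf_filter(row)]
        for name, rows in splits.items()
    }


# "abc123" is a placeholder revision, not a real commit.
config = TaskConfigSketch(hf_revision="abc123", hf_filter=lambda row: row["lang"] == "en")
splits = {"train": [{"lang": "en"}, {"lang": "fr"}], "test": [{"lang": "fr"}]}
filtered = apply_filter(splits, config)
```

Keeping both args optional with a `None` default means existing task configs keep working unchanged.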

NathanHB · 2024-09-10T11:29:18Z

src/lighteval/utils/utils.py

+        dataset = dataset.filter(dataset_filter)
+
+    # It returns DatasetDict because we don't specify a split
+    return dataset  # type: ignore


Why do we ignore the type here?

hynky1999 · 2024-09-13

Because it doesn't have the correct type.
load_dataset returns Dataset | DatasetDict | IterableDataset | IterableDatasetDict depending on the input args, and afaik there is an unspecified contract that if we provide neither the streaming nor the split arg, we get a DatasetDict. However, there is no way to express this at the typing level, so I just ignore the error.

If the question was why I put the type: ignore there: even though we don't have a type checker in the quality checks, I do have one (pyright) enabled in VS Code, and it shows red when there is a typing problem.
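
As a possible alternative to `# type: ignore`, the union could be narrowed at runtime with an `isinstance` check, which type checkers such as pyright understand. The sketch below is only an illustration and not what the PR does: `Dataset` and `DatasetDict` are local stand-ins for the classes of the same name in the `datasets` library, and `expect_dataset_dict` is a hypothetical helper.

```python
from typing import Union


class Dataset:
    """Local stand-in for datasets.Dataset (assumption, for illustration only)."""


class DatasetDict(dict):
    """Local stand-in for datasets.DatasetDict, a mapping of split name to Dataset."""


LoadResult = Union[Dataset, DatasetDict]


def expect_dataset_dict(loaded: LoadResult) -> DatasetDict:
    """Narrow the union at runtime so the type checker sees a DatasetDict."""
    if not isinstance(loaded, DatasetDict):
        raise TypeError(f"expected DatasetDict, got {type(loaded).__name__}")
    return loaded


splits = expect_dataset_dict(DatasetDict(train=Dataset()))
```

The check turns the unspecified contract into an explicit failure if it is ever broken. `typing.cast(DatasetDict, loaded)` would silence the checker without any runtime check, which is closer in spirit to the `# type: ignore` used in the diff.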

src/lighteval/tasks/lighteval_task.py

NathanHB · 2024-09-10T11:37:47Z

Looks great!
The output_regex is legacy from the harness; we needed it for bigbench tasks, but this is now handled by the metrics directly. Frozen was meant to make sure the task did not change over time, but we now use task versioning for this.

hynky1999 added 7 commits September 4, 2024 15:13

add new params to config class

9f003ff

clean up task/config

9427818

connect datatasert revision and filter

10870a6

add tests for filtering/revision

c51a762

nit

ec20534

nit+1

a732e97

Merge branch 'main' into task_config

0919229

hynky1999 requested a review from NathanHB September 5, 2024 09:49

Merge branch 'main' into task_config

0d52a55

NathanHB reviewed Sep 10, 2024

src/lighteval/tasks/lighteval_task.py (outdated, resolved)

NathanHB approved these changes Sep 10, 2024

Hynek Kydlicek and others added 2 commits September 13, 2024 13:21

remove redudant check

f951097

Merge branch 'main' into task_config

1bd726a

hynky1999 mentioned this pull request Sep 13, 2024

[FT] Remove obsolete config properties (frozen, output_regex) #305

Closed

hynky1999 merged commit 919be47 into main Sep 13, 2024
2 checks passed
