-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove tasks #6999
Remove tasks #6999
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update. |
@@ -146,7 +143,6 @@ class DatasetInfo: | |||
features: Optional[Features] = None | |||
post_processed: Optional[PostProcessedInfo] = None | |||
supervised_keys: Optional[SupervisedKeysData] = None | |||
task_templates: Optional[List[TaskTemplate]] = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can leave this (or ignore it in a post_init) since otherwise it will break ILSVRC/imagenet-1k and ylecun/mnist and many other datasets.
We could also keep the task classes to be imported and instantiated without errors but still remove all their methods like align_feature since they are unused
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So we keep these deprecated classes and parameters for datasets 3.0...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alternatively we could just hardcode a patch in dataset-viewer to keep supporting imagenet and mnist until they are converted to no-code datasets ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually breaking this will be in the right direction of making people stop using code datasets. I'm just concerned that the mnist repo will stop working but if ylecun doesn't merge the PR to convert the dataset to parquet we can still hardcode something for the viewer
(and imagenet we can merge by ourselves)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand your concerns, but at the same time I would push to not keep the deprecated tasks (since version 2.13.0, more than a year ago) in the new major version.
So I would propose, before making the next release:
- Identify all the datasets using tasks
- Open PRs to convert them to Parquet
- Wait 1/2 weeks for the owners to merge before forcing the merge ourselves for "maintenance" reasons
- Only then, make the release
I can work on that with a script.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, though no need to convert all the datasets - just the ones that are important
Show benchmarksPyArrow==8.0.0 Show updated benchmarks!Benchmark: benchmark_array_xd.json
Benchmark: benchmark_getitem_100B.json
Benchmark: benchmark_indices_mapping.json
Benchmark: benchmark_iterating.json
Benchmark: benchmark_map_filter.json
Show updated benchmarks!Benchmark: benchmark_array_xd.json
Benchmark: benchmark_getitem_100B.json
Benchmark: benchmark_indices_mapping.json
Benchmark: benchmark_iterating.json
Benchmark: benchmark_map_filter.json
|
Remove tasks, as part of the 3.0 release.