Update datasets dependency to 2.13.0 version #1372
Codecov Report
Patch coverage has no change and project coverage change:
Additional details and impacted files
@@ Coverage Diff @@
## main #1372 +/- ##
==========================================
- Coverage 90.02% 89.30% -0.72%
==========================================
Files 184 114 -70
Lines 10812 6491 -4321
==========================================
- Hits 9733 5797 -3936
+ Misses 1079 694 -385
Once deployed, I'll query the database to get the list of datasets that should be refreshed (those affected by one of the two bugs fixed by 2.13.0) and launch a refresh for them.
For huggingface/datasets#5938:
For
I'm not sure how we could query them, @lhoestq?
This query seems to include many datasets made of parquet files but without dataset_info:
{"kind": "config-info", "content.error": "Dict key must be str"}
This should include the parquet image datasets like https://huggingface.co/datasets/philippemo/dummy_dataset_without_schema_12_06
^ we might need to fix the related bug though
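To make the filter above concrete, here is a minimal sketch of how such a dotted-key query selects cache entries. It runs against an in-memory list rather than the real database, and the entry layout (including the `dataset` field and the dataset names) is hypothetical; only the filter itself comes from this thread.

```python
from typing import Any

# Filter quoted in this thread.
QUERY = {"kind": "config-info", "content.error": "Dict key must be str"}


def get_path(doc: Any, dotted_key: str) -> Any:
    """Follow a dotted key such as "content.error" through nested dicts."""
    current = doc
    for part in dotted_key.split("."):
        if not isinstance(current, dict) or part not in current:
            return None
        current = current[part]
    return current


def matches(doc: dict, query: dict) -> bool:
    """Return True when every dotted key in the query equals the expected value."""
    return all(get_path(doc, key) == expected for key, expected in query.items())


# Hypothetical cache entries, for illustration only.
entries = [
    {"kind": "config-info", "dataset": "user/dataset_a",
     "content": {"error": "Dict key must be str"}},
    {"kind": "config-info", "dataset": "user/dataset_b",
     "content": {"dataset_info": {}}},
    {"kind": "split-names", "dataset": "user/dataset_c",
     "content": {"error": "Dict key must be str"}},
]

# Deduplicated list of datasets to refresh.
affected = sorted({entry["dataset"] for entry in entries if matches(entry, QUERY)})
print(affected)
```

Against the real cache, the same filter would presumably be passed to the database client directly; the helper here only illustrates which entries it selects.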
Hmmm, indeed, this dataset shows:
{
"error": "Dict key must be str",
"cause_exception": "TypeError",
"cause_message": "Dict key must be str",
"cause_traceback": [
"Traceback (most recent call last):\n",
" File \"/src/services/worker/src/worker/job_manager.py\", line 167, in process\n if len(orjson_dumps(content)) > self.worker_config.content_max_bytes:\n",
" File \"/src/libs/libcommon/src/libcommon/utils.py\", line 79, in orjson_dumps\n return orjson.dumps(content, option=orjson.OPT_UTC_Z, default=orjson_default)\n",
"TypeError: Dict key must be str\n"
]
}
I will use this to get a list of datasets. But it looks like an additional bug we should take care of! Opening an issue.
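The traceback shows the serializer rejecting a dictionary whose keys are not all strings. As a rough diagnostic sketch (not the fix adopted in the project), the helpers below locate non-string keys in a nested payload and produce a copy with stringified keys. The `content` value is a made-up example, not the actual response for the dataset above, and both function names are hypothetical. orjson also offers an `OPT_NON_STR_KEYS` option; whether using it is appropriate here is left to the follow-up issue.

```python
import json
from typing import Any, Iterator


def find_non_str_keys(value: Any, path: str = "$") -> Iterator[str]:
    """Yield the path of every dict key that is not a str."""
    if isinstance(value, dict):
        for key, child in value.items():
            child_path = f"{path}.{key}"
            if not isinstance(key, str):
                yield f"{child_path} (key type: {type(key).__name__})"
            yield from find_non_str_keys(child, child_path)
    elif isinstance(value, (list, tuple)):
        for index, child in enumerate(value):
            yield from find_non_str_keys(child, f"{path}[{index}]")


def stringify_keys(value: Any) -> Any:
    """Return a copy of the payload in which every dict key is converted to str."""
    if isinstance(value, dict):
        return {str(key): stringify_keys(child) for key, child in value.items()}
    if isinstance(value, (list, tuple)):
        return [stringify_keys(child) for child in value]
    return value


# Hypothetical payload with an integer key, for illustration only.
content = {
    "dataset_info": {
        "features": {0: {"dtype": "int64"}, "label": {"dtype": "string"}}
    }
}

problems = list(find_non_str_keys(content))
safe = stringify_keys(content)
print(problems)
print(json.dumps(safe))
```

Note that stringifying keys changes the payload, so a check like `find_non_str_keys` is probably more useful for finding where the bad keys come from than as a permanent workaround.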
Let's cross fingers and hope that huggingface/datasets#5938 really fixes the "Stale file handle" error... 😅
Yes @albertvillanova, they have all been fixed! 👏 e.g. https://huggingface.co/datasets/rjac/DepressionDetection
For the other bug... hmmm, not sure: we still have the error in all the datasets on the list.
cc @lhoestq
After the datasets 2.13.0 release, update the dependencies on it.
Note that I have also removed the explicit dependency on datasets from services/api. This is analogous to what was previously done on services/worker.
Fix #1370.