Merge the workers that rely on the datasets library #656
to prepare a generic worker
Codecov Report
Base: 85.56% // Head: 89.04% // Increases project coverage by +3.48%
Additional details and impacted files
@@ Coverage Diff @@
## main #656 +/- ##
==========================================
+ Coverage 85.56% 89.04% +3.48%
==========================================
Files 54 10 -44
Lines 2597 639 -1958
==========================================
- Hits 2222 569 -1653
+ Misses 375 70 -305
bae06c6 to 50cb7b9
The Docker image for a worker based on the datasets library is large, mainly due to two dependencies: PyTorch and TensorFlow (about 4 GB).
We create /workers/datasets_based for all the workers that depend on datasets, and choose the processing step with the DATASETS_BASED_ENDPOINT env var.
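
As a rough illustration of the idea (not the code in this PR), a single worker could select its processing step from DATASETS_BASED_ENDPOINT roughly as sketched below. The endpoint values, the `PROCESSING_STEPS` registry, and the `compute_*` functions are hypothetical placeholders; only the env var name comes from the description above.

```python
import os
from typing import Callable, Dict, Mapping, Optional


# Hypothetical processing steps; real workers would call into the datasets library.
def compute_splits(dataset: str) -> str:
    return f"splits for {dataset}"


def compute_first_rows(dataset: str) -> str:
    return f"first rows for {dataset}"


# Hypothetical registry mapping an endpoint value to its processing step.
PROCESSING_STEPS: Dict[str, Callable[[str], str]] = {
    "/splits": compute_splits,
    "/first-rows": compute_first_rows,
}


def select_processing_step(
    env: Optional[Mapping[str, str]] = None,
) -> Callable[[str], str]:
    """Return the processing step named by DATASETS_BASED_ENDPOINT."""
    env = os.environ if env is None else env
    endpoint = env.get("DATASETS_BASED_ENDPOINT")
    if endpoint is None:
        raise ValueError("DATASETS_BASED_ENDPOINT is not set")
    if endpoint not in PROCESSING_STEPS:
        raise ValueError(f"Unsupported endpoint: {endpoint!r}")
    return PROCESSING_STEPS[endpoint]
```

With this shape, the same Docker image can be deployed several times, each deployment setting a different DATASETS_BASED_ENDPOINT value, so the heavy dependencies are packaged only once.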