Split first rows from parquet new Job Runner #988

Merged: 6 commits, Mar 31, 2023

AndreaFrancis (Contributor) commented Mar 24, 2023

Final part of #755
Based on #875 for parquet reading logic

codecov-commenter commented Mar 24, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: -1.77 ⚠️

Comparison is base (4788650) 89.58% compared to head (0e87d4f) 87.81%.

❗ Current head 0e87d4f differs from pull request most recent head b2f428b. Consider uploading reports for the commit b2f428b to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #988      +/-   ##
==========================================
- Coverage   89.58%   87.81%   -1.77%     
==========================================
  Files         147       94      -53     
  Lines        7854     4121    -3733     
==========================================
- Hits         7036     3619    -3417     
+ Misses        818      502     -316     
| Flag | Coverage Δ |
|---|---|
| jobs_cache_refresh | 98.50% <ø> (ø) |
| jobs_mongodb_migration | 80.57% <ø> (ø) |
| libs_libcommon | 93.57% <100.00%> (+0.03%) ⬆️ |
| services_admin | 87.32% <ø> (ø) |
| services_api | 84.70% <100.00%> (ø) |
| services_worker | ? |

| Impacted Files | Coverage Δ |
|---|---|
| libs/libcommon/src/libcommon/config.py | 78.33% <ø> (ø) |
| services/api/src/api/config.py | 100.00% <ø> (ø) |
| libs/libcommon/src/libcommon/constants.py | 100.00% <100.00%> (ø) |
| libs/libcommon/tests/test_processing_steps.py | 100.00% <100.00%> (ø) |
| services/api/tests/routes/test_endpoint.py | 100.00% <100.00%> (ø) |

... and 53 files with indirect coverage changes

HuggingFaceDocBuilder (Collaborator) commented Mar 24, 2023

The documentation is not available anymore as the PR was closed or merged.

@AndreaFrancis AndreaFrancis force-pushed the split-first-rows-from-parquet branch 2 times, most recently from 208598f to 3cb300d Compare March 27, 2023 22:11
@AndreaFrancis AndreaFrancis marked this pull request as ready for review March 27, 2023 22:12
@AndreaFrancis AndreaFrancis changed the title WIP - Split first rows from parquet new Job Runner Split first rows from parquet new Job Runner Mar 27, 2023
services/worker/pyproject.toml: review thread resolved (outdated)
services/worker/src/worker/features.py: review thread resolved
services/worker/src/worker/asset.py: review thread resolved (outdated)
services/worker/src/worker/utils.py: review thread resolved
AndreaFrancis (Contributor, Author) commented

Not sure why pip audit fails for the worker project; it is no different from the other folders that use pymongo and mongoengine.

ERROR:pip_audit._virtual_env:internal pip failure: ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==. These do not:
    pymongo<5.0,>=3.4 from https://files.pythonhosted.org/packages/38/68/928d7ce22719cfa255fb973b34aed6f04ac3ea89049ce69e3b092c30a60f/pymongo-4.3.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (from mongoengine==0.24.2->-r /tmp/tmpmnbps2mm (line 1032))

severo (Collaborator) commented Mar 28, 2023

> Not sure why pip audit fails for the worker project; it is no different from the other folders that use pymongo and mongoengine.
>
> ERROR:pip_audit._virtual_env:internal pip failure: ERROR: In --require-hashes mode, all requirements must have their versions pinned with ==. These do not:
>     pymongo<5.0,>=3.4 from https://files.pythonhosted.org/packages/38/68/928d7ce22719cfa255fb973b34aed6f04ac3ea89049ce69e3b092c30a60f/pymongo-4.3.3-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (from mongoengine==0.24.2->-r /tmp/tmpmnbps2mm (line 1032))

In 12289d8, you updated all the dependencies of the project, including some that were not related to the matter of this PR. It's better to do it in a separate PR, where we update the dependencies and ensure nothing has broken.

In this case, you updated the dev dependency pip-audit to https://github.com/pypa/pip-audit/releases/tag/v2.5.3, which introduced the issue (I tried with 2.5.2, and it works).

So, my recommendation for this PR would be to:

and I think that your poetry.lock should be good (in particular, you should be using pip-audit v2.4.14)

And then, in another PR, we could upgrade the dependencies one by one, checking that we don't break anything.
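
For illustration only: the pip error quoted above is raised because one requirement (`pymongo<5.0,>=3.4`) is a version range rather than an exact `==` pin. The sketch below is not part of this PR and is not how pip or pip-audit implement the check; `unpinned_requirements` is a hypothetical helper that scans requirement lines and reports the ones that are not pinned with `==`. It assumes a simple requirements format (no URL requirements) and uses only the standard library.

```python
import re

# Hypothetical helper for illustration; not part of pip, pip-audit, or this PR.
# A "pinned" specifier is assumed to look like "name==version" (extras allowed).
_PINNED = re.compile(
    r"^[A-Za-z0-9][A-Za-z0-9._-]*(\[[^\]]*\])?\s*==\s*[^\s,;<>=!~]+$"
)


def unpinned_requirements(lines):
    """Return the requirement specifiers that are not pinned with '=='."""
    unpinned = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()    # drop trailing comments
        if not line or line.startswith("-"):   # skip blanks and option lines
            continue
        spec = line.split(";", 1)[0]           # drop environment markers
        spec = spec.split("--hash", 1)[0]      # drop hash options
        spec = spec.strip().rstrip("\\").strip()
        if not _PINNED.match(spec):
            unpinned.append(spec)
    return unpinned


# Sample input built from the specifiers mentioned in this thread.
requirements = [
    "mongoengine==0.24.2",
    "pymongo<5.0,>=3.4",
    "pip-audit==2.4.14",
]
print(unpinned_requirements(requirements))  # ['pymongo<5.0,>=3.4']
```

A check like this only flags the symptom; the fix discussed in this thread is to keep the previously working pip-audit version in `poetry.lock` and upgrade dependencies in a separate PR.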

@AndreaFrancis AndreaFrancis force-pushed the split-first-rows-from-parquet branch from 0e87d4f to b2f428b Compare March 29, 2023 22:04
@AndreaFrancis AndreaFrancis requested a review from severo March 29, 2023 22:21
AndreaFrancis (Contributor, Author) commented

> And then, in another PR, we could upgrade the dependencies one by one, checking that we don't break anything.

I added the dependencies needed for this PR in #996.

severo (Collaborator) left a review comment

Very nice. This, along with #995, should greatly reduce the load on the workers!

@AndreaFrancis AndreaFrancis merged commit 3f78bd3 into main Mar 31, 2023
@AndreaFrancis AndreaFrancis deleted the split-first-rows-from-parquet branch March 31, 2023 13:57
AndreaFrancis (Contributor, Author) commented

I will introduce the same logic as #995 for the first-rows job runners.

severo (Collaborator) commented Mar 31, 2023

You're right, #995 was for split-names. Let's do it for first-rows now!

6 participants