Updating dependencies in worker to support first rows from parquet #996

Merged on Mar 29, 2023 (3 commits)

@AndreaFrancis AndreaFrancis commented Mar 29, 2023

This prepares the implementation of #988, as suggested in #988 (comment).
I followed these steps:

  • From the main branch, ran make install
  • Ran poetry remove apache-beam
  • Updated the dependencies in pyproject.toml for datasets, hffs, pyarrow and numpy
  • Manually ran poetry update for each of these packages: datasets, hffs, pyarrow and numpy
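
After steps like the ones above, one quick way to confirm which versions ended up installed in the environment is a small check like the following. This is an illustrative sketch and not part of the PR: `installed_versions` is a hypothetical helper name, and it relies only on the standard library's `importlib.metadata`.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in the current environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Package names taken from the steps listed in the PR description.
    for name, version in installed_versions(["datasets", "hffs", "pyarrow", "numpy"]).items():
        print(f"{name}: {version or 'not installed'}")
```

Running it inside the Poetry environment (for example through `poetry run python`) reports the resolved versions rather than the constraints written in pyproject.toml.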

HuggingFaceDocBuilder commented Mar 29, 2023

The documentation is not available anymore as the PR was closed or merged.

@AndreaFrancis

Oddly, some tests for the parquet-and-dataset-info job runner are failing; it looks like this is caused by pyarrow 11. I will investigate locally.

-CSV_PARQUET_SIZE = 1_865
-AUDIO_PARQUET_SIZE = 1_383
+CSV_PARQUET_SIZE = 1_866
+AUDIO_PARQUET_SIZE = 1_384
Contributor Author

I increased the expected sizes so that the tests pass. It looks like pyarrow 11 slightly increased the size of the generated Parquet files.
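
To compare sizes across pyarrow versions locally, a sketch like the following can help. It is not the fixture code from this PR: `parquet_size_in_bytes` is a hypothetical helper and the sample columns are made up for illustration. It writes a small table to an in-memory Parquet file and returns the byte count, so running it under two pyarrow versions shows whether the writer output differs.

```python
import io


def parquet_size_in_bytes(columns):
    """Serialize a dict of columns to an in-memory Parquet file and return its size."""
    # Imported lazily so this module can still be loaded where pyarrow is absent.
    import pyarrow as pa
    import pyarrow.parquet as pq

    table = pa.table(columns)
    sink = io.BytesIO()
    pq.write_table(table, sink)
    return len(sink.getvalue())


if __name__ == "__main__":
    import pyarrow as pa

    # Hypothetical sample data; the real tests use the CSV and audio fixtures.
    sample = {"col_1": [0, 1, 2], "col_2": ["a", "b", "c"]}
    print(f"pyarrow {pa.__version__}: {parquet_size_in_bytes(sample)} bytes")
```

Because the exact byte count can depend on the writer version, hard-coded sizes such as the constants above may need updating whenever pyarrow is bumped.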

@AndreaFrancis AndreaFrancis marked this pull request as ready for review March 29, 2023 15:54
@AndreaFrancis AndreaFrancis requested a review from severo March 29, 2023 15:54
@severo severo left a comment

OK, thanks!

@@ -328,7 +328,7 @@ def create_dataset_info_response_for_csv(dataset: str, config: str) -> Any:
 }


-def create_dataset_info_response_for_audio(dataset: str, config: str) -> Any:
+def create_dataset_info_response_for_audio() -> Any:
@AndreaFrancis AndreaFrancis Mar 29, 2023

I made this small refactor to get the GitHub Actions checks running again on this PR; it is unrelated to the logic.

@codecov-commenter

Codecov Report

Patch coverage: 100.00% and project coverage change: +1.96 🎉

Comparison is base (4788650) 89.58% compared to head (bbca6db) 91.55%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #996      +/-   ##
==========================================
+ Coverage   89.58%   91.55%   +1.96%     
==========================================
  Files         147       53      -94     
  Lines        7854     3740    -4114     
==========================================
- Hits         7036     3424    -3612     
+ Misses        818      316     -502     
Flag                     Coverage Δ
jobs_cache_refresh       ?
jobs_mongodb_migration   ?
libs_libcommon           ?
services_admin           ?
services_api             ?
services_worker          91.55% <100.00%> (ø)

Flags with carried forward coverage won't be shown.

Impacted Files                                          Coverage Δ
...tests/job_runners/test_parquet_and_dataset_info.py   100.00% <ø> (ø)
...src/worker/job_runners/parquet_and_dataset_info.py   85.01% <100.00%> (ø)
services/worker/tests/fixtures/hub.py                   98.23% <100.00%> (ø)

... and 94 files with indirect coverage changes

@AndreaFrancis AndreaFrancis merged commit 7b01da3 into main Mar 29, 2023
@AndreaFrancis AndreaFrancis deleted the update-dependencies branch March 29, 2023 22:02