Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding custom exception when cache insert fails because of too many columns #749

Merged

Conversation

AndreaFrancis
Copy link
Contributor

Adding custom exception to fix scenarios like #731

@AndreaFrancis AndreaFrancis marked this pull request as draft January 31, 2023 17:03
@AndreaFrancis AndreaFrancis changed the title Adding exception Adding custom exception when cache insert fails because of too many columns Jan 31, 2023
@codecov-commenter
Copy link

codecov-commenter commented Jan 31, 2023

Codecov Report

Base: 90.08% // Head: 92.00% // Increases project coverage by +1.91% 🎉

Coverage data is based on head (b89cc5d) compared to base (89a4510).
Patch coverage: 100.00% of modified lines in pull request are covered.

❗ Current head b89cc5d differs from pull request most recent head a89f65e. Consider uploading reports for the commit a89f65e to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #749      +/-   ##
==========================================
+ Coverage   90.08%   92.00%   +1.91%     
==========================================
  Files          99       33      -66     
  Lines        4943     2238    -2705     
==========================================
- Hits         4453     2059    -2394     
+ Misses        490      179     -311     
Flag Coverage Δ
jobs_mongodb_migration ?
libs_libcommon ?
services_admin ?
services_api ?
workers_datasets_based 92.00% <100.00%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...orkers/datasets_based/src/datasets_based/config.py 98.80% <100.00%> (+0.02%) ⬆️
...ets_based/src/datasets_based/workers/first_rows.py 82.62% <100.00%> (+0.40%) ⬆️
...rs/datasets_based/tests/workers/test_first_rows.py 97.46% <100.00%> (+0.09%) ⬆️
...orkers/datasets_based/src/datasets_based/worker.py 87.27% <0.00%> (-0.63%) ⬇️
workers/datasets_based/tests/test_worker.py 100.00% <0.00%> (ø)
services/admin/src/admin/routes/cache_reports.py
services/admin/src/admin/routes/cancel_jobs.py
services/admin/tests/conftest.py
.../mongodb_migration/src/mongodb_migration/config.py
... and 62 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@AndreaFrancis AndreaFrancis marked this pull request as ready for review January 31, 2023 18:17
@severo
Copy link
Collaborator

severo commented Jan 31, 2023

Thanks! That's not really what I had in mind. I thought we could instead add a new env var to set the maximum number of columns (features) supported in /first-rows, and check this early in the function (as soon as we have the features), and then raise if the number of features is above the limit.

Then, once deployed in prod, we refresh the /first-rows endpoint for the affected datasets -> they should now have this error, which is more explicit.

@AndreaFrancis
Copy link
Contributor Author

Got it, it will first validate if the amount of columns exceeded the limit but what if even having a less number of columns but the content without rows is also too long that exceeds the rows_max_bytes? I think another kind of error could be thrown for now or just leave it as it is as unexpectederror

@severo
Copy link
Collaborator

severo commented Feb 1, 2023

Yes, two cases are missing:

  • too many columns, compared to the new configuration parameter that we need to create -> TooManyColumnsError
  • truncated content is still too big compared to the max number of bytes of the response -> it's an internal error because it should be the result of an incoherent set of configuration parameters, ie: we could fix it by reducing the max number of column, or by increasing the min size of the response, or else. But you're right, it should be a new error, something like: TooBigTruncatedContent, something like that

@AndreaFrancis AndreaFrancis marked this pull request as draft February 1, 2023 16:10
@AndreaFrancis AndreaFrancis marked this pull request as ready for review February 1, 2023 19:20
Copy link
Collaborator

@severo severo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good!

I made a lot of small comments, as it's one of your first PR.

In particular, I tried to make the error messages a bit more explicit for the end users, since it will be shown in the dataset viewer.

chart/templates/worker/first-rows/_container.tpl Outdated Show resolved Hide resolved
workers/datasets_based/README.md Outdated Show resolved Hide resolved
workers/datasets_based/README.md Outdated Show resolved Hide resolved
workers/datasets_based/tests/workers/test_first_rows.py Outdated Show resolved Hide resolved
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool ! Just one comment:

Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome thanks !

Copy link
Collaborator

@severo severo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just details. Feel free to merge once fixed. Thanks!!!

chart/env/dev.yaml Show resolved Hide resolved
chart/env/prod.yaml Show resolved Hide resolved
@AndreaFrancis AndreaFrancis merged commit 2994f29 into huggingface:main Feb 2, 2023
@AndreaFrancis AndreaFrancis deleted the too-many-columns-exception branch February 2, 2023 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants