Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve error messages #754

Merged
merged 28 commits into from
Feb 24, 2023
Merged

Conversation

albertvillanova
Copy link
Member

@codecov-commenter
Copy link

codecov-commenter commented Feb 1, 2023

Codecov Report

Base: 91.72% // Head: 88.51% // Decreases project coverage by -3.21% ⚠️

Coverage data is based on head (c1b9909) compared to base (8128ec5).
Patch coverage: 20.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #754      +/-   ##
==========================================
- Coverage   91.72%   88.51%   -3.21%     
==========================================
  Files          40       73      +33     
  Lines        2935     3101     +166     
==========================================
+ Hits         2692     2745      +53     
- Misses        243      356     +113     
Flag Coverage Δ
jobs_mongodb_migration 77.29% <ø> (?)
libs_libcommon 92.81% <0.00%> (?)
services_admin 87.32% <50.00%> (?)
services_api 89.34% <0.00%> (?)
services_worker ?

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
libs/libcommon/src/libcommon/dataset.py 55.31% <0.00%> (ø)
services/api/src/api/authentication.py 100.00% <ø> (ø)
services/api/src/api/routes/endpoint.py 63.79% <0.00%> (ø)
services/admin/src/admin/authentication.py 91.42% <50.00%> (ø)
...ices/worker/src/worker/job_runners/dataset_info.py
...rvices/worker/src/worker/job_runners/first_rows.py
services/worker/src/worker/job_runners/parquet.py
...src/worker/job_runners/parquet_and_dataset_info.py
services/worker/src/worker/job_runners/sizes.py
...c/worker/job_runners/split_names_from_streaming.py
... and 107 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

else:
features = info.features

if features and len(features) > columns_max_number:
raise TooManyColumnsError(
f"Too many columns. The maximum supported number of columns is {columns_max_number}."
f"The number of columns ({len(features)}) exceeds the maximum supported number of columns"
f" ({columns_max_number})."
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we suggest to reduce the number of columns?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, good idea!

Generally, it comes from a flaw in the design of the dataset (not "tidy data"), but anyway, people have the right to do what they want.

So, I think we just want to say it's a limitation of the dataset viewer, and that by reducing the number of columns, it would be working again

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Feb 3, 2023

The documentation is not available anymore as the PR was closed or merged.

@albertvillanova albertvillanova marked this pull request as ready for review February 7, 2023 16:29
Copy link
Collaborator

@severo severo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very nice, thanks!

Just one comment (that I put everywhere to be sure no to forgot one instance): are we sure the dataset contains a loading script, when we ask the user to fix it?

If the dataset has no loading script, maybe it's better not mentioning it.

Also: there are two possible users:

  • the user is the maintainer of the dataset: they will be able to apply the advice
  • the user is seeing the page of a dataset they do not own: they should open a discussion, so that the maintainer can fix it (note that the viewer already directs to opening a discussion in that case).
    Which one are we directing the messages to?

"value": "SplitsResponseNotReadyError"
},
"SplitsNamesError": {
"summary": "Cannot get the split names for the dataset.",
"summary": "Cannot get the split names for the dataset. Please fix your loading script.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw, is it possible to get this error even if the dataset does not contain a loading script?

@@ -196,19 +196,19 @@
"value": "FirstRowsResponseNotReady"
},
"InfoError": {
"summary": "The info cannot be fetched for the dataset config.",
"summary": "The info cannot be fetched for the config of the dataset. Please fix your loading script.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question: is this error only possible if a loading script exists?

"value": "InfoError"
},
"FeaturesError": {
"summary": "The split features (columns) cannot be extracted.",
"summary": "Cannot extract the features (columns) for the split of the config of the dataset. Please fix your loading script.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question: is this error only possible if a loading script exists?

"value": "FeaturesError"
},
"StreamingRowsError": {
"summary": "Cannot load the dataset split (in streaming mode) to extract the first rows.",
"summary": "Cannot load the dataset split (in streaming mode) to extract the first rows. Please make your loading script support streaming",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question: is this error only possible if a loading script exists?

"value": "StreamingRowsError"
},
"NormalRowsError": {
"summary": "Cannot load the dataset split (in normal download mode) to extract the first rows.",
"summary": "Cannot load the dataset split (in normal download mode) to extract the first rows. Please fix your loading script.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question: is this error only possible if a loading script exists?

except Exception as err:
raise ConfigNamesError("Cannot get the configuration names for the dataset.", cause=err) from err
raise ConfigNamesError(
"Cannot get the config names for the dataset. Please fix your loading script.", cause=err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question: is this error only possible if a loading script exists?

@@ -103,9 +103,12 @@ def compute_split_names_response(
for split in get_dataset_split_names(path=dataset, config_name=config, use_auth_token=use_auth_token)
]
except _EmptyDatasetError as err:
raise EmptyDatasetError("The dataset is empty.", cause=err) from err
raise EmptyDatasetError("The dataset is empty. Please fix your loading script.", cause=err) from err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question: is this error only possible if a loading script exists?

except Exception as err:
raise SplitNamesError("Cannot get the split names for the dataset and config.", cause=err) from err
raise SplitNamesError(
f"Cannot get the split names for the config '{config}' of the dataset. Please fix your loading script.",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question: is this error only possible if a loading script exists?

@@ -118,9 +118,11 @@ def compute_splits_response(
try:
split_items = get_dataset_split_full_names(dataset=dataset, use_auth_token=use_auth_token)
except _EmptyDatasetError as err:
raise EmptyDatasetError("The dataset is empty.", cause=err) from err
raise EmptyDatasetError("The dataset is empty. Please fix your loading script.", cause=err) from err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question: is this error only possible if a loading script exists?

except Exception as err:
raise SplitsNamesError("Cannot get the split names for the dataset.", cause=err) from err
raise SplitsNamesError(
"Cannot get the split names for the dataset. Please fix your loading script.", cause=err
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same question: is this error only possible if a loading script exists?

@albertvillanova
Copy link
Member Author

albertvillanova commented Feb 21, 2023

@severo:

  • I am addressing the error messages to the creator/owner of the dataset: they are responsible for their dataset, it is them who are interested in having the viewer working on their dataset and it is them who must eventually merge the fixes (either implemented by themselves or by some other community member proposing a PR)
    • Note we have to choose one target user
  • You are right that in many cases the error can appear either when there is a script or when there are only data files
    • I am removing the mention to the script
    • As a follow-up work, I am investigating these cases to see if we could raise a specific error for each case from the datasets library

Copy link
Collaborator

@severo severo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@severo
Copy link
Collaborator

severo commented Feb 24, 2023

Feel free to merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants