Improve error messages #754

albertvillanova · 2023-02-01T14:48:56Z

Related to:

- Improve the error messages in the dataset viewer #745
- Dataset Viewer issue for rfernand/basic_sentence_transforms #718

codecov-commenter · 2023-02-01T14:51:05Z

Codecov Report

Base: 91.72% // Head: 88.51% // Decreases project coverage by -3.21% ⚠️

Coverage data is based on head (c1b9909) compared to base (8128ec5).
Patch coverage: 20.00% of modified lines in pull request are covered.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #754      +/-   ##
==========================================
- Coverage   91.72%   88.51%   -3.21%     
==========================================
  Files          40       73      +33     
  Lines        2935     3101     +166     
==========================================
+ Hits         2692     2745      +53     
- Misses        243      356     +113

Flag	Coverage Δ
jobs_mongodb_migration	`77.29% <ø> (?)`
libs_libcommon	`92.81% <0.00%> (?)`
services_admin	`87.32% <50.00%> (?)`
services_api	`89.34% <0.00%> (?)`
services_worker	`?`

Impacted Files	Coverage Δ
libs/libcommon/src/libcommon/dataset.py	`55.31% <0.00%> (ø)`
services/api/src/api/authentication.py	`100.00% <ø> (ø)`
services/api/src/api/routes/endpoint.py	`63.79% <0.00%> (ø)`
services/admin/src/admin/authentication.py	`91.42% <50.00%> (ø)`
...ices/worker/src/worker/job_runners/dataset_info.py
...rvices/worker/src/worker/job_runners/first_rows.py
services/worker/src/worker/job_runners/parquet.py
...src/worker/job_runners/parquet_and_dataset_info.py
services/worker/src/worker/job_runners/sizes.py
...c/worker/job_runners/split_names_from_streaming.py
... and 107 more


albertvillanova · 2023-02-03T10:26:06Z

workers/datasets_based/src/datasets_based/workers/first_rows.py

    else:
        features = info.features

    if features and len(features) > columns_max_number:
        raise TooManyColumnsError(
-            f"Too many columns. The maximum supported number of columns is {columns_max_number}."
+            f"The number of columns ({len(features)}) exceeds the maximum supported number of columns"
+            f" ({columns_max_number})."


Should we suggest reducing the number of columns?

Yes, good idea!

Generally, it comes from a flaw in the design of the dataset (not "tidy data"), but anyway, people have the right to do what they want.

So I think we just want to say that it's a limitation of the dataset viewer, and that reducing the number of columns would make the viewer work again.
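
For illustration, the suggestion could look roughly like the sketch below. It is not the code merged in this PR: the `check_columns_number` helper and the exact wording of the added sentences are assumptions, and `TooManyColumnsError` is redefined here as a plain exception only so that the snippet runs on its own.

```python
from typing import Mapping, Optional


class TooManyColumnsError(Exception):
    """Stand-in for the worker's exception, defined here so the sketch is self-contained."""


def check_columns_number(features: Optional[Mapping[str, object]], columns_max_number: int) -> None:
    """Hypothetical helper: raise if the dataset has more columns than the viewer supports."""
    if features and len(features) > columns_max_number:
        raise TooManyColumnsError(
            f"The number of columns ({len(features)}) exceeds the maximum supported number of columns"
            f" ({columns_max_number}). This is a current limitation of the dataset viewer."
            # Assumed wording for the advice discussed above.
            " You can reduce the number of columns if you want the viewer to work."
        )
```

The message states the limit, presents it as a limitation of the viewer rather than a fault of the dataset, and tells the user what they can do about it.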

HuggingFaceDocBuilderDev · 2023-02-03T13:39:47Z

The documentation is not available anymore as the PR was closed or merged.

severo

Very nice, thanks!

Just one comment (which I put everywhere, to be sure not to forget any instance): are we sure the dataset contains a loading script when we ask the user to fix it?

If the dataset has no loading script, it may be better not to mention it.

Also: there are two possible users:

- the user is the maintainer of the dataset: they will be able to apply the advice
- the user is seeing the page of a dataset they do not own: they should open a discussion, so that the maintainer can fix it (note that the viewer already directs to opening a discussion in that case).

Which one are we directing the messages to?
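
To make the question concrete, here is a minimal sketch of how the advice could depend on both points. Everything in it is hypothetical: the `build_error_message` function, its `has_loading_script` and `viewer_is_maintainer` parameters, and the wording for non-maintainers are assumptions for discussion, not something this PR implements.

```python
def build_error_message(summary: str, has_loading_script: bool, viewer_is_maintainer: bool) -> str:
    """Hypothetical helper: append advice only when it applies to the dataset and the reader."""
    if not has_loading_script:
        # No loading script: mentioning one would be misleading, so keep the bare summary.
        return summary
    if viewer_is_maintainer:
        # The maintainer can act on the advice directly.
        return f"{summary} Please fix your loading script."
    # Other users cannot edit the dataset; point them to the maintainer instead.
    return f"{summary} Please open a discussion so that the maintainer can fix the loading script."
```

For example, `build_error_message("Cannot get the split names for the dataset.", False, True)` returns the summary unchanged, since there is no loading script to fix.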

severo · 2023-02-10T12:34:07Z

chart/static-files/openapi.json

            "value": "SplitsResponseNotReadyError"
          },
          "SplitsNamesError": {
-            "summary": "Cannot get the split names for the dataset.",
+            "summary": "Cannot get the split names for the dataset. Please fix your loading script.",


btw, is it possible to get this error even if the dataset does not contain a loading script?

severo · 2023-02-10T12:34:46Z

chart/static-files/openapi.json

@@ -196,19 +196,19 @@
            "value": "FirstRowsResponseNotReady"
          },
          "InfoError": {
-            "summary": "The info cannot be fetched for the dataset config.",
+            "summary": "The info cannot be fetched for the config of the dataset. Please fix your loading script.",