Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve the error messages #404

Closed
severo opened this issue Jun 21, 2022 · 11 comments
Closed

Improve the error messages #404

severo opened this issue Jun 21, 2022 · 11 comments
Milestone

Comments

@severo
Copy link
Collaborator

severo commented Jun 21, 2022

In a lot of cases, when the dataset viewer has an error, the error message is not clear at all, or exposes internals of the project which are not important for the user, etc.

We should aim at providing information tailored for the Hub user:

  • the error comes from the repo: what can they do to fix the error?
  • the error comes from the server:
    • it's normal, just wait ... minutes before trying again
    • it's not normal, report here: ...
@severo
Copy link
Collaborator Author

severo commented Jun 21, 2022

#401 (comment)

I create an empty dataset: https://huggingface.co/datasets/severo/fix-401

Immediate feedback:

Status code:   400
Exception:     Status400Error
Message:       The dataset cache is empty.

174641694-0f91b220-95b1-40bc-bae9-766e63f266a4

Here: we should show "In progress"

@severo severo added this to the Public API milestone Jun 21, 2022
@sashavor
Copy link

Or having some updates ("processing train split") would be useful, maybe?
Also, is it possible to only show the dataset preview when the whole dataset is processed?

@severo
Copy link
Collaborator Author

severo commented Jun 21, 2022

Or having some updates ("processing train split") would be useful, maybe?

It would be better indeed.

Also, is it possible to only show the dataset preview when the whole dataset is processed?

Also a possibility, even if I think it's better to always give some hint to the user about what is occurring.

Maybe you have an opinion on that @gary149?

@severo
Copy link
Collaborator Author

severo commented Jun 29, 2022

Also: make it clearer when to open an issue on datasets and when to open a discussion:

huggingface/datasets#4592 (comment)

Maybe recommend always opening a discussion, wdyt @julien-c @huggingface/datasets @gary149 ?

@lhoestq
Copy link
Member

lhoestq commented Jun 29, 2022

Maybe recommend always opening a discussion, wdyt @julien-c https://github.com/orgs/huggingface/teams/datasets @gary149 ?

+1, and recommend to tag some of us (until there is a way to tag a whole team)

@albertvillanova
Copy link
Member

albertvillanova commented Jun 29, 2022

Yes, I also think it is better using the Hub discussions for Hub datasets.
But we need to be notified so that we don't miss the help request.
For the moment, in our docs we recommend pinging each and everyone of the datasets maintainers: https://huggingface.co/docs/datasets/master/en/share#ask-for-a-help-and-reviews

But personally, I would prefer a mention to the team: users have to remember/use all of our usernames; moreover, maintainers can change in the future (when people join/leave the team)...
This is a low priority feature enhancement though: https://huggingface.slack.com/archives/C03FF91U666/p1656076211192529

@severo
Copy link
Collaborator Author

severo commented Jun 29, 2022

OK. I think it can be managed using direct mentions for now, and improve it later when we will be able to tag teams.

Thus: the idea would be to separate errors with the dataset (4XX) in which case the message on moon-landing will be to open a discussion, from errors with the server (5XX) in which case the message will be to refresh or open an issue if it lasts too long (on datasets for now, on datasets-server later when it will be open sourced).

Another way to detect if we need to return 4XX or 5XX: the client should retry if 5XX (server error, response still being processed). On the contrary, if the error is 4XX (dataset error) the solution is to fix the dataset in some way.

@severo
Copy link
Collaborator Author

severo commented Jul 18, 2022

See also huggingface/datasets#4697 (comment)

(Sorry I'm a bit confused since it's quite hard to debug using the single error message from the data preview 😅 )

@severo
Copy link
Collaborator Author

severo commented Jul 18, 2022

See also huggingface/datasets#4680 (comment)

Thanks I found that this subset wasn't properly defined the the config, I fixed it. Now I can see the subsets but I get this error for the viewer
...

@severo
Copy link
Collaborator Author

severo commented Jul 20, 2022

Or having some updates ("processing train split") would be useful, maybe?

I worked on it in #442 / #443. I don't manage states, but simply ensure the client is aware that the response is still being processed (error code 500).
It's part of a small refactoring of the server, and it will be implemented on the Hub later.

@severo
Copy link
Collaborator Author

severo commented Jul 20, 2022

Here, we will just ensure:

  • the errors are correctly associated with HTTP status code 400 or 500.
  • the error messages are understandable

And the dataset viewer (on the Hub) will be in charge of suggesting to:

  • retry, or open an issue if the error lasts for too long, in case of 500 error (as is done currently: "Need help to make the dataset viewer work? Open an issue for direct support.")
  • open a discussion, if the error is 400

severo added a commit that referenced this issue Jul 20, 2022
@severo severo closed this as completed in 01521ec Jul 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants