Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added DataError for irrecoverable data interface initialization errors #2041

Merged
merged 6 commits into from
Oct 28, 2017

Conversation

philippjfr
Copy link
Member

We really need to do something about Interface data errors, particularly those that can provide clear and unambiguous exception message when the data is definitely incorrectly defined. In this PR I introduce a DataError, which prevents the Interface from trying to fall back to other data types and therefore allows raising a proper exception message. This approach is much simpler than what I outlined in #1986 and makes the process of raising specific exceptions quite simple.

Addresses:

#1762
#1867

@philippjfr
Copy link
Member Author

@jlstevens this is one PR that does need your review. If you agree this is a decent approach and can't think of any more errors to improve this is ready to merge.

@@ -162,6 +166,8 @@ def initialize(cls, eltype, data, kdims, vdims, datatype=None):
try:
(data, dims, extra_kws) = interface.init(eltype, data, kdims, vdims)
break
except DataError:
raise
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isn't something similar needed in the try/except where all the data interfaces are tried, one by one?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm confused, that's exactly where this is.

@jlstevens
Copy link
Contributor

jlstevens commented Oct 28, 2017

I agree this is important issue to address and adding a custom exception like this makes a lot of sense to me. Just some ideas about possible names:

  • DataError - what you used. Short but a little ambiguous.
  • UserDataError - to make it clear it was supplied to HoloViews?
  • MalformedDataError - bit too long!
  • InterfaceError - might be ok and less generic than DataError.

I think next to DataError, InterfaceError seems like the next possibility. One other idea is that you could automatically customize this exception to autogenerate part of the exception message e.g you could supply the interface class itself to 'InterfaceError' constructor and this could call a (class)method that would augment the supplied message with as much helpful information as possible e.g to make sure the library/datatype is mentioned, maybe point to the docs somewhere etc.

@philippjfr
Copy link
Member Author

InterfaceError seems significantly more generic than DataError to me since users don't necessarily know what an Interface even is. UserDataError seems fine, although the exception messages generally already make it clear that it is a user error.

One other idea is that you could automatically customize this exception to autogenerate part of the exception message e.g you could supply the interface class itself to 'InterfaceError' constructor and this could call a (class)method that would augment the supplied message with as much helpful information as possible e.g to make sure the library/datatype is mentioned, maybe point to the docs somewhere etc.

This doesn't make much sense to me, if you look where this is mostly used it's in cases where a particular interface identifies a specific error condition and can raise it directly. Adding extra indirection makes little sense when the original place where the error is raised already has all that information.

Might be good if @jbednar chimes in here as well.

@jlstevens
Copy link
Contributor

DataError is fine and the the idea about augmenting the exception message was only a suggestion to consider. Seems fine to me!

@philippjfr
Copy link
Member Author

Maybe I didn't consider your proposal correctly. I wouldn't mind if the DataError automatically adds a bit about the interfaces and adds a bit that points to the docs section that describes the allowed data types, e.g. to Gridded Data, Tabular Data or Geometry Data (once that user guide exists).

@philippjfr
Copy link
Member Author

In other words whatever raises the error provides a specific error message and the error then adds a more generic bit about data interfaces and points to the docs.

@philippjfr
Copy link
Member Author

philippjfr commented Oct 28, 2017

To make that a bit more concrete, it might look something like this:

Key dimension values and value array 'Fluorescence' shapes do not match. Expected shape (10, 11), does not match actual shape: (10, 10). GridInterface is a gridded data type, for more information about valid data formats see holoviews.org/user_guide/Gridded_Data.html

The second half would be auto-generated based on the Interface type.

@jlstevens
Copy link
Contributor

jlstevens commented Oct 28, 2017

Right. That is the sort of thing I was think of!

Edit: Minor point but I would format it so the core message is separate from the extra help. i.e

Key dimension values and value array 'Fluorescence' shapes do not match. Expected shape (10, 11), does not match actual shape: (10, 10).

GridInterface is a gridded data type, for more information about valid data formats see holoviews.org/user_guide/Gridded_Data.html

@philippjfr
Copy link
Member Author

philippjfr commented Oct 28, 2017

Okay here are some samples from what I've implemented now:

DataError: Key dimension values and value array Fluorescence shapes do not match. Expected shape (61, 111, 50), actual shape: (62, 111, 50)

GridInterface expects gridded data, for more information on supported datatypes see http://holoviews.org/user_guide/Gridded_Datasets.html

and:

DataError: pandas DataFrame column names used as dimensions must be strings not integers.

PandasInterface expects tabular data, for more information on supported datatypes see http://holoviews.org/user_guide/Tabular_Datasets.html

and:

DataError: xarray Dataset must define coordinates for all defined kdims, [Dimension('x'), Dimension('y'), Dimension('z')] coordinates not found.

XArrayInterface expects gridded data, for more information on supported datatypes see http://holoviews.org/user_guide/Gridded_Datasets.html

and:

DataError: If xarray DataArray does not define a name an explicit vdim must be supplied.

XArrayInterface expects gridded data, for more information on supported datatypes see http://holoviews.org/user_guide/Gridded_Datasets.html

and:

DataError: MultiInterface subpaths must all have matching vdims.

MultiInterface expects a list of tabular data, for more information on supported datatypes see http://holoviews.org/user_guide/Tabular_Datasets.html

I'll update the last one once there is a user guide for geometry (path, contour, polygon) data.

@philippjfr
Copy link
Member Author

@jlstevens If the tests pass this is ready to merge. I think this is going to be a huge improvement, but I imagine we'll be adding more specific errors over time. We should collect uninformative "None of the available storage backends were able to support the supplied data format." somewhere and aim to ensure that a user always gets a more specific error.

@jlstevens
Copy link
Contributor

I think this is going to be a huge improvement

Agreed!

... but I imagine we'll be adding more specific errors over time. We should collect uninformative "None of the available storage backends were able to support the supplied data format." somewhere and aim to ensure that a user always gets a more specific error.

Also a good idea.

If you are happy with this, go ahead and merge once the tests pass.

@philippjfr
Copy link
Member Author

Great, tests are now passing. I'll go ahead and merge.

@philippjfr philippjfr merged commit 2a0e807 into master Oct 28, 2017
@philippjfr philippjfr deleted the dataset_error branch October 28, 2017 16:22
jbednar pushed a commit that referenced this pull request Oct 31, 2017
#2041)

* Added DataError for irrecoverable data interface init errors

* Added additional interface info to DataError

* Fixed DataArray vdim error

* Added DataError for missing xarray coords

* Fixed error for GriddedInterface shape mismatch

* Updated MultiInterface error tests
Copy link

This pull request has been automatically locked since there has not been any recent activity after it was closed. Please open a new issue for related bugs.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants