-
Notifications
You must be signed in to change notification settings - Fork 799
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: Make the schema validation error for non-existing params more informative #2568
ENH: Make the schema validation error for non-existing params more informative #2568
Conversation
1a84e1d
to
3e99a92
Compare
I rebased on master so that the checks are passing (the many force pushes are for black and flake8, sorry about the noise). I also fixed the formatting of the parameters in the error message. It required more code than expected to make an even table with names sorted alphabetically in columns, but I think that the output is much easier to read now. Instead of the unformated list of parameters:
The output is now a table of parameters:
I did some basic manual testing with all the encoding channels listed in https://altair-viz.github.io/user_guide/encoding.html#encoding-channels and it seems to work fine. Although I am not sure if there are edge cases, I hope that the fallback to the old method could guard again those, but let me know if you think of anything specific and I can try to add tests for that. |
Hi @joelostblom can you synchronise your branch, https://github.com/joelostblom/altair/tree/schema-validation-error, with the main repo and maybe add a code-snippet how to this can be tested easily using eg. colab. This will help the review process. Hopefully @jakevdp is still willing to do this. |
f1a0932
to
fdbb50c
Compare
@mattijn I rebased this on the latest main branch and also structured the code so that it is easier to review. Do you think you would be able to have a look at this? I think it could be quite useful to get feedback on in the RC. I have included the rationale and an example below: When a non-existing parameter name is used, I think it would be helpful to include the existing parameter names in the error message. Currently, when misspelling a parameter name like in the example below, it is not clear whether I made a typo or the parameter doesn't exist at all, so the immediate feedback of seeing the correct parameter names would be helpful. import altair as alt
import pandas as pd
source = pd.DataFrame({
'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})
alt.Chart(source).mark_bar().encode(
x=alt.X("a", scale=alt.Scale(padingOuter=0.5)),
y=alt.Y("b")
)
I also belive the error message text could be clarified a little. Especially for novices, I don't think it is immediately clear what it means that "Additional properties are not allowed". This PR updates the error message to instead read:
It is possible that we could even remove the two first lines and start with This PR also includes a fallback to the old method of displaying error messages, so that in case the assumption in the if-statement does not hold true, things will not break and the user will just see a less helpful error message. Here is an example of when that is happening: import altair as alt
from vega_datasets import data
source = data.cars()
alt.Chart(source).mark_boxplot(
ticks={'thickness': 2},
median={'stroke': 'black', 'strokeWidth': 2},
size=50
).encode(
x=alt.X('Miles_per_Gallon:Q', scale=alt.Scale(zero=False)),
y=alt.Y('Origin', scale=alt.Scale(zero=False)),
color=alt.Color('Origin'),
) |
This looks super helpful! Haven't reviewed the code but just a quick reminder that the changes need to be moved to |
My original approach sometimes returned the name of an existing parameter. This commit uses the same approach as the fallback but extracts just the parameter name from the message string.
51afcd7
to
9bd738b
Compare
Thanks @binste, updated! |
Just a note that it might make sense to merge this PR after #2842 and test it again. I'd expect that your changes work in even more cases as the correct |
tools/schemapi/schemapi.py
Outdated
""".format( | ||
schema_path, self.validator, message | ||
) | ||
if hasattr(vegalite, schema_path.split(".")[-1]): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@binste I noticed an issue here after #2842 was merged. Previously, error messages would include a reference to the class where the error was raised, as in this case with the Y channel:
altair.vegalite.v5.schema.channels.Y, validating 'additionalProperties'
After #2842 the first part of that error message changes to no longer point to the particular class where the error was encountered, but just the chart in general:
altair.vegalite.v5.api.Chart->0, validating 'additionalProperties'
In addition to being less informative in general, this becomes an issue here because this PR was relying on that first part of the error message to find all the correct parameters names that the Y channel accept. There might be ways to work around that in this PR, but I wanted to ask you first if you think it is possible to restore the more precise error message that points to the class where the error was encountered or if that information is lost with the new approach to raising errors.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To add some more info, previously with deep validation, what I think was happening was that the iterative finding of the error meant that the class of the SchemaBase (I think) would change in each iteration to reflect which part of the spec was checked for errors. For example if I print out type(self)
of the self
that is passed to the SchemaValidationError
it would look like this with deep validation (for a spec with a typo in a param in the Y channel):
<class 'altair.vegalite.v5.api.Chart'>
<class 'altair.vegalite.v5.schema.channels.Y'>
Without deep validation, all the errors are found without iteration and therefor the self
param remains as the chart itself without changing to the Y channel where the error is occurring:
<class 'altair.vegalite.v5.api.Chart'>
I tried to look briefly if there was a way to not only return all the error messages, but also all the classes so that we could still find out that in this case it was the Y class that the error pertained to.
I looked briefly and it seems like some of this information is stored in the absolute_path
attribute of the self
object inside the SchemaValidationError
class, and it might be possible to reconstruct it from there, but I couldn't find the full class path anywhere and I wanted to hear your thoughts before going down that path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the detailed write-up! Indeed, I agree that it would be more informative if that original behaviour is restored and of course helpful for your proposed changes here, which look very helpful.
I think absolute_path
is the correct place to take this from. From that name, we can try to recreate the correct lass name. We can maybe use the usual conversion to a Python object name that is used in the propertysetter https://github.com/altair-viz/altair/blob/master/altair/utils/schemapi.py#L680 and also in the code generation https://github.com/altair-viz/altair/blob/master/tools/generate_schema_wrapper.py#L454:
classname = prop[0].upper() + prop[1:]
Places to modify would be in SchemaValidationError.__str__
instead of cls = self.obj.__class__
we could do something like
class_name_lower = self.absolute_path[-1]
# TODO: Check if this always works. Would work for `xOffset` -> `XOffset`
class_name = f"{class_name_lower}"[0].upper() + f"{class_name_lower}"[1:]
cls = getattr(vegalite, class_name)
cls
and before that in SchemaValidationError.__init__
:
self.absolute_path = err.absolute_path
I haven't tested this yet but most classes should be importable from the top-level like this right? For others we could still fall back on self.obj
(the self
that is passed into SchemaValidationError
).
If I find more time in the next days I can make a PR with this to restore the original behaviour of showing the class in which the error occured. I could then also review this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #2883
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you! I merged in your PR and this seems to be working as intended now. There are a few failing tests which needs the error message updated; I think I can get to those on Friday or Sunday this week. There seems to be some larger error also where my solution here still needs some change to work with the new multiple returned error messages.
…they work nicely with the cmopressed error message format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@binste I have updated this PR so that it works with the new error handling and also only shows the names of the existing parameters when an unknown parameter was specified and not for other types of errors. The tests you added previously was very helpful so thanks for that!
Redundant info removed
I have also removed some redundant info from the error message that I don't think is helpful for the user. As an example, take the following spec:
import altair as alt
import pandas as pd
source = pd.DataFrame({
'a': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I'],
'b': [28, 55, 43, 91, 81, 53, 19, 87, 52]
})
alt.Chart(source).mark_bar().encode(
x=alt.X("a", scale=alt.Scale(padingOuter=0.5)),
y=alt.Y("b")
)
It now outputs an error that get straight to the point of what is incorrect:
SchemaValidationError: `Scale` has no parameter named 'padingOuter'
Existing parameter names are:
align domain interpolate range round
base domainMax nice rangeMax scheme
bins domainMid padding rangeMin type
clamp domainMin paddingInner reverse zero
constant exponent paddingOuter
See the help for `Scale` to read the full description of these parameters
Previously it included "invalid specification" and a mention of attributes in the json schema, neither of which I think adds any info that makes it easier to troubleshoot this error, just more text to read:
SchemaValidationError: Invalid specification
altair.vegalite.v5.schema.core.Scale, validating 'additionalProperties'
altair.Scale has no parameter named 'padingOuter'
Existing parameter names are:
align domain interpolate range round
base domainMax nice rangeMax scheme
bins domainMid padding rangeMin type
clamp domainMin paddingInner reverse zero
constant exponent paddingOuter
See the help for altair.Scale to read the full description of these parameters
Summary line added
In addition to removing some redundant info, I have added a summary line when an invalid value is passed to a correct parameter name as can be seen in the following spec:
source = data.cars()
(
alt.Chart(source)
.mark_text(align="right")
.encode(alt.Text("Horsepower:N", title=dict(text="Horsepower", align="right")))
)
which now outputs this error:
SchemaValidationError: '{'text': 'Horsepower', 'align': 'right'}' is an invalid value for `title`:
{'text': 'Horsepower', 'align': 'right'} is not of type 'string'
{'text': 'Horsepower', 'align': 'right'} is not of type 'array'
instead of this more verbose, but less informative error:
SchemaValidationError: Invalid specification
altair.vegalite.v5.schema.core.TitleParams, validating 'type'
{'text': 'Horsepower', 'align': 'right'} is not of type 'string'
{'text': 'Horsepower', 'align': 'right'} is not of type 'array'
I find that this is also helpful when there are multiple errors spit out as in the example above and this example (I know you were also thinking about another solution here @binste so let me know if you prefer that instead):
alt.Chart(data.barley()).mark_bar().encode(
x=alt.X('variety'),
y=alt.Y('sum(yield)', stack='asdf'),
)
which now shows the error:
SchemaValidationError: 'asdf' is an invalid value for `stack`:
'asdf' is not one of ['zero', 'center', 'normalize']
'asdf' is not of type 'null'
'asdf' is not of type 'boolean'
instead of:
SchemaValidationError: Invalid specification
altair.vegalite.v5.api.Chart->0, validating 'enum'
'asdf' is not one of ['zero', 'center', 'normalize']
'asdf' is not of type 'null'
'asdf' is not of type 'boolean'
Invalid encoding channels
For specs that include an invalid encoding channel, there is no summary line but the error message is still shortened to remove redundant info:
selection = alt.selection_point()
(
alt.Chart(data.barley())
.mark_circle()
.add_params(selection)
.encode(
color=alt.condition(selection, alt.value("red"), alt.value("green")),
invalidChannel=None,
)
)
which now shows:
SchemaValidationError: Additional properties are not allowed ('invalidChannel' was unexpected)
instead of:
SchemaValidationError: Invalid specification
altair.vegalite.v5.schema.core.Encoding->encoding, validating 'additionalProperties'
Additional properties are not allowed ('invalidChannel' was unexpected)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like the proposed changes to the error messages! They are much more succinct and informative and the tables are awesome 🚀 Thank you also for the detailed comment, this made the review much easier for me. I only have some small suggestions.
…ams to trigger the parameter table Co-authored-by: Stefan Binder <binder_stefan@outlook.com>
5623808
to
b994959
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The changes in test_schemapi.py
need to be moved into the respective tools
script. This happens to me way too often! I opened #2906 with an idea on how we can test this.
Good catch, thank you! I have updated to added the changes to the tools script as well |
I think it is a really nice PR. Thanks @joelostblom. Really helpful, nice that it gives suggestions. It makes me want to make mistakes on purpose to read the suggestions.. I found one spec that gives an incorrect hint: import altair as alt
from vega_datasets import data
cars = data.cars.url
alt.Chart(cars).mark_point().encode(
x='Acceleration:Q',
y='Horsepower:Q',
color=alt.value(1) # should be eg. alt.value('red')
) SchemaValidationError: `Color` has no parameter named 'value'
Existing parameter names are:
shorthand bin legend timeUnit
aggregate condition scale title
bandPosition field sort type
See the help for `Color` to read the full description of these parameters It is allowed to use Also the context description is based on the jsonschema. So I'm still linking this comment #2842 (comment) here as well. alt.Chart(data.barley()).mark_bar().encode(
x=alt.X('variety'),
y=alt.Y('sum(yield)', stack='null'), # should be eg. stack=None
) SchemaValidationError: 'null' is an invalid value for `stack`:
'null' is not one of ['zero', 'center', 'normalize']
'null' is not of type 'null'
'null' is not of type 'boolean' I tried to follow the suggestion, but it was not directly clear that No further comments. Congrats again on this PR! |
Thank you @mattijn 😄 I will go ahead and merge this since you approved and the change that @binste suggested has also been implemented. We can follow up on the remaining two issues separately. I noticed what you mentioned regarding alt.datum/value and it was introduced before this PR so I opened #2913 to track it. I agree that it would be helpful to show the Python types as well and opened #2914 for that. |
@jakevdp What do you think of something like this as a solution to #2565?
I am still working on organizing the printed parameters in neatly formatted columns so that they are easier to read, but I thought I would check in with you whether you think this solution is robust enough. It has been working fine in my limited testing and I kept the old error message as a fallback to be safe.
close #2565