Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Incorrect error message when an unknown param value is used #2744

Closed
joelostblom opened this issue Dec 5, 2022 · 9 comments · Fixed by #2842
Closed

Incorrect error message when an unknown param value is used #2744

joelostblom opened this issue Dec 5, 2022 · 9 comments · Fixed by #2842
Labels

Comments

@joelostblom
Copy link
Contributor

joelostblom commented Dec 5, 2022

I just noted that in a spec that contains a condition, using an unknown parameter value anywhere in the spec will always return an error pointing to the condition even when it has nothing to do with actual error. For example, the following spec:

import altair as alt
from vega_datasets import data

source = data.barley()

alt.Chart(source).mark_bar().encode(
    x='variety',
    y=alt.Y('sum(yield)', stack='asdf'),
    opacity=alt.condition('datum.yield > 0', alt.value(1), alt.value(0.2))
)

returns:

SchemaValidationError: Invalid specification

        altair.vegalite.v4.schema.channels.OpacityValue, validating 'additionalProperties'

        Additional properties are not allowed ('test' was unexpected)

If we change to opacity=alt.value(0.6) we get the expected output:

SchemaValidationError: Invalid specification

        altair.vegalite.v4.schema.channels.Y->0, validating 'enum'

        '' is not one of ['zero', 'center', 'normalize']

Note that there is nothing wrong the opacity condition here and removing the incorrect stack value leads to zero errors. Looking closer at the error raising sequence, it seems like the stack error is always raised first, but if there is a conditional in the spec there will also be a second error raised. Not sure how raising one error leads to the other or why the sequence doesn't stop at the first raised error.

It seems like {'test': 'datum.yield > 0', 'value': 1} is evaluating against a schema that only contains the condition key, so it should have been {'condition': {'test': 'datum.yield > 0', 'value': 1}} instead and then {'test': 'datum.yield > 0', 'value': 1} in the next validation.

@joelostblom joelostblom added the bug label Dec 5, 2022
@joelostblom
Copy link
Contributor Author

This is not unique to condition, it also happens for concat charts. In the example below the error is with the title data type, but this gets swallowed and leads to another error being raised:

import altair as alt
from vega_datasets import data

source = data.cars()

points = alt.Chart(source).mark_point().encode(
    x='Horsepower',
    y='Miles_per_Gallon',
)

text = alt.Chart(source).mark_text(align='right').encode(
    alt.Text('Horsepower:N', title=dict(text='Horsepower', align='right'))
)

points | text

The real error is the first one printed below, but the one that actually gets raised is the last one which is really confusing

image

If we do faceting instead of concating, the real error is raised correctly, but note that Altair is still taking three rounds of looping through the validation and finding error messages for some reason (I am not sure why it doesn't stop when it encounters the first error):

image

@joelostblom joelostblom changed the title Incorrect error message when an unknown param value is used in a spec that contains a condition Incorrect error message when an unknown param value is used Jan 6, 2023
@joelostblom
Copy link
Contributor Author

joelostblom commented Jan 11, 2023

Making some more incremental progress here, it seems like concatenated specs with data and any spec with a condition are just not working with deep validation in general even when the spec is perfectly valid (not sure if this is a regression or they just never worked).

Spec with a condition:

alt.Chart().mark_bar().encode(
    opacity=alt.condition('', alt.value(1), alt.value(0.2))
).to_dict(validate='deep')  # Works without `validate='deep'`
ValidationError: Additional properties are not allowed ('test' was unexpected)

Any concat or layer spec with data or a data url:

# Works without `validate='deep'` or with empty data
alt.hconcat(alt.Chart('some_data').mark_point()).to_dict(validate='deep')
ValidationError: 'data' is a required property

@joelostblom
Copy link
Contributor Author

It seems like for concatenated specs, the issue is that the rootschema only allows marks to be defined together with a data key in the same dictionary as in a regular single chart:

{'data': {'url': 'some_data_or_url'}, 'mark': 'point'}

However, for a layered or concatenated chart, the specification looks slightly different:

{'layer': [{'mark': 'point'}], 'data': {'url': 'some_data_or_url'}}

This results in the the part that says {'mark': 'point'} being evaluated on it's own without the data part and the error 'data' is a required property is raised.

@ChristopherDavisUCI
Copy link
Contributor

Hi @joelostblom, as an experiment, do you like the error messages better if you comment out this line:
https://github.com/altair-viz/altair/blob/af756e9a9549b2bf0f88685d2f93e6224665bf18/altair/vegalite/v5/api.py#L568

(That doesn't explain what's wrong with validate="deep", but at first glance the error messages seem nicer this way. Very possible the error messages are nicer in a few examples and worse in many others.)

@joelostblom
Copy link
Contributor Author

I tried that and I agree that it is better for the scenarios that currently don't work with 'deep'. However, ideally we would make everything work with 'deep' because it does make the messages more informative, e.g. this spec:

alt.Chart(data.barley()).mark_bar().encode(
    x='variety',
    y=alt.Y('sum(yield)', stack='asdf'),
)

gives the following with 'deep':

'asdf' is not one of ['zero', 'center', 'normalize']

But if we remove it, the error message would just say:

{'aggregate': 'sum', 'field': 'yield', 'stack': 'asdf', 'type': 'quantitative'} is not valid under any of the given schemas

I will look a bit more at this now

@ChristopherDavisUCI
Copy link
Contributor

Hi @joelostblom, I played around a little more and my best guess is that the first two examples at the top of this issue are related to two separate issues. This comment is only about the first example.

If you define a Chart c using the following code (which is valid as far as I can tell)

import altair as alt

cond = alt.condition('true', alt.value(100), alt.value(10))

c = alt.Chart().mark_circle().encode(
    size=cond
)

and then compute

c.encoding.size

the result is

SizeValue({
  condition: SizeValue({
    test: 'true',
    value: 100
  }),
  value: 10
})

I believe the issue is that (for this particular Chart) Altair is wrapping too deep (notice how SizeValue occurs twice). If we use validate="deep" on this object, an error is raised because {"test": "true", "value": 100} is not a valid alt.SizeValue. This I think explains the error message 'test' was unexpected that we saw in your first example above.

This particular error can be solved by removing these lines, which wrap the condition: https://github.com/altair-viz/altair/blob/6a3db4d9d8685437953237d9b7a8d8f3c99dfb3e/altair/utils/core.py#L672-L674
but then examples using more complicated conditions break (such as this interactive rectangular brush example).

I'm optimistic this particular bug can be solved by adding some more logic to the lines I linked above, but I haven't tried yet.

I don't think this is related to your concatenated charts example (how could it be, since there is no condition in that example).

@joelostblom
Copy link
Contributor Author

That's great! Thank you for looking into that in more detail Chris! Feel free to play around with that logic if you wanted. I am mainly working to solve the concat/layer issue right now and I think I have figured out that it is due to the top level data not being taking into account when evaluating the individual layers/concats, so I am trying to change the logic in one of the other files so that each layered/concated spec is evaluated in the context of the top level data.

@davidgilbertson
Copy link

+1 I'm a newbie and I'm seeing this red herring 'data' is a required property a lot, it makes for quite a frustrating experience, would be wonderful to be able to see the real error.

@mattijn
Copy link
Contributor

mattijn commented Feb 1, 2023

Thanks for sharing your frustrations.
Please keep doing this, these reported user experiences will help making the library better, maybe not directly, but in the medium long term it will.

The feedback mechanism on providing the correct error back to the user is, maybe surprising, more complex than it seems, luckily there is a great proposal in line that aims to improve this significantly. See PR: #2842.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants