Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ENH: Make the schema validation error for non-existing params more informative #2568

Merged
merged 27 commits into from
Feb 21, 2023
Merged
Show file tree
Hide file tree
Changes from 21 commits
Commits
Show all changes
27 commits
Select commit Hold shift + click to select a range
555a6c2
Make the schema validation error for non-existing params more informa…
joelostblom Mar 22, 2022
430b17b
Neatly format the existing params as a table
joelostblom Mar 22, 2022
45f11b6
Use more reliable way of returning the non-existing param name
joelostblom Apr 19, 2022
95b1c60
Break out table formatting into its own function
joelostblom Jan 6, 2023
9bd738b
Add changes into code generation file
joelostblom Jan 8, 2023
4eeac63
Remove mistakingly added old lines
joelostblom Jan 8, 2023
8dbf794
Merge branch 'master' into schema-validation-error
joelostblom Feb 5, 2023
3a41601
Merge branch 'altair-viz:master' into schema-validation-error
joelostblom Feb 16, 2023
f994245
Only show existing parameters if an unknown parameter was used
joelostblom Feb 18, 2023
bee7b8b
Update tests to check for detailed parameter errors
joelostblom Feb 18, 2023
4395f58
Trim error messages to be more to the point by removing unhelpful info
joelostblom Feb 18, 2023
8848228
Add a general heading to error messages with multiple errors so that …
joelostblom Feb 18, 2023
1ce2bcf
Remove "self" from listed params
joelostblom Feb 18, 2023
31948e8
Blake format and use cleandoc for proper indendation in the source
joelostblom Feb 18, 2023
750bfd1
Update old test message to remove what is no longer in the error message
joelostblom Feb 18, 2023
946ca90
Merge branch 'master' into schema-validation-error
joelostblom Feb 18, 2023
8e9e7c7
Move invalid y option tests next to each other
joelostblom Feb 18, 2023
7a89d76
Black format
joelostblom Feb 18, 2023
01c517f
Update test to account for missing self
joelostblom Feb 18, 2023
f9b4f3b
Refine the message construction and move it to the else clause where …
joelostblom Feb 18, 2023
3718432
Remove flake8 test for multline line strings with trailing whitespace
joelostblom Feb 18, 2023
3022f8c
Add latest updates to source generator file as well
joelostblom Feb 18, 2023
33296b0
Merge branch 'master' into schema-validation-error
joelostblom Feb 19, 2023
b994959
Remove redundant schema_path variable and allow unknown encodings par…
joelostblom Feb 19, 2023
afd614b
Remove redundant assert
joelostblom Feb 19, 2023
4e8bac5
Add latest updates to source generator file as well
joelostblom Feb 19, 2023
187c817
Include changes to tests in tools code generation file
joelostblom Feb 20, 2023
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
115 changes: 104 additions & 11 deletions altair/utils/schemapi.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,6 +6,7 @@
import json
import textwrap
from typing import Any, Sequence, List
from itertools import zip_longest

import jsonschema
import jsonschema.exceptions
Expand All @@ -15,6 +16,7 @@

from altair import vegalite


# If DEBUG_MODE is True, then schema objects are converted to dict and
# validated at creation time. This slows things down, particularly for
# larger specs, but leads to much more useful tracebacks for the user.
Expand Down Expand Up @@ -158,6 +160,63 @@ def __init__(self, obj, err):
self.obj = obj
self._additional_errors = getattr(err, "_additional_errors", [])

@staticmethod
def _format_params_as_table(param_dict_keys):
"""Format param names into a table so that they are easier to read"""
param_names, name_lengths = zip(
*[
(name, len(name))
for name in param_dict_keys
if name not in ["kwds", "self"]
]
)
# Worst case scenario with the same longest param name in the same
# row for all columns
max_name_length = max(name_lengths)
max_column_width = 80
# Output a square table if not too big (since it is easier to read)
num_param_names = len(param_names)
square_columns = int(np.ceil(num_param_names**0.5))
columns = min(max_column_width // max_name_length, square_columns)

# Compute roughly equal column heights to evenly divide the param names
def split_into_equal_parts(n, p):
return [n // p + 1] * (n % p) + [n // p] * (p - n % p)

column_heights = split_into_equal_parts(num_param_names, columns)

# Section the param names into columns and compute their widths
param_names_columns = []
column_max_widths = []
last_end_idx = 0
for ch in column_heights:
param_names_columns.append(param_names[last_end_idx : last_end_idx + ch])
column_max_widths.append(
max([len(param_name) for param_name in param_names_columns[-1]])
)
last_end_idx = ch + last_end_idx

# Transpose the param name columns into rows to facilitate looping
param_names_rows = []
for li in zip_longest(*param_names_columns, fillvalue=""):
param_names_rows.append(li)
# Build the table as a string by iterating over and formatting the rows
param_names_table = ""
for param_names_row in param_names_rows:
for num, param_name in enumerate(param_names_row):
# Set column width based on the longest param in the column
max_name_length_column = column_max_widths[num]
column_pad = 3
param_names_table += "{:<{}}".format(
param_name, max_name_length_column + column_pad
)
# Insert newlines and spacing after the last element in each row
if num == (len(param_names_row) - 1):
param_names_table += "\n"
# 16 is the indendation of the returned multiline string below
param_names_table += " " * 16
return param_names_table

def __str__(self):
# Try to get the lowest class possible in the chart hierarchy so
# it can be displayed in the error message. This should lead to more informative
Expand All @@ -181,19 +240,53 @@ def __str__(self):
for val in schema_path[:-1]
if val not in (0, "properties", "additionalProperties", "patternProperties")
)
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
message = self.message
if self._additional_errors:
message += "\n " + "\n ".join(
[e.message for e in self._additional_errors]
)
return """Invalid specification
joelostblom marked this conversation as resolved.
Show resolved Hide resolved

{}, validating {!r}
# Output all existing parameters when an unknown parameter is specified
if (
hasattr(vegalite, schema_path.split(".")[-1])
and self.validator == "additionalProperties"
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
):
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
altair_class = schema_path.split(".")[-1]
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
vegalite_core_class = getattr(vegalite, schema_path.split(".")[-1])
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
param_dict_keys = inspect.signature(vegalite_core_class).parameters.keys()
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
param_names_table = self._format_params_as_table(param_dict_keys)

# `cleandoc` removes multiline string indentation in the output
return inspect.cleandoc(
"""`{}` has no parameter named {!r}

Existing parameter names are:
{}
See the help for `{}` to read the full description of these parameters
""".format(
altair_class,
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
self.message.split("('")[-1].split("'")[0],
param_names_table,
altair_class,
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
)
)
# Use the default error message for all other cases than unknown parameter errors
else:
message = self.message
# Add a summary line when parameters are passed an invalid value
# For example: "'asdf' is an invalid value for `stack`
if hasattr(vegalite, schema_path.split(".")[-1]) and self.absolute_path:
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
# The indentation here must match that of `cleandoc` below
message = f"""'{self.instance}' is an invalid value for `{self.absolute_path[-1]}`:
joelostblom marked this conversation as resolved.
Show resolved Hide resolved

{message}"""

if self._additional_errors:
message += "\n " + "\n ".join(
[e.message for e in self._additional_errors]
)

{}
""".format(
schema_path, self.validator, message
)
return inspect.cleandoc(
"""{}
""".format(
message
)
)


class UndefinedType(object):
Expand Down
67 changes: 47 additions & 20 deletions tests/utils/tests/test_schemapi.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
# tools/generate_schema_wrapper.py. Do not modify directly.
import copy
import io
import inspect
import json
import jsonschema
import re
Expand Down Expand Up @@ -377,9 +378,7 @@ def test_schema_validation_error():
assert isinstance(the_err, SchemaValidationError)
message = str(the_err)

assert message.startswith("Invalid specification")
assert "test_schemapi.MySchema->a" in message
assert "validating {!r}".format(the_err.validator) in message
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
assert "4 is not of type 'string'" in message
assert the_err.message in message


Expand Down Expand Up @@ -467,36 +466,64 @@ def chart_example_invalid_y_option_value_with_condition():
[
(
chart_example_invalid_y_option,
r"schema.channels.X.*"
+ r"Additional properties are not allowed \('unknown' was unexpected\)",
inspect.cleandoc(
r"""`X` has no parameter named 'unknown'

Existing parameter names are:
shorthand bin scale timeUnit
aggregate field sort title
axis impute stack type
bandPosition

See the help for `X` to read the full description of these parameters""" # noqa: W291
),
),
(
chart_example_invalid_y_option_value,
r"schema.channels.Y.*"
+ r"'asdf' is not one of \['zero', 'center', 'normalize'\].*"
+ r"'asdf' is not of type 'null'.*'asdf' is not of type 'boolean'",
chart_example_layer,
inspect.cleandoc(
r"""`VConcatChart` has no parameter named 'width'

Existing parameter names are:
vconcat center description params title
autosize config name resolve transform
background data padding spacing usermeta
bounds datasets

See the help for `VConcatChart` to read the full description of these parameters""" # noqa: W291
),
),
(
chart_example_layer,
r"api.VConcatChart.*"
+ r"Additional properties are not allowed \('width' was unexpected\)",
chart_example_invalid_y_option_value,
inspect.cleandoc(
r"""'asdf' is an invalid value for `stack`:

'asdf' is not one of \['zero', 'center', 'normalize'\]
'asdf' is not of type 'null'
'asdf' is not of type 'boolean'"""
),
),
(
chart_example_invalid_y_option_value_with_condition,
r"schema.channels.Y.*"
+ r"'asdf' is not one of \['zero', 'center', 'normalize'\].*"
+ r"'asdf' is not of type 'null'.*'asdf' is not of type 'boolean'",
inspect.cleandoc(
r"""'asdf' is an invalid value for `stack`:

'asdf' is not one of \['zero', 'center', 'normalize'\]
'asdf' is not of type 'null'
'asdf' is not of type 'boolean'"""
),
),
(
chart_example_hconcat,
r"schema.core.TitleParams.*"
+ r"\{'text': 'Horsepower', 'align': 'right'\} is not of type 'string'.*"
+ r"\{'text': 'Horsepower', 'align': 'right'\} is not of type 'array'",
inspect.cleandoc(
r"""'{'text': 'Horsepower', 'align': 'right'}' is an invalid value for `title`:

{'text': 'Horsepower', 'align': 'right'} is not of type 'string'
{'text': 'Horsepower', 'align': 'right'} is not of type 'array'"""
),
),
(
chart_example_invalid_channel_and_condition,
r"schema.core.Encoding->encoding.*"
+ r"Additional properties are not allowed \('invalidChannel' was unexpected\)",
r"Additional properties are not allowed \('invalidChannel' was unexpected\)\n",
joelostblom marked this conversation as resolved.
Show resolved Hide resolved
),
],
)
Expand Down
96 changes: 89 additions & 7 deletions tools/schemapi/schemapi.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,6 +4,7 @@
import json
import textwrap
from typing import Any, Sequence, List
from itertools import zip_longest

import jsonschema
import jsonschema.exceptions
Expand All @@ -13,6 +14,7 @@

from altair import vegalite


# If DEBUG_MODE is True, then schema objects are converted to dict and
# validated at creation time. This slows things down, particularly for
# larger specs, but leads to much more useful tracebacks for the user.
Expand Down Expand Up @@ -156,6 +158,59 @@ def __init__(self, obj, err):
self.obj = obj
self._additional_errors = getattr(err, "_additional_errors", [])

@staticmethod
def _format_params_as_table(param_dict_keys):
"""Format param names into a table so that they are easier to read"""
param_names, name_lengths = zip(
*[(name, len(name)) for name in param_dict_keys if name != "kwds"]
)
# Worst case scenario with the same longest param name in the same
# row for all columns
max_name_length = max(name_lengths)
max_column_width = 80
# Output a square table if not too big (since it is easier to read)
num_param_names = len(param_names)
square_columns = int(np.ceil(num_param_names**0.5))
columns = min(max_column_width // max_name_length, square_columns)

# Compute roughly equal column heights to evenly divide the param names
def split_into_equal_parts(n, p):
return [n // p + 1] * (n % p) + [n // p] * (p - n % p)

column_heights = split_into_equal_parts(num_param_names, columns)

# Section the param names into columns and compute their widths
param_names_columns = []
column_max_widths = []
last_end_idx = 0
for ch in column_heights:
param_names_columns.append(param_names[last_end_idx : last_end_idx + ch])
column_max_widths.append(
max([len(param_name) for param_name in param_names_columns[-1]])
)
last_end_idx = ch + last_end_idx

# Transpose the param name columns into rows to facilitate looping
param_names_rows = []
for li in zip_longest(*param_names_columns, fillvalue=""):
param_names_rows.append(li)
# Build the table as a string by iterating over and formatting the rows
param_names_table = ""
for param_names_row in param_names_rows:
for num, param_name in enumerate(param_names_row):
# Set column width based on the longest param in the column
max_name_length_column = column_max_widths[num]
column_pad = 3
param_names_table += "{:<{}}".format(
param_name, max_name_length_column + column_pad
)
# Insert newlines and spacing after the last element in each row
if num == (len(param_names_row) - 1):
param_names_table += "\n"
# 16 is the indendation of the returned multiline string below
param_names_table += " " * 16
return param_names_table

def __str__(self):
# Try to get the lowest class possible in the chart hierarchy so
# it can be displayed in the error message. This should lead to more informative
Expand Down Expand Up @@ -184,14 +239,41 @@ def __str__(self):
message += "\n " + "\n ".join(
[e.message for e in self._additional_errors]
)
return """Invalid specification

{}, validating {!r}

{}
""".format(
schema_path, self.validator, message
)
if hasattr(vegalite, schema_path.split(".")[-1]):
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@binste I noticed an issue here after #2842 was merged. Previously, error messages would include a reference to the class where the error was raised, as in this case with the Y channel:

altair.vegalite.v5.schema.channels.Y, validating 'additionalProperties'

After #2842 the first part of that error message changes to no longer point to the particular class where the error was encountered, but just the chart in general:

altair.vegalite.v5.api.Chart->0, validating 'additionalProperties'

In addition to being less informative in general, this becomes an issue here because this PR was relying on that first part of the error message to find all the correct parameters names that the Y channel accept. There might be ways to work around that in this PR, but I wanted to ask you first if you think it is possible to restore the more precise error message that points to the class where the error was encountered or if that information is lost with the new approach to raising errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add some more info, previously with deep validation, what I think was happening was that the iterative finding of the error meant that the class of the SchemaBase (I think) would change in each iteration to reflect which part of the spec was checked for errors. For example if I print out type(self) of the self that is passed to the SchemaValidationError it would look like this with deep validation (for a spec with a typo in a param in the Y channel):

<class 'altair.vegalite.v5.api.Chart'>
<class 'altair.vegalite.v5.schema.channels.Y'>

Without deep validation, all the errors are found without iteration and therefor the self param remains as the chart itself without changing to the Y channel where the error is occurring:

<class 'altair.vegalite.v5.api.Chart'>

I tried to look briefly if there was a way to not only return all the error messages, but also all the classes so that we could still find out that in this case it was the Y class that the error pertained to.

I looked briefly and it seems like some of this information is stored in the absolute_path attribute of the self object inside the SchemaValidationError class, and it might be possible to reconstruct it from there, but I couldn't find the full class path anywhere and I wanted to hear your thoughts before going down that path.

Copy link
Contributor

@binste binste Feb 11, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the detailed write-up! Indeed, I agree that it would be more informative if that original behaviour is restored and of course helpful for your proposed changes here, which look very helpful.

I think absolute_path is the correct place to take this from. From that name, we can try to recreate the correct lass name. We can maybe use the usual conversion to a Python object name that is used in the propertysetter https://github.com/altair-viz/altair/blob/master/altair/utils/schemapi.py#L680 and also in the code generation https://github.com/altair-viz/altair/blob/master/tools/generate_schema_wrapper.py#L454:

classname = prop[0].upper() + prop[1:]

Places to modify would be in SchemaValidationError.__str__ instead of cls = self.obj.__class__ we could do something like

class_name_lower = self.absolute_path[-1]
# TODO: Check if this always works. Would work for `xOffset` -> `XOffset`
class_name = f"{class_name_lower}"[0].upper() + f"{class_name_lower}"[1:]
cls = getattr(vegalite, class_name)
cls

and before that in SchemaValidationError.__init__:

self.absolute_path = err.absolute_path

I haven't tested this yet but most classes should be importable from the top-level like this right? For others we could still fall back on self.obj (the self that is passed into SchemaValidationError).

If I find more time in the next days I can make a PR with this to restore the original behaviour of showing the class in which the error occured. I could then also review this PR.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #2883

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you! I merged in your PR and this seems to be working as intended now. There are a few failing tests which needs the error message updated; I think I can get to those on Friday or Sunday this week. There seems to be some larger error also where my solution here still needs some change to work with the new multiple returned error messages.

altair_class = "altair." + schema_path.split(".")[-1]
vegalite_core_class = getattr(vegalite, schema_path.split(".")[-1])
param_dict_keys = inspect.signature(vegalite_core_class).parameters.keys()
param_names_table = self._format_params_as_table(param_dict_keys)

# cleandoc removes multiline string indentation in the output
return inspect.cleandoc(
"""Invalid specification

{}, validating {!r}

{} has no parameter named {!r}

Existing parameter names are:
{}
See the help for {} to read the full description of these parameters
""".format(
schema_path,
self.validator,
altair_class,
message.split("('")[-1].split("'")[0],
param_names_table,
altair_class,
)
)
# Fall back on the less informative error message
else:
return """Invalid specification
{}, validating {!r}
{}
""".format(
schema_path, self.validator, message
)


class UndefinedType(object):
Expand Down