Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Strict Metaschema Validation #1101

Merged
merged 11 commits into from
Dec 6, 2021
Merged

Conversation

holmanb
Copy link
Member

@holmanb holmanb commented Nov 5, 2021

Proposed Commit Message

Improve schema validation.

This adds strict validation of config module definitions at testing
time, with plumbing included for future runtime validation. This
eliminates a class of bugs resulting from schemas that have definitions
that are incorrect, but get interpreted by jsonschema as
"additionalProperties" that are therefore ignored.


- Add strict meta-schema for jsonschema unit test validation
- Separate schema from module metadata structure
- Improve type annotations for various functions and data types

Cleanup:
- Remove unused jsonschema "required" elements 
- Eliminate manual memoization in schema.py:get_schema(),
        reference module.__doc__ directly

Test Steps

Many of the files were only touched for function name/arg/schema/meta updates, however schema.py, test_schema.py, test_cli.py had the most substantial changes.

Checklist:

  • My code follows the process laid out in the documentation
  • I have updated or added any unit tests accordingly
  • I have updated or added any documentation accordingly

@holmanb holmanb force-pushed the holmanb/metaschema branch 2 times, most recently from d4cdf05 to 0b93060 Compare November 16, 2021 19:52
@holmanb holmanb changed the title WIP: metaschema testing Add Strict Metaschema Validation Nov 18, 2021
@holmanb holmanb marked this pull request as ready for review November 18, 2021 22:45
doc/rtd/conf.py Outdated Show resolved Hide resolved
@blackboxsw
Copy link
Collaborator

Thanks for this @holmanb. I'm just starting the review amd will have some significant feedback tomorrow. In your suggested commit message could you capture the intent behind the strict_metaschema being a testing time validation of config module definitions rather than runtime validation.

cloudinit/config/schema.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@holmanb thanks for putting this together. A couple of minor requests inline and a general question for you about moving toward pytest to avoid all the stdout/error context manager work.

cloudinit/config/schema.py Show resolved Hide resolved
tests/unittests/test_cli.py Outdated Show resolved Hide resolved
tests/unittests/test_cli.py Outdated Show resolved Hide resolved
tests/unittests/test_cli.py Show resolved Hide resolved
tests/unittests/test_handler/test_schema.py Outdated Show resolved Hide resolved
Copy link
Member

@TheRealFalcon TheRealFalcon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few inline questions/comments, but overall looking good.

doc/rtd/conf.py Outdated Show resolved Hide resolved
doc/rtd/conf.py Outdated Show resolved Hide resolved
cloudinit/config/schema.py Outdated Show resolved Hide resolved
cloudinit/config/schema.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think do need to include property descriptions in docs. We might be able to drop the individual module documentation from the command-line, but we need to fixup the argparse setup for the devel schema subcommand.

cloudinit/config/schema.py Outdated Show resolved Hide resolved
cloudinit/config/schema.py Show resolved Hide resolved
@holmanb holmanb force-pushed the holmanb/metaschema branch 2 times, most recently from 1ab87a1 to 3779ff2 Compare November 24, 2021 15:33
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @holmanb any of these remaining comments are nits and I can +1 once you make any changes you think are relevant. Thank you for the restructure, I agree it is easier to read/maintain here and functionally works will with no degradation of existing behavior that I can see.

Approve pending any comments you decide to address and a passing CI run

cloudinit/config/schema.py Outdated Show resolved Hide resolved
cloudinit/config/schema.py Outdated Show resolved Hide resolved
tests/unittests/test_handler/test_schema.py Outdated Show resolved Hide resolved
tests/unittests/test_handler/test_schema.py Outdated Show resolved Hide resolved
cloudinit/config/schema.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Combined diff of my musings for your thoughts. I think we want to minimally make the changes to simplify the get_json_validator return value to a 2-tuple instead of 3-tuple. But other diffs are included in case you find them reasonable.

diff --git a/cloudinit/config/schema.py b/cloudinit/config/schema.py
index 61f8cd9d..0c46cbdc 100644
--- a/cloudinit/config/schema.py
+++ b/cloudinit/config/schema.py
@@ -100,40 +100,48 @@ def get_jsonschema_validator():
         # assignment-operation pylint warning which appears because this
         # code path isn't written against the latest jsonschema).
         types['string'] = (str, bytes)  # pylint: disable=E1137
+        # disable bottom-level keys in meta-schema
+        strict_metaschema = deepcopymeta_schema=Draft4Validator.META_SCHEMA)
+        strict_metaschema["additionalProperties"] = False
         cloudinitValidator = create(
-            meta_schema=Draft4Validator.META_SCHEMA,
+            meta_schema=strict_metaschema,
             validators=Draft4Validator.VALIDATORS,
             version="draft4",
             default_types=types)
-    return (cloudinitValidator, FormatChecker, None)
+    return (cloudinitValidator, FormatChecker)
 
 
-def validate_cloudconfig_metaschema(schema: dict, throw=True):
+def validate_cloudconfig_metaschema(
+    validator, schema: dict, throw: bool = True
+): 
     """Validate provided schema meets the metaschema definition. Return strict
     Validator and FormatChecker for use in validation
+    @param validator: Draft4Validator instance used to validate the schema
     @param schema: schema to validate
     @param throw: Sometimes the validator and checker are required, even if
-        the schema is invalid. Toggle for whether to raise SchemaErrors
-
-    @returns: Tuple(jsonschema.Validator, jsonschema.FormatChecker, err)
+        the schema is invalid. Toggle for whether to raise
+        SchemaValidationError orlog warnings.
 
     @raises: ImportError when jsonschema is not present
-    @raises: jsonschema.exceptions.SchemaError if the schema is invalid
+    @raises: SchemaValidationError when the schema is invalid
     """
 
     from jsonschema.exceptions import SchemaError
 
-    (cloudinitValidator, FormatChecker, _) = get_jsonschema_validator()
     try:
-
-        # disable bottom-level keys
-        cloudinitValidator.META_SCHEMA["additionalProperties"] = False
-        cloudinitValidator.check_schema(schema)
-    except SchemaError as e:
+        validator.check_schema(schema)
+    except SchemaError as err:
+        # Raise SchemaValidationError so we don't have jsonschema imports
+        # at call sites.
         if throw:
-            raise e
-        return (cloudinitValidator, FormatChecker, e)
-    return (cloudinitValidator, FormatChecker, None)
+            raise SchemaValidationError(
+              errors=(('.'.join([str(p) for p in err.path]), err.message),)
+            )
+        logging.warning(
+            "Meta-schema validation failed, attempting to validate config"
+            " anyway: %s",
+            err
+        )
 
 
 def validate_cloudconfig_schema(
@@ -154,20 +162,14 @@ def validate_cloudconfig_schema(
         against the provided schema.
     """
     try:
-        (cloudinitValidator, FormatChecker, err) = (
-            get_jsonschema_validator()
-            if not strict_metaschema
-            else validate_cloudconfig_metaschema(schema, throw=False)
-        )
+        (cloudinitValidator, FormatChecker) = get_jsonschema_validator()
+        if strict_metaschema:
+            validate_cloudconfig_metaschema(
+                cloudinitValidator, schema, throw=False
+            )
     except ImportError:
         logging.debug("Ignoring schema validation. jsonschema is not present")
         return
-    if err:
-        logging.warning(
-            "Meta-schema validation failed, attempting to validate config"
-            " anyway: %s",
-            err,
-        )
     validator = cloudinitValidator(schema, format_checker=FormatChecker())
     errors = ()
     for error in sorted(validator.iter_errors(config), key=lambda e: e.path):

cloudinit/config/schema.py Outdated Show resolved Hide resolved
cloudinit/config/schema.py Outdated Show resolved Hide resolved
cloudinit/config/schema.py Show resolved Hide resolved
cloudinit/config/schema.py Outdated Show resolved Hide resolved
cloudinit/config/schema.py Outdated Show resolved Hide resolved
@holmanb holmanb force-pushed the holmanb/metaschema branch from dfabeaa to c93fde0 Compare December 3, 2021 21:09
@holmanb
Copy link
Member Author

holmanb commented Dec 3, 2021

Combined diff of my musings for your thoughts. I think we want to minimally make the changes to simplify the get_json_validator return value to a 2-tuple instead of 3-tuple. But other diffs are included in case you find them reasonable.

Thanks for the help on this. I wasn't sure about the best approach to the failure paths with this one. Besides one item, I think these are all improvements.

diff --git a/cloudinit/config/schema.py b/cloudinit/config/schema.py
index 61f8cd9d..0c46cbdc 100644
--- a/cloudinit/config/schema.py
+++ b/cloudinit/config/schema.py
@@ -100,40 +100,48 @@ def get_jsonschema_validator():
         # assignment-operation pylint warning which appears because this
         # code path isn't written against the latest jsonschema).
         types['string'] = (str, bytes)  # pylint: disable=E1137
+        # disable bottom-level keys in meta-schema
+        strict_metaschema = deepcopymeta_schema=Draft4Validator.META_SCHEMA)
+        strict_metaschema["additionalProperties"] = False
         cloudinitValidator = create(
-            meta_schema=Draft4Validator.META_SCHEMA,
+            meta_schema=strict_metaschema,
             validators=Draft4Validator.VALIDATORS,
             version="draft4",
             default_types=types)
-    return (cloudinitValidator, FormatChecker, None)
+    return (cloudinitValidator, FormatChecker)

The problem with this is that get_jsonschema_validator() conditionally produces validator/formatchecker differently based on jsonschema version. The change proposed here only applies to the pre-3.0 jsonschema branch. I think we want both branches to be capable of strict validation. Assuming I don't see anything else in need of change I plan on applying the other proposals you made, in addition to a modified version of this one.

Thanks!

@blackboxsw blackboxsw self-assigned this Dec 3, 2021
@holmanb holmanb force-pushed the holmanb/metaschema branch from 4aea9ce to 7dfe838 Compare December 6, 2021 17:10
Copy link
Collaborator

@blackboxsw blackboxsw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for all the discussion and work here @holmanb +1. LGTM

@blackboxsw blackboxsw merged commit bedac77 into canonical:main Dec 6, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants