-
Notifications
You must be signed in to change notification settings - Fork 908
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Single JSON schema validation in early boot #1175
Single JSON schema validation in early boot #1175
Conversation
9a04f5b
to
2f75ba4
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good with some minor comments inline.
One other issue. I noticed tests/unittests/config/test_schema.py
isn't validating the cc_apt_pipelining examples. If I add something like apt_pipelining: ['abc', 1.5]
to the examples list, it still passes, though I'm not sure why.
cloudinit/config/schema.py
Outdated
os.path.join(paths.schema_dir, "cloud-init-schema-*") | ||
) | ||
full_schema = None | ||
for schema_file in schema_files: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we error or warning if we find more than one schema file? As written, this will loop through all of the schema files and use the last one found as the full_schema. Is there ever a reason to have more than one schema file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@TheRealFalcon what I was uncertain of when I first wrote this was whether we'd support plugin schemas in the future. I think we ended up having a discussion afterward that we would not look to support a parts-dir style composite schema. We would prefer downstream/private module owners to upstream their schema if they want schema validation coverage. So, I think it's safe to read one schema file only, certainly for this iteration. I'll drop the loop and hardcode the schema we read.
1fd4d48
to
aa319ee
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notes/comments for reviewers.
} | ||
|
||
__doc__ = get_meta_doc(meta, schema) | ||
__doc__ = get_meta_doc(meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_get_meta_doc now makes schema param optional and will construct docs based on sub-schema matching meta['id'] for the "new-style" cc_ modules.
@@ -10,7 +10,7 @@ | |||
|
|||
from cloudinit import log as logging | |||
from cloudinit import temp_utils, templater, util | |||
from cloudinit.config.schema import get_meta_doc, validate_cloudconfig_schema | |||
from cloudinit.config.schema import get_meta_doc |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Migrated schema modules will no longer define "schema" attribute or run validate_cloudconfig_schema since we run schema validation one-time against the full schema after the user-data is processed in main.py:480
@@ -593,7 +618,7 @@ def load_doc(requested_modules: list) -> str: | |||
for mod_name in all_modules: | |||
if "all" in requested_modules or mod_name in requested_modules: | |||
(mod_locs, _) = importer.find_module( | |||
mod_name, ["cloudinit.config"], ["schema"] | |||
mod_name, ["cloudinit.config"], ["meta"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New-style schema definitions in cc_* modules will only have a "meta" attribute, not a local schema.
cloudinit/config/schema.py
Outdated
) | ||
full_schema = None | ||
if len(schema_files) > 1: | ||
LOG.warning( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't want to break the world if we don't have proper schema definitions. A warning will suffice.
read_cfg_paths.return_value = paths | ||
# New-style schema $defs exist in config/cloud-init-schema*.json | ||
for schema_file in Path(cloud_init_project_dir("config/")).glob( | ||
"cloud-init-schema*.json" | ||
): | ||
util.copy(schema_file, paths.schema_dir) | ||
schema = get_schema() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may want to make a json_schema fixture at some point, but I didn't think to create that in this branch. This is a lot of boilerplate setup for us to pull into all 51 test_cc_* modules.
|
||
|
||
class TestAptPipelining(CiTestCase): | ||
class TestAptPipelining: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Generally want to move away for CiTestCase subclassese wherever possible to get the benefits of pytest features/fixtures/etc.
e3150cf
to
d5ae0d8
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly looks good, except we're missing the schema property descriptions from the docs.
That's the only major thing. Additionally:
For the parametrized tests, I would consider only testing for invalid schemas, and adding the valid schemas to the meta examples. These get tested via test_schema.py, and add some additional documentation for how to use the module. Not necessary for this PR, but something to consider.
Make sure you rebase if you haven't recently. I merged a new community provided keyboard module yesterday that included its own schema. It shouldn't cause any problems, but just make sure no unit tests are affected with the new structure.
Validate full schema one time early in boot just after user-data is processed. This avoid 51+ repeated calls to validate_cloudcfg_schema on subsections of the schema. Moving validation to one place against the whole schema will allow for a single concise error/warning about schema violations in one place in logs.
- Define single JSON schema file to allow for validation of full schema for provided user-data. - Package schema file in /etc/cloud/cloud-init-1.0.json - Add Paths.schema_dir and unpickle upgrade test handling
Ultimately cloud-init wants a single JSON schema file defining composite schema for all cc_* modules instead of each module defining it's own sub-schema. Legacy schema definitions took the form or a "schema" attribute in each cc_* module. The "new" way would be adding schema $defs to the single config/cloud-init-schema-<version> file. End goal is to publish those static composite JSON schema files to schema validation services to allow externals to consume and validate user-data before spending resources on invalid instance launches. During migration from legacy schema definitions in each cc_* module, cloudinit needs to support cc_* modules which define schema localling within the cc_* module as a "schema" attribute. - get_schema: source config/cloud-init-schema* and supplement with legacy cc_*.schema definitions to bridge gap between legacy cc_*.schema defitions and the "new" full config/cloud-init-schema. - Add support for load_doc to handle legacy and combined schema defs - get_meta_doc will validate against global composite schema when 'schema' param absent - get_meta_doc will also limit scope of documentation to the specific meta["id"] attribute as defined in global schema $defs. To allow modules to avoid passing in their own sub-schema - get_meta_doc needs to find_modules containing "meta" instead of "schema" as non-legacy schema definitions exist in config/cloud-init-schema* instead of in python dicts as a cc_*.schema attribute
Add tests for _schemapath_for_cloudconfig as it was incorrectly constructing paths prefixes for list items. Prevent concatonation of path_prefix to a key if the key is the same indent_depth as prior line.
To allow for building docs locally in the project-dir, get_schema needs to source ./config/cloud-init-schema*json which is data files outside of the typical python module paths. get_schema will prefer CLOUD_INIT_SCHEMA_DIR environment variable over Paths.schema_dir if provided. Allow tox.ini to passenv CLOUD_INIT_SCHEMA_DIR into the tox env. To generate docs from project directory: CLOUD_INIT_SCHEMA_DIR=config tox -e doc To validate schema docs from project directory: CLOUD_INIT_SCHEMA_DIR=config PYTHONPATH=. python3 \ -m cloudinit.cmd.main devel schema --docs all
d141767
to
2b43a2e
Compare
@blackboxsw , hmmm, I can't build docs now.
If I switch to another branch I don't get this. Have you seen this? |
@TheRealFalcon, Yes your poor trace there is telling you that tools/read-version can't find git tags to properly git describe the local directory. But the ordering of your output seems mixed up. I would not have expected to see the toos/read-version non-zero exit output as a result of the I'd suspect a second git describe version (None) differs from cloudinit.version (21.4)
...
subprocess.CalledProcessError: Command '['/home/james/envs/cloud-init36/bin/python', 'tools/read-version']' returned non-zero exit status 1. I also can't reproduce this on a system with
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @blackboxsw , this looks good.
I think we can remove the schema_dir
from the conftest. I'll let you do that and then you're free to merge.
Move full JSON schema validation of user-data to just after user-data has been collected.
The goals or this refactor are the following:
TODO
Proposed Commit Message
Individual commits
Author: Chad Smith chad.smith@canonical.com
Date: Fri Jan 7 13:47:22 2022 -0700
commit bc6fcf5
Author: Chad Smith chad.smith@canonical.com
Date: Fri Jan 7 13:36:35 2022 -0700
s
non-legacy schema definitions exist in config/cloud-init-schema* instead of
in python dicts as a cc_*.schema attribute
commit e0f754b
Author: Chad Smith chad.smith@canonical.com
Date: Fri Jan 7 12:52:41 2022 -0700
commit f1a7c75
Author: Chad Smith chad.smith@canonical.com
Date: Wed Jan 5 12:45:29 2022 -0700
commit e0f754b
Author: Chad Smith chad.smith@canonical.com
Date: Fri Jan 7 12:52:41 2022 -0700
commit f1a7c75
Author: Chad Smith chad.smith@canonical.com
Date: Wed Jan 5 12:45:29 2022 -0700
Additional Context
Test Steps
Checklist: