Repo-level automation #3

Open
bollwyvl opened this issue Feb 23, 2024 · 0 comments
Elevator Pitch

Adopt a top-level repo task runner which knows when to run different tasks to achieve desired goals. Use in CI, and document for local development.

Motivation

This repo's development workflow will take as inputs on a given PR:

  • human-authored TOML, YAML, etc.
  • templates to generate packages/documentation
  • narrative documentation
  • build dependencies

And generate as outputs:

  • canonical JSON in
  • documentation as HTML (and PDF, etc)
    • checking reports (e.g. links, grammar, spelling)
  • multiple language/framework-specific packages
    • distributions
    • test reports
    • coverage reports
    • documentation

Proposal

make is fine, but is still complex to operate in 2024 for Windows users. Indeed, even pre-commit (or one of its many plugins) makes non-portable assumptions, and "I can't even commit" isn't a very nice feature for a new/drive-by contributor.

If indeed the top level of the repo will be (at least) a canonical, no- or one-dependency Python project, I'd recommend starting with doit, where the repo would contain a top-level dodo.py (or any other file, as configured in pyproject.toml).

Example

Given a layout like:

./
  pyproject.toml
  dodo.py
  schema/
    some/
      path/
        thing.schema.yaml

And a preflight such as:

python -m pip install -e .[dev]

And the dodo.py:

from pathlib import Path
import json
import tomllib  # standard library since Python 3.11 (tomli_w can only write TOML)
from typing import Type
import yaml
import jsonschema

ROOT = Path(__file__).parent
SCHEMA = ROOT / "schema"
ALL_SCHEMA_SRC = [*SCHEMA.rglob("*.schema.toml"), *SCHEMA.rglob("*.schema.yaml")]
# each source gets a sibling JSON file: thing.schema.yaml -> thing.schema.json
ALL_SCHEMA_DIST = {
    src: src.parent / f"""{src.stem}.json""" for src in ALL_SCHEMA_SRC
}

def task_build():
    for src, schema in ALL_SCHEMA_DIST.items():
        rel = schema.relative_to(SCHEMA)
        yield dict(
            name=f"schema:{rel}",
            actions=[(_convert_one, [src, schema])],
            file_dep=[src],
            targets=[schema]
        )

def task_validate():
    for schema in ALL_SCHEMA_DIST.values():
        rel = schema.relative_to(SCHEMA)
        yield dict(
            name=f"schema:{rel}",
            actions=[(_validate_one, [schema])],
            # depending on a build target makes doit run the build task first,
            # and lets it skip validation while the JSON is unchanged
            file_dep=[schema]
        )

def _convert_one(src: Path, dest: Path) -> bool:
    text_in = src.read_text(encoding="utf-8")
    # Path.suffix includes the leading dot
    if src.suffix == ".toml":
        data = tomllib.loads(text_in)
    elif src.suffix == ".yaml":
        data = yaml.safe_load(text_in)
    else:
        # a python-action returning False marks the doit task as failed
        return False
    text = json.dumps(data, indent=2, sort_keys=True)
    dest.write_text(text, encoding="utf-8")
    return True

def _validate_one(schema_path: Path, instance_path: Path|None=None) -> bool:
    schema = json.loads(schema_path.read_text(encoding="utf-8"))
    validator_cls: Type[jsonschema.Validator] = jsonschema.validators.validator_for(schema)
    validator_cls.check_schema(schema)

    if instance_path:
        validator = validator_cls(schema, format_checker=validator_cls.FORMAT_CHECKER)
        instance = json.loads(instance_path.read_text(encoding="utf-8"))
        validator.validate(instance)
    
    return True
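
The source-to-output mapping in the dodo.py above leans on how pathlib splits compound file names. A quick check with the example path from the layout (nothing is read from disk, and the variable names are only illustrative):

```python
from pathlib import Path

# The example schema source from the layout above.
src = Path("schema/some/path/thing.schema.yaml")

# Only the last extension counts as the suffix, and it includes the dot.
suffix = src.suffix  # ".yaml"

# The stem keeps the ".schema" part, so the output stays next to its source
# under the name thing.schema.json.
dest = src.parent / f"{src.stem}.json"
```

This is why a suffix comparison needs to be written against ".yaml" or ".toml" rather than the bare extension.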

Running doit validate would:

  • ensure all of the .schema.json files come into existence, as each validate task depends on the output of a build task
  • ensure all of the schemas are actually valid schemas

Provided the above is true, running doit validate again wouldn't do anything.
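
The "wouldn't do anything" behaviour relies on doit remembering the state of each task's file_dep between runs. The sketch below is not doit's implementation, only a simplified illustration of the idea using content hashes; every function name and the state-file layout are hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_signature(path: Path) -> str:
    """Return a content hash for one dependency file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def current_state(deps: list[Path]) -> dict[str, str]:
    """Map each dependency path to its current content hash."""
    return {str(dep): file_signature(dep) for dep in deps}


def is_up_to_date(deps: list[Path], state_file: Path) -> bool:
    """Compare current hashes with the ones saved after the last run."""
    if not state_file.exists():
        return False
    saved = json.loads(state_file.read_text(encoding="utf-8"))
    return saved == current_state(deps)


def record_run(deps: list[Path], state_file: Path) -> None:
    """Save dependency hashes after a successful run."""
    state_file.write_text(json.dumps(current_state(deps)), encoding="utf-8")


def demo() -> list[bool]:
    """Check before any run, after a run, and after editing the file."""
    with tempfile.TemporaryDirectory() as tmp:
        dep = Path(tmp) / "thing.schema.json"
        state = Path(tmp) / "state.json"
        dep.write_text("{}", encoding="utf-8")
        results = [is_up_to_date([dep], state)]      # never run yet
        record_run([dep], state)
        results.append(is_up_to_date([dep], state))  # unchanged since the run
        dep.write_text('{"type": "object"}', encoding="utf-8")
        results.append(is_up_to_date([dep], state))  # edited since the run
        return results
```

A task is skipped only in the middle case: it has run successfully before and none of its dependencies have changed since.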

This approach would be extended to:

  • format: with e.g. prettier, taplo, ruff
  • lint: as above, but also yamllint, etc.
  • dist: initially just pyproject-build ., but eventually many more
  • docs: sphinx is fine, but the existing schema are... lacking
    • jsonschema2md is a bit better
    • but maybe jinja2 templates are the way to go
    • and eventually some interactive jupyterlite site seems relevant
  • check: with pytest-check-links
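
A few of these extensions could start out as thin wrappers around command-line tools. This is a minimal sketch, assuming ruff, yamllint and pyproject-build (from the build package) are installed by the preflight step and are on PATH. The task names and the exact command arguments are illustrative choices, not part of the proposal.

```python
def task_format():
    """Sketch: rewrite Python sources in place."""
    # a string action is run by doit as a shell command
    return dict(actions=["ruff format ."])


def task_lint():
    """Sketch: one check-only subtask per linter."""
    linters = {
        "ruff": "ruff check .",
        "yamllint": "yamllint schema",
    }
    for name, cmd in linters.items():
        yield dict(name=name, actions=[cmd])


def task_dist():
    """Sketch: build the Python distributions."""
    return dict(actions=["pyproject-build ."])
```

With no file_dep declared, these tasks run every time they are requested. Adding file_dep and targets, as in the build task, would let doit skip them when nothing has changed.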