Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Modernizing gitlint's build and test tooling: discussion #378

Closed
jorisroovers opened this issue Nov 22, 2022 · 27 comments
Closed

Modernizing gitlint's build and test tooling: discussion #378

jorisroovers opened this issue Nov 22, 2022 · 27 comments
Labels
development Issues that are not user-facing but related to gitlint development discussion
Milestone

Comments

@jorisroovers
Copy link
Owner

I’d like to get some input on best-practices around python build/test/project management/CI tooling. I want to streamline the gitlint developer and release experience to:

  1. Reduce release overhead (eventually: more frequent and smaller releases)
  2. Clean up some unused and outdated code (most notably in ./run_tests.sh)
  3. Get gitlint up-to-date with recent python test, build and deploy best-practices (and personally learn a few things along the way)

Background

Test running

  • Today gitlint uses a custom shell script run_tests.sh which wraps most other test tools like pytest, pylint, black, coverage, and radon, as well as a build (python setup.py sdist bdist_wheel) test.
  • run_tests.sh also has some other utility functions like clean and some outdated python and docker environment management related code that is not used anymore.
  • I’d like to get rid of run_tests.sh entirely and move to a python tool or script.
  • I’d like this tool to do python environment management (like tox), so it becomes easier to run these tools against multiple python environments locally. Currently we use a test matrix in CI, perhaps this can be (partially) handled by the testing tool instead (so it’s independent of GHA).

Builds

  • gitlint consists of 2 python packages gitlint (root dir) and gitlint-core (gitlint-core dir), whatever tool/script we use should be able to support multiple python packages in a single repo / root dir.
  • I’d like to move away from setup.py towards declarative config and pyproject.toml specifically. The build system should be PEP517 compliant.
  • I’d like to remove the various requirements*.txt files in favor of using extra_requires for test, docs, etc.
  • A publish command for publishing to PyPI is a nice to have.
  • An immediate goal is to have a fully automated CI pipeline that publishes to Test PyPi on every commit. Other than writing the required pipeline that does this, this also requires generating a unique version for every commit - I assume that would probably be based on the git hash of main (seems to be the norm in my experience). Sample version string: 0.19.dev-abc123. I know some build systems or plugins help with this type of version management - that would be a nice to have. Currently we maintain gitlints version manually in 2 files: setup.py and gitlint-core/gitlint/__init__.py (also used to display the version in the UI).

Candidates

  • I really like the idea of having a single project management tool that handles/wraps both the build and script running parts. This keeps the learning curve low and the developer experience good.
  • I’ve looked at pyinvoke before, but that was 5y ago. I don’t think pyinvoke is still a candidate, too much manual work and too customized - but it might be.
  • I have experience using poetry and like it, but they’ve also done some weird things - I know opinions are divided on it. I think @asottile put it very eloquently in his video. So I think poetry is out as well.
  • I know there’s a ton of other tools, I’ve been looking at a few of them: https://packaging.python.org/en/latest/key_projects/
  • Specific call-outs to pip-tools, pre-commit and tox which I think are all in the running but none of them solve the problem entirely (so some “glue” would be needed).
  • I noticed that mkdocs recently adopted hatch and that got me very intrigued. Right now that’s what I’m mostly leaning towards (at least to give it a try).

I’m planning to work on this in the next few weeks and would appreciate input on it. 🙏

Final Note: There’s many roads that lead to Rome, I think it’s likely there won’t be any consensus on this, in which case I’ll make a judgement call and just try something :-)

CC: @sigmavirus24, @asottile, @webknjaz, @ssbarnea, @andersk

@jorisroovers jorisroovers added discussion development Issues that are not user-facing but related to gitlint development labels Nov 22, 2022
@jorisroovers jorisroovers moved this to In Progress in gitlint Nov 22, 2022
@webknjaz
Copy link
Contributor

webknjaz commented Nov 22, 2022

I’d like to get some input on best-practices around python build/test/project management/CI tooling. I want to streamline the gitlint developer and release experience to:

1. Reduce release overhead (eventually: more frequent and smaller releases)
...
3. Get gitlint up-to-date with recent python test, build and deploy best-practices (and personally learn a few things along the way)

I like having a separate job for building the artifacts (sdist and wheel) but I haven't gotten reusable bits for this :( Still managing such setups manually.
The tests would depend on that job and test the actual artifacts (saved as GHA artifacts by the first job) instead of a Git checkout, and then, the publishing stage could consist of two jobs — one for PyPI and the other for TestPyPI. TestPyPI could conditionally auto-publish (the condition would be pushes to the default branch) and PyPI would be conditioned with the release trigger and an environment. The environments feature allows you to have a protection for the jobs with an environment set — you can set it up to require manual approvals + a cooldown timer before it can begin. You'll also be able to store secrets under environments, as opposed to having them repo-global. I also use environments for the TestPyPI job but w/o an approval required.
I've recently started extending the simple “publish to PyPI” action with creating GitHub Releases coupled with GitHub Discussions after such publishing.

Note that publishing from the default branch will require version numbers that are unique for each commit push. This means that instead of hardcoding the version, you'd need a mechanism for generating them from Git. I normally use the setuptools-scm plugin for this but there's others.

Background

Test running

* Today gitlint uses a custom shell script [run_tests.sh](https://github.com/jorisroovers/gitlint/blob/5ffc227a7169602e8fd916097bd1572a29b54cc5/run_tests.sh) which wraps most other test tools like `pytest`, `pylint`, `black`, `coverage`, and `radon`, as well as a build (`python setup.py sdist bdist_wheel`) test.

* `run_tests.sh` also has some other utility functions like `clean` and some outdated python and docker environment management related code that is not used anymore.

* I’d like to get rid of `run_tests.sh` entirely and move to a python tool or script.

* I’d like this tool to do python environment management (like `tox`), so it becomes easier to run these tools against multiple python environments locally. Currently [we use a test matrix in CI](https://github.com/jorisroovers/gitlint/blob/5ffc227a7169602e8fd916097bd1572a29b54cc5/.github/workflows/checks.yml#L16-L19), perhaps this can be (partially) handled by the testing tool instead (so it’s independent of GHA).

It is common to use tox or nox for this. You can use them as regular command runners, not just for testing. In tox, you can list the configured environments via tox -av. People run pytest, python -m build and Sphinx docs builds like that. You can also configure some matrixes inside.
When having a test matrix in GHA, you can invoke tox -e py which will run your default testenv with the Python that's currently in use, this is how you map tox to GHA, typically. But locally, you can have tox set up to run all of the matrix factors by just invoking tox. Some people do this, I often just make it run testing under the current Python interpreter and let people request running more via explicit args.

Builds

* gitlint consists of 2 python packages `gitlint` (root dir) and `gitlint-core` (gitlint-core dir), whatever tool/script we use should be able to support multiple python packages in a single repo / root dir.

You'll need to keep two package definitions and just invoke python -m build for both dirs separately.

* I’d like to move away from `setup.py` towards declarative config and `pyproject.toml` specifically. The build system should be PEP517 compliant.

Today, probably all the available tools are compatible with this, including setuptools.

* I’d like to remove the various `requirements*.txt` files in favor of using `extra_requires` for `test`, `docs`, etc.

-1 on this — such requirements don't belong in the distribution package metadata that is consumable by the end-users. Can you imagine your typical non-contributor user running pip install gitlint[docs]? Why?
Go for specifics deps subsets for each of the envs, paired with corresponding constraints files, generated by pip-tools. I'm exploring even more granular techniques with per-tested-env constraint files but I won't be suggesting that at this time.

* A `publish` command for publishing to `PyPI` is a nice to have.

If you'll go GHA-way, you'll probably end up using my action https://github.com/marketplace/actions/pypi-publish, not really needing this.
Otherwise, it's just a call of twine upload. But yeah, it's possible to wrap that with tox.

* An immediate goal is to have a fully automated CI pipeline that publishes to Test PyPi on every commit. Other than writing the required pipeline that does this, this also requires generating a unique version for every commit - I assume that would probably be based on the git hash of `main` (seems to be the norm in my experience). Sample version string: `0.19.dev-abc123`. I know some build systems or plugins help with this type of version management - that would be a nice to have. Currently we maintain gitlints version manually in 2 files: [setup.py](https://github.com/jorisroovers/gitlint/blob/5ffc227a7169602e8fd916097bd1572a29b54cc5/setup.py#L26) and [gitlint-core/gitlint/__init__.py](https://github.com/jorisroovers/gitlint/blob/5ffc227a7169602e8fd916097bd1572a29b54cc5/gitlint-core/gitlint/__init__.py#L1) (also used to display the version in the UI).

As I mentioned above, you can use setuptools-scm or dunamai (probably). Not sure how non-setuptools ecosytems do this, but there's probably plugins for this too.
One thing to mention is that your suggested version would be invalid. Having it compatible with PEP 440 is a must, otherwise the uploads will be rejected.
setuptools-scm does version guessing based on the previous tag version (so make sure to checkout the whole Git repo in CI for such runs), incrementing the last (usually PATCH) version segment and adding a .dev999 with the distance to that last tag. It also adds a git hash in the form of +gabd83e. It is called a local version part. From the perspective of PEP 440 it's fine but you cannot upload such dists to PyPI — the uploads will be rejected. So for whenever you're preparing a dist for releasing to the package indexes, you'll have to select a version guessing mode that doesn't append the local version part.

* I have experience using `poetry` and like it, but they’ve also done some weird things - I know opinions are divided on it. I think @asottile [put it very eloquently in his video](https://www.youtube.com/watch?v=Gr9o8MW_pb0). So I think poetry is out as well.

I haven't seen Anthony's video, but I also have bad experience with Poetry, including the contribution attempts. I feel like the rest of the PyPA has a similar opinion.

* Specific call-outs to pip-tools, pre-commit and tox which I think are all in the running but none of them solve the problem entirely (so some “glue” would be needed).

Yes, that's correct.

* **I noticed that mkdocs recently adopted [hatch](https://hatch.pypa.io/latest/) and that got me very intrigued. Right now that’s what I’m mostly leaning towards (at least to give it a try).**

Yes, I'm seeing a lot of projects migrating to Hatch recently. And it's now hosted under the PyPA umbrella. I haven't looked too deep into it, but since many people I respect like it, I'm inclined to blindly trust their judgement. I don't know the state of their SCM integration (which is important for the points above) but I saw the maintainer of setuptools-scm interacting with this project, so I assume that it's probably supported already. (I tried to contribute a similar feature to Poetry like half a decade ago, the PR never got accepted and the plugin system capable of facilitating this only appeared recently, for comparison).

setuptools is also much better nowadays, then it used to be. I still use it a lot, one benefit is huge adoption, familiarity and that the downstream packagers know how to work with it. Not that it matters much with PEP 517 being a common interface for calling any build backends.

@ssbarnea
Copy link

Nice to ask, but keep in mind that "single" is more of an ideal and that the more people you ask the more solutions you will get suggested. Still, much better than not asking.

After a decade of building python packages, my personal take is:

  • avoid fancy new tools, or you will pay the cost of being an early adopter
  • most popular are likely the most stable
  • setuptools-scm for versioning
  • pip (not poetry, hatch), optional pip-tools for more advanced stuff
  • PEP-517 is a must
  • tox
  • pre-commit for linting
  • GHA for CI, pick one or two projects that use it and that have very active maintenance, steal from them.
  • For docs I used to recommend sphinx but I have to confess that I am actively looking for a way to avoid in the future and go for something pure-md. On my to-review list I have mkdocs and docusaurus.

And last bit: avoid trying to change/fix too many things at one, progress here is done with small increments, or you endup moving too much at once and likely fail under the burden.

@ofek
Copy link

ofek commented Nov 24, 2022

Hatch maintainer here! Let me know if I can help

@ofek
Copy link

ofek commented Nov 24, 2022

Regarding setuptools, a few benefits of switching to Hatchling:

  • Supports static analysis tools for editable installations by default whereas the new setuptools does not i.e. it requires enabling an option for IDEs to work.
  • Uses Git-style glob patterns rather than stdlib globs to better match user expectations.
  • Allows easy and explicit control of what gets shipped in the sdist target whereas with setuptools it's an objectively confusing mix of conditional wheel options and a MANIFEST.in file.

@ssbarnea
Copy link

Yesterday I managed to migrate the first (minor) project to pure pyproject.toml via setuptools-scm. I will soon do it for the rest but I will need to allow few downstream packagers (rpm,deb,arch) time to catch-up, as I am almost sure that some of them will struggle to read the dependencies, this being the reason why my first attempt to do it 6mo ago with another project went on hold.

@ofek
Copy link

ofek commented Nov 25, 2022

Why pip in build-system.requires?

@ssbarnea
Copy link

I think that that this sections needs cleaning, especially as newer versions of setuptools-scm no longer need archive. I just did not had time to clean it.

@ofek
Copy link

ofek commented Nov 25, 2022

Have you tried hatch-vcs? Example: https://github.com/psf/black/blob/main/pyproject.toml

@webknjaz
Copy link
Contributor

Why pip in build-system.requires?

The same question about wheel too.

@ofek
Copy link

ofek commented Nov 25, 2022

The same question about wheel too.

Yeah it's automatic. That wheel is required for setuptools is another big issue:

@jorisroovers
Copy link
Owner Author

jorisroovers commented Dec 12, 2022

Thanks for all your inputs - definitely insightful. Really appreciated!

I’m going to give hatch a shot, I’ve implemented the basic setup in #384. I really like hatch 🫶, the end result is very elegant IMO. I’ll leave the PR open for a few days to gather input before merging it.

I haven’t implemented the CI deployment too Test PyPi on every commit yet, but I can see a few ways to do that (potentially using https://github.com/ofek/hatch-vcs) - I’ll do that next.

@jorisroovers
Copy link
Owner Author

jorisroovers commented Jan 17, 2023

Ok, I’ve finally come around to implementing hatch-vcs in #410, works well! I’ll merge that PR soon.

One thing I ran into is that when trying to publish is that PyPI does not accept PEP440 local versions

# This is not accepted, since it contains a local version (the + and everthing after)
0.19.0.dev52+ga8eb527

# This does work
0.19.0.dev52

In practice, I believe this means I’ll have to explicitly tag every commit I wish to publish to (test)pypi. This makes me reconsider the idea of publishing on every commit, since it would basically mean having to (auto)tag every commit on main. While possible, polluting our tag list this way doesn’t feel right.

Why I wanted to do this in the first place:

  1. To lower the friction and effort of releases as much as possible because they’d be happening all the time. This includes automated post-release integration testing. The eventual goal would be to increase the user-facing release cadence as well (smaller, more frequent releases).
  2. To allow users to more easily install pre-release versions of gitlint, particularly interesting for when a bug is fixed (or PR merged) for which they normally would need to wait until the next release. While technically already possible by adding a dependency pointing to git, it’s less user friction to be able to just specify gitlint==0.19.0.dev52.

Based on this, I’m now thinking to not auto-publish on every commit but to just create a pipeline that does tagging and publishing on demand. This could then still be put on a cron job that auto-publishes every so often (2 weeks?).

Appreciate if anyone has additional thoughts on this!

@ssbarnea
Copy link

GHA workflow scheduled release, not cron running under someone's desk. It even has cron syntax!

@jorisroovers
Copy link
Owner Author

GHA workflow scheduled release, not cron running under someone's desk. It even has cron syntax!

Yes! That's what I meant :)

@ssbarnea
Copy link

@jorisroovers Sorry, I had to ask,... you never know what people really do these days.

@RonnyPfannschmidt
Copy link

in iniconfig i use https://github.com/pytest-dev/iniconfig/blob/main/.github/workflows/deploy.yml#L43 to ensure test-pypi uploads

@RonnyPfannschmidt
Copy link

alternatively a version scheme that leaves out the local node on master/main could be added as well

@jorisroovers
Copy link
Owner Author

alternatively a version scheme that leaves out the local node on master/main could be added as well

Yes, I was thinking about this but doubted a bit whether that was a clean solution and whether any breaking scenarios exist. Also would need to have a look at how to override the hatch-vcs versioning scheme, not sure if that can be done without resorting to too much patching and band-aids.

Might explore this now though, thanks for the suggestion 🙏

@RonnyPfannschmidt
Copy link

If the preexisting version schemes don't fit a new one can be added upstream or as plugin

@jorisroovers
Copy link
Owner Author

in iniconfig i use https://github.com/pytest-dev/iniconfig/blob/main/.github/workflows/deploy.yml#L43 to ensure test-pypi uploads

This was a great tip, thanks @RonnyPfannschmidt!

I ended up using it to implement the publish-release workflow that now gets triggered automatically on every commit to main.

After the gitlint and gitlint-core are published to test.pypi.org, we also automatically run integration tests (test-release workflow) on the newly published packages.

Both the publish-release and test-release workflows can also be triggered manually.

After working out a few issues (e.g. PYPI mirror replication delay), this has been running smoothly now for the last few commits.

I think I’ll switch to publishing to the regular PyPI (i.e. pypi.org, not test.pypi.org) in the near future.


This outcome is exactly what I wanted, very happy with the end result here! Thanks everyone for your input (and keep any suggestions coming)! 🫶

@jorisroovers
Copy link
Owner Author

I think I’ll switch to publishing to the regular PyPI (i.e. pypi.org, not test.pypi.org) in the near future.

Ok, that’s in effect now 🙂
https://pypi.org/project/gitlint/#history


Topic switch: hatch environment dependencies and dependabot

@ofek, something I’d like to have is dependabot support for dependencies in hatch environments, e.g. hatch.envs.test.dependencies.

gitlint/pyproject.toml

Lines 83 to 92 in 7b0c255

dependencies = [
"gitlint-core[trusted-deps] @ {root:uri}/gitlint-core",
"black==22.10.0",
"pytest==7.2.0",
"pytest-cov==4.0.0",
"python-coveralls==2.9.3",
"ruff==0.0.244",
"radon==5.1.0",
"pdbr==0.7.5; sys_platform != \"win32\"",
]

I’ve read dependabot/dependabot-core#3290, but AFAICT, that only pertains to top-level dependencies and optional-dependencies (which works well for gitlint). I couldn’t find any existing issue for environment-level dependency support within the hatch or dependabot projects, although I’d be surprised if I’m the first one asking?

I also had a quick glance at hatch-requirements-txt with the idea of splitting the test dependencies back out in a test-requirements.txt file so dependabot can pick them up. However, this feels like a step backwards and I think hatch-requirements-txt actually only supports project-level dependencies too (so this wouldn’t work).

Is this something you think will eventually be supported, or are there workarounds available? Thanks!

jorisroovers added a commit that referenced this issue Mar 7, 2023
This release was primarily focussed on modernizing gitlint's build and test
tooling (details: #378).

General

    Python 3.6 no longer supported (EOL since 2021-12-23) (#379)
    This is the last release to support the sh library (used under-the-hood to
    execute git commands) by setting GITLINT_USE_SH_LIB=1. This is already
    disabled by default since v0.18.0.

Features

    Allow for a single commit in the --commits cmd-line param (#412)
    Gitlint now separates FILE_ENCODING (always UTF-8) from TERMINAL_ENCODING
    (terminal dependent), this should improve issues with unicode. Use
    gitlint --debug to inspect these values. (#424)

Bugfixes

    ignore-by-author-name crashes without --staged (#445)
    Various documentation fixes (#401, #433) - Thanks scop

Development

    Adopted hatch for project management (#384). This significantly improves
    the developer workflow, please read the updated CONTRIBUTING page.
    Adopted ruff for linting, replacing pylint (#404)
    Gitlint now publishes dev builds on every commit to main (#429)
    Gitlint now publishes a latest_dev docker image on every commit to
    main (#451) (#452)
    Dependencies updated
    Many improvements to the CI/CD worfklows
    Improve unit test coverage (#453)
    Integration test fixes on windows (#392, #397)
    Devcontainer improvements (#428)
    Removal of Dockerfile.dev (#390)
    Fix most integration tests on Windows
    Fix Windows unit tests (#383)
    Introduce a gate/check GHA job (#375)

Full Release details in CHANGELOG.md.
jorisroovers added a commit that referenced this issue Mar 7, 2023
This release was primarily focussed on modernizing gitlint's build and test
tooling (details: #378).

General

    Python 3.6 no longer supported (EOL since 2021-12-23) (#379)
    This is the last release to support the sh library (used under-the-hood to
    execute git commands) by setting GITLINT_USE_SH_LIB=1. This is already
    disabled by default since v0.18.0.

Features

    Allow for a single commit in the --commits cmd-line param (#412)
    Gitlint now separates FILE_ENCODING (always UTF-8) from TERMINAL_ENCODING
    (terminal dependent), this should improve issues with unicode. Use
    gitlint --debug to inspect these values. (#424)

Bugfixes

    ignore-by-author-name crashes without --staged (#445)
    Various documentation fixes (#401, #433) - Thanks scop

Development

    Adopted hatch for project management (#384). This significantly improves
    the developer workflow, please read the updated CONTRIBUTING page.
    Adopted ruff for linting, replacing pylint (#404)
    Gitlint now publishes dev builds on every commit to main (#429)
    Gitlint now publishes a latest_dev docker image on every commit to
    main (#451) (#452)
    Dependencies updated
    Many improvements to the CI/CD worfklows
    Improve unit test coverage (#453)
    Integration test fixes on windows (#392, #397)
    Devcontainer improvements (#428)
    Removal of Dockerfile.dev (#390)
    Fix most integration tests on Windows
    Fix Windows unit tests (#383)
    Introduce a gate/check GHA job (#375)

Full Release details in CHANGELOG.md.
@jorisroovers
Copy link
Owner Author

Ok, I've created pypa/hatch#775 for my question on dependabot and hatch environment dependencies.

I'm closing this issue then.

For reference, the entire new development process is documented at
https://jorisroovers.com/gitlint/contributing/

Thanks everyone for your input, was invaluable 🫶

@github-project-automation github-project-automation bot moved this from In Progress to 0.19.0 in gitlint Mar 9, 2023
@jorisroovers jorisroovers added this to the 0.19.0 milestone Mar 9, 2023
@webknjaz
Copy link
Contributor

webknjaz commented Mar 9, 2023

in iniconfig i use pytest-dev/iniconfig@main/.github/workflows/deploy.yml#L43 to ensure test-pypi uploads

I like constructing an expected version variable at the beginning of my workflows. Merges to the main branch expect .dev versions, "publish requests" expect "public" versions and PRs+non-main branches allow the local bit. Then, when making the dists, I inject no-local into pyproject.toml and make Git pretend it hasn't changed (git update-index --assume-unchanged). To make sure various parts of the workflow are in sync, I assert for the version in the artifact names right away. As a bonus point, I use the sdist to run all (well, most of) the testing in CI using the tarball contents rather than a Git clone using https://github.com/re-actors/checkout-python-sdist. This lets me make sure that all the downstream testing bits are shipped in sdist.

@jorisroovers I've also made some comments in your older PRs since I've missed some of those.

@jorisroovers jorisroovers reopened this Mar 9, 2023
@jorisroovers
Copy link
Owner Author

@webknjaz Thanks for all the comments and PR 🙏

Let me work through them over the next week or so!

@jorisroovers
Copy link
Owner Author

@webknjaz Thanks for all the comments and PR 🙏

Let me work through them over the next week or so!

Bunch of new issues were created from this:

Closing this particular issue again now, we can track progress in the others. Thanks!

@ssbarnea
Copy link

Thanks for all the work on this, it clearly looks far better now. While the lack of tox is a red flag for me, mainly because it makes harder to contribute to the project) I love the direction.

I am happy that my mk tool was able to stop the executable in the root and expose it as a test command. Still, even with this no venv management involved and no way to only run linting, no way to run tests with a specific version of python.

Extra kudos for using mkdocs!

@jorisroovers
Copy link
Owner Author

no way to run tests with a specific version of python.

Still on my wish list. I've spend a bit of time getting this to work with hatch, but I was using asdf for managing python installations which was causing some issues. I've switched to rtx more recently, would have to retry again with that.

Extra kudos for using mkdocs!

Have been for years 💙 - that far predates this issue :)

I'm planning to adopt Material for Mkdocs and modernize the docs as part of the current release cycle though!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Issues that are not user-facing but related to gitlint development discussion
Projects
Status: 0.19.0
Development

No branches or pull requests

5 participants