Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inform pants about 3rd party dependencies and constraints #5789

Merged
merged 9 commits into from
Nov 29, 2022

Conversation

cognifloyd
Copy link
Member

@cognifloyd cognifloyd commented Oct 27, 2022

Background

This is another part of introducing pants, as discussed in various TSC meetings.

Related PRs can be found in:

Overview of this PR

Our 3rd party dependencies will be in a lockfile that looks like lockfiles/st2.lock. But, before pants can generate that, we need to add several pieces of metadata including dependencies and python version constraints.

This PR focuses on registering all of our dependencies. Future PRs will add the python version constraints (aka interpreter constraints) and other metadata so that we can actually generate the lockfile.

The Third-party dependencies document (in pantsbuild docs) is especially helpful in understanding the contents of this PR.

Relevant Pants documentation

The purposes of *requirements.txt

Today, *requirements.txt conflates several purposes:

  1. record direct dependencies;
  2. constrain problematic transitive (indirect) dependencies; and
  3. lock everything to make CI and releases more reliable.

With pants, I think we should manage each of these purposes with separate files, leaving requirements.txt to only record our direct dependencies (purpose 1). These files are:

purpose today with pants introduced in
record direct deps *requirements.txt requirements-pants.txt ✔️ this PR
constrain transitive deps *requirements.txt lockfiles/st2-constraints.txt ✔️ this PR
lockfile requirements.txt lockfiles/st2.lock future PR

Today, the first two purposes are actually spread out across multiple files. In this chart, *requirements.txt includes: requirements.txt, fixed-requirements.txt, and **/in-requirements.txt.

The many requirements.txt files and the scripts that compile them to generate the "locked" version of the file are heavily interdependent. To avoid introducing another relationship on all of that infrastructure, I opted to create new files instead: requirements-pants.txt, st2-constraints.txt, and st2.lock (the lockfile has to be added in a follow-up PR).

requirements-pants.txt

requirements-pants.txt does not need to constrain any transitive dependencies or handle locking or pinning any dependencies, so the requirements in it should be broad as possible. If there are known code incompatibilities in our direct dependencies, then we should add a version range to that requirement. Otherwise, we let the tooling (pants+pex+pip) pick the best version given all the other constraints we through at it.

st2-constraints.txt

I put st2-constraints.txt under lockfiles/ since it's primary purpose is to narrow the range of valid versions that pex can choose from when creating the lockfile.

I went through all of our requirements to determine if they are direct or transient dependencies. I put the transient deps in st2-constraints.txt and included comments about:

  • which dependencies require the transient dependency
  • explanatory comments from fixed-requirements.txt (if any)

Everything in st2-constraints.txt is, ideally, temporary. Eventually, we should be able to remove the version constraints on the transient dependencies (deps of our direct deps). With the current *requirements.txt infra, that is very difficult to do, because it is not clear what is a direct dependency and what ended up in there to fix a build. Many of these issues get resolved upstream. Isolating these "exceptional" constraints makes it easier to comment them out and validate whether or not they are still needed.

So, I looked through the git history to determine how much has changed since each one was introduced. The MarkupSafe constraint was added fairly recently in #5581. Most of the other constraints seem to be either (a) no longer necessary, or (b) the only reason to include them is to "lock" or pin that dependency. Locking will be handled by lockfiles/st2.lock, so I left everything except MarkupSafe commented out. They are not needed to create a valid lockfile. Once we have pants running more of our infrastructure, we'll be able to uncomment/update any constraints that are actually still needed.

lockfiles/st2.lock

Again, pants needs more metadata before we can generate this, so it will be added in a follow-up PR. Plus, the lockfile is 4544 lines long, so I want to isolate introducing it in a PR that has basically nothing else in it.

BUILD

Pants needs additional metadata about some of our direct dependencies. Others, like pywinrm, should really only be used in one specific module of our code base (the winrm_runner in this case). Still others are really only needed for tests of a specific pack. These oddities are difficult to express in a simple requirements.txt file, but we can express them cleanly in BUILD files.

Many packages do not declare that they use setuptools because they assume that all virtualenvs will just have it. Pants only includes the bare minimum it needs to run each linter or test. If setuptools is not an explicit dep, then it is not included. For stevedore I told pants about this implicit dependency like this (I did the same for flex as well):

st2/BUILD

Lines 26 to 31 in e9df91d

# stevedore uses pkg_resources w/o declaring the dep
"stevedore": {
"dependencies": [
"//:reqs#setuptools",
]
},

Other packages, like tooz have runtime dependencies that pants can't easily discover. This block makes sure our default tooz backends are included in the relevant venv whenever we use tooz:

st2/BUILD

Lines 32 to 38 in e9df91d

# tooz needs one or more backends (tooz is used by the st2 coordination backend)
"tooz": {
"dependencies": [
"//:reqs#redis",
"//:reqs#zake",
]
},

Still other packages are installed with one name, but imported with another. pants uses a module_mapping to figure out which imports come from which 3rd party dependencies.

We use modules to tell pants that the pywinrm package provides the winrm module:

python_requirement(
name="winrm",
requirements=["pywinrm"],
modules=["winrm"],

And we use module_mapping to do the same for requirements defined in requirements-pants.txt (but only if they aren't in the default module_mapping defined in pants:

st2/BUILD

Lines 4 to 12 in e9df91d

module_mapping={
"gitpython": ["git"],
"python-editor": ["editor"],
"python-json-logger": ["pythonjsonlogger"],
"python-statsd": ["statsd"],
"sseclient-py": ["sseclient"],
"oslo.config": ["oslo_config"],
"RandomWords": ["random_words"],
},

planned changes to pants.toml

We can't adjust config in pants.toml until a future PR as some of that config is interdependent. Briefly, these are the settings that will be needed:

@cognifloyd cognifloyd added this to the pants milestone Oct 27, 2022
@cognifloyd cognifloyd requested review from winem, arm4b, nzlosh, rush-skills, amanda11 and a team October 27, 2022 03:46
@cognifloyd cognifloyd self-assigned this Oct 27, 2022
@pull-request-size pull-request-size bot added the size/L PR that changes 100-499 lines. Requires some effort to review. label Oct 27, 2022
pants.toml Outdated
Comment on lines 38 to 40
# ignore requirements.txt for now, preferring interrim files that are decoupled from
# legacy requirements files generation: requirements-pants.txt & lockfiles/st2-constraints.txt
"/requirements.txt",
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

once we remove the legacy stuff, we can move requirements-pants.txt to requirements.txt.

@cognifloyd cognifloyd changed the title inform pants about 3rd party dependencies and constraints Inform pants about 3rd party dependencies and constraints Oct 27, 2022
BUILD Outdated
Comment on lines 5 to 12
"gitpython": ["git"],
"python-editor": ["editor"],
"python-json-logger": ["pythonjsonlogger"],
"python-statsd": ["statsd"],
"sseclient-py": ["sseclient"],
"oslo.config": ["oslo_config"],
"RandomWords": ["random_words"],

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Feel free to upstream these to Pants's default_module_mapping.py. Ditto the other python_requirement targets.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do. But we'll need to keep them here, until we upgrade to a version of pants that includes them.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Huh. It turns out gitpython is already available, so I can delete that already. Cool.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BUILD Outdated Show resolved Hide resolved
lockfiles/st2-constraints.txt Outdated Show resolved Hide resolved
pants.toml Outdated Show resolved Hide resolved
requirements-pants.txt Outdated Show resolved Hide resolved
Comment on lines +90 to +101
# was in fixed-requirements.txt, but not in requirements-pants.txt
# keyczar is used by a python2-only test.
#python-keyczar

###########

# not needed with switch to pytest
#nose
#nose-timer
#nose-parallel
#rednose

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be deleted then?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should delete them once we drop the old requirements infra. We still need to keep both sets of requirements in sync until that happens, so having a note about requirements that we don't need to include is helpful.

# fixed-requirements.txt:
# Fix MarkupSafe to < 2.1.0 as 2.1.0 removes soft_unicode
# >=0.23 was from jinja2
MarkupSafe<2.1.0,>=0.23
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit confused by this file. The only requirement that doesn't seem to be commented out is MarkupSafe.
Why are all the other pinned versions, not uncommented?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because I'm not sure if they're still valid.

With requirements-pants.txt, I've essentially removed all of the versions that were "pinned"/"locked" in fixed-requirements.txt, because that is supposed to be handled by the lockfile. So, the requirements should be as broad as possible, and then the lockfile will lock them to a single version.

The same thing applies with st2-constraints.txt. I believe most of these transitive constraints were in fixed-constraints.txt in order to "pin"/"lock" them. But, some of them might be actual constraints due to known issues in those transitive deps. So, I documented all of them, but left them commented. I hope we can just delete them later.

As we enable other tools like pylint, we'll get additional feedback about whether or not we need some of these constraints. And, after we have the lockfile, we can start "exporting" a virtualenv with pants. That way we can run all of our tests using a virtualenv generated by the lockfile which will tell us if there are any functional issues where we still need these constraints.

Does that make sense?

I suppose we could also uncomment all of these constraints, and then comment one or more of them, if needed, to generate the lockfile.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Another issue with our current approach vs using the pants+pex lockfile: generating the lockfile does not support the legacy pip resolver. Our current set of "pinned" dependencies does not work with the newer resolver (which is why we have a really old version of pip pinned), so we need to loosen our requirements to give the resolver room to figure out which versions are actually compatible.

I have successfully tested generating the lockfile with the requirements+constraints in this PR. I will try uncommenting the rest of these transitive dep constraints and see if that still resolves.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I uncommented many of the constraints and I can still generate the lockfile. I left only a few commented.

I really hope we can just delete all of these constraints, hopefully sooner rather than later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just tried to cleanup / standardize the documentation of all of these constraints. @amanda11 does that help?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the answers. I seem to remember there being some problems going up further on the pip version - but that maybe resolved. But if we do end up upping the pip version might need to find the old PR from when I last upped the version - as I seem to recall having to downgrade it.
But agree for now good to keep in synch.

Copy link
Contributor

@amanda11 amanda11 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

BUILD Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
external dependency pantsbuild size/L PR that changes 100-499 lines. Requires some effort to review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants