
Makefile: use grouped targets for recipes with multiple fixed outputs #6307

Merged

Conversation

whitslack
Collaborator

See the section headed "Rules with Grouped Targets" on the Texinfo page (make)Multiple Targets.

Without this fix, Make does not know that these recipes unconditionally make all of their named targets regardless of which target triggers their execution, and Make will blissfully execute multiple instances of any such recipe in parallel, not only wasting CPU cycles but potentially producing incorrect results if the recipe is not atomic in its effects on the file system. With this fix, Make understands that it need only execute such a recipe once to make all of its targets.

In pursuit of the above, move and combine two redundant msggen recipes into the top-level Makefile, and populate its grouped targets from the subordinate Makefiles.

Changelog-None
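
For context, here is a minimal sketch of the syntax in question (file names are hypothetical, not taken from this PR; grouped targets require GNU Make 4.3 or later):

```make
# Without grouping, `gen.c gen.h: schema.def` is read as two independent rules
# that happen to share a recipe, so a parallel build may run the generator twice.
# With grouped targets (the `&:` separator), Make knows that one invocation of
# the recipe produces both files and runs it only once:
gen.c gen.h &: schema.def generate.py
	python3 generate.py schema.def
```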

@whitslack whitslack requested a review from cdecker as a code owner June 5, 2023 08:43
@rustyrussell
Contributor

This makes perfect sense, yet completely broke CI.

It seems that contrib/pyln-testing/pyln/testing/node_pb2_grpc.py etc are grossly outdated(?) and yet don't have proper Makefile rules to regenerate them. If you rm it and make, you see it changes a great deal.

Leaving to @cdecker

@whitslack whitslack mentioned this pull request Jun 6, 2023
@whitslack
Collaborator Author

grossly outdated

@rustyrussell: It may be because you recently switched to protobuf-python 4 (from 3) but didn't regenerate any of your generated sources. (Why do you keep generated sources under revision control anyway?) The newer protobuf compiler generates quite different code.

@whitslack
Collaborator Author

yet don't have proper Makefile rules to regenerate them

@rustyrussell: I think you have rules for all of your protobuf-generated source files. node_pb2_grpc.py is covered by the rule for making $(GRPC_GEN). Maybe you didn't remake it because Make doesn't know that a change in the protobuf compiler version implies that the generated sources are outdated.
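
One common way to teach Make about that kind of dependency, sketched here with illustrative names only (the stamp file and FORCE idiom below are not something CLN's Makefiles actually contain), is to record the installed protoc version in a stamp file and make the generated sources depend on it:

```make
# Illustrative sketch: regenerate protobuf outputs when `protoc --version` changes.
PROTOC_VERSION_STAMP := .protoc.version

# Force-target idiom: the stamp recipe runs on every invocation, but only
# rewrites the file (and so only bumps its timestamp) when the reported
# compiler version actually differs.
FORCE:
$(PROTOC_VERSION_STAMP): FORCE
	@protoc --version | cmp -s - $@ 2>/dev/null || protoc --version > $@

# The generated sources (e.g. $(GRPC_GEN)) would then list the stamp as a prerequisite:
$(GRPC_GEN): $(PROTOC_VERSION_STAMP)
```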

@whitslack whitslack force-pushed the Makefile-grouped-targets branch from ab36563 to 41c1430 Compare June 8, 2023 15:51
@whitslack
Collaborator Author

I am trying to update the generated sources in such a way that they will not be changed by the CI check, but the current state of this repository imposes an impossible set of constraints:

  • pyproject.toml requires grpcio-tools = "1.54.0"
  • grpcio-tools-1.54.0 requires protobuf = ">=4.21.6,<5.0dev"
  • .github/scripts/setup.sh defines PROTOC_VERSION=3.15.8

Thus, the CI build is installing a version of protoc that does not meet the requirement of the grpcio-tools package that is in use.

The update to the generated sources in 41c1430 was generated using protobuf-python 4.22.3 and grpcio-tools-1.54.0, which are the same versions as are installed in CI, yet the check apparently still fails, though GitHub says the check failed 3 days ago, so I'm not sure whether a new CI run was actually triggered when I force-pushed.

@rustyrussell rustyrussell force-pushed the Makefile-grouped-targets branch from 41c1430 to e56828b Compare June 15, 2023 01:44
@rustyrussell
Contributor

Rebased on master, now things should be regenerated...

@rustyrussell
Contributor

And two more commits so things at least vaguely work as expected!

@rustyrussell rustyrussell force-pushed the Makefile-grouped-targets branch from ee26591 to 31777e1 Compare June 15, 2023 04:49
@whitslack
Collaborator Author

protoc failed: Unknown flag: --experimental_allow_proto3_optional

🥴

@rustyrussell
Contributor

Yeah, I have no idea either? I've removed that flag, YOLO!!!

@rustyrussell
Contributor

rustyrussell commented Jun 15, 2023

Nope. Looks like this crutch has been removed, and the proto files need fixing? @cdecker?

@whitslack
Collaborator Author

Attempting to pose this question with the least amount of snark I can muster: do the CLN devs have a local development environment that mimics the CI environment? This iterative process is looking worryingly like "programming by accident", where you're taking more-or-less haphazard stabs at fixing a problem without having a comprehensive understanding of the problem space. Iterating configuration permutations until your CI tests pass isn't a good development methodology because it leads to extreme fragility.

@rustyrussell
Contributor

Attempting to pose this question with the least amount of snark I can muster: do the CLN devs have a local development environment that mimics the CI environment? This iterative process is looking worryingly like "programming by accident", where you're taking more-or-less haphazard stabs at fixing a problem without having a comprehensive understanding of the problem space. Iterating configuration permutations until your CI tests pass isn't a good development methodology because it leads to extreme fragility.

This is an excellent, and thought-provoking point!

But this is a little unfair in this case. It's important to note that CI was failing because it checks the status of a bunch of files we don't even use, so my care factor here is v. v. low. Moreover, I knew I would (and did!) toss this to Christian once he woke, whose code it was, for a more serious diagnosis once it got roughly into shape.

I don't have a local CI instance, but I have a build machine which runs the entire test suite, because most of our CI problems are classically:

  1. The GH workflow files, or
  2. The fact that CI is much slower than most dev machines.

Mimicking the GH env is something of a moving target, and inadvisable unless we're actually having trouble replicating something and the latency matters: pushing crap to GH to test it is guaranteed to work. I've seen projects where developers start replicating the CI locally rather than testing locally, and that adds fragility: I far prefer developers test on their local platforms, with their idiosyncrasies (my laptop and build box are different Ubuntu versions for a reason, and the CI is yet another version).

Getting CI working isn't the aim of the project, and it's important not to lose sight of that, or spend too much time on it.

@whitslack
Collaborator Author

whitslack commented Jun 16, 2023

Thank you for the very fair reply.

I don't have a local CI instance, but I have a build machine which runs the entire test suite

And yet you have previously tagged a release (v0.12.1) that didn't even pass your unit test suite, let alone your integration test suite. 🤔 Maybe your process has tightened up since then.

Getting CI working isn't the aim of the project, and it's important not to lose sight of that

And yet you claimed that my proposed improvement to your build process "makes perfect sense yet completely broke CI" and declined to merge it for the v23.05.1 tag, ostensibly for that reason. It seems to me that your CI was already broken when you switched to protobuf-python 4 but neglected to regenerate your generated sources.

Again, why are your generated sources under revision control? They take about a second to generate, so nothing is gained by checking them in. You say you want testing on platforms with different configurations. Well, Gentoo rarely has the same versions of the protobuf components in its official ebuild repository as you're requiring. I've had to backport older versions (and write ebuilds for newer versions!) just so I could regenerate your generated sources without introducing irrelevant changes attributable solely to different protobuf versions. It would be less fragile if the generated sources weren't checked in at all.

@rustyrussell
Contributor

Thank you for the very fair reply.

Thoughtful feedback deserves thought!

I don't have a local CI instance, but I have a build machine which runs the entire test suite

And yet you have previously tagged a release (v0.12.1) that didn't even pass your unit test suite, let alone your integration test suite. 🤔 Maybe your process has tightened up since then.

I don't remember that? We should definitely run proper CI on test releases (rather than having GH test it as if it's going to merge it into master).

Getting CI working isn't the aim of the project, and it's important not to lose sight of that

And yet you claimed that my proposed improvement to your build process "makes perfect sense yet completely broke CI" and declined to merge it for the v23.05.1 tag, ostensibly for that reason.

Sorry, no, I didn't merge it because it didn't fix any known problem! My threshold for point releases is pretty high: must fix observed or nasty issues.

It seems to me that your CI was already broken when you switched to protobuf-python 4 but neglected to regenerate your generated sources.

No, the build process was broken (not regenerating those files when needed, even after distclean). CI caught it because it explicitly checks that checked-in generated files haven't changed after a build, so when something else triggered it, boom...
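
A minimal sketch of that kind of freshness check (the target name and `all` prerequisite are hypothetical, not CLN's actual CI recipe):

```make
# Illustrative sketch: rebuild, then fail if any tracked file, including
# committed generated sources, now differs from what is checked in.
.PHONY: check-gen-fresh
check-gen-fresh: all
	@git diff --exit-code || { echo "Checked-in generated files are stale; regenerate and commit." >&2; exit 1; }
```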

Again, why are your generated sources under revision control? They take about a second to generate, so nothing is gained by checking them in. You say you want testing on platforms with different configurations. Well, Gentoo rarely has the same versions of the protobuf components in its official ebuild repository as you're requiring. I've had to backport older versions (and write ebuilds for newer versions!) just so I could regenerate your generated sources without introducing irrelevant changes attributable solely to different protobuf versions. It would be less fragile if the generated sources weren't checked in at all.

Pain for devs, sure. But less pain for those building the source: they are screwed if they have a too-old (and surely, one day, a too-new!) protobuf compiler. Like people running the latest Ubuntu LTS release.

Our general trend has been to check in fewer generated files over time: we used to check in man pages and things so you didn't need python, but that indeed was such a PITA we eliminated it.

I hope we can indeed remove this. But I don't think we can do it today :(

@rustyrussell
Contributor

I am trying to update the generated sources in such a way that they will not be changed by the CI check, but the current state of this repository imposes an impossible set of constraints:

* `pyproject.toml` requires `grpcio-tools = "1.54.0"`

* `grpcio-tools-1.54.0` requires `protobuf = ">=4.21.6,<5.0dev"`

* `.github/scripts/setup.sh` defines `PROTOC_VERSION=3.15.8`

OK, so, turns out these numbers are not comparable. protobuf v>=4 in Python is still using protobuf v3, which is fine?

I was completely confused, too, but this seems to be the case. The compiler version is in fact now date-based, there is no PROTOC v 4...

Thus, the CI build is installing a version of protoc that does not meet the requirement of the grpcio-tools package that is in use.

This may be true, but if so it's hard to tell. I cannot find any mention of what protoc might be required for a given Python protobuf lib :(

The update to the generated sources in 41c1430 was generated using protobuf-python 4.22.3 and grpcio-tools-1.54.0, which are the same versions as are installed in CI, yet the check apparently still fails, though GitHub says the check failed 3 days ago, so I'm not sure whether a new CI run was actually triggered when I force-pushed.

@rustyrussell rustyrussell force-pushed the Makefile-grouped-targets branch from 3ab321a to 7dd0239 Compare June 17, 2023 05:56
@whitslack
Collaborator Author

OK, so, turns out these numbers are not comparable. protobuf v>=4 in Python is still using protobuf v3, which is fine?

Not on Gentoo, it isn't. dev-python/protobuf-python-4.21.9 (the newest in the main Gentoo repo at present) requires dev-libs/protobuf:0/32, and dev-libs/protobuf-3.20.3 has SLOT="0/31", meaning it doesn't satisfy the subslot dependency, whereas dev-libs/protobuf-21.9 has SLOT="0/32", meaning it does satisfy the subslot dependency. In short, protobuf-python 4.x cannot be installed simultaneously with protobuf 3.x. I had to manually force installation of the older protobuf to test, and Portage was then screaming at me that my system was broken. Of course, it's entirely possible that Gentoo is imposing stricter requirements than upstream actually does.

I was completely confused, too, but this seems to be the case. The compiler version is in fact now date-based, there is no PROTOC v 4...

The Python implementation is in the same GitHub repo as the C++ library. Versioning of the C++ library has dropped the leading "4.", while versioning of the Python module has not, but otherwise protobuf x.y corresponds to protobuf-python 4.x.y.

It may in fact be possible to install non-corresponding versions of protobuf and protobuf-python, but then I'm not sure if the Python module's C++ runtime backend will necessarily work correctly. 😕

@whitslack
Collaborator Author

And yet you have previously tagged a release (v0.12.1) that didn't even pass your unit test suite, let alone your integration test suite. 🤔 Maybe your process has tightened up since then.

I don't remember that? We should definitely run proper CI on test releases (rather than having GH test it as if it's going to merge it into master).

FWIW, it was a case where cln-grpc/src/test.rs hadn't been updated to rename msatoshi to amount_msat, thus causing the Rust test suite to fail. I had to sed that fix into my Gentoo ebuild for 0.12.1 to get the test suite to pass.

@rustyrussell
Contributor

OK, so, turns out these numbers are not comparable. protobuf v>=4 in Python is still using protobuf v3, which is fine?

Not on Gentoo, it isn't. dev-python/protobuf-python-4.21.9 (the newest in the main Gentoo repo at present) requires dev-libs/protobuf:0/32, and dev-libs/protobuf-3.20.3 has SLOT="0/31", meaning it doesn't satisfy the subslot dependency, whereas dev-libs/protobuf-21.9 has SLOT="0/32", meaning it does satisfy the subslot dependency. In short, protobuf-python 4.x cannot be installed simultaneously with protobuf 3.x. I had to manually force installation of the older protobuf to test, and Portage was then screaming at me that my system was broken. Of course, it's entirely possible that Gentoo is imposing stricter requirements than upstream actually does.

I was completely confused, too, but this seems to be the case. The compiler version is in fact now date-based, there is no PROTOC v 4...

The Python implementation is in the same GitHub repo as the C++ library. Versioning of the C++ library has dropped the leading "4.", while versioning of the Python module has not, but otherwise protobuf x.y corresponds to protobuf-python 4.x.y.

It may in fact be possible to install non-corresponding versions of protobuf and protobuf-python, but then I'm not sure if the Python module's C++ runtime backend will necessarily work correctly. 😕

Ok, then Gentoo seems like the tightest restriction, so let's follow that. We will match the two versions precisely: we can probably do it by parsing the Python protobuf version and converting it? Let me see what that looks like tomorrow...
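
One possible shape of that conversion, sketched with illustrative variable names (it leans on the protobuf-python 4.x.y ↔ protoc x.y correspondence described above and is not the change that actually landed):

```make
# Illustrative sketch: derive the protoc release to install from the version
# of the installed Python protobuf module by stripping the leading "4.".
PY_PROTOBUF_VERSION := $(shell python3 -c 'import google.protobuf; print(google.protobuf.__version__)')
PROTOC_VERSION := $(patsubst 4.%,%,$(PY_PROTOBUF_VERSION))
# e.g. PY_PROTOBUF_VERSION = 4.22.3  ->  PROTOC_VERSION = 22.3
```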

@rustyrussell rustyrussell force-pushed the Makefile-grouped-targets branch from 428a26a to 04d4b25 Compare June 19, 2023 02:32
@rustyrussell
Contributor

... then rebased on master which also played with Python deps....

@rustyrussell
Contributor

... and rebased again on master, now it has some CI fixes...

@rustyrussell rustyrussell force-pushed the Makefile-grouped-targets branch from 04d4b25 to 84b41a7 Compare June 20, 2023 10:43
@rustyrussell
Contributor

More flakes, see #6356

@rustyrussell rustyrussell force-pushed the Makefile-grouped-targets branch from 84b41a7 to 2c648b1 Compare June 21, 2023 04:29
…/ in distclean, and rebuild by default.

And add contrib/pyln-testing/pyln/testing/grpc2py.py since we didn't previously know
how to build it!

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>

And re-ran poetry update which updated the lock file ofc.

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
@rustyrussell rustyrussell force-pushed the Makefile-grouped-targets branch from 2c648b1 to 17f886e Compare June 22, 2023 03:54
@rustyrussell rustyrussell merged commit f0a9f11 into ElementsProject:master Jun 23, 2023
@whitslack whitslack deleted the Makefile-grouped-targets branch June 23, 2023 19:07