Bot missing recent LLVM releases #2531

Open

jakirkham opened this issue May 4, 2024 · 25 comments

@jakirkham
Contributor

It appears the bot started missing recent LLVM releases

The last bot release PR was 18.1.3:

However the last two needed to be handled manually:

That said, the bot does appear to have detected the releases:

So maybe there is an issue cropping up in the next step

@h-vetinari
Contributor

Thanks John! Other feedstocks that build from the exact same tag & sources had varying degrees of success.

| feedstock | 18.1.3 | 18.1.4 | 18.1.5 |
|---|---|---|---|
| llvmdev | bot | manual | manual |
| clangdev | bot | manual | manual |
| compiler-rt | bot | manual | no PR ❌ |
| openmp | bot | bot | no PR ❌ |
| lld | bot | manual | no PR ❌ |
| flang | bot | bot | bot |
| lldb | bot | bot | no PR ❌ |
| mlir | bot | manual | no PR ❌ |
| mlir-python-bindings | bot | bot | no PR ❌ |

@beckermr
Contributor

beckermr commented May 4, 2024

I think the bot has become sentient and only opens PRs now after we make issues noting they are not there! 😱

J/k but indeed the behavior is puzzling.

Typically the bot will try to make a version PR three times. If those three attempts fail, the PR is put in the backlog. PRs in the backlog are retried at random after any newly found versions are tried. So it could be that they were in the backlog and the bot finally cleared them.
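Very roughly, that logic looks something like the sketch below (illustrative only; the names `make_version_pr` and `MAX_ATTEMPTS` are made up for this comment and are not the actual cf-scripts code):

```python
import random

MAX_ATTEMPTS = 3  # hypothetical constant matching the "three attempts" described above


def run_version_pass(new_versions, backlog, attempts, make_version_pr):
    """One illustrative bot pass: fresh versions first, then random backlog retries.

    new_versions: iterable of (feedstock, version) keys found this pass
    backlog: set of keys that already exhausted their attempts
    attempts: dict mapping key -> number of attempts so far
    make_version_pr: callable returning True when the PR was opened successfully
    """
    for key in new_versions:
        attempts[key] = attempts.get(key, 0) + 1
        if make_version_pr(key):
            continue
        if attempts[key] >= MAX_ATTEMPTS:
            backlog.add(key)  # parked; only retried opportunistically from now on

    # Backlog entries are retried in random order after the newly found versions.
    for key in random.sample(sorted(backlog), k=len(backlog)):
        if make_version_pr(key):
            backlog.discard(key)
```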

@h-vetinari
Contributor

In the case of 18.1.4, the PRs didn't get opened over a period of 2 weeks, which is pretty long. I think the issue is that it failed on all of the first three attempts. The tag with the sources is there for all feedstocks equally. I guess one possible explanation is that upstream created the tags but left a longer-than-usual gap before uploading the llvm-project-{{ version }}.src.tar.xz sources, long enough for the updates to fall into the "failed three times" category?

Still, that doesn't really explain why almost all of the missing PRs got opened right after we started discussing it here - spooky! 😅

@jakirkham
Contributor Author

Lol 😂

Here are some other random guesses:

Did the fix you made yesterday, Matt, potentially have an effect on LLVM and friends?

Perhaps another possibility is some dependency changes over time.

Another thing of interest might be memory pressure. The LLVM (and Arrow) recipes are a bit more complicated, so they may be using more resources than the usual resource-light CI jobs have. I recognize there have been improvements made in various places (including conda-build), though I don't know which of those fixes are out in releases. If we see more evidence of this, it might be worth profiling.

Lastly, I recall there were some issues in the bot ~2 weeks ago that got cleared out. IIRC the first of the missed LLVM version updates was around then.

@xhochy
Member

xhochy commented May 6, 2024

@ytausch Can you also look at this? This is an issue close to the tooling you are working on.

@jakirkham
Contributor Author

Curious how things are looking a month later. Ok if we don't know. Just wanted to check back in 🙂

@ytausch
Contributor

ytausch commented May 31, 2024

I am currently working on decoupling some of the bot's code so that not only the version check but also the migration itself (which seems to be what is failing here) can be run locally with debugging enabled. That will provide a sustainable solution for problems like this one.

For that reason, I have not prioritized looking into this manually so far. Let me know if you see this differently.

@h-vetinari
Contributor

> Curious how things are looking a month later. Ok if we don't know. Just wanted to check back in 🙂

LLVM 18.1.6 worked fine (bot opened all relevant PRs); LLVM 18.1.7 got tagged >24h ago, but the official release was only ~7h ago.

Since we're generally relying on llvm-project-{{ 18.1.7 }}.src.tar.xz (which isn't generated by GitHub but uploaded by the release manager), it's possible/likely that the bot started looking, once the tag appeared, for a file that wasn't there yet. Indeed, the status page lists the llvmdev update as failed with:

3.00 attempts - bot error

I guess this is somewhat unavoidable as long as upstream has a long enough gap between tagging and uploading the tarballs. The solutions I see are:

  • Retry more often (expensive for the bot infra)
  • Special-case LLVM feedstocks (probably not worth it)
  • Switch the LLVM recipes to use the github sources directly -- that way there cannot be a race condition.

I think the last approach might actually be the sanest one.
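For concreteness, these are (to my understanding) the two source URL shapes in question; the version and patterns below are examples to illustrate the race and should be double-checked against the actual feedstock recipes:

```python
version = "18.1.7"          # example version only
tag = f"llvmorg-{version}"  # LLVM's release tag naming scheme

# Tarball uploaded by the LLVM release manager. It appears some time *after* the
# tag is pushed, which is the window the bot keeps tripping over:
release_tarball = (
    f"https://github.com/llvm/llvm-project/releases/download/{tag}/"
    f"llvm-project-{version}.src.tar.xz"
)

# Auto-generated archive for the tag. It is downloadable as soon as the tag
# exists, so there is no gap between "version detected" and "source available":
github_archive = f"https://github.com/llvm/llvm-project/archive/refs/tags/{tag}.tar.gz"
```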

@jakirkham
Contributor Author

Given every project needs to wait in 6hr increments for updates, this seems like reasonable behavior from the bot so far.

If there are ways to check for version updates more frequently than 6hrs, that seems like the best path for improvement (and is not specific to LLVM)

@h-vetinari
Contributor

> If there are ways to check for version updates more frequently than 6hrs, that seems like the best path for improvement

I don't see how that changes anything - the bot will just go into a "max failure" state faster. It's the recovery period after having hit max retries that seems to take 2-3 weeks (which presumably is the thing that would be effective to reduce).

In any case, switching to github-generated sources should 100% fix this problem for LLVM (and we don't even have submodules to deal with, so no benefit to using the upstream tarballs).

@jakirkham
Contributor Author

The difference would be that we would not need to wait another 6hrs once the source is available. We might wait 1hr or perhaps less. It also depends on whether we can move to something event-driven (as opposed to scraping-based).

However, the downside with GitHub-generated sources is that they are dynamically generated on demand, so their checksums can change between retrievals.
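To make the concern concrete, this is how one could spot such a change: hash the same auto-generated archive on two retrievals and compare (example URL; in practice the reported mismatches show up between retrievals that are days or weeks apart, not back to back):

```python
import hashlib
import urllib.request

# Example auto-generated archive; any tag archive would do.
URL = "https://github.com/llvm/llvm-project/archive/refs/tags/llvmorg-18.1.7.tar.gz"


def sha256_of(url: str) -> str:
    """Stream the archive and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with urllib.request.urlopen(url, timeout=60) as response:
        for chunk in iter(lambda: response.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


first, second = sha256_of(URL), sha256_of(URL)
print("stable" if first == second else f"hash changed: {first} != {second}")
```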

@h-vetinari
Contributor

> The difference would be that we would not need to wait another 6hrs once the source is available.

The point is we cannot influence the delay between tag creation and when the tarballs are uploaded; this may well be 24h, so retrying more often in that time has no use whatsoever. The only option would be to distinguish somehow between tag and tarball availability, and not count it as a failure if the tag is there but the tarball isn't. But that's just "more retries" by another name.
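That distinction could look roughly like this (a sketch only, using the public GitHub refs API plus a HEAD probe; how the real bot tracks attempts is a separate question):

```python
import urllib.error
import urllib.request


def tag_exists(repo: str, tag: str) -> bool:
    """The tag ref is queryable as soon as upstream pushes the tag."""
    # Unauthenticated requests are rate-limited; fine for an illustration.
    url = f"https://api.github.com/repos/{repo}/git/ref/tags/{tag}"
    try:
        with urllib.request.urlopen(url, timeout=30):
            return True
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise


def tarball_exists(url: str) -> bool:
    """The release asset only exists once the release manager has uploaded it."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=30):
            return True
    except urllib.error.HTTPError:
        return False


def classify(repo: str, tag: str, tarball_url: str) -> str:
    if not tag_exists(repo, tag):
        return "no new version"
    if not tarball_exists(tarball_url):
        return "tag only: defer, don't count this as a failed attempt"
    return "ready: open the version PR"
```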

> However, the downside with GitHub-generated sources is that they are dynamically generated on demand, so their checksums can change between retrievals.

That basically never happens because everyone and their dog depends on them being stable. 😅

@beckermr
Contributor

beckermr commented Jun 7, 2024

You may be able to restrict how the bot searches for versions so that it doesn't find the tag before the tarball is uploaded. I think you'd want to only have it look for URLs and not use GitHub's RSS feed.
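As a toy version of that idea (the candidate generation and helpers here are assumptions, not the bot's real version sources): instead of reading the release feed, probe the tarball URL for the next candidate version, so a release only counts as found once the file is actually downloadable.

```python
def bump_patch(version: str) -> str:
    """Illustrative candidate generation: only consider the next patch release."""
    major, minor, patch = version.split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


def find_next_version(current: str, url_template: str, url_is_live):
    """Return the next patch release whose source tarball already exists, else None."""
    candidate = bump_patch(current)
    url = url_template.format(version=candidate)
    return candidate if url_is_live(url) else None


# Example wiring; url_is_live could be the HEAD probe from the sketch further up.
LLVM_TEMPLATE = (
    "https://github.com/llvm/llvm-project/releases/download/"
    "llvmorg-{version}/llvm-project-{version}.src.tar.xz"
)
```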

@beckermr
Contributor

beckermr commented Jun 7, 2024

Also, I very much doubt it is feasible that the bot could respond to release events from projects in an event-driven system.

@jakirkham
Contributor Author

> However, the downside with GitHub-generated sources is that they are dynamically generated on demand, so their checksums can change between retrievals.

> That basically never happens because everyone and their dog depends on them being stable. 😅

On the contrary, this happens quite regularly.

This affected us with the conda-build 24.5.0 release ( conda-forge/conda-build-feedstock#226 ) and conda before that ( conda-forge/conda-feedstock#228 (comment) ). There are well-documented cases elsewhere ( https://github.com/orgs/community/discussions/45830 ). In fact, when I have asked GitHub about the stability of these in the past, they have noted that they generate the artifacts dynamically and run some tests, but that checksums can change (so no guarantees). This issue has been going on for quite some time.

The general movement (even from GitHub) is towards more validation around artifacts (not less). Here is a blog post from GitHub last month on setting up artifact attestations, which provide even more information about published artifacts beyond their being stable (including an associated sha256 checksum). This of course requires stable artifacts produced once (not GitHub autogenerated releases).

Think we should consider carefully how we get our compiler source code and put a preference towards stable artifacts (ideally with more provenance data if possible).
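For what it's worth, "stable artifacts with an associated sha256" boils down to being able to pin and re-verify a digest, along these lines (illustrative sketch; the path and digest are placeholders):

```python
import hashlib
from pathlib import Path

# Placeholder value: the real digest would come from the recipe / upstream release data.
EXPECTED_SHA256 = "0000000000000000000000000000000000000000000000000000000000000000"


def verify_artifact(path: Path, expected_sha256: str) -> bool:
    """Re-hash a downloaded artifact and compare against the pinned digest."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_sha256


# Example: verify_artifact(Path("llvm-project-18.1.7.src.tar.xz"), EXPECTED_SHA256)
```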

@jakirkham
Contributor Author

> Also, I very much doubt it is feasible that the bot could respond to release events from projects in an event-driven system.

We have long wanted an event-driven system ( #54 ), including for version updates ( #54 (comment) ).

Agree this may very well be a more substantial undertaking.

That said, I don't think we should rule out that possibility or the associated discussion simply because of that. Speccing it out would be the first step in creating a shovel-ready project for when someone shows up with the resources and interest to help out.

@h-vetinari
Contributor

> On the contrary, this happens quite regularly.

I'm aware of the cases you mention, and I still don't agree with "regularly". The first time GH changed the default compression level it broke the world (e.g. bazel recipes everywhere), and they reverted.

We're relying on GitHub-generated tarballs in many hundreds of feedstocks, and I can count on one hand the unexplained hash changes I've seen in the last couple of years while working across a similar number of feedstocks.

But even if a spurious change does happen, it is by far a smaller encumbrance than the bot tripping over itself and not opening PRs at all.

@jakirkham
Contributor Author

The frequency is not what is at issue. The unreliability is

For core infrastructure (like compilers), we should know reliably what it is produced from

@h-vetinari
Contributor

h-vetinari commented Jun 7, 2024

> The frequency is not what is at issue. The unreliability is

> For core infrastructure (like compilers), we should know reliably what it is produced from

We do know what it's produced from, i.e. the exact git tag. Whether the hash changes due to compression level or whatever else is completely irrelevant for provenance. Unless you are thinking about a scenario where github gets so compromised that someone can hijack the tarball generation, but that's not a realistic scenario to me (and we'd have much bigger problems then).

If you can solve the problem of the bot not opening PRs, whether through an event-based solution or some workaround in the bot infra, I'll happily switch back to the "official" tarballs (which, BTW, also aren't audited or signed). But it's not an option to have the bot regularly fail to issue PRs for this interrelated stack of feedstocks that are already a handful to maintain even with bot support.

@ytausch
Contributor

ytausch commented Jun 26, 2024

> You may be able to restrict how the bot searches for versions so that it doesn't find the tag before the tarball is uploaded. I think you'd want to only have it look for URLs and not use GitHub's RSS feed.

This would work, yes.

Making this configurable on a per-feedstock basis is probably not too complicated with an additional configuration option in the bot section of conda-forge.yml. However, I am still not really convinced that using the GitHub tarballs is a bad idea, at least for now.

@jakirkham
Contributor Author

Here's an example today of a GitHub autogenerated tarball having its checksum change

xref: conda-forge/cuda-python-feedstock#83 (comment)

@h-vetinari
Contributor

FWIW, since switching to GitHub tags across the LLVM feedstocks, PRs have been opened without problems (and no hashing issues observed either).

@ytausch
Contributor

ytausch commented Jun 27, 2024

> Here's an example today of a GitHub autogenerated tarball having its checksum change
>
> xref: conda-forge/cuda-python-feedstock#83 (comment)

Hmm, it doesn't seem like this should happen, as GitHub did not announce a change like this. Currently, GitHub-generated source archives should be stable, and they intend to announce any changes to this with six months' notice.

The checksum change in your example has another cause; I will comment on it there.

Edit: Oops, you seem to be right. It probably only happens very rarely that the hashes differ?

@ytausch
Contributor

ytausch commented Jun 27, 2024

> Making this configurable on a per-feedstock basis is probably not too complicated with an additional configuration option in the bot section of conda-forge.yml. However, I am still not really convinced that using the GitHub tarballs is a bad idea, at least for now.

I just found out this feature already exists: `bot.version_updates.sources`

Will create PRs for the LLVM repos.
