[vcpkg-ci] Current CI problems #21905
For 2, we hit it every time.
These both look like we have some kind of issue in how osx is talking to Azure Storage; the behavior of vcpkg right now during the "what should I build" analysis is to treat failures to get information about anything as "hash not present, please build it". Robert (@ras0219 / @ras0219-msft) has been investigating off and on.
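(For illustration only: a minimal sketch of that behavior, not the actual vcpkg-tool sources; `query_cache` and `needs_build` are made-up names. Any failure to query the binary cache is folded into "hash not present", so a transient Azure Storage error on the osx agents looks exactly like a cache miss and schedules a rebuild.)

```cpp
#include <iostream>
#include <string>

enum class CacheLookup { Hit, Miss, Error };

// Hypothetical stand-in for the real HTTPS check against the cache backend.
CacheLookup query_cache(const std::string& abi_hash)
{
    (void)abi_hash;
    return CacheLookup::Error; // e.g. the osx agent fails to reach Azure Storage
}

bool needs_build(const std::string& abi_hash)
{
    // A genuine miss and a transport error are indistinguishable here, so a
    // flaky connection looks exactly like an uncached package.
    return query_cache(abi_hash) != CacheLookup::Hit;
}

int main()
{
    std::cout << (needs_build("14ca0bba...") ? "build" : "restore") << '\n';
}
```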
That one's new to me :(.
That's an example where it was recorded as a failure correctly. Do you have an example where it was recorded as a success incorrectly? I observe we don't have a nightly CI run with #21818 in it yet, which is a likely candidate to have broken that one.
Better look twice. E.g. x64-windows-static is green, but:
Again, for long-running CI tasks this might be mitigated by a time-to-live for negative cache results.
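(A sketch of what such a time-to-live could look like; nothing like this exists in vcpkg today, and the 30-minute TTL and names are assumptions. Stale "not cached" answers expire, so a long-running job asks the backend again instead of trusting an hours-old negative result.)

```cpp
#include <chrono>
#include <string>
#include <unordered_map>

struct NegativeCache
{
    using clock = std::chrono::steady_clock;
    std::chrono::minutes ttl{30}; // assumed TTL
    std::unordered_map<std::string, clock::time_point> misses;

    void record_miss(const std::string& abi) { misses[abi] = clock::now(); }

    // True only while the recorded miss is still fresh; stale entries are
    // dropped so the next lookup queries the real backend again.
    bool is_known_miss(const std::string& abi)
    {
        auto it = misses.find(abi);
        if (it == misses.end()) return false;
        if (clock::now() - it->second > ttl)
        {
            misses.erase(it);
            return false;
        }
        return true;
    }
};
```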
Hmm, the CI in #21925 also rebuilds things that are not Qt related (and I only touched qtbase and nothing more), e.g. nghttp2:x64-osx: pass: 14ca0bbacfdc5f57be35dc249ad9d722fa2c1696220e88525ecdd85ea2a22818
This is in the "make the installation plan" step, not the "build the actual things" step.
This is "the world changed" (due to parallel jobs) between making the installation plan (checking cache) and actually installing the package (not checking ATM). |
qtbase triggers a lot of dependent ports, and one of them pulls nghttp2 into the installation plan.
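(Sketch of a possible mitigation for the "world changed" case above, not current vcpkg behavior; `try_restore`/`build_package`/`install_one` are hypothetical names. Re-querying the binary cache right before building would let a package uploaded by a parallel job after planning be restored instead of rebuilt.)

```cpp
#include <iostream>
#include <string>

// Hypothetical stand-ins for the real restore and build steps.
bool try_restore(const std::string& abi_hash) { (void)abi_hash; return false; }
void build_package(const std::string& spec) { std::cout << "building " << spec << '\n'; }

void install_one(const std::string& spec, const std::string& abi_hash)
{
    // The plan said "build", but the world may have changed since planning.
    if (try_restore(abi_hash))
        return; // another agent finished and uploaded this ABI in the meantime
    build_package(spec);
}

int main() { install_one("nghttp2:x64-osx", "14ca0bba..."); }
```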
Hmmmm... could the attempt to remove hashes that are in |
The thing that's really strange is that x64-windows reported correctly here.
I don't think so. The installation plan contains everything that is to be installed, but not why (unfortunately). So when a port fails, the reason for its installation doesn't make a difference.
Maybe this is the only native build for Windows?
IIUC
AHHHH yes, that's the trigger. This suggests even more strongly that we need to get out of this postprocessing business.
I also have seen this when I run the
But it seems that I have this problem even with Log
Here, for example, the fontconfig port is built twice. The output:
The binary cache sometimes fails because the unzipping randomly fails with:
When using
Another CI issue: uploading artifacts for some ports fails regularly but goes more or less unnoticed, in particular for port llvm:
Yes: almost 2 hours of CI time.
It is port vcpkg-ci-opencv: it pulls in llvm via halide, but only for x64-windows.
https://docs.microsoft.com/en-us/troubleshoot/azure/general/request-body-large indicates it should fail for request bodies of 4 MB or larger, but then https://docs.microsoft.com/en-us/azure/storage/blobs/scalability-targets says:
Implying that not only is >4MB supported, it's encouraged to get higher performance? From https://docs.microsoft.com/en-us/rest/api/storageservices/put-blob, it looks like the maximum upload size is actually based on the requested service version:
So perhaps this can be fixed as simply as passing the right version string to the REST API. Otherwise, we may need to look at moving to azcopy instead of curl for uploads.
We already pass a high version value: https://github.com/microsoft/vcpkg-tool/blob/018d5684e78a856aee67789970633b23852836c9/src/vcpkg/base/downloads.cpp#L594-L598
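(For readers following along: an illustrative libcurl sketch only; the actual implementation is in the downloads.cpp lines linked above and invokes curl differently. The point is that the service version is selected by the `x-ms-version` request header, and per the Put Blob documentation the ~5000 MiB single-request limit only applies with version 2019-12-12 or later; older versions cap much lower.)

```cpp
#include <cstdio>
#include <curl/curl.h>

void put_blob(const char* sas_url, const char* archive_path)
{
    CURL* curl = curl_easy_init();
    FILE* f = std::fopen(archive_path, "rb");
    if (!curl || !f) return; // error handling trimmed for brevity

    curl_slist* headers = nullptr;
    headers = curl_slist_append(headers, "x-ms-version: 2019-12-12"); // selects the service version
    headers = curl_slist_append(headers, "x-ms-blob-type: BlockBlob");

    curl_easy_setopt(curl, CURLOPT_URL, sas_url);
    curl_easy_setopt(curl, CURLOPT_UPLOAD, 1L);          // issue an HTTP PUT
    curl_easy_setopt(curl, CURLOPT_READDATA, f);         // stream the archive body
    curl_easy_setopt(curl, CURLOPT_HTTPHEADER, headers);
    // Real code must also set CURLOPT_INFILESIZE_LARGE to the archive size.

    curl_easy_perform(curl);

    curl_slist_free_all(headers);
    curl_easy_cleanup(curl);
    std::fclose(f);
}
```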
Or the actual Azure SDKs.
Can we perhaps skip vcpkg-ci-opencv:x64-windows via the CI baseline until the artifact upload is fixed?
I observed a similar issue with x64-linux: all other CI runs finished in < 10 minutes, but the Linux CI took more than 2 hours, although previous CI runs on the same PR on Linux also took < 10 minutes.
Looks like I created a PR which reliably crashes on all triplets, even after updates: #23668
Not a crash: now I see the reason. Errors on stdout are barely visible. Why can't the tool use stderr for errors?
Oops, suddenly cache downloads fail for all agents/triplets, and uploads fail with 403 Forbidden: https://dev.azure.com/vcpkg/public/_build?definitionId=27&_a=summary
SAS token expiry strikes again.
OK, should be fixed 😅
Since the llvm:x64-windows issue is coming back:
Right, we certainly never expected a single package >5000 MB. Given how intermittent that problem has been, though, I doubt hitting that limit is the actual cause of anything.
You mean no impact?
I don't mean "that llvm is accidentally building has no impact", I mean "the 5000 MB limit is not likely to be the cause of llvm accidentally building" (because if it were going over a size limit like that, one would expect it to be always broken rather than intermittently broken). My total-shot-in-the-dark guess is that we have multiple agents trying to upload that cache entry at the same time, and Azure Storage is returning an error to the effect of "there are conflicting edits to this thing" or similar, and that is somehow causing us to rebuild the thing, which makes the problem continue ... which would explain the problem going away after the build lab has a quiescent period.
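(Matching that shot-in-the-dark guess, a sketch of what a mitigation could look like; this is not actual vcpkg-tool behavior and the function name is invented. If two agents race to upload the same cache entry, a conflict-style response could be treated as "someone else already stored this ABI" rather than as an upload failure that keeps the rebuild cycle going.)

```cpp
bool upload_succeeded_or_already_present(int http_status)
{
    switch (http_status)
    {
        case 201: return true;  // Created: our upload won the race
        case 409:               // Conflict (e.g. BlobAlreadyExists)
        case 412:               // Precondition Failed from a conditional put
            return true;        // another agent uploaded the same entry; fine
        default: return false;  // genuine error: report it, but don't loop
    }
}
```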
I think that there is no issue with concurrency here (or it would be seen with more ports), and the problem is permanent, not intermittent. What makes the
The llvm rebuilds in CI are often canceled by another push. That's why they are not counted properly by AZP analytics. Another downside of cancellation is that AZP retains no build logs. Still, there are interesting figures from AZP test analytics, with some filtering, 14 days: With proper caching, ATM this affects the openimageio PR, #23918. Retained builds e.g.
FTR the llvm upload error now produces this message in CI:
Ping.
The typo is still present, as is the cache upload problem for llvm, at least when it is needed to build mesa:x64-windows-static-md.
I have a quick fix for one of those problems :P microsoft/vcpkg-tool#825
With rebuilding all uncached ports on
@BillyONeal CI artifact caching has been broken since Friday.
Yeah, I also encountered this problem. So the options are:
Personally I would switch to azcopy, but maybe only for binaries larger than 5 GB, so no extra tool is needed in 99% of the uploads.
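(A sketch of that proposal; nothing like this exists in vcpkg, and the names and exact threshold are assumptions. Keep the current single Put Blob upload for small archives and only fall back to azcopy for entries above the single-request limit.)

```cpp
#include <cstdint>
#include <filesystem>
#include <system_error>

constexpr std::uintmax_t single_put_limit = 5000ull * 1024 * 1024; // ~5000 MiB

enum class UploadMethod { CurlPutBlob, AzCopy };

UploadMethod choose_upload_method(const std::filesystem::path& archive)
{
    std::error_code ec;
    const auto size = std::filesystem::file_size(archive, ec);
    if (ec) return UploadMethod::CurlPutBlob; // fall back; let the upload itself report errors
    return size > single_put_limit ? UploadMethod::AzCopy : UploadMethod::CurlPutBlob;
}
```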
A collection of issues is currently occurring in vcpkg CI:
The ABI hash of the unexpectedly rebuilt packages is unchanged. CI actually rebuilds some dependencies, but doesn't rebuild the ports that depend on them. Example: it rebuilds (unchanged) szip, giflib, and some (!) boost ports, but doesn't rebuild (unchanged) hdf5 and gdal, which depend on the unexpectedly rebuilt ports.
~~osx~~ general: vcpkg ci occasionally crashes early (example). Originally only an osx issue; now also on windows, [vcpkg-ci] Current CI problems #21905 (comment).
(llvm:x64-windows mitigated by [vcpkg-ci-llvm] Reduce llvm artifact to cacheable size #23896.)