Source Index storage authentication failure #2974
@directhex was working on this.
I'm going to have to give a big ol' shrug emoji here. According to the Azure activity log, no config changes have been made to the storage account in the last week. Both earlier and later builds were fine, e.g. https://dev.azure.com/dnceng/internal/_build/results?buildId=2463257&view=logs&j=316d5c15-0c50-544e-8051-e6b14a1ab674&t=657dbd0b-e93c-5e59-4ba8-77889f010efc

Looking at it, the common element seems to be that the failed builds take longer than 30 seconds to do the upload. Possibly we're only getting an incredibly short lease, and it's expiring? https://dev.azure.com/dnceng/internal/_build/results?buildId=2462759&view=logs&j=316d5c15-0c50-544e-8051-e6b14a1ab674&t=657dbd0b-e93c-5e59-4ba8-77889f010efc took 36 s, while the "good" builds take somewhere in the low twenties of seconds.
Should we set BuildRetry to true for this? If a retry is likely to succeed, it would lessen the impact.
IMHO adding either a retry or a "mark failure as success so the build as a whole succeeds" is fine. The absolute worst case is stale indices if an index upload fails.
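A minimal sketch of the "retry, then soft-fail" idea, written in Python purely for illustration (the real upload tool is .NET). `with_retry`, its `soft_fail` flag, and the 2 s / 4 s backoff are hypothetical and are not part of the BuildRetry setting or any Arcade API. The sketch assumes the upload is a single callable that raises on failure.

```python
import logging
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def with_retry(
    action: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 2.0,
    soft_fail: bool = False,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[T]:
    """Hypothetical helper: run `action`, retrying with exponential backoff.

    If every attempt fails and soft_fail is True, log a warning and return None
    instead of raising, so the build as a whole can still succeed
    (worst case: stale indices).
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return action()
        except Exception as err:  # a real tool should catch only auth/transport errors
            last_error = err
            if attempt < attempts - 1:
                # With the default base_delay: wait 2 s, then 4 s, ... between attempts.
                sleep(base_delay * (2 ** attempt))
    if soft_fail:
        logging.warning("Index upload failed after %d attempts: %s", attempts, last_error)
        return None
    assert last_error is not None
    raise last_error
```

Usage would look like `with_retry(lambda: upload_index(path), soft_fail=True)`, where `upload_index` is a placeholder for the actual upload step. Soft-failing only makes sense here because a failed index upload leaves old indices in place rather than corrupting anything.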
Sounds like a solid theory. @directhex, maybe we can refresh our access token right before we upload, after we've tar'ed everything? Alternatively, we could split out the step that creates the tarball so that we can log in (or refresh) separately, right before we upload. Alternatively or additionally, we might be able to speed up the tar'ing phase by replacing SharpZipLib with the APIs @carlossanlop added. I would expect those to be faster, since they use our zlib under the hood, whereas SharpZipLib has a non-vectorized managed implementation.
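A sketch of the "authenticate after tarring" ordering, again in Python purely as an illustration. `AccessToken`, `ensure_fresh`, `package_then_upload`, and the 60-second refresh margin are hypothetical names and values. The sketch assumes the upload credential carries a known expiry time, which has not been confirmed for this storage account. It only shows the ordering: do the slow packaging first, then check or refresh the token, then upload.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class AccessToken:
    """Hypothetical credential with a known absolute expiry time."""
    value: str
    expires_at: float  # seconds, measured on the same clock as `now`


def ensure_fresh(
    token: Optional[AccessToken],
    acquire: Callable[[], AccessToken],
    margin: float = 60.0,
    now: Callable[[], float] = time.time,
) -> AccessToken:
    """Return `token` if it stays valid for at least `margin` more seconds, else acquire a new one."""
    if token is None or token.expires_at - now() < margin:
        return acquire()
    return token


def package_then_upload(
    build_archive: Callable[[], bytes],
    acquire: Callable[[], AccessToken],
    upload: Callable[[bytes, AccessToken], None],
    token: Optional[AccessToken] = None,
    now: Callable[[], float] = time.time,
) -> AccessToken:
    """Do the slow tar/compress step first, then authenticate right before uploading."""
    archive = build_archive()                      # the slow step (~36 s in one failing build)
    token = ensure_fresh(token, acquire, now=now)  # check token lifetime only after packaging
    upload(archive, token)
    return token
```

The design choice is simply to move the lifetime check to the last possible moment, so that time spent packaging can no longer eat into the token's validity window.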
Noting here that @directhex is working on a fix for this, but I'm unable to assign this issue to him.
This might accidentally be fixed by dotnet/arcade#14912.
It seems like the hit count here is high enough to warrant taking steps to mitigate it, @directhex. What do you suggest we do: add more logging, change how we authenticate, add retries, or something else?
A shot in the dark here: did we add firewall rules to this storage account's network access? There were a lot of issues with the instructions on how to do that, since they didn't account for all possible IPs. There is also an entire scenario where firewall rules don't work for traffic from within the same region, which could explain why the failure seems flaky.
We did add the firewall rules that were recommended at one point, and yes, I think it's plausible that it's a firewall issue. We don't have functional replacement guidance on network isolation for storage containers yet.
Would it be worth reverting those changes while we wait for the updated guidance and for the NSP features to light up? Whether or not it helps with reliability, it would still give us some data.
OK, I've flipped the switch to "Public network access: Enabled from all networks" on the storage container in Azure.
I guess I can't trust the 24h figure from yesterday, due to the Azure outage.
Well, we're down to 0 hits in the last 7 days, so it seems to have worked?
Build
https://dev.azure.com/dnceng/internal/_build/results?buildId=2462759
Build leg reported
No response
Pull Request
No response
Known issue core information
Fill out the known issue JSON section by following the step-by-step documentation on how to create a known issue.
@dotnet/dnceng
Release Note Category
Release Note Description
Additional information about the issue reported
No response
Known issue validation
Build: 🔎 https://dev.azure.com/dnceng/internal/_build/results?buildId=2462759
Error message validated: [at UploadIndexStage1.Program.Main]
Result validation: ✅ Known issue matched with the provided build.
Validation performed at: 5/30/2024 4:07:44 PM UTC