-
Notifications
You must be signed in to change notification settings - Fork 708
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Intermittent Plugin Failed to Start Error in Azure Pipelines #192
Comments
One note, I saw this happen with Nuget Authenticate Task Version: 0.167.2 as well. |
Not sure if this is the same issue: https://developercommunity.visualstudio.com/content/problem/1014284/nuget-restore-fails-due-to-credentialprovidermicro.html |
Thanks for the pointer Luke! It looks like that issue is a little different since they are not getting the That being said, the logging guidelines in there are helpful. I have turned on those same logs for our pipeline to see if that sheds any more light on the issue. |
Since the plugin failed to start "within 5.467 seconds", as a workaround, you could try use the following NuGet environment variables to extend the timeout and see if you stop seeing the issue. I believe the default for these is 5 seconds.
I think the dotnet version you're using has all the recent fixes the NuGet team worked on in this area. If you run FYI @nkolev92, @rrelyea from the NuGet team, this is a new "Plugin failed to start" issue with dotnet 3.1.301. |
Logs would help us investigate. We haven't seen too much of this recently. |
Here are some logs! They are very short so I can actually just copy/paste them here artifacts-credprovider.log
NuGet_PluginLogFor_dotnet_8d812dbbcf0b368.log
NuGet_PluginLogFor_dotnet_8d812dbbfcf7e15.log
|
The plugin seems to never receive the handshake request, that's weird :/ I don't recall seeing these symptoms before. @satbai anything ring a bell? |
A thing to consider on the plugin end would be updating to newer versions of NuGet.Protocol. |
Thanks! I'll make the change to update NuGet.Protocol - should be able to do it later today. In the meantime, maybe increasing the handshake timeout with NUGET_PLUGIN_HANDSHAKE_TIMEOUT_IN_SECONDS will help? |
Yeah @schultetwin1 try that out. Looking at the symptoms, I'm not sure how much/if that'll help, but worth a shot for sure :) Consider collecting the logs even with an increased timeout, just in case it fails. |
One other interesting tidbit. It always happens with just one source (for those internal to Microsoft, you maybe able to view that source here: https://dev.azure.com/Microsoft/Analog/_packaging?_a=feed&feed=Microsoft.SRaaS) |
Scratch that, I spoke too soon. We are seeing this error with other sources as well. All ADO sources though (which are the only sources we authenticate with) |
Just FYI, I was able to see the version of tools on the build machines. It looks like it's dotnet nuget = NuGet Command Line 5.6.0.5 So maybe it's the fact we are using Nuget 5.6? |
I've seen the same error on 5.5.1 (and your original report says 5.5.1.) My belief is that it's caching related, or related to some state in the Azure Devops nuget auth itself related to a specific account, rather than being a client problem. So I'm also suspicious of most of the "fixes." I'm concerned most of the "fixes" happened to bust the cache or something. So though the error did stop in direct response to the change, it didn't fix the underlying issue. |
@lukeschlather thanks for the pointer on the original report. I edited that. I thought we were using 5.5.1 because that's what the nuget tool installer was installing, but it looks like the DotNet tool installer then installs a newer version of nuget.exe later.
I'm not sure I follow. What is "the change" in this sentence? I have not said the error stopped. |
If you're using the The earlier reports I have read all claim that it's fixed in the latest version of nuget, so it seems that there have been a series of changes focused on fixing this issue and it's I'm a little suspicious if there is some underlying stateful/caching bug that those fixes happened to suppress but did not fix. |
FWIW I tried reproducing on 5.6 with my setup and it ran 3 times without incident in a way that did reproduce the issue on 5.5.1 a month and a half ago. |
If you are having any sort of "task was cancelled" issues, the logs would help us investigate. https://github.com/NuGet/Home/wiki/Plugin-Diagnostic-Logging. |
The experience I had was when I added all the diagnostic logging and debugging info asked for, the problem magically disappeared. So my pipeline just has the debug logging enabled now. There's a private Microsoft-only comment on the above issue that has my full logs attached to them. However they were uninteresting and I can't reproduce the issue anymore. |
New version of the cred provider has been released with updated NuGet.Protocol version |
Awesome, thank you @satbai I've installed the new release on my machine to test. And then we'll need to get this new release into the NuGetAuthenticate plugin, since that's how we get this bits onto our build machines. |
I'll update NuGetAuthenticate as well, but heads up, the release process for that will take a little longer. |
No worries, I understand how that goes. Thank you for pushing this through so quickly. I'll let you know if we continue to see errors (both with the 60 second delay, which change I just got in and once them new NuGetAuthenticate task is released) |
So about a week after my change to increase timeouts (setting |
Has the fix for NuGetAuthenticate rolled out? I just saw this yesterday:
|
This is what I have in my ADO pipeline:
I had previously added this command after nuget authenticate per the discussion in the developercommunity thread I linked earlier:
^ This command had completely eliminated the timeouts; however last week the command broke and just started throwing errors so I had to remove it to unblock my pipeline, but now I'm back to seeing these intermittent errors. |
The update to the NuGet Authenticate task has not been rolled out yet. Did you try increasing the timeout? |
I have not. Are you changing the default timeout? |
Is there any way Azure Devops can instrument the task so that this is detectable on the nuget service end? If there are latency issues with the ADO nuget service that seems like something they should be able to detect and fix. |
As a paying customer I would expect them to treat it as an outage. |
I am also seeing this issue -- Nuget v.5.6.0, Cred Provider 0.1.23 (Nuget Authenticate step will not install 0.1.24 even with reinstall option set). I have set the env variables on my build VMs -- will report back if I see this issue again. |
Seeing this issue very often with dotnet tool install commands on Azure DevOps hosted agents. I will try setting the |
@satbai - Just coming back with a month update. We added the following two env vars to our builds:
This did seem to greatly reduce the number of "plugin failed to start" errors but it did not get rid of them. In the last month we've seen this error happen twice (or hundreds of builds). I've downloaded the nuget and credprovider logs and uploaded them here. Here is the output from
I'd like to suggest we re-open this issue, but I'll leave that up to you and the rest of your team. Thanks! Let me know if you need anymore info. |
FYI @samuelkoppes @herenhuang @edgarrs @phil-hodgson for triage regarding the latest update. |
Just a FYI, this is still happening though it's rare. |
I am also still seeing this issue, although infrequently. Happened on 9/14/20 a few times in a row, then fine since. I'll attempt the work around as a band aid long term, but hoping someone could shed some light on an actual fix. Thanks. |
We've been seeing intermittent errors doing a NuGet restore due to the credential provider timing out. Reccomendations online are to increase the default 5s timeout to something longer. See - https://developercommunity.visualstudio.com/content/problem/1014284/nuget-restore-fails-due-to-credentialprovidermicro.html - microsoft/artifacts-credprovider#192 Hopefully fixes microsoft#6131
We've been seeing intermittent errors doing a NuGet restore due to the credential provider timing out. Reccomendations online are to increase the default 5s timeout to something longer. See - https://developercommunity.visualstudio.com/content/problem/1014284/nuget-restore-fails-due-to-credentialprovidermicro.html - microsoft/artifacts-credprovider#192 Hopefully fixes #6131
Just chiming in that I've been seeing this issue as well recently. I tried adding Edit: Extending the timeout seems to have resolved the issue for me, but I'm worried I've just kicked the can a few months into the future. |
I have a vanilla azure devops pipeline with only the following two steps, running on a hosted agent (windows-latest) and I see this issue occur on the second step (dotnet) after nuget authenticate step runs successfully.
and the error:
Adding the following variables to the pipeline fixes it.
Is this the intended usage? |
One of our pipelines is having this issue again, intermittently. |
Same here, seeing a spike in these errors the last few hours, currently we have not implemented any of the workarounds detailed in this thread |
I created a problem on the ADO developer community https://developercommunity2.visualstudio.com/t/intermittent-nuget-authentication-failures/1336781?from=email |
|
getting the same issue, added the 30sec timeout. seems to improve but not completely fix. |
Restarting the self-hosted agents seemed to work for us. |
I'm getting this on windows-2022 and DotNetCoreCLI@2. Supposedly there was a fix, but I'm still getting an error under 6 seconds. |
We are running into intermittent "plugin failed to start" issues when downloading packages using dotnet in our CI pipeline on Azure DevOps.
Versions
Windows 10 Build Agent
Nuget Authenticate Task Version: 0.167.1
Nuget Version: 5.6.0.6591
.NET Core SDK Version: 3.1.301
Here is a snippet of the error
Here is the original command that was run
The weird thing is, we run this pipeline all the time and this error only pops up every once in awhile. One item of note: This pipeline is building using a build system (ninja) that spins off multiple processes. These processes could be dotnet builds or nuget pulls of packages. My first thought is they might be conflicting with each other?
The text was updated successfully, but these errors were encountered: