Improve retry logic for downloading tasks #3857

DenisRumyantsev
Contributor

DenisRumyantsev commented Jun 3, 2022

This PR improves logging for downloading tasks during job initialization.

Related issue

Agent logs example:

[2022-06-06 13:03:37Z INFO TaskManager] The 'CodeQL3000Init' task downloading started.
[2022-06-06 13:03:39Z ERR  TaskManager] Fail to download task 'c450a110-caea-4ea9-8299-297eecc70633 (CodeQL3000Init/0.1.76)' -- Attempt: 1
[2022-06-06 13:03:40Z ERR  TaskManager] System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
 ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
<Exception Stack Trace>

[2022-06-06 13:03:40Z INFO TaskManager] Zip file 'D:\agent\_cloud\x2\_work\_tasks\_temp_e15c1cd9-ca4d-4194-9b14-4d2a5794d15a\46becf97-219c-4b9b-9006-1e84fb75a9d1.zip' exists; its size in bytes: 5304320
. . .

[2022-06-06 13:04:04Z INFO TaskManager] The 'CodeQL3000Init' task downloading started.
[2022-06-06 13:04:07Z ERR  TaskManager] Fail to download task 'c450a110-caea-4ea9-8299-297eecc70633 (CodeQL3000Init/0.1.76)' -- Attempt: 2
[2022-06-06 13:04:07Z ERR  TaskManager] System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
 ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
<Exception Stack Trace>

[2022-06-06 13:04:07Z INFO TaskManager] Zip file 'D:\agent\_cloud\x2\_work\_tasks\_temp_e15c1cd9-ca4d-4194-9b14-4d2a5794d15a\1922b180-26dc-4168-bc62-91419a1f4900.zip' exists; its size in bytes: 294912
. . .

[2022-06-06 13:04:23Z INFO TaskManager] The 'CodeQL3000Init' task downloading started.
[2022-06-06 13:04:25Z ERR  TaskManager] Fail to download task 'c450a110-caea-4ea9-8299-297eecc70633 (CodeQL3000Init/0.1.76)' -- Attempt: 3
[2022-06-06 13:04:25Z ERR  TaskManager] System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
 ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
<Exception Stack Trace>

[2022-06-06 13:04:25Z INFO TaskManager] Zip file 'D:\agent\_cloud\x2\_work\_tasks\_temp_e15c1cd9-ca4d-4194-9b14-4d2a5794d15a\829b4b19-338d-4472-9961-6dc98c84b01f.zip' exists; its size in bytes: 512000
[2022-06-06 13:04:25Z INFO TaskManager] Retry limit to download the 'CodeQL3000Init' task reached.
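
The "Zip file ... exists; its size in bytes" entries above record how much of the archive reached disk before each attempt was canceled. A minimal sketch of that kind of check follows, written in Python for illustration only (the agent itself is C#). The function name `describe_partial_zip` and the "does not exist" message are hypothetical and not taken from the agent's source.

```python
import os
import tempfile


def describe_partial_zip(zip_path: str) -> str:
    """Build a log line describing what a failed download left on disk.

    Hypothetical helper: the "exists" wording mirrors the agent log above,
    the "does not exist" wording is an assumption made for this sketch.
    """
    if os.path.exists(zip_path):
        size = os.path.getsize(zip_path)
        return f"Zip file '{zip_path}' exists; its size in bytes: {size}"
    return f"Zip file '{zip_path}' does not exist"


if __name__ == "__main__":
    # Simulate an interrupted download that left 1024 bytes behind.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "partial.zip")
        with open(path, "wb") as f:
            f.write(b"\0" * 1024)
        print(describe_partial_zip(path))
```

Logging the size on every failed attempt makes it visible that each retry starts a fresh file and that the amount transferred before the cancellation varies between attempts, as the three different sizes in the log show.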

Pipeline logs example:

Set build variables.
Download all required tasks.
Downloading task: CodeQL3000Init (0.1.76)
##[warning]Task 'CodeQL3000Init' didn't finish download within 3 seconds.
##[warning]Back off 24.052 seconds before retry.
##[warning]Task 'CodeQL3000Init' didn't finish download within 3 seconds.
##[warning]Back off 15.567 seconds before retry.
##[warning]Task 'CodeQL3000Init' didn't finish download within 3 seconds.
##[error]The operation was canceled.
##[debug]System.Threading.Tasks.TaskCanceledException: The operation was canceled.
 ---> System.IO.IOException: Unable to read data from the transport connection: The I/O operation has been aborted because of either a thread exit or an application request..
 ---> System.Net.Sockets.SocketException (995): The I/O operation has been aborted because of either a thread exit or an application request.
<Exception Stack Trace>
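
The pipeline log above shows the pattern: each attempt gets a fixed timeout, a failed attempt is followed by a randomized back-off, and the last failure is surfaced as an error. A rough sketch of that pattern follows, in Python for illustration only. The function name, its parameters, and the 10 to 30 second back-off range are assumptions; the log only shows two observed back-off values (24.052 and 15.567 seconds), not the range the agent actually uses.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def download_with_retry(
    download: Callable[[float], T],
    max_attempts: int = 3,          # three attempts appear in the log above
    timeout_seconds: float = 3.0,   # matches "within 3 seconds" in the log
    min_backoff: float = 10.0,      # assumed lower bound, not from the source
    max_backoff: float = 30.0,      # assumed upper bound, not from the source
    sleep: Callable[[float], None] = time.sleep,
    log: Callable[[str], None] = print,
) -> T:
    """Call download(timeout) until it succeeds or the attempt limit is hit."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return download(timeout_seconds)
        except TimeoutError:
            log(f"Task didn't finish download within {timeout_seconds:g} seconds.")
            if attempt == max_attempts:
                # No back-off after the final attempt; let the caller see the error.
                raise
            backoff = random.uniform(min_backoff, max_backoff)
            log(f"Back off {backoff:.3f} seconds before retry.")
            sleep(backoff)
    raise AssertionError("unreachable")
```

Passing `sleep` and `log` as parameters keeps the sketch testable without real delays; a caller would normally leave both at their defaults.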

The full size of the zip file cannot be determined from the result.Length property, because the stream is not seekable (the result.CanSeek property is false):

image

GetTaskContentZipAsync is located in Microsoft.TeamFoundation.DistributedTask.WebApi.TaskAgentHttpClient (metadata).
During its execution, a GET request is sent to Azure DevOps to obtain a stream, not the zip archive itself; the archive with the task is downloaded afterwards by reading that stream. So a 200 status code in the response does not mean that the task was downloaded successfully. It only means that the agent got the stream from the server.
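
To illustrate why obtaining the stream is not the same as having the archive, here is a small Python sketch (illustration only, not the agent's C# code). `AbortingStream` is a hypothetical stand-in for a non-seekable response body that can fail partway through, and `copy_stream` is a hypothetical copy loop. Creating the stream succeeds, which is the analogue of the 200 response, while the failure only shows up during the copy and leaves a partial result behind.

```python
import io
from typing import Optional


class AbortingStream:
    """Hypothetical non-seekable stream that can abort after N bytes."""

    def __init__(self, payload: bytes, fail_after: Optional[int] = None):
        self._payload = payload
        self._fail_after = fail_after
        self._served = 0

    def seekable(self) -> bool:
        # Like the stream described above: total length is not available up front.
        return False

    def read(self, size: int) -> bytes:
        if self._fail_after is not None:
            remaining = self._fail_after - self._served
            if remaining <= 0:
                raise ConnectionError("transport connection aborted")
            size = min(size, remaining)
        chunk = self._payload[self._served:self._served + size]
        self._served += len(chunk)
        return chunk


def copy_stream(source, destination, chunk_size: int = 4096) -> int:
    """Copy until end of stream and return the number of bytes written.

    Read errors propagate to the caller; bytes written so far stay in
    the destination, which is how a partial zip file ends up on disk.
    """
    written = 0
    while True:
        chunk = source.read(chunk_size)
        if not chunk:
            return written
        destination.write(chunk)
        written += len(chunk)


if __name__ == "__main__":
    stream = AbortingStream(b"a" * 10, fail_after=5)  # "200 OK": stream obtained
    target = io.BytesIO()
    try:
        copy_stream(stream, target, chunk_size=4)
    except ConnectionError as error:
        print(f"copy failed: {error}; bytes on disk: {len(target.getvalue())}")
```

Because the stream is not seekable, the only size that can be reported reliably is the number of bytes actually written, which is what the agent log entries about the zip file size capture.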

DenisRumyantsev added the misc (Miscellaneous Changes) label Jun 3, 2022
DenisRumyantsev marked this pull request as ready for review June 7, 2022 05:16
DenisRumyantsev requested review from a team and mmrazik June 7, 2022 10:03
max-zaytsev self-requested a review June 7, 2022 20:15
alexander-smolyakov added the enhancement label and removed the misc (Miscellaneous Changes) label Jun 8, 2022
DenisRumyantsev merged commit 1be109f into master Jun 8, 2022