Add retry on image push 5xx errors #2043

Merged (1 commit) on Mar 29, 2021

Conversation

aaronlehmann (Collaborator)

Some registries can be flaky and return intermittent 5xx errors. This
change allows those errors to be retried, similarly to network-level
errors.

Note that this needs the upstream containerd fix
containerd/containerd#5276 to work reliably.

This was tested with a registry that was modified to return 504 on every
other manifest PUT. Without the change, exports to the registry fail
every other attempt. With the change and the related containerd change,
exports to the registry always succeed.

Signed-off-by: Aaron Lehmann <alehmann@netflix.com>
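
For context, the essence of the change is treating server-side 5xx responses as retryable in the same way network-level failures already are. A minimal standalone sketch of that classification follows; the statusError type and retryable helper are illustrative, not the actual retryhandler code (which inspects containerd's error values, hence the dependency on containerd/containerd#5276):

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// statusError is a hypothetical error type carrying the HTTP status code
// returned by the registry.
type statusError struct {
	code int
}

func (e *statusError) Error() string {
	return fmt.Sprintf("unexpected status: %d %s", e.code, http.StatusText(e.code))
}

// retryable reports whether a push error is worth retrying: intermittent
// 5xx responses and network-level failures, but not 4xx client errors.
func retryable(err error) bool {
	var se *statusError
	if errors.As(err, &se) {
		return se.code >= 500 && se.code <= 599
	}
	// Other (network-level) errors remain retryable, as before.
	return true
}

func main() {
	fmt.Println(retryable(&statusError{code: 504})) // true: retried
	fmt.Println(retryable(&statusError{code: 403})) // false: not retried
}
```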

@@ -65,7 +65,7 @@ func CopyChain(ctx context.Context, ingester content.Ingester, provider content.
 	handlers := []images.Handler{
 		images.ChildrenHandler(provider),
 		filterHandler,
-		retryhandler.New(remotes.FetchHandler(ingester, &localFetcher{provider}), nil),
+		retryhandler.New(remotes.FetchHandler(ingester, &localFetcher{provider}), func(_ []byte) {}),
aaronlehmann (Collaborator, Author) commented:

Passing nil to retryhandler.New would cause panics if the retry handler tried to log something.
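
The panic is plain Go semantics: calling a nil function value panics at runtime, so a nil logger blows up the moment the retry handler tries to log. A small standalone illustration (not the retryhandler code itself):

```go
package main

import "fmt"

// logAttempt stands in for the retry handler invoking its logger callback.
func logAttempt(logger func([]byte), msg string) {
	// If logger is nil, this call panics with a nil pointer dereference.
	logger([]byte(msg))
}

func main() {
	noop := func(_ []byte) {} // the safe no-op logger used in the diff above
	logAttempt(noop, "retrying push after 5xx")
	fmt.Println("no-op logger: ok")

	defer func() {
		if r := recover(); r != nil {
			fmt.Println("nil logger panicked:", r)
		}
	}()
	logAttempt(nil, "this panics as soon as logging happens")
}
```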

@@ -133,6 +133,7 @@ func (sw *streamWriter) Close() error {
 func LoggerFromContext(ctx context.Context) func([]byte) {
 	return func(dt []byte) {
 		pw, _, _ := progress.FromContext(ctx)
+		defer pw.Close()
aaronlehmann (Collaborator, Author) commented:

I found that the client was getting "context cancelled" errors reading status if and only if the retry handler logged anything. It was a bit hard to track down the cause, but it looks like this missing Close kept the status stream open after the build was complete and led to a timeout.
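
This matches how streaming writers generally behave in Go: the reader side of the status stream keeps waiting until the writer is closed, so a forgotten Close shows up as a hang and eventually a context cancellation rather than an obvious error. A small io.Pipe analogy (not the BuildKit progress API):

```go
package main

import (
	"fmt"
	"io"
)

func main() {
	r, w := io.Pipe()

	go func() {
		fmt.Fprint(w, "status update")
		// Without this Close, the io.ReadAll below blocks forever,
		// analogous to the status stream staying open after the build.
		w.Close()
	}()

	out, err := io.ReadAll(r)
	fmt.Println(string(out), err)
}
```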
