-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
🐛 CAPD: delete container after failed start to work around port allocation issues #9125
🐛 CAPD: delete container after failed start to work around port allocation issues #9125
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
/lgtm
LGTM label has been added. Git tree hash: 4fb62dc082a474992ef947e94cf76d4349ebf765
|
/area provider/infrastructure-docker |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/lgtm
/approve |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: killianmuldoon The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/cherry-pick release-1.5 |
@killianmuldoon: once the present PR merges, I will cherry-pick it on top of release-1.5 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherry-pick release-1.4 |
@killianmuldoon: once the present PR merges, I will cherry-pick it on top of release-1.4 in a new PR and assign it to you. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I'll put a hold on the cherry-picks until we get some sort of signal that this is helping the flake. |
@killianmuldoon: new pull request created: #9130 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
@killianmuldoon: new pull request created: #9131 In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
// Delete the container and retry later on. This helps getting around the race | ||
// condition where of hitting "port is already allocated" issues. | ||
if innerErr := d.dockerClient.ContainerRemove(ctx, resp.ID, types.ContainerRemoveOptions{Force: true, RemoveVolumes: true}); innerErr != nil { | ||
return errors.Wrapf(innerErr, "error removing container after failed start: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure if that mixes the errors in a way that is not easily readable. I think ideally we would use kerrors aggregate (and usually do)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case it should result in:
{InnerErr}: error removing container after failed start: {err}
Ack, kerrors aggregate would have been an option I didn't think of.
Was also thinking about using go's errors.Join
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is not how error wrapping works. As far as I know it appends the innerErr at the end
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(just like the normal fmt.Errorf("adfasfd test: %v", err) would)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Usually we use something like this in CAPI:
reterr = kerrors.NewAggregate([]error{reterr, errors.New("failed to unlock the kubeadm init lock")})
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah so currently it is:
error removing container after failed start: {err}: {innerErr}
I will follow up and use kerrors instead 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx! I know it's a nit in a test provider, just noticed and was thinking about future me trying to parse the error :)
What this PR does / why we need it:
Deletes the container after a failed ContainerStart to improve the retry later on by going again by creating a new container.
Which issue(s) this PR fixes (optional, in
fixes #<issue number>(, fixes #<issue_number>, ...)
format, will close the issue(s) when PR gets merged):Fixes #8824