
[JUJU-1868] Prevent replacement of nil request body on retry #98

Merged: 1 commit into juju:v2 on Jan 19, 2023

Conversation

manadart (Member) commented:

When operating a MAAS controller via TLS, Juju can observe the following error during (sufficiently concurrent) provisioning:

`net/http: cannot rewind body after connection loss`

Based on poring over the `net/http` standard library source, this appears to happen for GET requests without a body, where the retry logic incorrectly replaces the nil body with a (non-nil) empty buffer.

Here we avoid resetting the body when it is nil.

I was not able to recreate the exact failure scenario, but there is a new test that verifies that this change behaves correctly.
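As an illustration of the guard, here is a minimal sketch of a retry helper; `doWithRetry` and everything else in it are hypothetical names, not the actual gomaasapi code:

```go
package retry

import (
	"bytes"
	"io"
	"net/http"
)

// doWithRetry issues req up to maxAttempts times, capturing the body
// bytes once so that every attempt can be given a fresh reader.
func doWithRetry(client *http.Client, req *http.Request, maxAttempts int) (*http.Response, error) {
	var bodyBytes []byte
	if req.Body != nil {
		b, err := io.ReadAll(req.Body)
		req.Body.Close()
		if err != nil {
			return nil, err
		}
		bodyBytes = b
	}

	var resp *http.Response
	var err error
	for attempt := 0; attempt < maxAttempts; attempt++ {
		// The guard: reset the body only when the request had one.
		// Unconditionally assigning a non-nil empty buffer here would
		// make a bodyless GET look like it carries a body that net/http
		// cannot rewind, yielding "net/http: cannot rewind body after
		// connection loss" on retry.
		if bodyBytes != nil {
			req.Body = io.NopCloser(bytes.NewReader(bodyBytes))
		}
		resp, err = client.Do(req)
		if err == nil {
			return resp, nil
		}
	}
	return nil, err
}
```

The distinction matters because the standard library's transport needs no rewinding for a nil body on retry, whereas a non-nil body without a `GetBody` function cannot be rewound after a dropped connection, which is exactly the error reported above.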

The commit message:

> …non-nil empty buffer when the body is nil. This should prevent observed "net/http: cannot rewind body after connection loss" errors.
@manadart (Member, Author) commented:

/merge

@jujubot jujubot merged commit fee8032 into juju:v2 Jan 19, 2023
@manadart manadart deleted the v2-client-retry-fix branch January 19, 2023 13:47
jujubot added a commit to juju/juju that referenced this pull request on Jan 26, 2023
#15096

The linked issue is encountered when we saturate the provisioner worker pool and make many concurrent calls to the `ProvisioningInfo` API method, one for each machine pending provisioning.

Within this method we create an `environ`, which for MAAS makes calls to the provider API. Our MAAS API client had issues with its own retry logic, which should be fixed under juju/gomaasapi#98. This patch brings in the new version of that library.

In addition, we no longer make a separate `ProvisioningInfo` call for each machine. Instead, we get the provisioning info for all pending machines in a single bulk call and pass the appropriate entry to each pooled job, as sketched below.
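To make the bulk pattern concrete, here is a minimal sketch, assuming a hypothetical `BulkClient` interface with a `ProvisioningInfoBulk` call; the real juju provisioner facade and its types differ:

```go
package provisioner

import "fmt"

// MachineID identifies a machine pending provisioning.
type MachineID string

// ProvisioningInfo stands in for the data each provisioning job needs.
type ProvisioningInfo struct {
	MachineID MachineID
	// Placement, constraints, etc. would live here.
}

// BulkClient is an assumed interface exposing a bulk variant of the
// ProvisioningInfo call.
type BulkClient interface {
	ProvisioningInfoBulk(ids []MachineID) (map[MachineID]ProvisioningInfo, error)
}

// provisionAll replaces N per-machine API round trips with a single
// bulk call, then hands each pooled job only the entry it needs.
func provisionAll(client BulkClient, pending []MachineID, startJob func(ProvisioningInfo)) error {
	infos, err := client.ProvisioningInfoBulk(pending) // one call, not len(pending)
	if err != nil {
		return fmt.Errorf("getting bulk provisioning info: %w", err)
	}
	for _, id := range pending {
		info, ok := infos[id]
		if !ok {
			return fmt.Errorf("no provisioning info for machine %q", id)
		}
		startJob(info)
	}
	return nil
}
```

A single bulk request amortizes round-trip overhead across the whole pending set and avoids exercising the provider-facing retry path once per machine.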

The testing suite is improved by removal of some custom mock/stub logic in favour of a new generated mock.

## QA steps

We don't have a MAAS with sufficient resources to provision many machines in parallel, but behaviour can be verified by repeatedly:
- Adding a LXD model.
- Running `juju add-machine -n 10` and waiting for quiescence.
- Destroying the model.

As a regression check, I also set up my MAAS for access via TLS and verified that provisioning still works.

## Documentation changes

None.

## Bug reference

https://bugs.launchpad.net/juju/+bug/1987009