hurricane: he.net responds with undocumented response code interval #1415

Closed
3 tasks done
chrisnovakovic opened this issue May 30, 2021 · 9 comments · Fixed by #1417

Comments

@chrisnovakovic
Contributor

  • Yes, I'm using a binary release within 2 latest releases.
  • Yes, I've searched similar issues on GitHub and didn't find any.
  • Yes, I've included all information below (version, config, etc).

What did you expect to see?

When using the he.net Dynamic DNS provider (hurricane), I expected a descriptive error message to be shown whenever any type of error occurs.

What did you see instead?

During the clean-up step, a nondescript error from he.net that isn't documented anywhere:

2021/05/23 12:30:33 [INFO] [...] acme: use dns-01 solver
2021/05/23 12:30:33 [INFO] [...] acme: Preparing to solve DNS-01
2021/05/23 12:30:33 [INFO] [...] acme: Trying to solve DNS-01
2021/05/23 12:30:33 [INFO] [...] acme: Checking DNS record propagation using [216.218.130.2:53]
2021/05/23 12:30:35 [INFO] Wait for propagation [timeout: 5m0s, interval: 2s]
2021/05/23 12:30:42 [INFO] [...] The server validated our request
2021/05/23 12:30:42 [INFO] [...] acme: Cleaning DNS-01 challenge
2021/05/23 12:30:42 [WARN] [...] acme: cleaning up failed: hurricane: attempt to change TXT record _acme-challenge.[...] returned interval
2021/05/23 12:30:42 [INFO] [...] acme: Validations succeeded; requesting certificates
2021/05/23 12:30:46 [INFO] [...] Server responded with a certificate.

Details

Excessively enthusiastic use of the he.net Dynamic DNS provider (hurricane), e.g. during testing, can lead to the service responding with the response code interval. This isn't documented on https://dns.he.net, but after speaking to he.net's DNS Admin, I found that this response code is triggered whenever an update request is sent for a dynamic record more than 5 times in 5 minutes; this has now been increased to 10 times in 2 minutes. Since it's possible to hit these limits while testing lego with LE's staging endpoint, I think it makes sense to recognise this response code and warn users to back off for a minute or two if it is sent.
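
As a rough illustration of what recognising this response code could look like, here is a minimal sketch. It is not lego's actual code: `checkResponse`, `isRateLimited` and `errRateLimited` are hypothetical names, and `good` is only an assumed success code; `interval` is the one code confirmed in this issue.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Response codes. Only "interval" comes from this issue; "good" is an
// assumed success code used purely for illustration.
const (
	codeGood     = "good"
	codeInterval = "interval"
)

// errRateLimited is a hypothetical sentinel error that lets callers detect
// the cooldown case and decide to back off.
var errRateLimited = errors.New("update rate limit reached")

// checkResponse is a hypothetical helper that maps the first word of the
// response body to a descriptive error.
func checkResponse(body, hostname string) error {
	code := ""
	if fields := strings.Fields(body); len(fields) > 0 {
		code = fields[0]
	}

	switch code {
	case codeGood:
		return nil
	case codeInterval:
		// Limit as reported by he.net's DNS Admin: 10 updates in 2 minutes.
		return fmt.Errorf("hurricane: TXT record %s: %w: wait a minute or two before retrying", hostname, errRateLimited)
	default:
		return fmt.Errorf("hurricane: attempt to change TXT record %s returned %q", hostname, code)
	}
}

// isRateLimited reports whether err was caused by the "interval" response.
func isRateLimited(err error) bool {
	return errors.Is(err, errRateLimited)
}

func main() {
	err := checkResponse("interval", "_acme-challenge.example.com")
	fmt.Println(err, isRateLimited(err))
}
```

Wrapping a sentinel error keeps the message readable for users while still letting a caller tell the cooldown case apart from other failures.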

In the case above, it was the clean-up step that failed with this error, which left the challenge token in the TXT record - @ldez, I'm not sure if there's anything that can or should be done if this happens (it seems like it could be a problem for all DNS providers, not just he.net, but then again I wouldn't expect to run into this problem during normal operation, only during testing).

chrisnovakovic added a commit to chrisnovakovic/lego that referenced this issue May 30, 2021
he.net's Dynamic DNS endpoint responds with `interval` whenever a
dynamic TXT record has been updated more than 10 times in 2 minutes;
this is undocumented on https://dns.he.net, but the behaviour has been
confirmed with he.net's DNS admin. Recognise `interval` as a potential
response code in the `hurricane` DNS provider, and recommend that the
user back off until the cooldown period has passed.

Closes go-acme#1415.
@ldez
Member

ldez commented May 30, 2021

Hello,

If you are a HE customer, I invite you to contact HE to demand a real API and real API documentation.
Their API for handling DNS records is undocumented and almost useless.

@ldez
Member

ldez commented May 30, 2021

@chrisnovakovic could you give me the complete output?

EDIT: I was hoping that the API returns the interval value, but based on your logs it's not the case.

@chrisnovakovic
Contributor Author

Sadly not - that's why I got in touch with their DNS Admin, who confirmed what the limit is. I agree that there's a lot of room for improvement in the documentation for their Dynamic DNS service. I'm still talking to them about the limits of what they consider acceptable use - when I know more, I'll submit another PR tweaking the default timeouts and intervals for hurricane.

@ldez
Member

ldez commented May 30, 2021

@chrisnovakovic One question: is the rate limit of 10 times in 2 minutes per record or for the whole API?

@chrisnovakovic
Contributor Author

@ldez It's per TXT record, apparently, so the fix in #1417 is good. I'm going to ask he.net if they could add the remaining cooldown time to the interval response too, since it'd be helpful to know that.

@ldez
Member

ldez commented May 30, 2021

If it's per record, the fix is not good:

  • because the rate limit in the fix is applied to the whole API, not per record.
  • because I misused the rate limiter API (I will explain that later)

@ldez
Member

ldez commented May 30, 2021

10 reqs / 2 minutes (120 seconds)

The rate limiter expects a minimal frequency based on one second and a maximum burst size.
The maximum burst size is at least 1.

The burst consumes requests, so the frequency must be computed with a number of requests equal to max requests - burst + 1:

  • max = x + burst - 1 -> x = max - burst + 1

The base frequency is 10 / 120 = 1 / 12 (5 reqs per minute = 1 req every 12s).
Over 2 minutes:

  • with a burst of 1: x = 10 - 0 = 10 then a frequency of 10 / 120 (5 reqs per minute = 1 req every 12s)
  • with a burst of 2: x = 10 - 1 = 9 then a frequency of 9 / 120 (4.5 reqs per minute = 1 req every ~13s)
  • with a burst of 3: x = 10 - 2 = 8 then a frequency of 8 / 120 (4 reqs per minute = 1 req every 15s)
  • with a burst of 4: x = 10 - 3 = 7 then a frequency of 7 / 120 (3.5 reqs per minute = 1 req every ~17s)
  • with a burst of 5: x = 10 - 4 = 6 then a frequency of 6 / 120 (3 reqs per minute = 1 req every 20s)
  • with a burst of 6: x = 10 - 5 = 5 then a frequency of 5 / 120 (2.5 reqs per minute = 1 req every 24s)
  • with a burst of 7: x = 10 - 6 = 4 then a frequency of 4 / 120 (2 reqs per minute = 1 req every 30s)
  • with a burst of 8: x = 10 - 7 = 3 then a frequency of 3 / 120 (1.5 reqs per minute = 1 req every 40s)
  • with a burst of 9: x = 10 - 8 = 2 then a frequency of 2 / 120 (1 req per minute = 1 req every 60s)
  • with a burst of 10: x = 10 - 9 = 1 then a frequency of 1 / 120 (0.5 req per minute = 1 req every 120s)
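
The table above can be reproduced with a few lines of Go. This is only a sketch of the arithmetic (x = max - burst + 1, then one request every window / x); `refillInterval`, `maxRequests` and `window` are names made up for the example, not identifiers from lego or from a rate limiter library.

```go
package main

import (
	"fmt"
	"time"
)

const (
	maxRequests = 10              // limit reported in this thread
	window      = 2 * time.Minute // period over which the limit applies
)

// refillInterval returns the wait between two requests once the burst has
// been spent. It expects 1 <= burst <= limit.
func refillInterval(limit, burst int, period time.Duration) time.Duration {
	x := limit - burst + 1 // requests left to spread over the period
	return period / time.Duration(x)
}

func main() {
	for burst := 1; burst <= maxRequests; burst++ {
		x := maxRequests - burst + 1
		every := refillInterval(maxRequests, burst, window).Round(time.Second)
		fmt.Printf("burst %2d: %2d reqs / %v -> 1 req every %v\n", burst, x, window, every)
	}
}
```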

What does this mean?

  • with a burst of 4, the first 4 requests will be sent immediately; after that, each request will have to wait ~17s.
  • with a burst of 2, the first 2 requests will be sent immediately; after that, each request will have to wait ~13s.

If you have 3 domains, and the rate limit is based on the API calls, lego will call the API 6 times:

  • with a burst of 1, the minimal duration of the calls will be 12s x 6 = 72s
  • with a burst of 5, the minimal duration of the calls will be 20s x 1 = 20s

If you have 10 domains, and the rate limit is based on the API calls, lego will call the API 20 times:

  • with a burst of 1, the minimal duration of the calls will be 12s x 20 = 240s
  • with a burst of 5, the minimal duration of the calls will be 20s x 15 = 300s
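
To play with these numbers, here is a small sketch that estimates the minimal duration for a given number of calls. It assumes a simplified model in which the first `burst` calls are immediate and every later call waits one full interval; `minDuration` and `refillInterval` are hypothetical helpers. Under this model a burst of 5 gives the same 20s and 300s as above, while a burst of 1 gives 60s and 228s, slightly below the 72s and 240s quoted above, because the first call is not counted as waiting.

```go
package main

import (
	"fmt"
	"time"
)

const (
	maxRequests = 10              // limit reported in this thread
	window      = 2 * time.Minute // period over which the limit applies
)

// refillInterval returns the wait between two requests once the burst has
// been spent (x = limit - burst + 1). It expects 1 <= burst <= limit.
func refillInterval(limit, burst int, period time.Duration) time.Duration {
	return period / time.Duration(limit-burst+1)
}

// minDuration estimates the total wait for a number of calls: the first
// `burst` calls are free, every remaining call waits one interval.
func minDuration(calls, limit, burst int, period time.Duration) time.Duration {
	waiting := calls - burst
	if waiting < 0 {
		waiting = 0
	}
	return time.Duration(waiting) * refillInterval(limit, burst, period)
}

func main() {
	for _, calls := range []int{6, 20} { // 3 domains and 10 domains
		for _, burst := range []int{1, 5} {
			d := minDuration(calls, maxRequests, burst, window)
			fmt.Printf("%2d calls, burst %d: at least %v\n", calls, burst, d)
		}
	}
}
```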

The max burst size is really important and must be defined with caution:

  • a small burst size will imply that each call will be slow
  • a large burst size will imply that each call after the burst will be very slow, but the first calls will be very fast.

The current API rate limit is very very low, a min frequency of 1/12 (1 req every 12s) is really problematic for an API.

In conclusion, it's really important to know for sure whether the rate limit is per record or for all the API calls.

@ldez
Member

ldez commented May 31, 2021

@chrisnovakovic Do you have more information?

@chrisnovakovic
Contributor Author

Thanks for the extra information, @ldez, that was very insightful.

The rate limit that causes the issue raised here is definitely per-record - that's been confirmed privately by he.net's DNS Admin. I haven't yet heard back from them about adding the remaining cooldown time to the interval response (which I think would simplify a lot of the logic you describe here, if only for the hurricane responder).
