add ratelimit #14

Merged
merged 45 commits into main on Jul 25, 2022

Conversation

switchupcb
Owner

No description provided.

@switchupcb
Owner Author

This can be completed once #12 is finalized.

We must place the peeked rate limit headers into a map[string]tickers or some equivalent structure that allows us to determine whether we can send a request (using the remaining value, prior to sending it) and how much time we would need to wait before sending it (if the limit was triggered).

When a new ratelimit bucket is encountered, its ticker will be added to the map. If the request can be sent, it will be sent. Otherwise, the thread will block (using the ticker) until it can be sent. This ensures that the user can handle the result of the Send() function without resorting to channels or managing the rate limit themselves.

In any case, a user (developer) must call Send() from a goroutine if they want to send requests concurrently; otherwise, the calling goroutine will block while it waits for the rate limit to expire. This is typical, expected behavior for sending requests.
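A minimal sketch of that structure, using hypothetical names (Bucket, RateLimit, Wait) rather than disgo's actual types:

import (
    "sync"
    "time"
)

// Bucket tracks a single rate limit bucket (hypothetical sketch).
type Bucket struct {
    Remaining int          // requests left in the current window
    Ticker    *time.Ticker // fires when the bucket's window elapses
}

// RateLimit maps a rate limit bucket ID (from the peeked headers) to its Bucket.
type RateLimit struct {
    mu      sync.Mutex
    buckets map[string]*Bucket
}

// Wait blocks until a request may be sent for the given bucket ID.
func (r *RateLimit) Wait(id string, limit int, per time.Duration) {
    r.mu.Lock()
    b, ok := r.buckets[id]
    if !ok {
        // a new rate limit bucket is encountered: add its ticker to the map.
        b = &Bucket{Remaining: limit, Ticker: time.NewTicker(per)}
        r.buckets[id] = b
    }

    if b.Remaining > 0 {
        // the request can be sent immediately.
        b.Remaining--
        r.mu.Unlock()
        return
    }
    r.mu.Unlock()

    // the limit was triggered: block on the ticker until the window resets.
    // (refilling Remaining on each tick is omitted from this sketch.)
    <-b.Ticker.C
}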

@switchupcb
Owner Author

The current implementation assumes a "rolling" rate limit where a rate limit is NOT triggered IF X requests are made within the window. For example, a global rate limit has 50 requests per second, so making 50 requests in one burst followed by a request every 1/50th of a second after would NOT trigger the rate limit. It's unclear whether this is the case for global rate limits (requires further testing or clarification).

In contrast, per-route rate limits use buckets which do NOT roll. Instead, these are similar to leaky buckets where you are given a set number of allowed requests per specific time period. For example, a bucket with 5 requests every 2 seconds allows 5 requests to be made at any time within those two seconds, then resets. As a result, the current implementation of rate limit handling is NOT sufficient for handling dynamic rate limits, but it is committed for reference in case behavior changes. The implementation can be changed to a sufficient method by only keeping track of the remaining requests and the expiry.

Other: The current implementation is an interface, which allows developers to split their bot programs among extra executables while still handling rate limits with Redis or any other solution that requires messaging.
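A minimal sketch of what such an interface could look like (the names and method signatures here are assumptions, not disgo's actual API):

import "time"

// RateLimiter is a hypothetical interface: any implementation
// (in-process, Redis-backed, or another messaging solution) can be
// plugged into the client.
type RateLimiter interface {
    // Wait blocks until a request for the given bucket may be sent.
    Wait(bucketID string) error

    // Update records the rate limit state parsed from a response.
    Update(bucketID string, remaining int, reset time.Time) error
}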

switchupcb added 3 commits June 19, 2022 01:01
for basic dynamic internal ratelimits
global ratelimits use buckets
@switchupcb
Owner Author

As per the experiment (which uses the https://discord.com/developers/docs/topics/oauth2#get-current-bot-application-information endpoint), a global rate limit uses a bucket-rate limit. This was tested (https://github.com/switchupcb/disgo/blob/b63218b109b55a1ef6de3e7e9f7c52651ac2f97d/wrapper/ratelimit_test.go) by sending 50 requests within one second, waiting 1/50th (or 2/50ths) of a second (within that one second), and then attempting to send another request (within that one second).

If the request is valid, this would mean that Discord generates tokens at a constant rate (one every 1/50th of a second) after the initial 50 tokens are used. In contrast, an invalid (429) request signifies a rate limit bucket which allows 50 requests within a specified time period (one second) and requires that second to be over in order to make another request.

Sending the 51st request.
HTTP/1.1 429 Too Many Requests
Server: cloudflare
...
Retry-After: 1
X-Ratelimit-Scope: global
X-Ratelimit-Global: true

Discord uses a bucket rate limit for global rate limits.

One complication discovered during testing is the fact that endpoints are NOT guaranteed to send rate limit headers, contrary to the documentation and the response to discord/discord-api-docs#1808. In this endpoint's case, a rate limit header is NOT sent until you actually hit a global rate limit, at which point Retry-After, X-RateLimit-Scope, and X-RateLimit-Global are returned. Alternatively, only Retry-After is returned for Cloudflare bans. This will need to be considered in the complete rate limit implementation.
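A sketch of how a response could be inspected given that these headers may be missing (the helper name is an assumption; the header names follow the log above):

import (
    "net/http"
    "strconv"
    "time"
)

// parseRateLimitHeaders is a hypothetical helper that tolerates missing
// headers, since endpoints are not guaranteed to send them.
func parseRateLimitHeaders(resp *http.Response) (retryAfter time.Duration, global bool, ok bool) {
    retry := resp.Header.Get("Retry-After")
    if retry == "" {
        // no rate limit information was provided with this response.
        return 0, false, false
    }

    seconds, err := strconv.ParseFloat(retry, 64)
    if err != nil {
        return 0, false, false
    }

    // X-RateLimit-Global is only present for global rate limits;
    // a Cloudflare ban returns Retry-After alone.
    global = resp.Header.Get("X-RateLimit-Global") == "true"

    return time.Duration(seconds * float64(time.Second)), global, true
}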

@switchupcb
Owner Author

There are 4 things that need to be considered for the complete rate limit implementation:

  1. Handling Global Rate Limits
  2. Handling Request Rate Limits
  3. Handling Missing Rate Limit Headers
  4. Synchronization of each Rate Limit among multiple goroutines (concurrency)

@switchupcb
Owner Author

These diagrams are in relation to a single rate limit bucket. To be specific, the orange step only applies to requests in a single rate limit bucket. This means that there can be multiple "next requests" since "next request" refers to the "next request in its rate limit bucket". Ignore the goroutine label; it isn't possible to send 2 requests on the same thread when one hasn't returned.

Without lock/unlock bucket priority.

[diagram omitted]

With lock/unlock bucket priority.

[diagram omitted]

As a reminder, this occurs because the initial lock sequence is q1 (429), q2 (429) which moves these locks to the back of the queue. The queue is then q3 (SEND), q4+ (SEND), q1 (PRIORITY), q2 (SEND). q3 - q4+ are unlocked and promptly re-locked, sending those locks to the back of the queue. q1 is sent to a priority line where priority is set to false and unlocked to be sent. Once q1 unlocks, q2+ can be sent — without waiting for the response of q1 — until there are no more remaining requests in the bucket.

@switchupcb
Owner Author

Using method b implements 1, 3, and 4 of #14 (comment) for global rate limit buckets. This is then applicable to 2, since handling the global rate limit bucket is the same with the exception of parsing the headers for new rate limit buckets.

[diagram omitted]

@switchupcb
Owner Author

It is possible (albeit extremely unlikely) for the last experiment to be wrong if the same issue explained in #14 (comment) occurred. However, this would mean the latency of the first request was more than 1/50th, 2/50ths, and 3/50ths of a second. In any case, we can test all of the above by sending 101 requests concurrently.

NOTE: One way to ensure the assumptions made are correct is to use another experiment. If the global bucket rate limit theory is correct, we will be able to send a request at .1s, followed by 49 at 1.09s and then 50 at 1.11s, where the bucket resets at 1.1s.
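A rough sketch of how that timing experiment could be scripted (sendRequest is a hypothetical stand-in for an actual request call):

import (
    "sync"
    "time"
)

// timingExperiment sketches the proposed verification: 1 request at .1s,
// 49 at 1.09s, and 50 at 1.11s, with the bucket expected to reset at 1.1s.
func timingExperiment(sendRequest func() error) {
    start := time.Now()
    var wg sync.WaitGroup

    sendAt := func(offset time.Duration, count int) {
        time.Sleep(time.Until(start.Add(offset)))
        for i := 0; i < count; i++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                // a 429 at the wrong offset would falsify the bucket theory.
                _ = sendRequest()
            }()
        }
    }

    sendAt(100*time.Millisecond, 1)
    sendAt(1090*time.Millisecond, 49)
    sendAt(1110*time.Millisecond, 50)
    wg.Wait()
}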

@switchupcb
Owner Author

In the case that a rate limit bucket is initially misconfigured (i.e. a limit of 55 on the global bucket instead of 50), there can be up to Limit - Actual Limit 429 responses. One aspect of this that we must consider is the potential Cloudflare ban that occurs at the 10,000th bad request in a 10-minute period. As a result, the following rules apply to handling 429s:

  1. 429 responses must always be read.
  2. However, only one request may be a priority request at a time.
  3. add ratelimit #14 (comment) ignores the case where a 429 doesn't need to be retried and when no other requests are queued. In this case, some mechanism must allow the priority request to wait when there are no other requests queued; and in accordance with 1, ensure that consecutive 429s aren't waited for multiple times.

As a result, the first 429 (for a bucket) is the request that gains priority, but the last 429 is the one you need to use for waiting.

This leaves one scenario that may be considered unintended behavior. If you have a q1 (429), q2 (queued), and q3 (429) and bot.Config.Retries != 0, the following occurs:

  1. q1 429 gets handled and set to priority and unlocks (then locks).
  2. q2 locks, waits, unlocks (then locks) since priority is set to true.
  3. q3 429 gets handled (expiry based on initial time rather than at the time of lock), and NOT set to priority since q1 is already in priority; unlocks (then locks) since priority is set to true.
  4. q1 is sent and priority is set to false.
  5. q2 is sent since priority is set to false.
  6. q3 is sent.

Whereas the user (developer) may expect:

  1. q1 is sent.
  2. q3 is sent (since it was a 429 that was sent before q2 was even queued).
  3. q2 is sent.

How can we fix this?
We know that the first 429 will be queued when the Remaining request value is at the Limit. We know that the next 429 will be queued when the Remaining request value is at the Limit - 1. Since priority requests now wait, we can use priority as a counter that determines how many priority requests there are to send. Then, every 429 will be sent to PRIORITY, which results in the following:

  1. q1 429 gets handled and priority++ and unlocks (then locks) to PRIORITY
  2. q2 locks, waits, unlocks (then locks) to RATELIMIT since priority > 0
  3. q3 429 gets handled (expiry based on initial time rather than at the time of lock) and priority++, unlocks (then locks) to PRIORITY
  4. q1 is sent and priority--.
  5. q2 unlocks, locks since priority > 0
  6. q3 is sent and priority--.
  7. q2 is sent.

As long as priority > 0, priority (429) requests will be sent in order of handling before other queued requests.
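A minimal sketch of the priority counter, assuming (hypothetically) that each bucket carries a mutex and a counter:

import "sync"

// bucket is a hypothetical rate limit bucket with a priority counter
// for 429'd requests.
type bucket struct {
    mu       sync.Mutex
    priority int // number of 429'd requests waiting to be retried
}

// queue429 marks a 429'd request as a priority request.
func (b *bucket) queue429() {
    b.mu.Lock()
    b.priority++
    b.mu.Unlock()
}

// canSendRegular reports whether a regular queued request may be sent;
// while priority > 0, only priority (429) requests are sent.
func (b *bucket) canSendRegular() bool {
    b.mu.Lock()
    defer b.mu.Unlock()
    return b.priority == 0
}

// sentPriority is called once a priority request has been sent.
func (b *bucket) sentPriority() {
    b.mu.Lock()
    b.priority--
    b.mu.Unlock()
}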

@switchupcb
Owner Author

Finally, method b presents an issue with setting the time to bucket expiry initially. However, this can be solved with the following assumptions:

  1. a rate limit will not be misconfigured (global) and/or will be fixed upon a 429 (per-route).
  2. therefore, the expiry of a bucket is only set once, then updated continuously.

When the bot starts, there is no known expiry time UNTIL the first response is received from Discord. However, correctly waiting for this request in one goroutine will result in deadlock (when another goroutine that sent a request waits on the same bucket's mutex). The solution is simple: Do not worry about data races when setting the expiry.

The expiry will ONLY be set (multiple times) within the FIRST BUCKET of responses. While this may result in multiple goroutines setting the expiry multiple times in an inconsistent manner (which may result in the FIRST BUCKET expiring late), this is fine since subsequent buckets will be consistent with the exact rate limit.

For example, sending 50 requests at .1s may result in the 50 responses setting expiry to multiple values (.1s, .2s, .3s, etc). No matter what value is chosen, the bot will wait at least 1 second until it sends the 51st request. Then, following the request's bucket reset, the expiry will be set to a consistent rate of 50 requests every 1 second, over and over. So for this specific action, a mutex lock is NOT required.

The only issue that may arise from this approach is when a request sent for one bucket ends up in the subsequent bucket (due to latency). In that case, we may need to track the number of requests received or risk hitting a 429. In practice, it shouldn't occur often since most — if not all — requests land in their respective buckets. In any case, it can be implemented with careful consideration to mutex or atomic usage.

@switchupcb
Owner Author

This implementation works for 1, 3, and 4. It requires optimization (for stability) but works to a certain extent.

includes temporary print statements
@switchupcb
Owner Author

switchupcb commented Jul 2, 2022

I have tested this code with 1000, 151, 200, 400, etc. requests queued. There is one issue that occurs occasionally when the wait time for the global rate limit is 1 second. The log for that issue is here:

See edited.

This will be addressed in the following issue in the Discord repository.

using expiry mutex and atomic usage of remaining
@switchupcb
Owner Author

"Data races" can be addressed through the use of atomic (for Remaining request count) and the usage of mutex for expiry. Specifically, a RWMutex will prevent excessive time being wasted between goroutines.

A write lock is set when a bucket resets, or expiry is set (in the first response).
An RLock is used to read timing (ratelimit) before sending a request, and setting the expiry for the first request.
All locks almost immediately unlock to prevent a waste of time between goroutines.
Avoid waiting while locked by using a variable.
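A sketch of that synchronization layout, with hypothetical field and method names:

import (
    "sync"
    "sync/atomic"
    "time"
)

// bucket sketches the synchronization layout described above.
type bucket struct {
    remaining int32        // accessed atomically
    mu        sync.RWMutex // guards expiry
    expiry    time.Time
}

// try reserves a request slot if one remains and returns the current expiry.
// The caller waits on the returned expiry outside of any lock (using a
// variable), so no goroutine holds the mutex while waiting.
func (b *bucket) try() (ok bool, expiry time.Time) {
    // read the timing under an RLock, unlocking almost immediately.
    b.mu.RLock()
    expiry = b.expiry
    b.mu.RUnlock()

    // reserve a token atomically; no mutex is needed for the count.
    if atomic.AddInt32(&b.remaining, -1) >= 0 {
        return true, expiry
    }
    atomic.AddInt32(&b.remaining, 1) // undo the over-decrement
    return false, expiry
}

// reset takes the write lock when the bucket resets or when the expiry
// is first set.
func (b *bucket) reset(limit int32, expiry time.Time) {
    b.mu.Lock()
    b.expiry = expiry
    b.mu.Unlock()
    atomic.StoreInt32(&b.remaining, limit)
}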

@switchupcb
Owner Author

switchupcb commented Jul 2, 2022

Here is the flow of the program using 101 requests with a 1 second wait time. This example passes, but this commit does NOT solve the issue specified in discord/discord-api-docs#5144.

See edited.

@switchupcb
Owner Author

The following commit is adjusted for the updated information in the respective issue. It keeps track of both the outgoing (remaining) requests and incoming (pending) requests to fill the bucket accordingly. Priority has been temporarily removed in order to maintain simplicity.

In addition, the ability to "guarantee" a request's position in the queue is "removed". In reality, there is never a guarantee as to when Discord will receive your request or in what order it processes requests, nor is the order in which Go schedules goroutines specified (so goroutine spawn order can also appear as an "issue"). If a user must guarantee the order of their asynchronously queued requests, they should use the tools provided by the language to guarantee that behavior.

In contrast, the ability to grant priority to "requests which were 429'd and retried" over "queued requests" will be kept. In other words, we can ensure that a 429'd request is sent in the following bucket (or the following bucket with no 429s ahead of it) over regular queued requests, as it's likely expected that these "retried" requests are sent prior to the queued requests.
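One way to sketch the outgoing/incoming bookkeeping (the field names and the exact refill formula are assumptions):

// bucket sketches tracking both outgoing and incoming requests.
type bucket struct {
    remaining int // outgoing: requests that may still be sent in this bucket
    pending   int // incoming: requests sent whose responses are still outstanding
    limit     int
}

// send records an outgoing request.
func (b *bucket) send() {
    b.remaining--
    b.pending++
}

// receive records an incoming response and refills the bucket on a reset,
// accounting for requests that are still in flight.
func (b *bucket) receive(reset bool) {
    b.pending--
    if reset {
        b.remaining = b.limit - b.pending
    }
}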

temporarily removes priority functionality
@switchupcb
Owner Author

switchupcb commented Jul 8, 2022

The following log corresponds with the previous commit and showcases how global rate limit buckets either:

  1. Do not expire exactly at the second despite being presented at the second (i.e. Fri, 08 Jul 2022 10:26:16 GMT), or
  2. Use a different method of calculating the actual rate limit.

See edited.

In case 1, the actual global rate limit expiry time can only be accounted for once the first rate limit has been hit. Once this occurs, the bot would wait to send requests until that expiry occurs, marking down the exact point in time (point in the second) where Discord resets the bot.

In case 2, we would need to know the actual method of rate limiting to correctly handle global rate limits. A workaround could be setting the time between expiries to much longer than needed, but this would only lower the chance of a rate limit occurring. Based on the lack of response to discord/discord-api-docs#5144, the number of libraries that actually handle global rate limits (discordgo does not), and the slash commands rollout, I wouldn't be surprised if it's a feature that wasn't made with actual users in mind.

to handle a servicer bucket API
@switchupcb
Owner Author

switchupcb commented Jul 10, 2022

The following log corresponds with the previous commit, which updates the global rate limit bucket by using the Discord Date as the single source of truth. An assumption is made that the Global Rate Limit is implemented (on Discord's end) using a server-based bucket algorithm where a 50-token bucket is reset at a specific point in time (every second). As a result, incoming responses are used to keep track of the current Bucket.

The first 50 requests can be sent until there are no remaining tokens. When a response is received for the first time, the (uninitialized) bucket will update its Date (which references Discord's Bucket Date). There are two scenarios where the bucket can reset:

  1. When a response is received with a Date that occurs AFTER the current bucket's Date. This indicates that the Discord Bucket has reset. Expiry can be updated to one second after this time, guaranteeing that Expiry does NOT occur before the next Discord Bucket update.

  2. When the current time passes an expiry (that is NOT uninitialized). This expiry is reliable due to the guarantees above.

A reset is calculated using the limit, pending requests, and priority requests to account for responses that occur before or after the current bucket.

Using this implementation, a bucket is always accurate when there are incoming responses. If there aren't incoming responses (i.e. the first 50 requests), the remaining tokens will be accurate to the number of requests you can make. In the worst case, a Bucket is reset with a limited number of tokens at the edge of a rate limit reset. However, this would imply that there are pending requests which allow you to reset on time.

See edited.
PASS 3.979s

151 request log.

This is confirmed to work with various numbers of requests.
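A sketch of the Date-based reset described above (the type, fields, and methods here are assumptions, not disgo's actual code):

import (
    "net/http"
    "time"
)

// globalBucket sketches a bucket that uses Discord's Date header
// as the single source of truth.
type globalBucket struct {
    remaining int
    limit     int
    date      time.Time // Discord's Bucket Date, taken from response Date headers
    expiry    time.Time // zero until the first response initializes it
}

// update processes an incoming response (Scenario 1): a response Date that
// occurs AFTER the current bucket's Date indicates the Discord Bucket reset.
func (b *globalBucket) update(resp *http.Response) {
    date, err := time.Parse(http.TimeFormat, resp.Header.Get("Date"))
    if err != nil {
        return
    }

    if date.After(b.date) {
        b.date = date
        // one second after this time guarantees the expiry does NOT occur
        // before the next Discord Bucket update.
        b.expiry = date.Add(time.Second)
        // a full reset would also account for pending and priority requests.
        b.remaining = b.limit
    }
}

// check handles Scenario 2: the current time passed an initialized expiry.
func (b *globalBucket) check(now time.Time) {
    if !b.expiry.IsZero() && now.After(b.expiry) {
        b.remaining = b.limit
        b.expiry = b.expiry.Add(time.Second)
    }
}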

@switchupcb
Owner Author

A pure implementation of the algorithm above would only rely on Discord's Bucket Date change. However, there is the possibility that you send the Limit (50) requests within the same bucket. In a pure implementation, this would result in your bot sending no more requests (if a Date change is NOT detected, no tokens are ever refilled).

To address the above, we must use a manual reset in certain cases to allow the bot to send requests in new buckets. However, this can result in a situation where tokens are reset, but the date is not. As a result, when the date change is detected, the bot will change its date, but also reset its tokens again. This is problematic when requests were sent and processed between the first manual reset and the event-based reset.

We cannot solve this by updating the Date upon manual resets, since this can result in situations where the Date is no longer accurate with Discord. We cannot use the Expiry to determine whether a manual reset occurred and refill tokens as necessary, since the current time + 1 second will always occur after the expiry. Instead, we only allow manual resets, but adjust the expiry upon a Date change. This results in an expiry that is eventually consistent with the Discord Bucket expiry.

In the worst case, a Bucket is reset with a limited number of tokens at the edge of a rate limit reset. For example, 50 requests are sent with the first request occurring within the last .1 seconds of the actual expiry, and the last request occurring within the last .01 seconds of the actual expiry. As a result, the program will wait until the last .1 seconds of the next bucket to send requests, which potentially lowers requests per second initially.

In practice, it is likely for some of those 50 requests to be received in the next bucket (meaning that we receive responses from the next bucket), which allows us to update the expiry to the actual expiry. At this point, a reset bucket could utilize its full second to send requests at any point in time, while still resetting in an accurate position.
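Extending the earlier globalBucket sketch, the rule (manual resets refill tokens, a Date change only adjusts the expiry) might look like this; again, names and conditions are assumptions:

// manualReset refills tokens when the Limit was exhausted within one bucket
// and no Date change was detected; the Date itself is left untouched so it
// stays consistent with Discord.
func (b *globalBucket) manualReset(now time.Time) {
    if b.remaining == 0 && !b.expiry.IsZero() && now.After(b.expiry) {
        b.remaining = b.limit
        b.expiry = b.expiry.Add(time.Second)
    }
}

// onDateChange adjusts only the expiry when a newer Discord Date is observed,
// making the expiry eventually consistent with the Discord Bucket expiry.
func (b *globalBucket) onDateChange(date time.Time) {
    if date.After(b.date) {
        b.date = date
        b.expiry = date.Add(time.Second)
    }
}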

@switchupcb
Owner Author

The only issue with this implementation is that the Route Limit test takes 30s due to the endpoint we are using. In addition, we may want to verify rate limit handling with a dedicated rate limit test.

@switchupcb
Owner Author

Routes for controlling emojis do not follow the normal rate limit conventions. These routes are specifically limited on a per-guild basis to prevent abuse. This means that the quota returned by our APIs may be inaccurate, and you may encounter 429s.
https://discord.com/developers/docs/topics/rate-limits#rate-limits

This explanation regarding Emoji Rate Limits is unclear as to its implementation, so emoji routes will be handled according to normal rate limit conventions. The fact that Discord says the quota itself may be inaccurate indicates that there is no real solution to this feature. Emoji rate limits are also not handled in any other API wrappers for this very reason.

We may reconsider this decision if a better explanation is provided (one that results in a simple implementation).

@switchupcb switchupcb temporarily deployed to testing July 21, 2022 03:31
@switchupcb
Owner Author

One potential issue with the current way that route rate limits are handled is that — as stated prior — we are not guaranteed to receive responses in the order that Discord processes the requests. This can lead to a case where the same bucket is marked with 6 remaining requests, followed by 9 remaining requests. You will inevitably hit a rate limit.

The current implementation passes tests, but only due to the following code:

// requeue the request when it's ready.
if routeBucket != nil {
    <-time.After(time.Until(wait))
}

This code prevents an exhausted bucket from sending more requests (even when it's updated) since it stalls requests in the queue. However, this only works if no other requests are being queued for the route, or if all requests are processed BEFORE a rate limit bucket is updated.

In a similar manner to Global Rate Limits, we can only use the headers as assurance that the "deterministic" fields are correct (ID, Limit, Reset, etc.), while we must use an internal count to ensure that the remaining requests are accurate.

@switchupcb
Owner Author

The other issue is that certain endpoints have rate limits that are "unconventional". These are not actually described in the documentation, but I have been able to deduce the types of rate limits Discord has: Global (Requests), Per Route (Requests), Per Resource (Requests), Per Resource Per Route (Emoji Requests), Global (Gateway), Identify (Gateway). However, the current implementation can easily support these by simply bundling the Route ID and "top-level" URL parameter for requests that are limited by top-level resources. The only caveat is that one must ensure a route is actually a per-resource rate limited route, or risk creating excessive entries (albeit for the same bucket).

In short, we will be the only library that handles every single type of rate limit Discord provides.
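A sketch of how that bundling could be keyed (the function and parameter names are illustrative, not disgo's API):

// rateLimitKey keys a per-resource route by bundling the Route ID with its
// "top-level" URL parameter (e.g. a guild or channel ID).
func rateLimitKey(routeID string, topLevelResourceID string) string {
    if topLevelResourceID == "" {
        // routes that are not per-resource limited are keyed by route alone,
        // avoiding excessive entries for the same bucket.
        return routeID
    }
    return routeID + ":" + topLevelResourceID
}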

@switchupcb
Owner Author

Correction to #14 (comment). The issue is much simpler than described. As stated, we are not guaranteed to receive the responses in the order that Discord processes the requests. This means that we can send 30 requests before a single response is received, and that first response will state that the Bucket contains 29 remaining requests.

The above code assists with reducing excessive goto RATELIMIT calls while a request in the same bucket waits to be sent. However, removing or keeping it has no effect on the actual issue: we should NOT update the remaining requests for a Bucket while processing a response unless that response meets a few conditions.
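A sketch of such a guard; the exact condition (only trust the header when the response belongs to a newer bucket than the one being tracked) is an assumption about what those conditions could look like:

import "time"

// routeBucket is a hypothetical per-route bucket with an internal count.
type routeBucket struct {
    remaining int
    reset     time.Time
}

// updateRemaining only adopts the header's remaining count when the response
// comes from a newer bucket than the one being tracked; otherwise the internal
// count (which already reflects in-flight requests) is kept.
func (b *routeBucket) updateRemaining(headerRemaining int, headerReset time.Time) {
    if headerReset.After(b.reset) {
        b.reset = headerReset
        b.remaining = headerRemaining
    }
}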

@switchupcb switchupcb marked this pull request as ready for review July 25, 2022 06:34
@switchupcb switchupcb temporarily deployed to testing July 25, 2022 06:35
@switchupcb switchupcb merged commit 64ad08e into main Jul 25, 2022