add Per Resource Rate Limits #22
From that issue (discord/discord-api-docs#5557 (comment)): what is the significance of the word "often" (which implies that certain routes may not account for top-level resources)?
Is a user safe assuming that all routes that include channel, guild, and webhook parameters are per-resource?
Does top-level refer to endpoints that contain those resources first?
Progress can now be made!
A new rate limit implementation that will adhere to Discord Rate Limits can be created using the following information.

Information

It all starts with a request that we want to send to Discord. Unfortunately, resources aren't unlimited, so Discord must rate limit requests accordingly. As an API Consumer, we do NOT want to hit that rate limit (because it makes our API unreliable). In other words, we want to make sure that each request we send has the highest chance of success. Discord adds pressure by banning people who keep hitting rate limits.

We want our requests to be sent in a performant manner. This means that we must send requests within the rate limit that Discord provides. However, in an effort to provide ideal performance for their own services, Discord rate limits requests per-route, with different rate limits per route. In other words, we cannot simply allow X requests per second, nor X requests per second per route. Instead, we must handle the rate limits as Discord provides them (from responses). Even if we didn't want to handle requests in a performant manner, we would still have to handle rate limits, since Discord can change them dynamically.

Since there is no way to determine a rate limit prior to sending a request, we must use an internal rate limiter that matches Discord's rate limiter. The complexity of that has already been discussed and implemented in pull request #14. The remaining complexity stems from the handling of the rate limiter, such that we can rate limit each request per-route (with different rate limit algorithms). If there were only one rate limit algorithm strategy, rate limiting would be as easy as mapping each Route to a Rate Limit Bucket Hash (that Discord provides). When that bucket is updated (or deleted), we could simply update the internal state of the Rate Limit Bucket.
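The single-algorithm mapping described above (Route to Bucket Hash to bucket state) can be sketched as follows. This is a minimal, hypothetical illustration; the type and method names are invented and are not Disgo's actual API.

```go
package main

import (
	"fmt"
	"sync"
)

// Bucket mirrors the rate limit state Discord returns in response
// headers (X-RateLimit-Limit, X-RateLimit-Remaining, X-RateLimit-Reset).
type Bucket struct {
	Limit     int
	Remaining int
	Reset     float64 // epoch seconds
}

// RateLimiter sketches the "one algorithm" case: every route maps to
// exactly one Discord bucket hash, and every hash maps to one bucket.
type RateLimiter struct {
	mu      sync.Mutex
	hashes  map[string]string  // route ID -> Discord bucket hash
	buckets map[string]*Bucket // bucket hash -> bucket state
}

func NewRateLimiter() *RateLimiter {
	return &RateLimiter{hashes: map[string]string{}, buckets: map[string]*Bucket{}}
}

// Update records the bucket hash that a response's X-RateLimit-Bucket
// header assigned to a route, remapping the route if Discord changed it.
func (r *RateLimiter) Update(routeID, hash string, b *Bucket) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.hashes[routeID] = hash
	r.buckets[hash] = b
}

// Bucket returns the bucket currently associated with a route, or nil
// for a route we have never seen (the Default Bucket case).
func (r *RateLimiter) Bucket(routeID string) *Bucket {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.buckets[r.hashes[routeID]]
}

func main() {
	rl := NewRateLimiter()
	rl.Update("GET /gateway/bot", "abc123", &Bucket{Limit: 2, Remaining: 1})
	fmt.Println(rl.Bucket("GET /gateway/bot").Remaining) // 1
}
```

When Discord moves a route to a different bucket hash, a single `Update` call remaps the route; the stale bucket can be garbage collected separately.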
Unfortunately, the addition of another rate limit algorithm (per-resource) — which can be observed on the same route — complicates this logic by adding another way for the Bucket's state to change. To be specific, a Rate Limit Bucket Hash will not match the Hash of the previous request in multiple scenarios (provided below).

Per-Route

Per-Resource
In any case, we always have to account for the fact that a bucket can be updated or deleted (by reading response headers). However, the implementation's complexity lies in handling per-resource routes in a different manner than per-route routes. In other words, in order to figure out whether a route's rate limit bucket has updated, you need to know whether the route is a per-resource route or NOT. Otherwise, you have to guess using a number of detectability methods that add unnecessary overhead to the program. Luckily, the addition of the information in #22 (comment) provides a specification we can use to determine this before sending a route (with certain exceptions).

Implementation

The above showcases why rate limits cannot be handled through the use of — and only of — Rate Limit Bucket Hashes. Instead, for any given request, we need to know the Current Bucket Hash, the Route, and the Route's algorithm. When we do not yet have the hash (for the first request), we implement the Default Bucket workaround. We can use the specification above to determine whether we need to track hashes per-route or per-route-resource for every Discord route.

Prior to a more direct explanation of an implementation, we should acknowledge that — due to per-route-resource rate limits — there is not a static capacity for the number of Rate Limit Bucket Hashes in the Rate Limit Bucket Map. In other words, using per-resource routes involves more overhead as you increase the number of different resources that you interact with. As an example, using 4 guilds in a per-resource route entails storing 4 guild rate limit buckets in the map. Using copygen, we can specify which send functions involve per-resource routes (i.e. contain a top-level resource) and have the RouteID be set accordingly from the Send function (without overhead).
As an example, per-route route IDs may be

Exception

The above rate limit implementation assumes that Discord will not change a rate limit from per-route to per-resource. Moreover, it assumes that these changes will be infrequent enough that the User can fix them by simply restarting the bot (alongside a quick update; in the Go language this requires recompilation unless an external service is used). This decision has two implications:

Per-Route -> Per-Resource
Per-Resource -> Per-Route
Handling Exceptions

As a thought experiment, let's also explore a mechanism that attempts to avoid the above scenario entirely. This involves the usage of detectability (which adds a slight overhead, but not enough to affect the max RPS of 50). One could argue that you never need to classify routes by algorithm, since a Bucket Hash is the single source of truth for the rate limit. The issue is that — to support concurrent requests — there must be an internal rate limiter which also updates when the bucket changes. This leads to a requirement for the following functionality: if the rate limit algorithm changes, all buckets for that route should be invalidated (such that issues in the above scenario do not occur). The following table showcases all the possibilities of rate-limit algorithm detection using the information available to each request, by sending the same route multiple times. Note that the possibilities involving a change from a
In the last case, we can NOT guarantee whether a new bucket occurs.
The main issue is revealed here. In 2, without prior classification, every subsequent bucket change for a per-resource route could signify a rate limit update. Following this, every subsequent bucket change for a per-route resource (when it happens to be different) could signify a rate limit update (rather than an algorithm update). Thus, that information is irrelevant to the rate limit algorithm detection and invalidation mechanism. The same logic applies to tracking the number of buckets per route: you can have a Route Bucket with 3 requests consisting of different resources still be a per-route rate-limit algorithm, if the bucket updated in between each request. Not to mention that there is no way to send AND ALSO process two requests at the exact same time. There are no criteria to invalidate a rate limit algorithm change beyond enhanced tracking of 429s. This is a product of the lack of consistency (and inability to contact a middleman server) in the Discord Rate Limit specification.

Nonetheless, we can also explore 429 tracking, which would invalidate the detected ALGORITHM for a given route when the bucket encounters a 429.

Discord shares headers on 429s and tells you whether it's a shared (per-resource) rate limit or a user (per-route) rate limit: see the exception section for documentation on the difference. The only reason Discord would NOT provide this (429 header) on regular requests is bandwidth. In essence, this decision is the software equivalent of an MBA cutting costs by removing napkins from the restaurant. You just make it harder on the end user, but Discord shows time and time again... Anyways, one could use either the per-resource specification (that contains exceptions and mistakes) or one of the above methods in an attempt to infer which algorithm is being used. Then, upon hitting a rate limit (that is not global), you can use the 429 header to prune all invalid buckets accordingly. That solves the problem.
That would make Discord a genius, except for the fact that this wasn't explicitly stated. And the fact that I had to "bully" them to get to this point. Yet in any case, that implementation is reliant on some level of consistency, which means exceptions need to be documented. As an example,

Implication

Exceptions and mistakes result in rate limit algorithms that are suboptimal. Using

In any case, using the invalidation implementation described above could result in the correct (per-resource) rate limit algorithm being assigned to

When a
At the end of the day, a map of

The above problem can technically be solved by classifying every route as per-route. This entails hashing EVERY resource in any route, such that resources that aren't top-level but are still rate limited per-resource are detected. However, this adds extreme overhead (in a similar manner to adding unnecessary key entries). That's why Discord ends up on the left end of the curve. The above implication is why it's important for Discord to stick to its specification rather than have — in their own words — "mistakes and exceptions".

Implementation

Use Copygen to create a map of route IDs to algorithm types. Then, for every request, use the algorithm to determine the hash of the request. User (per-route) is simply the ID, and Shared (per-resource) uses the respective resource. From here, the current implementation applies. Add an invalidation mechanism upon 429 detection accordingly.
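That implementation can be sketched as follows, assuming a generated map of route IDs to algorithm names. The routes and the table here are placeholders, not the actual output of Copygen.

```go
package main

import "fmt"

// routeAlgorithms is a hypothetical stand-in for the map that Copygen
// would generate from the endpoint list; it is not Disgo's real table.
var routeAlgorithms = map[string]string{
	"PUT /channels/{channel.id}/messages/{message.id}/reactions/{emoji}/@me": "shared",
	"GET /gateway/bot": "user",
}

// bucketKey applies the rule described above: a user (per-route) rate
// limit is keyed by the route ID alone, while a shared (per-resource)
// rate limit also includes the top-level resource ID.
func bucketKey(routeID, resourceID string) string {
	if routeAlgorithms[routeID] == "shared" {
		return routeID + ":" + resourceID
	}
	return routeID
}

func main() {
	fmt.Println(bucketKey("GET /gateway/bot", ""))
	fmt.Println(bucketKey("PUT /channels/{channel.id}/messages/{message.id}/reactions/{emoji}/@me", "41231"))
}
```

A 429-driven invalidation mechanism would then rewrite entries in `routeAlgorithms` (or an overlay of it) and drop the stale keys.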
A decision must be made on the specification of exceptions for a full (correct) implementation: discord/discord-api-docs#5605

If exceptions are not documented, the safest method of rate-limiting is to hash every route in a path. As an example, use both
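Hashing every resource in a path could be sketched like this; the key format and helper names are invented for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// snowflake matches Discord snowflake IDs (17-19 digit integers).
var snowflake = regexp.MustCompile(`\b\d{17,19}\b`)

// pathKey hashes every resource in a path: the bucket key contains
// every snowflake, so a change in ANY resource (top-level or not)
// yields a distinct bucket. This is the "safest method" described
// above at its most aggressive, not Disgo's actual behavior.
func pathKey(method, path string) string {
	ids := snowflake.FindAllString(path, -1)
	template := snowflake.ReplaceAllString(path, "{id}")
	return method + " " + template + ":" + strings.Join(ids, "/")
}

func main() {
	fmt.Println(pathKey("GET", "/channels/123456789012345678/messages/876543210987654321"))
}
```

The cost is exactly the overhead described above: every distinct resource combination allocates its own bucket entry, even when Discord would have placed them all in one bucket.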
Discord has responded that exceptions will NOT be stated. As such, Disgo will implement a rate limit algorithm that is flexible enough to account for exceptions, but not 100% without fault.

Implementation
Deliverables

As described in #22 (comment), this implementation will do the following:
To be specific, it will NOT do the following:
Last Point (Discussion)

The last point is not an issue with the implementation, but rather the specification. While it can be resolved — via implementation — by hashing every route resource, this would result in every request being hashed and then pointed to a bucket. This has many implications due to the requirement of the default mechanism (by the API Producer) and is largely inefficient. The current solution is for the developer to identify the specification and correct its implementation in the code. In other words, when the specification is changed, the code needs to be changed with it. Thus, this problem is a byproduct of an inconsistent specification.

Discord does not provide a programmatic method to access the specification. This is why a developer who experiences a rate-limit change involving an unspecified exception must identify the change, implement it in the exception function, then redeploy the code. Redeployment can be avoided through hot-reloading — with the use of an interpreter for the hashing function — at the cost of performance. Cases such as these — unspecified rate-limit algorithm changes — are expected to be rare or problematic enough that they prompt a restart of the bot. As a result, an interpreter will not be used.
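The exception-function workflow described above might be sketched as follows. Every name here is hypothetical, and the overridden route is a made-up example; the point is only that the exception list is compiled in, so changing it requires a recompile and redeploy.

```go
package main

import "fmt"

// RouteException is the signature of the hypothetical "exception
// function": given a route ID, it reports whether the documented
// algorithm should be overridden, and with what. A developer edits
// this function (and recompiles) upon discovering an unspecified
// rate-limit change.
type RouteException func(routeID string) (override string, ok bool)

// exceptions is the compiled-in exception list. Swapping this for an
// interpreter-backed implementation is the hot-reloading alternative
// mentioned above, at the cost of performance.
var exceptions RouteException = func(routeID string) (string, bool) {
	switch routeID {
	// Hypothetical: documented as shared, observed to be per-route.
	case "GET /channels/{channel.id}":
		return "user", true
	}
	return "", false
}

// algorithmFor resolves a route's algorithm, letting the exception
// function win over the documented specification.
func algorithmFor(routeID, documented string) string {
	if override, ok := exceptions(routeID); ok {
		return override
	}
	return documented
}

func main() {
	fmt.Println(algorithmFor("GET /channels/{channel.id}", "shared"))
	fmt.Println(algorithmFor("GET /gateway/bot", "user"))
}
```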
While attempting to create a test for per-resource routes in #34, I kept running into routes that were specified as per-resource by the documentation, but were actually per-route. For example, using two different channel parameters with GetChannel resulted in the same Rate Limit Bucket.
Then, I attempted to use GetChannelMessages and ran into the same behavior. Then, I used the GetReactions emoji endpoint with two different channels and two different emojis... only to hit the same bucket for each. In search of more information, I encountered this issue: discord/discord-api-docs#5312. That's when it occurred to me that Discord is likely applying BOTH a per-route and a per-resource rate limit to certain routes. Such that per-resource does not refer to a per-resource rate limit (per route per the user), but rather an additional per-resource rate limit on the route (for certain resources on that route). YET THIS IS NOT SPECIFIED IN THE GODDAMN DOCUMENTATION... FUUUUUUUUUUUUUUU-

So what can we do? The Disgo rate limit algorithm is designed to be extendable, such that the ability to use multiple buckets per route request could be implemented (by using slices of bucket IDs). Yet no other wrapper does this. In addition, adhering to per-resource rate limits isn't necessarily required. Other API Wrappers don't adhere to them, and 429s against them aren't counted against the user anyways. In the context above, a per-resource rate limit is affected by other users. Such that adhering to them is a moot point: the state of a per-resource rate limit will always be manipulated by factors the program cannot keep track of. One could argue that attempts to keep track of a per-resource rate limit would allow a bot to be more efficient (by sending fewer requests using estimation). The alternative — which developers must implement right now — is to send a request multiple times until a

Implication

The above information can be confirmed by researching the introduction of Cloudflare Bans to the API. Discord stated that most bots — prior to Discord implementing excessive-429 bans — would send a request until it was successful, rather than adhere to the rate limit.
So it seems that the per-route rate limits are used to limit the number of times the user can send a request (in any given period), while the per-resource rate limits simply limit the usage of the resource itself. The user is still required to barrage the Discord server with a request until that request is successful. To reiterate, a user is expected to send a request as many times as possible until it is successful. The

Viewing the rate limit header examples with this information showcases that per-resource rate limit buckets aren't even provided.
The bucket has tokens remaining despite the rate limit occurring, indicating that the bucket being shown still applies to the user. This isn't specified in the documentation, though.

Implementation

Based on the above information, there is no need for invalidation or detection beyond updating per-route buckets when they change. There is no need to implement separate rate limit algorithms for each route, because the user is only expected to manage the per-route rate limit. One could argue here that the documentation is not only unclear, but misleading... That's for another article. Most important is that Disgo already allows the user to control the rate limits that the user is expected to control. As far as implementation goes, Disgo could assist the user by supplying a way to retry a request until it is successful. Currently, the user must set an arbitrary value to
reverts fill hashing function map #22 (comment)
Jesus christ, the rate limit docs really are atrocious. Feels like some employee offhandedly wrote the rate limit docs in 30 minutes and then the entire Discord team forgot about it since.
Rate Limits have been merged in #14 with one caveat.
Problem
Per Resource (and Per Resource Per Route) rate limits are not yet implemented.
The main reason is a lack of information from Discord regarding the following statement: "During calculation, per-route rate limits often account for top-level resources within the path using an identifier". This statement is rather ambiguous because "often" implies that it does NOT apply to every request containing a top-level resource, or started by a top-level resource. The problem is that this statement is also the only sentence regarding Discord's Per Resource Rate Limits, barring `X-RateLimit-Scope`, which only indicates that a 429'd request hit a Per Resource Rate Limit. There is also no clarification of why `X-RateLimit-Scope` calls Per Resource rate limits `shared`, which could imply that multiple users can trigger them rather than just the bot.

If we knew the pattern to identify top-level resources, it could be implemented in Disgo's current form. In the worst case (a misidentification of a Route rate limit as a Per Resource rate limit), excessive maps of RouteIDs to Hashes are made when the Bucket remains the same; but the Bucket isn't actually duplicated. Unfortunately, that pattern is not confirmed.
Most, if not all, libraries don't care about Per Resource rate limits since they are not counted against you. As an example where Per Resource rate limits are handled, DiscordGo creates a custom Per Route rate limit bucket for a Per Resource (Emoji) rate limit, limiting its efficiency. Discord itself states that 429s are inevitable for rate limits such as Emojis, but then also disapproves of the strategy of hitting a 429 and adjusting from there.
Implementation
The implementation of Per Resource (and Per Resource Per Route) rate limits in Disgo is rather straightforward.

`uint16` to `strings` (which means the rate limiter interface must use strings).

`Send` functions of endpoints in the list of known Per Resource Routes (using Copygen). These endpoints would provide the route-number (i.e. `1`), top-level resource (i.e. `g`, `c`, `w`), and top-level resource ID (i.e. snowflake `42123...`) to the `SendRequest` function. This would result in a hash that looks something like `1g41231`, which does not collide with Discord's bucket hash, while still remaining unique to a per-resource route.

Solution
Identification
In order to solve this issue, we must first identify the pattern (or routes) used as Per Resource rate limits. As an example, we know that Emoji endpoints are all Per Resource Per Route rate limits based on a given channel ID. There are multiple ways we can identify these routes, with the simplest being discord/discord-api-docs#5144. Otherwise, we may need to create the following experiments.
Experiment 1
Create a test that attempts every request up to 50 times a second until a 429 is hit (in that second), then outputs the `X-RateLimit-Scope` with the given route accordingly. If a rate limit is NOT hit, there is no rate limit for that request. Technically, the chance of missing rate limits that span longer than a second (i.e. the Global Gateway Rate Limit) is still possible. However, a `shared` (Per Resource) rate limit is expected to contain rate limit headers. As a result, these could be used (instead of a specific time frame) to trigger 429s.

Experiment 2
Implement the Per Resource feature on all routes as-is and use logging and/or a detection feature to identify when a Route is switching too much or too little. This fast-tracks the implementation (by requiring it for this experiment), but this experiment may not be worth the cognitive effort.