Current Window Incrementation Despite Blocking #110
Is the behavior you're referring to incorrect only for the Redis backend, or for the sliding window algorithm itself?
I think I understand the problem you are probably referring to: the case when multiple clients make requests concurrently without coordination. When the Limit() function returns ErrLimitExhausted with the delay duration d1, another client may call Limit() shortly after, which will return ErrLimitExhausted with delay duration d2 (d2 > d1). The first client waits for duration d1, but Limit() will return ErrLimitExhausted again because the other client's call incremented the counter.

As a quick workaround you can try an exponential backoff on the clients (instead of just waiting for the duration returned by the Limit() function).

Counting just the allowed requests will likely require changing the SlidingWindowIncrementer interface (possibly adding a capacity argument to the Increment() function to check whether the counter should be incremented) or using a lock to read the current state and increment only when necessary (a better approach IMO).

Either solution introduces breaking changes and should probably be implemented as a separate sliding window algorithm.

What do you think?
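For reference, a minimal sketch of the client-side backoff workaround, assuming this repo's SlidingWindow whose Limit(ctx) returns (time.Duration, error) along with the package-level limiters.ErrLimitExhausted; the starting delay and the doubling policy are arbitrary choices, not anything prescribed by the library:

```go
package main

import (
	"context"
	"errors"
	"time"

	"github.com/mennanov/limiters"
)

// doWithBackoff retries work() with exponential backoff when the limiter
// throttles, as a client-side workaround for the race described above.
func doWithBackoff(ctx context.Context, limiter *limiters.SlidingWindow, work func() error) error {
	backoff := 50 * time.Millisecond // hypothetical starting delay
	for {
		wait, err := limiter.Limit(ctx)
		if err == nil {
			return work()
		}
		if !errors.Is(err, limiters.ErrLimitExhausted) {
			return err
		}
		// Sleep for the larger of the limiter's hint and our own backoff,
		// then double the backoff so concurrent clients spread out.
		if backoff > wait {
			wait = backoff
		}
		select {
		case <-time.After(wait):
		case <-ctx.Done():
			return ctx.Err()
		}
		backoff *= 2
	}
}
```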
I believe the behaviour is the same regardless of the backend used, because Limit() is called from a single place and every backend's Increment function increments the current counter unconditionally. Called from: slidingwindow.go, line 55 in 9ee5ca3.
Yes, I agree with the approach of adding a capacity argument to the Increment() function. Currently we are using a mitigation cache key with the TTL equal to the nearest time that the request can be accepted, in order to skip checking the rate limit altogether. Maybe that can be an additional feature too, for performance?
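For illustration, a minimal sketch of that mitigation-key idea using go-redis v9; the key name, the wrapper function, and the wiring are hypothetical, not part of this repo:

```go
package main

import (
	"context"
	"errors"

	"github.com/mennanov/limiters"
	"github.com/redis/go-redis/v9"
)

// allowOrMitigate short-circuits the limiter while a "mitigation" key is
// alive. The key's TTL equals the delay after which the next request can
// be accepted, so no backend round-trip to the limiter happens until then.
func allowOrMitigate(ctx context.Context, rdb *redis.Client, limiter *limiters.SlidingWindow) error {
	const mitigationKey = "ratelimit:mitigation" // hypothetical key name
	exists, err := rdb.Exists(ctx, mitigationKey).Result()
	if err != nil {
		return err
	}
	if exists > 0 {
		// A previous request was throttled and its delay has not elapsed
		// yet: skip the rate limit check altogether.
		return limiters.ErrLimitExhausted
	}
	wait, err := limiter.Limit(ctx)
	if errors.Is(err, limiters.ErrLimitExhausted) {
		// Remember the earliest time the next request can be accepted.
		rdb.Set(ctx, mitigationKey, 1, wait)
	}
	return err
}
```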
How about we keep the backend implementation as is, but update the sliding window itself to handle the wait time calculation correctly?

Currently, we calculate the wait time based on the previous and the current window counters, even if they exceed the capacity. We can update the calculation to use the smaller of the counter and the capacity, so once the counter exceeds the capacity, it won't further increase the wait time.
To be honest, I don't like the approach of adding the capacity argument to the Increment() function. Moreover, that logic will still require reading the values from the backend, doing the math, and potentially writing the new state to the backend (if the request is not throttled). That introduces a race condition and requires a lock. That being said, I would rather proceed with the lock approach, like the other limiters in this repo.
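For comparison, a minimal sketch of that lock-based approach; the locker and backend interfaces below are simplified stand-ins, not this repo's actual DistLocker or SlidingWindowIncrementer signatures:

```go
package main

import (
	"context"
	"time"
)

// The point of the lock approach: the read-check-increment sequence runs
// under a distributed lock, so a throttled request never touches the counter.
type locker interface {
	Lock(ctx context.Context) error
	Unlock(ctx context.Context) error
}

type windowBackend interface {
	// Counters returns the previous and current window counters (hypothetical method).
	Counters(ctx context.Context, prevKey, currKey string) (prev, curr int64, err error)
	// Add increments the current window counter (hypothetical method).
	Add(ctx context.Context, currKey string, ttl time.Duration) error
}

// allowLocked increments the counter only when the request is allowed.
// prevRate is the fraction of the previous window still inside the sliding
// window, as in the usual sliding window estimate.
func allowLocked(ctx context.Context, l locker, b windowBackend, prevKey, currKey string, capacity int64, prevRate float64, ttl time.Duration) (bool, error) {
	if err := l.Lock(ctx); err != nil {
		return false, err
	}
	defer l.Unlock(ctx)

	prev, curr, err := b.Counters(ctx, prevKey, currKey)
	if err != nil {
		return false, err
	}
	total := int64(float64(prev)*prevRate) + curr
	if total >= capacity {
		return false, nil // throttled: the counter is NOT incremented
	}
	return true, b.Add(ctx, currKey, ttl)
}
```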
Sorry, I don't quite follow. The idea of the returned time duration is for the client to delay its retry for the given duration so that the retry won't get throttled again. However, in a situation with multiple clients and a high volume of requests, a subsequent throttled request "invalidates" the delays returned to the previous clients.
Currently, all the backend implementations don't know about the capacity, and simply increment the current window counter:

- https://github.com/mennanov/limiters/blob/master/slidingwindow.go#L102
- https://github.com/mennanov/limiters/blob/master/slidingwindow.go#L128
- https://github.com/mennanov/limiters/blob/master/slidingwindow.go#L195
- https://github.com/mennanov/limiters/blob/master/slidingwindow.go#L270

The previous window counter and the current window counter are returned to the sliding window itself:

- https://github.com/mennanov/limiters/blob/master/slidingwindow.go#L55

We use these two counters to calculate the "total", and if it is larger than the capacity, we calculate the "wait" and return it along with ErrLimitExhausted:

- https://github.com/mennanov/limiters/blob/master/slidingwindow.go#L60-L70

The problem happens in the last step: when "curr" keeps increasing, "total" keeps increasing too, and we incorrectly decline some requests with a larger and larger "wait". That's why, in your example, d2 will be larger than d1.

If it worked correctly, d2 would be a little smaller than d1, and the difference between d1 and d2 would be the difference between the request times of the first and the second clients. The server should tell both clients "wait and come back later at this moment".

My proposal is to cap the counters at the capacity at line 59, so once a counter reaches the capacity, it will not push the total further:

```go
prev = int64(math.Min(float64(prev), float64(s.capacity)))
curr = int64(math.Min(float64(curr), float64(s.capacity)))
```
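As a quick numeric illustration of the capping (a toy sketch, not the library's actual wait formula; a capacity of 5 and an empty previous window are assumed):

```go
package main

import (
	"fmt"
	"math"
)

// With capacity 5 and prev = 0, the uncapped total keeps growing with every
// throttled request, while the capped total stabilizes at the capacity, so
// the wait time derived from it stops growing too.
func main() {
	const capacity = 5
	for curr := int64(4); curr <= 8; curr++ {
		capped := int64(math.Min(float64(curr), float64(capacity)))
		fmt.Printf("curr=%d  uncapped total=%d  capped total=%d\n", curr, curr, capped)
	}
}
```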
I believe this is the approach you have mentioned above (correct me if I am wrong):
@leeym's approach of blocking requests is what I have implemented with the mitigate logic. In a nutshell, when we know a request has to be retried after a certain delay d1, do not calculate/increment for any further calls until d1 elapses.
Interesting approach, @leeym, would you be able to create a draft PR with it? Ideally I would like to see a test that explains the logic behind capping the counters.

The approach with the Lock() seems solid, but it also kind of defeats the purpose of this sliding window approach, because its main selling point (IMO) is the lack of a lock.
@leeym If you cap the counter after the increment is done, how does it help? Won't the next request still fetch the current counter from the backend (which is already increased) and increase it further, leading to ErrLimitExhausted continuously?

The way I see it, although you are capping the wait time, the current counter still increments in the backend without capping, so it will continue to block longer than the returned wait time. The only thing this fix does is artificially report the same wait time as the first block; once that wait time is due, a request can still be blocked if a few more requests were made in the meantime. Please correct me if my understanding is wrong.

What I am trying to convey is that once a request is blocked, it should ideally not be counted in the counter as a valid request, since the application has not "served" it. Any requests that were blocked should not affect the counter at all.
Indeed the current counter increments in the backend without capping, but it will NOT continue to block longer than the returned wait time. Since we cap both "prev" and "curr" before calculating "total" and use that to decide whether to block, once the counter reaches the capacity, "total" will remain the same and not incorrectly block future requests.

Expectation: all subsequent requests after the first block should be told to come back at the same time, and the very first request after that should be accepted.

I just updated the test case in #116 to show what I described.
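For readers following along, a hedged sketch of what such a test might assert (the actual test lives in PR #116; newTestLimiter, the capacity of 1, and the usual testing/context/errors/limiters imports are all assumptions here):

```go
// Sketch only: every request throttled after the first block should point
// at the same absolute retry moment, so the second delay must not exceed
// the first.
func TestThrottledRequestsShareRetryTime(t *testing.T) {
	ctx := context.Background()
	limiter := newTestLimiter(t) // hypothetical setup helper: capacity 1

	// The first request fits into the window.
	if _, err := limiter.Limit(ctx); err != nil {
		t.Fatalf("first request should be allowed: %v", err)
	}
	d1, err := limiter.Limit(ctx)
	if !errors.Is(err, limiters.ErrLimitExhausted) {
		t.Fatal("second request should be throttled")
	}
	d2, err := limiter.Limit(ctx)
	if !errors.Is(err, limiters.ErrLimitExhausted) {
		t.Fatal("third request should be throttled")
	}
	// With capped counters, d2 <= d1: both point at the same retry moment.
	if d2 > d1 {
		t.Errorf("retry moments diverge: d1=%v, d2=%v", d1, d2)
	}
}
```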
Hello @leeym, I am still confused by how this works; I have tried my best to explain my scenario using the data below. This is a link to the Google Sheet, to look at the formulas and structure I have used. In the test case you have written, I think the minimum works as expected because the entire capacity is being used. My issue is when a counter in the current window that is below capacity has been blocked.
Currently, under the Increment function of the Redis implementation, the current window count is incremented regardless of whether the request will be allowed or not. While trying to rate limit APIs, this causes an issue: multiple failed hits to the server push the next allow time further and further into the future. Ideally, once a request is dropped, the time at which the next request can be accepted should not change. This can be achieved by only incrementing the current window counter for requests that were actually ALLOWED.

If there is any specific reason to increment the counter before checking for Allow, please let me know, and I will create a PR to handle both cases.

If not, can the default behaviour be the one suggested above?
Code line: limiters/slidingwindow.go, line 150 in 9ee5ca3
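For illustration, a minimal sketch of the suggested behaviour for the Redis backend, assuming go-redis v9: check the capacity and increment atomically in one Lua script, so a request that is going to be dropped never moves the counter. The key names, the helper, and the simplified (unweighted) capacity check are all assumptions, not this repo's API; the real sliding window weighs the previous window by the elapsed fraction, which this sketch skips to keep the idea visible:

```go
package main

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// condIncr increments the current window counter only if the combined
// counters are still below the capacity, atomically on the Redis side.
var condIncr = redis.NewScript(`
local curr = tonumber(redis.call('GET', KEYS[1]) or '0')
local prev = tonumber(redis.call('GET', KEYS[2]) or '0')
if prev + curr >= tonumber(ARGV[1]) then
  return 0 -- throttled: counter untouched
end
redis.call('INCR', KEYS[1])
redis.call('PEXPIRE', KEYS[1], ARGV[2])
return 1 -- allowed
`)

// allowAndIncr returns true and increments the counter only for requests
// that are actually allowed.
func allowAndIncr(ctx context.Context, rdb *redis.Client, currKey, prevKey string, capacity int64, ttl time.Duration) (bool, error) {
	res, err := condIncr.Run(ctx, rdb, []string{currKey, prevKey}, capacity, ttl.Milliseconds()).Int64()
	if err != nil {
		return false, err
	}
	return res == 1, nil
}
```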