
Current Window Incrementation Despite Blocking #110

Open
CariappaKGanapathi opened this issue Jan 27, 2025 · 13 comments


@CariappaKGanapathi

Currently, in the Increment function of the Redis implementation, the current window count is incremented regardless of whether the request will be allowed. When rate limiting APIs, this causes a problem: multiple rejected hits to the server keep pushing the next allowed time further and further into the future. Ideally, once a request is dropped, the time at which the next request can be accepted should not change. This can be achieved by incrementing the current window counter only for requests that were actually ALLOWED.

If there is a specific reason to increment the counter before checking for Allow, please let me know and I will create a PR to handle both cases.
If not, can the suggested behaviour become the default?

Code line:

return prevCount, incr.Val(), nil
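
For illustration, a rough sketch (not the library's code) of how a Redis backend could make the increment conditional, assuming a hypothetical capacity argument. A Lua script keeps the read-check-increment atomic; the capacity check here is a plain sum for brevity, whereas the real sliding window weights the previous window by its overlap:

```go
package limiterexample

import (
	"context"
	"time"

	"github.com/redis/go-redis/v9"
)

// conditionalIncr increments the current window key only while the combined
// count is still below the capacity; otherwise the counters are returned
// unchanged.
var conditionalIncr = redis.NewScript(`
local prev = tonumber(redis.call('GET', KEYS[1]) or 0)
local curr = tonumber(redis.call('GET', KEYS[2]) or 0)
local capacity = tonumber(ARGV[1])
if prev + curr < capacity then
  curr = redis.call('INCR', KEYS[2])
  redis.call('PEXPIRE', KEYS[2], ARGV[2])
end
return {prev, curr}
`)

// incrementIfAllowed is a hypothetical replacement for the Redis backend's
// Increment that takes a capacity argument and only counts allowed requests.
func incrementIfAllowed(ctx context.Context, cli *redis.Client, prevKey, currKey string, capacity int64, ttl time.Duration) (prevCount, currCount int64, err error) {
	res, err := conditionalIncr.Run(ctx, cli, []string{prevKey, currKey}, capacity, ttl.Milliseconds()).Int64Slice()
	if err != nil {
		return 0, 0, err
	}
	return res[0], res[1], nil
}
```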

@mennanov
Owner

Is the behavior you're referring to incorrect only for the Redis backend or for the actual SlidingWindow algorithm (all backends)?

@mennanov
Owner

I think I understand the problem you are probably referring to: the case when multiple clients make requests concurrently without coordination.

When the Limit() function returns ErrLimitExhausted with a delay duration d1, another client may call Limit() shortly after and get ErrLimitExhausted with a delay duration d2 (d2 > d1).
The first client waits for d1, but Limit() returns ErrLimitExhausted again because the other client's call incremented the counter.

As a quick workaround, you can try exponential backoff on the clients (instead of just waiting for the duration returned by the Limit() function).
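
A minimal client-side sketch of that workaround, assuming the Limit(ctx) (time.Duration, error) signature and limiters.ErrLimitExhausted exposed by this package:

```go
package limiterexample

import (
	"context"
	"errors"
	"time"

	"github.com/mennanov/limiters"
)

// doWithBackoff retries work() with exponential backoff layered on top of the
// delay returned by Limit(), so concurrent clients do not retry in lockstep.
func doWithBackoff(ctx context.Context, lim interface {
	Limit(context.Context) (time.Duration, error)
}, work func() error) error {
	backoff := 50 * time.Millisecond
	for {
		wait, err := lim.Limit(ctx)
		if err == nil {
			return work()
		}
		if !errors.Is(err, limiters.ErrLimitExhausted) {
			return err
		}
		if wait < backoff {
			wait = backoff // wait at least the growing backoff, not only the returned delay
		}
		select {
		case <-time.After(wait):
		case <-ctx.Done():
			return ctx.Err()
		}
		backoff *= 2
	}
}
```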

Counting just the allowed requests will likely require changing the SlidingWindowIncrementer interface (possibly adding a capacity argument to the Increment() function to check whether the counter should be incremented) or using a lock to read the current state and increment only when necessary (a better approach IMO).

Either solution introduces a breaking change and should probably be implemented as a separate sliding window algorithm.

What do you think?
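
For reference, the first option might look roughly like this, a hypothetical sketch with a capacity argument added to the existing Increment arguments (not the current API):

```go
package limiterexample

import (
	"context"
	"time"
)

// SlidingWindowIncrementerWithCapacity is a hypothetical variant of the
// SlidingWindowIncrementer interface: Increment receives the capacity so a
// backend can skip the increment when the limit is already exhausted.
type SlidingWindowIncrementerWithCapacity interface {
	// Increment increments the current window's counter only if the combined
	// count is still below capacity and returns the previous and current
	// window counts.
	Increment(ctx context.Context, prev, curr time.Time, ttl time.Duration, capacity int64) (prevCount, currCount int64, err error)
}
```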

@CariappaKGanapathi
Author

I believe the behaviour is the same regardless of the backend, because Increment is called from a single place and every backend's Increment increments the current counter unconditionally.

Called from:

prev, curr, err := s.backend.Increment(ctx, prevWindow, currWindow, ttl+s.rate)

Yes, I agree with the approach of adding a capacity argument to the Increment() function. Currently we use a mitigation cache key with a TTL equal to the earliest time the next request can be accepted, in order to skip the rate limit check altogether. Maybe that could be an additional feature too, for performance?
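
A rough sketch of that mitigation-key idea (hypothetical names, Redis used for the key; not part of this library): once a request is throttled with delay d, a key with TTL d is stored, and every subsequent check short-circuits until it expires, so blocked traffic never touches the window counters.

```go
package limiterexample

import (
	"context"
	"errors"
	"time"

	"github.com/mennanov/limiters"
	"github.com/redis/go-redis/v9"
)

// allow skips the limiter entirely while a mitigation key is alive and only
// consults Limit() once the known block window has elapsed.
func allow(ctx context.Context, cli *redis.Client, key string, limit func(context.Context) (time.Duration, error)) (bool, error) {
	blocked, err := cli.Exists(ctx, key+":mitigated").Result()
	if err != nil {
		return false, err
	}
	if blocked > 0 {
		return false, nil // still inside a known block window: skip the check altogether
	}
	wait, err := limit(ctx)
	if err == nil {
		return true, nil
	}
	if !errors.Is(err, limiters.ErrLimitExhausted) {
		return false, err
	}
	// Remember the block until the returned delay elapses.
	if setErr := cli.Set(ctx, key+":mitigated", 1, wait).Err(); setErr != nil {
		return false, setErr
	}
	return false, nil
}
```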

@leeym
Collaborator

leeym commented Jan 29, 2025 via email

@mennanov
Owner

mennanov commented Jan 30, 2025

Yes, I agree with the approach of adding a capacity argument to the Increment() function. Currently we use a mitigation cache key with a TTL equal to the earliest time the next request can be accepted, in order to skip the rate limit check altogether. Maybe that could be an additional feature too, for performance?

To be honest, I don't like the approach of adding the capacity argument to the Increment() function, because it would force Increment() to implement the actual throttling logic, which is too much to ask of the SlidingWindowIncrementer abstraction.

Moreover, that logic would still require reading the values from the backend, doing the math, and potentially writing the new state back (if the request is not throttled). That introduces a race condition and requires a lock.

That being said, I would rather proceed with the lock approach, like the other limiters in this repo.
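
A rough sketch of that lock-based approach (hypothetical interfaces; the capacity check and wait calculation are simplified, since the real algorithm weights the previous window by its overlap with the sliding window):

```go
package limiterexample

import (
	"context"
	"errors"
	"time"
)

// errLimitExhausted is a local stand-in for the error returned when throttled.
var errLimitExhausted = errors.New("requests limit exhausted")

// distLocker and counterStore are minimal interfaces for this sketch; the repo
// has its own locking and backend abstractions.
type distLocker interface {
	Lock(ctx context.Context) error
	Unlock(ctx context.Context) error
}

type counterStore interface {
	Get(ctx context.Context, prev, curr time.Time) (prevCount, currCount int64, err error)
	Add(ctx context.Context, curr time.Time, ttl time.Duration) error
}

// limitWithLock reads the counters under a lock and increments only when the
// request is allowed, so throttled requests never inflate the window.
func limitWithLock(ctx context.Context, l distLocker, s counterStore, prevW, currW time.Time, window time.Duration, capacity int64) (time.Duration, error) {
	if err := l.Lock(ctx); err != nil {
		return 0, err
	}
	defer l.Unlock(ctx)

	prev, curr, err := s.Get(ctx, prevW, currW)
	if err != nil {
		return 0, err
	}
	if prev+curr >= capacity {
		// Throttled: do NOT increment; ask the client to retry once the
		// current window has rolled over (simplified wait calculation).
		return time.Until(currW.Add(window)), errLimitExhausted
	}
	return 0, s.Add(ctx, currW, window)
}
```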

@mennanov
Owner

We can update the calculation to use the smaller of the counter and the capacity, so once the counter exceeds the capacity, it won't further increase the wait time.

Sorry, I don't quite follow. The idea behind the returned time duration is that the client delays its retry by that duration so the retry won't get throttled again.
However, in a situation with multiple clients and a high volume of requests, a subsequent throttled request "invalidates" the delays returned to the previous clients.

@leeym
Collaborator

leeym commented Jan 30, 2025 via email

@CariappaKGanapathi
Author

Moreover, that logic would still require reading the values from the backend, doing the math, and potentially writing the new state back (if the request is not throttled). That introduces a race condition and requires a lock.

I believe this is the approach you mentioned above (correct me if I am wrong):

Limit() {
  • fetch counters
  • if the request has to be blocked -> do not increment, simply return the delay time
  • if the request can go through -> increment the counter and return
}

Locks being expensive and affecting performance is my only issue with this approach.

@leeym's approach of blocking requests is what I have implemented with the mitigation logic.

In a nutshell: when we know a request has to be retried after a certain delay d1, do not calculate or increment for any further calls until d1 elapses.

@CariappaKGanapathi
Author

@mennanov @leeym Can we settle on a solution? I would like to work on it once everyone is aligned.

@mennanov
Owner

mennanov commented Feb 8, 2025

My proposal is to cap the counter at the capacity at line 59, so once the counter reaches the capacity, it will not push the total further.

prev = int64(math.Min(float64(prev), float64(s.capacity)))
curr = int64(math.Min(float64(curr), float64(s.capacity)))

Interesting approach. @leeym would you be able to create a draft PR with it? Ideally I would like to see a test that explains the logic behind capping prev and curr with the capacity.

The approach with the Lock() seems solid, but it also somewhat defeats the purpose of this sliding window approach, because its main selling point (IMO) is the lack of a lock.
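
For illustration, the usual sliding-window estimate with that capping applied might look like the snippet below (not the library's code): once curr exceeds the capacity, the estimate, and therefore the wait derived from it, stops growing, even though the raw backend counter keeps increasing.

```go
package limiterexample

import (
	"math"
	"time"
)

// weightedCount caps the raw backend counters at the capacity before applying
// the standard sliding-window weighting; elapsed is how far we are into the
// current window of length `window`.
func weightedCount(prev, curr, capacity int64, elapsed, window time.Duration) float64 {
	prev = int64(math.Min(float64(prev), float64(capacity)))
	curr = int64(math.Min(float64(curr), float64(capacity)))
	overlap := 1 - float64(elapsed)/float64(window)
	return float64(prev)*overlap + float64(curr)
}
```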

@CariappaKGanapathi
Author

CariappaKGanapathi commented Feb 10, 2025

@leeym If you cap the counter after the increment is done, how does it help?

Won't the next request still fetch the current counter from the backend (which has already been increased) and increase it further, leading to ErrLimitExhausted continuously?

The way I see it, you are only capping the wait time. Since the current counter in the backend increments without capping, it will continue to block for longer than the returned wait time.

The only thing this fix does is artificially report the same wait time as the first block. Once the wait time is due, a request can still be blocked if a few more requests were made in the meantime.

Please correct me if my understanding is wrong.

What I am trying to convey is that once a request is blocked, it should ideally not be counted as a valid request, since the application has not "served" it. Requests that were blocked should not affect the counter at all.

@leeym
Collaborator

leeym commented Feb 10, 2025 via email

@CariappaKGanapathi
Author

Hello @leeym

I am still confused about how this works. I have tried my best to explain my scenario using the data below; there is a link to the Google Sheet to look at the formulas and structure I have used.

[image: spreadsheet with the scenario data]

In the test case you have written, I think the minimum works as expected because the entire capacity is being used. My issue is when a counter in the current window that is below the capacity has been blocked.
