Logical scheduling so massive queues are possible. #8
Comments
I've wanted this functionality for a while now but haven't had much time recently to focus on this gem. You're more than welcome to hack on something. :)
I think the solution is actually to rate-limit the fetch rather than schedule things out. I've got the same problem, plus an interaction issue with sidekiq-priority that makes this solution a necessity. What I'm thinking is that, rather than scheduling jobs, the fetch method should be rate-limited across all workers. This would also solve a separate issue: the misleading ballooning of the 'processed' count as jobs go back and forth between scheduled and the queue.

Looking around, I see a fair number of rate-limiting gems. My eye was drawn to glutton_ratelimit because it is mentioned in a number of search results, but also because it can limit based on either a burst strategy (send them all, then wait) or an average strategy (dole them out over a period of time). This seems ideal for dealing with different kinds of APIs.

I don't know if you plan to address this any time soon, but unless it's already in the works I think I'm going to have to do something sooner rather than later. I welcome (hope for, actually) any thoughts or feedback on the topic.
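To make the burst vs. average distinction concrete, here is a minimal sketch in plain Ruby. It is not glutton_ratelimit's implementation; the method names `burst_start_offset` and `average_start_offset` are hypothetical, and the limit of 5 calls per 60 seconds is only an example.

```ruby
# Hypothetical helpers, not taken from any gem.
# Both return the offset in seconds (from the start of the run) at which
# the n-th call (0-based) is allowed to begin.

# Burst: send `limit` calls at once, then wait out the rest of the period.
def burst_start_offset(n, limit, period)
  (n / limit) * period
end

# Average: spread calls evenly, one every period / limit seconds.
def average_start_offset(n, limit, period)
  n * (period.to_f / limit)
end

burst   = (0...10).map { |n| burst_start_offset(n, 5, 60) }
average = (0...10).map { |n| average_start_offset(n, 5, 60) }

puts burst.inspect    # first five calls at 0, next five at 60
puts average.inspect  # one call every 12 seconds
```

Both strategies allow the same number of calls per period. They differ only in whether an API sees the calls in clumps or as a steady trickle.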
I've been holding off on implementing this sort of functionality until I have a better idea of the implementation. I agree, fetching may be more efficient than scheduling. You might want to look at Sidekiq::Fetcher#fetch, which uses Sidekiq::Fetcher.strategy to pick a fetching strategy. If you were to use glutton_ratelimit, you could then extend the basic fetcher and rate limit it:

    class RateLimitedFetcher < Sidekiq::BasicFetch
      extend GluttonRateLimit
      rate_limit :retrieve_work, 5, 60
    end

...I'm going to reflect on this a bit more. Thoughts?
So, I actually think that's about it. I think the rest would be wrapping that up in a Sidekiq middleware packaged as a gem, and then adding some config options to choose the strategy (exhaust vs. average). The only question in my mind is how to most effectively wrap it up in middleware. A second, ancillary question: is this the direction you want to take sidekiq-throttler, or do you see this as a separate implementation of a similar concept? Either way I'm interested in working on it, and it would be great to work it out with you and not saddle myself with yet another (possibly) redundant gem.
So, glutton_ratelimit won't work: it's not thread-safe. Of course, I'm sure there's an alternative that is ... just have to find it :/
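For illustration, a thread-safe limiter can be sketched with nothing but a `Mutex` from Ruby's standard library. This is an assumption-laden sketch, not code from any gem mentioned in this thread: the `WindowLimiter` class and its `allow?` method are hypothetical, and it only coordinates threads inside one process. Limiting across several Sidekiq processes would need shared state such as Redis.

```ruby
# Hypothetical sliding-window limiter; safe for threads within one process.
class WindowLimiter
  # clock is injectable so the limiter can be exercised without sleeping.
  def initialize(limit:, period:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) })
    @limit  = limit
    @period = period
    @clock  = clock
    @stamps = []        # times of the calls admitted in the current window
    @mutex  = Mutex.new # guards @stamps
  end

  # Returns true and records the call if fewer than `limit` calls were
  # admitted during the last `period` seconds; returns false otherwise.
  def allow?
    @mutex.synchronize do
      now = @clock.call
      # Drop timestamps that have aged out of the window.
      @stamps.shift while @stamps.any? && @stamps.first <= now - @period
      return false if @stamps.size >= @limit

      @stamps << now
      true
    end
  end
end

limiter = WindowLimiter.new(limit: 5, period: 60)
results = Array.new(20) { Thread.new { limiter.allow? } }.map(&:value)
puts results.count(true)
```

Because every check-and-record happens inside `synchronize`, two threads cannot both claim the last free slot in a window.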
redis_rate_limiter looks pretty good. In the meantime I'm going to try it out with the same subclassing strategy you outlined above. However, since you can really only have one custom fetcher, I don't think it's a sustainable strategy in my codebase or even in a gem. I believe the solution is to add rate limiting directly to Sidekiq, or at least an improved interface for custom fetchers.
I went ahead and cut a gem, sidekiq-rate-limiter. It doesn't support procs in the options hash yet, but is otherwise similar. I'll be working to improve it as time allows, but for now it's a decent solution for our purposes.
Indeed, that's a serious and surprising design mistake. I guess it was built for a different purpose.
Any news about this thread?
I ended up writing a pull request for sidekiq-limit_fetch that allows suspending the processing of a queue for some time.
Sorry everyone, I'm not using Sidekiq or sidekiq-throttler these days. 😰 If anyone wants to take over from here, let me know. I'm also more than happy to link to alternatives.
As I understand it, this is how it works right now: when the threshold is reached, the remaining jobs are scheduled one period from now. When the period cycles, the scheduled jobs drop back in, and it all repeats. If I'm wrong about this, please correct me.
The issue I'm having with this shows up when I want to queue a massive number of jobs (say 50,000) with a threshold of 50 and a period of 1 minute.
What happens is that every minute just under 50,000 jobs come due again, only 50 of them run, and the rest have to be rescheduled. This doesn't scale well.
How I would love to see it work is to smartly delay the items. For example, the first 50 can process now, the next 50 in 1 minute from now. The next 50 in 2 minutes, etc.
This will require some tracking of the current queue so that after all ~50,000 items are scheduled (for as far out as ~1000 minutes from now) it can logically add future items. So in a few hours from now, if I add more items, it should automatically figure out where the end of the queue is time-wise and schedule the new items to be completed at the end of the queue.
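A minimal sketch of the slot-based scheduling described above, in plain Ruby. It is not how sidekiq-throttler works; the `SlotScheduler` class and its `next_run_at` method are hypothetical, times are plain numbers of seconds, and the tail of the queue is kept in memory rather than in Redis.

```ruby
# Hypothetical helper: assigns each new job to the earliest time slot
# that still has room, remembering where the end of the queue is.
class SlotScheduler
  def initialize(threshold:, period:)
    @threshold  = threshold # jobs allowed per period
    @period     = period    # period length in seconds
    @slot_start = nil       # start time of the slot being filled
    @slot_count = 0         # jobs already assigned to that slot
  end

  # Returns the time (in seconds) at which the next job should run.
  def next_run_at(now)
    if @slot_start.nil? || now >= @slot_start + @period
      # The queue has drained: start a fresh slot at the current time.
      @slot_start = now
      @slot_count = 0
    elsif @slot_count >= @threshold
      # The tail slot is full: open the next one, one period later.
      @slot_start += @period
      @slot_count = 0
    end
    @slot_count += 1
    @slot_start
  end
end

scheduler = SlotScheduler.new(threshold: 50, period: 60)
run_times = Array.new(50_000) { scheduler.next_run_at(0) }
puts run_times.last / 60 # minutes until the final slot starts
```

With a threshold of 50 and a period of 60 seconds, the 50,000 jobs fill 1,000 slots. A job added a few hours later lands in the slot after the current tail rather than at the current time. Each job is scheduled once instead of being rescheduled every period.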
Are there any plans to change the functionality to work this way? If not, I might spend some time and try to hack something out.