Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow configuring whether throttled jobs are put back on the queue immediately or scheduled for the future #150

Merged
merged 22 commits into from
Nov 17, 2024

Conversation

lavaturtle
Copy link
Contributor

This adds support for setting the requeue_strategy for a job, to specify what we should do with throttled jobs. The default is :enqueue, which is the current behavior: re-add it to the end of the queue. The other option is :schedule, which schedules the job for a time in the future when we think we'll have capacity to process it.

It's also possible to set the default_requeue_strategy in the configuration, to set this behavior for all jobs that do not individually specify a requeue_strategy.

This may be relevant to Issue #36.

Unrelatedly, there's a commit in here that changes the format of some require_relative lines, to comply with the new Rubocop rule Style/RedundantCurrentDirectoryInPath. I don't feel strongly about this commit; I only added it so that rubocop would pass.

@ixti
Copy link
Owner

ixti commented Jun 29, 2023

That is awesome! Thank you! I will take a look this weekend.

@ixti
Copy link
Owner

ixti commented Jun 29, 2023

Re: Style/RedundantCurrentDirectoryInPath

I prefer consistency. It is pretty weird to me that rubocop's default for rescue is to be explicit, while for require_relative it's implicit. Let's just disable this cop. I will find time to open a rubocop PR to make that configurable. But for now, I would like it to be simply disabled.

On a side note about this PR, I think I made a bad decision back when I started this gem on providing custom sidekiq_throttle singleton class message, probably we should utilize sidekiq_class_attribute helper instead:

def self.included(base)
  base.sidekiq_class_attribute :sidekiq_throttle_push_back # :enqueue | :schedule
end

In this case we will not need to bother about "keeping track" of job class inheritance.

@ixti ixti self-assigned this Jun 29, 2023
@lavaturtle
Copy link
Contributor Author

Re: Style/RedundantCurrentDirectoryInPath

I prefer consistency. It is pretty weird to me that rubocop's default for rescue is to be explicit, while for require_relative it's implicit. Let's just disable this cop. I will find time to open a rubocop PR to make that configurable. But for now, I would like it to be simply disabled.

Makes sense! I've updated the PR to disable the cop instead of applying it.

.rubocop.yml Outdated Show resolved Hide resolved
@lavaturtle
Copy link
Contributor Author

On a side note about this PR, I think I made a bad decision back when I started this gem on providing custom sidekiq_throttle singleton class message, probably we should utilize sidekiq_class_attribute helper instead:

Ah, interesting! I didn't know about sidekiq_class_attribute. So if we do that, then a job class can specify sidekiq_throttle_push_back in its sidekiq_options, and then we can read it later with get_sidekiq_options?

Would you prefer push_back as the setting name rather than requeue_strategy?

@ixti
Copy link
Owner

ixti commented Jun 29, 2023

sidekiq_class_attribute :sidekiq_throttle_push_back will define a new singleton class method, so the usage will look like:

class MyJob
  include Sidekiq::Job
  include Sidekiq::Throttled::Job

  sidekiq_throttled_push_back :enqueue
end

Naturally, we should check if the value is Proc, and if so - we should call that proc to get the value, so that it will be possible to:

class MyJob
  include Sidekiq::Job
  include Sidekiq::Throttled::Job

  sidekiq_throttled_push_back ->(user_id, *) { user_id.odd? ? :enqueue : :schedule }

  def perform(user_id, more, stuff)
    # ...
  end
end

@ixti
Copy link
Owner

ixti commented Jun 29, 2023

I have no strong feelings about push_back vs requeue_strategy. I just don't like over-use of strategy in names (easy to confuse with Throttled::Strategy). Probably simply: sidekiq_throttled_requeue_with?

Also, just realized, that it would be nice to also allow specifying which queue it should be requeued to.

@lavaturtle
Copy link
Contributor Author

lavaturtle commented Jun 30, 2023

Hmm, I've been working on getting the sidekiq_class_attribute approach to work, and as far as I can tell from my testing, if we do this in Sidekiq::Throttled::Job:

      def self.included(worker)
        worker.send(:extend, ClassMethods)
        worker.sidekiq_class_attribute :sidekiq_throttled_requeue_with  # :enqueue | :schedule
      end

then the job class needs to use it like this:

      class MyThrottledJob
        include Sidekiq::Job
        include Sidekiq::Throttled::Job

        self.sidekiq_throttled_requeue_with = :schedule
        sidekiq_throttle foo: :bar

i.e. it ends up as a class-level variable that needs to be assigned, rather than being something we can use like sidekiq_throttled_requeue_with :schedule.

@lavaturtle
Copy link
Contributor Author

So maybe we invoke sidekiq_class_attribute for the inheritance benefits, but instead of users using sidekiq_throttled_requeue_with in the class directly, we keep the requeue_with option for sidekiq_throttle? That also avoids any weirdness around which order the two are called in.

@lavaturtle
Copy link
Contributor Author

lavaturtle commented Jun 30, 2023

Okay! I've made a number of changes. The basic usage should now look like this:

  sidekiq_throttle threshold: {limit: 123, period: 1.hour}, requeue: {to: :other_queue, with: :schedule}

with support for Procs:

class MyJob
  include Sidekiq::Job
  include Sidekiq::Throttled::Job

  sidekiq_throttle threshold: {limit: 123, period: 1.hour},
                   requeue: {to: ->(user_id, *) { user_id.odd? ? :odd_queue, :even_queue },
                             with: ->(_user_id, more, *) { more > 25 ? :schedule : :enqueue }}

  def perform(user_id, more, stuff)
    # ...
  end
end

and the default configuration looks like this:

Sidekiq::Throttled.configuration.default_requeue_options = {with: :schedule, to: :my_throttled_jobs_queue}

Both :with and :to arguments are optional. If :with is not specified, we'll use the :enqueue method. If :to is not specified, we'll use the queue the job was originally enqueued into.

@danpolyuha
Copy link

Hey @ixti ,
Thanks for your amazing work on this gem.
Sorry for asking here, just don't want to create new threads,
Do you have any estimate on release of 1.0.0 stable?

@woodhull
Copy link
Contributor

Excited to land this PR both for the app I help maintain with @lavaturtle and another product I'm working on. Anything else we can do to help move this and v1 across the finish line? Happy to put some more of our engineering time in on our side if there are discrete tasks you can point us at. We're keen to invest in the gem & code quality.

@ixti
Copy link
Owner

ixti commented Aug 12, 2023

@lavaturtle @woodhull thanks for the amazing PR. I will try to go through it today-tomorrow. Sorry for being unresponsive (was really overwhelmed with some work lately - but slowly getting back at dedicating time to sidekiq-throttled).

@ixti
Copy link
Owner

ixti commented Aug 12, 2023

The only issue I'm scratching my head about right now is the Sidekiq-Pro support. Can't figure out how to make sure it works correctly. Thus thinking on skipping it's support for 1.0.0 release completely and circle back at that task after 1.0.0 release.

@mnovelo
Copy link
Contributor

mnovelo commented Aug 15, 2023

@ixti I'm happy to help with the Sidekiq Pro support, regardless of whether it's for 1.0.0 or later! Let me know if there's a branch you'd like me to test and/or review

@woodhull
Copy link
Contributor

Hey.... Just looping back to this.

We're not using Sidekiq Pro so we aren't sure how to help move this forward on that front. That code is private for paid Sidekiq customers I think?

@mnovelo
Copy link
Contributor

mnovelo commented Nov 8, 2024

the Sidekiq Pro specs pass too 🎉

@anero
Copy link
Contributor

anero commented Nov 8, 2024

Great, thanks for testing @mnovelo!

I was concerned that there may be a problem with how we requeue the job here https://github.com/ixti/sidekiq-throttled/pull/150/files#diff-c15f45918c6d7d825fbfa0856557360b907b492e44879ba9eb1e7f3bad9c42baR89 which is different from how it was previously done (https://github.com/ixti/sidekiq-throttled/pull/150/files#diff-ae016bb4af61725f74c58dac11f63bd38a08d6c5c5519e467902911a51077b0fL27), but I don't know enough about Sidekiq's API to say if it's an actual problem

@mnovelo
Copy link
Contributor

mnovelo commented Nov 8, 2024

oh, that is a problem actually @anero I missed that change. Sidekiq Pro's super_fetch does do important additional logic when requeuing. I can work on a PR to address this, but I would like it if we kept Throttled.requeue_throttled(work) or similar so that super_fetch can run its additional logic

Comment on lines +59 to +63
def reset!
@cooldown_period = 1.0
@cooldown_threshold = 100
@default_requeue_options = { with: :enqueue }
end
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO it's better to introduce #configure! on Sidekiq::Throttledto reconfigure if needed. As#resetwill only work duringconfigure` anyway (config gets frozen)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure I follow you. I understand that this will only work if used as we do here.
What would be the difference that you propose between Sidekiq::Throttled#configure! and Sidekiq::Throttled#configure? I thought a reset! method on Config would be cleaner to set all settings back to their default values. Alternatively I think something like the following would work:

Sidekiq::Throttled.configure do |config|
  config.cooldown_period    = 1.0
  config.cooldown_threshold = 100
  config.default_requeue_options = { with: :enqueue }
end

The problem IMO would be that you'd leak the default values out of Config. Even if declared as constants, I assumed it'd be more correct to keep the defaults encapsulated on the config class.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that config gets frozen after the #configure block. So you can't call Sidekiq::Throttled.config.reset!, thus with current API one will have to:

Sidekiq::Throttled.configure do |config|
  config.reset!

  ...
end

Versus:

Sidekiq::Throttled.configure! do |config|
  ...
end

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After thinking a bit, I think config.reset! is actually cleaner approach :D

Comment on lines 3 to 8
class ThrottledTestJob
include Sidekiq::Job
include Sidekiq::Throttled::Job

def perform(*); end
end
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's move this into spec/support/sidekiq

anero and others added 2 commits November 8, 2024 16:31
Co-authored-by: Alexey Zapparov <alexey@zapparov.com>
Signed-off-by: Diego Marcet <anero@users.noreply.github.com>
@anero
Copy link
Contributor

anero commented Nov 8, 2024

oh, that is a problem actually @anero I missed that change. Sidekiq Pro's super_fetch does do important additional logic when requeuing. I can work on a PR to address this, but I would like it if we kept Throttled.requeue_throttled(work) or similar so that super_fetch can run its additional logic

@mnovelo I wasn't sure if there's a way to reference the correct retriever (BasicFetch / SuperFetch) from Sidekiq::Throttled::Strategy#requeue_throttled.
I guess we could perform a similar verification as it's done elsewhere to decide whether Sidekiq Pro is available or not, and requeue using the correct mechanism. Do you've any thoughts on this @ixti ?

@mnovelo
Copy link
Contributor

mnovelo commented Nov 8, 2024

@anero oh, the class of the work variable will be Sidekiq::Pro::SuperFetch::UnitOfWork, so that's one way to distinguish whether the super_fetch strategy should be used. That was already being assumed in the previous implementation because it just delegated the requeuing logic to Sidekiq::Pro::SuperFetch::UnitOfWork#requeue so something like

case work
when Sidekiq::Pro::SuperFetch::UnitOfWork
# super_fetch stuff
else
# basic_fetch stuff
end

could work. Though I haven't dug deep into this yet

@ixti
Copy link
Owner

ixti commented Nov 9, 2024

I think, the most clean solution will be to patch UnitOfWork similar way we patch BasicFetch and SuperFetch. What @mnovelo proposed will work even better, though. :D

@mnovelo
Copy link
Contributor

mnovelo commented Nov 9, 2024

ok, I'll work on a PR into this one then @anero @ixti !

@mnovelo
Copy link
Contributor

mnovelo commented Nov 9, 2024

@anero @ixti here's my proposal controlshift#2

If this looks good, I can write specs for it too

Update requeue strategy to work with SuperFetch
@anero
Copy link
Contributor

anero commented Nov 11, 2024

Thanks for the PR @mnovelo, sorry I merged before seeing your comment on the specs. Want to add those separately? (I'd do it myself but I don't have Sidekiq Pro to be able to run them)

…n case the gem is not installed

Also included: fix rubocop offenses
@anero
Copy link
Contributor

anero commented Nov 12, 2024

@mnovelo I fixed the failing specs. Would you mind adding test coverage for the new code on Sidekiq::Throttled::Strategy to raise code coverage and get the build passing?

@mnovelo
Copy link
Contributor

mnovelo commented Nov 13, 2024

@anero Specs are ready for your review! controlshift#3

@mnovelo
Copy link
Contributor

mnovelo commented Nov 14, 2024

This is looking good! However, I want to highlight something important about the new scheduling strategy for future reference. It is not fully compatible with Sidekiq::Batch because when a job gets rescheduled using this strategy, it is assigned a new job ID. If a job within a batch gets throttled with the new scheduling strategy, the original job will be marked as complete. This could lead the batch to assume that the job has been completed as well. I don't think the new job retains the batch ID, and I didn't make any effort to ensure this or write any specs to test this. In my opinion, there isn't a strong use case for using both the new scheduling strategy and Sidekiq::Batch together. However, I'm happy to help revisit that in the future should the need arise.

@anero
Copy link
Contributor

anero commented Nov 15, 2024

@ixti do you think you'll have time to review and merge (assuming you're OK with @mnovelo comment)?

@ixti
Copy link
Owner

ixti commented Nov 15, 2024

Yeah. I will review, apply changes (if will have any) and merge this weekend.

@ixti
Copy link
Owner

ixti commented Nov 15, 2024

Most likely merge as is - don't want to hold up this any longer ))

@ixti
Copy link
Owner

ixti commented Nov 15, 2024

@freemanoid if you can take a look as well - that would be splendid!

@ixti ixti merged commit 27ec05e into ixti:main Nov 17, 2024
17 checks passed
@ixti
Copy link
Owner

ixti commented Nov 17, 2024

I'm going to release this as v2.0.0.pre1 and open a PR with the refactoring I have in mind later.

I will release this change and the cooldown defaults changes as v1.5.0 instead. Refactoring I have in mind is to make sure requeue throttled would work fine with Job-Iteration alike API and be a bit more isolated from the throttling strategies. Thus that change (when I will have time for it) will deserve v2.0.0 :D

@ixti
Copy link
Owner

ixti commented Nov 17, 2024

Release: #197

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants