Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support bulk enqueuing of jobs which differ only in args/kwargs #340

Merged
merged 25 commits into from
Aug 25, 2022

Conversation

oeoeaio
Copy link
Contributor

@oeoeaio oeoeaio commented Feb 25, 2022

Resolves: #333

Todo:

  • This was built on top of the ruby3 branch (Support Ruby 3 #319) - clean it up after that is merged
  • Complete specs for .bulk_enqueue
  • Rework interface to use block form instead with normal calls to .enqueue
  • Disable que_job_notify trigger in .bulk_enqueue for performance
  • Disable que_state_notify trigger in .bulk_enqueue for performance / completely remove it (removal deferred)
  • Deprecate que_state_notify trigger in the changelog
  • Add option to enable notify in .bulk_enqueue
  • Test .bulk_enqueue in synchronous mode
  • Write documentation
  • Update mass-job-deletion docs to use que.skip_notify
  • Determine performance improvement
  • Add a test for enqueueing a job with ActiveJob within a .bulk_enqueue block
    • ActiveJob does not have a bulk enqueue interface and ActiveJob always adds job options to the Que.enqueue call which we do not support.
  • Add migration instructions to changelog
  • Confirm that the alternate instructions in the ActiveJob error message actually work for ActiveJob jobs - it didn't: removed
  • Note ActiveJob incompatibility in the docs with alternate instructions

To PR and release subsequently:

  • Support enqueueing different job classes within a .bulk_enqueue block
  • Support enqueueing different job_options within a .bulk_enqueue block
  • Deduplicate common code in enqueue and bulk enqueue paths

@oeoeaio oeoeaio changed the title Bulk enqueue Support bulk enqueuing of jobs which differ only in args/kwargs Feb 25, 2022
Copy link
Member

@ZimbiX ZimbiX left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great work.

How about sharing your manual testing runtime results here?

lib/que/job.rb Outdated Show resolved Hide resolved
lib/que/job.bulk_enqueue_spec.rb Outdated Show resolved Hide resolved
lib/que/job.rb Outdated Show resolved Hide resolved
@ZimbiX
Copy link
Member

ZimbiX commented Feb 25, 2022

How about making the method take an option for whether to disable the notify while enqueueing the jobs?

@oeoeaio
Copy link
Contributor Author

oeoeaio commented Feb 25, 2022

How about making the method take an option for whether to disable the notify while enqueueing the jobs?

Interesting idea. How would we do that without also impacting jobs enqueued using the normal enqueue method? The trigger is defined on the table...

@ZimbiX
Copy link
Member

ZimbiX commented Feb 26, 2022

How would we do that without also impacting jobs enqueued using the normal enqueue method? The trigger is defined on the table...

I was thinking something like this - disable and re-enable the trigger in the call to .bulk_enqueue. If it's within a transaction, it wouldn't affect other parallel activities, right?

@jasoncodes
Copy link
Contributor

Note that disabling/enabling triggers is a schema change, meaning it will require a ShareRowExclusiveLock. This will prevent row modifications while the transaction is open.

I’ve found adding a trigger condition which checks a custom setting works well to temporarily stop row triggers on bulk operations. Something like this:

CREATE TRIGGER que_job_notify
AFTER INSERT ON public.que_jobs FOR EACH ROW
WHEN (NOT coalesce(current_setting('que.skip_notify', true)::boolean, false))
EXECUTE FUNCTION public.que_job_notify();

Then use SET LOCAL within the transaction to skip the trigger:

BEGIN;
SET LOCAL que.skip_notify TO TRUE;
…
COMMIT;

@ZimbiX
Copy link
Member

ZimbiX commented Feb 27, 2022

@jasoncodes Good to know! Thanks for the tip <3

Edit: The docs for disabling a trigger confirm this

@ZimbiX
Copy link
Member

ZimbiX commented Mar 24, 2022

Rebased (via cherry-picking just the latest commit) on master now that the Ruby 3 PR has been merged

@ZimbiX
Copy link
Member

ZimbiX commented Jul 22, 2022

@jasoncodes: @AndrewColbeck and I have had to adjust the query you provided, as we found it unfortunately errors after a transaction which sets the variable. It seems that current_setting returns empty string rather than null then, which breaks the casting to boolean:

➜ ./auto/psql
[+] Running 1/0
 ⠿ Container que-db-1  Running                                                                                                                                                                                                                                                 0.0s
psql (13.7 (Debian 13.7-1.pgdg110+1))
Type "help" for help.

que-test=# select coalesce(current_setting('que.skip_notify', true)::boolean, false);
 coalesce 
----------
 f
(1 row)

que-test=# begin; set local que.skip_notify to true; select coalesce(current_setting('que.skip_notify', true)::boolean, false); rollback;
BEGIN
SET
 coalesce 
----------
 t
(1 row)

ROLLBACK
que-test=# select coalesce(current_setting('que.skip_notify', true)::boolean, false);
ERROR:  invalid input syntax for type boolean: ""

Here's a working alternative that avoids the need to cast:

➜ ./auto/psql
[+] Running 1/0
 ⠿ Container que-db-1  Running                                                                                                                                                                                                                                                 0.0s
psql (13.7 (Debian 13.7-1.pgdg110+1))
Type "help" for help.

que-test=# select coalesce(current_setting('que.skip_notify', true), '') = 'true';
 ?column? 
----------
 f
(1 row)

que-test=# begin; set local que.skip_notify to true; select coalesce(current_setting('que.skip_notify', true), '') = 'true'; rollback;
BEGIN
SET
 ?column? 
----------
 t
(1 row)

ROLLBACK
que-test=# select coalesce(current_setting('que.skip_notify', true), '') = 'true';
 ?column? 
----------
 f
(1 row)

que-test=# begin; set local que.skip_notify to true; select coalesce(current_setting('que.skip_notify', true), '') = 'true'; commit;
BEGIN
SET
 ?column? 
----------
 t
(1 row)

COMMIT
que-test=# select coalesce(current_setting('que.skip_notify', true), '') = 'true';
 ?column? 
----------
 f
(1 row)

@jasoncodes
Copy link
Contributor

Good catch. I can confirm this empty string behaviour on at least PG 14.x.

Here’s an alternate solution to consider: nullif(current_setting('que.skip_notify', true), '')::boolean

@ZimbiX
Copy link
Member

ZimbiX commented Jul 22, 2022

Here’s an alternate solution to consider

Thanks for that idea. We tried it out, but it doesn't work - I think because casting null to boolean doesn't:

que-test=# select null::boolean;
 bool 
------
 
(1 row)

que-test=# select null::boolean is null;
 ?column? 
----------
 t
(1 row)

Sorry, I thought casting null to boolean was working before 😅

We've found these two options to be working:

WHEN (nullif(current_setting('que.skip_notify', true), '') is null)
WHEN (NOT coalesce(current_setting('que.skip_notify', true), '') = 'true')

We think we prefer the second form, given it's checking the value is set to something truthy rather than just set to something.

ZimbiX added a commit to greensync/que that referenced this pull request Jul 22, 2022
…mance

Implementation is based on [a tip from @jasoncodes](que-rb#340 (comment))!

Co-Authored-By: Andrew Colbeck <andrewcolbeck@gmail.com>
ZimbiX added a commit to greensync/que that referenced this pull request Jul 22, 2022
…ormance

Implementation is based on [a tip from @jasoncodes](que-rb#340 (comment))!

Co-Authored-By: Andrew Colbeck <andrewcolbeck@gmail.com>
@ZimbiX
Copy link
Member

ZimbiX commented Jul 22, 2022

I've just realised there's a second trigger, que_state_notify, which would also be triggered when inserting each job record. I've briefly looked into that previously, and iirc, it wasn't actually used by anything. So we could think about removing it. Regardless, we should investigate it.

➜ ./auto/psql
[...]
que-test=# \d que_jobs
[...]
Triggers:
    que_job_notify AFTER INSERT ON que_jobs FOR EACH ROW WHEN (NOT COALESCE(current_setting('que.skip_notify'::text, true), ''::text) = 'true'::text) EXECUTE FUNCTION que_job_notify()
    que_state_notify AFTER INSERT OR DELETE OR UPDATE ON que_jobs FOR EACH ROW EXECUTE FUNCTION que_state_notify()

I think I'd still like to add an option in bulk enqueueing for whether to disable the notify (disabling it by default, I reckon).


The bulk_enqueue interface is rather verbose currently:

MyJob.bulk_enqueue(
  [
    { args: ['job1_arg1'], kwargs: { job1_kwarg1: 'x' } },
    { args: ['job2_arg1'], kwargs: { job2_kwarg1: 'x' } },
  ],
)

It could be good to improve that here or subsequently. I was wondering about something like:

Que::Job.with_bulk_enqueue do
  MyJob.enqueue('job1_arg1', job1_kwarg1: 'x')
  MyJob.enqueue('job2_arg1', job2_kwarg1: 'x')
end

@jasoncodes
Copy link
Contributor

I think because casting null to boolean doesn't

Ah, true. The trigger condition must return true and I missed the NOT when proposing the above.

We think we prefer […] checking the value is set to something truthy rather than just set to something.

Agreed entirely. Comparing to 'true' as a string doesn’t feel quite right to me but is technically fine. This string comparison is what I was trying to avoid. One final version for your consideration:

WHEN (nullif(current_setting('que.skip_notify', true), '')::boolean IS DISTINCT FROM true)

I read this one as “run the trigger when when skip_notify is not true”.

@jasoncodes
Copy link
Contributor

I think I'd still like to add an option in bulk enqueueing for whether to disable the notify (disabling it by default, I reckon).

I think disabled by default makes sense.

The bulk_enqueue interface is rather verbose currently.

Agreed. Not being able to use the normal Job enqueue methods is less than ideal too.

I was wondering about something like:

Que::Job.with_bulk_enqueue do
  MyJob.enqueue('job1_arg1', job1_kwarg1: 'x')
  MyJob.enqueue('job2_arg1', job2_kwarg1: 'x')
end

Yes, please! This would be is a much nicer interface and would allow users to adopt bulk enqueuing without a lot of changes to application code.


One could divert the insert_job statements into a thread-local array and flush that array in a single INSERT statement at the end of the with_bulk_enqueue block. This should reduce the need for much of the additional implementation code for this feature by re-using almost all of the existing enqueue code.

The main API difference I can think of is que_attrs won’t be available for use in the return value from enqueue. I’d probably return a job object with an empty hash when enqueue is called in bulk mode. This should ensure things like ActiveJob’s API still work correctly (but without provider_job_id being set).

@ZimbiX
Copy link
Member

ZimbiX commented Jul 23, 2022

I've reworked the interface to use basically what I proposed, but still called .bulk_enqueue. In this form, we might as well add support for enqueueing jobs of different job classes / job_options; and doing so is probably not hard.

To keep track of everything that's left, I've just added a todo list to the first comment.

@ZimbiX
Copy link
Member

ZimbiX commented Jul 23, 2022

One could divert the insert_job statements into a thread-local array and flush that array in a single INSERT statement at the end of the with_bulk_enqueue block. This should reduce the need for much of the additional implementation code for this feature by re-using almost all of the existing enqueue code.

Yeah, that's what I'd been thinking too 😉

The main API difference I can think of is que_attrs won’t be available for use in the return value from enqueue. I’d probably return a job object with an empty hash when enqueue is called in bulk mode. This should ensure things like ActiveJob’s API still work correctly (but without provider_job_id being set).

Ah, I hadn't thought of that! Cheers; done. I'm not really familiar with the ActiveJob part of Que - do you reckon we should add a test for that here?

@ZimbiX
Copy link
Member

ZimbiX commented Jul 23, 2022

Agreed entirely. Comparing to 'true' as a string doesn’t feel quite right to me but is technically fine. This string comparison is what I was trying to avoid.

Yeah, but the settings are strings, so we can't really avoid string stuff afaik, having to use either string parsing or string comparison, of which I presume the performance is similar.

One final version for your consideration:

WHEN (nullif(current_setting('que.skip_notify', true), '')::boolean IS DISTINCT FROM true)

Interesting alternative. I think what we went with is slightly more readable, personally, so I'll stick with that. Thanks though

ZimbiX added a commit to greensync/que that referenced this pull request Jul 23, 2022
To avoid a ShareRowExclusiveLock, which would prevent row modifications while the transaction is open (que-rb#340 (comment))
ZimbiX added a commit to greensync/que that referenced this pull request Aug 23, 2022
…ormance

Implementation is based on [a tip from @jasoncodes](que-rb#340 (comment))!

Co-Authored-By: Andrew Colbeck <andrewcolbeck@gmail.com>
ZimbiX added a commit to greensync/que that referenced this pull request Aug 23, 2022
To avoid a ShareRowExclusiveLock, which would prevent row modifications while the transaction is open (que-rb#340 (comment))
@ZimbiX
Copy link
Member

ZimbiX commented Aug 23, 2022

Rebased to avoid changelog conflict

ZimbiX and others added 21 commits August 25, 2022 16:10
Co-Authored-By: Andrew Colbeck <andrewcolbeck@gmail.com>
…s used

It was consistently failing when run with:

```bash
DATABASE_URL=postgres://que:que@localhost:5697/que-test TESTOPTS="--seed=50554 -v" USE_RAILS=true be rake spec
```

Co-Authored-By: Andrew Colbeck <andrewcolbeck@gmail.com>
…ormance

Implementation is based on [a tip from @jasoncodes](que-rb#340 (comment))!

Co-Authored-By: Andrew Colbeck <andrewcolbeck@gmail.com>
… 'PROCEDURE' for compatibility with older PostgreSQL
Allows fast-failing a test run using:

```bash
TESTOPTS=--fail-fast bundle exec rake spec
```
To avoid a ShareRowExclusiveLock, which would prevent row modifications while the transaction is open (que-rb#340 (comment))
…queue rather than .bulk_enqueue

I added a matcher, `assert_raises_with_message` to make the expectation cleaner. I reckon this should have been part of minitest, like it is in RSpec
… args, and do so only at the end of the .bulk_enqueue block
…ecessary serialisation, making it ~19% faster
Co-Authored-By: Jerome Paul <jerome.paul@greensync.com.au>
Co-Authored-By: Jerome Paul <jerome.paul@greensync.com.au>
Co-authored-by: Brendan Weibrecht <brendan@weibrecht.net.au>
[ActiveJob always adds job options to the Que.enqueue call](https://github.com/rails/rails/blob/main/activejob/lib/active_job/queue_adapters/que_adapter.rb#L25) but that is not supported.
Furthermore ActiveJob does not have a dedicated bulk enqueue interface.

Co-authored-by: Brendan Weibrecht <brendan@weibrecht.net.au>
ZimbiX and others added 2 commits August 25, 2022 16:41
Using Que to manually enqueue an ActiveJob job is not just as simple being able to specify your job class in the `job_class` job option in `Que.enqueue` - there's a lot more to it, e.g.:

```
[2] pry(#<running jobs via ActiveJob>)> TestJobClass.perform_later(1, 2)
=> #<TestJobClass:0x00007fb081f4bcb0 @arguments=[1, 2], @exception_executions={}, @Executions=0, @job_id="227ffa28-57b2-4517-99ef-a53a4136c01a", @priority=nil, @provider_job_id=136557356, @queue_name="default", @Timezone=nil>
[3] pry(#<running jobs via ActiveJob>)> jobs_dataset.all
=> [{:priority=>100,
  :run_at=>2022-08-25 06:25:15.406266 +0000,
  :id=>136557356,
  :job_class=>"ActiveJob::QueueAdapters::QueAdapter::JobWrapper",
  :error_count=>0,
  :last_error_message=>nil,
  :queue=>"default",
  :last_error_backtrace=>nil,
  :finished_at=>nil,
  :expired_at=>nil,
  :args=>
   [{:job_id=>"227ffa28-57b2-4517-99ef-a53a4136c01a",
     :locale=>"en",
     :priority=>nil,
     :timezone=>nil,
     :arguments=>[1, 2],
     :job_class=>"TestJobClass",
     :executions=>0,
     :queue_name=>"default",
     :enqueued_at=>"2022-08-25T06:25:15Z",
     :provider_job_id=>nil,
     :exception_executions=>{}}],
  :data=>{},
  :job_schema_version=>2,
  :kwargs=>{}}]
```

Co-Authored-By: Jerome Paul <jerome.paul@greensync.com.au>
Co-Authored-By: Jerome Paul <jerome.paul@greensync.com.au>
@ZimbiX
Copy link
Member

ZimbiX commented Aug 25, 2022

🚀

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Support bulk job creation
4 participants