Fix bug in SQL queue that can cause tasks to be run twice in a multiprocess environment #15421
Conversation
@artfulrobot the failure looks related
@seamuslee001 Ah! *blush* I should have run the tests! If I had, I would have learnt that CiviCRM's tests assume they can set the time to 2012 without telling MySQL about it. I have "fixed" the code so that the tests pass, by passing in Civi's idea of what the date and time are instead of using SQL's.
This was well tested at the sprint in Barcelona. Merging now so we get plenty of testing before 5.20.
@artfulrobot @mattwire this caused some test failures in my civigeometry extension testing. I think this is because I am running the queue from within an API call, and the lock-tables process appears to be causing some flakiness within my code. To reproduce, clone civigeometry from the lab, then within the folder run
@seamuslee001 gah, just when you think it's working. So it appears that the LOCK TABLES statement causes an implicit commit of outstanding transactions:

```sql
begin;
insert into civicrm_queue_item (queue_name, data, weight, submit_time)
  values ('fred2', 'boo', 0, NOW());
lock tables civicrm_queue_item WRITE;
rollback; # does nothing!
unlock tables;
```

I suspect this is what's upsetting your code, if it's running in a transaction. I suppose it's impossible to do from within a transaction, i.e. it does not make sense: you need an absolute global lock on the queue table to make sure two processes can't claim the same item; transactions deal with parallel processes differently. I think we should have two SQL queues, one that locks and one that doesn't; then applications can choose which one is suitable. E.g. if your geometry code does not need to care about the potential situation of multiple processes both taking a queue item and processing it (e.g. it doesn't matter if work is done twice, or you know there will only ever be one process running the queue), then you can use the old-style one without locks. Alternatively, you could rewrite your code so it's not consuming the queue within a transaction, but perhaps there are other situations where the new behaviour would also cause a problem. I don't know which should be the default one, though. @totten ?
Yeah, I eventually figured it was transactions, so I have moved my unit tests out of running in a transaction. But it clearly shows an issue if someone decides to run the queue, as is, inside a transaction.
Yeah, well, maybe the best thing to do would be to restore the old behaviour (because there aren't usually too many multiprocess queue-running situations in a typical install) and rename the new SQL queue SqlMultiprocessSafe or something? I can do a PR for that if that's the way we go.
Overview
The SQL queue is supposed to ensure that each task on the queue is run once and only once, and in order.
Before
There was the possibility that some tasks would be run more than once if multiple queue runners were on the go at the same time.
After
Only one task runs at a time, regardless of how many simultaneous queue runners there may be.
Technical Details
In a multiprocess situation (e.g. if two or more cron runs overlap, or in other special setups), the existing implementation of the SQL queue could result in tasks being run twice (or more).
This is explored with the following extension, which lets you set up multiple queue-processor processes in parallel to attack a queue, then checks whether each job ran exactly once.
https://github.com/artfulrobot/queuetest
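As an illustration of the guarantee the patch aims for (this is not CiviCRM's actual implementation, just a minimal Python sketch with invented names), the whole claim step runs under one exclusive lock, the in-process analogue of `LOCK TABLES civicrm_queue_item WRITE`, so two runners can never claim the same item:

```python
import threading

class LockedQueue:
    """Hypothetical analogue of the patched SQL queue: claiming an item
    happens entirely under one exclusive lock, so no two workers can
    ever take the same item."""
    def __init__(self, n_items):
        self.pending = list(range(n_items))
        self.lock = threading.Lock()  # stands in for LOCK TABLES ... WRITE

    def claim(self):
        # Select-and-mark as a single atomic step; without the lock, two
        # workers could both read the same head item before either marks it.
        with self.lock:
            if self.pending:
                return self.pending.pop(0)
            return None

def run_workers(queue, n_workers):
    """Run several queue runners in parallel and record what each claimed."""
    done, done_lock = [], threading.Lock()

    def worker():
        while True:
            item = queue.claim()
            if item is None:
                return
            with done_lock:
                done.append(item)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

q = LockedQueue(100)
done = run_workers(q, 4)
print(len(done), len(set(done)))  # 100 items claimed, all distinct
```

With the lock in place, every item is claimed exactly once no matter how many workers run concurrently; deleting the `with self.lock:` guard reintroduces the same class of race the PR fixes in SQL.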
This MR is the result of a successful patch to the SQL queue (in the extension above this is called Sql2 so that it could be compared with the original).
This MR does not change how queues are handled; jobs must still run one after the other. It just protects the queue from problems in a multiprocess situation.