Fix bug in SQL queue that can cause tasks to be run twice in a multiprocess environment #15421
Conversation
@artfulrobot the failure looks related
@seamuslee001 Ah! *blush* I should have run the tests! If I had, I would have learnt that CiviCRM's tests assume they can set the time to 2012 without telling MySQL about it. I have "fixed" the code so that the tests pass, by passing in Civi's idea of what the date and time are instead of using SQL's.
This was well tested at the sprint in Barcelona. Merging now so we get plenty of testing before 5.20.
@artfulrobot @mattwire this caused some test failures in my civigeometry extension testing. I think this is because I am running the queue from within an API call, and the lock-tables process appears to be causing some flakiness within my code. To reproduce, clone civigeometry from the lab, then within the folder run
@seamuslee001 gah, just when you think it's working. So it appears that the LOCK TABLES statement causes an implicit commit of outstanding transactions:

```sql
begin;
insert into civicrm_queue_item (queue_name, data, weight, submit_time)
  values ('fred2', 'boo', 0, NOW());
lock tables civicrm_queue_item WRITE;
rollback; # does nothing!
unlock tables;
```

I suspect this is what's upsetting your code, if it's running in a transaction. I suppose it's impossible to do from within a transaction, i.e. it does not make sense: you need an absolute global lock on the queue table to make sure two processes can't claim the same item; transactions deal with parallel processes differently. I think we should have two SQL queues, one that locks and one that doesn't; then applications can choose which one is suitable. E.g. if your geometry code does not need to care about the potential situation of multiple processes both taking a queue item and processing it (e.g. it doesn't matter if work is done twice, or you know there will only ever be one process running the queue), then you can use the old-style one without locks. Alternatively, you could rewrite your code so it's not consuming the queue within a transaction, but perhaps there are other situations where the new behaviour would also cause a problem. I don't know which should be the default one, though. @totten ?
Yeah, I eventually figured it was transactions, so I have moved my unit tests out of running in a transaction. But it clearly shows an issue if someone decides to run the queue, as is, inside a transaction.
Yeah, well, maybe the best thing to do would be to restore the old behaviour (because there aren't usually too many multiprocess queue-running situations in a typical install) and rename the new SQL queue SqlMultiprocessSafe or something? I can do a PR for that if that's the way we go.
Overview
The SQL queue is supposed to ensure that each task on the queue is run once and only once, and in order.
Before
There was the possibility that some tasks would be run more than once if multiple queue runners were on the go at the same time.
After
Only one task runs at a time, regardless of how many simultaneous queue runners there may be.
Technical Details
In a multiprocess situation (e.g. if two or more cron runs overlap, or in other special setups), the existing implementation of the SQL queue could result in tasks being run twice (or more).
This is explored with the following extension, which lets you set up multiple queue-processor processes in parallel to attack a queue, then checks whether each job ran exactly once.
https://github.com/artfulrobot/queuetest
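As an illustration of the guarantee the patch aims for (this is not CiviCRM's actual implementation, just a minimal Python sketch with invented names), the whole claim step runs under one exclusive lock, the in-process analogue of `LOCK TABLES civicrm_queue_item WRITE`, so two runners can never claim the same item:

```python
import threading

class LockedQueue:
    """Hypothetical analogue of the patched SQL queue: claiming an item
    happens entirely under one exclusive lock, so no two workers can
    ever take the same item."""
    def __init__(self, n_items):
        self.pending = list(range(n_items))
        self.lock = threading.Lock()  # stands in for LOCK TABLES ... WRITE

    def claim(self):
        # Select-and-mark as a single atomic step; without the lock, two
        # workers could both read the same head item before either marks it.
        with self.lock:
            if self.pending:
                return self.pending.pop(0)
            return None

def run_workers(queue, n_workers):
    """Run several queue runners in parallel and record what each claimed."""
    done, done_lock = [], threading.Lock()

    def worker():
        while True:
            item = queue.claim()
            if item is None:
                return
            with done_lock:
                done.append(item)

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done

q = LockedQueue(100)
done = run_workers(q, 4)
print(len(done), len(set(done)))  # 100 items claimed, all distinct
```

With the lock in place, every item is claimed exactly once no matter how many workers run concurrently; deleting the `with self.lock:` guard reintroduces the same class of race the PR fixes in SQL.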
This MR is the result of a successful patch to the SQL queue (in the extension above this is called Sql2 so that it could be compared with the original).
This MR does not change how queues are handled; jobs must still run one after the other. It just protects the queue from problems in a multiprocess situation.