Skip to content
This repository has been archived by the owner on Jul 16, 2021. It is now read-only.

Making DatabaseQueue production-ready #1364

Closed
kelvinj opened this issue Oct 19, 2018 · 6 comments
Closed

Making DatabaseQueue production-ready #1364

kelvinj opened this issue Oct 19, 2018 · 6 comments

Comments

@kelvinj
Copy link

kelvinj commented Oct 19, 2018

This is my take on the issues with the DatabaseQueue adapter in Laravel.

Problems

Deadlocks

People are running the database queue adapter in production, even if Taylor has said it's not intended for production use.

The issues arise when running multiple queue workers. The DatabaseQueue adapter queries the jobs table to find an appropriate job to do, and if one is found, and update is performed to ensure no other workers try to take it. Because of this 2-step process, and to avoid race conditions, queries within the adapter use lockForUpdate(), which in some situations leads to deadlocks. I have witnessed that the more workers you have, the increasing likelihood of deadlocking.

Data loss potential

One other current issue, as I see it, is the possibility of losing a job when attempting to release back onto the queue because:

  1. The release is done in 2 steps – the original record is deleted, and a new record is inserted
  2. The above is not done in a single transaction – a lost DB connection after the first could lose data

Solution

Deadlocks

My solution would be to flip around the SELECT and UPDATE operations in a way that would avoid the need for transactions. Popping a job off the queue would look like this:

  1. Generate a random string (str_random()) used as a temporary identifier for the current process of taking a job
  2. Use an UPDATE similar to the SELECT query in DatabaseQueue::getNextAvailableJob, whereby the temporary identifier is stored against an available job, and LIMIT 1 used to only affect 1 record
  3. A SELECT query to find the job using the temporary identifier could only return 1 job, but there would be no need for a transaction and a lock.

Because of this process, there would be no need to use lockForUpdate() when deleting / releasing the job either.

A change would be required to the jobs / failed_job table schemas to store the temp identifier.

Data loss potential

This would be remedied by wrapping the 2 operations in a transaction.


Thoughts?

@sisve
Copy link

sisve commented Oct 21, 2018

What you describe is optimistic locking, and it's called optimistic because it counts on every worker playing by the rules. The additional column is often a version (incrementing integer) that is incremented for every change done to the row. It works just as you describe, except that there's no even a remotely small chance to get duplicate with an integer column, while you could have really weirdly bad luck with a random string.

I'm not sure that an UPDATE ... LIMIT 1 is supported on all the database platforms that Laravel supports. However, this is easily circumvented with a SELECT+UPDATE, like it is today, but with a version tag.

  1. SELECT * FROM jobs WHERE ... LIMIT 1
  2. UPDATE jobs SET version = version + 1 WHERE id = ? AND version = ?

If the update succeeds (affected rows == 1), then we got the job. If it fails (affected rows = 0), then another worker managed to increment the version before us. We repeat these two steps until we win the fight and get a job (or until the select returns no rows, in which case we have no available jobs and sleep for some time).

I'll leave further optimizations to https://ph4r05.deadcode.me/blog/2017/12/23/laravel-queues-optimization.html and https://github.com/ph4r05/laravel-queue-database-ph4 which has already implemented this, and more.

@mfn
Copy link

mfn commented Oct 22, 2018

Also note there's a further improvement, negating some of the requirements outlined in the article: the use of SKIP LOCKED.

Nowadays, MysQL 8.0.1 (and Postgres since 9.5) both support this and it's inferior in every way because with SELECT … FOR UPDATE SKIP LOCKED you can skip those problematic rows in a single step, totally avoiding any deadlock scenario.

As long as you've to use a database, the best possible solution ATM is to use SKIP LOCKED.

More information:

@kelvinj
Copy link
Author

kelvinj commented Oct 23, 2018

I'm not sure that an UPDATE ... LIMIT 1 is supported on all the database platforms that Laravel supports.

@sisve damn it, you're right. SQLite doesn't support a LIMIT clause in an UPDATE statement… see the SQLite UPDATE docs here.

@deleugpn
Copy link

deleugpn commented Nov 1, 2018

I wouldn't expect people to use SQLite for database queue. If we start from the premise that database queue only works for low workload and your changes bring it to work with high workload but sacrifices one DBMS, I think a warning on docs would suffice.

@sisve
Copy link

sisve commented Nov 1, 2018

@deleugpn Why would we impose such a limitation, including documentation, when there is a solution that works for SQLite too? Wouldn't it make more sense to use the solution that works for all supported database engines without documenting special cases?

@mfn
Copy link

mfn commented Jan 29, 2020

My mentioned UPDATE FOR SKIP LOCKED was recently implemented, btw: laravel/framework#31287

@kelvinj kelvinj closed this as completed Oct 26, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants