
[5.5] Add retries to delete reserved on jobs table to fix dead lock issues on heavy queue loads #22369

Closed

Conversation

victorboissiere

I have had issues for days: first with the Redis driver, which exhausted my server's RAM, then with the database driver, which produced many deadlocks. I work for a customer that really needs to keep track of pending queues and failed_jobs. Everything works perfectly with the database driver except when I have a lot of jobs.

The php artisan queue:work command would hang after 100 to 200 jobs because of deadlocks, and those workers would then never process another job. My queue processing logic stops and I have to restart everything manually. The issue is explained really well by @micmania1 in #7046 (end of the issue).

Big thanks to @micmania1 for the solution that helped me put all the work I've done into production! I would love for this to be merged, as it would solve one big issue with the database driver under heavy loads.

@victorboissiere victorboissiere changed the base branch from master to 5.5 December 8, 2017 22:43
@victorboissiere victorboissiere changed the title Add retries to delete reserved on jobs table to fix dead lock issues on heavy queue loads [5.5] Add retries to delete reserved on jobs table to fix dead lock issues on heavy queue loads Dec 8, 2017
@sisve
Contributor

sisve commented Dec 9, 2017

Link to the comment: #7046 (comment)

I dislike this change just because it has a mentality of "try it again" instead of figuring out the actual problem. Could you describe your table structure and what indexes you have? What in your show engine innodb status indicates that this solution is appropriate? What types of locks are involved in your deadlocks?

This change sounds like something you guessed at, and the verification is doubtful. The comments in (and behavior of) the code that $conn->transaction($callback, $retries) relies on imply that it does not, in fact, retry anything in case of deadlocks.

// On a deadlock, MySQL rolls back the entire transaction so we can't just
// retry the query. We have to throw this exception all the way out and
// let the developer handle it in another way. We will decrement too.
if ($this->causedByDeadlock($e) &&
    $this->transactions > 1) {
    $this->transactions--;

    throw $e;
}
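For context, the retry behavior under discussion amounts to a loop that re-runs the whole transaction callback when a deadlock is detected. Here is a minimal, self-contained sketch of that pattern (not the framework's actual code; DeadlockException and transactionWithRetries are hypothetical stand-ins for the driver's internals):

```php
<?php

// Hypothetical stand-in for the driver detecting a deadlock error.
class DeadlockException extends RuntimeException {}

// Sketch of a transaction wrapper that retries the entire callback
// on deadlock, up to $attempts times, like transaction($callback, 5).
function transactionWithRetries(callable $callback, int $attempts)
{
    for ($attempt = 1; $attempt <= $attempts; $attempt++) {
        try {
            // A real driver would BEGIN here and COMMIT on success.
            return $callback();
        } catch (DeadlockException $e) {
            // A real driver would ROLLBACK here, then retry the
            // whole callback unless attempts are exhausted.
            if ($attempt === $attempts) {
                throw $e;
            }
        }
    }
}

// Usage: simulate two deadlocks, then succeed on the third attempt.
$calls = 0;
$result = transactionWithRetries(function () use (&$calls) {
    if (++$calls < 3) {
        throw new DeadlockException('Deadlock found when trying to get lock');
    }
    return 'deleted';
}, 5);
```

The important point, per the comment above, is that the retry must cover the whole transaction, not just the failed statement, because MySQL rolls back the entire transaction on a deadlock.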

I run the database queue driver myself with spikes of 25k waiting jobs, processed by ~40 workers on ~16 machines, without any issues. I mention this slightly for bragging, but mostly to tell you that if you have deadlock problems at your numbers, you need to look into what actually caused them, and check whether you have appropriate indexes.

The next step to debug your problem and find a proper solution would be to tell us the structure of your jobs table (a create table to reproduce it would suffice), and the output of show engine innodb status when the deadlocks occur.

@sisve
Contributor

sisve commented Dec 9, 2017

I've reread the code again, and I now see that the code I linked in the previous comment is about nested transactions which isn't the case here.

$this->database->transaction(function () use ($id) {
    if ($this->database->table($this->table)->lockForUpdate()->find($id)) {
        $this->database->table($this->table)->where('id', $id)->delete();
    }
}, 5);

This hard-coding of 5 seems strange. At a minimum it should probably be a configurable number pulled from the DB queue config, defaulting to 5 there.
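A sketch of what that could look like in config/queue.php (the 'retries' key and its wiring are hypothetical, not an existing Laravel option):

```php
// config/queue.php (sketch only; 'retries' is a hypothetical key)
'connections' => [
    'database' => [
        'driver'      => 'database',
        'table'       => 'jobs',
        'queue'       => 'default',
        'retry_after' => 90,
        // Hypothetical: transaction attempts used when deleting a
        // reserved job, instead of the hard-coded 5.
        'retries'     => 5,
    ],
],
```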

@taylorotwell
Member

The database driver will never be a good solution at scale. It's just fundamentally not the right solution, as many articles on the Internet about relational databases and queues can tell you.

A better approach is figuring out why Redis is filling up your memory. Or maybe using Beanstalkd?

@victorboissiere
Author

Ok thank you all for your answers and taking the time to look into it. I'll try to use Beanstalkd then!

@sisve
Contributor

sisve commented Dec 12, 2017

@victorboissiere I never got the table structure from you. We just had a deadlock report in #7046 and #22394 that is about a gap lock on an index that isn't part of the default 5.5 jobs table. Can you verify if you have that index from an earlier upgraded Laravel version, or if you have a "pure" 5.5 jobs table without the index?

@victorboissiere
Author

@sisve I used a pure 5.5 jobs table using php artisan queue:table. I'm now using Beanstalkd with no issue at all.
