-
Notifications
You must be signed in to change notification settings - Fork 759
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improved Worker Reliability #83
Conversation
Any word on the possibility of this getting merged? Seems like a very good patch and could help solve the painful issue of restarting many workers when you deploy... |
The first part seems pretty close to https://github.com/ebernhardson/php-resque-pool Rebooting all workers would be as simple as |
@ShonM: It is certainly similar, but there are some differences, some of which are significant. For one, php-resque-pool acts as a wrapper process around the PHP Resque library, while this pull request integrates this kind of functionality into the PHP Resque library itself. This is significant because PHP Resque already did most of this anyway - the changes I've introduced here simply complete this functionality. This in no way affects the usefulness of php-resque-pool, however, because it still does a couple of things that PHP Resque still doesn't natively handle. Specifically, the ability to use a YAML document to configure a worker pool, and the ability to specify multiple queue groups in the same pool (for example, 1 worker on one queue, another 1 on a second queue, and 4 on every queue - this is three separate queue groups in a single worker pool). Each pool created using only the native support can only operate on one queue group at a time, and the configuration can only be set via the environment. I could also comment on how variable the definition of "as simple as" tends to be, but that would just be trolling, so I'll refrain from further comment on that. :) Ultimately, though, the decision of whether to include this patch will probably come down to whether PHP Resque is intended to operate identically to its Ruby counterpart, or if it is seen as independent of that project and should stand on its own. (And probably also on whether or not the Ruby version handles multiple workers natively...) That's a factor I can't speak to - only @chrisboulton can make that call. |
I'd like to see this get merged in. While php-resque-pool works, it's a distinct project that may not be as well supported as this one. |
Had a bit of a scare earlier today when updating our internal version of PHP-Resque to the latest while still preserving the modifications here. (We also have reasons to forbid the use of Composer, so I had to add the This is a problem because PHP doesn't resume execution at the point where the alarm signal was received. At least, it doesn't do so properly. So the child forks created to execute jobs in a safe space ended up never terminating - or, indeed, completing their assigned tasks - and very quickly things were so broken they were useless. Took a while to track down the reversion, but once removed, things resumed their former solid operation. Just to be safe, I checked that the same reversion hadn't made its way into this PR. Luckily, it hadn't, but I had forgotten to remove an output coloration artifact. I have removed that now, and updated the PR to merge in the latest content of |
Updated again with latest changes upstream. Had to resolve a merge conflict involving the PREFIX env var addition, so I figured I'd push the merge commit to keep the pull simple, should this PR get merged. |
Following up again so this doesn't get lost completely in the shuffle of issues and PRs. This code has been in use, in production, for months now - at least as long as the PR has existed, and IIRC, somewhat longer - with no negative effects. So, yeah, just curious what the verdict on this kind of functionality will be. |
I'd like to see this in too, what do you think @chrisboulton ? 👍 |
👍 please consider merging this... |
A side-benefit of getting this merged would be the possibility of being able to run this via Foreman. (Foreman, like Monit, thinks the main process has gone away and bails immediately on a |
Hey guys I just stumble into this problem when I tried to supervise php resque consumers with threads (COUNT=>1). Is there any plans to merge this? Or any interim solution? Thanks! |
+1 on this, trying to use php-resque on heroku and experiencing the same problem |
Hmm I am seeing
bin/ doesn't seem to be tested, so the ci/unit tests pass even though this fails problem is at HERE--> I don't think |
Hrm. Not sure what I was thinking, there... Must have recently been working with JavaScript, I suppose. As to |
Any update on this? Would really love to see this merged! |
It would really be great to see this merged. |
@chrisboulton I would love this merged too. Having the ability to correctly monitor multiple spawned workers is just what I need. Can you please have a look? |
+1 this would be awesome to merge. |
789364f
to
0a374d8
Compare
Specifically, this improves the multi-worker forking process by having the parent process stick around and monitor the children. If a child process dies, and the supervisor has not been instructed to shut them all down, it will spawn a new child to replace it. It will pass signals down to all children. Currently, it only passes the signals explicitly watched by the Worker class. Children will be told that they have a parent process, and check periodically whether the supervisor is still running. This check is performed at the beginning of reserve() to ensure orphaned workers handle their condition properly. That is, if the supervisor is gone, the child will exit. This is intended to allow users to kill the supervisor and realistically expect that no new jobs will be processed until a new supervisor (or single-worker instance) is started. The other change, not related to worker-supervisor relationships, nevertheless still involves forking. Specifically, when a job process exits, the exit code is already being handled if it is greater than zero. However, if it is zero, the code prior to this commit assumed the exit was performed by PHP Resque itself. Usually, this is correct. The trouble comes when jobs exit without specifying an exit status - PHP defaults these to 0, for whatever reason. In these cases, the job status is not properly updated. Generally speaking, if you're writing jobs yourself - in their entirety - they will responsibly *not* use exit() at any point. But if your jobs use a framework, the chances of an unexpected exit() causing issues go up immensely. So, if the exit code is zero, and the job status hasn't been updated from queued or running, this commit's code causes the status to be updated to completed. This assumes that such jobs are properly written to only use exit status 0 for success, so some users might wish to change it to failed, but that's not best practice in the slightest, so I wouldn't encourage that. Signed-off-by: Daniel Hunsaker <danhunsaker@gmail.com>
Signed-off-by: Daniel Hunsaker <danhunsaker@gmail.com>
- Use `if` instead of logic operators - Check for `false` explicitly
0a374d8
to
8968e8a
Compare
Applied in the new repo, resque/php-resque. |
Specifically, this improves the multi-worker forking process by having the
parent process stick around and monitor the children. If a child process
dies, and the supervisor has not been instructed to shut them all down, it
will spawn a new child to replace it. It will pass signals down to all
children. Currently, it only passes the signals explicitly watched by the
Worker class. Children will be told that they have a parent process, and
check periodically whether the supervisor is still running. This check is
performed at the beginning of reserve() to ensure orphaned workers handle
their condition properly. That is, if the supervisor is gone, the child will
exit. This is intended to allow users to kill the supervisor and
realistically expect that no new jobs will be processed until a new
supervisor (or single-worker instance) is started.
The other change, not related to worker-supervisor relationships,
nevertheless still involves forking. Specifically, when a job process exits,
the exit code is already being handled if it is greater than zero. However,
if it is zero, the code prior to this commit assumed the exit was performed
by PHP Resque itself. Usually, this is correct. The trouble comes when jobs
exit without specifying an exit status - PHP defaults these to 0, for
whatever reason. In these cases, the job status is not properly updated.
Generally speaking, if you're writing jobs yourself - in their entirety -
they will responsibly not use exit() at any point. But if your jobs use a
framework, the chances of an unexpected exit() causing issues go up immensely.
So, if the exit code is zero, and the job status hasn't been updated from
queued or running, this commit's code causes the status to be updated to
completed. This assumes that such jobs are properly written to only use exit
status 0 for success, so some users might wish to change it to failed, but
that's not best practice in the slightest, so I wouldn't encourage that.
Signed-off-by: Daniel Hunsaker danhunsaker@gmail.com