Graceful shutdown for stalled jobs (lock renewal) #484
Here come the answers: Lock renewal failure detection/notification
@manast my issue specifically regards a stalled job that is still running (just slowly). #308 has a similar issue, but theirs was resolved by simply not stalling. Given our legacy code quality, I expect some of our jobs to stall, and I'd like to notice that and cancel execution from within the job-processing code. I could set up a listener on every worker to monitor for stalled jobs, but how would I then know which of the two jobs currently running is the stalled one? Specifically, either of these two locations is where I would expect my job processor to learn immediately that the job was unable to renew its lock, so it could shut itself down:

Line 684 in d5646a0
Line 688 in d5646a0

I don't see any "stalled" or "lock" parameters inside the Job object, but ideally I need to know as soon as possible whether I'm stalled and another worker has started the job again. It looks like within the job processor I could call Job.takeLock and, if the result is false, null, or a caught error, exit processing of the job.
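The pattern described above can be sketched independently of Bull. This is a minimal illustration, not Bull's actual API: `takeLock` stands in for something like `Job#takeLock`, which is assumed to resolve truthy while the lock is held and false/null (or throw) when it is not.

```javascript
// Sketch: a processor that re-checks its lock between chunks of work
// and bails out as soon as the lock appears lost, so another worker
// that picked the job up doesn't run alongside it.
async function processWithLockChecks(chunks, takeLock, handleChunk) {
  const completed = [];
  for (const chunk of chunks) {
    let locked;
    try {
      locked = await takeLock();
    } catch (err) {
      locked = false; // treat a lock error the same as a lost lock
    }
    if (!locked) {
      // Another worker may have taken over; stop immediately.
      return { completed, aborted: true };
    }
    completed.push(await handleChunk(chunk));
  }
  return { completed, aborted: false };
}
```

The key design point is that the lock check happens at chunk boundaries the job itself chooses, which keeps cancellation cooperative and avoids killing work mid-write.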
@manast #488 is enticing, but I could effectively stop my own processing given a simple notification, or the ability to check job.hasLock() intermittently to confirm I'm still OK to keep processing. The reason is that I might want to shut down gracefully, e.g. delete temporary files and close database connections. I don't necessarily need the memory segmentation and overhead of IPC either. I do like the option of running jobs as child processes, though: in cases where I might be running unpredictable, low-quality code, it would give that code a better chance and isolate its impact.
Bump, would also love to see a graceful shutdown for this.
First of all, sorry for muddying up the issue tracker; I'll be sure to help close two tickets to make up for it.
Active & Unlocked -> Queued
I see the code for stalled jobs and time-renewed locks, but I don't see the process that checks the queue for active-but-unlocked jobs and moves them back to inactive/queued for another consumer to take up.
Do you know where this code is located, so I can review the process exactly?
Disabling auto-lock renewal
If we wanted to trigger lock renewal manually, would that just involve setting the built-in lock-renewal timer to a higher value, or would that also negatively affect the stalled-job monitor? Ideally, each job's own event loop would control the renewal, so that if one job slowed enormously it could lose its lock and another consumer could start over.
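For context, the relevant knobs in Bull live in the queue's advanced `settings` object. The values below are illustrative, not recommendations; the point is that `lockRenewTime` and `lockDuration` must be tuned together, because slowing the renewal timer without also extending the lock duration makes a healthy job look stalled to the monitor.

```javascript
// Sketch, assuming Bull v3's advanced settings (values illustrative):
const Queue = require('bull');

const queue = new Queue('jobs', 'redis://127.0.0.1:6379', {
  settings: {
    lockDuration: 30000,    // ms a lock is held before it can be considered lost
    lockRenewTime: 15000,   // ms between automatic lock renewals
    stalledInterval: 30000, // ms between checks for stalled jobs
    maxStalledCount: 1,     // times a job may stall before being failed
  },
});
```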
Lock renewal failure detection/notification
There is a TODO in the lock-renewal code about notifying the consumer. This one is pretty important to us, because running two copies of a job at once could trigger API rate limiting. Is there currently a best practice for detecting a lock-renewal failure? Promise cancellation comes into play here, but that's outside the scope of this question, and I'm already imagining the mess of code needed to handle it.
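Absent a built-in notification, one workaround is for the processor to own its own renewal watchdog. This is a sketch under assumptions: `renew` stands in for whatever renews the Redis lock (it is not Bull's internal renewal timer), and the callback fires on the first failed renewal so work can be cancelled before a second worker picks the job up and trips the API rate limit.

```javascript
// Sketch: call `renew` on an interval; on the first failure (falsy
// result or thrown error), stop the timer and notify the processor
// via `onLost` so it can cancel its work and clean up.
function startLockWatchdog(renew, intervalMs, onLost) {
  const timer = setInterval(async () => {
    let ok;
    try {
      ok = await renew();
    } catch (err) {
      ok = false;
    }
    if (!ok) {
      clearInterval(timer);
      onLost();
    }
  }, intervalMs);
  // Returned function stops the watchdog when the job finishes normally.
  return () => clearInterval(timer);
}
```

This sidesteps full promise cancellation: the processor only needs to check a flag set by `onLost` at convenient points, rather than tearing down an in-flight promise chain.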