
[5.6] Multi server scheduling cron support #22137

Closed
wants to merge 5 commits

Conversation

@laurencei (Contributor) commented Nov 19, 2017

A simple solution to allow scheduling to occur on multiple servers simultaneously. The issue recently came up here: #22129

Using the existing Mutex system, we create a second stamp keyed to the "minute" of this cron run.

So using ->withoutOverlapping() ensures that the command is only run once per cron expression across any server in your farm, as well as ensuring it does not overlap with a command already running longer than a minute (regardless of which server started it). Best of both worlds.

Now multiple servers can run the scheduler without fear of the same command running twice.
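
In rough code, the idea is something like the sketch below. This is a simplified sketch only, not the exact diff in the PR - the timeMutexName() helper, the creationTime property, and a mutex that takes a plain string key are illustrative assumptions:

// Sketch only - illustrative names, not the PR's actual code.
// These would be methods on the scheduling Event class.
protected function timeMutexName()
{
    // Second mutex key, tied to the minute this cron expression matched,
    // e.g. "run_report_mutex_0005" for an 00:05 run.
    return $this->mutexName().'_'.$this->creationTime->format('Hi');
}

protected function passesTimeMutex()
{
    // Whichever server creates the stamp first gets to run the event.
    // The stamp is never deleted, so the event fires once per cron run.
    return $this->mutex->create($this->timeMutexName());
}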

p.s. Perhaps for 5.6 we could invert this, and make ->withoutOverlapping() the default, and instead make developers use a new command ->allowOverlapping() if they want it to run simultaneously or on multiple servers?

@sisve (Contributor) commented Nov 19, 2017

[...] ensures that the command is only run once per minute across any server in your farm [...]

This is a single special case. What if I want to execute jobs every fifth minute, and server A started at 12:00 and server B at 12:01?

@laurencei (Contributor, Author) commented Nov 19, 2017

Then Server B won't run the event, because the cron expression fails? We only reach this point if both servers pass the same cron expression.

I've changed the original post to now read "cron expression" rather than "minute".

edit: actually, I can think of a race condition where this might occur if Server B runs another long-running process that takes more than 60s. I've got a solution; I'll amend the PR now.

@sisve (Contributor) commented Nov 19, 2017

Mutex::create is called when the job starts executing, not when the task matched the crontab. Imagine that one server started doing something that took longer than a minute, so by the time the important withoutOverlapping task is triggered, an additional minute has passed.

@laurencei (Contributor, Author) commented Nov 19, 2017

@sisve - ok, fixed. We now use the time the event was created to match against the crontab schedule and to build the mutex.

This prevents strange issues if one server runs a longer process first, and ensures a job can only ever run once per defined cron expression regardless of other processes and timelines.

@taylorotwell (Member)

Your description doesn't explain "why" or "how" this actually prevents the overlap across multiple servers. It just states that it "does". We need better explanation of the "how".

@laurencei (Contributor, Author) commented Nov 20, 2017

@taylorotwell - sure, no dramas:

  1. When the Scheduler evaluates all events that pass the filters, the event(s) scheduled to run for that "cron minute" are currently placed in an array. We now have those events store their creation time, so we always know when each one was evaluated as due to run.
  2. When the scheduler eventually tells each event to actually run (which could be anywhere from a few seconds to a few minutes later), the event attempts to obtain a mutex keyed to that timestamp at minute resolution.
  3. Only if the event can obtain the time mutex does it then try to obtain the event mutex (and then carry on as normal).

So let's say we have Server A and Server B both running schedule:run each minute with the following command:

$schedule->command('run:report')->withoutOverlapping()->everyFiveMinutes();

At 00:05 both Server A and Server B will run schedule:run. Both servers run the event filters, and determine that the command run:report is due to run.

Server B happens to be 3 microseconds faster than Server A, and creates a mutex run_report_mutex_0005 (the timestamp mutex). Because Server B was able to obtain the timestamp mutex, it then also goes ahead and creates the running event mutex run_report_mutex.

Server A comes along and tries to obtain the timestamp mutex. But it fails, because Server B already has it, so Server A skips that event and carries on down the list.

What is important here is that the timestamp mutex is never deleted, even when the event is complete, unlike the event mutex. i.e. when Server B is complete, it will delete the event mutex run_report_mutex - but run_report_mutex_0005 is never deleted, and thus the command can't run twice for that cron expression.

The reason we now store the event creation time in the constructor is to prevent the race conditions that @sisve describes. So if Server A gets the event list, but runs another event for 3-4 minutes before coming along to this one, it is still evaluated against the time the cron expression passed, not the current time. This prevents strange race conditions and makes the system robust enough to handle large scheduler lists.
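
In (very rough) pseudo-code, the run path ends up looking like the sketch below - the method names and the string-key mutex interface are illustrative only, not necessarily what the final PR uses:

// Sketch of the run path described above (illustrative names only).
public function run(Container $container)
{
    // 1. Claim the timestamp mutex for the minute this event was evaluated
    //    (the creation time stored in the constructor). It is never released,
    //    so the event can only start once per matching cron expression
    //    across the whole farm.
    if ($this->withoutOverlapping &&
        ! $this->mutex->create($this->mutexName().'_'.$this->creationTime->format('Hi'))) {
        return;
    }

    // 2. Claim the normal event mutex, which is released when the event
    //    finishes. This is the existing "don't start while still running" check.
    if ($this->withoutOverlapping && ! $this->mutex->create($this->mutexName())) {
        return;
    }

    // ... existing run logic (callbacks, building the process, etc.) ...
}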

Does that make sense? Let me know if you need more info - happy to discuss further...

@laurencei (Contributor, Author) commented Nov 20, 2017

Actually - another thought:

Rather than involve ->withoutOverlapping() at all, we could leave that function untouched.

Instead we could make this the default scheduler behaviour in 5.6, and add a new command ->allowMultipleServers() (or something) if you explicitly want a cron to be allowed to run on each server for the same cron expression (perhaps for disk cleanup or something).

This separates the concerns, and leaves what ->withoutOverlapping() already does untouched. There is no need for them to be tied together.

Let me know if you want me to change the PR to do that?

@taylorotwell (Member)

This is all sort of dependent on them using something like Redis or Memcached as their cache driver, correct?

@taylorotwell (Member)

I do think a new method could be useful, in case you are fine with multiple servers running the same job but not overlapping locally.

@crynobone (Member)

At 00:05 both Server A and Server B will run schedule:run. Both servers run the event filters, and determine that the command run:report is due to run.

With these changes, if Server A handles the command at 00:05 but takes 8 minutes to finish, does this code prevent Server B from running the command at 00:10, since Server A is still working on it?

This was the argument in the related issue.

@laurencei (Contributor, Author)

This is all sort of dependent on them using something like Redis or Memcached as their cache driver, correct?

They need a "central" cache across all servers - yes. Ideally Redis/Memcache, although I think the Database driver would work ok too (there might be issues around atomic locks?). They can also write their own Mutex driver and do it any way they want (since the Mutex can be injected).
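
For illustration, a shared-cache mutex could be as small as the sketch below - this assumes Laravel's cache Repository contract and minute-based TTLs, and it is not the framework's actual mutex class or interface, just the idea:

use Illuminate\Contracts\Cache\Repository as Cache;

// Sketch of a cache-backed mutex. It needs a cache store shared by every
// server (e.g. Redis or Memcached); the interface here is simplified.
class CacheMutexSketch
{
    protected $cache;

    public function __construct(Cache $cache)
    {
        $this->cache = $cache;
    }

    public function create($name)
    {
        // add() only writes when the key is absent, so on Redis/Memcached
        // the first server to call this wins and everyone else gets false.
        return $this->cache->add($name, true, 1440); // TTL assumed to be in minutes
    }

    public function exists($name)
    {
        return $this->cache->has($name);
    }

    public function forget($name)
    {
        $this->cache->forget($name);
    }
}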

I do think a new method could be useful.

@taylorotwell - so can I clarify: do you want the PR to make this the default behaviour for 5.6, where you can opt out with a function on your commands like ->allowMultipleServerExecution()? Or will the current behaviour stay as is, and you can opt in using ->disableMultipleServerExecution()?

My preference is to make it the default behaviour for 5.6 and have you 'opt out', as I would guess most people want this as the default. It means the scheduler is ready to go for load balancing, without needing to change code when the time comes.

With these changes, if Server A handles the command at 00:05 but takes 8 minutes to finish, does this code prevent Server B from running the command at 00:10, since Server A is still working on it?

@crynobone - the PR is going to change so that the ->withoutOverlapping() part is left unchanged. All this PR will do is prevent two servers running the same command for the same cron expression, i.e. for 00:05 only one of the servers will end up running the command.

So yes - this handles the issues discussed in the other thread.

@taylorotwell (Member)

@laurencei so @crynobone is correct that server B would run the command again even though the 0:05 run is still running? I don't think that would be the expected behavior for me if I was going to use this feature.

Imagine I have $schedule->command('something')->withoutOverlapping()->onlyOnOneServer()... just using pseudo-code there, but I think I would expect the command to only be run by one server per cron expression but ALSO don't allow the next cron expression to run either if the previous one is still running on some other server.

@laurencei (Contributor, Author) commented Nov 22, 2017

@taylorotwell - the scenario you have given and expect is exactly what will occur (once I update the PR with the new command you asked for) - yes.

So we can do it like this:

$schedule->command('something')->withoutOverlapping()->onlyOnOneServer(): run once on one server, and prevent any other instance of it starting on any server until complete

$schedule->command('something')->onlyOnOneServer(): run once per cron expression, but a later run could start on the same or a different server while a previous run is still going, if the cron schedule matches again (i.e. the next passing cron time).

$schedule->command('something')->withoutOverlapping(): runs as it does now (i.e. will be unchanged) - so you can only have one instance of the job running at any one time, but it is possible to potentially duplicate it on some cron expressions - i.e. the original issue discussed at: #22129

$schedule->command('something'): run on all servers each cron pass. Could be useful if you want to run ->command('disk:cleanup') or something explicit for each server.


What I am asking though is: if we make ->onlyOnOneServer() a requirement, then everyone who wants this feature needs to add it to every command. This might include packages, which could be an issue, because if a package does not add the call then it prevents you from running the scheduler across your farm entirely.

So instead for 5.6, perhaps we make the default behaviour to only run on one server, and you can add ->runOnAllServers() to explicitly say you want it to run on all of them. That would be a more sensible default IMO, because it means you don't need to change any code if you start load balancing, including for packages etc. It would be relatively rare for a command to intentionally be needed on each server (but you can do it when needed).

So what I am proposing is:

$schedule->command('something') - only runs once on a single server per cron expression across the server farm.
$schedule->command('something')->withoutOverlapping() - only runs once per cron expression, and will not start again if still running on a server somewhere.
$schedule->command('something')->runOnAllServers() - run once on every server in the farm. Must be explicitly turned on.

Let me know which way you want me to go on this - and I'll fix the PR.

@taylorotwell (Member)

I would rather make it opt-in for now. Maybe later we can make it a default once it gains more traction and I play with it more.

@taylorotwell (Member)

@laurencei do you have an ETA on finishing this out?

@laurencei (Contributor, Author)

@taylorotwell - today - I'm doing it now.

@laurencei (Contributor, Author)

Closing for a new PR as it's completely changed now.
