Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethink the Scheduler's Framework Permit API #82385

Closed
ahg-g opened this issue Sep 5, 2019 · 22 comments · Fixed by #83756
Closed

Rethink the Scheduler's Framework Permit API #82385

ahg-g opened this issue Sep 5, 2019 · 22 comments · Fixed by #83756
Assignees
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.

Comments

@ahg-g
Copy link
Member

ahg-g commented Sep 5, 2019

The Permit extension point allows running multiple registered permit plugins, if more than one permit plugin returns Wait with a timeout, then we wait for the shortest of the two timeouts.

The expectation is that the plugins that issue Wait will have logic, likely baked into the plugin itself, that accepts the pod at a later time.

However, different plugins will have different criteria to accept/reject the pod, and given the current API, a plugin could accept the pod while the criteria of another plugin may not be met yet.

At the high-level, we need to change the permit API so that an accept is done for a specific plugin, and the pod will be accepted only after all plugins that issued wait get an accept. We also may want to keep adjusting the timeout every time the pod gets accepted by a plugin (start with the smallest timeout, and if the plugin that requested that timeout accepts the pod, we wait for the next smallest timeout minus the time we waited so far...).

We will also need to provide an API to check which or how many plugins have not yet accepted a pod, this is to allow plugins like gang scheduling to wait until all other plugins accepted the pod, and only then the gang scheduling plugin accept the pod.

/sig scheduling

@ahg-g ahg-g added the kind/feature Categorizes issue or PR as related to a new feature. label Sep 5, 2019
@k8s-ci-robot k8s-ci-robot added the sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling. label Sep 5, 2019
@danielqsj
Copy link
Contributor

/cc

@hex108
Copy link
Contributor

hex108 commented Sep 6, 2019

Thanks! It is reasonable. I could work on it.

We will also need to provide an API to check which or how many plugins have not yet accepted a pod, this is to allow plugins like gang scheduling to wait until all other plugins accepted the pod, and only then the gang scheduling plugin accept the pod.

Would it be a public or private API? Is there any other uses cases that might use it? If not, private functions seems enough.

@SataQiu
Copy link
Member

SataQiu commented Sep 6, 2019

/cc

@draveness
Copy link
Contributor

draveness commented Sep 6, 2019

We could add an events channel parameter and a deadline time channel as a return value in the permit plugin API:

type PermitStatus struct {
	PluginName string
	Status     *Status
}

type PermitPlugin interface {
	Plugin
	Permit(pc *PluginContext, p *v1.Pod, nodeName string, events <-chan PermitStatus) (*Status, <-time.Time)
}
  • the events channel could broadcast the events to all the permit plugins.
  • the time channel could updates the deadline of the plugins' permit process.

@draveness
Copy link
Contributor

Thanks! It is reasonable. I could work on it.

/assign @hex108

@Adirio
Copy link
Contributor

Adirio commented Sep 6, 2019

Doesn't it make more sense to have separate timeouts for each permit and cancel them in case one reduses to accept the plugin at any time?

@draveness
Copy link
Contributor

Doesn't it make more sense to have separate timeouts for each permit and cancel them in case one reduses to accept the plugin at any time?

After we make pluginContext conform to context.Context in #82072, we could use the context to sync the cancel signal across plugins.

@Adirio
Copy link
Contributor

Adirio commented Sep 6, 2019

Also the WithDeadline or WithTimeout may be useful to implement the waiting time of each Permit plugin, the returned CancelFunc would be called if the framework wants to wake the plugins.

A Permit plugin would call one of these funcions, pass the returned CancelFunc to the framework in case it wants to wake her up and the wait for the returned Context.Done() channel. This channel is closed either when the deadline is met, the cancel function is called from the framework, or the parent context is closed.

@ahg-g
Copy link
Member Author

ahg-g commented Sep 6, 2019

Thanks all for the suggestions. @hex108 please start a design doc with the various options with examples how each option would be used, we can continue the discussion there.

@hex108
Copy link
Contributor

hex108 commented Sep 6, 2019

Thanks all! As @ahg-g suggested, I'll write a design doc for it.

@hex108
Copy link
Contributor

hex108 commented Sep 26, 2019

Sorry for late reply, I have been busy in the past days.

I write a proposal in https://docs.google.com/document/d/1ftrPlERI5GVzWh0qAM_fChrhcpsBTYyyCleVvmMNEJ4/edit?usp=sharing. PTAL @ahg-g @draveness @Adirio @danielqsj @SataQiu

@Adirio
Copy link
Contributor

Adirio commented Sep 26, 2019

@hex108 Can I ask why are you storing a map[string]time.Dutation instead of a map[string]time.Timer? Is having multiple timers at the same time a performance concern?

When a plugin specifies a timeout, time.Afterfunc is called with a function that sends a reject to the s chan *Status. When a call to allow(pluginName string) is done, the corresponding time.Timer.Stop() method is called. That will avoid having to compute all the remaining duration as the timers would all be created at the start.

@hex108
Copy link
Contributor

hex108 commented Sep 26, 2019

@Adirio Thanks for the suggestion! I planned to start only one timer for each waitingPod. After some thought, your suggestion is better. Doc updated.

@Adirio
Copy link
Contributor

Adirio commented Sep 26, 2019

@hex108 Looks nice, I made a small comment on a part that was not updated to the time.Timer approach.

@Adirio
Copy link
Contributor

Adirio commented Sep 27, 2019

A bit related to this, are plugins that are waiting and set an initial timeout allowed to extend this timeout? I think this feature could also open some doors, such as co-schedulers that decide that a pod needs to be scheduled in the next batch instead of this one.

@hex108
Copy link
Contributor

hex108 commented Sep 29, 2019

A bit related to this, are plugins that are waiting and set an initial timeout allowed to extend this timeout? I think this feature could also open some doors, such as co-schedulers that decide that a pod needs to be scheduled in the next batch instead of this one.

There is no special need for this now. We could consider it when it is actually needed.

@ahg-g
Copy link
Member Author

ahg-g commented Oct 2, 2019

Thanks @hex108, the proposal looks good to me.

It is worth pointing out that if plugin p1 is waiting for plugin p2 to allow the pod, and p2 is also waiting for p1 to allow the pod, the pod will never get scheduled and will always time out. One solution is to provide some sort of an "IntentToAllow" interface though which plugins can show their intent to accept the pod before actually allowing it.

We can think more about the above case in a latter iteration of Permit API if it turns out to be an issue (i.e., determining an ordering to solve the deadlock is not possible).

@hex108
Copy link
Contributor

hex108 commented Oct 8, 2019

the proposal looks good to me.

Thanks for review, I'll send a PR for it.

It is worth pointing out that if plugin p1 is waiting for plugin p2 to allow the pod, and p2 is also waiting for p1 to allow the pod, the pod will never get scheduled and will always time out. One solution is to provide some sort of an "IntentToAllow" interface though which plugins can show their intent to accept the pod before actually allowing it.
We can think more about the above case in a latter iteration of Permit API if it turns out to be an issue (i.e., determining an ordering to solve the deadlock is not possible).

I'll investigate more about it. Is there any use case for it now?

@Adirio
Copy link
Contributor

Adirio commented Oct 8, 2019

It is worth pointing out that if plugin p1 is waiting for plugin p2 to allow the pod, and p2 is also waiting for p1 to allow the pod, the pod will never get scheduled and will always time out. One solution is to provide some sort of an "IntentToAllow" interface though which plugins can show their intent to accept the pod before actually allowing it.

We can think more about the above case in a latter iteration of Permit API if it turns out to be an issue (i.e., determining an ordering to solve the deadlock is not possible).

I don't see any real use case for it, but I will post some random thoughts.

  1. What would IntentToAllow do related to timeouts? Would it remove the timer? Would it set a new timer in order to avoid infinite waits?
  2. Can we determine an order for external plugins? Will the plugins need to announce in any way which other plugins they are waiting for? In this case we could use some graph-theory concepts to ensure there is no deadlock (if the graph is not cyclic, there is no deadlock).
  3. Will there be plugins that wait for all other plugins? This would need to be considered as an special case or two of them will always create a deadlock. If they are special-cased, they could wait for all non-special-cased plugins to Allow before their IntentToAllow is emited/called, and then wait for all plugins (including the scpecial-cased ones) to IntentToAllow or Allow before they finally Allow.
  4. Can a plugin that emited/called an IntentToAllow deny the scheduling task afterwards or is it forced to allow it? If it has to allow it, what is the difference between IntentToAllow and Allow? And if it is not forced, what's the defference between IntentToAllow and not doing anything?
  5. Do we really want to have this plugin dependency? Aren't plugins supposed to be isolated from each other? Which kind of functionality would require this inter-plugin dependencies? I think that the complexity that these inter-dependency introduces into the Permit plugins requires real use cases.

@ahg-g
Copy link
Member Author

ahg-g commented Oct 8, 2019

Thanks Adirio for the detailed thoughts, I agree that we will only try to solve this if it turns out to be an issue and that there is a use case for it. I didn't think that much how we are going to solve this, and whether something like IntentToAllow will actually be general enough solution, it may work only in the case of two plugins waiting for each other.

But, as you say, lets wait until someone comes with a use case.

@hex108
Copy link
Contributor

hex108 commented Oct 11, 2019

Just sent a PR #83756 for it. Please help review it.

@ahg-g
Copy link
Member Author

ahg-g commented Oct 16, 2019

/cc @liu-cong

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/feature Categorizes issue or PR as related to a new feature. sig/scheduling Categorizes an issue or PR as relevant to SIG Scheduling.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants