Investigate to create a Custom Scheduler to schedule TaskRun pods #3052
Comments
It seems like writing a custom scheduler is pretty straightforward (see https://github.com/kelseyhightower/scheduler), but dealing with the edge cases would probably take a lot of effort. I wonder if it would be possible to write a best-effort scheduler that runs first but bails out to the real one in complex situations.
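To make the "best-effort scheduler that bails out" idea concrete, here is a minimal Python sketch of the decision logic only. It is not the Kubernetes Scheduling Framework API. The `Node` and `Pod` types, the `pvc_to_node` map (which workspace PVC is already in use on which node), and the `DEFER` sentinel are all hypothetical. The sketch handles only the easy case, a pod whose workspace PVC is already in use on a node that still has room for it, and defers everything else. How a deferred pod would actually be handed back to the default scheduler is left open.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical, simplified view of a node's free capacity.
    name: str
    free_cpu_millis: int
    free_memory_mib: int


@dataclass(frozen=True)
class Pod:
    # Hypothetical, simplified TaskRun pod: workspace PVC claim name (if any) and its requests.
    name: str
    pvc: Optional[str]
    cpu_millis: int
    memory_mib: int


# Hypothetical sentinel meaning "hand this pod to the default scheduler".
DEFER = None


def best_effort_schedule(pod: Pod, nodes: List[Node], pvc_to_node: Dict[str, str]) -> Optional[str]:
    """Return a node name for the simple case, or DEFER for anything else."""
    if pod.pvc is None:
        return DEFER  # no workspace PVC: nothing to co-locate
    target = pvc_to_node.get(pod.pvc)
    if target is None:
        return DEFER  # first pod using this PVC: let the default scheduler choose
    node = next((n for n in nodes if n.name == target), None)
    if node is None:
        return DEFER  # node is gone (e.g. scaled down): complex case
    if node.free_cpu_millis < pod.cpu_millis or node.free_memory_mib < pod.memory_mib:
        return DEFER  # pod does not fit: complex case (e.g. needs autoscaling)
    return node.name
```

The does-not-fit branch is where the hard edge cases live: with a ReadWriteOnce PVC the pod cannot simply run on another node, so the fallback path would need careful design.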
I suggest we evaluate whether the Affinity Assistant can be implemented with the Scheduling Framework. We could also enhance the coscheduling plugin to support this requirement: https://github.com/kubernetes-sigs/scheduler-plugins/tree/master/pkg/coscheduling
@denkensk
I think a good concrete next step here would be for someone to experiment with or prototype the scheduler framework and report back to the community with any findings or demos. That would help us concretely understand what the code would look like to, for instance, replace the Affinity Assistant (AA) with custom scheduling. Based on those findings, we could start a design doc that outlines requirements and next steps more concretely, or we might determine that delving into scheduling really isn't worth the effort and shouldn't be pursued at this time. @vincent-pli, is that something you'd be interested in exploring and driving?
@imjasonh Anyway, I will make a demo and report back here.
@imjasonh @denkensk Please take a look.
This looks cool, @vincent-pli. We should probably imitate the logic of the Affinity Assistant in a scheduler and add it to the experimental repository, so that we can eventually replace the Affinity Assistant with the scheduler.
@jlpettersson @imjasonh
That would be great. I'd be happy to review and approve any PRs that add it to experimental. If we decide to try to move it into Tekton core we'd need a TEP, but it sounds like it should be usable without one, at least in the near term. Thanks!
Great, let's add it to the experimental repository first.
…same volume but may run on different nodes. This is a draft version that tries to introduce the `Scheduler framework` to handle the issue; for now we adopt the `affinity assistant`, but it has trouble measuring total resource requirements. For details, see issue tektoncd/pipeline#3052. I think we will enhance it soon based on further discussion, thanks.
After analyzing this more deeply, together with the design doc "Task parallelism when using workspace" and the follow-up discussions in the API WG in December, I don't see a custom scheduler helping us much with these problems, while it adds more code and complexity and might introduce new problems. I think we can close this; the alternative described in #3638 might help us more. @vincent-pli, let me know if you see it differently after contributing to this. Closing this for now.
Feature request
Investigate whether it is feasible to create a custom scheduler for TaskRun pods, e.g. one that co-schedules pods sharing a workspace PVC volume.
Use case
When the Affinity Assistant was introduced, it solved problems with concurrent access to workspace volumes and with deadlocks when pods were scheduled to different availability zones (AZs).
Using pod-affinity to achieve Node Affinity for TaskRun pods was the least complex solution that was evaluated.
The current solution works for common cases, but it is not perfect. For example, there may be problems when TaskRuns require different amounts of resources and the nodes need to be autoscaled up, as in #3049.
Adding a custom scheduler will probably introduce more complexity and code, but it would probably solve the problem in a more generic way than the Affinity Assistant does.
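A scheduler that sees all TaskRun pods sharing a workspace up front could check their total resource requirements before choosing a node, which is the part the Affinity Assistant approach has trouble measuring. The Python sketch below illustrates that idea only; it is not how any existing Tekton or Kubernetes scheduler works. The `Node` and `TaskRunPod` types are hypothetical, the sketch assumes all pods sharing the PVC may run concurrently (so requests are summed; strictly sequential tasks would only need the largest single request), and the tightest-fit tie-break is an illustrative choice. Returning `None` marks the case where no existing node fits and a scale-up, as in #3049, would be needed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    # Hypothetical, simplified view of a node's free capacity.
    name: str
    free_cpu_millis: int
    free_memory_mib: int


@dataclass(frozen=True)
class TaskRunPod:
    # Hypothetical TaskRun pod that mounts the shared workspace PVC.
    name: str
    pvc: str
    cpu_millis: int
    memory_mib: int


def pick_node_for_workspace(pods: List[TaskRunPod], nodes: List[Node]) -> Optional[str]:
    """Pick one node that can hold every pod sharing the workspace at the same time.

    Returns the node name, or None when no existing node fits the total
    (the situation where the cluster would need to scale up).
    """
    # Assumption: the pods may run in parallel, so their requests add up.
    total_cpu = sum(p.cpu_millis for p in pods)
    total_mem = sum(p.memory_mib for p in pods)
    fitting = [
        n for n in nodes
        if n.free_cpu_millis >= total_cpu and n.free_memory_mib >= total_mem
    ]
    if not fitting:
        return None
    # Tightest fit first, so larger nodes stay free for other workspaces.
    best = min(
        fitting,
        key=lambda n: (n.free_cpu_millis - total_cpu, n.free_memory_mib - total_mem),
    )
    return best.name
```

Unlike pinning every pod to wherever the first one landed, this check happens before any pod is placed, so a node that is too small for the whole group is never chosen.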