[WIP]Autoscaling Controller #385
Conversation
Changes made:
TODO:
go/controller/autoscaler/cluster.go
Outdated
@@ -147,7 +147,7 @@ func (c *K8sCluster) SyncResource() error {
	}
	for resname, q := range item.Status.Allocatable {
		if resname == v1.ResourceCPU {
			totalCPU += float64(q.Value()) + float64(q.MilliValue())/1000.0
Go's constants are special. Quoting the Go blog:
"Go is a statically typed language that does not permit operations that mix numeric types. You can't add a float64 to an int, or even an int32 to an int. Yet it is legal to write 1e6*time.Second or math.Exp(1) or even 1<<('\t'+2.0). In Go, constants, unlike variables, behave pretty much like regular numbers. This post explains why that is and what it means."
So it's more idiomatic to simply use 1000 here; don't worry about accidentally breaking anything. The Go compiler is strict about types (there is no implicit type promotion), so when constants are involved, as far as I know, anything that could go wrong simply won't compile. Go's constants are carefully designed, safe, and intuitive.
Please see: https://blog.golang.org/constants
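A minimal sketch (not from this PR) of the point being made: an untyped constant like 1000 adapts to the float64 context it is used in, while mixing typed values is still rejected at compile time.

```go
package main

import "fmt"

func main() {
	var milli int64 = 1500

	// 1000 is an untyped constant, so it takes on float64 in this expression;
	// no explicit float64(1000) conversion is needed.
	cores := float64(milli) / 1000
	fmt.Println(cores) // 1.5

	// Mixing typed values still refuses to compile, which is the safety net:
	// var n int = 3
	// _ = cores + n // compile error: mismatched types float64 and int
}
```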
go/controller/cluster.go
Outdated
		limits[podLimitName] = value
	}
	for _, container := range pod.Spec.Containers {
		AddResourceList(&reqs, container.Resources.Requests)
I tried to reduce the number of lines by using AddResourceList.
Removed getting resources from InitContainers, because the Paddle job trainer currently does not seem to use init containers.
@typhoonzero According to ChenXi's article, we are going to use the controller in a general cluster, so there could be pods other than Paddle pods.
go/controller/cluster.go
Outdated
		allLimits[key] = a
	}
	// Get non-terminated pods from all namespaces and all nodes.
	// FIXME(typhoonzero): scanning all pods is not an efficient way.
We need to get pod resource requests from all the nodes. I simplified this by getting all the pods in one API call.
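For reference, a rough sketch of what "all pods in one API call" could look like with client-go; the function name, clientset variable, and field selector here are my own illustration, not the exact code in this PR (and older client-go versions omit the context argument):

```go
package controller

import (
	"context"

	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// listNonTerminatedPods lists pods from every namespace in a single API call,
// filtering out terminated pods on the server side.
func listNonTerminatedPods(clientset kubernetes.Interface) (*v1.PodList, error) {
	return clientset.CoreV1().Pods("").List(context.TODO(), metav1.ListOptions{
		FieldSelector: "status.phase!=Succeeded,status.phase!=Failed",
	})
}
```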
If there are too many pods, this may take a lot of memory though.
Awesome!
Btw, I think the memory usage is fine; 1 MB can fit thousands of "medium-sized" data structures.
go/autoscaler/autoscaler.go
Outdated
	CPURequestKilo  float64
	CPULimitKilo    float64
	CPUTotalKilo    float64
	CPURequestMilli int64
A K8s CPU request can be as small as 0.1 core, which means 100m (100 milli-CPU), so I changed kilo to milli.
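As a quick illustration (using the standard k8s.io/apimachinery resource package, not code from this PR), MilliValue() keeps the sub-core precision that a whole-core representation would round away:

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

func main() {
	small := resource.MustParse("100m") // 0.1 CPU core
	fmt.Println(small.MilliValue())     // 100
	fmt.Println(small.Value())          // 1 (rounded up to whole cores)

	one := resource.MustParse("1") // 1 CPU core
	fmt.Println(one.MilliValue())  // 1000
}
```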
@@ -27,7 +27,8 @@ def wait_pods_running(label_selector, desired):
    print "label selector: %s, desired: %s" % (label_selector, desired)
    while True:
        count = count_pods_by_phase(label_selector, 'Running')
        if count == int(desired):
        # NOTE: pods may be scaled.
Pods may be scaled.
func TestScaleDryRunNoMoreCPU(t *testing.T) {
	r := ClusterResource{
		CPULimitMilli: 1000,
1000 milli ("1000m") means 1 CPU core; "100m" means 0.1 CPU core.
@@ -17,7 +17,7 @@ spec:
      containers:
      - name: trainer
        image: paddlepaddle/paddlecloud-job
        imagePullPolicy: Always
        imagePullPolicy: Never
Curious why it's "Never"?
Oh, that's for my local runs; will update.
I think that if we need to add a scheduler, we should first define an interface for the scheduler, provide a naive implementation of that interface, and defer the decision about which kind of scheduler to implement for as long as possible (that way we have more information to make the correct decision). Currently the bottleneck is not the scheduler, given that the functionality is still not complete.
I could be wrong and we can discuss this more, but I think we should not spend our time on implementing the CFS right now. Btw, the check-in contains https://github.com/sakeven/RbTree/blob/master/rbtree.go ; maybe the best way is to use it (or a fork) as a dependency.
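To make the "interface first, naive implementation now" idea concrete, here is a hypothetical sketch; none of these names come from the PR, and a CFS-style, weight-aware scheduler could later satisfy the same interface:

```go
package autoscaler

// Job stands in for whatever job representation the controller ends up using.
type Job struct {
	Name   string
	Weight int
}

// Scheduler decides the order in which jobs are considered for scaling.
type Scheduler interface {
	// Next returns jobs in the order they should be offered spare resources.
	Next(jobs []Job) []Job
}

// fifoScheduler is the naive implementation: keep the input order unchanged.
type fifoScheduler struct{}

func (fifoScheduler) Next(jobs []Job) []Job { return jobs }
```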
go/controller/utils.go
Outdated
func AddResourceList(a *v1.ResourceList, b v1.ResourceList) {
// AddResourceList adds another v1.ResourceList into the first list's
// quantities. v1.ResourceList is equivalent to map[string]Quantity.
func AddResourceList(a v1.ResourceList, b v1.ResourceList) {
a is of type map, so there is no need to pass it as a pointer: https://github.com/golang/go/wiki/CodeReviewComments#copying
I understand that from the C programming language's perspective, AddResourceList(&reqs, container.Resources.Requests) is clearer than AddResourceList(reqs, container.Resources.Requests) about the fact that reqs gets modified. I think that's fine in Go, since reqs is a map.
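A small sketch of why the pointer is unnecessary: the map header is copied when passed by value, but both copies share the same underlying storage, so writes through the parameter are visible to the caller. The resource.Quantity Add/DeepCopy methods are the real Kubernetes API; the rest is illustrative and not necessarily the PR's exact implementation.

```go
package main

import (
	"fmt"

	v1 "k8s.io/api/core/v1"
	"k8s.io/apimachinery/pkg/api/resource"
)

// AddResourceList accumulates the quantities from b into a.
// a is a map, so mutations made here are seen by the caller.
func AddResourceList(a v1.ResourceList, b v1.ResourceList) {
	for name, q := range b {
		if cur, ok := a[name]; ok {
			cur.Add(q)
			a[name] = cur
		} else {
			a[name] = q.DeepCopy()
		}
	}
}

func main() {
	reqs := v1.ResourceList{}
	AddResourceList(reqs, v1.ResourceList{v1.ResourceCPU: resource.MustParse("100m")})
	fmt.Println(reqs.Cpu().MilliValue()) // 100: the caller sees the update
}
```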
Thanks!
go/controller/cluster.go
Outdated
func getPodsTotalRequestsAndLimits(podList *v1.PodList) (reqs v1.ResourceList, limits v1.ResourceList, err error) {
	reqs, limits = v1.ResourceList{}, v1.ResourceList{}
	for _, pod := range podList.Items {
		for _, container := range pod.Spec.Containers {
			AddResourceList(reqs, container.Resources.Requests)
			AddResourceList(limits, container.Resources.Limits)
		}

		for _, container := range pod.Spec.InitContainers {
According to ChenXi's article, we are going to use the controller in a general cluster, so even though PaddlePaddle does not use init containers, other people could use them.
You are right! Thanks!
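If init containers do get included later, the usual Kubernetes accounting rule is worth keeping in mind: init containers run one at a time, so a pod's effective request is the larger of the sum over regular containers and the single largest init container request. A hypothetical helper (names and structure are mine, not from this PR) could look like:

```go
package controller

import (
	v1 "k8s.io/api/core/v1"
)

// podEffectiveCPURequestMilli returns the pod's effective CPU request in
// milli-CPU: max(sum of regular container requests, largest init container request).
func podEffectiveCPURequestMilli(pod *v1.Pod) int64 {
	var sum, initMax int64
	for _, c := range pod.Spec.Containers {
		sum += c.Resources.Requests.Cpu().MilliValue()
	}
	for _, c := range pod.Spec.InitContainers {
		if m := c.Resources.Requests.Cpu().MilliValue(); m > initMax {
			initMax = m
		}
	}
	if initMax > sum {
		return initMax
	}
	return sum
}
```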
I intended to push some code that may be useful in the future. Currently we scale by scanning the pods one by one, without considering the priority or fairness of the jobs. Because of this, some "important" jobs may never get scaled. I'll add a design doc later to clarify this. I think I'll keep the choice of CFS here because we intend to let all jobs share resources equally (possibly with job weights). I'll leave the interface here and add a design doc.
@typhoonzero I see, thanks! Maybe we can first deploy our current implementation on the cluster, and work on the scheduler after we get more experience with real usage.
LGTM!
Fix #384
This is under development; I created this PR to show the differences.