
Do not remove replica field if set to zero #1305

Merged
merged 2 commits into master from pooneh-m-patch-1 on Feb 5, 2020

Conversation

pooneh-m
Contributor

Starting from Agones v1.2, the replica field was made optional. The API is also tagged with omitempty, meaning that for JSON serialization the field is omitted if set to zero. The omitempty tag makes the API backward incompatible with Agones v1.1.

@markmandel
Member

If we make this change, how do we update a Fleet without changing the replica count?

This is particularly important for Fleets that are controlled by Autoscalers.

@agones-bot
Collaborator

Build Failed 😱

Build Id: 2d58b30d-5ff5-4312-84a8-c19e0a388b72

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@pooneh-m
Contributor Author

> If we make this change, how do we update a Fleet without changing the replica count?
>
> This is particularly important for Fleets that are controlled by Autoscalers.

Oh, thanks for reminding me. I forgot the use-case. I'll close the PR then.

@pooneh-m pooneh-m closed this Jan 30, 2020
@markmandel
Member

I'd like to reopen this for discussion, because I was thinking about this more, and I think adding omitempty might have been a mistake:

  • If you want to make a change through the Go API without the replica field, you have PATCH (which is really what it is for)
  • In the case of the autoscaler, if it changes the replicas field before you update, the K8s API will refuse to update the GameServer resource until you are on the same generation anyway
  • Having omitempty makes it super hard to use the Go library to scale a Fleet to 0 without using a Patch command.

So I'm going to reopen this, mainly as well to see if all the e2e tests pass, and to hear if anyone has additional thoughts and opinions.

Sorry it took me so long to come around on this. Sometimes it takes me a while. 😞

@markmandel markmandel reopened this Jan 30, 2020
@markmandel
Member

It might be worth noting that there is an UpdateScale interface as well for Fleets, so you could use that to scale to 0.

@aLekSer do you have thoughts on this?

@aLekSer
Collaborator

aLekSer commented Jan 30, 2020

I will test these steps again with this PR's changes: RollingUpdate shuts down all servers at once
An E2E test was added for this, TestFleetAutoScalerRollingUpdate, at https://github.com/googleforgames/agones/pull/1168/files

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: d93a9a6c-364b-425c-9d69-e52966914a00

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/1305/head:pr_1305 && git checkout pr_1305
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.4.0-56dc031

@markmandel
Member

> I will test these steps again with this PR's changes: RollingUpdate shuts down all servers at once
> An E2E test was added for this, TestFleetAutoScalerRollingUpdate, at https://github.com/googleforgames/agones/pull/1168/files

Thanks! Looks like e2e tests have passed.

This shouldn't affect the kubectl test you did previously, as it doesn't route through our go code - it talks directly to the Kubernetes API. That being said, let's still double check to make sure.

If the kubectl test passes for you @aLekSer , I recommend we merge this PR.

@aLekSer
Collaborator

aLekSer commented Jan 30, 2020

RollingUpdate should be working. I will follow the docs tomorrow to verify that there are no regressions; there are a couple of options for how to change a Fleet.
For now, I have tested this change with this Fleet YAML:

apiVersion: "agones.dev/v1"
kind: Fleet
metadata:
  name: fleet-example
spec:
  scheduling: Packed
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 1
  template:
    metadata:
      labels:
        foo: ba2r
    spec:
      ports:
      - name: default
        portPolicy: Dynamic
        containerPort: 26000
      health:
        initialDelaySeconds: 30
        periodSeconds: 60
      template:
        spec:
          containers:
          - name: simple-udp
            image: gcr.io/agones-images/udp-server:0.16
---
apiVersion: "autoscaling.agones.dev/v1"
kind: FleetAutoscaler
metadata:
  name: server-autoscaler
spec:
  fleetName: fleet-example
  policy:
    type: Buffer
    buffer:
      bufferSize: 5
      minReplicas: 5
      maxReplicas: 10

If I change the label and do kubectl apply -f ./f.yaml again, RollingUpdate keeps working.

@roberthbailey roberthbailey requested review from aLekSer and removed request for EricFortin January 31, 2020 23:10
@markmandel
Member

@aLekSer gentle bump - have you had a chance to test this out yet?

@aLekSer
Collaborator

aLekSer commented Feb 4, 2020

Hello, partly yes, will update ASAP.

@aLekSer
Collaborator

aLekSer commented Feb 5, 2020

Regarding the docs, I have tested kubectl edit with this PR:

> When Fleet update contains only changes to the replicas parameter, then new GameServers will be created/deleted straight away, which means in that case maxSurge and maxUnavailable parameters for a RollingUpdate will not be used.

And conversely, if the update contains replicas plus another field change, then RollingUpdate still works.

However, when I use a Fleet with a FleetAutoscaler, we break the doc statement:

1. With `kubectl apply`: you should omit replicas parameter in a Fleet Spec before re-applying the Fleet configuration.

https://gist.github.com/aLekSer/f0da2fb8a5a43d170d0dd711bfbd6f33
I created a Fleet with maxSurge: 1 and replicas: 5, and a FleetAutoscaler with a buffer of 5. I allocated one GameServer and then updated the label; scaling from 1 to 5 happened within one second, when it should be step by step: remove one, add one:

5m41s       Normal    ScalingGameServerSet    Fleet             Scaling active GameServerSet flat-example-2rpmh from 5 to 6                                                                              
2m29s       Normal    ScalingGameServerSet    Fleet             Scaling inactive GameServerSet flat-example-2rpmh from 6 to 5                                                                            
2m26s       Normal    ScalingGameServerSet    Fleet             Scaling inactive GameServerSet flat-example-2rpmh from 5 to 4                                                                            
2m20s       Normal    ScalingGameServerSet    Fleet             Scaling inactive GameServerSet flat-example-2rpmh from 4 to 3                                                                            
2m11s       Normal    CreatingGameServerSet   Fleet             Created GameServerSet flat-example-t7bv5                                                                                                 
2m10s       Normal    ScalingGameServerSet    Fleet             Scaling inactive GameServerSet flat-example-2rpmh from 3 to 2                                                                            
2m10s       Normal    ScalingGameServerSet    Fleet             Scaling active GameServerSet flat-example-t7bv5 from 1 to 2                                                                              
2m10s       Normal    ScalingGameServerSet    Fleet             Scaling active GameServerSet flat-example-t7bv5 from 2 to 3                                                                              
2m10s       Normal    ScalingGameServerSet    Fleet             Scaling active GameServerSet flat-example-t7bv5 from 3 to 4                                                                              
2m9s        Normal    ScalingGameServerSet    Fleet             Scaling active GameServerSet flat-example-t7bv5 from 4 to 5     

@aLekSer
Collaborator

aLekSer commented Feb 5, 2020

Tested this on version 1.3.0 - the same issue applies to it, so the docs were inaccurate.
https://gist.github.com/aLekSer/f0da2fb8a5a43d170d0dd711bfbd6f33
Just as a side note: ReplicaSet does use omitempty, however it is using a pointer:

// ReplicaSetSpec is the specification of a ReplicaSet.
type ReplicaSetSpec struct {
	// Replicas is the number of desired replicas.
	// This is a pointer to distinguish between explicit zero and unspecified.
	// Defaults to 1.
	// More info: https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/#what-is-a-replicationcontroller
	// +optional
	Replicas *int32 `json:"replicas,omitempty" protobuf:"varint,1,opt,name=replicas"`

As well as DeploymentSpec
https://github.com/kubernetes/kubernetes/blob/7e8f31b31318104f0288350652eaca0b1c84d4fc/staging/src/k8s.io/api/apps/v1/types.go#L274

Collaborator

@aLekSer aLekSer left a comment


Overall I can approve this PR as we don't have regressions.


Mark @markmandel do we need to change Replicas to a pointer at some point?

@markmandel
Member

markmandel commented Feb 5, 2020

> I created a Fleet with maxSurge: 1 and replicas: 5, and a FleetAutoscaler with a buffer of 5. I allocated one GameServer and then updated the label; scaling from 1 to 5 happened within one second, when it should be step by step: remove one, add one.

Sounds like we should create a bug to track this separately.

> Mark @markmandel do we need to change Replicas to a pointer at some point?

Using a pointer seems like an elegant solution, in that while it does change the API, it won't actually break functionality (whereas the previous fix did break functionality, because you couldn't scale to 0).

That being said, we need to work out what the root cause of the issue is here. kubectl apply doesn't go through our go code, so there's something in our controller(s) that is causing this, and see if that change will solve the problem - otherwise there is probably no point in making the change at this stage.

@markmandel markmandel added the kind/cleanup Refactoring code, fixing up documentation, etc label Feb 5, 2020
@markmandel markmandel added this to the 1.4.0 milestone Feb 5, 2020
Member

@markmandel markmandel left a comment


Let's move forward with this, and track the other issue separately 👍

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: aLekSer, markmandel, pooneh-m

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [markmandel,pooneh-m]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@google-oss-robot

New changes are detected. LGTM label has been removed.

@markmandel markmandel merged commit b9d123d into master Feb 5, 2020
@markmandel markmandel deleted the pooneh-m-patch-1 branch February 5, 2020 17:23
@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 29078eed-6068-4332-82ca-8ad11d7a2fb5

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/GoogleCloudPlatform/agones.git pull/1305/head:pr_1305 && git checkout pr_1305
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.4.0-a1df680

ilkercelikyilmaz pushed a commit to ilkercelikyilmaz/agones that referenced this pull request Oct 23, 2020
Starting from Agones v1.2 the replica field was made optional. The API is also tagged with omitempty, meaning for JSON serialization the field is omitted if set to zero. omitempty tag makes the API backward incompatible with Agones v1.1.

Co-authored-by: Mark Mandel <mark.mandel@gmail.com>
Labels
approved kind/cleanup Refactoring code, fixing up documentation, etc size/XS
5 participants