Add QPS settings to Allocation endpoints #1863

Merged
merged 3 commits into from
Oct 28, 2020

markmandel
Member

What type of PR is this?

Uncomment only one /kind <> line, hit enter to put that in a new line, and remove leading whitespace from that line:

/kind breaking

/kind bug

/kind cleanup
/kind documentation
/kind feature
/kind hotfix

What this PR does / Why we need it:

Allocation endpoints were throttled to the default ~4qps for a Kubernetes client.

Matching the controller settings on standard QPS and Burst to allow higher throughput.
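
As a rough illustration of the change described above, the sketch below applies a sustained QPS and a burst limit to a client configuration. `clientConfig` and `withRateLimits` are hypothetical names for this example only; the struct just mirrors the `QPS` and `Burst` fields of client-go's `rest.Config` so the snippet has no external dependencies. The values 100 and 200 are the flag defaults visible in this PR's diff, not a recommendation.

```go
package main

import "fmt"

// clientConfig is a hypothetical stand-in for the two rate-limit fields on
// client-go's rest.Config (QPS float32, Burst int). It is not the real type.
type clientConfig struct {
	QPS   float32
	Burst int
}

// withRateLimits returns a copy of cfg with the given sustained and burst
// limits applied. Non-positive values leave the existing setting untouched.
func withRateLimits(cfg clientConfig, qps, burst int) clientConfig {
	if qps > 0 {
		cfg.QPS = float32(qps)
	}
	if burst > 0 {
		cfg.Burst = burst
	}
	return cfg
}

func main() {
	// 100 and 200 come from the pflag defaults in the diff of this PR.
	cfg := withRateLimits(clientConfig{}, 100, 200)
	fmt.Printf("QPS=%v Burst=%d\n", cfg.QPS, cfg.Burst)
}
```

In a real allocator the same two values would presumably be set on the `rest.Config` before the Kubernetes clientset is built; when they are left at zero, client-go falls back to its own low built-in limits, which is the throttling this PR addresses.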

Which issue(s) this PR fixes:

Closes #1852

Special notes for your reviewer:

Code can be reviewed, but we should wait until we get the benchmarking tool from @ilkercelikyilmaz to confirm throughput before merging.

@markmandel markmandel added kind/bug These are bugs. area/performance Anything to do with Agones being slow, or making it go faster. labels Oct 22, 2020
@google-cla google-cla bot added the cla: yes label Oct 22, 2020
Member

@roberthbailey roberthbailey left a comment

@markmandel
Member Author

LGTM (it matches https://github.com/googleforgames/agones/blob/master/cmd/controller/main.go).

Copy paste driven development 😁

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 5b4309c9-c535-46c1-8a7c-770515e73f86

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/1863/head:pr_1863 && git checkout pr_1863
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.10.0-8aaf76d

Contributor

@pooneh-m pooneh-m left a comment

Thanks for the fix.

@@ -65,6 +71,8 @@ func parseEnvFlags() config {
viper.SetDefault(remoteAllocationTimeoutFlag, 10*time.Second)
viper.SetDefault(totalRemoteAllocationTimeoutFlag, 30*time.Second)

pflag.Int32(apiServerSustainedQPSFlag, 100, "Maximum sustained queries per second to send to the API server")
pflag.Int32(apiServerBurstQPSFlag, 200, "Maximum burst queries per second to send to the API server")

Contributor

@pooneh-m pooneh-m Oct 22, 2020

viper.GetInt(apiServerBurstQPSFlag) to follow the pattern?
Is there any case where the defaults are used? Why not set the defaults to match the Helm defaults?

Member Author

No reason - I literally just copied exactly what was in the controller/main.go - figured might as well keep it all consistent.
https://github.com/googleforgames/agones/blob/master/cmd/controller/main.go#L258

Contributor

I think we should use the same default across the stack, if the default values are not meant to be different, regardless of the controller.

If we set the default to 100 here and 400 in the environment variable, someone reading the code may assume the default is 100, because they have not set the environment variable themselves.
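
One way to avoid the two competing defaults discussed here is to define the default once and let the environment variable and the flag only override it. The sketch below uses the standard library `flag` package rather than pflag/viper, and the names `resolveQPS`, `API_SERVER_QPS` and `api-server-qps` are assumptions made for illustration; 400 is simply the value used in the comment above.

```go
package main

import (
	"flag"
	"fmt"
	"os"
	"strconv"
)

// defaultSustainedQPS is the single source of truth for the default.
// 400 is taken from the review comment; it is illustrative only.
const defaultSustainedQPS = 400

// resolveQPS returns the effective QPS with the precedence:
// command-line flag, then environment variable, then the constant default.
// getenv is injected so the lookup can be exercised without a real environment.
func resolveQPS(args []string, getenv func(string) string) (int, error) {
	def := defaultSustainedQPS
	if raw := getenv("API_SERVER_QPS"); raw != "" { // hypothetical variable name
		v, err := strconv.Atoi(raw)
		if err != nil {
			return 0, fmt.Errorf("invalid API_SERVER_QPS %q: %w", raw, err)
		}
		def = v
	}

	fs := flag.NewFlagSet("allocator", flag.ContinueOnError)
	qps := fs.Int("api-server-qps", def,
		"Maximum sustained queries per second to send to the API server")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	return *qps, nil
}

func main() {
	qps, err := resolveQPS(os.Args[1:], os.Getenv)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("effective QPS:", qps)
}
```

With this shape, a reader who sees the constant knows what an unconfigured deployment gets, and the Helm chart only needs to set the variable when it wants something different.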

Member

I also thought this was confusing, but it's consistent with the agones controller. Maybe a cleanup there would be better than propagating confusing dueling default values?

Member Author

Do we feel we should clean up here, or do it in a subsequent PR? I don't mind either way tbh.

Contributor

For allocator, because we are making the change in this PR, we should fix it in this PR. For controller, we can fix it in a later PR.

Member Author

Thought I hit this, and apparently I missed it. Fix incoming!

@pooneh-m
Contributor

Can you please add the Helm config documentation as well?

@markmandel
Member Author

Can you please add the Helm config documentation as well?

🤦 I knew I forgot something.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 16c1f228-d1fb-4144-97c3-6998b2cf1893

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/1863/head:pr_1863 && git checkout pr_1863
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.10.0-e3140c3

@pooneh-m
Contributor

Looks pretty good. Thanks for the change!

@agones-bot
Collaborator

Build Failed 😱

Build Id: 6dee8416-e990-4cb4-a2ec-2a85b74687ac

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@markmandel
Member Author

Just did a test with the tooling from #1867

root@fba8ccd77f67:/go/src/agones.dev/agones/test/load/allocation/grpc# TESTRUNSCOUNT=1 ./runAllocation.sh 40 100
Run number:  1
started: 2020-10-27 23:25:28.81615566 +0000 UTC m=+0.000918033
(failed(client=2,allocation=79): rpc error: code = Unknown desc = error updating allocated gameserver: Operation cannot be fulfilled on gameservers.agones.dev "load-test-fleet-bx24p-t7sd9": the object has been modified; please apply your changes to the latest version and try again
...
(failed(client=6,allocation=90): rpc error: code = Unknown desc = error updating allocated gameserver: Operation cannot be fulfilled on gameservers.agones.dev "load-test-fleet-bx24p-z25b7": the object has been modified; please apply your changes to the latest version and try again
finished: 2020-10-27 23:26:31.579609788 +0000 UTC m=+62.764372203

So that was 3980 allocations in 1 minute 3 seconds, so about 63 QPS. (This is also from my home laptop while I'm on a hangout 😄)

So looks like once I handle the above comments, this PR should be good to go!

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: 7c99adb4-b665-4fc0-ba11-fc7e47d5b805

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/1863/head:pr_1863 && git checkout pr_1863
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.10.0-0b3c82d

Allocation endpoints were throttled to the default ~4qps for a
Kubernetes client.

Matching the controller settings on standard QPS and Burst to allow
higher throughput.

Closes googleforgames#1852

@google-oss-robot

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: markmandel, pooneh-m

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:
  • OWNERS [markmandel,pooneh-m]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@agones-bot
Collaborator

Build Failed 😱

Build Id: a367188d-1a88-4590-a4e1-268bbcefde8e

To get permission to view the Cloud Build view, join the agones-discuss Google Group.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: f1e60b75-338b-4086-8613-a111b7e9f716

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/1863/head:pr_1863 && git checkout pr_1863
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.10.0-6836e0e

@google-oss-robot

New changes are detected. LGTM label has been removed.

@agones-bot
Collaborator

Build Succeeded 👏

Build Id: a569f684-10e0-4852-bc11-4441d0c249af

The following development artifacts have been built, and will exist for the next 30 days:

A preview of the website (the last 30 builds are retained):

To install this version:

  • git fetch https://github.com/googleforgames/agones.git pull/1863/head:pr_1863 && git checkout pr_1863
  • helm install ./install/helm/agones --namespace agones-system --name agones --set agones.image.tag=1.10.0-d485bbc

@pooneh-m pooneh-m merged commit 8c2fa6f into googleforgames:master Oct 28, 2020
@markmandel markmandel deleted the bug/allocator-qps branch October 28, 2020 18:34
@markmandel markmandel added this to the 1.10.0 milestone Nov 3, 2020
Labels
approved area/performance Anything to do with Agones being slow, or making it go faster. cla: yes kind/bug These are bugs. size/M
Development

Successfully merging this pull request may close these issues.

Allocator throttled by default K8s Client requests per second
5 participants