
Load Testing Framework for internal and external usage #412

Closed
markmandel opened this issue Nov 9, 2018 · 15 comments
Labels
  • area/build-tools: Development tooling. I.e. pretty much everything in the `build` directory.
  • kind/design: Proposal discussing new features / fixes and how they should be implemented.
  • kind/feature: New features for Agones.
Comments

@markmandel
Member

markmandel commented Nov 9, 2018

Problem

We need some way to (a) load test Agones at scale, and (b) help users of Agones load test it for their own workloads.

Notes

I feel like we should be able to work both of these out at the same time: if we can create a load testing framework that we use internally and that can also be used externally, that would be ideal.

Thoughts, feelings and opinions are appreciated 😄

Research

@markmandel markmandel added kind/feature New features for Agones kind/design Proposal discussing new features / fixes and how they should be implemented area/build-tools Development tooling. I.e. pretty much everything in the `build` directory. labels Nov 9, 2018
@markmandel markmandel changed the title Load Testing Framework for Agones Load Testing Framework for internal and external usage Nov 9, 2018
@cyriltovena
Collaborator

cyriltovena commented Nov 12, 2018

I would vote for Locust, and we could build some pre-built plans that work against the Kubernetes API.
Also very interested in helping/building on this.

@markmandel
Member Author

Maybe we start with a simple example using something like xonotic/simple-udp, and then we can look at how to make it a bit more customisable?

I've not got strong opinions personally on load testing systems.

Side thought: we may also want to think about how we can automate some of this. Wondering if we should have nightly jobs for autoscale testing and load testing? (But that can be a phase two.)

@stephbu

stephbu commented Nov 28, 2018

Definitely +1. Having a standard "small" load server that can be assigned a port, generates logs, simulates work and emits metrics, and can defer its response to SIGTERM (up to and beyond the termination grace period) is really useful. We use sample PSU curves from real data that we scale and/or offset to simulate daily patterns, as well as compress the timeline to accelerate growth.
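As a rough sketch of the kind of "small" load server described above, here is a minimal Python version; the port, grace period, and logging details are illustrative assumptions (a real Agones test server would typically also integrate the game server SDK):

```python
# Minimal sketch: binds a UDP port, echoes packets, logs, and defers shutdown
# on SIGTERM for a configurable grace period. All defaults are illustrative.
import logging
import signal
import socket
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("loadserver")

GRACE_SECONDS = 30  # hypothetical; align with terminationGracePeriodSeconds
shutdown_requested_at = None

def on_sigterm(signum, frame):
    global shutdown_requested_at
    shutdown_requested_at = time.time()
    log.info("SIGTERM received; deferring shutdown for %ss", GRACE_SECONDS)

signal.signal(signal.SIGTERM, on_sigterm)

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 7654))  # the port would normally come from the environment
sock.settimeout(1.0)

while True:
    if shutdown_requested_at and time.time() - shutdown_requested_at > GRACE_SECONDS:
        log.info("grace period elapsed; exiting")
        break
    try:
        data, addr = sock.recvfrom(1024)
    except socket.timeout:
        continue
    log.info("echoing %d bytes to %s", len(data), addr)  # stands in for metrics
    sock.sendto(data, addr)

sock.close()
```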

@Zanderax

I'd be interested in using this. It'd be good to be able to show stats from a cluster at scale.

@pm7h
Contributor

pm7h commented Dec 18, 2018

I have started looking into using Locust for this. From a quick scan, it seems straightforward. We can start by writing a client to start Agones game servers. We will also need to define the test scenarios.

In the first step, we can run the test from a single machine. We can then extend it to spin up master and slave nodes: https://docs.locust.io/en/stable/running-locust-distributed.html. A single Docker image can run as standalone, master or slave: https://docs.locust.io/en/latest/running-locust-docker.html
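To make that first step concrete, here is a hypothetical sketch of a Locust test that creates Agones GameServers through the Kubernetes API, written against the Locust 0.x API that was current at the time. The API group/version, container image, auth handling, and spec fields are assumptions and would need to match the Agones release under test:

```python
# Hypothetical Locust test: each simulated user repeatedly creates a
# GameServer via the Kubernetes API. Run with --host=https://<apiserver>.
from locust import HttpLocust, TaskSet, task

GAMESERVER = {
    "apiVersion": "stable.agones.dev/v1alpha1",  # agones.dev/v1 in later releases
    "kind": "GameServer",
    "metadata": {"generateName": "loadtest-"},
    "spec": {
        "ports": [{"name": "default", "containerPort": 7654}],
        "template": {
            "spec": {
                "containers": [{
                    "name": "simple-udp",
                    "image": "gcr.io/agones-images/udp-server:0.5",  # example tag
                }]
            }
        },
    },
}

class GameServerTasks(TaskSet):
    @task
    def create_gameserver(self):
        self.client.post(
            "/apis/stable.agones.dev/v1alpha1/namespaces/default/gameservers",
            json=GAMESERVER,
            # the bearer token would be injected at startup in a real test
            headers={"Authorization": "Bearer %s" % self.locust.token},
            verify=False,
        )

class GameServerUser(HttpLocust):
    task_set = GameServerTasks
    token = ""
    min_wait = 1000
    max_wait = 3000
```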

@markmandel
Member Author

markmandel commented Dec 19, 2018

Should we make a plan for what types of load tests we should have in our system?
I'm thinking of several that could be possible:

  1. How many game servers until things break
  2. How many allocations per second until things break
  3. How fast can we allocate
  4. How long does it take to scale up a Fleet

There are likely more?

Then there is likely also load testing against real CPU & network metrics to determine limits etc. for the gameserver itself, which I'm not sure how to tackle.

@pm7h
Contributor

pm7h commented Dec 20, 2018

Makes sense. So, two categories of tests:

  1. Load testing: 1,2,3,4 above.
  2. Performance tests for CPU and network metrics.

Let's start with 1 since it seems more straightforward using Locust. I will think about what approach we can take for 2.

@markmandel markmandel added this to the 0.8.0 milestone Jan 9, 2019
@pm7h
Contributor

pm7h commented Jan 24, 2019

Adding my notes on the design:

Design

We will focus on two categories of tests, performance tests and load tests. These two categories have different requirements and goals, which imply different test approaches.

Performance Tests

The goal of performance tests is to provide metrics on various operations such as fleet scaling up/down. The existing Agones e2e test framework can be used for performance tests.

Test Cases

Fleet scaling up. Create a fleet of size 1, increase the size to 100/1000/100000, and measure the time it takes to fully scale up the fleet (see the measurement sketch after these test cases).

In addition to the time it takes to fully scale up the fleet, the test should also emit continuous metrics on game servers. This includes how many game servers are in each state (PortAllocation, Creating, Starting, Scheduled, RequestReady, Ready).

If tested with GKE, this test should be repeated with GKE cluster Autoscaling enabled and disabled. When GKE cluster Autoscaling is disabled, we should test two scenarios: one where the cluster has sufficient capacity and one where it does not.

Fleet scaling down. Create a fleet of size 100/1000/100000, scale down to 1, and measure the time it takes to fully scale down the fleet.

In addition to the time it takes to fully scale down the fleet, the test should also emit continuous metrics on game servers. This includes how many game servers are in each state (PortAllocation, Creating, Starting, Scheduled, RequestReady, Ready).

If tested with GKE, this test should be repeated with GKE cluster Autoscaling enabled and disabled. When GKE cluster Autoscaling is disabled, we should test two scenarios: one where the cluster has sufficient capacity and one where it does not.
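As a sketch of the scale-up measurement, the official `kubernetes` Python client can patch the Fleet and poll its status. The group/version shown ("agones.dev"/"v1") is from later Agones releases (it was "stable.agones.dev"/"v1alpha1" at the time of this thread), and the fleet name, target size, and polling interval are assumptions:

```python
# Sketch: scale a Fleet up and time how long until all replicas are Ready.
import time
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()
GROUP, VERSION = "agones.dev", "v1"  # hypothetical; match the installed Agones
NAMESPACE, PLURAL, NAME = "default", "fleets", "simple-udp"
TARGET = 1000

api.patch_namespaced_custom_object(
    GROUP, VERSION, NAMESPACE, PLURAL, NAME, {"spec": {"replicas": TARGET}}
)
start = time.time()
while True:
    fleet = api.get_namespaced_custom_object(GROUP, VERSION, NAMESPACE, PLURAL, NAME)
    ready = fleet.get("status", {}).get("readyReplicas", 0)
    print("ready: %d/%d after %.1fs" % (ready, TARGET, time.time() - start))
    if ready >= TARGET:
        break
    time.sleep(5)
print("fully scaled in %.1fs" % (time.time() - start))
```

The same loop, sampled on a timer, could also emit the per-state counts mentioned above by listing GameServers and grouping them by their status state.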

Load Tests

Load tests aim to test the performance of the system under heavy load. Game server allocation is an example where multiple parallel operations should be tested.

Locust is a good option for load tests. Unfortunately, Locust integration with Go is not stable, so the only options are raw HTTP requests or the Python client library.

Locust can be easily integrated with other open source tools for storage and visualization. I have tested integration with Graphite and Grafana. Prometheus is more powerful than Graphite and would therefore be a better option.
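As a sketch of that Graphite integration, a Locust event hook (the 0.x event API of the releases referenced here) can push response times over Graphite's plaintext protocol; the host, port, and metric naming scheme are assumptions:

```python
# Sketch: forward each successful Locust request's response time to Graphite
# using its plaintext protocol ("<metric> <value> <timestamp>\n" on port 2003).
import socket
import time
from locust import events

GRAPHITE_HOST, GRAPHITE_PORT = "graphite", 2003  # assumed in-cluster service

def report_success(request_type, name, response_time, response_length, **kw):
    metric = "locust.%s.%s.response_time" % (
        request_type, name.strip("/").replace("/", "."))
    line = "%s %d %d\n" % (metric, response_time, int(time.time()))
    with socket.create_connection((GRAPHITE_HOST, GRAPHITE_PORT), timeout=2) as s:
        s.sendall(line.encode())

events.request_success += report_success
```

Opening a connection per sample keeps the sketch short; a real reporter would batch samples or hold the connection open.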

The final Locust tasks used to run the tests, and the server being tested, should be containerized for easy adoption.

Test Cases

GameServerAllocation. Create a fleet of size 10/100/1000, and allocate multiple game servers in parallel. Measure the time it takes to allocate a game server. This test includes two scenarios: one in which the number of allocations exceeds the fleet size and one in which it doesn't. The tests should evaluate whether allocation time depends on the number of Ready GameServers.
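A hypothetical Locust task set for this test case might look like the following. The `allocation.agones.dev/v1` group/version shown is from later Agones releases (this API was still at v1alpha1 at the time of this thread), and the fleet name, namespace, and auth handling are illustrative:

```python
# Hypothetical Locust test: each simulated user repeatedly requests a
# GameServerAllocation against a named fleet. Run with --host=https://<apiserver>.
from locust import HttpLocust, TaskSet, task

ALLOCATION = {
    "apiVersion": "allocation.agones.dev/v1",
    "kind": "GameServerAllocation",
    "spec": {
        "required": {"matchLabels": {"agones.dev/fleet": "simple-udp"}},
    },
}

class AllocationTasks(TaskSet):
    @task
    def allocate(self):
        self.client.post(
            "/apis/allocation.agones.dev/v1/namespaces/default/gameserverallocations",
            json=ALLOCATION,
            # the bearer token would be injected at startup in a real test
            headers={"Authorization": "Bearer %s" % self.locust.token},
            verify=False,
        )

class AllocationUser(HttpLocust):
    task_set = AllocationTasks
    token = ""
    min_wait = 500
    max_wait = 1500
```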

@pm7h
Contributor

pm7h commented Jan 25, 2019

Observations

Testing Fleet scaling up/down with GKE Cluster Autoscaling enabled

Test Environment

GKE cluster with the following configurations:

  • Node version: 1.11.6-gke.2
  • Node image: Container-Optimized OS (cos)
  • Machine type: n1-highmem-4 (4 vCPUs, 26 GB memory)
  • Automatic node upgrades: Disabled
  • Automatic node repair: Enabled
  • Autoscaling: On
  • Minimum size (in all zones): 3
  • Maximum size (in all zones): 100
  • Preemptible nodes: Disabled
  • Boot disk type: Standard persistent disk
  • Boot disk size in GB (per node): 100

Results - Fleet Scaling

I have observed that with GKE Cluster Autoscaling enabled, scaling up the Fleet gets stuck at some point.

  • Pod: At this point there are a number of pods that are out of CPU (example error: Node didn't have enough resource: cpu, requested: 130, used: 3840, capacity: 3920), but no new nodes are being created.
  • GameServer: The corresponding game servers get stuck in the "Scheduled" state. The GameServerSet for the Fleet shows a number of UnhealthyDelete events where GameServers were successfully deleted.
  • Quota: No GCE quota errors.
  • Other errors: I noticed errors getting the external address for GameServers (example error: error getting external address for GameServer : error retrieving node for Pod : node "" not found)

Testing Fleet scaling up/down with GKE Cluster Autoscaling disabled

  • Average time to spin up a Fleet: 9.3 seconds
  • Average time to scale up a Fleet from 0 to 1000: 19.4 minutes

On Fleet scaling down, I have observed cases where the Fleet scales down (all game servers are deleted), but the Fleet status is not updated and still shows 1000 ready GameServers.

Test Environment

GKE cluster with the following configurations:

  • Size (in all zones): 60
  • Node version: 1.11.6-gke.2
  • Node image: Container-Optimized OS (cos)
  • Machine type: n1-highmem-4 (4 vCPUs, 26 GB memory)
  • Total cores: 240 vCPUs
  • Total memory: 1,560.00 GB
  • Automatic node upgrades: Disabled
  • Automatic node repair: Enabled
  • Autoscaling: Off
  • Preemptible nodes: Disabled
  • Boot disk type: Standard persistent disk
  • Boot disk size in GB (per node): 100

Results - Fleet Autoscaling

Test Scenario. Spin up a fleet, scale it up to 100 replicas, and then scale down to 0. Repeat multiple times.

[screenshot: locust-fleet-scaling]

Results - Fleet Allocation

Test Scenario. Spin up a fleet, scale it up to 100 replicas, and then start a Locust test where 100 users try to do a game server allocation in parallel.

[screenshot: locust-allocation-grafana]
[screenshot: locust-metrics]
[screenshot: locust-percentiles]
[screenshot: allocations-agones-controller]

@roberthbailey
Member

@markmandel - I see that this is marked as part of the 0.12.0 milestone (but it was also in 0.11.0, 0.10.0, 0.9.0, and 0.8.0). Is it part of the milestone optimistically (hoping for someone to finish it)?

Also, for @markmandel or @pm7h - can you summarize what we think remains for this task? I know that @ilkercelikyilmaz has another test harness that does some load testing that maybe falls under this area as well.

@roberthbailey
Member

Reading through the other issues in the 0.12.0 milestone, I see that this is referenced from the top level plan for the 1.0 release, which at least partially answers my questions here:

Once 1.0 features are complete, performance testing will be completed, and the project will publish supported cluster sizes, fleet sizes and throughput metrics based on the current code base.

@pm7h
Contributor

pm7h commented Jul 12, 2019

I think the main remaining item is providing automation and dashboards for running these tests.

@markmandel markmandel modified the milestones: 0.12.0, 1.0 Aug 1, 2019
@markmandel markmandel removed this from the 1.0.0 milestone Sep 10, 2019
@markmandel markmandel added this to the 1.1.0 milestone Sep 10, 2019
@markmandel markmandel modified the milestones: 1.1.0, 1.2.0 Oct 22, 2019
@markmandel markmandel removed this from the 1.2.0 milestone Dec 4, 2019
@markmandel
Member Author

@roberthbailey @ilkercelikyilmaz do we feel we can close this, now that we have the scenario load tests?

@roberthbailey
Member

I think so. We now have the Locust load tests, the allocation load tests (gRPC and k8s API), and now the scenario tests as well. We have been using the allocation load tests to verify that new k8s versions don't introduce memory leaks, and the scenario tests can be used to verify performance over a long period of time.

@markmandel
Member Author

CLOSING!

@markmandel markmandel added this to the 1.22.0 milestone Mar 15, 2022