
# What happens when deploying a 'good' build when the service is already fully scaled up?

In this test we went from application version ABC to XYZ in the stack.

The main aim of this test was to establish whether deploying whilst a service is fully scaled up works as desired.

## Highlights

The current implementation leads to a temporary scale down during deployment.

The service can temporarily be scaled down by up to `maximumCapacity - minimumCapacity` instances, which could be a significant drop for a service with a high maximum capacity.
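As a sketch, the worst-case dip can be computed from the rolling-update parameters. The function below is a hypothetical illustration (the parameter names mirror CloudFormation's `AutoScalingRollingUpdate` policy attributes `MaxBatchSize` and `MinInstancesInService`; the numbers are taken from this test, not read from the real stack):

```python
def worst_case_dip(current_capacity: int, max_batch_size: int,
                   min_instances_in_service: int) -> int:
    """Instances temporarily lost in the first termination batch.

    CloudFormation terminates up to max_batch_size instances at once,
    but never drops below min_instances_in_service.
    """
    return min(max_batch_size, current_capacity - min_instances_in_service)

# Fully scaled up: 9 running, batches of 6, keep at least 3 in service.
print(worst_case_dip(9, 6, 3))  # -> 6, the dip observed in this test
```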

## Timeline

  1. Build number 98 was deployed (to start the test from a clean state, running build ABC).

  2. The service was scaled up by repeatedly invoking our scale-out script.

  3. The service scales up to 9 instances (from 3).

  4. Build number 100 was deployed (which updates to build XYZ).

  5. The CFN stack `playground-CODE-scaling-asg-rolling-update` started updating:

    First:

    ```
    Rolling update initiated. Terminating 9 obsolete instance(s) in batches of 6, while keeping at least 3 instance(s) in service. Waiting on resource signals with a timeout of PT5M when new instances are added to the autoscaling group.
    ```

    Then 6 instances are terminated and 6 new ones are launched:

    ```
    Terminating instance(s) [i-0333b7c2687c1ab46,i-04427ad5d2e5aa426,i-009b2c94810830dc5,i-0357d971d597edbbc,i-087ddecad98eebd05,i-047dbb0efa5bf5123]; replacing with 6 new instance(s).
    ```

    At this point we are under-provisioned by 6 instances.

  6. 6 SUCCESS signals are received. At this point we are provisioned correctly again.

  7. 3 more instances are terminated and 3 more are launched:

    ```
    Terminating instance(s) [i-07b18ed78618ef26a,i-0f94470f722e91778,i-0dc27b65fc7911afe]; replacing with 3 new instance(s).
    ```

    At this point we are under-provisioned by 3 instances.

  8. 3 SUCCESS signals are received and the deployment completes. At this point we are provisioned correctly again.

Unfortunately this means the deployment causes us to run temporarily with 3 instances serving traffic (and later 6) when we really need 9 to cope with the load (see the healthy hosts panel).

Full details can be seen via the dashboard.

## Potential Mitigations

See the potential mitigations described in the partially-scaled scenario, which also apply here.