In this test we went from application version ABC to XYZ in the stack ScalingAsgRollingUpdate (CFN stack playground-CODE-scaling-asg-rolling-update).
The main aim of this test was to establish whether deploying whilst a service is fully scaled up works as desired.
The current implementation leads to a temporary scale down during deployment.
The number of instances that the service will be scaled down by is maximumCapacity - minimumCapacity, so this could be a significant drop for a service with a high maximum capacity.
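To make the size of the drop concrete, here is a minimal sketch of the arithmetic. The function name is hypothetical, and the values 9 and 3 are assumed to be this test's maximum and minimum capacity (inferred from the service being fully scaled up at 9 instances, from 3).

```python
def capacity_drop(maximum_capacity: int, minimum_capacity: int) -> int:
    """Instances temporarily lost when deploying while fully scaled up.

    Hypothetical helper: it only restates the formula
    maximumCapacity - minimumCapacity from the text.
    """
    if minimum_capacity > maximum_capacity:
        raise ValueError("minimum_capacity must not exceed maximum_capacity")
    return maximum_capacity - minimum_capacity


# Assumed values for this test: maximum 9, minimum 3.
print(capacity_drop(9, 3))   # 6

# Illustrative only: a service with a much higher maximum capacity.
print(capacity_drop(30, 3))  # 27
```

In the second, purely illustrative case the service would be left with 3 of the 30 instances it needs, which is why a high maximum capacity makes the drop more severe.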
- Build number 98 was deployed (in order to start the test from a clean state - running build ABC).
- The service was scaled up by repeatedly invoking our scale-out script.
- The service scales up to 9 instances (from 3).
- Build number 100 was deployed (which updates to build XYZ).
- The CFN stack playground-CODE-scaling-asg-rolling-update started updating.
  First:
  Rolling update initiated. Terminating 9 obsolete instance(s) in batches of 6, while keeping at least 3 instance(s) in service. Waiting on resource signals with a timeout of PT5M when new instances are added to the autoscaling group.
  Then 6 instances are terminated and 6 new ones are launched:
  Terminating instance(s) [i-0333b7c2687c1ab46,i-04427ad5d2e5aa426,i-009b2c94810830dc5,i-0357d971d597edbbc,i-087ddecad98eebd05,i-047dbb0efa5bf5123]; replacing with 6 new instance(s).
  At this point we are under-provisioned by 6 instances.
- 6 SUCCESS signals are received. At this point we are provisioned correctly again.
- 3 more instances are terminated and 3 more are launched:
  Terminating instance(s) [i-07b18ed78618ef26a,i-0f94470f722e91778,i-0dc27b65fc7911afe]; replacing with 3 new instance(s).
  At this point we are under-provisioned by 3 instances.
- 3 SUCCESS signals are received and the deployment completes. At this point we are provisioned correctly again.
Unfortunately, this means that the deployment temporarily leaves us with only 3 instances serving traffic (and later 6 instances) when we really need 9 to cope with the load (see healthy hosts panel).
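The sequence observed in this test can be reproduced with a small simulation. This is a simplified model written for illustration, not CloudFormation's actual algorithm: the function and parameter names are hypothetical, the batch size (6) and minimum in service (3) are taken from the stack event message, and the model assumes every replacement instance signals SUCCESS before the next batch starts.

```python
def rolling_update_timeline(desired, batch_size, min_in_service):
    """Return (event, instances in service) pairs for a rolling update.

    Simplified model: each batch is terminated first, then replaced,
    and all replacements are assumed to signal SUCCESS.
    """
    in_service = desired   # instances currently serving traffic
    obsolete = desired     # instances still running the old build
    timeline = []
    while obsolete > 0:
        # Never terminate more than the batch size, more than what is
        # left to replace, or so many that we drop below the minimum.
        batch = min(batch_size, obsolete, in_service - min_in_service)
        if batch <= 0:
            raise ValueError("cannot replace instances without going below min_in_service")
        in_service -= batch
        timeline.append((f"terminated {batch}", in_service))
        obsolete -= batch
        in_service += batch
        timeline.append((f"{batch} SUCCESS signals", in_service))
    return timeline


# Values from this test: 9 instances, batches of 6, at least 3 in service.
for event, count in rolling_update_timeline(9, 6, 3):
    print(f"{event}: {count} in service")
# terminated 6: 3 in service
# 6 SUCCESS signals: 9 in service
# terminated 3: 6 in service
# 3 SUCCESS signals: 9 in service
```

The in-service counts of 3 and then 6 correspond to the two under-provisioned periods described above.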
Full details can be seen via the dashboard.
See the potential mitigations described in the partially-scaled scenario, which also apply here.