
Scale an application with Heat

The Heat orchestration service can help with one of the most important topics in cloud computing: scalability. When a cloud application suffers from heavy load, there are two ways to scale:

  1. Vertical Scaling. This is the most obvious solution: when the server resources are not enough, use a more powerful server. Vertical scaling resizes the compute instance to a larger flavor, so that it gets more CPUs, more RAM and more disk space (a CLI example follows this list). This type of scaling works well up to a point: once an instance reaches the maximum number of CPUs, RAM and disk space it can have, there is no way to scale further. If the load generated by users continues to grow beyond that, we need to find a different way to scale.

  2. Horizontal Scaling. The other approach to scalability is to run the application on a cluster of multiple servers. When an application runs on two or more instances, a load balancer is required to distribute client requests among the servers. In this way, the application can handle much larger volumes of clients. With this type of scaling, it takes far longer to reach the limits, since we can keep adding servers behind the load balancer until we exhaust the compute resources of the cloud.
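For reference, the vertical scaling mentioned in point 1 maps to a flavor resize in OpenStack; for example (the server and flavor names below are just placeholders):

# nova resize web01 medium
# nova resize-confirm web01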

In this section, we are going to deploy a horizontally scaling stack by means of Heat templates. To keep things simple, we are going to scale a simple web server serving a static page. We'll use two approaches: manual scaling and automatic scaling (autoscaling).

Manual Horizontal Scaling

Manual scaling requires the user to scale the cluster manually when the load from clients reaches the cluster limits. We start by writing a simple Heat template to deploy an Apache webserver on Ubuntu. The most clever part of the template is installing Apache on Ubuntu and customizing the home page with the IP address of the server. Connecting to Apache through the load balancer and refreshing the home page should show a changing IP address, because each time the page will be served by a different server. We use the cloud-init capability to run a user data script when the instance starts.

Here is a snippet of the cluster-heat-stack.yaml template:

resources:
  webserver:
    type: OS::Nova::Server
    properties:
      image: ubuntu
      flavor: small
      key_name: demokey
      networks:
        - network: { get_param: private_network }
      user_data: |
        #!/bin/bash
        apt-get update
        apt-get install apache2 -y
        echo "Hello "$(hostname -I)"!" > /var/www/html/index.html

Then we need a way to create multiple identical copies of the Apache webserver. Heat provides a resource type called OS::Heat::ResourceGroup. This resource wraps any standard resource definition, like a compute server, and creates multiple identical copies of that resource. So, change the snippet above as follows:

parameters:
  cluster_size:
    type: number
    label: Cluster size
    description: Number of webserver instances in the cluster.
    default: 2

resources:
  cluster:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: cluster_size }
      resource_def:
        type: OS::Nova::Server
        properties:
          image: ubuntu
          flavor: small
          key_name: demokey
          networks:
            - network: { get_param: private_network }
          user_data: |
            #!/bin/bash
            apt-get update
            apt-get install apache2 -y
            echo "Hello "$(hostname -I)"!" > /var/www/html/index.html

The count property defines how many copies of the application to start. We take this value from a parameter, cluster_size, which the user can set. The resource_def property is where the resource being scaled is configured.

Once we have the application running on multiple instances, we need a load balancer that presents itself as the single entry point to clients. The load balancer accepts the requests from clients and internally dispatches them to the actual servers.

Add the following to the template:

resources:
...
  loadbalancer:
    type: OS::Neutron::LoadBalancer
    properties:
      members: { get_attr: [cluster, refs] }
      pool_id: { get_resource: pool }
      protocol_port: { get_attr: [pool, vip, protocol_port] }

  pool:
    type: OS::Neutron::Pool
    properties:
      lb_method: ROUND_ROBIN
      protocol: HTTP
      subnet: { get_param: private_subnet }
      vip: { "protocol_port": 80 }

The loadbalancer resource creates a Load Balancer based on HAProxy, relying on the OpenStack Neutron LBaaS plugin. Make sure to enable that plugin before attempting to run the stack; see LBaaS Configuration. The Load Balancer takes the server list provided by the cluster ResourceGroup as its members.
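As a rough sketch only (the exact plugin alias, driver path and configuration file depend on your OpenStack release; follow the linked LBaaS Configuration page for authoritative instructions), enabling the LBaaS v1 plugin typically looks like this on the Neutron controller:

# /etc/neutron/neutron.conf
service_plugins = router,lbaas

[service_providers]
service_provider = LOADBALANCER:Haproxy:neutron_lbaas.services.loadbalancer.drivers.haproxy.plugin_driver.HaproxyOnHostPluginDriver:default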

We also defined a Load Balancer pool resource where we specify:

  • the load balancing method (ROUND_ROBIN)
  • the protocol (HTTP)
  • the subnet from which the Virtual IP (VIP) is taken
  • the port where the VIP listens (80)

Note: when not specified in the pool, as in the case above, the Virtual IP address of the Load Balancer is automatically allocated from the subnet given by the subnet parameter.
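If you prefer to pin the VIP to a specific address instead, the vip map also accepts an address key; a sketch (the address value below is just an example):

  pool:
    type: OS::Neutron::Pool
    properties:
      lb_method: ROUND_ROBIN
      protocol: HTTP
      subnet: { get_param: private_subnet }
      vip: { "protocol_port": 80, "address": "192.168.1.100" }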

To make the Load Balancer addressable from the external network, we allocate a Floating IP address and assign it to the Load Balancer:

resources:
...
  vip_floating_ip:
    type: OS::Neutron::FloatingIP
    properties:
      floating_network_id: { get_param: public_network }

  vip_floating_association:
    type: OS::Neutron::FloatingIPAssociation
    properties:
      floatingip_id: { get_resource: vip_floating_ip }
      port_id: { get_attr: [ pool, vip, port_id ] }
      fixed_ip_address: { get_attr: [ pool, vip, address ] }

Add some useful output for the user:

outputs:
  floating_ip:
    description: Floating IP address assigned to the Load Balancer
    value: { get_attr: [vip_floating_ip, floating_ip_address] }
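Note that the snippets above also use three network parameters retrieved with get_param; they must be declared in the template's parameters section. A minimal sketch with the same names:

parameters:
...
  private_network:
    type: string
    label: Private network
    description: Network to attach the webserver instances to.
  private_subnet:
    type: string
    label: Private subnet
    description: Subnet where the Load Balancer pool and VIP live.
  public_network:
    type: string
    label: Public network
    description: External network providing the Floating IP.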

The complete template file can be found here: cluster-heat-stack.yaml.

To check that things work as expected, create a simple bash script sending an HTTP GET request every 3 seconds:

# vi httpget.sh
#!/bin/bash
address=$1
echo "Getting web page from "$address
while true; do
 curl http://$address
 sleep 3
done

Create the stack with 3 Apache servers and run the script against the public IP address of the Load Balancer:

# heat stack-create cluster-heat-stack \
-f cluster-heat-stack.yaml \
-P "cluster_size=3"

# heat stack-show cluster-heat-stack | grep output
|     "output_key": "floating_ip",
|     "output_value": "172.120.1.206"

# PublicIP=172.120.1.206

# ./httpget.sh $PublicIP
Getting web page from 172.120.1.206
Hello 192.168.1.53 !
Hello 192.168.1.52 !
Hello 192.168.1.51 !
Hello 192.168.1.52 !
Hello 192.168.1.53 !
Hello 192.168.1.51 !
^C

As we can see, the Load Balancer is fully working, dispatching the client requests to all the Apache server instances in a round-robin fashion. However, this form of scaling is not very practical, since in case of increased load we have to manually update the stack with more servers:

# heat stack-update cluster-heat-stack \
-f cluster-heat-stack.yaml \
-P "cluster_size=4"

# ./httpget.sh $PublicIP
Getting web page from 172.120.1.206
Hello 192.168.1.53 !
Hello 192.168.1.52 !
Hello 192.168.1.51 !
Hello 192.168.1.53 !
Hello 192.168.1.52 !
Hello 192.168.1.51 !
^C
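To confirm that the new server actually joined the cluster, list the stack resources or the running instances:

# heat resource-list cluster-heat-stack
# nova list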

In the next section, we are going to make this process automatic by leveraging the Ceilometer metering service.

Automatic Horizontal Scaling

The Ceilometer metering service and the Heat orchestration service can be combined to achieve autoscaling based on infrastructure events like high CPU usage, memory exhaustion and/or network overload.

In this section, we are going to implement a simple autoscaling template to scale a cluster of compute instances up and down. The condition triggering the autoscaling of the infrastructure will be high CPU usage, defined by a threshold. When any of the cluster servers suffers high CPU usage, a high-CPU alarm is raised, triggering a policy to scale up the cluster. In the same way, when the CPU usage falls back under the threshold, a low-CPU alarm is raised, triggering another policy to scale down the cluster.

To keep things simple, we start with a Heat template creating a compute server inside a Heat resource of type OS::Heat::AutoScalingGroup. The autoscaling group wraps any standard resource definition, like a compute server, and creates multiple identical copies of that resource. Here is the snippet:

resources:
...
  group:
    type: OS::Heat::AutoScalingGroup
    properties:
      desired_capacity: 2
      max_size: 5
      min_size: 1
      resource:
        type: OS::Nova::Server
        properties:
          image: { get_param: server_image }
          flavor: { get_param: server_flavor }
          key_name: { get_param: server_key }
          networks:
            - network: { get_param: server_network }

The autoscaling group requires the maximum and minimum number of resource instances, as well as the desired capacity under normal conditions. In our case, we set the minimum to 1 and the maximum to 5 server instances. The desired capacity is set to 2, meaning the autoscaling group keeps two servers under normal conditions.

Then we define the scaling policies, one for scaling up and another one for scaling down:

resources:
...
  scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: group }
      cooldown: 60
      scaling_adjustment: 1

  scaledown_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: group }
      cooldown: 60
      scaling_adjustment: -1
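Each scaling policy exposes a pre-signed webhook URL through its alarm_url attribute; the alarms defined next will POST to these URLs to trigger the policies. For manual testing, you could also expose the URLs as stack outputs, a sketch (not part of the original template):

outputs:
  scaleup_url:
    description: Webhook that triggers a scale up when POSTed to
    value: { get_attr: [scaleup_policy, alarm_url] }

Sending a POST request to that URL (for example with curl -X POST) then adds one server to the group, subject to the cooldown and max_size limits.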

Ceilometer alarms will be used as trigger conditions for the above policies. Add the following alarm resources to the template:

  cpu_alarm_high:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: cpu_util
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 75
      alarm_actions:
        - {get_attr: [scaleup_policy, alarm_url]}
      comparison_operator: gt

  cpu_alarm_low:
    type: OS::Ceilometer::Alarm
    properties:
      meter_name: cpu_util
      statistic: avg
      period: 60
      evaluation_periods: 1
      threshold: 25
      alarm_actions:
        - {get_attr: [scaledown_policy, alarm_url]}
      comparison_operator: lt

The metering service will raise the cpu_alarm_high alarm when the average (avg) CPU utilization (cpu_util), expressed as a percentage, is greater than (gt) the threshold of 75 for 1 evaluation period of 60 seconds. The alarm action triggers the scaleup_policy defined above, which grows the autoscaling group as specified by its scaling_adjustment parameter. Similar logic applies to the scale down.
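Once the stack is running, you can watch the alarm states with the old-style Ceilometer client, matching the heat commands used earlier:

# ceilometer alarm-list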

The complete template file can be found here: autoscale-heat-stack.yaml.
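To try the autoscaling, create the stack and then generate some artificial CPU load on one of the group servers. The parameter values below are only examples, and the busy loop is just the simplest way to push cpu_util above the threshold:

# heat stack-create autoscale-heat-stack \
-f autoscale-heat-stack.yaml \
-P "server_image=ubuntu;server_flavor=small;server_key=demokey;server_network=private"

# ssh ubuntu@<server-address>
$ dd if=/dev/zero of=/dev/null &

After one or two evaluation periods, the cpu_alarm_high alarm fires and the group grows by one server. Kill the dd process and, once the average CPU usage falls under 25%, cpu_alarm_low triggers the scale down.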