The Heat orchestration service can help with one of the most important topics in cloud computing: scalability. When a cloud application suffers under heavy load, there are two ways to scale:
- Vertical Scale. This is the most obvious solution: when the server resources are not enough, use a more powerful server. Vertical scaling resizes the compute instance to a larger flavor, so that it gets more CPUs, more RAM and more disk space. This type of scaling works well up to a point. Once the instance reaches the maximum number of CPUs and the maximum amount of RAM and disk space that it can have, there is no way to scale further. If the load generated by users continues to grow beyond that, we need to find a different way to scale.
- Horizontal Scale. The other approach to scalability is to run the application on a cluster of multiple servers. When an application runs on two or more instances, a load balancer is required to distribute client requests among the servers. In this way, the application is able to handle much larger volumes of clients. With this type of scaling, it takes longer to reach the limits, since we can continue to add servers behind the load balancer until we exhaust the compute resources in the cloud.
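As a rough illustration of horizontal scaling, the sketch below estimates how many identical instances a cluster would need for a given request rate. The per-instance capacity and the load figures are made-up numbers chosen for illustration, not measurements.

```python
import math

def instances_needed(load_rps: float, capacity_per_instance_rps: float) -> int:
    """Return the smallest number of identical instances that can absorb the load."""
    if capacity_per_instance_rps <= 0:
        raise ValueError("capacity per instance must be positive")
    # Always keep at least one instance, even with no load.
    return max(1, math.ceil(load_rps / capacity_per_instance_rps))

# Hypothetical figure: one instance serves 500 requests per second.
for load in (300, 1200, 5000):
    print(load, "req/s ->", instances_needed(load, 500), "instance(s)")
```

With vertical scaling the same calculation stops being useful as soon as the required capacity exceeds the largest available flavor.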
In this section, we are going to deploy a horizontally scaling stack by means of Heat templates. To keep things simple, we are going to scale a simple web server providing a static page. We'll use two approaches: manual scaling and automatic scaling (autoscaling).
Manual scaling requires the user to scale the cluster manually when the load from clients reaches the cluster limits. We start by writing a simple Heat template to deploy an Apache web server on Ubuntu. The key part of the template installs Apache on Ubuntu and customizes the home page with the IP address of the server. Connecting to Apache through the load balancer and refreshing the home page should show a changing IP address, because each request is handled by a different server. We use the cloud-init capability to run a user data script when the instance starts.
Here is a snippet of the cluster-heat-stack.yaml template:
resources:
  webserver:
    type: OS::Nova::Server
    properties:
      image: ubuntu
      flavor: small
      key_name: demokey
      networks:
        - network: { get_param: private_network }
      user_data: |
        #!/bin/bash
        apt-get install apache2 -y
        echo "Hello "$(hostname -I)"!" > /var/www/html/index.html
Then we need a way to create multiple identical copies of the Apache web server. Heat provides a resource type called OS::Heat::ResourceGroup. This resource wraps any standard resource definition, like a compute server, and creates multiple identical copies of that resource. So, change the snippet above as follows:
parameters:
  cluster_size:
    type: number
    label: Cluster size
    description: Number of webserver instances in the cluster.
    default: 2
resources:
  cluster:
    type: OS::Heat::ResourceGroup
    properties:
      count: { get_param: cluster_size }
      resource_def:
        type: OS::Nova::Server
        properties:
          image: ubuntu
          flavor: small
          key_name: demokey
          networks:
            - network: { get_param: private_network }
          user_data: |
            #!/bin/bash
            apt-get install apache2 -y
            echo "Hello "$(hostname -I)"!" > /var/www/html/index.html
The count property defines how many copies of the application to start. We set this value from the cluster_size parameter, which the user can set. The resource_def property is where the resource being scaled is configured.
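The relationship between count and resource_def can be pictured with a small model. This is only a sketch of the idea, not Heat's implementation: the expand_group function and the use of the member index as a key are assumptions made for illustration.

```python
import copy

def expand_group(count, resource_def):
    """Model a resource group as `count` independent copies of one definition.

    Members are keyed by their index as a string (an assumption of this sketch).
    """
    if count < 0:
        raise ValueError("count must not be negative")
    return {str(i): copy.deepcopy(resource_def) for i in range(count)}

# Definition taken from the template snippet, reduced to a few properties.
resource_def = {
    "type": "OS::Nova::Server",
    "properties": {"image": "ubuntu", "flavor": "small", "key_name": "demokey"},
}

cluster = expand_group(3, resource_def)
for name, definition in sorted(cluster.items()):
    print(name, definition["type"], definition["properties"]["flavor"])
```

Changing the count and expanding again is the model's equivalent of updating the stack with a different cluster_size.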
Once we have the application running on multiple instances, we need a load balancer that presents itself as the entry point to clients. The load balancer accepts the requests from clients and internally dispatches them to the actual servers.
Add the following to the template:
resources:
  ...
  loadbalancer:
    type: OS::Neutron::LoadBalancer
    properties:
      members: { get_attr: [cluster, refs] }
      pool_id: { get_resource: pool }
      protocol_port: { get_attr: [pool, vip, protocol_port] }
  pool:
    type: OS::Neutron::Pool
    properties:
      lb_method: ROUND_ROBIN
      protocol: HTTP
      subnet: { get_param: private_subnet }
      vip: { "protocol_port": 80 }
The loadbalancer resource creates a load balancer based on HAProxy, relying on the OpenStack Neutron LBaaS plugin. Make sure to enable that plugin before attempting to run the stack; see LBaaS Configuration. The Load Balancer takes the list of servers provided by the cluster ResourceGroup as its members.
We also defined a Load Balancer pool resource where we specify:
- the load balancing method (ROUND_ROBIN)
- the protocol (HTTP)
- the Virtual IP (VIP) on which it listens
- the port on which it listens (80)
Note: when not specified in the pool, as in the above case, the Virtual IP address of the Load Balancer is automatically picked up from the subnet parameter.
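The ROUND_ROBIN method can be sketched in a few lines. This toy dispatcher only illustrates the rotation order and is not how HAProxy is implemented; the RoundRobinPool class is hypothetical and the member addresses are example values.

```python
from itertools import cycle

class RoundRobinPool:
    """Toy pool that hands out its members in a fixed rotating order."""

    def __init__(self, members):
        members = list(members)
        if not members:
            raise ValueError("pool needs at least one member")
        self._rotation = cycle(members)

    def pick(self):
        """Return the member that should serve the next request."""
        return next(self._rotation)

# Example member addresses on the private subnet.
pool = RoundRobinPool(["192.168.1.51", "192.168.1.52", "192.168.1.53"])
for _ in range(6):
    print("request served by", pool.pick())
```

Each member receives the same share of requests, which is a reasonable choice here because all the servers in the cluster are identical.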
To make the Load Balancer addressable from the external network, we allocate a Floating IP address and assign it to the Load Balancer:
resources:
  ...
  vip_floating_ip:
    type: OS::Neutron::FloatingIP
    properties:
      floating_network_id: { get_param: public_network }
  vip_floating_association:
    type: OS::Neutron::FloatingIPAssociation
    properties:
      floatingip_id: { get_resource: vip_floating_ip }
      port_id: { get_attr: [ pool, vip, port_id ] }
      fixed_ip_address: { get_attr: [ pool, vip, address ] }
Add some useful output for the user:
outputs:
  floating_ip:
    description: Floating IP address assigned to the load balancer
    value: { get_attr: [vip_floating_ip, floating_ip_address] }
The complete template file can be found here: cluster-heat-stack.yaml.
To check that things work as expected, create a simple bash script that sends an HTTP GET request every 3 seconds:
# vi httpget.sh
#!/bin/bash
# Fetch the home page from the given address every 3 seconds.
address=$1
echo "Getting web page from $address"
while true; do
    curl "http://$address"
    sleep 3
done
Create the stack with 3 Apache servers and run the script against the public IP address of the Load Balancer:
# heat stack-create cluster-heat-stack \
    -f cluster-heat-stack.yaml \
    -P "cluster_size=3"
# heat stack-show cluster-heat-stack | grep output
| "output_key": "floating_ip",
| "output_value": "172.120.1.206"
# PublicIP=172.120.1.206
# ./httpget.sh $PublicIP
Getting web page from 172.120.1.206
Hello 192.168.1.53 !
Hello 192.168.1.52 !
Hello 192.168.1.51 !
Hello 192.168.1.52 !
Hello 192.168.1.53 !
Hello 192.168.1.51 !
^C
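One way to check the distribution of requests is to tally the responses per backend. The sketch below does this for the output shown above; the tally_backends helper is hypothetical and assumes the "Hello <ip> !" line format produced by the user data script.

```python
import re
from collections import Counter

def tally_backends(lines):
    """Count how many responses each backend IP address produced."""
    pattern = re.compile(r"^Hello\s+(\d{1,3}(?:\.\d{1,3}){3})\s*!$")
    counts = Counter()
    for line in lines:
        match = pattern.match(line.strip())
        if match:  # skip anything that is not a "Hello <ip> !" line
            counts[match.group(1)] += 1
    return counts

# Output captured from the httpget.sh run above.
sample = """\
Getting web page from 172.120.1.206
Hello 192.168.1.53 !
Hello 192.168.1.52 !
Hello 192.168.1.51 !
Hello 192.168.1.52 !
Hello 192.168.1.53 !
Hello 192.168.1.51 !
""".splitlines()

for ip, hits in sorted(tally_backends(sample).items()):
    print(ip, hits)
```

An even count per address is what a round-robin balancer should produce over a whole number of rotations.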
As we see, the Load Balancer is fully working, sending the client requests to all the Apache server instances in a round-robin fashion. However, this form of scaling is not so useful since, in case of increased load, we have to manually update the stack with more servers:
# heat stack-update cluster-heat-stack \
    -f cluster-heat-stack.yaml \
    -P "cluster_size=4"
# ./httpget.sh $PublicIP
Getting web page from 172.120.1.206
Hello 192.168.1.53 !
Hello 192.168.1.52 !
Hello 192.168.1.51 !
Hello 192.168.1.53 !
Hello 192.168.1.52 !
Hello 192.168.1.51 !
^C
In the next section, we are going to make this process automatic by leveraging the Ceilometer metering service.
The Ceilometer metering service and the Heat orchestration service can be combined to achieve autoscaling based on infrastructure events like high CPU usage, memory exhaustion and/or network overload.
In this section, we are going to implement a simple autoscaling template to scale a cluster of compute instances up and down. The condition that triggers autoscaling will be high CPU usage, defined by a threshold. When any of the cluster servers suffers high CPU usage, a High CPU alarm is raised, triggering a policy to scale up the cluster. In the same way, when the CPU usage drops back below a low threshold, a Low CPU alarm is raised, triggering another policy to scale down the cluster.
To keep things simple, we start with a Heat template that creates a compute server inside a Heat resource of type OS::Heat::AutoScalingGroup. The autoscaling group wraps any standard resource definition, like a compute server, and creates multiple identical copies of that resource. Here is the snippet:
resources:
  ...
  group:
    type: OS::Heat::AutoScalingGroup
    properties:
      desired_capacity: 2
      max_size: 5
      min_size: 1
      resource:
        type: OS::Nova::Server
        properties:
          image: { get_param: server_image }
          flavor: { get_param: server_flavor }
          key_name: { get_param: server_key }
          networks:
            - network: { get_param: server_network }
The autoscaling group requires the maximum and minimum number of resource instances and the desired capacity in normal conditions. In our case, we set the minimum to 1 instance and the maximum to 5 instances. The desired capacity is set to 2, meaning the autoscaling group keeps two servers under normal conditions.
Then we define the scaling policies, one for scaling up and another for scaling down:
resources:
  ...
  scaleup_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: group }
      cooldown: 60
      scaling_adjustment: 1
  scaledown_policy:
    type: OS::Heat::ScalingPolicy
    properties:
      adjustment_type: change_in_capacity
      auto_scaling_group_id: { get_resource: group }
      cooldown: 60
      scaling_adjustment: -1
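The interplay between the group limits, the change_in_capacity adjustment and the cooldown can be modelled as follows. This is a simplified sketch under stated assumptions, not Heat's actual logic: the ScalingGroupModel class is hypothetical, and it uses a single cooldown timer shared by both policies.

```python
class ScalingGroupModel:
    """Toy model of a group whose size changes by +/- adjustments."""

    def __init__(self, min_size, max_size, desired_capacity, cooldown):
        self.min_size = min_size
        self.max_size = max_size
        self.capacity = desired_capacity
        self.cooldown = cooldown
        self._last_change = None  # time (seconds) of the last applied change

    def adjust(self, scaling_adjustment, now):
        """Apply a change_in_capacity adjustment at time `now` (seconds)."""
        cooling = (self._last_change is not None
                   and now - self._last_change < self.cooldown)
        if cooling:
            return self.capacity  # signal ignored during the cooldown window
        # Clamp the requested size between min_size and max_size.
        new_capacity = max(self.min_size,
                           min(self.max_size, self.capacity + scaling_adjustment))
        if new_capacity != self.capacity:
            self.capacity = new_capacity
            self._last_change = now
        return self.capacity

# Values taken from the template: min 1, max 5, desired 2, cooldown 60.
group = ScalingGroupModel(min_size=1, max_size=5, desired_capacity=2, cooldown=60)
print(group.adjust(+1, now=0))    # scale-up signal
print(group.adjust(+1, now=30))   # second signal inside the cooldown window
print(group.adjust(-1, now=120))  # scale-down signal after the cooldown
```

The cooldown keeps the group from growing or shrinking again before the previous change has had time to affect the measured load.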
The Ceilometer service alarms will be used as trigger conditions for the above policies. Add the following alarm resources to the template:
cpu_alarm_high:
  type: OS::Ceilometer::Alarm
  properties:
    meter_name: cpu_util
    statistic: avg
    period: 60
    evaluation_periods: 1
    threshold: 75
    alarm_actions:
      - {get_attr: [scaleup_policy, alarm_url]}
    comparison_operator: gt
cpu_alarm_low:
  type: OS::Ceilometer::Alarm
  properties:
    meter_name: cpu_util
    statistic: avg
    period: 60
    evaluation_periods: 1
    threshold: 25
    alarm_actions:
      - {get_attr: [scaledown_policy, alarm_url]}
    comparison_operator: lt
The metering service will raise a cpu_alarm_high alarm when the average (avg) CPU utilization (cpu_util), in percent, is greater than (gt) the threshold (75) for the number of evaluation periods (1) of 60 seconds each. The alarm action triggers the scale-up policy, which increases the size of the autoscaling group as specified by the policy's scaling_adjustment parameter. Similar logic applies to scaling down.
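The alarm condition can be expressed as a small function. This is a sketch of the rule described above, not Ceilometer's evaluator: the alarm_fires function and the sample values are hypothetical, and each inner list stands for the cpu_util samples collected during one period.

```python
import operator
from statistics import mean

COMPARATORS = {"gt": operator.gt, "lt": operator.lt}

def alarm_fires(periods, threshold, comparison_operator, evaluation_periods):
    """Return True when the average of each of the last `evaluation_periods`
    periods satisfies the comparison against the threshold."""
    if evaluation_periods < 1:
        raise ValueError("evaluation_periods must be at least 1")
    if len(periods) < evaluation_periods:
        return False  # not enough data collected yet
    compare = COMPARATORS[comparison_operator]
    recent = periods[-evaluation_periods:]
    # A period without samples never satisfies the condition.
    return all(bool(samples) and compare(mean(samples), threshold)
               for samples in recent)

# Hypothetical cpu_util samples (percent) for a single period.
busy = [[80.0, 90.0, 85.0]]
idle = [[10.0, 20.0, 15.0]]

print(alarm_fires(busy, 75, "gt", 1))  # condition of cpu_alarm_high
print(alarm_fires(idle, 25, "lt", 1))  # condition of cpu_alarm_low
```

Raising evaluation_periods above 1 makes the alarm less sensitive to short spikes, at the cost of a slower reaction.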
The complete template file can be found here: autoscale-heat-stack.yaml.