Hello EC2, Part 3: Self-healing Cluster



This is the third in a series of posts about setting up a scalable and robust “Hello World” web server using Amazon EC2. The target audience is developers with some experience using EC2. This third installment looks at setting up a self-healing, load-balanced cluster of EC2 instances.

Introduction

The very first installment introduced the idea of using instance metadata and cloud-init to cause an EC2 instance launched from a stock AMI to configure itself when it first starts up. Now we will leverage that technique to set up a self-healing cluster of instances behind Elastic Load Balancing (ELB). The instances will be launched by Amazon within a fixed-size Auto Scaling Group, which will utilize the health check configured into the load balancer to replace instances that are not functioning properly.

The Code

My code is available on Bitbucket:

https://bitbucket.org/rimey/hello-ec2-launch/

The essential parameters defining the cluster are in the file named config.py:

cluster = 'testcluster'
puppet_source = 'git@bitbucket.org:rimey/hello-ec2-puppetboot.git'

region = 'us-east-1'
zones = ['us-east-1a', 'us-east-1b']
total_instances = 2

instance = {
    'image_id': 'ami-a0ba68c9',     # us-east-1 oneiric i386 ebs 20120222
    'instance_type': 't1.micro',
    'key_name': 'awskey',
    'security_groups': ['default'],
}

I use the cluster name (“testcluster”) to name the Load Balancer, the Auto Scaling Group, and the Launch Configuration. A Launch Configuration encapsulates the parameters that an Auto Scaling Group requires in order to launch a new EC2 instance.

The Python code for launching the cluster is in launch_cluster.py. It uses the boto library to do the following:

  1. Create a distributed HTTP load balancer forwarding port 80 to instances in the availability zones specified in config.py.
  2. Configure an HTTP health check to be performed on the instances behind the load balancer. This will query a specified resource (/health) and check for a 200 response status. The web server only needs to return a successful response for that resource; the content of the response does not matter.
  3. Define a Launch Configuration with the same parameters we previously passed to run_instances(), including the user data script.
  4. Create an Auto Scaling Group using this Launch Configuration and the load balancer’s health check to maintain the specified total number of instances across the specified availability zones.
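The four steps above could look roughly like the following with the boto 2 API. This is a minimal sketch, not the code from launch_cluster.py: the function names, the health check timeout, the healthy threshold, the grace period, and the user_data argument are assumptions made for illustration, while the remaining values mirror config.py.

```python
def health_check_target(path='/health', port=80):
    """Build an ELB health check target string such as 'HTTP:80/health'."""
    return 'HTTP:%d%s' % (port, path)


def launch_cluster(cluster, region, zones, total_instances, instance, user_data):
    """Hypothetical helper: create the load balancer, launch config and group."""
    # boto is imported here so the helper above can be used without it.
    import boto.ec2.autoscale
    import boto.ec2.elb
    from boto.ec2.autoscale import AutoScalingGroup, LaunchConfiguration
    from boto.ec2.elb import HealthCheck

    elb = boto.ec2.elb.connect_to_region(region)
    autoscale = boto.ec2.autoscale.connect_to_region(region)

    # 1. Load balancer forwarding HTTP port 80 to port 80 on the instances.
    lb = elb.create_load_balancer(cluster, zones, [(80, 80, 'http')])

    # 2. Health check: request /health every 5 seconds; 2 failures = unhealthy.
    lb.configure_health_check(HealthCheck(
        target=health_check_target(),
        interval=5,
        timeout=2,              # assumed; must be shorter than the interval
        healthy_threshold=2,    # assumed
        unhealthy_threshold=2,
    ))

    # 3. Launch Configuration built from the 'instance' dict in config.py.
    lc = LaunchConfiguration(name=cluster, user_data=user_data, **instance)
    autoscale.create_launch_configuration(lc)

    # 4. Fixed-size group: min_size == max_size, health taken from the ELB.
    group = AutoScalingGroup(
        group_name=cluster,
        availability_zones=zones,
        load_balancers=[cluster],
        launch_config=lc,
        min_size=total_instances,
        max_size=total_instances,
        health_check_type='ELB',
        health_check_period=300,  # assumed grace period, in seconds
    )
    autoscale.create_auto_scaling_group(group)
    return lb.dns_name
```

Setting the minimum and maximum size to the same number is what makes the group fixed-size: it never scales in response to load, it only replaces instances that fail the health check.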

The same file also includes code for shutting down the cluster.

Trying it Out

To run launch_cluster.py, you must have boto installed, you must have your AWS credentials in ~/.boto, and you must have a suitable key pair and security group configured in AWS as I outlined earlier.

Running ./launch_cluster.py prints the following messages confirming that the cluster is starting up:

Creating load balancer.
URL = http://testcluster-374147217.us-east-1.elb.amazonaws.com/
Creating launch configuration.
Creating auto scaling group.

A few minutes later, the URL (which will be different each time) will begin serving a spartan web page saying “Hello EC2!”. You can use the AWS management console to follow the status of the EC2 instances as they start up (under EC2 / Instances), as well as their health status from the perspective of the load balancer (under EC2 / Load Balancers / Instances).
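If you would rather not reload the page by hand, a small polling loop can tell you when the cluster starts answering. The following is only a sketch using the Python 3 standard library; the function names, the number of attempts, and the delay are arbitrary choices for this example.

```python
import time
from urllib.error import URLError
from urllib.request import urlopen


def is_serving(url, timeout=5):
    """Return True if a GET of url succeeds with status 200."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.getcode() == 200
    except (URLError, OSError):
        # Covers DNS failures, refused connections, timeouts and HTTP errors.
        return False


def wait_until_serving(url, attempts=60, delay=10,
                       probe=is_serving, sleep=time.sleep):
    """Poll url until probe succeeds.

    Returns the number of attempts used, or None if every attempt failed.
    """
    for attempt in range(1, attempts + 1):
        if probe(url):
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

Call wait_until_serving() with the URL printed by the launch script. With the defaults it gives up after roughly ten minutes.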

I tried simulating a failure by running “sudo /etc/init.d/nginx stop” on one of the instances, while simultaneously hitting the load balancer with 250 requests per second (“250 users”) from the blitz.io load testing service. The effect, as you can see in the Hit Rate chart below, was to cause about half of the requests to result in timeouts until the load balancer marked the malfunctioning instance as unhealthy about ten seconds later. The quick recovery is due to my use of the smallest possible values for the health check interval (5 seconds) and threshold (2 failed checks).
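The numbers in this experiment can be sanity-checked with a little arithmetic. The sketch below assumes that the load balancer spreads requests evenly over the instances it considers healthy, and it ignores the health check timeout; the function names are made up for this example.

```python
def detection_window(interval_s, unhealthy_threshold):
    """Return (best case, worst case) seconds until an instance is marked unhealthy.

    Best case: the failure happens just before a check is due.
    Worst case: the failure happens just after a check has passed.
    """
    return ((unhealthy_threshold - 1) * interval_s,
            unhealthy_threshold * interval_s)


def failing_fraction(total_instances, broken_instances):
    """Fraction of requests routed to broken instances, assuming an even spread."""
    return broken_instances / float(total_instances)


low, high = detection_window(interval_s=5, unhealthy_threshold=2)
print('Marked unhealthy after %d to %d seconds' % (low, high))
print('Failing requests per second at 250 req/s: %.0f'
      % (250 * failing_fraction(total_instances=2, broken_instances=1)))
```

With a 5 second interval and a threshold of 2, the worst case is 10 seconds, which matches the recovery time observed above. With one of two instances down, about half of the 250 requests per second fail in the meantime.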

A couple of minutes later, the Auto Scaling Group terminated the unhealthy instance and launched a replacement, to which the load balancer eventually started forwarding requests.

Conclusion

With Auto Scaling Groups, the importance of having new instances come up serving requests is crystal clear, whether that is accomplished by using cloud-init or custom AMIs.

Unfortunately, my script for launching and tearing down clusters is useful mainly for demonstration purposes. In production, it would be important to be able to adjust the configuration of the load balancer and the Auto Scaling Group while the cluster is operational. It is regrettable that the AWS management console does not support Auto Scaling Groups, since graphical dashboards can be particularly nice for seeing the current configuration and status at a glance and making small adjustments.

Don’t forget to tear down your cluster when you are done with it. Run “./launch_cluster.py --shutdown”, wait at least five minutes, and then run it again. It will print nothing at all if all resources have been released.
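One plausible reason for the two-pass teardown is that an Auto Scaling Group cannot be deleted while it still has instances. The sketch below shows how such a teardown might be structured with boto 2. It is not the code from launch_cluster.py: the function name and the messages are hypothetical, and the two connection arguments are assumed to come from boto.ec2.autoscale.connect_to_region() and boto.ec2.elb.connect_to_region().

```python
def shutdown_cluster(cluster, autoscale, elb):
    """Release whatever can be released right now; return progress messages.

    An empty list means there was nothing left to release.
    """
    messages = []

    for group in autoscale.get_all_groups(names=[cluster]):
        if group.instances:
            # Scale the group to zero first. It cannot be deleted while it
            # still has instances, so a later run has to finish the job.
            group.shutdown_instances()
            messages.append('Shutting down instances.')
            return messages
        group.delete()
        messages.append('Deleting auto scaling group.')

    # The launch configuration can only go once no group refers to it.
    for lc in autoscale.get_all_launch_configurations(names=[cluster]):
        lc.delete()
        messages.append('Deleting launch configuration.')

    for lb in elb.get_all_load_balancers():
        if lb.name == cluster:
            lb.delete()
            messages.append('Deleting load balancer.')

    return messages
```

Running this repeatedly until it returns an empty list corresponds to rerunning the shutdown command until it prints nothing.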

The next installment will look into deploying updates to running instances.

This article was written by Ken Rimey and was later edited by other employees of Codento.

2 comments on “Hello EC2, Part 3: Self-healing Cluster”

  1. Fantastic post. Did you write about deploying to running instances?

    Also, with this configuration, did you use EBS-backed images?
