Auto Scaling with HAProxy

September 3rd, 2010 — 6:57pm

In my last post I showed you how to set up Auto Scaling with Elastic Load Balancing. In this post I will show you how to use Amazon’s Auto Scaling with an instance running HAProxy, and have HAProxy’s configuration file updated automatically as your setup scales up and down.

The Architecture

For the rest of this post I am going to assume that we are using two EC2 images. The first image is our load balancer running HAProxy; it is set up to forward incoming traffic to our second image, which is the application image. The application image will be set up with Amazon’s Auto Scaling to scale up and down depending on the load on the instances. Let’s assume our load balancer image has an AMI of ami-10000000 and our application image has an AMI of ami-20000000.

Auto Scaling Setup

The setup for Auto Scaling will be very similar to what we did in the previous post. First we will set up a launch config:

as-create-launch-config auto-scaling-test --image-id ami-20000000 --instance-type c1.medium

Then we will create our auto scaling group:

as-create-auto-scaling-group auto-scaling-test --availability-zones us-east-1a --launch-configuration auto-scaling-test --max-size 4 --min-size 2

You will notice we ran this command without setting a load balancer. Since we are using HAProxy we do not need to set this up. Finally we will create our trigger for the scaling:

as-create-or-update-trigger auto-scaling-test --auto-scaling-group auto-scaling-test --measure CPUUtilization --statistic Average --period 60 --breach-duration 120 --lower-threshold 30 --lower-breach-increment"=-1" --upper-threshold 60 --upper-breach-increment 2

Once you have executed these commands, two new application instances will be launched. This presents us with a problem: in order to send traffic to these instances, we need to update the HAProxy config file so that it knows where to direct the traffic.

Updating HAProxy

In the following example I am going to show you how a simple script can monitor for instances being launched or removed and update HAProxy accordingly. I will be using S3 storage, PHP and the Amazon PHP Library to do this. You can rewrite this code in any programming language you prefer; the key is understanding what is going on.

In order to identify the running application instances we need to know the AMI they were launched from. We know our application instance has an AMI of ami-20000000. We could hard-code this into our script, but if we ever rebuilt the application image we would have to update the script, which would in turn force us to rebuild our HAProxy instance. Not a lot of fun. What I like to do is store the AMI in S3. That way, if I ever update my application image, I can just change a small file in S3 and have my load balancer pick up the change. Let’s assume I stored the AMI of the application image in a file called ami-application and uploaded it to an S3 bucket so that it can be found at http://stefans-test-ami-bucket.s3.amazonaws.com/ami-application.
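
Since the whole script depends on that one small file, it can be worth checking what comes back from S3 before using it. The following is only a sketch: parse_ami_id is a hypothetical helper that is not part of the script below, and the pattern assumes AMI IDs look like "ami-" followed by 8 or 17 lowercase hex characters.

```php
<?php
// Hypothetical helper (not part of the original script): normalize and
// validate the AMI ID read from S3 before using it to match instances.
function parse_ami_id($raw)
{
    // file_get_contents() returns false on failure; treat that as "no ID".
    if ($raw === false) {
        return null;
    }
    $id = trim($raw);
    // Assumption: IDs are "ami-" followed by 8 or 17 lowercase hex characters.
    if (preg_match('/^ami-([0-9a-f]{8}|[0-9a-f]{17})$/', $id) !== 1) {
        return null;
    }
    return $id;
}

// A trailing newline in the S3 file is stripped before the check.
$ami_id = parse_ami_id("ami-20000000\n");   // 'ami-20000000'
```

If the helper returns null (for example because S3 returned an error document instead of the file), the script could simply exit and leave the current haproxy.cfg alone rather than generate a config with no servers in it.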

The Script

Basically what our script is going to do is the following:

  • Get the AMI of our application image from S3
  • Get a list of all running instances from Amazon, and record which ones match our application image
  • Get a default config file for HAProxy
  • Append the IP addresses of the running application instances to the config file
  • Compare the new config file to the old config file; if they are the same, no action is needed, and if they are different, replace haproxy.cfg and reload HAProxy

Our default config file for HAProxy will essentially be the full config without any server directives in the backend section. For example, our default config file could look something like:

global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        maxconn 50000
        user haproxy
        group haproxy
        daemon
        chroot /var/chroot/haproxy

defaults
        log       global
        mode    tcp
        option   httplog
        option   forwardfor
        retries   2
        redispatch
        maxconn       50000
        timeout connect    10000
        timeout client        30000
        timeout server       60000
        stats uri /ha_stats
        stats realm Global\ statistics
        stats auth myusername:mypassword

frontend www *:80
        maxconn 40000
        mode http
        default_backend www_farm

backend www_farm
        mode http
        balance roundrobin

Now that we have our default config file, all we have to do to generate our HAProxy config is add a server line to the end of it for each instance we have running. This works because the backend section is the last section in the file. Let’s look at the PHP code to do that:

<?php
// setup our include path, this is needed since the script is run from cron
ini_set("include_path", ".:../:./include:../include:/path/to/this/script/update-haproxy");

// including the Amazon EC2 PHP Library
require_once("Amazon/EC2/Client.php");

// include the config file containing your AWS Access Key and Secret
include_once ('.config.inc.php');

// location of AMI of the application image
$ami_location = 'http://stefans-test-ami-bucket.s3.amazonaws.com/ami-application';
$ami_id = chop(file_get_contents($ami_location));


// connect to Amazon and pull a list of all running instances
$service = new Amazon_EC2_Client(AWS_ACCESS_KEY_ID,
                                       AWS_SECRET_ACCESS_KEY);

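// an empty request asks for every instance in the account
require_once("Amazon/EC2/Model/DescribeInstancesRequest.php");
$request = new Amazon_EC2_Model_DescribeInstancesRequest();
$response = $service->describeInstances($request);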

$describeInstancesResult = $response->getDescribeInstancesResult();
$reservationList = $describeInstancesResult->getReservation();


// loop the list of running instances and match those that have an AMI of the application image
$hosts = array();
foreach ($reservationList as $reservation) {
        $runningInstanceList = $reservation->getRunningInstance();
        foreach ($runningInstanceList as $runningInstance) {
                $ami = $runningInstance->getImageId();

                $state = $runningInstance->getInstanceState();

                if ($ami == $ami_id && $state->getName() == 'running') {

                        $dns_name = $runningInstance->getPublicDnsName();

                        $app_ip = gethostbyname($dns_name);

                        $hosts[] = $app_ip;
                }
        }
}

// get our default HAProxy configuration file
$haproxy_cfg = file_get_contents("/share/etc/.default-haproxy.cfg");

foreach ($hosts as $i=>$ip) {
        $haproxy_cfg .= '
        server server'.$i.' '.$ip.':80 maxconn 250 check';
}
// test if the configs differ
$current_cfg = file_get_contents("/path/to/haproxy.cfg");
if ($current_cfg == $haproxy_cfg) {
        echo "everything is good, configs are the same.\n";
}
else {
        echo "file out of date, updating.\n";
        file_put_contents('/path/to/this/script/.latest-haproxy.cfg', $haproxy_cfg);
        system("cp /path/to/this/script/.latest-haproxy.cfg /path/to/haproxy.cfg");
        system("/etc/init.d/haproxy reload");
}
?>

I think this script is pretty self-explanatory, and it does what we outlined as our goals above. If you run the script from the command line, your HAProxy config file will be updated and HAProxy will be reloaded.
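
One thing to watch for is that the comparison is a plain string comparison, so if the instance list ever comes back in a different order the script will reload HAProxy even though nothing really changed. A possible way around that, sketched here under assumptions, is to sort and de-duplicate the addresses before appending them. build_haproxy_config is a hypothetical helper and the 10.0.0.x addresses are made up for the example.

```php
<?php
// Hypothetical helper: append one server line per host to the base config.
// Sorting and de-duplicating keeps the output stable between runs, so an
// unchanged set of instances always yields a byte-identical file.
function build_haproxy_config($base_cfg, array $hosts)
{
    $hosts = array_values(array_unique($hosts));
    sort($hosts, SORT_STRING);

    // Drop trailing newlines so each server line starts on its own line.
    $cfg = rtrim($base_cfg, "\n");
    foreach ($hosts as $i => $ip) {
        $cfg .= "\n        server server" . $i . ' ' . $ip . ':80 maxconn 250 check';
    }
    return $cfg . "\n";
}

// Example with made-up addresses, deliberately passed in reverse order.
$base = "backend www_farm\n        mode http\n        balance roundrobin\n";
echo build_haproxy_config($base, array('10.0.0.12', '10.0.0.11'));
```

With those two addresses the end of the generated file has server0 pointing at 10.0.0.11 and server1 at 10.0.0.12, regardless of the order they were passed in.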

Now that we have a working script, the last thing we need to do is set it up to run from cron. I find that running it every 2-5 minutes is sufficient to keep your config updated for auto scaling.
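
As a rough example, a crontab entry that runs the script every two minutes could look like the one below. The script file name (update-haproxy.php) and the log path are assumptions made for illustration, and the entry needs to belong to a user that is allowed to overwrite haproxy.cfg and reload HAProxy.

```
# m   h  dom mon dow  command
*/2   *  *   *   *    php /path/to/this/script/update-haproxy/update-haproxy.php >> /var/log/update-haproxy.log 2>&1
```

Redirecting the output to a log file keeps the "configs are the same" and "file out of date" messages around, which helps when you want to see when a scaling event was picked up.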

One nice benefit of this script is that it lets you easily pre-scale your solution if you know you are going to receive a big traffic spike. All you have to do is launch as many new instances of your application image as you need, and the script will manage the HAProxy setup for you.

Conclusion

With under 100 lines of code and a few tools we were able to set up a script that keeps the HAProxy config file up to date with your running application instances. This lets us use HAProxy instead of Elastic Load Balancing and still get all the benefits of Auto Scaling. Lastly, note that this script is just an example and should be tailored to your own environment as you see fit. It is also best to store this script and the default HAProxy config on an EBS volume if possible, as it will save you from having to rebuild your instance.
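
As one example of that kind of tailoring, you may want HAProxy to check a new config before it replaces the live one, so a bad file never reaches a running load balancer. This is only a sketch: install_config is a hypothetical helper, and the binary name and reload command are assumptions you would adjust for your system. It relies on "haproxy -c -f FILE", which parses a config file and exits with a non-zero status if it finds errors.

```php
<?php
// Hypothetical helper: install a new config only if HAProxy accepts it.
// $haproxy_bin and $reload_cmd are assumptions; adjust them for your system.
function install_config($candidate_path, $live_path,
                        $haproxy_bin = 'haproxy',
                        $reload_cmd = '/etc/init.d/haproxy reload')
{
    // "haproxy -c -f FILE" only checks the file and exits non-zero on errors.
    $check = escapeshellcmd($haproxy_bin) . ' -c -f ' . escapeshellarg($candidate_path);
    exec($check . ' 2>&1', $output, $status);
    if ($status !== 0) {
        error_log("candidate config rejected:\n" . implode("\n", $output));
        return false;
    }
    // Only overwrite the live config once the candidate has passed the check.
    if (!copy($candidate_path, $live_path)) {
        return false;
    }
    exec($reload_cmd, $output, $status);
    return $status === 0;
}
```

In the script above, a call such as install_config('/path/to/this/script/.latest-haproxy.cfg', '/path/to/haproxy.cfg') could stand in for the two system() calls in the else branch.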

Category: Amazon Web Services, Auto Scaling, HAProxy, Scaling, Scripts

9 Responses to “Auto Scaling with HAProxy”

  1. Guna Santos

    Hi Stefan!
    Congrats on the script… very useful and enlightening! Thank you very much

    I have one suggestion. Instead of using cron, the newly created instances could run the script via an SSH or RSH command to the HAProxy machine. What do you think?

    Best Regards,

  2. Marky

    @stefan

    This has been useful, but we opted to do this in a shell script instead of PHP.

    @Guna

    If done via remote SSH execution, how would the command be executed when the instance is terminated by Auto Scaling?
    IMHO cron would still be more reliable. I have also had the unpleasant experience of unscheduled reboots by Amazon itself … which changes the private IPs

  3. Stefan Klopp

    @Marky would love to see the code if you are able to share!

  4. fskinhead

    Very nice script, but I have not tried it yet. Should I create a new instance just to run this command over SSH, or can this be done on an existing EC2 instance?

    Thanks.

  5. Stefan Klopp

    @fskinhead you can run this on an existing EC2 instance fine.

  6. fskinhead

    I see. I’ll try this one. Thanks @stefan.

  7. fskinhead

    @stefan

    Can you post the steps to install the API tools? I am trying to create a launch config for auto scaling but it gives me a “command not found” error when executing the as-create-config-launch-config command. I followed the installation steps for configuring the API tools but the error still persists. Maybe you can guide me here or on the other blog topic.

  8. Stefan Klopp

    If you have installed the API tools correctly, I would imagine your problem is that you don’t have the commands in your path. Try using the full or relative path to the command:

    ~/.ec2/bin/as-create-config-launch-config

    where ~/.ec2/bin/ is the directory you put the commands in.

  9. isbaran

    Hello,

    I believe letting the application servers update the HAProxy configuration is a better way. In the case of changed private IPs, which is a very rare situation, HAProxy will detect that the health check URL is not accessible and won’t route traffic to that instance.
    Also, you won’t have to wait for the cron job; changes will be applied immediately.

    You’ll have to add a script to remove an application server from the HAProxy config when it is shut down during scale-in, but again I think this is a better way than scanning all instances for a specific AMI. Of course some scripting is required, but that’s the nature of this job.

