Rotating EBS Snapshots

If you use Elastic Block Storage (EBS) for storing your files on your ec2 instances you more than likely backup those files using the ec2 snapshots. If you don’t already do this you should probably start, as EBS volumes are not 100% fault tolerant, and can (and do) degrade just like normal drives. A good script for taking snapshots of data can be found on the alestic.com website, called ec2-consistent-snapshot. You can find all the information for this script here:

http://alestic.com/2009/09/ec2-consistent-snapshot

How to Rotate EBS Snapshots

After using the ec2-consistent-snapshot script for a while I realized I would eventually need to find something to rotate these backups as they were growing out of control. Some of our volumes were having snapshots done every hour, and that was adding up quickly. Google provided me with no easy solution for rotating the snapshots, so I decided to write my own script.

Essentially what I wanted was to have a script that would rotate the snapshots in Grandfather-Father-Son type setup. I wanted to have hourly backups kept for 24 hours, daily kept for a week, weekly kept for a month, and monthly kept for a year. Anything older than that I don’t want, however the script can be tweaked to allow for older backups.

Basically what the script does is the following:

  • Gets a list of all snapshots and puts them into an array indexed by the volume and the date the snapshot was taken
  • For a given volume organize the snapshots so that there are only hourly snapshots for 1 day, daily snapshots for 1 week, weekly snapshots for 1 month, and monthly snapshots for 1 year and collect which snapshots require deleting.
  • Delete the snapshots that are set for delete.

I wrote the script in PHP, mainly because it is what I feel most comfortable using. I am also once again using the Amazon PHP library. Here is the script in it’s entirety.


<?php
/*
 * rotate-ebs-snapshots.php
 *
 * Author: Stefan Klopp
 * Website: http://www.kloppmagic.ca
 * Requires: Amazon ec2 PHP Library
 *
 */

ini_set("include_path", ".:../:./include:../include:/PATH/TO/THIS/SCRIPT");

// include the amazon php library
require_once("Amazon/EC2/Client.php");
require_once("Amazon/EC2/Model/DeleteSnapshotRequest.php");

// include our configuration file with out ACCESS KEY and SECRET
include_once ('.config.inc.php');

$service = new Amazon_EC2_Client(AWS_ACCESS_KEY_ID,
                                       AWS_SECRET_ACCESS_KEY);


// setup our array of snapshots
$snap_array = setup_snapshot_array();

// call to rotate (you can call this for every volume you want to rotate)
rotate_standard_volume('VOLUME_ID_YOU_WISH_TO_ROTATE');


/* 
 * Used to setup an array of all snapshots for a given aws account
 */
function setup_snapshot_array() {
    global $service;
    // Get a list of all EBS snapshots
    $response = $service->describeSnapshots($request);

    $snap_array = array();

    if ($response->isSetDescribeSnapshotsResult()) {
        $describeSnapshotsResult = $response->getDescribeSnapshotsResult();
        $snapshotList = $describeSnapshotsResult->getSnapshot();
        foreach ($snapshotList as $snapshot) {
            if ($snapshot->getStatus() == 'completed') {

                    // date is in the format of 2009-04-30T15:32:00.000Z
                    list($date, $time) = split("T", $snapshot->getStartTime());

                    list($year, $month, $day) = split("-", $date);
                    list($hour, $min, $second) = split(":", $time);

                    // convert the date to unix time
                    $time = mktime($hour, $min, 0, $month, $day, $year);

                    $new_row = array(
                            'snapshot_id'=>$snapshot->getSnapShotId(),
                            'volume_id'=>$snapshot->getVolumeId(),
                            'start_time'=>$time
                    );
                    // add to our array of snapshots indexed by the volume_id
                    $snap_array[$new_row['volume_id']][$new_row['start_time']] = $new_row;
            }
        }
    }

    // sort each volumes snapshots by the date it was created
    foreach ($snap_array as $vol=>$vol_snap_array) {
            krsort($vol_snap_array);
            $snap_array[$vol] = $vol_snap_array;
    }

    return($snap_array);
}

/*
 * Used to rotate the snapshots
 */
function rotate_standard_volume($vol_id) {
        global $snap_array, $service;

        // calculate the date ranges for snapshots
        $one_day = time() - 86400;
        $one_week = time() - 604800;
        $one_month = time() - 2629743;
        $one_year = time() - 31556926;

        $hourly_snaps = array();
        $daily_snaps = array();
        $weekly_snaps = array();
        $monthly_snaps = array();
        $delete_snaps = array();

        echo "Beginning rotation of volume: {$vol_id}\n";

        foreach($snap_array[$vol_id] as $time=>$snapshot) {

                echo "Testing snapshot {$snapshot['snapshot_id']} with a date of ".date("F d, Y @ G:i:s", $time)."... ";

                if ($time >=  $one_day) {
                        echo "Snapshot is within a day lets keep it.\n";
                        $hourly_snaps[$time] = $snapshot;
                }
                elseif ($time < $one_day &#038;&#038; $time >= $one_week) {
                        $ymd = date("Ymd", $time);
                        echo "Snapshot is daily {$ymd}.\n";

                        if (is_array($daily_snaps[$ymd])) {
                                echo "Already have a snapshot for {$ymd}, lets delete this snap.\n";
                                $delete_snaps[] = $snapshot;
                        }
                        else {
                                $daily_snaps[$ymd] = $snapshot;
                        }
                }
                elseif ($time < $one_week &#038;&#038; $time >= $one_month) {
                        $week = date("W", $time);
                        echo "Snapshot is weekly {$week}.\n";

                        if (is_array($weekly_snaps[$week])) {
                                echo "Already have a snapshot for week {$week}, lets delete this snap.\n";
                                $delete_snaps[] = $snapshot;
                        }
                        else {
                                $weekly_snaps[$week] = $snapshot;
                        }
                }
                elseif ($time < $one_month &#038;&#038; $time >= $one_year) {
                        $month = date("m", $time);
                        echo "Snapshot is monthly {$month}.\n";

                        if (is_array($monthly_snaps[$month])) {
                                echo "Already have a snapshot for month {$month}, lets delete this snap.\n";
                                $delete_snaps[] = $snapshot;
                        }
                        else {
                                $monthly_snaps[$month] = $snapshot;
                        }
                }
                else{
                        echo "Snapshot older than year old, lets delete it.\n";
                        $delete_snaps[] = $snapshot;
                }
        }

        foreach ($delete_snaps as $snapshot) {
                echo "Delete snapshot {$snapshot['snapshot_id']} with date ".date("F d, Y @ H:i", $snapshot['start_time'])." forever.\n";
                $request = new Amazon_EC2_Model_DeleteSnapshotRequest();
                $request->setSnapshotId($snapshot['snapshot_id']);
                $response = $service->deleteSnapshot($request);
        }
        echo "\n";
}

You can either run the script by editing the call to rotate_standard_volume. You can call this method for each volume you wish to rotate snapshots for. Also feel free to change the values of the date ranges to keep snapshots for a given date range for longer or shorter periods.

Finally to make this script effective you should have it run at least once a day via cron.

Conclusion

If you are like me and utilize EBS snapshots for backups of your data you will likely need to rotate those snapshots at some point. With the script above you should be able to quickly and easily rotate your snapshots. With a few tweaks you should be able to easily customize the rotation schedule to suit your needs.

Auto Scaling with HAProxy

In my last post I showed you how to setup Auto Scaling with Elastic Load Balancing. In this post I will show you how you can utilize Amazon’s Auto Scaling with an instance running HAProxy, and have HAProxy automatically update it’s configuration files when your setup scales up and down.

The Architecture

For the rest of this post I am going to assume that we have two ec2 images we are using. The first image is our load balancer running HAProxy, this image is setup to forward incoming traffic to our second image, which is the application image. The application image will be setup with Amazon’s Auto Scaling to build scale up and down depending on the load of the instances. Lets assume our load balancer image has a AMI of ami-10000000 and our application image has an AMI of ami-20000000.

Auto Scaling Setup

The setup for the auto scaling will be pretty similar to what we did in the previous post. First we will setup a launch config:

as-create-launch-config auto-scaling-test --image-id ami-20000000 --instance-type c1.medium

Then we will create our auto scaling group:

as-create-auto-scaling-group auto-scaling-test --availability-zones us-east-1a --launch-configuration auto-scaling-test --max-size 4 --min-size 2

You will notice we ran this command without setting a load balancer. Since we are using HAProxy we do not need to set this up. Finally we will create our trigger for the scaling:

as-create-or-update-trigger auto-scaling-test --auto-scaling-group auto-scaling-test --measure CPUUtilization --statistic Average --period 60 --breach-duration 120 --lower-threshold 30 --lower-breach-increment"=-1" --upper-threshold 60 --upper-breach-increment 2

Once you have executed these commands, you will have 2 new application instances launched. This presents us with a problem, in order to send traffic to these instances we need to update the HAProxy config file so that it knows where to direct the traffic.

Updating HAProxy

In the following example I am going to show you how with the power of a simple script we can monitor for instances being launched or removed, and update HAProxy accordingly. I will be using S3 storage, PHP and the Amazon PHP Library to do this. You can use any programming language you prefer if you care to rewrite this code, the key is understanding what is going on.

In order for us to identify the running application instances we will need to know the AMI of the instance. We know our application instance has an AMI of ami-20000000. We could store this information straight into our script, however if we ever rebuilt the application instance, we would then have to update the script, which would then force us to have to rebuild our HAProxy instance. Not a lot of fun. What I like to do, is to store the AMI in S3. That way if I ever update my application instance, I can just change a small file in S3 and have my load balancer pick up those changes. Lets assume I stored the AMI of the application image in a file called ami-application and uploaded it to an S3 bucket so that it can be found at http://stefans-test-ami-bucket.s3.amazonaws.com/ami-application.

The Script

Basically what our script is going to do is the following:

  • Get the AMI from S3 of our application image
  • Get a list of all running instances from Amazon, and log which ones match our application image
  • Get a default config file for HAProxy
  • Append the IP addresses of the running application instances to the config file
  • Compare the new config file to the old config file, if they are the same no action is needed, if they are different, replace the haproxy.cfg and restart the server

Our default config file for HAProxy will essentially be the full config without any server directives in the backend section. For example, our default config file could look something like:

global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        maxconn 50000
        user haproxy
        group haproxy
        daemon
        chroot /var/chroot/haproxy

defaults
        log       global
        mode    tcp
        option   httplog
        option   forwardfor
        retries   2
        redispatch
        maxconn       50000
        timeout connect    10000
        timeout client        30000
        timeout server       60000
        stats uri /ha_stats
        stats realm Global\ statistics
        stats auth myusername:mypassword

frontend www *:80
        maxconn 40000
        mode http
        default_backend www_farm

backend www_farm
        mode http
        balance roundrobin

Now that we have our default config file, all we have to do to generate our HAProxy config is to add a server line to the end of it for each instance we have running. Lets look at the PHP code to do that:

<?php
// setup our include path, this is needed since the script is run from cron
ini_set("include_path", ".:../:./include:../include:/path/to/this/script/update-haproxy");

// including the Amazon EC2 PHP Library
require_once("Amazon/EC2/Client.php");

// include the config file containing your AWS Access Key and Secret
include_once ('.config.inc.php');

// location of AMI of the application image
$ami_location = 'http://stefans-test-ami-bucket.s3.amazonaws.com/ami-application';
$ami_id = chop(file_get_contents($ami_location));


// connect to Amazon and pull a list of all running instances
$service = new Amazon_EC2_Client(AWS_ACCESS_KEY_ID,
                                       AWS_SECRET_ACCESS_KEY);

$response = $service->describeInstances($request);

$describeInstancesResult = $response->getDescribeInstancesResult();
$reservationList = $describeInstancesResult->getReservation();


// loop the list of running instances and match those that have an AMI of the application image
$hosts = array();
foreach ($reservationList as $reservation) {
        $runningInstanceList = $reservation->getRunningInstance();
        foreach ($runningInstanceList as $runningInstance) {
                $ami = $runningInstance->getImageId();

                $state = $runningInstance->getInstanceState();

                if ($ami == $ami_id && $state->getName() == 'running') {

                        $dns_name = $runningInstance->getPublicDnsName();

                        $app_ip = gethostbyname($dns_name);

                        $hosts[] = $app_ip;
                }
        }
}

// get our default HAProxy configuration file
$haproxy_cfg = file_get_contents("/share/etc/.default-haproxy.cfg");

foreach ($hosts as $i=>$ip) {
        $haproxy_cfg .= '
        server server'.$i.' '.$ip.':80 maxconn 250 check';
}
// test if the configs differ
$current_cfg = file_get_contents("/path/to/haproxy.cfg");
if ($current_cfg == $haproxy_cfg) {
        echo "everything is good, configs are the same.\n";
}
else {
        echo "file out of date, updating.\n";
        file_put_contents('/path/to/this/script/.latest-haproxy.cfg', $haproxy_cfg);
        system("cp /path/to/this/script/.latest-haproxy.cfg /path/to/haproxy.cfg");
        system("/etc/init.d/haproxy reload");
}
?>

I think this script is pretty self explanatory, and does what we outlined as our goals for it above. If you ran the script from command line your HAProxy config file would get updated, and the server would be restarted.

Now that we have our working script, the last thing we need to do is setup this script to run on cron. I find updating every 2-5 minutes is sufficient to keep your config updated for auto scaling.

One of the nice benefits of having this script is it allows you the ability to easily pre-scale your solution if you know you are going to receive a big traffic spike. All you have to do is launch as many new instances of your application image and the script will manage the setup of HAProxy for you.

Conclusion

With under 100 lines of code and a few tools we were able to setup a script to keep the HAProxy config file up to date with your running application instances. This allows us the ability to use HAProxy instead of Amazon Load Balancing, but still get to have all the benefits of Auto Scaling. Lastly it is to note this script is just an example, and should be tailored to your own environment as you see fit. It is also best to store this script, and the default HAProxy config in a EBS volume if possible, as it will save you from having to rebuild your instance.