Viewing Pageviews for Custom Variables in Google Analytics

November 6th, 2013 — 8:31am

Recently I started using Google Analytics Custom Variables to track additional information about one of my sites. What I discovered when viewing the Custom Variables section in GA, however, was that it showed visits, not pageviews. What I wanted to see was which page each variable value was set on, and the number of pageviews it had.

To do this you need to create a segment in GA. Start by clicking the segment dropdown near the top of most reports:

[Screenshot: the segment dropdown at the top of a GA report]

Next, click “Create New Segment”. You will be presented with a number of options for how to segment your traffic. Under “Advanced” click “Conditions”, then choose “Custom Variable (Key X)”, where X is the slot you used when you set up your custom variable on your site. Change the comparison operator to “exactly matches”, then enter the key you used. At this point you could view your segment. However, if you used multiple values for your variable, it is best to set up a segment for each value. To do this, simply click the “And” button to the right of your first condition and add a second filter with “Custom Variable (Value X)”, “exactly matches”, and the value of the variable in the input box. Click “Save” and you should now see results for your custom segment.

[Screenshot: the segment conditions for the custom variable key and value]

If you repeat these steps for every value of your custom variables, you can then view multiple segments and see which custom variable value performed the best.

[Screenshot: comparing multiple custom variable segments]

Comment » | Google Analytics

Reducing Writes on Amazon RDS

June 13th, 2012 — 11:25pm

Recently I transitioned a client from an EC2-based MySQL server to Amazon RDS. After the migration I noticed a large number of writes on the system, which was degrading MySQL's performance. Inserts and updates were taking multiple seconds to complete, and this was on a larger instance size than the EC2 solution.

I initially found this article, which suggested that changing the innodb_flush_log_at_trx_commit variable to either 0 or 2 could help solve the problem. However, even after this change the system was still writing extensively to disk.

I dug a little deeper and found that a large majority of MySQL queries were writing temporary tables, which would explain the extensive writing to disk. After analysing the previous my.cnf file and comparing it to the RDS instance, I realized I had failed to mirror all the variables I had previously set up, specifically tmp_table_size, max_heap_table_size and query_cache_size. Of the three, the one that had the most dramatic effect on performance and writes was query_cache_size. After setting this variable to what the EC2 instance had been using, the CPU load and system writes dropped substantially.

See the following charts:

[Charts: CPU load and write operations before and after the query_cache_size change]

So if you run into a similar problem, try tweaking your query_cache_size and see if it affects your system writes as dramatically as it did for me.
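
If you want to confirm you are hitting the same problem, the standard MySQL statements below show how often temporary tables are spilling to disk and what the relevant variables are currently set to. Note that on RDS these values are changed through a DB parameter group rather than a my.cnf file.

SHOW GLOBAL STATUS LIKE 'Created_tmp%';
SHOW VARIABLES LIKE 'tmp_table_size';
SHOW VARIABLES LIKE 'max_heap_table_size';
SHOW VARIABLES LIKE 'query_cache_size';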

Comment » | Amazon Web Services, MySQL, Performance, RDS

Quick Debugging Trick in WordPress

August 5th, 2011 — 1:13am

I always found it a pain when writing new plugins/themes in WordPress to have to turn the WP_DEBUG flag on or off to get errors to show. Plus, oftentimes I will be working on a site that others are also viewing or working on, so turning on the flag would disrupt their work. A really simple solution to this problem is to wrap the define statement for the WP_DEBUG flag in an if statement. For example:


// enable WP_DEBUG only when ?debug=1 is present in the URL
if (isset($_GET['debug']) && $_GET['debug'] == 1) {
    define('WP_DEBUG', true);
}
else {
    define('WP_DEBUG', false);
}

The beauty of this code is that you can trigger debugging easily by adding ?debug=1 to your URL. This saves you from having to change the wp-config.php file, and it solves the issue of multiple users/developers.

Note that this should only be done on a development site. Having it on a live WordPress install could make your site easier to exploit, as you are giving attackers access to the notices, warnings and errors.
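
If you do need something similar on a shared staging site, a slightly safer variation is to require a secret token instead of a bare flag. This is just a sketch; the token below is a placeholder you would replace with your own value.

// require a secret token rather than ?debug=1; 'REPLACE-WITH-LONG-SECRET' is a placeholder
if (isset($_GET['debug']) && $_GET['debug'] === 'REPLACE-WITH-LONG-SECRET') {
    define('WP_DEBUG', true);
}
else {
    define('WP_DEBUG', false);
}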

4 comments » | Short Notes, Wordpress

Mirroring A Hard Drive with DD in Linux

July 23rd, 2011 — 8:37am

I recently bought a new hard drive for my laptop. Well if you can consider 3 months ago recently. I had grand plans of installing Mac OS on the drive, and making a hackintosh of my laptop. Well I got lazy, so this brand new hard drive sat in a box, until of course this week. With an impending move to Germany, I got new motivation to install the drive, at the very least to increase my hard drive size.

One of my big issues with upgrading my hard drive was that I wanted to retain all my files, and pretty much wanted everything to work as it did before (I have multiple partitions, running primarily Ubuntu but also Windows). After a little searching, the best solution appeared to be the dd command.

The beauty of dd is that it copies data at the block level, so you can make an exact mirror of a drive. While this does take a little extra time, it produces a perfect copy of your current drive. dd is simple to run, but make sure you use the flags correctly: if you reverse the input and output files you can easily destroy your existing drive. Here is the basic command:

dd if=inputfile of=outputfile

Well, that is simple enough. To do my hard drive upgrade I simply booted off an Ubuntu CD, opened a terminal, plugged in my new hard drive via a USB enclosure, then ran dd. The process took around 5 hours. When it was done I swapped out my old hard drive for the new one and my system booted flawlessly.
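
For reference, the invocation looked roughly like the following. The device names are only illustrative; check yours with something like fdisk -l first, because swapping if and of will wipe the source drive.

# old internal drive assumed to be /dev/sda, new USB-attached drive assumed to be /dev/sdb
sudo dd if=/dev/sda of=/dev/sdb bs=4M conv=noerror,sync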

After checking that the system booted fine, I booted the Ubuntu CD one more time, then ran gparted and resized my old partitions to take advantage of the extra space on the new drive.

dd is a really simple way to mirror drives or partitions, whether for backing up or migrating to new drives. Just make absolutely sure about the input and output files you pass to the command, because if you get them wrong, you will find yourself in tears trying to recover your data.

Comment » | Linux

WordPress Vim Syntax Highlighting

May 22nd, 2011 — 9:18pm

Lately I have been doing a lot more work in WordPress. I primarily code in Vim, and it always bugged me that there was no syntax highlighting for WordPress in Vim (or at least none that I could find), so I decided to write my own. I started off pulling all the functions from this page: http://codex.wordpress.org/Function_Reference. However, I found that page to be pretty incomplete, so I continued to add functions to the file as I went along.

I decided to release the file to the public in case anyone is looking for WordPress syntax highlighting. Note that by no means is this file complete, but it is a good base. Feel free to contact me to fill in any functions I am missing.

Find it on Github

To install, download the file and place it in your .vim/syntax/ directory. Then load it by running the following in Vim:

:set syn=wordpress

Or place something like the following in your .vimrc file:

autocmd BufEnter *.php :set syn=wordpress

6 comments » | Vim, Wordpress

FauquierCam.com Updated

February 1st, 2011 — 6:35pm

Today I launched an updated version of FauquierCam.com. I redid the design of the page so it looks a lot cleaner, and I added a blog to showcase notable pictures and videos from the camera. You can check it out at: http://www.fauquiercam.com.

Comment » | Fauquier Cam, Short Notes

How to Setup a Webcam in Linux

November 5th, 2010 — 6:32pm

At the beginning of this year I announced the launch of a webcam I had set up at my parents' house in the Kootenays. In the announcement I talked briefly about how the whole application worked; here I want to get into more of the technical details.

Hardware

  • Mini Compaq computer with Ubuntu Server installed
  • Canon A310 digital camera
  • USB Cable
  • Dedicated Server Hosted Remotely

Overview

The mini Compaq computer is located at my parents' place, sitting in their sun room. It is connected to the Internet, and via a USB cable it is connected to the Canon A310 camera, which is set up on a small tripod pointing out my parents' window.

On the mini computer a cron job is set up to run a script every 5 minutes. The script determines whether there is enough daylight outside to take a photo, and if there is, it executes a gphoto2 command that triggers the camera to take a photo and download the image to the computer. Once the image is on the computer, it is FTP'ed to my dedicated server for processing.

A cron job is set up on my dedicated server for a script that checks an incoming folder for newly uploaded images. When a new image is found, the script processes it, creating several different file sizes, uploads each file to Amazon S3 for storage and distribution through Amazon CloudFront, and stores a record of the image in a MySQL database.

Finally, I have a web front end that reads from the database to display the latest photo, along with an archive of all the older photos that have been taken. Let's deconstruct each step in the process to document how it all works.

Taking the Photo

The wonderful open source library gphoto2 does most of the work here. gphoto2 allows you to control a camera in Linux via the command line. There is a large list of supported cameras on the gphoto2 website. Since I didn't want to spend a lot of money on this project, I bought an old Canon camera that was noted to work well with gphoto2.

On the mini computer at my parents' place, a cron job executes a PHP script every 5 minutes. The PHP script first checks whether any other instances of gphoto2 are running; if there are, it exits. Next it determines what time sunrise and sunset are and whether the current time falls between the two, since we don't want to take photos in the dark of night. The script then checks whether we are in the first 15 minutes after sunrise or the last 15 minutes before sunset; if so, we want to use a higher ISO on the camera. Finally, the gphoto2 command is called to take the photo.

Here is our cron entry:

*/5 * * * * /usr/bin/php /PATH/TO/BIN/bin/call-shoot-camera.php

Here is what my script looks like:


<?php
/**
 * Script to determine if it is light outside and whether to take a photo
 **/

// directory configuration
$base_dir = "/PATH/TO/HOME/FOLDER";
$archive_dir = $base_dir . "archive/".date("Y") . "/" . date("m") . "/" . date("d") . "/";

// check to see if gphoto2 is already running
$ps = `ps ua`;
if (strstr($ps, 'gphoto2')) {
    return 0; 
}

// fifteen minutes in seconds
$fifteen_min = 900;

// set up the directory for today's date
if (!is_dir($archive_dir)) {
    `/bin/mkdir -p $archive_dir`;
}

// sunrise/sunset config
$zenith=96.8;
$long = -118.00737;
$lat = 49.8672;

$now = time();

$sunrise = date_sunrise($now, SUNFUNCS_RET_TIMESTAMP, $lat, $long, $zenith, -8);
$sunset = date_sunset($now, SUNFUNCS_RET_TIMESTAMP, $lat, $long, $zenith, -8);

$first_fifteen = $sunrise + $fifteen_min;
$second_fifteen = $sunrise + $fifteen_min + $fifteen_min;

$last_fifteen = $sunset - $fifteen_min;
$second_last_fifteen = $sunset - $fifteen_min - $fifteen_min;

// set the ISO higher if we are in the first 15 minutes of sunrise or last 15 minutes of sunset
if (($now >= $sunrise && $now <= $first_fifteen) || ($now >= $last_fifteen && $now <= $sunset)) {
    $iso = 2;
}
else {
    $iso = 1;
}


// take a photo if the time of day is between sunrise and sunset
if ($now >= $sunrise && $now <= $sunset) {
    `/usr/bin/gphoto2 --set-config capturetarget=0 --set-config capture=1 \
--set-config imagesize="medium 1" --set-config aperture=4 --set-config iso=$iso \
--set-config photoeffect=1 --capture-image-and-download \
--filename "/PATH/TO/CAPTURE/capture/archive/%Y-%m-%d_%H:%M:%S.jpg" \
--hook-script=/PATH/TO/BIN/bin/upload-and-set-new-photo.php`;
}

return 0;
?>

I will leave you to discover what all the flags to the gphoto2 command do, but I would like to point out the --hook-script flag. This flag calls a script after the newly taken image has been downloaded from the camera to the computer, which lets us call a second script to upload the new photo to our server. That script is extremely basic and uses the standard Linux FTP command.


<?php


if (getenv('ACTION') == 'download') {
    $file = getenv('ARGUMENT');
    $array = preg_split("/\//", $file);

    $final_file = array_pop($array);

    `/usr/bin/ftp -n -i -p HOSTNAME_OF_SERVER <<EOF
user FTPUSERNAME FTPPASSWORD
binary
put $file $final_file
EOF`;
}

As you can see, this is a really basic script that simply checks whether the file was downloaded and, if so, uploads it via FTP to our dedicated server.

I want to point out one final gphoto2 flag before moving on to the dedicated server processing. An extremely handy flag when using gphoto2 with a new camera is --list-config, which will tell you all the different config items on the camera that you can set through gphoto2.
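
For example, here is a quick way to explore a newly attached camera from the command line (the output and exact config names will vary by model):

# detect the attached camera
gphoto2 --auto-detect

# list every configuration entry the camera exposes
gphoto2 --list-config

# inspect the current and allowed values for a single entry, e.g. iso
gphoto2 --get-config iso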

Processing the Photo

On the dedicated server, images are uploaded to an incoming folder to be processed by another script. The processing script scans the incoming folder looking for new images. Before processing an image, the script checks that the image has not been written to within the last 20 seconds, to make sure the photo is not still being uploaded.

Once it is determined that the photo is not being uploaded, we can process the image. The processing is done using the GD PHP extension, the resulting files are uploaded to Amazon S3 using undesigned's fantastic S3 PHP library, and finally the image information is stored in a MySQL database table.
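
The processing script below pulls its settings from a $cam_config array defined in config.inc. That file is not shown in this post, but based on how the array is used, a minimal sketch might look something like this (every path, credential and size here is a placeholder):

<?php
// minimal sketch of config.inc -- all values are placeholders
$cam_config = array(
    'base_dir'       => '/PATH/TO/CAM/',           // where the classes/ directory lives
    'tmp_dir'        => '/PATH/TO/CAM/tmp/',       // scratch space and pid file
    'incoming_dir'   => '/PATH/TO/CAM/incoming/',  // where the mini computer uploads photos
    'archive_dir'    => '/PATH/TO/CAM/archive/',   // local archive, organized by date
    'dir_array'      => array(                     // sizes to generate: label => width
        'full'   => 0,                             // 'full' is uploaded as-is
        'large'  => 1024,
        'medium' => 640,
        'thumb'  => 150,
    ),
    'bucket'         => 'YOUR_S3_BUCKET',
    'cam_id'         => 1,
    'db_server'      => 'localhost',
    'db_user'        => 'DB_USER',
    'db_password'    => 'DB_PASSWORD',
    'db'             => 'DB_NAME',
    'aws_access_key' => 'YOUR_AWS_ACCESS_KEY',
    'aws_secret_key' => 'YOUR_AWS_SECRET_KEY',
);
?>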

Here is how the script looks:


<?php

// include our config file (contains db passwords and aws keys)
include_once("/PATH/TO/CONFIG/conf/config.inc");

// check if the incoming folder is already being processed
if (file_exists($cam_config['tmp_dir'] . 'process-incoming.pid')) {
    exit;
}
else {
    file_put_contents($cam_config['tmp_dir'] . 'process-incoming.pid', getmypid());
}

// libraries needed for s3 uploads and MySQL access
include_once($cam_config['base_dir'] . 'classes/s3-php5-curl/S3.php');
include_once($cam_config['base_dir'] . 'classes/db.cls');

$db = new db($cam_config['db_server'], $cam_config['db_user'], $cam_config['db_password'], $cam_config['db']);
$s3 = new S3($cam_config['aws_access_key'], $cam_config['aws_secret_key']);

// read all new files
$file_array = array();
if ($handle = opendir($cam_config['incoming_dir'])) {
    while (false !== ($file = readdir($handle))) {
        if ($file != "." && $file != "..") {
            $file_array[] = $file;
        }
    }
    closedir($handle);
}

// get some exif info
foreach ($file_array as $file) {
    $file_full_path = $cam_config['incoming_dir'].$file;
    $file_base = basename($file, '.jpg');

    $filemtime = filemtime($file_full_path);

    $diff = time() - $filemtime;

    // skip any file that is still being written to
    if ($diff <= 20) {
        continue;
    }

    $exif = exif_read_data($file_full_path, 'File');

    $timestamp = $exif['FileDateTime'];

    $archive_dir = $cam_config['archive_dir'] . date("Y/m/d", $timestamp);

    if (!file_exists($archive_dir)) {
        `mkdir -p $archive_dir`;
    }

    $height = $exif['COMPUTED']['Height'];
    $width = $exif['COMPUTED']['Width'];
    // loop the different file sizes we want to create
    foreach ($cam_config['dir_array'] as $dir=>$new_width) {

        if ($dir == 'full') {
            // upload fullsize image to s3
            $s3->putObjectFile($file_full_path, $cam_config['bucket'], baseName($file_full_path), S3::ACL_PUBLIC_READ);
            `cp $file_full_path $archive_dir`;
        }
        else {
            // resize the image
            $new_height = ceil($new_width/$width * $height);
            $src = @imagecreatefromjpeg($file_full_path);
            $dst = @imagecreatetruecolor($new_width, $new_height);
            @imagecopyresampled($dst, $src, 0, 0, 0, 0, $new_width, $new_height, $width, $height);

            $new_file = $cam_config['tmp_dir'] . $file_base . '-' . $dir . '.jpg';
            @imagejpeg($dst, $new_file, 90);
            @imagedestroy($src);
            @imagedestroy($dst);

            // upload resized image to s3
            $s3->putObjectFile($new_file, $cam_config['bucket'], baseName($new_file), S3::ACL_PUBLIC_READ);
            `cp $new_file $archive_dir`;
        }

        // insert the photo into the database
        $picture_row = array(
            'cam_id'=>$cam_config['cam_id'],
            'timestamp'=>$timestamp,
            'filename'=>$file_base
        );

        $result = $db->createRow("picture", $picture_row);
    }

    `rm -f $file_full_path`;
    `rm -f {$cam_config['tmp_dir']}*.jpg`;
}

// remove the pid
unlink($cam_config['tmp_dir'] . 'process-incoming.pid');

With this script running on a cron, we can now process any new file that gets uploaded to our server, upload it to Amazon S3, and log it in our database. With the photos stored in the database, we can easily set up a simple homepage to display the latest photo and produce an archive of photos taken over time.
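
The cron entry on the dedicated server follows the same pattern as the one on the mini computer; the script name and path below are placeholders for wherever you keep the processing script:

*/5 * * * * /usr/bin/php /PATH/TO/BIN/bin/process-incoming.php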

Summary

With the help of a few powerful open source programs and libraries and a small bit of hardware, we are able to easily set up a camera that takes a photo every 5 minutes and posts it to the Internet. You can view the webcam I set up at www.fauquiercam.com. In the future I will extend the processing further so that after every day, week, or month a time lapse is generated using mencoder.

4 comments » | Amazon S3, Fauquier Cam, Scripts

Featured on YUIblog.com

September 28th, 2010 — 9:18am

Today I have a guest post featured on Yahoo’s YUI Blog talking about implementing YUI components on the Car Rental Express website (the company I work for). It is a general overview of some of the components we utilize the most and why we chose to use them.

Comment » | Javascript, Short Notes, YUI

Rotating EBS Snapshots

September 14th, 2010 — 6:50pm

If you use Elastic Block Storage (EBS) for storing files on your EC2 instances, you more than likely back those files up using EBS snapshots. If you don't already do this you should probably start, as EBS volumes are not 100% fault tolerant and can (and do) degrade just like normal drives. A good script for taking consistent snapshots of your data can be found on the alestic.com website, called ec2-consistent-snapshot. You can find all the information for this script here:

http://alestic.com/2009/09/ec2-consistent-snapshot

How to Rotate EBS Snapshots

After using the ec2-consistent-snapshot script for a while, I realized I would eventually need something to rotate these backups, as they were growing out of control. Some of our volumes were having snapshots taken every hour, and that was adding up quickly. Google provided me with no easy solution for rotating the snapshots, so I decided to write my own script.

Essentially what I wanted was a script that would rotate the snapshots in a Grandfather-Father-Son type setup. I wanted hourly backups kept for 24 hours, daily backups kept for a week, weekly backups kept for a month, and monthly backups kept for a year. Anything older than that I don't want, though the script can be tweaked to keep older backups.

Basically what the script does is the following:

  • Gets a list of all snapshots and puts them into an array indexed by the volume and the date the snapshot was taken
  • For a given volume, organize the snapshots so that there are only hourly snapshots for 1 day, daily snapshots for 1 week, weekly snapshots for 1 month, and monthly snapshots for 1 year, and collect which snapshots require deleting.
  • Delete the snapshots that are marked for deletion.

I wrote the script in PHP, mainly because it is what I feel most comfortable using. I am also once again using the Amazon EC2 PHP library. Here is the script in its entirety.


<?php
/*
 * rotate-ebs-snapshots.php
 *
 * Author: Stefan Klopp
 * Website: http://www.kloppmagic.ca
 * Requires: Amazon ec2 PHP Library
 *
 */

ini_set("include_path", ".:../:./include:../include:/PATH/TO/THIS/SCRIPT");

// include the amazon php library
require_once("Amazon/EC2/Client.php");
require_once("Amazon/EC2/Model/DeleteSnapshotRequest.php");

// include our configuration file with our ACCESS KEY and SECRET
include_once ('.config.inc.php');

$service = new Amazon_EC2_Client(AWS_ACCESS_KEY_ID,
                                       AWS_SECRET_ACCESS_KEY);


// setup our array of snapshots
$snap_array = setup_snapshot_array();

// call to rotate (you can call this for every volume you want to rotate)
rotate_standard_volume('VOLUME_ID_YOU_WISH_TO_ROTATE');


/* 
 * Used to setup an array of all snapshots for a given aws account
 */
function setup_snapshot_array() {
    global $service;
    // Get a list of all EBS snapshots
    $response = $service->describeSnapshots($request);

    $snap_array = array();

    if ($response->isSetDescribeSnapshotsResult()) {
        $describeSnapshotsResult = $response->getDescribeSnapshotsResult();
        $snapshotList = $describeSnapshotsResult->getSnapshot();
        foreach ($snapshotList as $snapshot) {
            if ($snapshot->getStatus() == 'completed') {

                    // date is in the format of 2009-04-30T15:32:00.000Z
                    list($date, $time) = split("T", $snapshot->getStartTime());

                    list($year, $month, $day) = split("-", $date);
                    list($hour, $min, $second) = split(":", $time);

                    // convert the date to unix time
                    $time = mktime($hour, $min, 0, $month, $day, $year);

                    $new_row = array(
                            'snapshot_id'=>$snapshot->getSnapShotId(),
                            'volume_id'=>$snapshot->getVolumeId(),
                            'start_time'=>$time
                    );
                    // add to our array of snapshots indexed by the volume_id
                    $snap_array[$new_row['volume_id']][$new_row['start_time']] = $new_row;
            }
        }
    }

    // sort each volumes snapshots by the date it was created
    foreach ($snap_array as $vol=>$vol_snap_array) {
            krsort($vol_snap_array);
            $snap_array[$vol] = $vol_snap_array;
    }

    return($snap_array);
}

/*
 * Used to rotate the snapshots
 */
function rotate_standard_volume($vol_id) {
        global $snap_array, $service;

        // calculate the date ranges for snapshots
        $one_day = time() - 86400;
        $one_week = time() - 604800;
        $one_month = time() - 2629743;
        $one_year = time() - 31556926;

        $hourly_snaps = array();
        $daily_snaps = array();
        $weekly_snaps = array();
        $monthly_snaps = array();
        $delete_snaps = array();

        echo "Beginning rotation of volume: {$vol_id}\n";

        foreach($snap_array[$vol_id] as $time=>$snapshot) {

                echo "Testing snapshot {$snapshot['snapshot_id']} with a date of ".date("F d, Y @ G:i:s", $time)."... ";

                if ($time >=  $one_day) {
                        echo "Snapshot is within a day lets keep it.\n";
                        $hourly_snaps[$time] = $snapshot;
                }
                elseif ($time < $one_day && $time >= $one_week) {
                        $ymd = date("Ymd", $time);
                        echo "Snapshot is daily {$ymd}.\n";

                        if (is_array($daily_snaps[$ymd])) {
                                echo "Already have a snapshot for {$ymd}, lets delete this snap.\n";
                                $delete_snaps[] = $snapshot;
                        }
                        else {
                                $daily_snaps[$ymd] = $snapshot;
                        }
                }
                elseif ($time < $one_week && $time >= $one_month) {
                        $week = date("W", $time);
                        echo "Snapshot is weekly {$week}.\n";

                        if (is_array($weekly_snaps[$week])) {
                                echo "Already have a snapshot for week {$week}, lets delete this snap.\n";
                                $delete_snaps[] = $snapshot;
                        }
                        else {
                                $weekly_snaps[$week] = $snapshot;
                        }
                }
                elseif ($time < $one_month && $time >= $one_year) {
                        $month = date("m", $time);
                        echo "Snapshot is monthly {$month}.\n";

                        if (is_array($monthly_snaps[$month])) {
                                echo "Already have a snapshot for month {$month}, lets delete this snap.\n";
                                $delete_snaps[] = $snapshot;
                        }
                        else {
                                $monthly_snaps[$month] = $snapshot;
                        }
                }
                else{
                        echo "Snapshot older than year old, lets delete it.\n";
                        $delete_snaps[] = $snapshot;
                }
        }

        foreach ($delete_snaps as $snapshot) {
                echo "Delete snapshot {$snapshot['snapshot_id']} with date ".date("F d, Y @ H:i", $snapshot['start_time'])." forever.\n";
                $request = new Amazon_EC2_Model_DeleteSnapshotRequest();
                $request->setSnapshotId($snapshot['snapshot_id']);
                $response = $service->deleteSnapshot($request);
        }
        echo "\n";
}

To run the script, edit the call to rotate_standard_volume; you can call this method for each volume you wish to rotate snapshots for. Also feel free to change the values of the date ranges to keep snapshots for longer or shorter periods.
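
For example, to rotate snapshots for two volumes you would simply call the function twice (the volume IDs below are made up):

rotate_standard_volume('vol-aaaa1111');
rotate_standard_volume('vol-bbbb2222');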

Finally to make this script effective you should have it run at least once a day via cron.
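
For example, a crontab entry like the following would run the rotation every night at 1am (the path is a placeholder):

0 1 * * * /usr/bin/php /PATH/TO/SCRIPT/rotate-ebs-snapshots.php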

Conclusion

If you are like me and utilize EBS snapshots for backups of your data you will likely need to rotate those snapshots at some point. With the script above you should be able to quickly and easily rotate your snapshots. With a few tweaks you should be able to easily customize the rotation schedule to suit your needs.

7 comments » | Amazon Web Services, Elastic Block Storage, Scripts

Auto Scaling with HAProxy

September 3rd, 2010 — 6:57pm

In my last post I showed you how to set up Auto Scaling with Elastic Load Balancing. In this post I will show you how to use Amazon's Auto Scaling with an instance running HAProxy, and have HAProxy automatically update its configuration file when your setup scales up and down.

The Architecture

For the rest of this post I am going to assume that we have two EC2 images. The first image is our load balancer running HAProxy; it is set up to forward incoming traffic to our second image, the application image. The application image will be set up with Amazon's Auto Scaling to scale up and down depending on the load on the instances. Let's assume our load balancer image has an AMI of ami-10000000 and our application image has an AMI of ami-20000000.

Auto Scaling Setup

The setup for the auto scaling will be pretty similar to what we did in the previous post. First we will set up a launch config:

as-create-launch-config auto-scaling-test --image-id ami-20000000 --instance-type c1.medium

Then we will create our auto scaling group:

as-create-auto-scaling-group auto-scaling-test --availability-zones us-east-1a --launch-configuration auto-scaling-test --max-size 4 --min-size 2

You will notice we ran this command without setting a load balancer. Since we are using HAProxy we do not need to set this up. Finally we will create our trigger for the scaling:

as-create-or-update-trigger auto-scaling-test --auto-scaling-group auto-scaling-test --measure CPUUtilization --statistic Average --period 60 --breach-duration 120 --lower-threshold 30 --lower-breach-increment"=-1" --upper-threshold 60 --upper-breach-increment 2

Once you have executed these commands, you will have 2 new application instances launched. This presents us with a problem: in order to send traffic to these instances, we need to update the HAProxy config file so that it knows where to direct the traffic.

Updating HAProxy

In the following example I am going to show you how, with a simple script, we can monitor for instances being launched or removed and update HAProxy accordingly. I will be using S3 storage, PHP and the Amazon PHP Library to do this. You can rewrite this code in any programming language you prefer; the key is understanding what is going on.

In order to identify the running application instances, we need to know the AMI of the application image. We know our application instance has an AMI of ami-20000000. We could store this information directly in the script; however, if we ever rebuilt the application image, we would have to update the script, which would in turn force us to rebuild our HAProxy instance. Not a lot of fun. What I like to do is store the AMI in S3. That way, if I ever update my application image, I can just change a small file in S3 and have my load balancer pick up the change. Let's assume I stored the AMI of the application image in a file called ami-application and uploaded it to an S3 bucket so that it can be found at http://stefans-test-ami-bucket.s3.amazonaws.com/ami-application.
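
One way to create and publish that file, assuming you have a tool such as s3cmd already configured, would be:

# write the current application AMI to a file and make it publicly readable on S3
echo "ami-20000000" > ami-application
s3cmd put --acl-public ami-application s3://stefans-test-ami-bucket/ami-application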

The Script

Basically what our script is going to do is the following:

  • Get the AMI from S3 of our application image
  • Get a list of all running instances from Amazon, and log which ones match our application image
  • Get a default config file for HAProxy
  • Append the IP addresses of the running application instances to the config file
  • Compare the new config file to the old one; if they are the same, no action is needed; if they differ, replace haproxy.cfg and reload HAProxy

Our default config file for HAProxy will essentially be the full config without any server directives in the backend section. For example, our default config file could look something like:

global
        log 127.0.0.1   local0
        log 127.0.0.1   local1 notice
        maxconn 50000
        user haproxy
        group haproxy
        daemon
        chroot /var/chroot/haproxy

defaults
        log       global
        mode    tcp
        option   httplog
        option   forwardfor
        retries   2
        redispatch
        maxconn       50000
        timeout connect    10000
        timeout client        30000
        timeout server       60000
        stats uri /ha_stats
        stats realm Global\ statistics
        stats auth myusername:mypassword

frontend www *:80
        maxconn 40000
        mode http
        default_backend www_farm

backend www_farm
        mode http
        balance roundrobin

Now that we have our default config file, all we have to do to generate our HAProxy config is add a server line to the end of it for each instance we have running. Let's look at the PHP code to do that:

<?php
// setup our include path, this is needed since the script is run from cron
ini_set("include_path", ".:../:./include:../include:/path/to/this/script/update-haproxy");

// including the Amazon EC2 PHP Library
require_once("Amazon/EC2/Client.php");

// include the config file containing your AWS Access Key and Secret
include_once ('.config.inc.php');

// location of AMI of the application image
$ami_location = 'http://stefans-test-ami-bucket.s3.amazonaws.com/ami-application';
$ami_id = chop(file_get_contents($ami_location));


// connect to Amazon and pull a list of all running instances
$service = new Amazon_EC2_Client(AWS_ACCESS_KEY_ID,
                                       AWS_SECRET_ACCESS_KEY);

$response = $service->describeInstances($request);

$describeInstancesResult = $response->getDescribeInstancesResult();
$reservationList = $describeInstancesResult->getReservation();


// loop the list of running instances and match those that have an AMI of the application image
$hosts = array();
foreach ($reservationList as $reservation) {
        $runningInstanceList = $reservation->getRunningInstance();
        foreach ($runningInstanceList as $runningInstance) {
                $ami = $runningInstance->getImageId();

                $state = $runningInstance->getInstanceState();

                if ($ami == $ami_id && $state->getName() == 'running') {

                        $dns_name = $runningInstance->getPublicDnsName();

                        $app_ip = gethostbyname($dns_name);

                        $hosts[] = $app_ip;
                }
        }
}

// get our default HAProxy configuration file
$haproxy_cfg = file_get_contents("/share/etc/.default-haproxy.cfg");

foreach ($hosts as $i=>$ip) {
        $haproxy_cfg .= '
        server server'.$i.' '.$ip.':80 maxconn 250 check';
}
// test if the configs differ
$current_cfg = file_get_contents("/path/to/haproxy.cfg");
if ($current_cfg == $haproxy_cfg) {
        echo "everything is good, configs are the same.\n";
}
else {
        echo "file out of date, updating.\n";
        file_put_contents('/path/to/this/script/.latest-haproxy.cfg', $haproxy_cfg);
        system("cp /path/to/this/script/.latest-haproxy.cfg /path/to/haproxy.cfg");
        system("/etc/init.d/haproxy reload");
}
?>

I think this script is pretty self-explanatory, and it does what we outlined as our goals above. If you ran the script from the command line, your HAProxy config file would get updated and the service reloaded.
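
To make that concrete, with two application instances running, the tail of the generated www_farm backend might look like this (the IPs are made up):

backend www_farm
        mode http
        balance roundrobin
        server server0 10.212.5.17:80 maxconn 250 check
        server server1 10.198.33.4:80 maxconn 250 check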

Now that we have our working script, the last thing we need to do is set it up to run on cron. I find that updating every 2-5 minutes is sufficient to keep your config up to date for auto scaling.
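
For example (the script path and 3-minute interval are placeholders):

*/3 * * * * /usr/bin/php /path/to/this/script/update-haproxy.php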

One of the nice benefits of having this script is that it allows you to easily pre-scale your solution if you know you are going to receive a big traffic spike. All you have to do is launch as many new instances of your application image as you need, and the script will manage the HAProxy setup for you.

Conclusion

With under 100 lines of code and a few tools we were able to set up a script that keeps the HAProxy config file up to date with your running application instances. This lets us use HAProxy instead of Amazon's Elastic Load Balancing while still getting all the benefits of Auto Scaling. Lastly, note that this script is just an example and should be tailored to your own environment as you see fit. It is also best to store this script and the default HAProxy config on an EBS volume if possible, as that will save you from having to rebuild your instance.

9 comments » | Amazon Web Services, Auto Scaling, HAProxy, Scaling, Scripts
