Using Rackspace Cloud Load Balancers to make Cloud Servers disposable and immutable

Depending on your needs, Rackspace’s implementation of Cloud Servers is either better – or worse – than Amazon EC2. I happen to see several advantages that Rackspace has over EC2:

  1. Cloud Files images are insanely easy to manage, and any server can be spun up from any snapshot. This is so much easier than trying to create a custom AMI that it’s not even worth discussing.
  2. The initial buy-in is incredibly low – a single small server will run you about $12 on Rackspace, while running ~$70 on EC2. Yes, I know that the hardware is basically apples-to-oranges, but we’re talking about initial buy-in here, not what $70 will get you. I can run a web server box and a PostgreSQL box separately for well under a single small instance on EC2.
  3. The support – oh the support. Zip, zilch, nada on EC2… while on Rackspace – a human in under 20 seconds. It’s glorious.

However, I grew to love the idea of throw-away servers with EC2. Alestic, Ubuntu, and other well-known institutions provide ready-to-go AMIs, and we spent a little bit of time writing code that would take a vanilla AMI, run a shell script, and turn it into whatever it needed to be in no time flat. We plugged this scripting into our CI server, and a successful build of our production branch of code would spin up new servers with new code, drop the old ones, and no one would be the wiser.

Since my new business is fledgling, I didn’t have the money to run multiple EC2 instances with RDS as another expense. Rackspace has always had a place in my heart since my days of running $10k/month in servers from one of my old jobs. While their cloud offerings are certainly less mature than Amazon’s, the price was right, and it was pretty easy to get going.

I knew there had to be a way to get the kind of flexibility I had with EC2 on Rackspace, even though their offering was built on the opposite philosophy – servers are supposed to be long-running, keep state, and stick around.

Enter Cloud Load Balancers

As soon as Rackspace opened their Cloud Load Balancer product to beta (and I discovered a Python API to it) I knew I had found what I needed. The idea was this – write a script against their API that would spin up new servers, drop them behind static load balancers, drop out the old servers, and no one would be the wiser. Rackspace’s load balancers even had a huge advantage over Amazon’s Elastic Load Balancers – I could point an A record at them.

There were, however, a few hurdles I hadn’t expected.

Amazon has this neat little feature where you can pass in data to a newly booting server. You simply have your rc.local file call a special IP address, download the data, and in our case, execute said data as a script. That would get you in the door and your code running.
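That bootstrap is roughly this (a sketch, not our exact script; 169.254.169.254 is EC2’s instance-metadata address, and the user-data path shown is the standard endpoint):

```shell
#!/bin/sh
# /etc/rc.local on EC2 (sketch): fetch the user-data passed at launch
# from the instance metadata service and execute it as a script.
curl -s http://169.254.169.254/latest/user-data -o /tmp/user-data.sh
chmod +x /tmp/user-data.sh
/tmp/user-data.sh
exit 0
```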

Rackspace was more difficult – much more. They allow you to pass in files to a newly booting server, but those files are then marked as read-only with no execute permissions. So passing in an rc.local file would mark that file as un-executable, and it wouldn’t run.

Hey cron, can you help me out?

The solution is an ugly one, but here it is. There are three files I pass in to a newly booting web server.

/etc/rc.local

/etc/rc.local.stage1

And the crux of the whole thing:

/etc/cron.d/initialrunner

Let’s take a look at these files. rc.local does as it should – it has some scripting that runs whenever the server reboots – in this case launching django-ztask, our async backgrounding app built on top of 0MQ (but that’s another story for another day).
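A minimal rc.local along those lines might look like the following (the project path and the exact django-ztask invocation are assumptions for illustration, not our real file):

```shell
#!/bin/sh
# /etc/rc.local (sketch): runs on every boot once the server is built.
# Start the django-ztask daemon in the background; the path is made up.
cd /var/www/ourproject && python manage.py ztaskd --noreload &
exit 0
```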

rc.local.stage1 is run once and only once, and is the script that sets up our whole server. I’ve only put the first few lines, but you can see what they do – update the server, install git, and as you can imagine, do a bunch of other things (load in our git public keys, install apache, side-load our apache conf files, etc.).
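The first few lines of stage1 were along these lines (a sketch assuming an Ubuntu base image; the package names are illustrative):

```shell
#!/bin/sh
# /etc/rc.local.stage1 (sketch): runs once, on the first reboot,
# and builds the server out from a vanilla image.
apt-get update
apt-get -y upgrade
apt-get -y install git-core
# ...followed by the rest: load our git public keys, install apache,
# side-load our apache conf files, check out and install the app, etc.
```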

But like I said – all three of these files will be marked as read-only, and neither script will be executable. So we toss a script into /etc/cron.d/initialrunner that does three things the first time you reboot the server:

  1. Echo “hello world” into a log file as an indication to me that it was run
  2. Change the permissions of /etc/rc.local and /etc/rc.local.stage1
  3. Execute /etc/rc.local.stage1 to set up the server, sending output to a log file
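The cron file itself can be a one-liner. This sketch uses cron’s @reboot entry and has the job remove itself afterwards so it only ever fires once – the self-removal is my assumption, since the post lists only the three steps above:

```shell
# /etc/cron.d/initialrunner (sketch)
@reboot root echo "hello world" >> /var/log/initialrunner.log; chmod +x /etc/rc.local /etc/rc.local.stage1; /etc/rc.local.stage1 >> /var/log/stage1.log 2>&1; rm -f /etc/cron.d/initialrunner
```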

Simple enough.

Putting it all together

The build script (written in Python) is somewhat long, but I’m going to put it here, and explain below what it’s doing. This is the script that my CI server runs after a successful unittest build of code committed to our production branch.

build.py

So what’s going on here?

The if __name__ == '__main__': section at the bottom of the script is what gets run when the script is called from a shell. This is what it does in order:

  1. It instantiates a cloudservers.CloudServers object (from the official python-cloudservers package), passing in our username and API key
  2. Calls create_server, passing in the name of the initial image to use and the size of the server.
  3. Calls wait_for_server which waits for the server to be done setting up. This takes the keyword argument with_url_ping that we’ll use later to not only wait for the server to respond, but acknowledge our site is up as well.
  4. Sleeps for 20 seconds (Rackspace seems to have a few places where it needs a couple seconds to catch up with itself)
  5. Calls the update method on the server instance, which has the happy side effect of REBOOTING the server. That – guess what – causes our /etc/cron.d/initialrunner script to be called, which starts building our server out!
  6. Wait for the server again, waiting for the ping to return true this time, which means that Apache is up, mods have been enabled, it’s configured properly, and serving our content.
  7. … At this point in time, our REAL script also uses paramiko (an SSH library) to run server migrations and run our unittest suite against the installed code, but I left that out.
  8. Creates an instance of a cloudlb.CloudLoadBalancer object (from the official python-cloudlb package), passing in our username, API key, and which of the data centers to use. (Hint: there are only two.)
  9. Calls replace_servers, which takes the new servers, appends them to the load balancer, waits 20 seconds, and removes the old servers. It then deletes the old servers from existence.
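Boiled down, the flow looks something like this. It’s a simplified outline rather than the real build.py: the cloudservers and cloudlb clients are passed in, and the method names follow the description above rather than the exact signatures of the official packages.

```python
import time

def deploy(cs, lb, image, flavor, old_servers, sleep=time.sleep):
    """Build a fresh server, swap it behind the load balancer, and
    retire the old ones (sketch of the CI flow described above)."""
    server = cs.create_server(image, flavor)        # steps 1-2: new server
    cs.wait_for_server(server)                      # step 3: wait for the build
    sleep(20)                                       # step 4: let Rackspace catch up
    server.update()                                 # step 5: reboot -> initialrunner
    cs.wait_for_server(server, with_url_ping=True)  # step 6: wait until the site is up
    lb.replace_servers([server], old_servers)       # steps 8-9: swap behind the LB
    return server
```

In the real script the second wait_for_server call only returns once an HTTP request to the new server succeeds, which is what guarantees Apache is configured and serving before any traffic moves.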

It may seem somewhat complicated, but it’s not much different from using boto to do the same thing on EC2. There’s also nothing stopping you from modifying this script slightly to create multiple servers each time. The replace_servers function already handles replacing all servers with all new servers (notice it takes our new server in an array, expecting to load as many new servers behind our load balancers as we ask it to).
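Internally, replace_servers amounts to something like this (a sketch against a hypothetical add_node/remove_node interface, not the actual python-cloudlb calls):

```python
import time

def replace_servers(lb, new_servers, settle=20, sleep=time.sleep):
    """Append the new servers to the load balancer, wait for the LB
    to pick them up, then remove and delete the old servers."""
    old = list(lb.nodes)
    for server in new_servers:
        lb.add_node(server)        # new servers start taking traffic
    sleep(settle)                  # give the balancer a moment to settle
    for node in old:
        lb.remove_node(node)       # drain the old server out...
        node.delete()              # ...and delete it from existence
```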

The whole script works using internal IP addresses – each Rackspace Cloud Server gets two IPs, one public and one private, and servers can address each other faster over the internal network.

Any downsides?

Yeah, a couple. Every now and again a server build will simply die. It seems to be a Rackspace issue I need to take up with support, but since the final wait_for_server call fails over by deleting the hung server without touching the existing servers, it’s not a big deal – the build script is simply re-run.

Also, Rackspace load balancers work in an odd fashion – each one can only handle a single port. They can share IP addresses (see the replace_servers function which, if it finds no existing load balancers, will create them for you), so your DNS doesn’t have to do anything fancy, but to run a website on ports 80 and 443 you have to pay for two load balancers. Still relatively cheap, but an annoying downside.
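So an HTTP-plus-HTTPS site ends up with two balancer objects sharing one address, roughly like this (lb_client here is a hypothetical wrapper, not the real python-cloudlb client; the shared virtual IP is what lets a single A record cover both ports):

```python
def create_site_balancers(lb_client, nodes):
    """Sketch: one Cloud Load Balancer per port, both on one virtual IP."""
    http = lb_client.create(name="site-http", port=80, nodes=nodes)
    # The 443 balancer reuses the first balancer's virtual IP instead
    # of allocating a second public address.
    https = lb_client.create(name="site-https", port=443, nodes=nodes,
                             virtual_ip=http.virtual_ip)
    return http, https
```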

Conclusion, maybe?

With a little love from the official API packages and Rackspace’s new Cloud Load Balancers, I was able to achieve the same immutable-server effect that I liked on EC2, for a fraction of the price. While many people like long-running servers and tools like Capistrano or Fabric for deployment, I prefer that servers build fresh each time, test themselves, then simply replace the outdated servers without the end user having a clue it’s happening. The beauty of this is that with a single-line code change, I can update which image new servers are built from, instantly taking advantage of an updated OS release.

This method isn’t for everyone, and it’s a lot more popular with EC2, but it’s possible with Rackspace, and I’d prefer to stick with Rackspace (and reap the benefits of their amazing support) and still get what I want. :)

(Image courtesy of clix on sxc.hu)

  • Dan Tagg

    Have you tried Fabric? It allows you to connect to your new server and run the necessary commands from your local machine. It works well with boto, so I imagine it’d work well with rackspace cloud too.

  • Pablo Chico de Guzman

    We were using SSH to solve the same problem, but I like the crontab approach much better. Also, we found an issue in this solution: sometimes (10–20% of the time) the crontab is executed before the files are injected. The chmod command fails and the crontab is never executed again.
    We have defined the crontab to be scheduled every minute. Then, we do:
    */1 * * * * root (chmod 700 script_path && rm crontab_path && script_path) 2>> crontab.err >> crontab.out

    And you will see in crontab.err that the first execution of the crontab sometimes fails because script_path has not been injected yet.

    Also, I would like to ask you about the same problem in windows machines. Any suggestion?

  • http://davemartorana.com Dave Martorana

    Hi Pablo!

    I have since moved from Rackspace to AWS – AWS was simply light-years ahead of Rackspace, and my budget increased.

    Since I left, Rackspace has moved to OpenStack. The likelihood is that OpenStack has better methods to build a machine from scratch on launch – this being especially important for automatic scaling. I don’t know much about OpenStack, so I can’t speak intelligently to it. (The same goes for Windows.)

    Sorry I can’t be more help!

    Dave

  • Andrew

    Interesting article, but can you tell me what part of the above server configuration makes the server “immutable”?

  • http://davemartorana.com Dave Martorana

    It’s a philosophy, as opposed to actually locking the server in to pure immutability. In effect, if you want to make a change to the server, you spin up a new one and blow the old one away.

    So build once, to be thrown away when something changes.