Incremental deployment

I’ve recently had a chance to look at a high-availability system designed and built by Forward colleagues Andy Kent and Paul Ingles. It is a critical web service where any failure would have a very high impact. Essentially, it must stay up at all times.

The service is hosted on Amazon EC2. It makes use of EC2’s geographically distributed regions and the different availability zones within each region, fronted by AWS Elastic Load Balancing, with additional global DNS failover outside of EC2/AWS.

[Figure: high availability architecture]

A part of the project that struck me as particularly interesting is the deployment strategy Paul and Andy settled on. Regardless of how much trust we have in our builds and QA process, deployment becomes a different, much more stressful activity when critical systems like this one are involved. Andy mentioned that it is important to find the balance between what to automate and what should require manual input.

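# deploy.rb

# One Capistrano task per availability zone. Running a zone task before
# `deploy` (e.g. `cap us_1b deploy`) targets that zone only.
task :us_1b do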
  set :region, 'us-east-1'
  set :servers, us_1b
  # More US 1b specific setup...
end

task :eu_1a do
  set :region, 'eu-west-1'
  set :servers, eu_1a
  # More EU 1a specific setup...
end

This service is deployed incrementally, one availability zone at a time, e.g. cap us_1b deploy. Each deployment step is manual: someone has to push the button. If something goes wrong, only part of the system is affected, because the other zones keep running the previous release. Even if a failure is severe enough to bring the newly deployed servers down, only one availability zone in one region fails, and the load balancers make sure the failure is transparent to end users and does not take down the system as a whole.
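
The one-zone-at-a-time rollout with a manual gate can be sketched in plain Ruby. This is only an illustration of the idea, not Paul and Andy's tooling: the class name IncrementalRollout, its Result struct and the deploy and confirm callables are all hypothetical. The sketch assumes the zone names match the Capistrano tasks above and that a deployment reports success or failure as a boolean.

```ruby
# Hypothetical sketch: none of these names come from the original deploy.rb.
class IncrementalRollout
  # deployed - zones that were deployed successfully
  # failed   - the zone whose deployment failed, or nil
  # skipped  - zones that were never attempted
  Result = Struct.new(:deployed, :failed, :skipped)

  # zones   - ordered list of zone task names, e.g. %w[us_1b eu_1a]
  # deploy  - callable taking a zone name, truthy on success
  # confirm - callable taking a zone name, truthy if a human approves the step
  def initialize(zones, deploy:, confirm:)
    @zones = zones
    @deploy = deploy
    @confirm = confirm
  end

  def run
    deployed = []
    @zones.each_with_index do |zone, i|
      # Manual gate: nothing happens to a zone until someone approves it.
      return Result.new(deployed, nil, @zones[i..-1]) unless @confirm.call(zone)

      # Stop at the first failure so the remaining zones keep the old release.
      return Result.new(deployed, zone, @zones[(i + 1)..-1]) unless @deploy.call(zone)

      deployed << zone
    end
    Result.new(deployed, nil, [])
  end
end

# Example wiring (assumed, left commented out because it prompts and shells out):
#
# rollout = IncrementalRollout.new(
#   %w[us_1b eu_1a],
#   deploy:  ->(zone) { system('cap', zone, 'deploy') },
#   confirm: ->(zone) { print "Deploy #{zone}? [y/N] "; $stdin.gets.to_s.strip.downcase == 'y' }
# )
# result = rollout.run
```

Stopping at the first failed or declined zone is the point of the design: every zone that has not been touched keeps serving traffic on the previous release while someone investigates.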

5 Responses to “Incremental deployment”

  1. Dmytro Shteflyuk Says:

    Why not use capistrano multistage?

  2. Tweets that mention Incremental deployment « nutrun -- Topsy.com Says:

    [...] This post was mentioned on Twitter by Peter Waldschmidt, George Malamidis. George Malamidis said: Me on @pingles's and @andykent's critical, high availability system deployment strategy http://bit.ly/8ZJxIz [...]

  3. volcane Says:

    Here’s something similar I came up with recently:

    http://www.devco.net/archives/2009/11/06/test_driven_deployment_-_mcollective_puppet_cucumber.php

  4. Andrew Johnstone Says:

    Hi,

    I’m curious to know how you handle high availability, in particular with your DNS and non AWS failover. In the event of complete failure at our current provider we can reconfigure all of our services within one hour onto EC2. However the biggest issue for us is redirecting DNS or redirecting traffic to the new servers. The clients that use our services are large corporates that cache DNS.

    Cheers

    Andy

  5. George Malamidis Says:

    Hi Andy,

    We use a third party DNS service that load balances between AWS regions and between the AWS / “non-AWS” servers. All requests go to one domain that gets delegated internally, at DNS level, so clients only have to deal with one DNS entry that never changes.
