Nov 02 2008

HTTP accelerator cache purging

Using an HTTP accelerator such as Varnish, or Squid in reverse proxy/accelerator mode, can drastically improve a web application's content delivery capabilities. Implementing caching successfully comes with numerous challenges, but the fundamental goal is straightforward: a stack's dynamic content-generating layer should ideally not have to generate the same content more than once. Consider the following small Sinatra application.

require "rubygems"
require "sinatra"

# Shared in-memory list (top-level @@variables raise on Ruby 3.0+).
GUITARS = ['Les Paul', 'SG']

def guitars
  GUITARS
end

get "/guitars" do
  guitars * ', '
end

This application exposes a /guitars resource. With no caching in place, every request for it hits the application server. On a high-traffic website this is suboptimal, especially if generating the content is resource intensive. Luckily, this problem has been solved before. A running instance of Varnish, for example, needs only the following configuration to cache all resources the application serves.

backend default {
  .host = "127.0.0.1";
  .port = "4567";
}
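To make the "generate once" goal concrete, here is a minimal Ruby sketch of what an accelerator does conceptually. It is a toy model, not how Varnish works internally; the ToyAccelerator class and its method names are hypothetical and exist only for this illustration.

```ruby
# A toy stand-in for an HTTP accelerator: it remembers each response by path
# and only calls the backend when the path has not been seen before.
class ToyAccelerator
  attr_reader :backend_calls

  def initialize(&backend)
    @backend = backend   # block that generates the content for a path
    @store = {}          # path => cached body
    @backend_calls = 0
  end

  def get(path)
    @store.fetch(path) do
      # Cache miss: ask the backend once and keep the result.
      @backend_calls += 1
      @store[path] = @backend.call(path)
    end
  end
end

cache = ToyAccelerator.new { |_path| "Les Paul, SG" }
3.times { cache.get("/guitars") }
puts cache.backend_calls   # prints 1
```

Three requests reach the cache, but the backend block runs only for the first one.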

One of the challenges associated with caching is the freshness of cached content. We want to relieve server stress as much as possible, but we also need our application's consumers to receive correct data at all times. Let's assume that the application contacts guitar manufacturers' websites once a day to refresh its inventory, and that this operation is scheduled to complete at 16:00 every day. The cached resource should therefore be refreshed every day at four o'clock in the afternoon to reflect the latest list of available guitar models. One way to achieve this in HTTP is the Expires header, whose semantics are understood by (hopefully) any caching-aware HTTP component.

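require "time"

# Next 16:00 local time; Time.parse("16:00") alone is already past after 4pm.
def next_refresh
  t = Time.parse("16:00")
  t > Time.now ? t : t + 24 * 60 * 60
end

get "/guitars" do
  headers "Expires" => next_refresh.httpdate
  guitars * ', '
end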
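How a cache treats the Expires value can be sketched in a few lines of Ruby. This is a simplified illustration, assuming the cache only compares the header with the current time; real caches also consider Cache-Control, Date and other headers. The fresh? helper is hypothetical.

```ruby
require "time"

# Returns true while a response carrying this Expires value may still be
# served from the cache. A date in the past means the response is stale.
def fresh?(expires_header, now = Time.now)
  Time.httpdate(expires_header) > now
rescue ArgumentError
  false   # an unparsable value (such as "0") is treated as already expired
end

expires = "Sun, 02 Nov 2008 16:00:00 GMT"
puts fresh?(expires, Time.utc(2008, 11, 2, 12, 0, 0))   # prints true
puts fresh?(expires, Time.utc(2008, 11, 2, 17, 0, 0))   # prints false
```

The second call shows why the Expires value must point at a future moment: once it lies in the past, the response is considered stale and every request goes back to the application.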

Things aren't always this straightforward. In many cases we cannot fully control when or how often a resource's content changes. The example application also comes with an admin interface that allows the guitar list administrators to enter new guitar models manually.

post "/guitars" do
  guitars << params["guitar"]
  redirect("/guitars")
end

It is clear that a means for arbitrary expiration of cached content needs to be available in order to maintain content freshness. With Varnish, this capability comes in two flavors, one of which involves the use of a PURGE HTTP call. The following configuration enables this functionality.

acl purge {
  "localhost";
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    lookup;
  }
}

sub vcl_hit {
  if (req.request == "PURGE") {
    set obj.ttl = 0s;
    error 200 "Purged.";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    error 404 "Not in cache.";
  }
}
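The decision logic of the three VCL subroutines above can be modeled in Ruby as follows. This is only a sketch of the observable behavior (405 for clients outside the ACL, 200 on a hit, 404 on a miss), not of Varnish's internals; the PurgeableCache class is hypothetical.

```ruby
# Toy model of the PURGE handling above; not how Varnish is implemented.
class PurgeableCache
  def initialize(allowed_ips)
    @allowed_ips = allowed_ips   # plays the role of the "purge" ACL
    @store = {}                  # path => cached body
  end

  def put(path, body)
    @store[path] = body
  end

  def cached?(path)
    @store.key?(path)
  end

  # Returns [status, message], mirroring vcl_recv, vcl_hit and vcl_miss.
  def purge(path, client_ip)
    return [405, "Not allowed."] unless @allowed_ips.include?(client_ip)
    return [404, "Not in cache."] unless @store.key?(path)

    @store.delete(path)          # corresponds to setting obj.ttl to 0s
    [200, "Purged."]
  end
end

cache = PurgeableCache.new(["127.0.0.1"])
cache.put("/guitars", "Les Paul, SG")
p cache.purge("/guitars", "10.0.0.5")    # prints [405, "Not allowed."]
p cache.purge("/guitars", "127.0.0.1")   # prints [200, "Purged."]
p cache.purge("/guitars", "127.0.0.1")   # prints [404, "Not in cache."]
```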

To use this from Ruby, we can extend the Net::HTTP library with a request class for the PURGE method.

require "net/http"
require "uri"

module Net
  class HTTP
    class Purge < HTTPRequest
      METHOD = "PURGE"
      REQUEST_HAS_BODY = false
      RESPONSE_HAS_BODY = false
    end

    def purge(path, initheader=nil)
      request(Purge.new(path, initheader))
    end
  end
end

def purge_cache(u)
  uri = URI.parse(u)
  query = "?#{uri.query}" if uri.query
  Net::HTTP.new(uri.host, uri.port).start {|h| h.purge("#{uri.path}#{query}")}
end

Now we can expire the cached /guitars resource every time the list is amended.

post "/guitars" do
  guitars << params["guitar"]
  purge_cache("http://localhost/guitars")
  redirect("/guitars")
end
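The handler above discards the response to the PURGE call, so a rejected or missed purge goes unnoticed. The following sketch shows one possible way to interpret the status codes returned by the VCL shown earlier. It uses Net::HTTP#send_request from the standard library, which accepts an arbitrary method name, as an alternative to defining a request class. The names purge_url, purge_outcome and PURGE_OUTCOMES are hypothetical, and the sketch assumes Varnish answers with exactly the 200, 404 and 405 codes configured above.

```ruby
require "net/http"
require "uri"

# Status codes used by the VCL configuration, mapped to symbols.
PURGE_OUTCOMES = {
  "200" => :purged,
  "404" => :not_cached,
  "405" => :not_allowed
}.freeze

# Translates a response code into an outcome symbol.
def purge_outcome(code)
  PURGE_OUTCOMES.fetch(code.to_s, :unexpected)
end

# Sends a PURGE request for the given URL and returns the outcome symbol.
def purge_url(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    # request_uri is the path plus the query string, if any.
    http.send_request("PURGE", uri.request_uri)
  end
  purge_outcome(response.code)
end
```

An application could then log or retry when purge_url("http://localhost/guitars") returns anything other than :purged or :not_cached.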

Although this method is effective, there can be cases where the bidirectional coupling between the application and caching layers might be undesirable. With the fundamental functional pieces in place, however, it is not hard to implement a more elaborate strategy such as the one described in Cache Channels in order to reduce the application layer's knowledge of the caching infrastructure.