Archive for November, 2008

Rack cache headers

Saturday, November 8th, 2008

Rack is an interface between web servers and Ruby web frameworks. HTTP, among other things, defines how caches must behave in terms of header fields that control caching. This article demonstrates a possible implementation of a piece of Rack middleware that lets web application developers configure an application’s cache-related response headers in an unobtrusive, centralized manner.

Rack supports the notion of middleware: pieces of code that sit between the web server and the application and wrap the HTTP request and response life cycle. Rack::Lint, for example, validates an application’s requests and responses according to the Rack specification.

Rack::Handler::Mongrel.run(
  Rack::Lint.new(app), :Host => "0.0.0.0", :Port => 9999
)
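To make the middleware idea concrete, here is a minimal sketch of one. The PoweredBy class and its header value are hypothetical names used only for illustration; the sketch relies on nothing but the Rack calling convention (an object responding to call(env) and returning a status, headers, body triplet), so it runs without a web server.

```ruby
# Hypothetical middleware (not part of Rack): adds one header to every response.
class PoweredBy
  def initialize(app, value = "rack-example")
    @app = app
    @value = value
  end

  def call(env)
    # Delegate to the wrapped application, then decorate its response.
    status, headers, body = @app.call(env)
    [status, headers.merge("X-Powered-By" => @value), body]
  end
end

app = proc { |env| [200, {"Content-Type" => "text/plain"}, ["hello"]] }

# Middleware can be exercised directly by passing a minimal env hash.
status, headers, body = PoweredBy.new(app).call("PATH_INFO" => "/")
```

Because a middleware is just another object that responds to call, layers like this can be nested in any order around the application.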

Similarly, if we were to implement a cache header producing layer on top of Rack we’d end up with a construct similar to the following.

Rack::Handler::Mongrel.run(
  Rack::Lint.new(
    Rack::CacheHeaders.new(app)
  ), :Host => "0.0.0.0", :Port => 9999
)

Here’s a possible way of configuring how an application provides HTTP caching headers based on URL path patterns.

Rack::CacheHeaders.configure do |cache|
  cache.max_age("/rock", 3600)
  cache.expires("/metal", "16:00")
end

Following is a potential implementation for the above.

require "time" # provides Time.parse and Time#httpdate

module Rack
  class CacheHeaders
    def initialize(app)
      @app = app
    end

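    def call(env)
      status, headers, body = @app.call(env)
      # Only decorate responses for paths that have a configured rule.
      rule = Configuration[env['PATH_INFO']]
      if rule
        header = rule.to_header
        headers[header.key] = header.value
      end
      [status, headers, body]
    end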

    def self.configure(&block)
      yield Configuration
    end

    class Configuration
      def self.max_age(path, duration)
        paths[path] = MaxAge.new(duration)
      end

      def self.expires(path, date)
        paths[path] = Expires.new(date)
      end

      def self.[](key)
        paths[key]
      end

      def self.paths
        @paths ||= {}
      end
    end

    class MaxAge
      def initialize(duration)
        @duration = duration
      end

      def to_header
        Header.new("Cache-Control", "max-age=#{@duration}, must-revalidate")
      end
    end

    class Expires
      def initialize(date)
        @date = date
      end

      def to_header
        Header.new("Expires", Time.parse(@date).httpdate)
      end
    end

    class Header < Struct.new(:key, :value);end
  end
end
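The Configuration class above looks paths up by exact match, so /rock/classic would not pick up the rule registered for /rock. If pattern-style matching is wanted, one possible approach is sketched below. PatternRules and its methods are hypothetical names, and the "first matching rule wins" policy is an assumption rather than something the implementation above prescribes.

```ruby
# Hypothetical registry: the first rule whose pattern matches the path wins.
class PatternRules
  def initialize
    @rules = []
  end

  # pattern is either a String (treated as a path prefix) or a Regexp.
  def add(pattern, header)
    @rules << [pattern, header]
    self
  end

  # Returns the header registered for the first matching rule, or nil.
  def lookup(path)
    rule = @rules.find { |pattern, _header| matches?(pattern, path) }
    rule && rule.last
  end

  private

  def matches?(pattern, path)
    case pattern
    when Regexp then pattern.match?(path)
    # A String matches the path itself or anything below it.
    else path == pattern || path.start_with?("#{pattern}/")
    end
  end
end

rules = PatternRules.new
rules.add("/rock", ["Cache-Control", "max-age=3600"])
rules.add(%r{\A/metal/\d+\z}, ["Cache-Control", "max-age=60"])

rules.lookup("/rock/classic") # matched by the "/rock" prefix rule
rules.lookup("/rockabilly")   # no rule applies, so this is nil
```

Matching on a trailing slash boundary keeps /rockabilly from accidentally inheriting the /rock rule.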

The code below is a minimal Rack based application.

require "rubygems"
require "rack"

# The Rack specification requires the body to respond to #each, so wrap it in an array.
app = proc {|env| [200, {"Content-Type" => "text/plain"}, ["hello"]]}

Rack::Handler::Mongrel.run(
  Rack::Lint.new(
    Rack::CacheHeaders.new(app)
  ), :Host => "0.0.0.0", :Port => 9999
)

To observe the caching headers that decorate the application’s responses, we can use curl or something similar, e.g. curl -I http://0.0.0.0:9999/rock or curl -I http://0.0.0.0:9999/metal. The output should look something like the following.

air:~ gmalamid$ curl -I http://0.0.0.0:9999/rock
HTTP/1.1 200 OK
Connection: close
Date: Sat, 08 Nov 2008 00:51:23 GMT
Cache-Control: max-age=3600, must-revalidate
Content-Type: text/plain
Content-Length: 5

air:~ gmalamid$ curl -I http://0.0.0.0:9999/metal
HTTP/1.1 200 OK
Connection: close
Date: Sat, 08 Nov 2008 00:51:16 GMT
Content-Type: text/plain
Expires: Sat, 08 Nov 2008 16:00:00 GMT
Content-Length: 5

Understanding and employing HTTP cache configuration not only lets an application harness the power of tools like Varnish or Squid, it also makes the application a good citizen in a diverse ecosystem of HTTP-aware browsers and caches outside its knowledge or control.

HTTP accelerator cache purging

Sunday, November 2nd, 2008

The use of an HTTP accelerator such as Varnish or Squid in reverse proxy/accelerator mode can drastically improve a web application’s content delivery capabilities. Successfully implementing caching comes with numerous challenges but the fundamental goal is straightforward: A stack’s dynamic content generating layer should ideally not have to generate the same content more than once.

require "rubygems"
require "sinatra"

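# Shared in-memory list; a constant avoids class variable access from the top level,
# which raises an error on recent Ruby versions.
GUITARS = ['Les Paul', 'SG']

def guitars
  GUITARS
end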

get "/guitars" do
  guitars * ', '
end

This application exposes a /guitars resource. Without caching in place, every request for it hits the application server. On a high-traffic website this can prove suboptimal, especially if generating the content is resource intensive. Luckily, this problem has been solved before. A running instance of Varnish, for example, requires only the following configuration to enable caching of all resources the application serves.

backend default {
  .host = "127.0.0.1";
  .port = "4567";
}

One of the challenges associated with caching has to do with the cached content’s freshness. We want to relieve server stress as much as possible, but we also need our application’s consumers to receive correct data at all times. Let’s assume that the application contacts guitar manufacturers’ websites once a day to refresh its inventory and we have scheduled this operation to complete at 16:00 every day. This suggests that the cached resource should be refreshed every day at four o’clock in the afternoon to reflect the latest list of available guitar models. One of the ways of achieving this in HTTP is by making use of the Expires header, whose semantics are understood by (hopefully) any caching aware HTTP component.

require "time"

get "/guitars" do
  headers "Expires" => Time.parse("16:00").httpdate
  guitars * ', '
end
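One caveat with the snippet above: Time.parse("16:00") resolves to 16:00 on the current day, so any response generated after that time carries an Expires date that is already in the past and is treated as stale immediately. A possible way around this is to compute the next occurrence of the daily time. The sketch below assumes a hypothetical helper named next_daily; it is not part of Sinatra or the standard library.

```ruby
require "time"

# Hypothetical helper: returns the next occurrence of a daily "HH:MM" time
# relative to `now`, rolling over to tomorrow once today's slot has passed.
def next_daily(time_of_day, now = Time.now)
  # The second argument to Time.parse supplies the missing date components.
  candidate = Time.parse(time_of_day, now)
  candidate = Time.parse(time_of_day, now + 24 * 60 * 60) if candidate <= now
  candidate
end

# Example: build the header value the route could send.
expires_value = next_daily("16:00").httpdate
```

Inside the route this would read headers "Expires" => next_daily("16:00").httpdate, leaving the rest of the handler unchanged.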

Things aren’t always as straightforward. In many cases we cannot fully control the exact time or frequency a resource’s content changes. The example application also comes with an admin interface, allowing the guitar list administrators to manually enter new guitar models.

post "/guitars" do
  guitars << params["guitar"]
  redirect("/guitars")
end

It is clear that a means for arbitrary expiration of cached content needs to be available in order to maintain content freshness. With Varnish, this capability comes in two flavors, one of which involves the use of a PURGE HTTP call. The following configuration enables this functionality.

acl purge {
  "localhost";
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    lookup;
  }
}

sub vcl_hit {
  if (req.request == "PURGE") {
    set obj.ttl = 0s;
    error 200 "Purged.";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    error 404 "Not in cache.";
  }
}

To natively make use of this in Ruby, we need to extend the Net::HTTP library to support the PURGE method.

require "net/http"
require "uri"

module Net
  class HTTP
    class Purge < HTTPRequest
      METHOD = "PURGE"
      REQUEST_HAS_BODY = false
      RESPONSE_HAS_BODY = false
    end

    def purge(path, initheader=nil)
      request(Purge.new(path, initheader))
    end
  end
end

def purge_cache(u)
  uri = URI.parse(u)
  # request_uri is the path plus any query string, and "/" when the path is empty.
  Net::HTTP.new(uri.host, uri.port).start {|h| h.purge(uri.request_uri)}
end
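The VCL above answers a PURGE with 200, 404 or 405, and the caller may want to tell these apart. The following sketch shows one way to do that. The helper names (purge_outcome, build_purge_request, purge) and the returned symbols are hypothetical; the request is built with Net::HTTPGenericRequest from the standard library, an alternative to reopening Net::HTTP as shown above.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: maps the status codes used by the VCL above
# to symbols the application can act on.
def purge_outcome(code)
  case code.to_i
  when 200 then :purged
  when 404 then :not_cached
  when 405 then :not_allowed
  else :unexpected
  end
end

# Builds a PURGE request; the two false flags mean no request body
# is sent and no response body is read.
def build_purge_request(url)
  uri = URI.parse(url)
  Net::HTTPGenericRequest.new("PURGE", false, false, uri.request_uri)
end

# Sends the PURGE to the cache and reports what happened.
def purge(url)
  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port) do |http|
    http.request(build_purge_request(url))
  end
  purge_outcome(response.code)
end
```

A caller could then log or ignore :not_cached, while treating :not_allowed as a configuration problem in the purge ACL.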

Now we can expire the cached /guitars resource every time the list is amended.

post "/guitars" do
  guitars << params["guitar"]
  purge_cache("http://localhost/guitars")
  redirect("/guitars")
end

Although this method is effective, there can be cases where the bidirectional coupling between the application and caching layers might be undesirable. With the fundamental functional pieces in place, however, it is not hard to implement a more elaborate strategy such as the one described in Cache Channels in order to reduce the application layer’s knowledge of the caching infrastructure.