HTTP accelerator cache purging
The use of an HTTP accelerator such as Varnish or Squid in reverse proxy/accelerator mode can drastically improve a web application's content delivery capabilities. Successfully implementing caching comes with numerous challenges, but the fundamental goal is straightforward: a stack's dynamic content-generating layer should ideally not have to generate the same content more than once.
require "rubygems"
require "sinatra"

def guitars
  @@guitars ||= ['Les Paul', 'SG']
end

get "/guitars" do
  guitars * ', '
end
This application exposes a
/guitars
resource, a request for which will always hit the application server if no caching is in place. This can prove suboptimal on a high-traffic website, especially if generating the content is resource intensive. Luckily, this problem has been solved before. A running instance of Varnish, for example, requires only the following configuration to enable caching of all resources the application serves.
backend default {
  .host = "127.0.0.1";
  .port = "4567";
}
One of the challenges associated with caching has to do with the cached content's freshness. We want to relieve server stress as much as possible, but we also need our application's consumers to receive correct data at all times. Let's assume that the application contacts guitar manufacturers' websites once a day to refresh its inventory and we have scheduled this operation to complete at 16:00 every day. This suggests that the cached resource should be refreshed every day at four o'clock in the afternoon to reflect the latest list of available guitar models. One of the ways of achieving this in HTTP is by making use of the
Expires
header, whose semantics are understood by (hopefully) any caching-aware HTTP component.
require "time"

get "/guitars" do
  headers "Expires" => Time.parse("16:00").httpdate
  guitars * ', '
end
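One caveat worth noting: Time.parse("16:00") always resolves to four o'clock of the current day, so a request served after 16:00 would receive an Expires date in the past, effectively bypassing the cache until midnight. A minimal sketch of a workaround — the next_expiry helper is illustrative, not part of the example application — rolls the expiry over to the following day:

```ruby
require "time"

# Hypothetical helper: compute the next occurrence of the given hour.
# If today's 16:00 has already passed, the expiry moves to tomorrow.
def next_expiry(hour = 16, now = Time.now)
  expiry = Time.new(now.year, now.month, now.day, hour)
  expiry += 24 * 60 * 60 if expiry <= now
  expiry
end

# The handler could then send:
#   headers "Expires" => next_expiry.httpdate
```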
Things aren't always this straightforward. In many cases we cannot fully control the exact time or frequency at which a resource's content changes. The example application also comes with an admin interface, allowing the guitar list administrators to manually enter new guitar models.
post "/guitars" do
  guitars << params["guitar"]
  redirect("/guitars")
end
It is clear that a means for arbitrary expiration of cached content needs to be available in order to maintain content freshness. With Varnish, this capability comes in two flavors, one of which involves the use of a
PURGE
HTTP call. The following configuration enables this functionality.
acl purge {
  "localhost";
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    lookup;
  }
}

sub vcl_hit {
  if (req.request == "PURGE") {
    set obj.ttl = 0s;
    error 200 "Purged.";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    error 404 "Not in cache.";
  }
}
To natively make use of this in Ruby, we need to extend the
Net::HTTP
library to support the
PURGE
method.
require "net/http"
require "uri"

module Net
  class HTTP
    class Purge < HTTPRequest
      METHOD = "PURGE"
      REQUEST_HAS_BODY = false
      RESPONSE_HAS_BODY = false
    end

    def purge(path, initheader = nil)
      request(Purge.new(path, initheader))
    end
  end
end

def purge_cache(u)
  uri = URI.parse(u)
  query = "?#{uri.query}" if uri.query
  Net::HTTP.new(uri.host, uri.port).start { |h| h.purge("#{uri.path}#{query}") }
end
Now we can expire the cached
/guitars
resource every time the list is amended.
post "/guitars" do
  guitars << params["guitar"]
  purge_cache("http://localhost/guitars")
  redirect("/guitars")
end
Although this method is effective, there are cases where the bidirectional coupling between the application and caching layers is undesirable. With the fundamental functional pieces in place, however, it is not hard to implement a more elaborate strategy, such as the one described in Cache Channels, in order to reduce the application layer's knowledge of the caching infrastructure.
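As a rough illustration of that direction — all names here are hypothetical, and this is only a sketch of the decoupling idea, not the Cache Channels protocol itself — the application could publish change events and let a dedicated subscriber, the only component aware of Varnish, translate them into PURGE calls:

```ruby
# Hypothetical sketch: request handlers announce content changes, and a
# cache-aware subscriber reacts to them, so the handlers themselves never
# reference the caching layer directly.
class ChangeNotifier
  def initialize
    @subscribers = []
  end

  # Register a block to be called with the path of each changed resource.
  def subscribe(&block)
    @subscribers << block
  end

  # Announce that the resource at `path` has changed.
  def publish(path)
    @subscribers.each { |subscriber| subscriber.call(path) }
  end
end

notifier = ChangeNotifier.new
purged = []

# In a real setup this subscriber would call something like
# purge_cache("http://localhost#{path}"); here it merely records the event:
notifier.subscribe { |path| purged << path }

notifier.publish("/guitars")
purged # => ["/guitars"]
```

The POST handler would then call notifier.publish("/guitars") instead of purge_cache, leaving knowledge of the cache's address to the subscriber alone.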