Also on twitter ( twitter.com/nutrun )

Archive for July, 2008

Cache watch

Tuesday, July 29th, 2008

Web frameworks like Merb or Rails provide convenient ways to cache output to static files or other stores in order to improve a web application’s performance. Caching is typically handled inside controller classes. With merb-cache, for example, we can cache an entire page by doing something along the lines of:

class Foo < Merb::Controller
  cache_page :index
end

Expiring cached data is handled with a number of instance methods available to controllers, such as expire_page(key) or expire_all_pages. This implies that cache expiration needs to be put explicitly in place inside actions.

The most common event signifying the need for cache expiration is the modification of the underlying data which has at some point been cached. More often than not, this means some sort of write (insert, update, delete) storage operation, which in turn means that cache expiration belongs closer to the storage-aware parts of the application than to controllers. With this in mind, it would be useful to be able to configure cache expiration in a manner similar to that of cache creation, for example:

class Foo < Merb::Controller
  cache_page :index
  cache_watch :foo_store, :bar_store
end

The cache_watch :foo_store, :bar_store line signifies that any cached artifacts associated with this controller need to be expired whenever a data altering operation takes place in the context of the FooStore or BarStore classes.

Approaching data altering operations as events presents a good case for employing the Observer pattern in order to enable cache expiration when such events take place. ActiveRecord, for instance, offers means for adding hooks to persistent objects’ life cycle methods in the form of Observers.

class FooObserver < ActiveRecord::Observer
  def after_save(foo)
    expire_cache
  end
end
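
To see the mechanics without ActiveRecord, here is a minimal plain-Ruby sketch of the same idea. FooStore, PageCacheObserver and the temporary cache directory are all hypothetical stand-ins, not part of Merb or ActiveRecord; the store simply calls after_save on whatever observers have been registered with it.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical stand-in for a persistent model: it calls after_save on every
# registered observer once a write has gone through.
class FooStore
  @observers = []

  class << self
    attr_reader :observers
  end

  def save
    # ... the actual write to storage would happen here ...
    self.class.observers.each { |observer| observer.after_save(self) }
    true
  end
end

# Hypothetical observer that deletes one cached page when notified.
class PageCacheObserver
  def initialize(cached_file)
    @cached_file = cached_file
  end

  def after_save(_record)
    FileUtils.rm_f(@cached_file)
  end
end

cache_dir   = Dir.mktmpdir
cached_page = File.join(cache_dir, 'foo.html')
File.write(cached_page, '<html>cached</html>')

FooStore.observers << PageCacheObserver.new(cached_page)
FooStore.new.save

File.exist?(cached_page) # false: the write expired the cached page
```

The store knows nothing about caching; it only announces that a write happened, which is what keeps expiration out of the controller actions.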

Putting it all together, we can create a module that enables configuring cache expiration declaratively inside controllers in a way reminiscent of how cache creation is handled.

require 'set'
require 'fileutils'

module CacheInvalidator
  def cache_watch(controller, *models)
    models.each {|model| (@entries ||= Set.new) << Entry.new(controller, model)}
  end

  def activate!
    @entries.each do |entry|

      # Skip entries whose observer already exists; `return` would abort the whole loop.
      next if Kernel.const_defined?(entry.class_name)

      entry.log

      observer = Class.new(ActiveRecord::Observer) do
        include CacheInvalidator
        observe(entry.model)
        define_method(:entry) {entry}
      end

      Kernel.const_set(entry.class_name, observer)
      observer.instance
    end
  end

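  module_function :cache_watch
  module_function :activate!

  # Give including classes a class-level `cache_watch :model, ...` macro
  # which registers the including class as the controller.
  def self.included(base)
    def base.cache_watch(*models)
      CacheInvalidator.cache_watch(self, *models)
    end
  end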

  def after_save(model)
    destroy_cache
  end

  def after_destroy(model)
    destroy_cache
  end

  private

  def destroy_cache
    FileUtils.rm_f(entry.file_path) if File.file?(entry.file_path)
    FileUtils.rm_r(entry.dir_path) if File.directory?(entry.dir_path)
  end

  class Entry

    attr_reader :controller, :model

    def initialize(controller, model)
      @controller, @model = controller, model
    end

    def class_name
      (controller.name.gsub(/\:\:/, '') + model.to_s.camelize + "CacheObserver").intern
    end

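    def ==(other)
      controller == other.controller && model == other.model
    end
    alias eql? ==

    # Set membership relies on eql? and hash rather than ==.
    def hash
      [controller, model].hash
    end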

    def file_path
      "#{dir_path}.xml"
    end

    def dir_path
      "#{APP_ROOT}/public/cache/#{@controller.name.underscore}"
    end

    def log
      Merb.logger.info "Cache-watching #{model.to_s.camelize} for #{controller}"
    end
  end
end

By including the CacheInvalidator module we can declare cache invalidation rules inside controllers.

class FooController < Merb::Controller
  include CacheInvalidator
  cache_page :index
  cache_watch :FooStore, :BarStore
end

The cache observers can be activated where app initialization tasks are kept, such as init.rb in Merb.

Merb::BootLoader.after_app_loads do
  CacheInvalidator.activate!
end

Cacheable HTTP search query results

Tuesday, July 15th, 2008

I have worked on a number of web applications which required searching catalogs of data based on filtering criteria. The most common implementation I see involves issuing a GET request to a search service, providing the search criteria as part of the request’s query string.


http://example.com/search?category=music&subcategory=rock&page=7

This approach does not easily lend itself to static resource caching, one of the most effective ways to improve a web app’s performance. No matter how much the application code is optimized, how finely database queries are tuned, or whether something like memcached is added, a request that reaches the application server is unlikely to be served as efficiently as one handled entirely by a high-performance HTTP server like Nginx.

By treating search queries as RESTful HTTP resources uniquely identified by a URI, as opposed to RPC-style commands, we should be able to cache the results the first time a search request is processed.


http://example.com/search_results/someuniqueidentifier
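
As a rough sketch of what "cache on first request" could look like on the application side, the helper below writes the result body to a file under a public directory the first time an identifier is seen, so that later requests could be answered by the HTTP server straight from disk. The helper name, the directory layout and the .xml extension are assumptions made for illustration, and the sketch assumes the identifier is safe to use as a file name.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: returns the cached body for a search identifier,
# running the (expensive) search only when no cached file exists yet.
def fetch_search_results(public_root, identifier)
  path = File.join(public_root, 'search_results', "#{identifier}.xml")
  return File.read(path) if File.file?(path)

  body = yield(identifier)              # perform the actual search
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, body)                # later requests can be served statically
  body
end

public_root = Dir.mktmpdir              # stands in for the app's public directory
searches = 0
2.times do
  fetch_search_results(public_root, 'someuniqueidentifier') do |_id|
    searches += 1
    '<results/>'
  end
end
searches # 1: the second call was answered from the cached file
```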

The unique identifier part of the URI can take the form of a hash which, when deserialized, will provide the application with the filter criteria for the search. This assumes that the client and server share a common protocol, one which defines how the hash for the URI is constructed. For example, it is a good idea to agree on a fixed order for the set of criteria. While searches for {category : music, subcategory : rock} and {subcategory : rock, category : music} will produce the same results, using both combinations will cause the resource to be cached twice under two separate URIs, wasting cache space and forcing the search to be processed a second time.

A potential solution can involve Base64 encoding and decoding a string constructed using a predefined format and comprising the filter criteria.

CGI.unescape(identifier).unpack('m')[0] # => "music,rock,,,,7,30"
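
For completeness, here is a sketch of both directions of such a protocol. The field list and the helper names are hypothetical and much shorter than the format decoded above; the point is only that a fixed field order makes the identifier independent of the order in which the client supplies its criteria.

```ruby
require 'cgi'

# Hypothetical, fixed field order agreed between client and server.
SEARCH_FIELDS = [:category, :subcategory, :page].freeze

def encode_criteria(criteria)
  raw = SEARCH_FIELDS.map { |field| criteria[field].to_s }.join(',')
  CGI.escape([raw].pack('m0'))   # 'm0' is Base64 without line breaks
end

def decode_criteria(identifier)
  # A limit of -1 keeps trailing empty fields such as "music,,".
  values = CGI.unescape(identifier).unpack('m')[0].split(',', -1)
  SEARCH_FIELDS.zip(values).to_h
end

a = encode_criteria(category: 'music', subcategory: 'rock', page: 7)
b = encode_criteria(page: 7, subcategory: 'rock', category: 'music')
a == b              # true: key order in the input does not change the URI
decode_criteria(a)  # all values come back as strings, e.g. page is "7"
```

A comma-separated format like this one cannot carry values that themselves contain commas; a real protocol would need to escape or forbid them.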

This method will not be useful for websites with a plain HTML front end. It requires a client capable of dynamically constructing URIs based on filter criteria. JavaScript, ActionScript or generic web service consumer applications are all good candidates.

Testing web services with ActiveResource

Thursday, July 10th, 2008

ActiveResource can be a useful tool for abstracting away low level HTTP or data marshaling details when testing web services whose XML schema and URI patterns follow the Rails conventions for REST.

Here’s a possible implementation for use in tests that exercise a service from the outside, a sort of black box web service testing approach, if you like.

def resource(name)
  class_name = name.to_s.camelize
  return class_name.constantize if Object.const_defined?(class_name.intern)
  rsrc = Class.new(ActiveResource::Base) do
    self.site = "http://localhost:4001/api"
    self.element_name = name.to_s
  end
  Object.const_set(class_name.intern, rsrc)
end

Let’s imagine an API call to http://localhost:4001/api/categories.xml which returns a list of product categories with their respective subcategories. Following is a potential response to a GET request to the aforementioned URI.

<?xml version="1.0" encoding="UTF-8"?>
<categories type="array">
  <category>
    <id type="integer">3</id>
    <name>Music</name>
    <subcategories type="array">
      <subcategory type="Category">
        <id type="integer">4</id>
        <name>Rock</name>
      </subcategory>
      <subcategory type="Category">
        <id type="integer">5</id>
        <name>Metal</name>
      </subcategory>
    </subcategories>
  </category>
</categories>

Invoking resource :category in the test will provide a Category class. Category is a subclass of ActiveResource::Base which can be used to exercise the /categories end point of the API.
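
The class-on-demand part of the helper can be tried without Rails. In the sketch below, StubResource is a hypothetical stand-in for ActiveResource::Base, stub_resource mirrors the helper above, and the split/capitalize line is a simplified replacement for ActiveSupport's camelize. It shows that a second call returns the class defined by the first rather than creating a new one.

```ruby
# Hypothetical stand-in for ActiveResource::Base so the sketch runs without Rails.
class StubResource
  class << self
    attr_accessor :site, :element_name
  end
end

def stub_resource(name)
  # Simplified replacement for ActiveSupport's String#camelize.
  class_name = name.to_s.split('_').map(&:capitalize).join
  return Object.const_get(class_name) if Object.const_defined?(class_name)

  klass = Class.new(StubResource) do
    self.site = 'http://localhost:4001/api'
    self.element_name = name.to_s   # `name` is the method argument, captured by the block
  end
  Object.const_set(class_name, klass)
end

first  = stub_resource(:product_category)
second = stub_resource(:product_category)
first.equal?(second)   # true: the constant is only defined once
first.name             # "ProductCategory"
first.element_name     # "product_category"
```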

class ApiTest < Test::Unit::TestCase
  resource :category

  def test_categories
    categories = Category.find(:all)
    assert_equal(1, categories.size)
    assert_equal("Music", categories.first.name)
  end

  def test_subcategories
    subcategories = Category.find(:all).first.subcategories
    assert_equal(2, subcategories.size)
    assert_equal("Metal", subcategories[1].name)
  end

  def test_category_creation
    Category.create(:name => "Hacking")
    assert_equal(2, Category.find(:all).size)
  end
end