Jul 15 2008

Cacheable HTTP search query results

I have worked on a number of web applications which required searching catalogs of data based on filtering criteria. The most common implementation I see involves issuing a GET request to a search service, providing the search criteria as part of the request's query string.

http://example.com/search?category=music&subcategory=rock&page=7

This approach does not lend itself easily to static resource caching, one of the most effective ways to improve a web application's performance, since a cached static file is typically looked up by its URI path alone, with the query string ignored. No matter how much the application code is optimized or the database queries are tuned, and even with something like memcached added, a request that reaches the application server is unlikely to be served as efficiently as one handled directly by a high-performance HTTP server such as Nginx.

By treating search queries as RESTful HTTP resources, each uniquely identified by a URI, rather than as RPC-style commands, we should be able to cache the results the first time a given search is processed and serve every later request for the same URI from that cache.

http://example.com/search_results/someuniqueidentifier
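A minimal Ruby sketch of "cache the results the first time", similar in spirit to Rails' page caching: the application writes the rendered results to a file whose path mirrors the URI, so later requests can be answered from disk. The helper name cached_search_results, the cache_root directory and the identifier whitelist are assumptions for illustration. Configuring the HTTP server to serve that directory is not shown, and how it maps a percent-encoded URI to a file name would need checking against the server in use.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: returns the results body for /search_results/<identifier>,
# running the search (the block) only when no cached copy exists yet.
def cached_search_results(cache_root, identifier)
  # Accept only URI-safe tokens, so an identifier such as "../x" cannot
  # write outside cache_root.
  raise ArgumentError, "bad identifier: #{identifier}" unless identifier.match?(/\A[A-Za-z0-9%_-]+\z/)

  path = File.join(cache_root, 'search_results', identifier)
  return File.read(path) if File.exist?(path)

  body = yield(identifier) # the expensive part: run the query, render the results
  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, body)
  body
end

Dir.mktmpdir do |root|
  token = 'bXVzaWMscm9jaw%3D%3D' # percent-encoded Base64 of "music,rock"
  searches = 0
  2.times { cached_search_results(root, token) { searches += 1; 'results' } }
  puts searches # prints 1: the second call was served from the file
end
```

Once the file exists, the application code is no longer involved for that URI, which is the property the query-string approach lacks.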

The unique identifier part of the URI can take the form of an encoded string which, when decoded, provides the application with the filter criteria for the search. This assumes that the client and server share a common protocol defining how the identifier is constructed. In particular, the protocol should fix the order of the criteria. Searches for {category : music, subcategory : rock} and {subcategory : rock, category : music} produce the same results, but if clients are free to use both orderings, the same results will be cached twice under two separate URIs, and each ordering will pay for its own cache miss.
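One way to enforce that ordering is to serialize the criteria against a field list shared by client and server. The Ruby sketch below assumes a hypothetical seven-field protocol modelled on the "music,rock,,,,7,30" string used in this post: SEARCH_FIELDS, serialize_criteria and deserialize_criteria are illustrative names, the three middle fields (artist, year, format) are placeholders, and reading 7 and 30 as page and page size is a guess.

```ruby
# Hypothetical field order shared by client and server. Only category and
# subcategory come from the post; the rest are placeholders.
SEARCH_FIELDS = %i[category subcategory artist year format page per_page].freeze

# Produces the same string regardless of the hash's key order.
def serialize_criteria(criteria)
  unknown = criteria.keys - SEARCH_FIELDS
  raise ArgumentError, "unknown criteria: #{unknown.join(', ')}" unless unknown.empty?

  SEARCH_FIELDS.map { |field|
    value = criteria.fetch(field, '').to_s
    # A comma inside a value would shift every later field.
    raise ArgumentError, "comma in #{field}" if value.include?(',')
    value
  }.join(',')
end

# Turns the positional string back into a hash, dropping empty fields.
def deserialize_criteria(serialized)
  values = serialized.split(',', -1) # -1 keeps trailing empty fields
  raise ArgumentError, 'wrong field count' unless values.size == SEARCH_FIELDS.size

  SEARCH_FIELDS.zip(values).reject { |_, value| value.empty? }.to_h
end
```

Because the output depends only on SEARCH_FIELDS, {category: 'music', subcategory: 'rock'} and the same pair in reverse insertion order serialize to the same string, and therefore to the same URI.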

One possible solution is to Base64-encode a string built from the filter criteria in a predefined format, and have the server decode it.

require 'cgi'
CGI.unescape(identifier).unpack('m')[0] # => "music,rock,,,,7,30"
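Both directions of that encoding might look like the following Ruby sketch. It assumes the serialized string already exists, uses pack('m0') (strict Base64, Ruby 1.9 and later) so no line breaks end up in the URI, and swaps CGI.escape for the standard library's URI.encode_www_form_component. For Base64 output both percent-encode '+', '/' and '=' identically, so the CGI.unescape line above can decode the result. encode_identifier and decode_identifier are hypothetical names, and treating the decoded bytes as UTF-8 is an assumption.

```ruby
require 'uri'

# Encodes a serialized criteria string (e.g. "music,rock,,,,7,30") as a
# URI path segment: strict Base64, then percent-encoding of '+', '/', '='.
def encode_identifier(serialized)
  URI.encode_www_form_component([serialized].pack('m0'))
end

# Reverses encode_identifier; assumes the original string was UTF-8.
def decode_identifier(identifier)
  URI.decode_www_form_component(identifier).unpack('m').first.force_encoding(Encoding::UTF_8)
end

token = encode_identifier('music,rock,,,,7,30')
decode_identifier(token) # => "music,rock,,,,7,30"
```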

This method is not suitable for websites whose front end is plain HTML, because a standard HTML form can only submit its fields as a query string or request body; it cannot assemble the encoded URI itself. The method requires a client capable of constructing URIs dynamically from the filter criteria. JavaScript, ActionScript or generic web service consumer applications are all good candidates.
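As an example of such a consumer, here is a small Ruby client sketch that builds the cacheable URI from a criteria hash and prepares a GET request for it. The field order, the search_results_uri name and the example.com base are assumptions; the only requirement is that the client uses exactly the same ordering and encoding as the server. The request is constructed but not sent.

```ruby
require 'net/http'
require 'uri'

# Hypothetical protocol: must match the server's field order exactly.
SEARCH_FIELDS = %i[category subcategory artist year format page per_page].freeze

# Builds the cacheable search_results URI for a set of filter criteria.
def search_results_uri(base, criteria)
  serialized = SEARCH_FIELDS.map { |field| criteria.fetch(field, '').to_s }.join(',')
  token = URI.encode_www_form_component([serialized].pack('m0'))
  URI.join(base, "/search_results/#{token}")
end

uri = search_results_uri('http://example.com',
                         { category: 'music', subcategory: 'rock', page: 7, per_page: 30 })
request = Net::HTTP::Get.new(uri)
# To send it: Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
```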