Rails Summit Latin America
Wednesday, August 13th, 2008
Danilo and I will be talking about REST (or maybe not…) at the Rails Summit Latin America, October 15, 2008. Many thanks to everyone who’s given me the opportunity to participate.
Is the free lunch really over? It surely is a question that troubles many software developers. Constrained by the laws of physics, processor manufacturing has definitely changed its rules of play over the last few years. It is steadily becoming near impossible to extract more juice out of a CPU’s single core.
Ever since I started thinking about the implications of the multi-core evolution, I have kept an eye open for situations where taking advantage of multi-core CPUs would benefit my work. It almost certainly has to do with my work being primarily around server side applications, but I have yet to come across many situations where a multi-core influenced approach would have provided a benefit that could only have been achieved by following this paradigm.
The problem is undeniably evident if we approach it from the side of computational units restricted by the laws of physics. It seems like we will always have a healthy appetite for increased performance, and given we can’t get much more out of one core, we must start thinking and programming in a multi-core context. We could do with OpenOffice being more feature rich and faster, thus our desktop needs to be more potent.
At the same time, the web, and networking in general, is increasingly influencing the way we think about, use and create software. Considering the OpenOffice example, there is already a myriad of applications moving similar functionality over to the web. Networking brings distributed solutions to the table, which, alongside other applications, are widely employed for improving software performance.
Next to physics, the software world is governed by the laws of economics. The creation of software must result in some form of social, financial, or other profit, part of which is achieved by minimizing associated costs. It is almost certain that vendors will claim that a data center of quad-core equipped slices is the next answer to our software woes, but it pays to remember that a cloud of commodity hardware might, in some situations, improve the rate of return. The lunch was never free, but today, just like 10 years ago, it’s really about how cheap the lunch is.
The need for concurrency is undeniable, but whether its mainstream form will be multi-core friendly programming or architectures distributed over a network remains to be seen.
Web frameworks like Merb or Rails provide convenient ways for caching output data to static files or other stores, used for improving a web application’s performance. Caching is typically handled inside controller classes. With merb-cache, for example, we can cache an entire page by doing something along the lines of:
class Foo < Merb::Controller
  cache_page :index
end
Expiring cached data is handled with a number of instance methods available to controllers, such as expire_page(key) or expire_all_pages. This implies that cache expiration needs to be put explicitly in place inside actions.
The most common event signifying the need for cache expiration is the modification of the underlying data which has at some point been cached. More often than not, this means some sort of write (insert, update, delete) storage operation, which in turn means that cache expiration is closer to storage aware parts of the application rather than controllers. With this in mind, it would be useful to be able to configure cache expiration in a manner similar to that of cache creation, for example:
class Foo < Merb::Controller
  cache_page :index
  cache_watch :foo_store, :bar_store
end
The cache_watch :foo_store, :bar_store line signifies that any cached artifacts associated with this controller need to be expired whenever a data altering operation takes place in the context of the FooStore or BarStore classes.
Approaching data altering operations as events presents a good case for employing the Observer pattern in order to enable cache expiration when such events take place. ActiveRecord, for instance, offers means for adding hooks to persistent objects’ life cycle methods in the form of Observers.
class FooObserver < ActiveRecord::Observer
def after_save(foo)
expire_cache
end
end
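To see the mechanics without ActiveRecord, here is a minimal plain Ruby sketch of the same idea: a store notifies registered observers after each write, and an observer reacts by expiring a cache. `PageCache`, `CacheExpirer` and this `FooStore` are hypothetical names used only for illustration; they are not part of Merb, Rails or ActiveRecord.

```ruby
# Hypothetical in-memory page cache, standing in for cached files on disk.
class PageCache
  def initialize
    @pages = {}
  end

  def write(key, body)
    @pages[key] = body
  end

  def read(key)
    @pages[key]
  end

  def expire_all
    @pages.clear
  end
end

# Observer that expires the cache whenever it is told a record was saved.
class CacheExpirer
  def initialize(cache)
    @cache = cache
  end

  def after_save(_record)
    @cache.expire_all
  end
end

# Hypothetical store that publishes an after_save event on every write.
class FooStore
  def initialize
    @observers = []
    @records = []
  end

  def add_observer(observer)
    @observers << observer
  end

  def save(record)
    @records << record
    @observers.each { |observer| observer.after_save(record) }
    true
  end
end

cache = PageCache.new
store = FooStore.new
store.add_observer(CacheExpirer.new(cache))

cache.write("foo/index", "<html>stale</html>")
store.save(:title => "metal")
cache.read("foo/index") # nil, because the save expired the cached page
```

The controller never has to call an expiration method itself; the write to the store is the only trigger.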
Putting it all together, we can create a module that enables configuring cache expiration declaratively inside controllers in a way reminiscent of how cache creation is handled.
module CacheInvalidator
def cache_watch(controller, *models)
models.each {|model| (@entries ||= Set.new) << Entry.new(controller, model)}
end
def activate!
@entries.each do |entry|
# Skip entries whose observer already exists, but keep activating the rest
next if Kernel.const_defined?(entry.class_name)
entry.log
observer = Class.new(ActiveRecord::Observer) do
include CacheInvalidator
observe(entry.model)
define_method(:entry) {entry}
end
Kernel.const_set(entry.class_name, observer)
observer.instance
end
end
module_function :cache_watch
module_function :activate!
def after_save(model)
destroy_cache
end
def after_destroy(model)
destroy_cache
end
private
def destroy_cache
FileUtils.rm_f(entry.file_path) if File.file?(entry.file_path)
FileUtils.rm_r(entry.dir_path) if File.directory?(entry.dir_path)
end
class Entry
attr_reader :controller, :model
def initialize(controller, model)
@controller, @model = controller, model
end
def class_name
(controller.name.gsub(/\:\:/, '') + model.to_s.camelize + "CacheObserver").intern
end
def ==(other)
controller == other.controller && self.model == other.model
end
def file_path
"#{dir_path}.xml"
end
def dir_path
"#{APP_ROOT}/public/cache/#{@controller.name.underscore}"
end
def log
logger.info "Cache-watching #{model.to_s.camelize} for #{controller}"
end
end
end
By including the CacheInvalidator module we can declare cache invalidation rules inside controllers.
class FooController < Merb::Controller
  include CacheInvalidator
  cache_page :index
  cache_watch :FooStore, :BarStore
end
The cache can be activated where app initialization tasks are kept, such as init.rb in Merb.
Merb::BootLoader.after_app_loads do
  CacheInvalidator.activate!
end
I have worked on a number of web applications which required searching catalogs of data based on filtering criteria. The most common implementation I see involves issuing a GET request to a search service, providing the search criteria as part of the request’s query string.
http://example.com/search?category=music&subcategory=rock&page=7
This approach does not easily lend itself to static resource caching, one of the most effective ways to improve a web app’s performance. Regardless of the level of optimization applied to application code, fine tuning of database queries, even the addition of something like memcached, a request reaching the application server is unlikely to be served more efficiently than if it was handled by a high performance HTTP server like Nginx.
By approaching search queries as RESTful HTTP resources uniquely identified by a URI, as opposed to RPC based commands, we should be able to cache the results the first time they are processed following a search request.
http://example.com/search_results/someuniqueidentifier
The unique identifier part of the URI can take the form of a hash which, when deserialized, will provide the application with the filter criteria for the search. This assumes that the client and server share a common protocol, one which defines how the hash for the URI is constructed. For example, it is a good idea to agree on an expected order for the set of criteria. While searches for {category : music, subcategory : rock} and {subcategory : rock, category : music} will produce the same results, using both combinations will cause the resource to be cached twice under two separate URIs, resulting in a performance penalty.
A potential solution can involve Base64 encoding and decoding a string constructed using a predefined format and comprising the filter criteria.
CGI.unescape(identifier).unpack('m')[0] # => "music,rock,,,,7,30"
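A minimal sketch of both directions follows, assuming a hypothetical fixed order of four criteria (`CRITERIA_ORDER`) and a comma separated format. The helper names `search_identifier` and `search_criteria` are made up for this example. To stay within core Ruby it swaps CGI escaping for the URL-safe Base64 alphabet, which is a different choice from the one line decoder above.

```ruby
# Hypothetical, agreed-upon order of criteria shared by client and server.
CRITERIA_ORDER = [:category, :subcategory, :page, :per_page].freeze

# Build the identifier: values are always emitted in CRITERIA_ORDER,
# so the order in which the caller supplies them does not matter.
def search_identifier(criteria)
  packed = CRITERIA_ORDER.map { |key| criteria[key].to_s }.join(",")
  # "m0" is Base64 without line feeds; tr swaps in the URL-safe alphabet.
  [packed].pack("m0").tr("+/", "-_")
end

# Recover the criteria from an identifier built by search_identifier.
def search_criteria(identifier)
  packed = identifier.tr("-_", "+/").unpack("m")[0]
  # The -1 limit keeps trailing empty fields for criteria that were not set.
  Hash[CRITERIA_ORDER.zip(packed.split(",", -1))]
end

one = search_identifier(:category => "music", :subcategory => "rock")
two = search_identifier(:subcategory => "rock", :category => "music")
one == two # true, both requests map to a single cacheable URI

search_criteria(one)[:subcategory] # => "rock"
```

Values that themselves contain a comma would break this format, so a real protocol would need an escaping rule or a different separator.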
This method will not be useful for plain HTML fronted websites. It requires a potent enough client with the ability to dynamically construct URIs based on filter criteria. JavaScript, ActionScript or generic web service consumer applications are all good candidates.
ActiveResource can be a useful tool for abstracting away low level HTTP or data marshaling details when testing web services with an XML schema and URI patterns which respect the Rails protocol for REST.
Here’s a possible implementation for use in tests that exercise a service from the outside, a sort of black box web service testing approach, if you’d like.
def resource(name)
class_name = name.to_s.camelize
return class_name.constantize if Object.const_defined?(class_name.intern)
rsrc = Class.new(ActiveResource::Base) do
self.site = "http://localhost:4001/api"
self.element_name = name.to_s
end
Object.const_set(class_name.intern, rsrc)
end
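The define-once technique used by the helper can be tried out in plain Ruby, without ActiveResource or ActiveSupport. In this sketch `BaseResource`, `class_name_for` and `define_resource` are hypothetical stand-ins for `ActiveResource::Base`, `camelize` and the `resource` helper respectively.

```ruby
# Hypothetical base class, standing in for ActiveResource::Base.
class BaseResource
  class << self
    attr_accessor :element_name
  end
end

# Convert :product_category to "ProductCategory" without ActiveSupport.
def class_name_for(name)
  name.to_s.split("_").map(&:capitalize).join
end

# Return the class for the given name, defining it on first use only.
def define_resource(name)
  class_name = class_name_for(name)
  return Object.const_get(class_name) if Object.const_defined?(class_name)

  klass = Class.new(BaseResource) do
    self.element_name = name.to_s
  end
  # Assigning the anonymous class to a constant gives it its name.
  Object.const_set(class_name, klass)
end

first = define_resource(:product_category)
second = define_resource(:product_category)

first.equal?(second)          # true, the second call reuses the constant
ProductCategory.element_name  # => "product_category"
```

Guarding on `const_defined?` is what makes it safe to call the helper from several test classes in the same process.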
Let’s imagine an API call to http://localhost:4001/api/categories.xml which returns a list of product categories with their respective subcategories. Following is a potential response to a GET request to the aforementioned URI.
<?xml version="1.0" encoding="UTF-8"?>
<categories type="array">
<category>
<id type="integer">3</id>
<name>Music</name>
<subcategories type="array">
<subcategory type="Category">
<id type="integer">4</id>
<name>Rock</name>
</subcategory>
<subcategory type="Category">
<id type="integer">5</id>
<name>Metal</name>
</subcategory>
</subcategories>
</category>
</categories>
Invoking resource :category in the test will provide a Category class. Category is an ActiveResource child which can be used to exercise the /categories end point of the API.
class ApiTest < Test::Unit::TestCase
resource :category
def test_categories
categories = Category.find(:all)
assert_equal(1, categories.size)
assert_equal("Music", categories.first.name)
end
def test_subcategories
subcategories = Category.find(:all).first.subcategories
assert_equal(2, subcategories.size)
assert_equal("Metal", subcategories[1].name)
end
def test_category_creation
Category.create(:name => "Hacking")
# One top level category in the response above, plus the one just created
assert_equal(2, Category.find(:all).size)
end
end
A large portion of the internet is governed by HTTP and the World Wide Web in particular is designed based on the REST architectural style. It makes sense to design web applications or web based services in a way that respects and harnesses the web’s underlying architecture.
When it comes to developing web applications, Model-View-Controller (MVC) is one of the dominant architectural patterns current web frameworks are based on. MVC is not restricted to building web apps; on the contrary, its history can be traced back to 1979 and Smalltalk, where it was originally applied to the development of applications with user interfaces.
The majority of Ruby web frameworks, especially the ones inspired by Rails, employ MVC and offer some sort of support for REST style application development, typically by defining resources which can be accessed through a URI and manipulated by making use of standard HTTP methods such as GET, PUT, POST, DELETE.
The above unveils an obvious similarity between the way HTTP resources can be manipulated – the four verbs can fundamentally constitute CRUD operations – and another common tier in web applications nowadays, databases.
Controllers in Merb, Rails or similar web frameworks, Ruby or otherwise, are a busy abstraction. A controller typically needs to dispatch to relevant actions, consolidate HTTP payloads, deal with sessions, sometimes caching, etc. These controllers are usually REST aware, meaning that they will by default map routed URI HTTP operations to a standard set of actions, namely index, show, new, create, edit, update, destroy.
If we focus on our application exposing strictly REST resource based interfaces, and assume that these resources directly map to the application’s database schema, we can relieve controllers from some of the associated strain by abstracting away the discussed common functionality.
module CrudTemplate
def resource
raise "You must define a resource"
end
def index
instance_variable_set(resource_sym_plural, resource.find(:all))
render
end
def show
assign_resource(resource.find(params[:id]))
render
end
alias edit show
alias delete show
def new
assign_resource(resource.new(resource_attrs))
render
end
def create
r = resource.new(resource_attrs)
assign_resource(r)
if r.save
on_create_success(r)
else
on_create_failure(r)
end
end
def on_create_success(r)
redirect(resource_sym)
end
alias on_update_success on_create_success
def on_create_failure(r)
assign_resource(r)
render(:new, :status => 400)
end
def update
r = resource.find(params[:id])
if r.update_attributes(resource_attrs)
on_update_success(r)
else
on_update_failure(r)
end
end
def on_update_failure(r)
assign_resource(r)
render(:edit)
end
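def destroy
# Look the record up first so the hooks receive it
r = resource.find(params[:id])
if r.destroy
on_destroy_success(r)
else
on_destroy_failure(r)
end
end
# Default hooks: redirect in both cases, override where needed
def on_destroy_success(r)
redirect(resource_sym)
end
alias on_destroy_failure on_destroy_success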
def self.included(controller)
controller.show_action(*shown_actions)
end
protected
def resource_attrs
{}
end
def self.shown_actions
[:index, :show, :create, :new, :edit, :update]
end
private
def assign_resource(r)
instance_variable_set(resource_sym, r)
end
def resource_sym
@resource_sym ||= :"@#{resource.name.underscore.split("/").last}"
end
def resource_sym_plural
@resource_sym_plural ||= :"@#{resource.name.underscore.split("/").last.pluralize}"
end
end
By doing so, we can write controllers that look something like the following.
class Reservations < Application
include CrudTemplate
def resource
Reservation
end
def on_create_success(r)
flash[:notice] = "Thank you"
redirect("/")
end
protected
def self.shown_actions
[:new, :create]
end
def resource_attrs
params[:reservation].merge(session[:member])
end
end
Things are usually more complicated. The above model falls short for the majority of web applications I’ve worked on. Resources are rarely direct matches to database tables and there is usually good reason for them not to be. Applications involve complex business logic, extending beyond what a set of CRUD operations is appropriate for. One might argue that business logic can be incorporated into Models (as in ORM classes), but I generally prefer to avoid keeping business logic near the persistence layer and opt for a database agnostic, rich domain tier.
This however doesn’t imply that controllers shouldn’t think in terms of resources. Controllers are close to the web, and the web works well with resources. It suffices for domain layer endpoints that intend to communicate with a controller to expose an interface the controller understands. If we define that interface so that it matches its database specific counterpart, we can achieve the best of both worlds.
Controllers can transparently operate on plain Ruby components which include an AbstractResource module (interface) and choose to implement any of its methods, or directly on ORM models, such as ActiveRecord classes, where appropriate.
module AbstractResource
attr_reader :params
def initialize(params = {})
@params = params
end
def save
raise "Implement me"
end
def update_attributes(attrs = {})
raise "Implement me"
end
def valid?
raise "Implement me"
end
def errors
raise "Implement me"
end
module ClassMethods
def delete(id)
raise "Implement me"
end
def find(id)
raise "Implement me"
end
end
def self.included(target)
target.extend(ClassMethods)
end
end
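As an illustration, here is a sketch of a plain Ruby component that implements only part of the interface. A trimmed copy of the module is repeated so that the example runs on its own. `Booking` and its in-memory `STORE` are hypothetical and exist only for this example.

```ruby
# Trimmed copy of AbstractResource, repeated here to keep the sketch runnable.
module AbstractResource
  attr_reader :params

  def initialize(params = {})
    @params = params
  end

  def save
    raise "Implement me"
  end

  def update_attributes(attrs = {})
    raise "Implement me"
  end

  module ClassMethods
    def find(id)
      raise "Implement me"
    end
  end

  def self.included(target)
    target.extend(ClassMethods)
  end
end

# Hypothetical domain object kept in memory, with no ORM involved.
class Booking
  include AbstractResource

  STORE = {}

  def valid?
    !params[:name].to_s.empty?
  end

  def errors
    valid? ? [] : ["name is required"]
  end

  # Business rules live here, not in a persistence class.
  def save
    return false unless valid?
    STORE[STORE.size + 1] = self
    true
  end

  def self.find(id)
    STORE.fetch(id)
  end
end

booking = Booking.new(:name => "Ada")
booking.save       # true
Booking.find(1)    # the booking saved above
Booking.new.save   # false, a name is required
```

A controller that only calls `new`, `save`, `errors` and `find` cannot tell this object apart from an ORM model. Methods left unimplemented, such as `update_attributes` here, still raise "Implement me".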
P.S. Credit due to Carlos Villela whose observations have been the core and inspiration behind the ideas in this article.
Stuart and I will be talking about Synthesis at this month’s North West Ruby User Group meet up in Manchester on Tuesday the 24th of June. Registration details and directions to the venue can be found on the event’s page at nwrug.org.
One of the features that attracted me to Merb was the ability to test controllers in an independent, lightweight manner. In essence, this involves instantiating a controller class, passing it a FakeRequest and calling methods (actions) on the controller object.
Let’s consider a controller which collaborates with a service.
class Foo < Merb::Controller
def bar
service = Service.new
session[:metal] = service.metal
@zz = service.rock
render
end
end
class Service
def rock
"zz top"
end
def metal
"metallica"
end
end
Testing the controller is as straightforward as creating an instance of Foo, setting it up, calling bar and interrogating it.
class FooTest < Test::Unit::TestCase
def setup
@foo = Foo.new(Merb::Test::RequestHelper::FakeRequest.new)
@foo.request.session = {}
@foo.bar
end
def test_puts_metallica_in_session
assert_equal("metallica", @foo.session[:metal])
end
def test_assigns_zz_top
assert_equal("zz top", @foo.assigns(:zz))
end
end
I’m not sure why the controller’s session has to be initialized explicitly; had it been present by default, testing would be slightly cleaner.
DataMapper is fast becoming a credible contender in the Ruby ORM field. The first – and only at this early stage – thing that temporarily disappointed me was the following scenario.
class Foo
  include DataMapper::Resource
  property :id, Integer, :serial => true
  property :title, String
end
Running this produces ArgumentError: Unknown adapter name: default, suggesting that a database connection needs to be set up in order to use any objects that include the DataMapper::Resource module. This is something I would rather not have to do for my dependency neutral test suite, in which all calls to ORM objects are simulated using mocks.
I soon realized that DataMapper doesn’t require a database connection to be present, but needs to know which adapter to use. If we’re not interested in interacting with the database, using DataMapper::Adapters::AbstractAdapter does the trick.
DataMapper.setup(:default, "abstract::")
class Foo
  include DataMapper::Resource
  property :id, Integer, :serial => true
  property :title, String
end
Foo.new(:title => "metal").title # => "metal"
Synthesized testing is about accurately simulating object interactions and verifying that each end point of every interaction has been tested to work. The end result of a code base tested employing this strategy forms a specification of the application’s ecosystem in terms of object communication.
Danilo has been recently contributing some excellent work around visual representations of the above. The code is being developed on the Synthesis experimental branch on github.
Consider the Synthesis test_project example.
class DataBrander
BRAND = "METAL"
def initialize(storage)
@storage = storage
end
def save_branded(data)
@storage.save "#{BRAND} - #{data}"
end
def dont_do_this
@storage.ouch!
end
end
class Storage
def initialize(filename)
@filename = filename
end
def save(val)
File.open(@filename, 'w') {|f| f << val}
end
def ouch!
raise Problem
end
end
class Problem < Exception;end
Below are the complete specs for the above implementation.
describe DataBrander do
it "should save branded to storage" do
storage = Storage.new("")
storage.should_receive(:save).with("METAL - rock")
DataBrander.new(storage).save_branded("rock")
end
it "should delegate problem" do
storage = Storage.new("")
storage.should_receive(:ouch!).and_raise(Problem.new)
proc {DataBrander.new(storage).dont_do_this}.should raise_error(Problem)
end
end
describe Storage do
it "should save to file" do
begin
Storage.new("test.txt").save("rock")
File.read("test.txt").should == "rock"
ensure
FileUtils.rm_f("test.txt")
end
end
it "should raise problem on ouch!" do
proc { Storage.new("").ouch! }.should raise_error(Problem)
end
end
A Synthesis run using the DOT formatter produces:
Removing the "should save to file" spec will cause the Synthesis task to fail.
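The reason for the failure can be pictured as a set difference between the interactions the specs expect on mocked collaborators and the ones exercised against real objects. The following is a conceptual sketch only, not how Synthesis is implemented. It identifies an interaction by receiver class and method name alone, which is a simplification made for the example.

```ruby
require 'set'

# Hypothetical value object; Struct provides the equality that Set relies on.
Interaction = Struct.new(:receiver, :method_name)

# Expectations set on Storage in the DataBrander specs.
expected = Set[
  Interaction.new("Storage", :save),
  Interaction.new("Storage", :ouch!)
]

# Calls exercised on a real Storage once "should save to file" is removed.
tested = Set[
  Interaction.new("Storage", :ouch!)
]

# Anything expected but never verified against the real object is a gap.
untested = expected - tested
untested.map { |i| "#{i.receiver}##{i.method_name}" } # => ["Storage#save"]
```

With the spec in place, `tested` would contain both interactions and the difference would be empty.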
Below is what a real (relatively small) project looks like.
I find the ability to inspect how our application is modeled through such a representation a very appealing benefit, on top of the confidence in our system that Synthesis already provides. The DOT formatter will become part of the Synthesis gem as soon as we iron out the few remaining glitches.