Archive for the ‘Software’ Category

Live component rotation

Thursday, January 22nd, 2009

Many applications consist of a number of components, the majority of which are shared by others in the system. Different parts of the system exercise their collaborators in a variety of ways. Think of a website where data is periodically processed by jobs and stored in a database, while presentation modules render the data in ways meaningful to end users. Shared resources can yield the unwanted side effect of performance degradation when a given component is pushed too hard by one of its tasks, affecting every piece of the system that depends on it. In the shared database example, the website might suffer slow response times while database-heavy processing jobs are running.

One way of getting around this problem involves creating more than one instance of the shared resource. One instance is considered “live”, the one the system’s clients interact with, while expensive operations are performed on a copy which will itself become live the moment these operations conclude. This solution does not apply to every situation but can be useful in scenarios where real time is not a concern. In the example website’s case, we can create a copy of the database on which we run the processing jobs. The front end components run off the “stale”, live database copy whose performance is not affected by the jobs. Once the jobs complete we can switch databases and repeat the live component rotation process as needed. Live component rotation also lends itself nicely to distribution, as component copies can exist on different physical hosts.
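The rotation itself can be sketched in a few lines of Ruby. This is only an illustration under assumptions: the Rotator class and its method names are hypothetical, and an in-memory hash stands in for the database.

```ruby
# Hypothetical sketch: Rotator is an illustrative name, not an existing library.
class Rotator
  attr_reader :live

  def initialize(live)
    @live = live
    @mutex = Mutex.new
  end

  # Copies the live component, runs the expensive work on the copy and
  # promotes the copy to live only once the work has finished.
  def rotate
    standby = Marshal.load(Marshal.dump(@live)) # deep copy of the live data
    yield standby
    @mutex.synchronize { @live = standby }
  end
end

db = Rotator.new({ "guitars" => ["SG"] })
before = db.live
db.rotate { |copy| copy["guitars"] << "Les Paul" }
```

Readers holding a reference to the old live copy keep seeing the stale data until they ask for db.live again, which is the trade-off described above.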

Virtualization and cloud computing make this method all the more interesting. Imagine hosting a database server on Amazon EC2 with its static data stored on an EBS volume. We can snapshot the EBS volume, fire up a new EC2 instance, create a volume from the snapshot and attach it to the new instance, run the jobs there and rotate live database instances once the jobs are complete, with most parts of the system never having to worry about the costly operations taking place.

Code on demand

Saturday, January 10th, 2009

Code-on-demand on the web is commonly encountered in the form of JavaScript or applets. As we examine the web as a platform for services spanning beyond the typical server/browser interaction, it’d be interesting to further explore the code-on-demand constraint from a service integration perspective.

One of the advantages of offering executable code alongside a service’s data is client simplification by code reuse. For example, we can distribute a library that’s specific to the data on offer, so interested clients can make use of that functionality and avoid having to re-implement it. Another advantage is distributing computational load, which would otherwise have to be handled by the server, to clients.

To put things into perspective, consider a simplistic web API call that lists guitar models. Much like a JavaScript include, the response to http://example.com/guitars contains a line which advertises a guitar model Ruby library available at /libguit.rb.

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
  <head>
    <title>Guitars</title>
    <script type="text/ruby" charset="utf-8" src="/libguit.rb"></script>
  </head>
  <body>
    <ul id="guitars">
      <li>SG</li>
      <li>Les Paul</li>
      <li>Tele</li>
      <li>Strat</li>
    </ul>
  </body>
</html>

The libguit library has one method for iterating over an alphabetically sorted list of guitars.

module LibGuit
  class List
    def initialize(guitars)
      @guitars = guitars
    end

    def each_guitar_alphabetically(&block)
      @guitars.sort.each(&block)
    end
  end
end

Interested clients can load and use the library together with the retrieved data. Code-on-demand is an optional constraint, so clients that cannot interpret the code, Ruby in this case, or are not interested in using the library can safely ignore it without side effects.

require "rubygems"
require "hpricot"
require "open-uri"

doc = Hpricot(open("http://example.com/guitars"))

libguit_address = (doc / 'script[@type="text/ruby"]')[0][:src]
libguit_src = open("http://example.com#{libguit_address}").read
eval(libguit_src)

guitars = (doc / "#guitars li").map { |e| e.html }
LibGuit::List.new(guitars).each_guitar_alphabetically { |g| puts g }

This is a superficial example, but imagine a service which advertises an e-commerce website’s daily updated catalog of products. Instead of making queries like /products.xml?category=sports&sort=price, clients could download a zipped version of the day’s entire catalog once a day, together with a library to manipulate its entries. This relieves the service from any further requests and, as long as the data’s structure is well abstracted by the on-demand library, spares clients maintenance costs when that structure changes.

At this point many would voice well founded objections based on the security implications. Although one could propose a security system reminiscent of that of applets, I would opt for a controlled environment where trust is granted, such as inter-department service offer/consumption inside the company. Also, in an Internet where many of us store our private email on Gmail or trust Amazon’s S3 with mission critical data, I wouldn’t have a problem dynamically loading code provided by, say, Amazon. It’s not very difficult to put basic safeguards in place to avoid catastrophic effects and, in any case, every option is viable as long as the benefits outweigh the costs.
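As one example of a basic safeguard, a client could refuse to evaluate downloaded code unless its digest matches a value obtained through a trusted channel. The sketch below is an assumption of mine rather than part of the original example: eval_if_trusted and LibDemo are hypothetical names, and the digest is computed locally only to keep the example self-contained.

```ruby
require "digest/sha2"

# Hypothetical helper: evaluates source only if its SHA-256 digest matches
# a digest the client already trusts.
def eval_if_trusted(source, expected_sha256)
  actual = Digest::SHA256.hexdigest(source)
  raise SecurityError, "digest mismatch" unless actual == expected_sha256
  eval(source, TOPLEVEL_BINDING)
end

source = "module LibDemo; def self.hello; 'hi'; end; end"
trusted = Digest::SHA256.hexdigest(source) # stand-in for a published digest

eval_if_trusted(source, trusted)
```

A digest check only proves the code is the code the provider published. It says nothing about whether that code is safe to run, so it complements trust in the provider rather than replacing it.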

Rack cache headers

Saturday, November 8th, 2008

Rack is an interface between web servers and Ruby web frameworks. The HTTP protocol, amongst other things, defines requirements on HTTP caches in terms of header fields that control cache behavior. The purpose of this article is to demonstrate a possible implementation of a piece of Rack Middleware which enables web application developers to configure a web application’s cache related resource headers in an unobtrusive, centralized manner.

Rack supports the notion of Middleware, pieces of code that sit between the HTTP request and response life cycle. Rack::Lint, for example, validates an application’s requests and responses according to the Rack specification.

Rack::Handler::Mongrel.run(
  Rack::Lint.new(app), :Host => "0.0.0.0", :Port => 9999
)

Similarly, if we were to implement a cache header producing layer on top of Rack we’d end up with a construct similar to the following.

Rack::Handler::Mongrel.run(
  Rack::Lint.new(
    Rack::CacheHeaders.new(app)
  ), :Host => "0.0.0.0", :Port => 9999
)
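Middleware needs nothing beyond Rack’s calling convention, so the wrapping idea can be tried without a server or even the rack gem. The following is a minimal sketch; AddHeader and the X-Served-By header are hypothetical names chosen for illustration.

```ruby
# A Rack application is any object that responds to call(env) and returns
# [status, headers, body]. Middleware wraps one such object in another.
class AddHeader # hypothetical middleware, for illustration only
  def initialize(app, key, value)
    @app, @key, @value = app, key, value
  end

  def call(env)
    status, headers, body = @app.call(env)
    # Decorate the response on its way back to the server.
    [status, headers.merge(@key => @value), body]
  end
end

app = proc { |env| [200, { "Content-Type" => "text/plain" }, ["hello"]] }
stack = AddHeader.new(app, "X-Served-By", "sketch")

status, headers, body = stack.call("PATH_INFO" => "/rock")
```

Because the wrapped application is passed in through the constructor, layers like this can be nested in any order, which is what the Rack::Lint and Rack::CacheHeaders nesting above relies on.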

Here’s a possible way of configuring how an application provides HTTP caching headers based on URL path patterns.

Rack::CacheHeaders.configure do |cache|
  cache.max_age("/rock", 3600)
  cache.expires("/metal", "16:00")
end

Following is a potential implementation for the above.

require "time" # provides Time.parse and Time#httpdate

module Rack
  class CacheHeaders
    def initialize(app)
      @app = app
    end

    def call(env)
      result = @app.call(env)
      # Paths without a configured rule pass through untouched.
      if rule = Configuration[env['PATH_INFO']]
        header = rule.to_header
        result[1][header.key] = header.value
      end
      result
    end

    def self.configure(&block)
      yield Configuration
    end

    class Configuration
      def self.max_age(path, duration)
        paths[path] = MaxAge.new(duration)
      end

      def self.expires(path, date)
        paths[path] = Expires.new(date)
      end

      def self.[](key)
        paths[key]
      end

      def self.paths
        @paths ||= {}
      end
    end

    class MaxAge
      def initialize(duration)
        @duration = duration
      end

      def to_header
        Header.new("Cache-Control", "max-age=#{@duration}, must-revalidate")
      end
    end

    class Expires
      def initialize(date)
        @date = date
      end

      def to_header
        Header.new("Expires", Time.parse(@date).httpdate)
      end
    end

    class Header < Struct.new(:key, :value);end
  end
end

The code below is a minimal Rack based application.

require "rubygems"
require "rack"

# The body must respond to each, so wrap the string in an array.
app = proc {|env| [200, {"Content-Type" => "text/plain"}, ["hello"]]}

Rack::Handler::Mongrel.run(
  Rack::Lint.new(
    Rack::CacheHeaders.new(app)
  ), :Host => "0.0.0.0", :Port => 9999
)

To observe the caching related headers the application’s responses are decorated with, we can use curl or something similar, e.g. curl -I http://0.0.0.0:9999/rock or curl -I http://0.0.0.0:9999/metal. Output should look something like the following.

air:~ gmalamid$ curl -I http://0.0.0.0:9999/rock
HTTP/1.1 200 OK
Connection: close
Date: Sat, 08 Nov 2008 00:51:23 GMT
Cache-Control: max-age=3600, must-revalidate
Content-Type: text/plain
Content-Length: 5

air:~ gmalamid$ curl -I http://0.0.0.0:9999/metal
HTTP/1.1 200 OK
Connection: close
Date: Sat, 08 Nov 2008 00:51:16 GMT
Content-Type: text/plain
Expires: Sat, 08 Nov 2008 16:00:00 GMT
Content-Length: 5

Understanding and employing HTTP cache configuration not only enables harnessing the power of tools like Varnish or Squid, it also makes applications good citizens in a diverse ecosystem of HTTP aware browsers and caches outside an application’s knowledge or control.

HTTP accelerator cache purging

Sunday, November 2nd, 2008

The use of an HTTP accelerator such as Varnish or Squid in reverse proxy/accelerator mode can drastically improve a web application’s content delivery capabilities. Successfully implementing caching comes with numerous challenges but the fundamental goal is straightforward: A stack’s dynamic content generating layer should ideally not have to generate the same content more than once.

require "rubygems"
require "sinatra"

def guitars
  @@guitars ||= ['Les Paul', 'SG']
end

get "/guitars" do
  guitars * ', '
end

This application exposes a /guitars resource, a request for which will always hit the application server if no caching is in place. This would prove suboptimal on a high traffic website, especially if generating the content is system resource intensive. Luckily this problem has been solved before. A running instance of Varnish, for example, will only require the following configuration to enable caching of all resources the application serves.

backend default {
  .host = "127.0.0.1";
  .port = "4567";
}

One of the challenges associated with caching has to do with the cached content’s freshness. We want to relieve server stress as much as possible, but we also need our application’s consumers to receive correct data at all times. Let’s assume that the application contacts guitar manufacturers’ websites once a day to refresh its inventory and we have scheduled this operation to complete at 16:00 every day. This suggests that the cached resource should be refreshed every day at four o’clock in the afternoon to reflect the latest list of available guitar models. One of the ways of achieving this in HTTP is by making use of the Expires header, whose semantics are understood by (hopefully) any caching aware HTTP component.

require "time"

get "/guitars" do
  headers "Expires" => Time.parse("16:00").httpdate
  guitars * ', '
end
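One detail worth keeping in mind is that Time.parse("16:00") resolves to 16:00 on the current day, so a response generated after four o’clock would carry an Expires value that is already in the past. A possible way around this, sketched with a hypothetical next_occurrence helper and assuming the server’s local time zone is the one the schedule refers to:

```ruby
require "date"
require "time"

# Hypothetical helper: the next local time at hour:min strictly after now.
def next_occurrence(hour, min, now = Time.now)
  today = Time.new(now.year, now.month, now.day, hour, min, 0)
  return today if today > now
  tomorrow = now.to_date + 1
  Time.new(tomorrow.year, tomorrow.month, tomorrow.day, hour, min, 0)
end

afternoon = Time.new(2008, 11, 2, 17, 30, 0)
expires = next_occurrence(16, 0, afternoon)
puts expires.httpdate # formatted for use as an Expires header value
```

In the Sinatra route this would read headers "Expires" => next_occurrence(16, 0).httpdate.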

Things aren’t always as straightforward. In many cases we cannot fully control the exact time or frequency a resource’s content changes. The example application also comes with an admin interface, allowing the guitar list administrators to manually enter new guitar models.

post "/guitars" do
  guitars << params["guitar"]
  redirect("/guitars")
end

It is clear that a means for arbitrary expiration of cached content needs to be available in order to maintain content freshness. With Varnish, this capability comes in two flavors, one of which involves the use of a PURGE HTTP call. The following configuration enables this functionality.

acl purge {
  "localhost";
}

sub vcl_recv {
  if (req.request == "PURGE") {
    if (!client.ip ~ purge) {
      error 405 "Not allowed.";
    }
    lookup;
  }
}

sub vcl_hit {
  if (req.request == "PURGE") {
    set obj.ttl = 0s;
    error 200 "Purged.";
  }
}

sub vcl_miss {
  if (req.request == "PURGE") {
    error 404 "Not in cache.";
  }
}

To natively make use of this in Ruby, we need to extend the Net::HTTP library to support the PURGE method.

require "net/http"
require "uri"

module Net
  class HTTP
    class Purge < HTTPRequest
      METHOD = "PURGE"
      REQUEST_HAS_BODY = false
      RESPONSE_HAS_BODY = false
    end

    def purge(path, initheader=nil)
      request(Purge.new(path, initheader))
    end
  end
end

def purge_cache(u)
  uri = URI.parse(u)
  query = "?#{uri.query}" if uri.query
  Net::HTTP.new(uri.host, uri.port).start {|h| h.purge("#{uri.path}#{query}")}
end

Now we can expire the cached /guitars resource every time the list is amended.

post "/guitars" do
  guitars << params["guitar"]
  purge_cache("http://localhost/guitars")
  redirect("/guitars")
end

Although this method is effective, there can be cases where the bidirectional coupling between the application and caching layers might be undesirable. With the fundamental functional pieces in place, however, it is not hard to implement a more elaborate strategy such as the one described in Cache Channels in order to reduce the application layer’s knowledge of the caching infrastructure.

Parallelize by process

Sunday, October 26th, 2008

Performing computations in parallel is a popular technique for improving application performance and can be achieved in a number of ways, most commonly by employing threads or by splitting the workload into a number of concurrent processes.

Memory usage is often a headache with large dataset computations. While memory optimization is something to be sought after, tracking down memory leaks can become tedious and time consuming. We can decrease the chances of a heavy job running a system’s memory dry by coming up with a strategy for fragmenting the job into a number of shorter running processes. By doing so, any memory used by a worker process will be released the moment the process completes. Additionally, we can run job fragments in parallel, allow ourselves to harness the operating system’s multi-core capabilities and potentially distribute worker processes over a number of physical hosts and scale out when the need arises. Smaller processes also dictate more manageable chunks of code which are easier to maintain, optimize and test.

Let’s look at an example where a job involves fetching a large number of categorized products from various sources and processing them for use by our own application.

class Job
  def perform
    ADDRESSES.each do |address|
      category = load_category(address)
      category.products.each { |product| process(product) }
    end
  end

  def process(product)
    #some intensive computation
  end

  def load_category(address)
    #load an addressable category dataset
  end
end

Let’s assume that the ADDRESSES constant in the example is a list consisting of entries such as example.com/toys, example.com/phones, example.org/guitars, etc. The job fetches the product dataset behind each category address, iterates over the products and performs a long processing operation on each. Supposing that after every possible optimization the job takes three hours to complete, we can at best run the job eight times a day. What happens if the product categories are updated more often than eight times a day and our application needs to deal with fresh data all the time in order to be successful?

One natural split can involve creating a worker process for each address entry. We can do so by extracting the majority of the code from the Job class into a Worker class meant to run as a standalone process.

class Worker
  def self.process_category(address)
    category = load_category(address)
    category.products.each { |product| process(product) }
  end

  def self.process(product)
    #some intensive computation
  end

  def self.load_category(address)
    #load an addressable category dataset
  end
end

Worker.process_category(ARGV[0]) if ARGV.size == 1

Each worker will operate on a significantly smaller dataset and will complete much faster than the initial long running job. Any memory used by each worker will be immediately released the moment the process finishes execution.

After the latest change, Job can take on the role of orchestrating the worker processes. We start by only allowing an arbitrary maximum number of concurrent workers, three in this case.

require "thread"

class Job
  def initialize
    @worker_count, @mutex = 3, Mutex.new
  end

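  def perform
    threads = ADDRESSES.map do |address|
      # Wait for a free slot before launching the next worker.
      sleep 0.1 until @worker_count > 0
      @mutex.synchronize {@worker_count -= 1}
      Thread.new do
        system("ruby worker.rb #{address}")
        @mutex.synchronize {@worker_count += 1}
      end
    end
    # Don't return until every worker process has finished.
    threads.each {|thread| thread.join}
  end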
end

At this point it is a good idea to run the job and monitor the time it takes for it to complete while also measuring system resource usage. This way we can determine the optimal number of concurrent worker processes based on the system’s specs. Once available resources have been exhausted and both Job and Worker have been sufficiently optimized, we can start thinking about running workers on separate physical nodes.
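An alternative to the thread based bookkeeping above is to let the operating system do the counting, by spawning child processes directly and waiting on them. The following is a sketch under assumptions: run_limited is a hypothetical helper, and the one-line child scripts merely stand in for worker.rb.

```ruby
require "rbconfig"

# Hypothetical helper: runs every command while keeping at most `limit`
# child processes alive at once, and returns their exit statuses.
def run_limited(commands, limit)
  running = 0
  statuses = []
  commands.each do |command|
    if running >= limit
      # Block until any child exits, freeing a slot.
      statuses << Process.wait2.last
      running -= 1
    end
    Process.spawn(*command)
    running += 1
  end
  # Collect the children that are still running.
  running.times { statuses << Process.wait2.last }
  statuses
end

# Stand-in workers: each child exits successfully if given one argument.
commands = %w[toys phones guitars drums].map do |category|
  [RbConfig.ruby, "-e", "exit(ARGV.size == 1 ? 0 : 1)", category]
end

statuses = run_limited(commands, 3)
```

Collecting the exit statuses also gives the job a cheap way to notice failed workers, which the system call in the earlier listing silently ignores.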

Anarchic versus controlled scalability

Saturday, October 4th, 2008

With the number of websites at the time of this writing in the region of one hundred and sixty million and more than a trillion webpages, the Web is the largest network infrastructure to date. Figures like this are nothing short of enviable and so the web’s architecture has been increasingly influencing software authors’ design decisions, to the extent of emergent trends that place this approach in habitats where it hasn’t traditionally been commonplace, such as that of “enterprise” middleware.

The Web’s possibly most notable triumph is offering its citizens the ability to exist and adapt in a context that is difficult to control or predict. The design has achieved its monumental scalability by following the set of constraints which compose the REST architectural style. Alongside other objectives, these constraints were put together in order for systems to effectively satisfy a need for anarchic scalability but – and this is something we must not forget – the benefits of these constraints come with associated trade-offs.

Architectural decisions should involve weighing the costs and benefits they introduce to the specific topic they attempt to address. There is no universal solution to every design problem and, while REST has proven successful in achieving anarchic scalability, not all systems exist in wild, disorderly environments. Introducing REST constraints in a system that doesn’t need to be as loosely controlled as the web can incur unnecessary overhead.

Section 5.1.3 Stateless from Roy Fielding’s seminal Architectural Styles and the Design of Network-based Software Architectures paper is a good example. Particular interest for this discussion lies in the second paragraph:

Like most architectural choices, the stateless constraint reflects a design trade-off. The disadvantage is that it may decrease network performance by increasing the repetitive data (per-interaction overhead) sent in a series of requests, since that data cannot be left on the server in a shared context.

Let’s consider an imaginary example, an auction service which publishes price updates and accepts bids on auctioned items. As a given – this is a private auction – 3000 consumers will interact with the service, each of those subscribing to price updates and placing bids whenever they see fit. These consumers must be authorized to interact with the service.

If we were to carry out the above over HTTP, a potential implementation would involve the service publishing an item’s current price as a feed, with the consumers subscribing to it and polling for updates. The service enforces a polling frequency of 10 seconds per consumer. For one item, this will result in 6 * 60 * 24 * 3000 = 25,920,000 requests/day. Consumers also need to be authorized to access the resource, so, respecting the statelessness constraint, 25,920,000 handshakes/day will take place. If we assume that an item receives 20,000 bids a day, the system becomes subject to 25,900,000 unnecessary requests and handshakes.

The 20,000 bids/day assumption suggests an average bid frequency of 86400/20000 = 4.32 seconds. The 10 second interval polling frequency is suboptimal when it comes to consumers being able to act on price updates in near real time.
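The figures above are easy to reproduce, which also makes it simple to try other polling intervals or audience sizes. The variable names below are mine; the inputs are the ones assumed in the example.

```ruby
consumers        = 3_000
poll_interval    = 10            # seconds between polls, per consumer
bids_per_day     = 20_000
seconds_per_day  = 24 * 60 * 60

polls_per_day     = seconds_per_day / poll_interval * consumers
unnecessary_polls = polls_per_day - bids_per_day
seconds_per_bid   = seconds_per_day.to_f / bids_per_day

puts "polls/day:         #{polls_per_day}"
puts "unnecessary polls: #{unnecessary_polls}"
puts "seconds per bid:   #{seconds_per_bid}"
```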

We can optimize by making the consumers friendlier by respecting ETag, Last-Modified, conditional GET and partial GET instructions as proposed by the service. These manage to reduce some unnecessary network usage, but do not reduce the number of requests, nor do they decrease the number of handshakes. Caching and reverse proxies are also commonly employed for relieving server stress, although, due to the close to real time requirement of this scenario, configuring those effectively can be tricky.
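To illustrate why conditional GET saves bandwidth but not requests, here is a minimal Rack-style sketch of an ETag check. PriceFeed is a hypothetical class and the body format is made up for the example.

```ruby
require "digest/sha2"

# Hypothetical Rack-style endpoint serving the current price of one item.
class PriceFeed
  def initialize(price)
    @price = price
  end

  def call(env)
    body = "price=#{@price}"
    etag = %("#{Digest::SHA256.hexdigest(body)}")
    if env["HTTP_IF_NONE_MATCH"] == etag
      # The client already holds this representation: no body is sent.
      [304, { "ETag" => etag }, []]
    else
      [200, { "ETag" => etag, "Content-Type" => "text/plain" }, [body]]
    end
  end
end

feed = PriceFeed.new(100)
first_status, first_headers, _ = feed.call({})
second_status, _, second_body = feed.call("HTTP_IF_NONE_MATCH" => first_headers["ETag"])
```

The second exchange carries no body, yet the request still reaches the service and still has to be authorized, which is the overhead discussed above.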

In contrast, if we were to implement the example on top of an event driven, stateful transport such as XMPP, the service could publish updates on PubSub nodes, consumers would subscribe to those and receive updates as they happen. By doing so, we’re looking at 20,000 messages, equal to the number of bids and 3,000 handshakes, equal to the number of connections, equal to the number of consumers. The number of unnecessary requests/handshakes is reduced to zero.

The latter does not make a good candidate for an environment where the number of consumers interacting with the service is outside our control. With each consumer maintaining an open connection, the service never gets the opportunity to release system resources and there is a finite number of persistent connections a physical infrastructure can accommodate.

Adopting established, widely understood open standards introduces a plethora of benefits. HTTP, BitTorrent, XMPP, SMTP, FTP all have contributed to internet scale success stories and all come with associated merits and trade-offs. When faced with choice, we should examine the benefits and drawbacks of each, relative to the characteristics of the environment the system exists in. More interestingly, we should investigate combining available options so that one complements the others’ strengths while countering potential sacrifices.

Efficient data imports

Tuesday, September 23rd, 2008

An application’s performance is affected, among other things, by the performance of its parts. A large number of current applications contain a database layer which, I’ve noticed, is neglected more often than it deserves. This is unfortunate because there are a lot of quick performance victories that can be achieved by harnessing a database’s strong points.

Let’s think of an application which periodically collects large amounts of data, adapts it from a foreign structure into its native domain and stores the results in a database for further use. Data units must be unique, something we need to enforce each time a new import takes place.

One way of achieving this would be to construct domain native objects or structures by parsing the external data feeds and check against the existence of duplicates in the database, using a custom hashcode identity mechanism. We can store the hashcode values in a UNIQUE database column to ensure data integrity.

DATA.each {|e| DB[:entries] << e rescue nil}

This code iterates over the adapted object enumeration and attempts a database insert for each entry, ignoring any exceptions due to uniqueness violations. It also introduces the significant overhead of performing a number of database queries equal to the number of entries included in the imported collection.
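The hashcode identity mechanism is not spelled out here. One possible sketch, assuming entries are hashes of attributes and that two entries with the same attributes count as duplicates (the identity helper is a hypothetical name):

```ruby
require "digest/sha2"

# Hypothetical helper: derives a stable identity from an entry's attributes,
# independent of the order in which the attributes were set.
def identity(entry)
  canonical = entry.sort_by { |key, _| key.to_s }
                   .map { |key, value| "#{key}=#{value}" }
                   .join("\n")
  Digest::SHA256.hexdigest(canonical)
end

a = identity(:name => "SG", :maker => "Gibson")
b = identity(:maker => "Gibson", :name => "SG")
c = identity(:name => "Les Paul", :maker => "Gibson")
```

The resulting hex string is what would be stored in the UNIQUE column, leaving duplicate detection to the database.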

Bulk inserts are nothing new and most, if not all, modern databases offer this functionality, which is also supported by the majority of database access application libraries. Ruby’s Sequel, for instance, allows bulk insert operations with the multi_insert method.

DB[:entries].multi_insert(DATA)

There’s a caveat here, as this operation will terminate the moment a duplicate entry violation error occurs. MySQL offers the INSERT IGNORE construct which is particularly useful in this scenario. Using the IGNORE keyword will cause errors that occur while executing the INSERT statement to be treated as warnings.

Looking to investigate the performance boost associated with the above technique, I’ve put together a small extension for Sequel, enabling the toolkit to make use of INSERT IGNORE.

module InsertIgnore
  def ignore_duplicates!
    @ignore = true
    self
  end

  def multi_insert_sql(columns, values)
    columns = column_list(columns)
    values = values.map {|r| literal(Array(r))}.join(Sequel::MySQL::Dataset::COMMA_SEPARATOR)
    ignore = @ignore ? " IGNORE " : ' '
    ["INSERT#{ignore}INTO #{source_list(@opts[:from])} (#{columns}) VALUES #{values}"]
  end
end

This can be used like this:

Sequel::MySQL::Dataset.send(:include, InsertIgnore)
DB[:entries].ignore_duplicates!.multi_insert(DATA)

Inserting 100,000 records, some of them duplicates, using the application loop approach which issues an insert query for each entry took about 49 seconds on my laptop. Its INSERT IGNORE counterpart took about 4 seconds.

There are things to watch out for when using the latter approach. We can potentially construct very large queries, depending on the number of records we intend to insert. MySQL sets the maximum length of packets with the max_allowed_packet system variable, which defaults to 1 megabyte in current MySQL releases and can be increased up to 1 gigabyte. Loading such large datasets in memory can prove problematic, so slicing the import in chunks is probably a good idea.
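Slicing can be as simple as Ruby’s each_slice. The sketch below uses a hypothetical insert_in_chunks helper and records the chunks in an array so that it runs without a database; in the real import the block would issue the bulk insert for each chunk.

```ruby
# Hypothetical helper: yields rows in chunks of at most chunk_size.
def insert_in_chunks(rows, chunk_size)
  rows.each_slice(chunk_size) { |chunk| yield chunk }
end

inserted = []
# In the actual import the block body would be something like
# DB[:entries].ignore_duplicates!.multi_insert(chunk)
insert_in_chunks((1..10).to_a, 4) { |chunk| inserted << chunk }
```

The chunk size is a tuning knob: it needs to be small enough for each query to stay under max_allowed_packet, and large enough to keep the number of round trips low.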

In like manner, it’s worth mentioning MySQL’s ON DUPLICATE KEY UPDATE, which updates the existing row when an insert would otherwise fail due to a duplicate value violation.

EventMachine MapReduce

Tuesday, September 9th, 2008

MapReduce is a parallel computation strategy useful for scaling large data set processing by distributing workload over multiple worker nodes. The distributed nature of MapReduce suggests network communication and, with that in mind, I thought I’d put together a demonstration employing EventMachine, a library which makes efficient network programming relatively simple in Ruby.

Before going any further, I should mention that the code examples have not been optimized for production use, they only illustrate what’s possible. Also, it’s worth bringing up two established Ruby libraries for tackling similar problems, Starfish and Skynet. It’s advisable that these existing options are investigated before delving into custom alternatives.

MapReduce essentially consists of two steps, map and reduce, although intermediate phases usually need to be present in real world implementations. map refers to the higher order function also known as transform or collect. It is the operation that is typically distributed, with a number of nodes each transforming one data set into another. reduce refers to the higher order function sometimes called fold or inject, which is in this case used for collecting the results of map to build a return value.

Counting the number of word occurrences in a large number of documents is one of the examples most commonly used for describing MapReduce. A number of distributed jobs is spawned, splitting document contents into words. The results of these operations are passed to a reduce process whose job is to sum its input.
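Before distributing anything, the same computation can be written in-process with Ruby’s own map and inject, which makes the two phases easy to see. The documents here are made-up strings.

```ruby
documents = ["the cat", "the dog"]

# Map: each document becomes a list of [word, 1] pairs.
mapped = documents.map do |document|
  document.split(" ").map { |word| [word, 1] }
end

# Reduce: sum the pairs into a single word => count hash.
counts = mapped.flatten(1).inject(Hash.new(0)) do |totals, (word, n)|
  totals[word] += n
  totals
end

p counts
```

In the distributed version the map block runs on separate nodes, one per document, while the reduce step stays on the node that collects the results.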

Map processes can be EventMachine servers. We can have an arbitrary number of those running on a number of physical nodes.

require "rubygems"
require "eventmachine"

module Map
  def receive_data(path)
    document = File.read(path)
    word_counts = document.split(' ').map { |word| [word, 1] }
    send_data(Marshal.dump(word_counts))
    close_connection_after_writing
  end
end

EM.run {EM.start_server("localhost", 5555, Map)}

A reduce process can send job requests to those servers, receive and process the results.

require "rubygems"
require "eventmachine"

class Reduce < EM::Connection
  @@all = []

  def initialize(*args)
    super
    @doc, @data = args[0], ''
  end

  def post_init
    send_data(@doc)
  end

  def receive_data(data)
    @data << data
  end

  def unbind
    Reduce.job_completed
    @@all += Marshal.load(@data)
    unless Reduce.pending_jobs?
      groups = @@all.group_by {|word| word[0] }
      groups.each { |g| p "#{g[0]} : #{g[1].size}" }
      EM.stop
    end
  end

  def self.send_map_job(port, doc)
    @job_count ||= 0
    increment_job_count
    EM.connect("localhost", port, Reduce, doc)
  end

  def self.increment_job_count
    @job_count += 1
  end

  def self.pending_jobs?
    @job_count != 0
  end

  def self.job_completed
    @job_count -= 1
  end
end

EM.run do
  {
    5555 => 'docs/america.txt',
    6666 => 'docs/da-vinci.txt'
  }.each { |port, doc| Reduce.send_map_job(port, doc) }
end

The example lacks plumbing code which would make things flexible enough and, as you might have noticed, works on a single node (localhost), but hopefully illustrates a mechanism for distributing workload over a networked farm.

Phusion Passenger on Amazon EC2

Wednesday, August 20th, 2008

Phusion Passenger has come a long way since its first public release, significantly simplifying the deployment of Ruby web applications on Apache servers, especially since the addition of support for Rack.

You can use this example Capfile if you’d like to get started quickly with trying out Passenger deployments on Amazon EC2.

It is assumed that your environment has been previously configured for launching EC2 AMIs. If not, you might want to read the EC2 Getting Started Guide, or refer to the first bits of this article.

By completing the following steps, we will end up with a running Debian AMI, with Ruby 1.8.7, Rubygems 1.2.0, Apache2 and Passenger installed.

First, find the section about AWS credentials in the Capfile and replace the values with yours. These are :keypair, :account_id, :access_key_id, :secret_access_key, :pk and :cert. Once this is done, invoke:

cap instance:start

Copy the instance id from the output of this command and use it as the value for the :instance_id field in the Capfile. Call ec2-describe-instances until the instance is reported as running. Use the instance URL from its output as the value for the :instance_url field in the Capfile. Next invoke:

cap instance:bootstrap

This will install Apache2 and Passenger on the instance. Once this step is complete, you can navigate to the instance URL from a browser and see the default page served by the newly installed, Passenger enabled Apache. At this point – optionally and for demonstration purposes – you can invoke:

cap instance:example_app

This will install the Merb gems, create a flat Merb application in the instance’s /var/www/example directory, set it up for use with Passenger (create public, log and tmp directories and add a config.ru Rack configuration file as required by Passenger) and set up an Apache virtual host in order for Passenger to serve the application. Once this step is complete, navigate to the instance’s URL and you should see a page served by Merb.

There are a couple of other convenient commands in the Capfile, cap instance:ssh and cap instance:stop.

Rails Summit Latin America

Wednesday, August 13th, 2008

Danilo and I will be talking about REST (or maybe not…) at the Rails Summit Latin America, October 15, 2008. Many thanks to everyone who’s given me the opportunity to participate.