
Archive for June, 2007

JMS with JRuby and ActiveMQ

Saturday, June 30th, 2007

One of the most interesting (and powerful) features of JRuby is the ability to access and manipulate Java classes and libraries. The code below creates, configures and starts an instance of an ActiveMQ Broker that accepts connections based on the Stomp protocol.

[ruby]
require "java"
require "apache-activemq-4.1.1.jar"

include_class "org.apache.activemq.broker.BrokerService"

# Create an embedded broker and expose it over the Stomp protocol.
broker = BrokerService.new
broker.add_connector("stomp://localhost:61613")
broker.start
[/ruby]

Supposing this code has been saved in a file named broker.rb, and given a working JRuby installation, we can start the Broker by invoking jruby broker.rb.

The apache-activemq-4.1.1.jar archive must be in the same location as the broker.rb script, in order for it to be successfully loaded. Alternatively it can be placed in any of the locations described in the Require and Load behavior page on the JRuby Wiki.

Ruby supports Stomp via the Stomp library, which can be installed as a gem (gem install stomp). Using Stomp we can create Listeners that subscribe to Topics on our Broker.

[ruby]
require "rubygems"
require "stomp"

conn = Stomp::Connection.open("", "", "localhost", 61613, false)
conn.subscribe("/topic/testing", {:ack => :auto})

loop do
  p conn.receive.body
end
[/ruby]

Similarly, a Publisher would look something along the lines of:

[ruby]
require "rubygems"
require "stomp"

conn = Stomp::Connection.open("", "", "localhost", 61613, false)
conn.send("/topic/testing", "Rock!!!", {:persistent => false})
[/ruby]

Given the Broker is running, we can start the Listener, which will subscribe to the specified topic and print out any incoming messages. This can be demonstrated by running the Publisher that will post a text message to /topic/testing.
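To make it a little clearer what the Stomp library puts on the wire, here is a minimal sketch of how a Stomp 1.0 frame is laid out: a command line, header lines, a blank line, the body and a terminating null byte. The stomp_frame helper is a hypothetical name used for illustration only; it is not part of the stomp gem, and a real client adds more headers than shown here.

```ruby
# Hypothetical helper (not part of the stomp gem): builds the text of a
# Stomp frame from a command, a hash of headers and an optional body.
def stomp_frame(command, headers = {}, body = "")
  header_lines = headers.map { |key, value| "#{key}:#{value}\n" }.join
  # Command line, header lines, blank line, body, then a null byte.
  "#{command}\n#{header_lines}\n#{body}\0"
end

frame = stomp_frame("SEND", { "destination" => "/topic/testing" }, "Rock!!!")
p frame
```

This is roughly the shape of the frame the Publisher above sends when it calls conn.send.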

Parsing Microformats: rel-tag, adr, hCard

Wednesday, June 27th, 2007

rel-tag

Out of all Microformats, rel-tag is one of the simplest, therefore one of the easiest to parse.

I find it useful to treat Microformat object representations as Structures without behavior, with a set of member attributes that map to their properties to capture their state. A Scanner class takes care of parsing a given piece of HTML to collect all rel-tag occurrences.

[ruby]
require "test/unit"

class RelTagTest < Test::Unit::TestCase
  def test_rel_tag_extraction
    html = %(
      <a href="http://technorati.com/tag/tech" rel="tag">tech</a>
      <a href="http://technorati.com/tag/rock" rel="tag">rock</a>
    )

    tags = RelTagScanner.find_all(html)

    assert_equal(2, tags.size)
    assert_equal("tech", tags[0].name)
    assert_equal("http://technorati.com/tag/tech", tags[0].url)
    assert_equal("rock", tags[1].name)
    assert_equal("http://technorati.com/tag/rock", tags[1].url)
  end
end
[/ruby]

The implementation for the above specification is compact and straightforward.

[ruby]
require "rubygems"
require "hpricot"

class RelTag < Struct.new(:url, :name); end

class RelTagScanner
  def self.find_all(html)
    (Hpricot(html)/"[@rel=tag]").map do |tag|
      RelTag.new(tag[:href], tag.inner_text)
    end
  end
end
[/ruby]

The RelTag class is a Struct with two members, url and name. In RelTagScanner’s find_all method, we ask Hpricot to fetch all elements with a rel="tag" attribute, and from each of those we take the href and the link text to populate the url and name of a RelTag object.
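The scanner above uses the link text as the tag name. As far as I understand the rel-tag specification, the tag is actually carried by the last path segment of the href, with the link text being only a human-readable label. A minimal sketch of deriving the name from the URL instead, using only Ruby's standard library, could look like this. The tag_name_from_href helper and the example.com URL are hypothetical and not part of the code above.

```ruby
require "uri"

# Hypothetical helper: derives a tag name from the last path segment of a
# rel-tag link's href, decoding "+" and percent-escapes along the way.
def tag_name_from_href(href)
  last_segment = URI.parse(href).path.split("/").last.to_s
  URI.decode_www_form_component(last_segment)
end

p tag_name_from_href("http://technorati.com/tag/tech")
p tag_name_from_href("http://example.com/tags/rock+music")
```

Swapping tag.inner_text for a call like this in RelTagScanner would be one way to keep the name independent of how the link is labelled.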

adr

Even though adr’s schema specifies more properties, parsing it is just as straightforward as parsing rel-tag, because every adr field is marked up with a class attribute named after the property it holds (e.g. class="locality").

[ruby]
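require "test/unit"

class AdrTest < Test::Unit::TestCase
  def test_adr_extraction
    html = %(
      <div class="adr">
        <div class="street-address">665 3rd St.</div>
        <div class="extended-address">Suite 207</div>
        <span class="locality">San Francisco</span>,
        <span class="region">CA</span>
        <span class="postal-code">94107</span>
        <div class="country-name">U.S.A.</div>
      </div>
    )

    adr = AdrScanner.find_all(html)[0]

    assert_equal("665 3rd St.", adr.street_address)
    assert_equal("Suite 207", adr.extended_address)
    assert_equal("San Francisco", adr.locality)
    assert_equal("CA", adr.region)
    assert_equal("94107", adr.postal_code)
    assert_equal("U.S.A.", adr.country_name)
  end
end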
[/ruby]

Again, the Adr class extends a class generated by Struct.new, with members that correspond to the adr spec’s Property List.

[ruby]
class Adr < Struct.new(:post_office_box, :extended_address, :street_address, :locality, :region, :postal_code, :country_name); end

class AdrScanner
  def self.find_all(html)
    doc = Hpricot(html)
    (doc/".adr").map do |adr|
      # Struct members may be Strings or Symbols depending on the Ruby
      # version, so convert before mapping e.g. postal_code to .postal-code.
      Adr.new(*Adr.members.map { |m| (adr/".#{m.to_s.tr('_', '-')}").inner_text })
    end
  end
end
[/ruby]

In this case, it suffices for the AdrScanner to detect all elements that are marked up as class="adr". For each of those elements, we extract the matching nested properties (e.g. class="locality") and pass them as an array to the constructor of Adr.
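The only non-obvious step here is the translation from Struct member names, which use underscores, to the hyphenated class names of the adr spec. A small sketch of that mapping in isolation is shown below; selector_for is a hypothetical helper name introduced for illustration.

```ruby
class Adr < Struct.new(:post_office_box, :extended_address, :street_address,
                       :locality, :region, :postal_code, :country_name); end

# Hypothetical helper: turns a Struct member such as :postal_code into the
# CSS class selector ".postal-code" used to look up the nested element.
def selector_for(member)
  ".#{member.to_s.tr('_', '-')}"
end

selectors = Adr.members.map { |m| selector_for(m) }
p selectors
```

Because the selectors are derived from Adr.members, adding a property to the Struct is enough for the scanner to start looking for it.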

hCard

Things get slightly more complicated when parsing hCards. hCard is a compound Microformat, in the sense that it contains other Microformats, notably adr. In addition to that, the tel property can appear more than once, whilst it can also accept an optional type attribute.

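[html]
<div>Phone: <span class="tel">+1-727-231-0101</span></div>
<div>
  <span class="tel"><span class="type">Fax</span>:
  <span class="value">+1-727-258-0207</span></span>
</div>
[/html]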

Keeping this in mind, it would make sense to treat the simple members of hCard in a similar fashion to that of the previous examples. The adr member should be of type Adr, whereas the phone numbers could go in a Hash field and be retrieved as hcard.tels[:type].

[ruby]
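require "test/unit"

class HcardTest < Test::Unit::TestCase
  def setup
    html = %(
      <div class="vcard">
        <div class="fn org">Wikimedia Foundation Inc.</div>
        <div class="adr">
          <div class="street-address">200 2nd Ave. South #358</div>
          <span class="locality">St. Petersburg</span>,
          <span class="region">FL</span>
          <span class="postal-code">33701-4313</span>
          <div class="country-name">USA</div>
        </div>
        <div>Phone: <span class="tel">+1-727-231-0101</span></div>
        <div>Email: <span class="email">info@wikimedia.org</span></div>
        <div>
          <span class="tel"><span class="type">Fax</span>:
          <span class="value">+1-727-258-0207</span></span>
        </div>
      </div>
    )
    @hcard = HcardScanner.find_all(html)[0]
  end

  def test_simple_members
    assert_equal("Wikimedia Foundation Inc.", @hcard.fn)
    assert_equal("Wikimedia Foundation Inc.", @hcard.org)
    assert_equal("info@wikimedia.org", @hcard.email)
  end

  def test_tels
    assert_equal("+1-727-231-0101", @hcard.tels[:default])
    assert_equal("+1-727-258-0207", @hcard.tels[:fax])
  end

  def test_adr
    assert_equal("200 2nd Ave. South #358", @hcard.adr.street_address)
    assert_equal("St. Petersburg", @hcard.adr.locality)
    assert_equal("FL", @hcard.adr.region)
    assert_equal("33701-4313", @hcard.adr.postal_code)
    assert_equal("USA", @hcard.adr.country_name)
  end
end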
[/ruby]

Because tel does not necessarily contain nested class="type" and class="value" elements, we are treating the number that is marked up solely as class="tel" as the default.

[ruby]
class Hcard < Struct.new(:fn, :org, :email, :tels, :adr); end

class HcardScanner
  def self.find_all(html)
    doc = Hpricot(html)
    (doc/".vcard").map do |vcard|
      hcard = Hcard.new(*[:fn, :org, :email].map { |m| (vcard/".#{m}").inner_text })
      hcard.tels = find_tels(vcard)
      hcard.adr = AdrScanner.find_all(vcard.to_html)[0]
      hcard
    end
  end

  def self.find_tels(vcard)
    tels = {}
    (vcard/".tel").each do |tel|
      type = (tel/".type").inner_text
      if type.empty?
        # No nested type element: treat the whole tel content as the default number.
        type = :default
        value = tel.inner_text
      else
        type = type.downcase.to_sym
        value = (tel/".value").inner_text
      end
      tels[type] = value
    end
    tels
  end
  # A bare `private` does not apply to methods defined with `def self.`,
  # so the helper is hidden explicitly.
  private_class_method :find_tels
end
[/ruby]

For each vcard element found in the HTML, we construct a new Hcard object. We can use the AdrScanner to extract the adr element and pass it on to the Hcard. For all occurrences of tel, we have to check for the presence of nested class="type" and class="value" elements and add them as entries to the Hcard#tels hash. In the absence of those two elements, we add the phone number to the tels hash keyed as :default.
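One limitation of keying the hash by type is that a second number of the same type overwrites the first. If that matters, a possible variation is to collect the values for each type in an array. The sketch below works on plain [type, value] pairs rather than Hpricot elements so that it can be run on its own; group_tels is a hypothetical helper, and the third phone number is made up for the example.

```ruby
# Hypothetical helper: groups [type, value] pairs by type, collecting every
# value in an array so that repeated types are not overwritten.
def group_tels(pairs)
  pairs.each_with_object(Hash.new { |hash, key| hash[key] = [] }) do |(type, value), tels|
    key = type.to_s.empty? ? :default : type.downcase.to_sym
    tels[key] << value.strip
  end
end

tels = group_tels([
  [nil, "+1-727-231-0101"],
  ["Fax", "+1-727-258-0207"],
  ["fax", "+1-727-258-0208"] # made-up second fax number, for illustration
])
p tels[:default]
p tels[:fax]
```

With this shape, hcard.tels[:fax] would return a list of numbers instead of a single string, so the tests above would need to change accordingly.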

Microformats: Machine CSS

Saturday, June 16th, 2007

I have in the past expressed skepticism about the claim that Microformats are “Designed for humans first and machines second”.

I would argue that Microformats are to machines what CSS is to humans.

There is more than one similarity between what CSS and Microformats are trying to achieve, and even in how they manifest themselves in terms of implementation. They do, after all, share a common platform: HTML.

One of the most important common goals shared by Microformats and CSS is making the resources they decorate more meaningful to the receiver of those resources. And while making a heading bright orange and bold would make it look more like a heading, something directly linked to a human reader’s understanding, decorating a div with class="vcard" facilitates a program’s perception of what the marked-up data is representing and how it is to be treated.

Imagine a tag cloud in two states, before and after its entries have been enhanced with rel-tag. A casual human reader would always perceive the entries as tags regardless of rel-tag and would in fact be oblivious to its existence. The human reader recognizes the tags because of the way they look, as instructed by the web page’s stylesheet. A tag aggregator script on the other hand would not recognize the content of the cloud as tags before it was marked as so with rel-tag.

Ultimately, and through various levels of indirection, most, if not all, information is to be used by or for humans. It is probably the number of levels of indirection that signifies what is designed primarily for humans and what is designed primarily for machines.

Thank you Safari

Wednesday, June 13th, 2007


I have been a Camino devotee for a long time, but the improved search UI in the recently released Safari 3 Public Beta has impressed me. I was particularly pleased to discover that it works just as well when searching through a web page’s source code.

Erubis

Friday, June 8th, 2007

If you haven’t heard of it already, or if you have and have yet to give it a spin, I strongly recommend Erubis as an alternative to ERB for your Rails app. Setting it up is dead simple: require 'erubis/helpers/rails_helper' in config/environment.rb. We’ve been using it for all of our Rails projects for the last few months and have noticed a dramatic improvement in rendering speed.