<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>nutrun &#187; Microformats</title>
	<atom:link href="http://nutrun.com/weblog/category/microformats/feed/" rel="self" type="application/rss+xml" />
	<link>http://nutrun.com</link>
	<description>nutrun</description>
	<lastBuildDate>Thu, 24 Jun 2010 11:14:11 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>Parsing Microformats: rel-tag, adr, hCard</title>
		<link>http://nutrun.com/weblog/parsing-microformats-rel-tag-adr-hcard/</link>
		<comments>http://nutrun.com/weblog/parsing-microformats-rel-tag-adr-hcard/#comments</comments>
		<pubDate>Wed, 27 Jun 2007 17:32:15 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Microformats]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://new-site.nutrun.com/?p=98</guid>
		<description><![CDATA[rel-tag Out of all Microformats, rel-tag is one of the simplest, therefore one of the easiest to parse. I find it useful to treat Microformat object representations as Structures without behavior, with a set of member attributes that map to their properties to capture their state. A Scanner class takes care of parsing a given [...]]]></description>
			<content:encoded><![CDATA[<h3>rel-tag</h3>
<p>
	Out of all <a href="http://microformats.org">Microformats</a>, <a href="http://microformats.org/wiki/rel-tag">rel-tag</a> is one of the simplest, therefore one of the easiest to parse.
</p>
<p>
	I find it useful to treat Microformat object representations as <a href="http://nutrun.com/weblog/ruby-struct/">Structures</a> without behavior, with a set of member attributes that map to their properties to capture their state. A Scanner class takes care of parsing a given piece of HTML to collect all rel-tag occurrences.
</p>
<p>[ruby]<br />
class RelTagTest < Test::Unit::TestCase<br />
  def test_rel_tag_extraction<br />
    html = %(<br />
      <a href="http://technorati.com/tag/tech" rel="tag">tech</a><br />
      <a href="http://technorati.com/tag/rock" rel="tag">rock</a><br />
    )</p>
<p>    tags = RelTagScanner.find_all(html)</p>
<p>    assert_equal(2, tags.size)<br />
    assert_equal("tech", tags[0].name)<br />
    assert_equal("http://technorati.com/tag/tech", tags[0].url)<br />
    assert_equal("rock", tags[1].name)<br />
    assert_equal("http://technorati.com/tag/rock", tags[1].url)<br />
  end<br />
end<br />
[/ruby]</p>
<p>
	The implementation for the above specification is compact and straightforward.
</p>
<p>[ruby]<br />
require "rubygems"<br />
require "hpricot"</p>
<p>class RelTag < Struct.new(:url, :name);end</p>
<p>class RelTagScanner<br />
  def self.find_all(html)<br />
    (Hpricot(html)/"[@rel=tag]").map do |tag|<br />
      RelTag.new(tag[:href], tag.inner_text)<br />
    end<br />
  end<br />
end<br />
[/ruby]</p>
<p>
	The <code>RelTag</code> class is a <code>Struct</code> with two members, <code>url</code> and <code>name</code>. In <code>RelTagScanner</code>&#8217;s <code>find_all</code> method, we ask <a href="http://code.whytheluckystiff.net/hpricot/">Hpricot</a> to fetch all elements with a <code>rel="tag"</code> attribute, and from each of those we extract the <code>href</code> and the inner text to populate a <code>RelTag</code> object.
</p>
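<p>
	As a minimal sketch of what the <code>Struct</code> gives us for free (plain Ruby, no Hpricot involved; the sample values are taken from the test above):
</p>

```ruby
# A Struct-generated class: two members, no custom behavior.
class RelTag < Struct.new(:url, :name); end

tag = RelTag.new("http://technorati.com/tag/tech", "tech")

# Each member gets a reader (and a writer) automatically.
puts tag.name
puts tag.url

# Structs compare by value, which keeps test assertions simple.
same = RelTag.new("http://technorati.com/tag/tech", "tech")
puts tag == same

# to_a returns the member values in declaration order.
p tag.to_a
```

<p>
	Value equality is the main reason I reach for <code>Struct</code> here: two tags scraped from different pages are equal when their members are.
</p>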
<h3>adr</h3>
<p>
	Even though <a href="http://microformats.org/wiki/adr">adr</a>&#8217;s schema specifies more properties, parsing it is more straightforward than rel-tag, as all the adr fields are marked up with <code>class="field"</code> constructs, where <code>field</code> is the name of the property.
</p>
<p>[ruby]<br />
class AdrTest < Test::Unit::TestCase<br />
  def test_adr_extraction<br />
    html = %(</p>
<div class="adr">
<div class="street-address">665 3rd St.</div>
<div class="extended-address">Suite 207</div>
<p>      <span class="locality">San Francisco</span>,<br />
      <span class="region">CA</span><br />
      <span class="postal-code">94107</span></p>
<div class="country-name">U.S.A.</div>
</p></div>
<p>    )</p>
<p>    adr = AdrScanner.find_all(html)[0]</p>
<p>    assert_equal("665 3rd St.", adr.street_address)<br />
    assert_equal("Suite 207", adr.extended_address)<br />
    assert_equal("San Francisco", adr.locality)<br />
    assert_equal("CA", adr.region)<br />
    assert_equal("94107", adr.postal_code)<br />
    assert_equal("U.S.A.", adr.country_name)<br />
  end<br />
end<br />
[/ruby]</p>
<p>
	Again, the <code>Adr</code> class extends a class generated by <code>Struct.new</code>, with members that correspond to the adr spec&#8217;s <a href="http://microformats.org/wiki/adr#Property_List">Property List</a>.
</p>
<p>[ruby]<br />
class Adr < Struct.new(:post_office_box, :extended_address, :street_address, :locality, :region, :postal_code, :country_name);end</p>
<p>class AdrScanner<br />
  def self.find_all(html)<br />
    doc = Hpricot(html)<br />
    (doc/".adr").map do |adr|<br />
      Adr.new(*Adr.members.map { |m| (adr/".#{m.to_s.gsub('_', '-')}").inner_text })<br />
    end<br />
  end<br />
end<br />
[/ruby]</p>
<p>
	In this case, it suffices for the <code>AdrScanner</code> to detect all elements that are marked up as <code>class="adr"</code>. For each of those elements, we extract the text of the matching nested properties (e.g. <code>class="locality"</code>) and pass the values, in member order, to the constructor of <code>Adr</code>.
</p>
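<p>
	The member-to-selector mapping is the only trick here, so here is a rough sketch of it in isolation. The <code>extracted</code> hash is a hypothetical stand-in for the text Hpricot would return per selector; it is not part of the scanner above.
</p>

```ruby
Adr = Struct.new(:post_office_box, :extended_address, :street_address,
                 :locality, :region, :postal_code, :country_name)

# Struct.members returns symbols on Ruby 1.9+ (strings on 1.8), so
# to_s keeps the underscore-to-hyphen conversion working on both.
selectors = Adr.members.map { |m| ".#{m.to_s.gsub('_', '-')}" }

# Hypothetical stand-in for the text a parser would find per selector.
extracted = {
  ".street-address"   => "665 3rd St.",
  ".extended-address" => "Suite 207",
  ".locality"         => "San Francisco",
  ".region"           => "CA",
  ".postal-code"      => "94107",
  ".country-name"     => "U.S.A."
}

# Missing properties fall back to an empty string, then the values
# are splatted into the constructor in member order.
adr = Adr.new(*selectors.map { |s| extracted.fetch(s, "") })

p selectors
puts adr.locality
```

<p>
	Because the values are passed positionally, the order of the <code>Struct</code> members is what ties each selector to its attribute.
</p>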
<h3>hCard</h3>
<p>
	Things get slightly more complicated when parsing <a href="http://microformats.org/wiki/hcard">hCards</a>. hCard is a compound Microformat, in the sense that it contains other Microformats, notably adr. In addition to that, the <code>tel</code> property can appear more than once, whilst it can also accept an optional <code>type</code> attribute.
</p>
<p>[html]</p>
<div>Phone: <span class="tel">+1-727-231-0101</span></div>
<div>
  <span class="tel"><br />
    <span class="type">Fax</span>:<br />
    <span class="value">+1-727-258-0207</span><br />
  </span>
</div>
<p>[/html]</p>
<p>
	Keeping this in mind, it would make sense to treat the simple members of hCard in a similar fashion to that of the previous examples. The <code>adr</code> member should be of type <code>Adr</code>, whereas the phone numbers could go in a Hash field and be retrieved as <code>hcard.tels[:type]</code>.
</p>
<p>[ruby]<br />
class HcardTest < Test::Unit::TestCase<br />
  def setup<br />
    html = %(
<div class="vcard">
<div class="fn org">Wikimedia Foundation Inc.</div>
<div class="adr">
<div class="street-address">200 2nd Ave. South #358</div>
<div>
          <span class="locality">St. Petersburg</span>,<br />
          <abbr class="region" title="Florida">FL</abbr><br />
          <span class="postal-code">33701-4313</span>
        </div>
<div class="country-name">USA</div>
</p></div>
<div>Phone: <span class="tel">+1-727-231-0101</span></div>
<div>Email: <span class="email">info@wikimedia.org</span></div>
<div>
        <span class="tel"><br />
          <span class="type">Fax</span>:<br />
          <span class="value">+1-727-258-0207</span><br />
        </span>
      </div>
</p></div>
<p>)<br />
    @hcard = HcardScanner.find_all(html)[0]<br />
  end</p>
<p>  def test_simple_members<br />
    assert_equal("Wikimedia Foundation Inc.", @hcard.fn)<br />
    assert_equal("Wikimedia Foundation Inc.", @hcard.org)<br />
    assert_equal("info@wikimedia.org", @hcard.email)<br />
  end</p>
<p>  def test_tels<br />
    assert_equal("+1-727-231-0101", @hcard.tels[:default])<br />
    assert_equal("+1-727-258-0207", @hcard.tels[:fax])<br />
  end</p>
<p>  def test_adr<br />
    assert_equal("200 2nd Ave. South #358", @hcard.adr.street_address)<br />
    assert_equal("St. Petersburg", @hcard.adr.locality)<br />
    assert_equal("FL", @hcard.adr.region)<br />
    assert_equal("33701-4313", @hcard.adr.postal_code)<br />
    assert_equal("USA", @hcard.adr.country_name)<br />
  end<br />
end<br />
[/ruby]</p>
<p>
	Because <code>tel</code> does not necessarily require the nested <code>class="type"</code> and <code>class="value"</code> elements, we treat the number that is marked up solely as <code>class="tel"</code> as the default.
</p>
<p>[ruby]<br />
class Hcard < Struct.new(:fn, :org, :email, :tels, :adr);end</p>
<p>class HcardScanner<br />
  def self.find_all(html)<br />
    doc = Hpricot(html)<br />
    (doc/".vcard").map do |vcard|<br />
      hcard = Hcard.new(*[:fn, :org, :email].map { |m| (vcard/".#{m}").inner_text })<br />
      hcard.tels = find_tels(vcard)<br />
      hcard.adr = AdrScanner.find_all(vcard.to_html)[0]<br />
      hcard<br />
    end<br />
  end</p>
<p>  private</p>
<p>  def self.find_tels(vcard)<br />
    tels = {}<br />
    (vcard/".tel").each do |tel|<br />
      type = (tel/".type").inner_text<br />
      if type.empty?<br />
        type = :default<br />
        value = tel.inner_text<br />
      else<br />
        type = type.downcase.to_sym<br />
        value = (tel/".value").inner_text<br />
      end<br />
      tels[type] = value<br />
    end<br />
    tels<br />
  end<br />
  private_class_method :find_tels # a bare "private" has no effect on "def self." methods<br />
end<br />
[/ruby]</p>
<p>
	For each <code>vcard</code> element found in the HTML, we construct a new <code>Hcard</code> object. We can use the <code>AdrScanner</code> to extract the adr element and pass it on to the <code>Hcard</code>. For every occurrence of <code>tel</code>, we check for a nested <code>class="type"</code> element; when it is present, the text of the <code>class="value"</code> element is added to the <code>Hcard#tels</code> hash, keyed by the lowercased type. When it is absent, we add the phone number to the <code>tels</code> hash keyed as <code>:default</code>.</p>
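<p>
	The keying rule can be sketched without any HTML at all. Below, <code>tels_from</code> is a hypothetical helper (not part of <code>HcardScanner</code>) that takes <code>[type, value]</code> pairs, assuming they have already been pulled out of the markup:
</p>

```ruby
# Hypothetical helper: builds the tels hash from [type, value] pairs.
# A nil or blank type means the element was a bare class="tel".
def tels_from(pairs)
  pairs.each_with_object({}) do |(type, value), tels|
    label = type.to_s.strip
    key = label.empty? ? :default : label.downcase.to_sym
    tels[key] = value.strip
  end
end

tels = tels_from([
  [nil,   "+1-727-231-0101"],
  ["Fax", " +1-727-258-0207 "]
])

puts tels[:default]
puts tels[:fax]
```

<p>
	Note that, as in the scanner above, a second number of the same type overwrites the first, so a card with two untyped numbers keeps only the last one under <code>:default</code>.
</p>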
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/parsing-microformats-rel-tag-adr-hcard/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Microformats: Machine CSS</title>
		<link>http://nutrun.com/weblog/microformats-machine-css/</link>
		<comments>http://nutrun.com/weblog/microformats-machine-css/#comments</comments>
		<pubDate>Sat, 16 Jun 2007 12:08:21 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Microformats]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://new-site.nutrun.com/?p=97</guid>
		<description><![CDATA[I have in the past expressed skepticism about the claim that Microformats are &#8220;Designed for humans first and machines second&#8221;. I would argue that Microformats are to machines what CSS is to humans. There is more than one similarity between what CSS and Microformats are trying to achieve, and even in how they manifest themselves in [...]]]></description>
			<content:encoded><![CDATA[<p>
I have in the past <a href="http://nutrun.com/weblog/how-microformats-will-simplify-the-web/" title="How microformats will simplify the web">expressed skepticism</a> about the claim that Microformats are <cite>&#8220;Designed for humans first and machines second&#8221;</cite>.
</p>
<p>
I would argue that <strong>Microformats are to machines what CSS is to humans</strong>.
</p>
<p>
There is more than one similarity between what CSS and Microformats are trying to achieve, and even in how they manifest themselves in terms of implementation. They do, after all, share a common platform &#8211; HTML.
</p>
<p>
One of the most important common goals shared by Microformats and CSS is making the resources they decorate more meaningful to the receiver of those resources. And while making a heading bright orange and bold would make it look more like a heading, something directly linked to a human reader&#8217;s understanding, decorating a <code>div</code> with <code>class="vcard"</code>  facilitates a program&#8217;s perception of what the marked-up data is representing and how it is to be treated.
</p>
<p>
Imagine a tag cloud in two states, before and after its entries have been enhanced with <a href="http://microformats.org/wiki/rel-tag" title="rel-tag">rel-tag</a>. A casual human reader would always perceive the entries as tags regardless of <code>rel-tag</code> and would in fact be oblivious to its existence. The human reader recognizes the tags because of the way they look, as instructed by the web page&#8217;s stylesheet. A tag aggregator script, on the other hand, would not recognize the content of the cloud as tags until it was marked as such with <code>rel-tag</code>.
</p>
<p>
Ultimately, and through various levels of indirection, most, if not all, information is to be used by or for humans. It is probably the number of levels of indirection that signifies what is designed primarily for humans and what is designed primarily for machines.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/microformats-machine-css/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>HMachine</title>
		<link>http://nutrun.com/weblog/hmachine/</link>
		<comments>http://nutrun.com/weblog/hmachine/#comments</comments>
		<pubDate>Thu, 24 May 2007 23:20:45 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Microformats]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://new-site.nutrun.com/?p=94</guid>
		<description><![CDATA[20 odd lines of a dirty hack for a Microformats parser (thanks to open-uri and Hpricot). %w(rubygems hpricot open-uri).each {&#124;l&#124; require l} module Microformats class Microformat < Struct def self.for(uri) mf = new name = mf.class.name.split('::').last.downcase doc = Hpricot(open(uri)) members.each do &#124;m&#124; eval %{ val = doc%('.#{name} .#{m.gsub('_', '-')}') mf.#{m} = val.inner_text.strip if not val.nil? [...]]]></description>
			<content:encoded><![CDATA[<p>
20 odd lines of a dirty hack for a <a href="http://microformats.org/" title="Microformats">Microformats</a> parser (thanks to <a href="http://www.ruby-doc.org/stdlib/libdoc/open-uri/rdoc/" title="open-uri">open-uri</a> and <a href="http://code.whytheluckystiff.net/hpricot/" title="Hpricot">Hpricot</a>).
</p>
<pre>
%w(rubygems hpricot open-uri).each {|l| require l}
module Microformats
  class Microformat < Struct
    def self.for(uri)
      mf = new
      name = mf.class.name.split('::').last.downcase
      doc = Hpricot(URI.open(uri)) # Kernel#open stopped accepting URLs in Ruby 3.0
      members.each do |m|
        eval %{
          val = doc%('.#{name} .#{m.to_s.gsub('_', '-')}')
          mf.#{m} = val.inner_text.strip if not val.nil?
        }
      end
      mf
    end
    class << self; alias :/ :for end
  end
end
</pre>
<p>
Adding support for microformat specifications can be achieved as:
</p>
<pre>
module Microformats
  class MyFormat < Microformat.new(:x, :y, :z);end
end
</pre>
<p>
Where <code>:x, :y, :z</code> are the Microformat's properties.<br />
As a more concrete example, let's add support for (part of) <a href="http://microformats.org/wiki/hreview" title="HReview">HReview</a>:
</p>
<pre>
class HReview < Microformat.new(:summary, :fn,
                                :dtreviewed, :description,
                                :rating);end
</pre>
<p>
In action...
</p>
<pre>
include Microformats

hr = HReview.for('http://www.amk.ca/books/h/Velocity_of_Honey')
p hr.fn

# =&gt; "The Velocity of Honey: And More Science of Everyday Life"
</pre>
<p>
... or even slicker...
</p>
<pre>
hr = HReview/'http://www.amk.ca/books/h/Velocity_of_Honey'
</pre>
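<p>
The two tricks the hack relies on, inheriting class methods through <code>Struct</code> and aliasing one of them to <code>/</code>, can be sketched without touching the network. <code>Base</code> and <code>selector_for</code> are hypothetical names for this illustration, not part of the code above:
</p>

```ruby
# Hypothetical base class: class methods defined here are inherited by
# every class that Base.new generates, just like Microformat.for above.
class Base < Struct
  # Build the CSS selector for one member, e.g. ".hreview .dtreviewed".
  def self.selector_for(member)
    root = name.split("::").last.downcase
    ".#{root} .#{member.to_s.gsub('_', '-')}"
  end

  # Expose the same class method under the / operator.
  class << self
    alias_method :/, :selector_for
  end
end

# Base.new(...) returns a new Struct class whose superclass is Base.
class HReview < Base.new(:summary, :fn, :dtreviewed); end

puts HReview.selector_for(:fn)
puts HReview / :dtreviewed
```

<p>
Calling <code>new</code> on a member-less <code>Struct</code> subclass generates a fresh class rather than an instance, which is what lets <code>HReview</code> pick up both its members and the inherited class methods.
</p>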
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/hmachine/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>How microformats will simplify the web</title>
		<link>http://nutrun.com/weblog/how-microformats-will-simplify-the-web/</link>
		<comments>http://nutrun.com/weblog/how-microformats-will-simplify-the-web/#comments</comments>
		<pubDate>Tue, 17 Apr 2007 00:09:54 +0000</pubDate>
		<dc:creator>George Malamidis</dc:creator>
				<category><![CDATA[Microformats]]></category>
		<category><![CDATA[Software]]></category>

		<guid isPermaLink="false">http://new-site.nutrun.com/?p=89</guid>
		<description><![CDATA[For the humorous sake of it, let me start with what I don&#8217;t like about Microformats. To cite the site (heh..) Microformats are &#8220;Designed for humans first and machines second&#8221;. Every time I see the word &#8220;humans&#8221; in relation to something to do with technology, I kind of turn red. Partly because I&#8217;m the sort [...]]]></description>
			<content:encoded><![CDATA[<p>
For the humorous sake of it, let me start with what I <em>don&#8217;t</em> like about <a href="http://microformats.org/" title="Microformats">Microformats</a>. To cite the site (heh..), Microformats are <cite>&#8220;Designed for humans first and machines second&#8221;</cite>. Every time I see the word <em>&#8220;humans&#8221;</em> in relation to something to do with technology, I kind of turn red. Partly because I&#8217;m the sort of person who would always choose to use the ATM instead of going through the <em>Human Bank-Employee</em>, partly because everything <em>designed for humans</em> is going to end up being used first and foremost by <a href="http://nutrun.com/weblog/making-spam-more-human/" title="Making SPAM more Human">spammers</a>.
</p>
<p>
You see, there&#8217;s no denying that software is meant to be used by, or to facilitate the lives of, humans. Things like Microformats, however, are only interesting &#8211; and, at least at a low level, are destined to remain so &#8211; to a very specific subset of humans: <em>Programmers</em>. So I&#8217;d be way better off spared the hippy <em>humane</em> talk.
</p>
<p>
Having gotten this semantic (there we go&#8230;) complaint off my chest, I think Microformats are great. In fact, I find them to be an idea as big as REST over SOAP style Web Services.
</p>
<p>
The magic lies in the simplicity that escorts Microformats and the household status of their platform. There&#8217;s no funny/strict schema in a mile, only mere <em>semantic enhancements</em> to one of the most widely used mediums on the web today: <strong>HTML</strong> (primarily, although I think it should be <em>exclusively</em>).
</p>
<p>
Let&#8217;s take the following example of something reminiscent of a weblog post:
</p>
<pre>
&lt;html&gt;
  &lt;head&gt;&lt;title&gt;Who needs Atom?&lt;/title&gt;&lt;/head&gt;
  &lt;body&gt;
    &lt;h1&gt;Who needs Atom?&lt;/h1&gt;
    &lt;h2&gt;
      Posted on Friday, April 13, 2007
      by Dave Mustaine
    &lt;/h2&gt;
    &lt;p&gt;
      Really... Who needs it? Or RSS, come to think of it...
    &lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;</pre>
<p>
Now, let&#8217;s add a tiny bit of non-intrusive, albeit meaningful, semantic coating to our mark-up:
</p>
<pre>
&lt;html&gt;
  &lt;head&gt;&lt;title&gt;Who needs Atom?&lt;/title&gt;&lt;/head&gt;
  &lt;body&gt;
    &lt;h1 class="title"&gt;Who needs Atom?&lt;/h1&gt;
    &lt;h2&gt;
      Posted on &lt;span class="date"&gt;Friday, April 13, 2007&lt;/span&gt;
      by &lt;span class="author"&gt;Dave Mustaine&lt;/span&gt;
    &lt;/h2&gt;
    &lt;p class="content"&gt;
      Really... Who needs it? Or RSS, come to think of it...
    &lt;/p&gt;
  &lt;/body&gt;
&lt;/html&gt;
</pre>
<p>
Suddenly, our document can say <em>a lot</em> to a syndication engine or a human developer with a <a href="http://code.whytheluckystiff.net/hpricot/">website scraping API</a> at hand. And there&#8217;s no need to maintain a <code>feed.xml</code>, or anything similar. The website really <em>is</em> the weblog.
</p>
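<p>
To give a rough idea of how little a consumer needs, here is a deliberately naive sketch that pulls the four annotated fields out of the mark-up above. It uses a regular expression instead of a real scraping library purely to stay dependency-free, and <code>text_for</code> is a hypothetical helper; a proper parser would be the better tool for real pages.
</p>

```ruby
html = <<~HTML
  <h1 class="title">Who needs Atom?</h1>
  <h2>
    Posted on <span class="date">Friday, April 13, 2007</span>
    by <span class="author">Dave Mustaine</span>
  </h2>
  <p class="content">
    Really... Who needs it? Or RSS, come to think of it...
  </p>
HTML

# Hypothetical helper. A regular expression is not an HTML parser: this
# only copes with flat, well-formed markup like the sample above.
def text_for(html, css_class)
  pattern = %r{<(\w+)[^>]*\bclass="#{Regexp.escape(css_class)}"[^>]*>(.*?)</\1>}m
  match = html.match(pattern)
  match && match[2].strip
end

# Collect the annotated fields into a plain hash.
post = {}
%w(title date author content).each do |field|
  post[field.to_sym] = text_for(html, field)
end

puts post[:title]
puts post[:author]
```

<p>
The class names are the whole contract: nothing in the consumer depends on the document using an <code>h1</code> for the title or a <code>span</code> for the author.
</p>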
<p>
What Microformats, alongside REST, are proudly showcasing is how much can be achieved by concentrating on two simple things: Meaningful URLs (<em>Where</em> the resources are and will be) and meaningful mark-up (<em>What</em> the resources are about). Once this is achieved, anyone can do whatever they want with them, because the only thing a website needs to qualify as a weblog, resume, web-service API &#8211; and the beat goes on&#8230; &#8211; is <em>a little bit of meaning</em>.</p>
]]></content:encoded>
			<wfw:commentRss>http://nutrun.com/weblog/how-microformats-will-simplify-the-web/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
