Beginner’s Guide to Discovery – Part II: People vs. Machines

Discovery is a machine-oriented process. While people can access the data being discovered, the focus is on making it easy for machines to access and extract. But when the data is intended for both people and machines, developers have to choose between providing it once and offering it in two separate containers.

Discovery in Plain Sight

The fact that screen scraping drives many useful services shows that there is great value in machines being able to access human-readable data. Microformats offer a simple and powerful mechanism for adding machine-readable context to such data. An important design goal of Microformats is that the information remains visible when viewed in a browser – the data the end user sees in a web browser is identical to the data a machine reads when parsing the HTML document source.

For example, I can write about my friend Chris Messina who works for Vidoop and follows my Twitter stream, and at the same time allow machines to find out which word is his first name, which is last, which part is his organizational affiliation, and who owns the Twitter stream I mentioned – all of which are obvious to a human reader.

Looking at the HTML source of this blog post will reveal how this is done using the hCard and XFN Microformats:

I can write about my friend <div class="vcard">
<a class="url" href="http://factoryjoe.com/">
<span class="fn n">
<span class="given-name">Chris</span>
<span class="family-name">Messina</span>
</span>
</a> who works for
<a class="fn org url" href="http://vidoop.com/">Vidoop</a>
</div>
and follows my
<a href="http://twitter.com/theRazorBlade" rel="me">Twitter stream</a>, and at the

In this example, hCard is used to add semantic meaning to Chris’ name and employer, while the XFN rel="me" attribute indicates a relationship between the owner of this page – me – and the resource linked from the ‘Twitter stream’ text. Since this blog acts as my online identity, claiming other resources as ‘me’ provides an easy way to enhance it. To make this claim stronger and verifiable, I can have my Twitter page point back to my blog using the same technique, which Twitter does for me automatically. Since this bidirectional referencing requires control (which implies ownership) of both resources, it can only be made by the resources’ true owner.
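The bidirectional check described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the blog URL, the Twitter profile markup, and the helper names (`rel_me_links`, `is_mutual_claim`) are assumptions made for the example, and the URL comparison is deliberately crude.

```python
from html.parser import HTMLParser


class RelMeParser(HTMLParser):
    """Collect href values of <a> and <link> elements whose rel includes "me"."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel is a space-separated list of values, e.g. rel="me nofollow"
        rels = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "me" in rels and href:
            self.links.append(href.strip())


def rel_me_links(html):
    """Return every URL the page claims with rel="me"."""
    parser = RelMeParser()
    parser.feed(html)
    parser.close()
    return parser.links


def normalize(url):
    # Crude comparison key: ignores case and a trailing slash only.
    return url.strip().rstrip("/").lower()


def is_mutual_claim(url_a, html_a, url_b, html_b):
    """True only if page A claims B and page B claims A."""
    a_claims_b = normalize(url_b) in {normalize(u) for u in rel_me_links(html_a)}
    b_claims_a = normalize(url_a) in {normalize(u) for u in rel_me_links(html_b)}
    return a_claims_b and b_claims_a


# Hypothetical inputs: the blog URL and the Twitter profile markup are made up.
BLOG_URL = "http://blog.example.com/"
TWITTER_URL = "http://twitter.com/theRazorBlade"
blog_html = '<p>follows my <a href="http://twitter.com/theRazorBlade" rel="me">Twitter stream</a></p>'
twitter_html = '<a rel="me nofollow" href="http://blog.example.com">Web</a>'

print(is_mutual_claim(BLOG_URL, blog_html, TWITTER_URL, twitter_html))
print(is_mutual_claim(BLOG_URL, blog_html, TWITTER_URL, "<p>no links</p>"))
```

A real consumer would fetch both pages over HTTP, resolve relative links, and follow redirects before comparing; the sketch skips all of that to keep the two-way test itself visible.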

Combining hCard and rel="me" can further enhance the marked-up information to indicate that, of all the contact information available on my blog, the entry marked with a rel="me" attribute is about me, while all the rest are about people I have some kind of relationship with. Another useful Microformat is hCalendar, for dates and events.

Using Microformats, machines can perform identity discovery by extracting information about us. Microformats are designed so that this information is available in plain sight for humans, putting people and machines on an equal footing. For example, the information I wrote about Chris is visible, so when he visits my blog he can see everything I wrote about him. I should not be able to publish his home phone number in a blog post in a way that makes it hard for him to find out about it while making it even easier for telemarketers to extract.
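To make the extraction step concrete, here is a minimal sketch of how a machine might pull the name and organization out of the hCard markup shown earlier. It uses only Python's standard library, handles just three hCard properties, and assumes well-formed HTML; the class name `HCardParser` and the helper `extract_hcards` are invented for this example and are not a complete hCard parser.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
PROPERTIES = ("given-name", "family-name", "org")


class HCardParser(HTMLParser):
    """Collect a few hCard properties from each element with class "vcard"."""

    def __init__(self):
        super().__init__()
        self.stack = []         # class list of every currently open element
        self.cards = []         # one dict per completed vcard
        self.current = None     # the vcard being filled, if any
        self.card_depth = None  # stack depth at which that vcard was opened

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append(classes)
        if "vcard" in classes and self.current is None:
            self.current = {}
            self.card_depth = len(self.stack)

    def handle_startendtag(self, tag, attrs):
        # Self-closing elements carry no text, so they are ignored entirely.
        pass

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.stack:
            return
        if self.current is not None and len(self.stack) == self.card_depth:
            self.cards.append(self.current)
            self.current = None
        self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if self.current is None or not text:
            return
        # Classes of all elements open inside (and including) the vcard.
        open_classes = {c for classes in self.stack[self.card_depth - 1:] for c in classes}
        for prop in PROPERTIES:
            if prop in open_classes:
                self.current[prop] = (self.current.get(prop, "") + " " + text).strip()


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


sample = """I can write about my friend <div class="vcard">
<a class="url" href="http://factoryjoe.com/">
<span class="fn n">
<span class="given-name">Chris</span>
<span class="family-name">Messina</span>
</span>
</a> who works for
<a class="fn org url" href="http://vidoop.com/">Vidoop</a>
</div>"""

print(extract_hcards(sample))
```

Note that the parser reads exactly the text a visitor sees on the page: the words "who works for" are skipped only because no property class wraps them, not because they are hidden.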

The Hidden Layer

Visitors to my blog might care about who I am, what I look like, what other blogs I read, how to get in touch with me, what copyright license I use, and what I usually write about. However, there is an additional set of information about me that is not directly valuable to people but can greatly enhance my experience and presence online. This includes my preferred providers for identity, content aggregation, address book, social graph, and others. It can also provide persistent configuration that can enhance my online experience.

OpenID is an example of such a hidden layer: visitors to my blog do not care who my identity provider is. Specifying the provider in the blog’s HTML header keeps it hidden from human visitors while allowing me to use the blog’s URL as my OpenID identifier. Given the early state of discovery, there aren’t many services available today in the hidden layer, but as the technology evolves, many other preferences can be added next to the OpenID information. XRDS-Simple takes things further by moving information such as my OpenID provider from the HTML page to a separate document more suitable for hidden machine-only data.
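As an illustration of what lives in that hidden layer, the sketch below reads OpenID `<link>` elements from a page's `<head>`. The `rel` values `openid.server`, `openid.delegate`, `openid2.provider`, and `openid2.local_id` are the ones used by OpenID's HTML-based discovery; the provider URLs, the sample page, and the function name `openid_links` are placeholders assumed for this example.

```python
from html.parser import HTMLParser

OPENID_RELS = {"openid.server", "openid.delegate",
               "openid2.provider", "openid2.local_id"}


class OpenIDLinkParser(HTMLParser):
    """Collect OpenID <link> elements found inside <head>."""

    def __init__(self):
        super().__init__()
        self.in_head = False
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "link" and self.in_head:
            attr = dict(attrs)
            href = attr.get("href")
            # One link may carry several rel values separated by spaces.
            for rel in (attr.get("rel") or "").lower().split():
                if rel in OPENID_RELS and href:
                    self.found.setdefault(rel, href)  # first occurrence wins

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def openid_links(html):
    parser = OpenIDLinkParser()
    parser.feed(html)
    parser.close()
    return parser.found


# Hypothetical page: the provider and identity URLs are placeholders.
page = """<html><head>
<title>My blog</title>
<link rel="openid2.provider openid.server" href="https://openid.example.net/server">
<link rel="openid2.local_id openid.delegate" href="https://me.openid.example.net/">
</head><body><p>Nothing about OpenID is visible here.</p></body></html>"""

for rel, href in sorted(openid_links(page).items()):
    print(rel, "->", href)
```

None of these values appear in the rendered page, which is the point: a relying party can find the provider while a human visitor never sees it.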

Know Your Audience

Choosing what belongs in the hidden layer instead of plain sight isn’t trivial. The question is usually asked in the context of technical discussions about tools – Microformats vs. XRDS-Simple for example – but has significant ramifications beyond technology. Microformats treat data with the idea of people-first-machines-second. They allow taking a web page that was designed for people and making it more useful by giving machines hints as to how to interact with it. XRDS-Simple, on the other hand, is a method for providing additional data that is not meant for humans and is focused on automation by machines.
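For contrast with the Microformats approach, here is a rough sketch of a machine reading a machine-only document of the kind XRDS-Simple describes. It assumes an XRDS document has already been located and downloaded, and looks up the endpoint URIs for a given service type, lowest `priority` value first. The sample document, its endpoint URLs, and the function name `find_service_uris` are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the XRD elements inside an XRDS document.
XRD = "{xri://$xrd*($v*2.0)}"

# Hypothetical document; the endpoint URLs are placeholders.
XRDS_DOC = """<XRDS xmlns="xri://$xrds">
  <XRD xmlns="xri://$xrd*($v*2.0)" version="2.0">
    <Type>xri://$xrds*simple</Type>
    <Service priority="20">
      <Type>http://specs.openid.net/auth/2.0/signon</Type>
      <URI>https://backup.example.net/server</URI>
    </Service>
    <Service priority="10">
      <Type>http://specs.openid.net/auth/2.0/signon</Type>
      <URI>https://openid.example.net/server</URI>
    </Service>
  </XRD>
</XRDS>"""


def priority_key(service):
    # A lower number means higher priority; a missing priority sorts last.
    value = service.get("priority")
    return (value is None, int(value) if value is not None else 0)


def find_service_uris(xrds_xml, service_type):
    """Return the URIs of all services of the given type, best priority first."""
    root = ET.fromstring(xrds_xml)
    matches = []
    for service in root.iter(XRD + "Service"):
        types = [el.text.strip() for el in service if el.tag == XRD + "Type" and el.text]
        if service_type in types:
            matches.append(service)
    matches.sort(key=priority_key)
    return [el.text.strip()
            for service in matches
            for el in service
            if el.tag == XRD + "URI" and el.text]


print(find_service_uris(XRDS_DOC, "http://specs.openid.net/auth/2.0/signon"))
```

Nothing in this document is written for a human reader, which is what separates it from the marked-up blog text: there is no visible page to keep in sync with what the machine sees.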

The decision about which tools to use for answering the questions listed in the introduction to this guide should be based on who is asking them, not on which tool offers the easiest way to deliver the answer. Since discovery isn’t applicable when the questions are asked by people alone (it is a machine process after all), we are left with two possibilities: machines-only or people-and-machines. It is easy to identify when the questions are asked by machines only, but much more difficult when they are asked by people and machines alike.

3 thoughts on “Beginner’s Guide to Discovery – Part II: People vs. Machines”

  1. I have to admit that I am still not sold on RDF and the current use cases for it. I still find XRDS much easier to work with and it seems to offer everything needed for the kind of discovery I am talking about. But as I post more on the subject it would be great to get feedback from RDF folks about how the same use cases can be implemented with RDF.
