Host-meta (aka Site-meta) and Well-Known URIs

Over the past two weeks, Well-Known URIs (draft-nottingham-site-meta) completed its last-call review at the IETF and is now pending IESG review before publication as an RFC. In addition, Host-meta (draft-hammer-hostmeta) was introduced to fill the gap created by the recent revisions.

There is a lot of confusion about these drafts, not because they are complicated – they are pretty simple – but because of how they evolved and ended up solving different problems. It makes a good story for wannabe standards editors but that’s for another day.

Well-Known URIs was originally proposed as ‘site-meta’, a well-known location document at the root of the domain, used as a registry of other metadata and policy documents:

It is increasingly common for Web-based protocols to require the discovery of policy or metadata before making a request. For example, the Robots Exclusion Protocol specifies a way for automated processes to obtain permission to access resources; likewise, the Platform for Privacy Preferences tells user-agents how to discover privacy policy beforehand.

While there are several ways to access per-resource metadata (e.g., HTTP headers, WebDAV’s PROPFIND), the perceived overhead associated with them often precludes their use in these scenarios.

When this happens, it is common to designate a “well-known location” for such metadata, so that it can be easily located. However, this approach has the drawback of risking collisions, both with other such designated “well-known locations” and with pre-existing resources.

The idea was to create (one last) well-known location where all future documents would simply register, avoiding the need to create more well-known locations. This was originally proposed as a new XML format, later changed to a simpler HTTP-header-like text format, and has now become a URI path prefix without any document format associated with it.

The name site-meta was changed to host-meta to better reflect the scope of the metadata provided by the document. For many people, a site is a functional term which often means multiple host names used together to provide a complete service. For example, http://twitter.com and http://search.twitter.com are considered by most people to be a single site, but they are two separate hosts. The term host is used as defined by RFC 3986.
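To make the site-versus-host distinction concrete, here is a minimal sketch that compares the host component of two URIs. The helper names `host_of` and `same_host` are made up for illustration, and the comparison relies on Python's `urlsplit`, which lowercases the host and keeps the port separate.

```python
from urllib.parse import urlsplit


def host_of(uri: str) -> str:
    """Return the lowercased host component of a URI (hypothetical helper)."""
    host = urlsplit(uri).hostname  # hostname excludes the port and userinfo
    if host is None:
        raise ValueError(f"no host component in {uri!r}")
    return host


def same_host(a: str, b: str) -> bool:
    """True when both URIs share the same host component."""
    return host_of(a) == host_of(b)


if __name__ == "__main__":
    # One "site" to most people, but two separate hosts.
    print(same_host("http://twitter.com", "http://search.twitter.com"))
```

Under this reading, each of the two Twitter hosts would carry its own host-level metadata, even though users think of them as one site.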

As the proposal evolved, it became clear that it was not going to satisfy the needs of many protocols, because it would have forced them to go through one more level of indirection (and to deal with another file format) just to get to their metadata. Instead of creating a well-known location document, the proposal now creates a directory under which future well-known location documents should reside.

It also establishes a registry to avoid name collisions within the ‘/.well-known/’ namespace. In addition, it adds a ‘.’ character to the name to reduce the likelihood of conflicts with common username namespaces on sites that allow users to use URIs such as http://example.com/joe as their profile page.
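As a rough sketch of what the path prefix means in practice, the snippet below builds the well-known URI for a registered name from any URI on the same host. The function `well_known_uri` is a hypothetical helper, and the name "host-meta" is used only as an example of a registered name.

```python
from urllib.parse import urlsplit, urlunsplit


def well_known_uri(resource_uri: str, name: str) -> str:
    """Build the '/.well-known/<name>' URI for the host of resource_uri.

    Hypothetical helper: 'name' stands for a name taken from the registry.
    """
    parts = urlsplit(resource_uri)
    if not parts.scheme or not parts.netloc:
        raise ValueError(f"not an absolute URI with a host: {resource_uri!r}")
    if not name or "/" in name:
        raise ValueError(f"invalid well-known name: {name!r}")
    # Keep scheme and authority; replace the path and drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, f"/.well-known/{name}", "", ""))


if __name__ == "__main__":
    print(well_known_uri("http://example.com/joe", "host-meta"))
```

Because the prefix starts with a ‘.’, a profile page such as http://example.com/joe cannot collide with anything under it.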

The Well-Known URIs proposal is currently pending IESG review and will hopefully be approved for publication as an RFC soon.

What also became clear during this process is that there is a need for a metadata document about a host. The original idea for site-meta was to serve as a collection of links, but as we discussed more use cases, it became clear that we were also looking for a document to hold actual protocol metadata, not just point to it elsewhere.

Web-based protocols often require the discovery of host policy or metadata, where host is not a single resource but the entity controlling the collection of resources identified by URIs with a common host as defined by RFC 3986.  While these protocols have a wide range of metadata needs, they often define metadata that is concise, has simple syntax requirements, and can benefit from storing its metadata in a common location used by other related protocols.

Because there is no URI or a resource available to describe a host, many of the methods used for associating per-resource metadata (such as HTTP headers) are not available. This often leads to the overloading of the root HTTP resource (e.g. ‘http://example.com/’) with host metadata that is not specific to the root resource (e.g. a home page or web application), and which often has nothing to do with it.

Another new feature of the host-meta proposal is the use of XRD as the document schema. This simplifies things for the protocols looking to use it, since they all already need to parse XRD documents. For example, this simple host-meta document for the ‘example.com’ and ‘www.example.com’ hosts provides a link for host-wide copyright information and a link template providing a URI for obtaining resource-specific metadata for each resource within the host-meta document scope:


<?xml version='1.0' encoding='UTF-8'?>
<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'
     xmlns:hm='http://host-meta.net/ns/1.0'>

    <hm:Host>example.com</hm:Host>
    <hm:Host>www.example.com</hm:Host>

    <Link rel='license'
          href='http://example.com/license'>
        <Title xml:lang='en-us'>Site License Policy</Title>
    </Link>
    <Link rel='describedby'
          template='http://meta.example.com?uri={uri}'>
        <Title xml:lang='en-us'>Resource Descriptor</Title>
    </Link>
</XRD>
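
To show how a client might consume the document above, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The function names `parse_host_meta` and `expand_template` are hypothetical, and percent-encoding the resource URI before substituting it for `{uri}` is my assumption, not something stated in this post.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

XRD_NS = "http://docs.oasis-open.org/ns/xri/xrd-1.0"
HM_NS = "http://host-meta.net/ns/1.0"

# The example document from above, as bytes.
HOST_META = b"""<?xml version='1.0' encoding='UTF-8'?>
<XRD xmlns='http://docs.oasis-open.org/ns/xri/xrd-1.0'
     xmlns:hm='http://host-meta.net/ns/1.0'>
    <hm:Host>example.com</hm:Host>
    <hm:Host>www.example.com</hm:Host>
    <Link rel='license' href='http://example.com/license'>
        <Title xml:lang='en-us'>Site License Policy</Title>
    </Link>
    <Link rel='describedby' template='http://meta.example.com?uri={uri}'>
        <Title xml:lang='en-us'>Resource Descriptor</Title>
    </Link>
</XRD>"""


def parse_host_meta(xml_bytes: bytes):
    """Return (hosts, links) from a host-meta XRD document."""
    root = ET.fromstring(xml_bytes)
    hosts = [el.text.strip() for el in root.findall(f"{{{HM_NS}}}Host") if el.text]
    links = [
        {"rel": el.get("rel"), "href": el.get("href"), "template": el.get("template")}
        for el in root.findall(f"{{{XRD_NS}}}Link")
    ]
    return hosts, links


def expand_template(template: str, resource_uri: str) -> str:
    """Substitute the percent-encoded resource URI for '{uri}' (assumed encoding)."""
    return template.replace("{uri}", quote(resource_uri, safe=""))


if __name__ == "__main__":
    hosts, links = parse_host_meta(HOST_META)
    print(hosts)
    for link in links:
        if link["template"]:
            print(link["rel"], expand_template(link["template"], "http://example.com/joe"))
        else:
            print(link["rel"], link["href"])
```

A real client would also check that the host it asked about appears in the `hm:Host` list before trusting the links; that check is left out here to keep the sketch short.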

The next step for host-meta is to add a trust profile section which will deal with some security and authority issues common to many of the protocols looking to use it, such as WebFinger, Salmon, and OpenID. Since host-meta now includes many of the new features offered by LRDD, that specification will have to change as well.
