Burr Metadata Framework Overview

It's been called the Great Conversation: a global cocktail party where everyone is talking at the same time. The Web is a vast slurry of content and services. It works especially well for new content because services like Google, Digg and YouTube help popular content bubble to the surface. It's a bit like throwing pasta at a wall. If content is ready for prime time it sticks; if not, more often than not, it falls. But not all content that has value is either new or popular. And when that content stops sticking to the wall it sinks to the bottom of the slurry, and much of it is eventually lost forever.

Until recently such content was published on physical media, then collected and cataloged into libraries and archives so that it could be preserved and accessed in the future. But today nearly all content is created in native digital formats and increasingly never makes it onto physical formats at all. The Web is not designed, institutionally or technologically, to perform the role that brick-and-mortar libraries and archives have performed in the past.

This is the role that the Burr Metadata Framework (BMF) has been designed to fill.

BMF provides a framework for building structured, distributed, indexed collections of content. Pages/records are linked to each other using defined-by links, which point to the records that define things. Collections are organized by defining the relationships (e.g. broader, narrower, related, equivalent) between the concept a record represents and the concepts represented by other records. BMF uses a distributed architecture and works much the same way as distributed version control systems like Git and Arch. Client software downloads records and content from other collections and weaves them into a local collection by dynamically generating indexes of backlinks (similar to the way Google indexes Web content for its search engine).
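As a rough illustration of the backlink indexing described above, the following minimal sketch inverts a set of defined-by links so that each record can list the records that point at it. The record IDs, the dictionary layout and the function name build_backlink_index are assumptions made for this example; they are not taken from the BMF specification.

```python
from collections import defaultdict

# Hypothetical records: each record ID maps to the IDs it points at
# through defined-by links. IDs and layout are illustrative assumptions.
records = {
    "rec:moby-dick": ["rec:herman-melville", "rec:novel"],
    "rec:billy-budd": ["rec:herman-melville", "rec:novella"],
    "rec:novella": ["rec:novel"],
}


def build_backlink_index(records):
    """Invert defined-by links: map each target to the records that point at it."""
    index = defaultdict(list)
    for source_id, targets in records.items():
        for target_id in targets:
            index[target_id].append(source_id)
    # Sort the sources so the index is stable regardless of input order.
    return {target: sorted(sources) for target, sources in index.items()}


backlinks = build_backlink_index(records)
print(backlinks["rec:herman-melville"])
# ['rec:billy-budd', 'rec:moby-dick']
```

Because the index is derived entirely from the records themselves, a client could regenerate it locally whenever new records are downloaded, which is the sense in which indexing is local rather than centralized.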

The approach BMF uses has a number of advantages and features which are worth mentioning:

  • Interoperability is as simple as defining the relationships between concepts represented by records.
  • Multiple copies of content are kept in multiple locations, ensuring that content is not corrupted or lost.
  • Indexing is distributed and local rather than provided solely by centralized third-party search services like Google.
  • Context (the who, what, when, where associated with content) is preserved, which is especially important for older and less accessed content located far down the Long Tail.
  • Simply using a collection adds greater value to it over time as records are refined, gaps are filled, new relationships are established and indexes become increasingly fine grained.
  • Collections can be mirrors of other collections or a union of subsets of content from any number of other collections. Collections can be established for a variety of purposes including long term archival storage of master formats, public libraries providing content converted into popular formats, or personal working libraries made up of working formats used by software used for creating or editing content.
  • A globally unique ID system is integrated with a RESTful URI syntax which can be used for identification, defining complex queries, retrieval, and automated conversion of content into different formats and encodings.
  • Fine-grained access control uses a federated identity system based on OpenID, which allows multiple roles to be defined and associated with the identity of an individual or corporate body.
  • Collections are designed to scale from small personal collections that could fit on an iPhone up to distributed superclusters of servers as large as the Googleplex.
  • BMF leverages smart data structures so that relatively dumb and cheap applications can be used to process them. This is in contrast with the World Wide Web, which requires smart and expensive applications to spider, index and search it.
  • Just as Cascading Style Sheets were designed to separate presentation from structure, BMF is designed to separate semantic markup from its structure. This is done by defining extensible, globally interoperable, distributed catalog languages rather than monolithic global subject and authority catalogs.
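
The idea that a collection can be a union of subsets drawn from other collections can be sketched as follows. This is only an illustration under stated assumptions: the collection names, record IDs, the weave_collection function and the "first copy wins" rule are all hypothetical, since the text does not describe how BMF resolves duplicates.

```python
def weave_collection(sources, wanted):
    """Build a local collection from selected records of other collections.

    sources: mapping of collection name -> {record_id: record}
    wanted:  mapping of collection name -> set of record IDs to copy
    """
    local = {}
    for name, record_ids in wanted.items():
        for record_id in sorted(record_ids):
            if record_id in sources[name]:
                # Assumption: the first copy encountered is kept. The source
                # text does not specify a merge policy.
                local.setdefault(record_id, sources[name][record_id])
    return local


# Hypothetical remote collections.
sources = {
    "archive": {"rec:a": "master copy of a", "rec:b": "master copy of b"},
    "library": {"rec:b": "popular format of b", "rec:c": "popular format of c"},
}

local = weave_collection(
    sources,
    {"archive": {"rec:a", "rec:b"}, "library": {"rec:b", "rec:c"}},
)
print(sorted(local))
# ['rec:a', 'rec:b', 'rec:c']
```

A mirror is then just the special case where every record of a single source collection is requested.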

BMF has been in development for over eight years. It has been chosen for use in three large-scale archival projects which are now in the early stages of development. Documentation, technical specifications and schemas are being completed now in preparation for submitting BMF to an open standards body in the fourth quarter of 2009.


