You are currently viewing a snapshot of www.mozilla.org taken on April 21, 2008. Most of this content is highly out of date (some pages haven't been updated since the project began in 1998) and exists for historical purposes only. If there are any pages on this archive site that you think should be added back to www.mozilla.org, please file a bug.



RDF Technical Overview

Guha
Robert Churchill
John Giannandrea


Status: Mostly Accurate
This 1999-08-10 revision by DB cleans out some of the more obviously obsolete references, so it should be mostly accurate now. It also incorporates text salvaged from elsewhere describing the datasource architecture. 1999-08-10: added links to formal W3C specs.

Last updated: $Id: api.html,v 1.5 1999/08/11 15:14:42 daniel.brickley%bristol.ac.uk Exp $


Overview

This document is a high level overview of the RDF code in Mozilla. You need to understand the material in this document before hacking the Mozilla RDF code. You'll also need to read the formal W3C specifications. We do not attempt a formal overview of RDF here. Instead, consult the W3C Recommendation for RDF Model and Syntax. See also the RDF Schema Proposed Recommendation, which describes the RDF type system and model for metadata vocabulary description.

If all you want to do is use the Mozilla RDF APIs for your own application, you don't need to understand everything given here, but it might be a good idea anyway.

If you are looking for material on how RDF is used in applications, see the 'Aurora' document or the guide or the XUL Template Reference; these describe how RDF data sources manifest themselves in the user interface.

The Basic Idea

We have a lot of different pieces of structured data --- bookmarks, history, file systems, document structures, sitemaps, etc. The code that creates, accesses, and manipulates each of these is completely independent, so each has its own storage system, editing and viewing tools, query and manipulation APIs, etc. There is a substantial lost opportunity here, because the data models used by all these structures overlap considerably: all of them are instances of directed labeled graphs. So, the basic idea behind RDF is: if you can manifest yourself via the RDF data model (which is built upon directed labeled graphs), there is a marketplace of services that you can utilize. Some of these services include:
  1. Viewers and Editors for these structures.
  2. Persistent Storage
  3. Query Mechanisms
  4. Inferential services such as type checking and inheritance.
  5. Compositing, i.e., the ability to provide merged views of multiple graphs. Uses of this are described later.
  6. Serialization and transmission via the RDF-XML format.
  7. and many other services that we haven't yet thought about ...
Another way of looking at it is as follows: just as COM/beans/... allow pieces of code to work together because they manifest a common object model, RDF tries to do the same thing for data, with a common data model built upon directed labeled graphs.

What is RDF

Informal Overview

RDF, in its simplest form, is a directed labeled graph. Imagine triples of the following form: a node (keyed off of a URL such as "http://people.netscape.com/rjc/") with a labeled arc (such as "Owner Name") pointing to another node (such as "Robert John Churchill"). Each triple is an RDF "assertion" or "statement".
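To make the triple idea concrete, here is a minimal sketch in Python. It is not Mozilla code: the Statement type, its field names, and the targets helper are assumptions made up for illustration. Only the example URL, arc label, and name come from the text above.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF assertion: an arc from one node to another (hypothetical type)."""
    source: str  # node the arc starts from, keyed off of a URL
    label: str   # label carried by the arc
    target: str  # node the arc points to


# A graph is simply a set of statements.
graph = {
    Statement("http://people.netscape.com/rjc/", "Owner Name", "Robert John Churchill"),
}


def targets(graph, source, label):
    """Return every node reached from `source` by an arc labeled `label`."""
    return sorted(s.target for s in graph if s.source == source and s.label == label)
```

Calling targets(graph, "http://people.netscape.com/rjc/", "Owner Name") walks the single arc in this graph and yields the owner's name.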

Most people, including end users, are familiar with hierarchies: from file systems to Netscape's bookmark window, a hierarchy helps to indicate organization in a tree. Through the use of mechanisms (i.e., XUL templates and RDF's built-in notion of containership) that project an RDF graph onto a tree structure, we can present hierarchically oriented interfaces to arbitrarily complex RDF graph structures.

Question: How does data make its way into RDF graphs?

Answer: For every type of data, there is an associated data source. For example, files and folders from the local file system are reflected into RDF's graph via a file system data source that responds to queries from RDF's engine.
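As a rough illustration of a data source that answers graph queries on demand, the following Python sketch reflects a directory into arcs. The class name, the "child" arc label, and the get_targets method are all hypothetical. They are not the Mozilla file system data source or its vocabulary.

```python
import os
import tempfile


class FileSystemDataSource:
    """Hypothetical read-only data source that reflects folders as arcs."""

    CHILD = "child"  # assumed arc label, not a real Mozilla property

    def get_targets(self, source, label):
        # Only "child" arcs out of directories are answered; the data source
        # stays silent about everything else.
        if label != self.CHILD or not os.path.isdir(source):
            return []
        return sorted(os.path.join(source, name) for name in os.listdir(source))


def demo():
    """Build a throwaway directory and list its children through the data source."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "docs"))
        open(os.path.join(root, "notes.txt"), "w").close()
        ds = FileSystemDataSource()
        return [os.path.basename(p) for p in ds.get_targets(root, ds.CHILD)]
```

Nothing is copied into a separate store here: the arcs are computed from the file system at the moment the query is made, which is the sense in which the data is "reflected" into the graph.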

Model vs Syntax

There are a couple of very different things meant by the term RDF.
  • RDF as a data model / data abstraction layer / query language. Directed Labeled Graphs (DLG) are a very general mechanism for representing things, so a wide range of information can be modeled as a DLG. It doesn't matter how the information is stored on disk or transmitted over the wire --- if it can be modeled as a DLG, we can make it queryable as a DLG.
  • RDF as a file format using XML. It would be nice to have a canonical file format to ship snippets of RDF across the wire. This is the RDF file format. Note that future W3C work may result in alternative mechanisms for shipping RDF graphs around in XML. The RDF 1.0 syntax is just one of many possible ways we might use XML to interchange RDF.
Nodes in RDF DLGs are Resources in the sense of URIs. This means that you can get two different graphs from different sources that reference some of the same nodes. You can superpose the two graphs (making sure that the common nodes are properly aligned) and you have just aggregated the information from the two sources.

This aggregation ability is used all over the place with RDF for personalization, overriding, etc.
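The superposition of two graphs that share nodes can be sketched as a set union over triples. The data below (arc labels, dates, and the bookmarks and history variables) is invented for illustration, and plain tuples stand in for RDF statements.

```python
# Two hypothetical graphs from different sources. Both mention the same
# node, identified by the same URI.
bookmarks = {
    ("http://www.mozilla.org/", "title", "Mozilla"),
}
history = {
    ("http://www.mozilla.org/", "last visited", "1999-08-10"),
    ("http://www.w3.org/", "last visited", "1999-08-09"),
}


def superpose(*graphs):
    """Merge graphs; nodes align automatically because they share URIs."""
    merged = set()
    for g in graphs:
        merged |= g
    return merged


def arcs_out(graph, node):
    """Return the (label, target) pairs for every arc leaving `node`."""
    return sorted((label, target) for (source, label, target) in graph if source == node)
```

After merging, arcs_out(superpose(bookmarks, history), "http://www.mozilla.org/") returns arcs contributed by both sources, even though neither source knew about the other.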

RDF Datasource

An RDF datasource represents a directed labeled graph. See the Datasource HOWTO guide for more details on the RDF datasource APIs.

Every RDF graph consists of

  1. A set of nodes which are either 'resources' or 'literals'. Resources in the Mozilla RDF API are defined in the interface nsIRDFResource. Literal nodes are defined in the interface nsIRDFLiteral.
  2. A set of arcs, each labeled with a resource (actually in the RDF specs a specific sub-type of 'resource' called 'property') and a truth value, i.e., a true/false label.
An RDF datasource (nsIRDFDataSource) can be aggregated with other RDF graphs using nsIRDFCompositeDataSource. This interface presents as a single graph an aggregation of an ordered list of RDF data sources, each contributing a portion of the graph. The aggregation of the graphs is defined by simple superpositioning. The ordering of data sources specifies a priority: if an arc appears in multiple data sources with different truth values, the arc from the higher-priority data source overrides. Each data source itself is identified by a URI. Note that the RDF Model and Syntax itself only provides for positive assertions; the ability to store negative assertions is a behind-the-scenes mechanism and does not have any representation in the XML syntax.
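The override rule can be sketched in Python as follows. This models the idea only and is not the nsIRDFCompositeDataSource API: the class names, method names, and URIs are hypothetical, and treating the first entry in the list as the highest priority is an assumption of this sketch.

```python
class InMemoryDataSource:
    """Hypothetical data source storing arcs together with a truth value."""

    def __init__(self, uri):
        self.uri = uri   # each data source is itself identified by a URI
        self._arcs = {}  # (source, label, target) -> True or False

    def assert_arc(self, source, label, target, truth=True):
        self._arcs[(source, label, target)] = truth

    def truth_of(self, source, label, target):
        """Return True or False if the arc is present, None if this source is silent."""
        return self._arcs.get((source, label, target))


class CompositeDataSource:
    """Hypothetical aggregate: an ordered list, highest priority first (assumed)."""

    def __init__(self, data_sources):
        self.data_sources = list(data_sources)

    def has_assertion(self, source, label, target):
        # The first data source with an opinion about the arc wins.
        for ds in self.data_sources:
            truth = ds.truth_of(source, label, target)
            if truth is not None:
                return truth
        return False


# Made-up example: shipped defaults list a toolbar entry, and the user's own
# data source negates it without touching the defaults.
defaults = InMemoryDataSource("urn:example:defaults")
personal = InMemoryDataSource("urn:example:personal")
defaults.assert_arc("urn:example:toolbar", "child", "http://example.org/", True)
personal.assert_arc("urn:example:toolbar", "child", "http://example.org/", False)
composite = CompositeDataSource([personal, defaults])
```

Here composite.has_assertion("urn:example:toolbar", "child", "http://example.org/") is False, because the negative assertion in the higher-priority source masks the positive one below it. This is one way the overriding and personalization described above could work.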

Some standard data sources already in progress are

  • A file encoded in RDF, MCF, Netscape bookmarks or any of the data file formats that Navigator understands.
  • FTP directories
  • Local file systems (A, C, D, etc. drives, Mac Volumes, etc.)
  • Berkeley DB encodings of RDF
  • Browser history

More data sources can easily be added. We hope this will happen with the help of developers outside Netscape.

The Mozilla RDF Query API is a very standard graph query API. It can be used both to traverse and to edit the graph. It is easy to expose a new source of data via the RDF APIs. To do this, one provides a wrapper (around that data source) that implements the RDF APIs.

A data source could be a read-write store or a read-only store. It can also be a read-partial-write store, i.e., it can execute only some of the edits presented to it. For example, folder-based file system directories are far less expressive than general RDF graphs. In the more general model, it is possible to make statements like "File001 contains the response to email0017". Neither the file system nor the email system is capable of representing such a statement, so the wrapper for the file system (or email system) can legally refuse to perform such an edit. If a more general-purpose RDF data source (for example, one based on Berkeley DB) is also available, that data source could perform the addition. The user of the query API need not know the difference.
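A read-partial-write arrangement might look like the Python sketch below. Everything in it is hypothetical: the class names, the "child" and "response-to" arc labels, and the convention that assert_arc returns False to refuse an edit. It only illustrates how a caller can stay unaware of which store accepted a statement.

```python
class FileSystemWrapper:
    """Hypothetical wrapper that can only represent folder containment."""

    def __init__(self):
        self.arcs = set()

    def assert_arc(self, source, label, target):
        if label != "child":  # anything beyond containment is refused
            return False
        self.arcs.add((source, label, target))
        return True


class GeneralStore:
    """Hypothetical general-purpose store that accepts any statement."""

    def __init__(self):
        self.arcs = set()

    def assert_arc(self, source, label, target):
        self.arcs.add((source, label, target))
        return True


def assert_in_first_willing(data_sources, source, label, target):
    """Offer the edit to each data source in order; return the one that took it."""
    for ds in data_sources:
        if ds.assert_arc(source, label, target):
            return ds
    return None


files = FileSystemWrapper()
general = GeneralStore()
accepted_by = assert_in_first_willing(
    [files, general], "File001", "response-to", "email0017"
)
```

In this run the file system wrapper declines the statement and the general store records it, so accepted_by is the general store. The caller made a single request and did not need to know which store would be able to hold it.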