RDF Technical Overview
Guha
Robert Churchill
John Giannandrea
Status: Mostly Accurate
This 1999-08-10 revision by DB cleans out some of the more obviously obsolete references, so it should be mostly accurate now. It also incorporates text salvaged from elsewhere describing the datasource architecture. 1999-08-10: added links to formal W3C specs.
Last updated: $Id: api.html,v 1.5 1999/08/11 15:14:42 daniel.brickley%bristol.ac.uk Exp $
Overview
This document is a high-level overview of the RDF code in Mozilla. You need to understand the material in this document before hacking the Mozilla RDF code. You'll also need to read the formal W3C specifications. We do not attempt a formal overview of RDF here. Instead, consult the W3C Recommendation for RDF Model and Syntax. See also the RDF Schema Proposed Recommendation, which describes the RDF type system and model for metadata vocabulary description.
If all you want to do is use the Mozilla RDF APIs for your own application, you don't need to understand everything given here, but it might be a good idea anyway.
If you are looking for material on how RDF is used in applications, see the 'Aurora' document or the guide or the XUL Template Reference; these describe how RDF data sources manifest themselves in the user interface.
The Basic Idea
We have a lot of different pieces of structured data: bookmarks, history, file systems, document structures, sitemaps, etc. The code that creates, accesses and manipulates each of these is completely independent, so each has its own storage system, editing and viewing tools, query and manipulation APIs, etc. There is a substantial lost opportunity here, because the data models used by all these structures overlap considerably: all of them are instances of directed labeled graphs. So the basic idea behind RDF is this: if you can manifest yourself via the RDF data model (which is built upon directed labeled graphs), there is a marketplace of services that you can utilize. Some of these services include:
- Viewers and editors for these structures.
- Persistent Storage
- Query Mechanisms
- Inferential services such as type checking and inheritance.
- Compositing, i.e., the ability to provide merged views of multiple graphs. Uses of this are described later.
- Serialization and transmission via the RDF-XML format.
- and many other services that we haven't yet thought about ...
What is RDF
Informal Overview
RDF, in its simplest form, is a directed labeled graph. Imagine triples of the following form: nodes (keyed off of URLs such as "http://people.netscape.com/rjc/") with arcs (such as "Owner Name") pointing to other nodes (such as "Robert John Churchill"). Each triple is an RDF "assertion" or "statement".
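To make the triple idea concrete, here is a minimal Python sketch of a graph held as a set of statements. It is an illustration only and not the Mozilla API: the `Statement` type, the helper functions and the second ("Page Kind") statement are all assumptions made for the example.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One RDF assertion: an arc labeled `label` running from `source` to `target`."""
    source: str
    label: str
    target: str


# The first statement is the example from the text; the second is made up
# purely to give the node a second outgoing arc.
GRAPH = {
    Statement("http://people.netscape.com/rjc/", "Owner Name", "Robert John Churchill"),
    Statement("http://people.netscape.com/rjc/", "Page Kind", "Home Page"),  # hypothetical
}


def arc_labels_out(graph, source):
    """Labels of every arc leaving `source`, sorted for a stable order."""
    return sorted({s.label for s in graph if s.source == source})


def targets(graph, source, label):
    """Nodes reached from `source` by following arcs labeled `label`."""
    return sorted(s.target for s in graph if s.source == source and s.label == label)


print(arc_labels_out(GRAPH, "http://people.netscape.com/rjc/"))
print(targets(GRAPH, "http://people.netscape.com/rjc/", "Owner Name"))
```

Everything else in this overview (data sources, queries, compositing) can be read as different ways of producing or consuming such statements.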
Most people, including end users, are familiar with hierarchies. From file systems to Netscape's bookmark window, a hierarchy helps to indicate organization in a tree. Through the use of mechanisms (i.e., XUL templates and RDF's built-in notion of containership) that project an RDF graph onto a tree structure, we can present hierarchically oriented interfaces to arbitrarily complex RDF graph structures.
Question: How does data make its way into RDF graphs?
Answer: For every type of data, there is an associated data source. For example, files and folders from the local file system are reflected into RDF's graph via a file system data source that responds to queries from RDF's engine.
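As a rough sketch of what "responds to queries" can mean, the Python below answers a graph query by looking at the disk only when asked, rather than copying the file system into a graph up front. The class name, the `get_targets` method and the `child` arc label are assumptions made for this example; they are not the names used by the Mozilla file system data source.

```python
import os
import tempfile

CHILD = "child"  # hypothetical arc label meaning "directory contains entry"


class FileSystemDataSource:
    """Read-only sketch: answers graph queries by inspecting the disk on demand."""

    def get_targets(self, source, label):
        """Return the nodes reached from `source` via arcs labeled `label`."""
        if label != CHILD or not os.path.isdir(source):
            return []
        return sorted(os.path.join(source, name) for name in os.listdir(source))


def demo():
    """Build a throwaway directory and list its children through the data source."""
    with tempfile.TemporaryDirectory() as root:
        open(os.path.join(root, "notes.txt"), "w").close()
        os.mkdir(os.path.join(root, "mail"))
        children = FileSystemDataSource().get_targets(root, CHILD)
        return [os.path.basename(path) for path in children]


print(demo())
```

The caller sees only nodes and arcs; whether the answer came from a directory listing or a stored graph is hidden behind the query.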
Model vs Syntax
There are a couple of very different things meant by the term RDF.
- RDF as a data model / data abstraction layer / query language. Directed Labeled Graphs (DLG) are a very general mechanism for representing things. Naturally, it turns out that you can model a wide range of information as a DLG. It doesn't matter how the information is stored on disk or transmitted over the wire: if it can be modeled as a DLG, we can make it queryable as a DLG.
- RDF as a file format using XML. It would be nice to have a canonical file format to ship snippets of RDF across the wire. This is the RDF File format. Note that future W3C work may result in alternative mechanisms for shipping RDF graphs around in XML. The RDF 1.0 syntax is just one of many possible ways we might use XML to interchange RDF.
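The relationship between the two meanings can be sketched by reading a small RDF/XML fragment back into triples. The Python below uses only the standard library and handles just the simplest form of the syntax (an `rdf:Description` with an `rdf:about` attribute and literal-valued child elements); it is not a conforming RDF parser. The `ex` namespace and the `ownerName` property are hypothetical names invented for the example.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# A tiny RDF/XML document; the ex: vocabulary is made up for illustration.
DOC = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/terms#">
  <rdf:Description rdf:about="http://people.netscape.com/rjc/">
    <ex:ownerName>Robert John Churchill</ex:ownerName>
  </rdf:Description>
</rdf:RDF>
"""


def parse_simple_rdf(text):
    """Return (subject, property URI, literal) triples from the simplest RDF/XML form."""
    root = ET.fromstring(text)
    triples = []
    for description in root.findall(f"{{{RDF_NS}}}Description"):
        subject = description.get(f"{{{RDF_NS}}}about")
        for prop in description:
            # ElementTree spells qualified names as "{namespace}local".
            namespace, local = prop.tag[1:].split("}", 1)
            triples.append((subject, namespace + local, (prop.text or "").strip()))
    return triples


for triple in parse_simple_rdf(DOC):
    print(triple)
```

The same triples could equally have come from a database or a directory listing, which is the point of keeping the model separate from the syntax.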
The ability to aggregate several graphs into one merged view (compositing) is used throughout RDF for personalization, overriding, etc.
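One way to picture aggregation used for overriding is a stack of graphs queried in priority order, with a user's own graph placed in front of a shared default graph. The Python below is a minimal sketch under that assumption. All class names, method names, URNs and labels are hypothetical, and the actual Mozilla composite data source may resolve conflicts differently.

```python
class InMemoryDataSource:
    """A graph held as a set of (source, label, target) tuples."""

    def __init__(self, statements=()):
        self._statements = set(statements)

    def get_targets(self, source, label):
        return sorted(t for (s, l, t) in self._statements if s == source and l == label)


class CompositeDataSource:
    """Merged view over several data sources; earlier layers take precedence."""

    def __init__(self, *layers):
        self._layers = list(layers)

    def get_target(self, source, label):
        """Single-valued query: the first layer with an answer wins."""
        for layer in self._layers:
            found = layer.get_targets(source, label)
            if found:
                return found[0]
        return None

    def get_targets(self, source, label):
        """Multi-valued query: union of all layers, in layer order, without duplicates."""
        merged = []
        for layer in self._layers:
            for target in layer.get_targets(source, label):
                if target not in merged:
                    merged.append(target)
        return merged


defaults = InMemoryDataSource({
    ("urn:example:toolbar", "label", "Bookmarks"),
    ("urn:example:toolbar", "icon", "default.gif"),
})
user = InMemoryDataSource({("urn:example:toolbar", "label", "My Links")})
view = CompositeDataSource(user, defaults)

print(view.get_target("urn:example:toolbar", "label"))  # user layer overrides
print(view.get_target("urn:example:toolbar", "icon"))   # falls through to defaults
```

Neither underlying graph is modified; the personalization exists only in the merged view.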
RDF Datasource
An RDF datasource represents a directed labeled graph. See the Datasource HOWTO guide for more details on the RDF datasource APIs.
Every RDF graph consists of
- A set of nodes which are either 'resources' or 'literals'. Resources in the Mozilla RDF API are defined in the interface nsIRDFResource. Literal nodes are defined in the interface nsIRDFLiteral.
- A set of arcs, each labeled with a resource (actually in the RDF specs a specific sub-type of 'resource' called 'property') and a truth value, i.e., a true/false label.
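The two ingredients listed above can be written down as a small data model. The Python below is only a sketch of the structure described in the text, not a rendering of the Mozilla interfaces: the class names, the `has_assertion` helper and the `urn:example:ownerName` property are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Resource:
    """A node identified by a URI."""
    uri: str


@dataclass(frozen=True)
class Literal:
    """A node that is just a value, such as a string."""
    value: str


Node = Union[Resource, Literal]


@dataclass(frozen=True)
class Arc:
    """A labeled arc. The label is itself a resource (a 'property')."""
    source: Resource
    label: Resource
    target: Node
    truth_value: bool = True


def has_assertion(arcs, source, label, target, truth_value=True):
    """True if the graph holds exactly this arc with this truth value."""
    return Arc(source, label, target, truth_value) in arcs


page = Resource("http://people.netscape.com/rjc/")
owner_name = Resource("urn:example:ownerName")  # hypothetical property

ARCS = {
    Arc(page, owner_name, Literal("Robert John Churchill")),
    # A false-labeled arc: the graph records that this statement does not hold.
    Arc(page, owner_name, Literal("Someone Else"), truth_value=False),
}

print(has_assertion(ARCS, page, owner_name, Literal("Robert John Churchill")))
print(has_assertion(ARCS, page, owner_name, Literal("Someone Else"), truth_value=False))
```

Keeping the truth value on the arc lets a graph distinguish "known to be false" from "not stated at all".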
Some standard data sources already in progress are
- A file encoded in RDF, MCF, Netscape bookmarks or any of the data file formats that Navigator understands.
- FTP directories
- Local file systems (A, C, D, etc. drives, Mac Volumes, etc.)
- Berkeley DB encodings of RDF
- Browser history
More data sources can easily be added. We hope this will happen with the help of the developers outside Netscape.
The Mozilla RDF Query API is a very standard graph query API. It can be used both to traverse and to edit the graph. It is easy to expose a new source of data via the RDF APIs. To do this, one provides a wrapper (around that data source) that implements the RDF APIs.
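The wrapper idea can be sketched as follows: an existing structure (here a plain dictionary of folders, standing in for any pre-existing data) is exposed through a small graph interface that supports both traversal and editing. The interface shown is a simplified stand-in invented for this example. Its method names and the `child` label are assumptions and should not be read as the Mozilla RDF API.

```python
from abc import ABC, abstractmethod

CHILD = "child"  # hypothetical arc label


class DataSource(ABC):
    """Simplified graph interface: one traversal call and two editing calls."""

    @abstractmethod
    def get_targets(self, source, label): ...

    @abstractmethod
    def assert_statement(self, source, label, target): ...

    @abstractmethod
    def unassert_statement(self, source, label, target): ...


class FolderDictDataSource(DataSource):
    """Wraps a dict mapping folder name -> list of entries, without copying it."""

    def __init__(self, folders):
        self._folders = folders

    def get_targets(self, source, label):
        return list(self._folders.get(source, [])) if label == CHILD else []

    def assert_statement(self, source, label, target):
        if label != CHILD:
            return False  # this wrapper cannot represent other arcs
        entries = self._folders.setdefault(source, [])
        if target not in entries:
            entries.append(target)
        return True

    def unassert_statement(self, source, label, target):
        if label == CHILD and target in self._folders.get(source, []):
            self._folders[source].remove(target)
            return True
        return False


def walk(data_source, root, label=CHILD):
    """Depth-first traversal using nothing but the graph interface."""
    seen, order, stack = set(), [], [root]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        order.append(node)
        stack.extend(reversed(data_source.get_targets(node, label)))
    return order


folders = {"root": ["news", "http://www.mozilla.org/"], "news": ["http://www.w3.org/"]}
ds = FolderDictDataSource(folders)
print(walk(ds, "root"))
ds.assert_statement("news", CHILD, "http://example.org/")
print(folders["news"])  # the edit reached the wrapped structure
```

Because `walk` depends only on the interface, it would work unchanged over any other data source that implements the same three calls.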
A data source could be a read-write store or a read-only store. It can also be a read-partial-write store, i.e., it can execute only some of the edits presented to it. For example, folder-based file system directories are far less expressive than general RDF graphs. In the more general model, it is possible to make statements like "File001 contains the response to email0017". Neither the file system nor the email system is capable of representing such a statement, so the wrapper for the file system (or the email system) can legally refuse to perform such an edit. If a more general purpose RDF data source (for example, one based on Berkeley DB) is available, that database could perform the addition. The user of the query API need not know the difference.
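The read-partial-write behaviour can be sketched as a chain of stores, each free to refuse an edit, with the edit handed to the first store willing to take it. This is an illustrative Python sketch only: the class names, the `child` and `responseTo` labels and the boolean return convention are assumptions for the example, not a description of how Mozilla routes edits.

```python
class FolderStore:
    """Partial-write store: understands containment and refuses everything else."""

    def __init__(self):
        self.statements = set()

    def assert_statement(self, source, label, target):
        if label != "child":
            return False  # legal refusal: a folder tree cannot express this
        self.statements.add((source, label, target))
        return True


class GeneralStore:
    """General purpose store (stand-in for a database) that accepts any statement."""

    def __init__(self):
        self.statements = set()

    def assert_statement(self, source, label, target):
        self.statements.add((source, label, target))
        return True


class FirstWillingStore:
    """Offers each edit to the stores in order and stops at the first acceptance."""

    def __init__(self, *stores):
        self.stores = stores

    def assert_statement(self, source, label, target):
        # any() short-circuits, so later stores are not asked once one accepts.
        return any(s.assert_statement(source, label, target) for s in self.stores)


file_system = FolderStore()
database = GeneralStore()
graph = FirstWillingStore(file_system, database)

graph.assert_statement("Folder1", "child", "File001")          # handled by the folder store
graph.assert_statement("File001", "responseTo", "email0017")   # refused there, stored in the database

print(sorted(file_system.statements))
print(sorted(database.statements))
```

The caller issues both edits through the same call and never needs to know which store ended up holding each statement.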