rdf: back-end architecture

Contact: Chris Waterson (waterson@netscape.com)

This document provides an overview of the RDF “core” implementation, that is, the implementation of the RDF model within Mozilla. It assumes some familiarity with the RDF model as described in the RDF Model and Syntax Specification. See also the Datasource HOWTO for details on writing a datasource.

Model and Terminology
Interfaces
Example

Model and Terminology

The Mozilla implementation of the RDF model is based on the RDF Model and Syntax Specification. This section attempts to provide a pseudo-mathematical overview of that model [1]. Although the Mozilla RDF model differs slightly from the model presented in the RDF Model and Syntax Specification, we believe that, for all practical purposes, these differences should not affect the use of the RDF model. Any difference that leads to behavior that is functionally incorrect with respect to the Model and Syntax Specification should be logged as a bug (select RDF as the “Component”).

The Mozilla RDF “universe” consists of:

A set of resources, R. There is a representation function, which is a one-to-one function that maps a resource in R to a string; this string is called the Uniform Resource Identifier (or URI) of the resource, and should conform to RFC2396. There is also a parser function, which is an onto function that maps a URI to a resource.
A set of literals, L. There is a one-to-one function that maps a literal in L to a string value.
A set of statements, S. A statement s in S is a triple in R × R × (R ∪ L); that is, S ⊆ R × R × (R ∪ L).

Resources. Because the representation function that maps a resource to a URI is one-to-one, it is possible to identify each resource using its URI. That is, given a URI, it is possible to find a unique resource in R again. [2] Informally, a resource is some sort of “Internet object” that is uniquely identifiable; for example, a web page, an email account, or a USENET news article.

The representation function gives only one canonical URI for each resource. However, as specified in RFC2396, a resource might have many different names; e.g., a web page can be reached by redirection, or the name of an HTTP or filesystem resource may contain arbitrary numbers of “.” and “..” segments. Therefore, we introduce a parser function from URI to R that is defined on arbitrary URIs. The parser function may map several URIs to the same resource; the only guarantee is that a resource’s URI, as obtained through the representation function, is always parsed back to the resource itself. [3]

For efficiency’s sake, the parser function only does local computation, so e.g. redirection is not considered. We realize that this “optimization” does not adhere to the spirit of RFC2396, and may preclude some inferences because resources are not recognized as being “the same”. But it would clearly be impractical to contact a web server to resolve each HTTP URI, just for fear that it might be redirected!
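The interplay between the two functions can be illustrated with a small sketch. This is not Mozilla’s implementation: it is a toy model that assumes the WHATWG `URL` class (built into browsers and Node.js) as a stand-in for the parser’s purely local normalization, and the names `normalize`, `parse`, `represent`, and `canonicalize` are hypothetical.

```javascript
// Hypothetical toy model of the resource set R; not Mozilla's code.
// Resources are interned by canonical URI, so every spelling of a URI
// that normalizes the same way yields the same resource object.
const resourcesByURI = new Map();

// Local-only normalization via the WHATWG URL class (an assumed stand-in
// for Mozilla's parser). For hierarchical URIs it removes "." and ".."
// path segments, among other things; input it cannot parse is kept
// verbatim. Redirects are never followed because nothing is fetched.
function normalize(uri) {
  try {
    return new URL(uri).href;
  } catch (e) {
    return uri;
  }
}

// Parser function g : URI -> R. Several URIs may map to one resource.
function parse(uri) {
  const key = normalize(uri);
  let resource = resourcesByURI.get(key);
  if (resource === undefined) {
    resource = Object.freeze({ uri: key });
    resourcesByURI.set(key, resource);
  }
  return resource;
}

// Representation function f : R -> URI. Exactly one URI per resource.
function represent(resource) {
  return resource.uri;
}

// Canonical form of a URI: u_c = f(g(u)).
function canonicalize(uri) {
  return represent(parse(uri));
}
```

For instance, `canonicalize('http://example.com/a/./b/../c')` yields `'http://example.com/a/c'`, and `parse(represent(r)) === r` holds for every resource `r` created this way, which is the round-trip guarantee described above.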

The RDF Model and Syntax specification explicitly allows for anonymous resources. An anonymous resource is a resource without URI-addressable identity. The Mozilla implementation allows for such resources (e.g., in serialized RDF/XML), but will automatically assign a uniquely generated URI to such a resource.
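A minimal sketch of how an anonymous resource might receive a generated URI, assuming a simple registry keyed by URI. The `anon:#` prefix and the names `getResource` and `getAnonymousResource` are hypothetical and illustrative only; the point is that the generated URI must not collide with any existing one.

```javascript
// Hypothetical resource registry keyed by URI; not Mozilla's code.
const registry = new Map();

function getResource(uri) {
  let resource = registry.get(uri);
  if (resource === undefined) {
    resource = { uri: uri };
    registry.set(uri, resource);
  }
  return resource;
}

// Give an anonymous resource a generated URI that no existing resource
// uses. The "anon:#" prefix is an illustrative assumption only.
let anonymousCounter = 0;

function getAnonymousResource() {
  let uri;
  do {
    anonymousCounter += 1;
    uri = "anon:#" + anonymousCounter;
  } while (registry.has(uri));
  return getResource(uri);
}
```

Once generated, the URI behaves like any other: looking it up with `getResource` returns the same anonymous resource.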

Literals. As with resources, the function that maps a literal to a string value is one-to-one. Hence, it is possible to identify each literal using its string value. The RDF Model and Syntax Specification explicitly states that the mapping from literal to string value need not be one-to-one; we’ve chosen to make the mapping one-to-one to allow for efficient comparison of literals. Informally, a literal is a primitive value that has no “first-class identity”; for example, a string, a date, or a number.
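Why a one-to-one mapping makes comparison efficient can be sketched as interning: keep one object per string value, and literal equality becomes an identity comparison. The names below are hypothetical and the code is a toy model, not Mozilla’s literal implementation.

```javascript
// Hypothetical literal table; not Mozilla's code. Because the mapping from
// literal to string value is one-to-one, literals can be interned by value.
const literalsByValue = new Map();

function getLiteral(value) {
  const key = String(value);
  let literal = literalsByValue.get(key);
  if (literal === undefined) {
    literal = Object.freeze({ value: key });
    literalsByValue.set(key, literal);
  }
  return literal;
}

// Equal string values share one object, so comparison is an identity check.
function sameLiteral(a, b) {
  return a === b;
}
```

With this scheme, `getLiteral(42)` and `getLiteral('42')` return the same object, since both have the string value `'42'`.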

Statements. A statement consists of a subject, a predicate, and an object. The subject must be a resource. The predicate must be a resource (strictly speaking, the predicate must be a resource that is a property; however, we do not differentiate between a resource that is a property and a resource that is not a property in the Mozilla implementation). The object may be either a resource or a literal. The terms statement and assertion are interchangeable.
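The typing rules for statements can be captured in a few lines. This sketch uses hypothetical plain-object nodes tagged with a `kind` field rather than Mozilla’s XPCOM node interfaces.

```javascript
// Hypothetical node and statement types; not Mozilla's classes.
function resource(uri) {
  return Object.freeze({ kind: "resource", uri: uri });
}

function literal(value) {
  return Object.freeze({ kind: "literal", value: String(value) });
}

// A statement is a (subject, predicate, object) triple in R x R x (R ∪ L).
// As in the Mozilla model, any resource may act as a predicate: there is
// no separate check that the predicate is a "property".
function makeStatement(subject, predicate, object) {
  if (subject.kind !== "resource") {
    throw new TypeError("subject must be a resource");
  }
  if (predicate.kind !== "resource") {
    throw new TypeError("predicate must be a resource");
  }
  if (object.kind !== "resource" && object.kind !== "literal") {
    throw new TypeError("object must be a resource or a literal");
  }
  return Object.freeze({ subject: subject, predicate: predicate, object: object });
}
```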

The Mozilla RDF implementation factors the set of statements S into subsets. Each subset of statements is called a datasource. For example, there is a datasource that contains statements about mail messages and news articles; there is a datasource that contains statements about the current user’s browsing history; there is a datasource that contains statements about the current user’s bookmarks. Each datasource may be addressed individually. It is possible to query a datasource to determine whether a statement is present. Statements may be added to, removed from, or altered in a datasource.
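The datasource operations described above (adding, removing, altering, querying and enumerating statements) can be sketched as a small in-memory class. The method names loosely echo nsIRDFDataSource’s Assert, Unassert, Change and HasAssertion but are deliberately lower-cased: this is a hypothetical toy, not that interface, and it ignores details such as the truth-value argument.

```javascript
// Hypothetical in-memory datasource; not Mozilla's implementation.
// Nodes are compared as Map/Set keys (SameValueZero), which is sound when
// resources and literals are interned; plain strings also work.
class ToyDataSource {
  constructor() {
    this.index = new Map(); // subject -> (predicate -> Set of objects)
  }

  targetSet(subject, predicate) {
    const arcs = this.index.get(subject);
    return arcs === undefined ? undefined : arcs.get(predicate);
  }

  assert(subject, predicate, object) {
    if (!this.index.has(subject)) this.index.set(subject, new Map());
    const arcs = this.index.get(subject);
    if (!arcs.has(predicate)) arcs.set(predicate, new Set());
    arcs.get(predicate).add(object);
  }

  unassert(subject, predicate, object) {
    const targets = this.targetSet(subject, predicate);
    if (targets !== undefined) targets.delete(object);
  }

  hasAssertion(subject, predicate, object) {
    const targets = this.targetSet(subject, predicate);
    return targets !== undefined && targets.has(object);
  }

  // Alter a statement by replacing its object; false if it was absent.
  change(subject, predicate, oldObject, newObject) {
    if (!this.hasAssertion(subject, predicate, oldObject)) return false;
    this.unassert(subject, predicate, oldObject);
    this.assert(subject, predicate, newObject);
    return true;
  }

  // Enumerate every statement as a [subject, predicate, object] triple.
  *statements() {
    for (const [subject, arcs] of this.index) {
      for (const [predicate, targets] of arcs) {
        for (const object of targets) yield [subject, predicate, object];
      }
    }
  }
}
```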

The implementation allows datasources to be addressed collectively. That is, the statements from several different datasources may be combined into a composite datasource, in which they may be queried or altered “in the aggregate”.
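Aggregation can be sketched by forwarding each query to every member datasource and merging the answers. Both classes below are hypothetical; routing new statements to the first member is an assumption made for brevity, not a description of Mozilla’s composite datasource.

```javascript
// Minimal hypothetical member datasource: statements kept in an array.
class SimpleDataSource {
  constructor() {
    this.triples = [];
  }
  hasAssertion(s, p, o) {
    return this.triples.some(([ts, tp, to]) => ts === s && tp === p && to === o);
  }
  assert(s, p, o) {
    if (!this.hasAssertion(s, p, o)) this.triples.push([s, p, o]);
  }
  getTargets(s, p) {
    return this.triples.filter(([ts, tp]) => ts === s && tp === p).map((t) => t[2]);
  }
}

// Hypothetical composite: answers queries "in the aggregate" by consulting
// every member. Sending new statements to the first member is an assumed
// policy chosen for brevity.
class ToyCompositeDataSource {
  constructor(members) {
    this.members = members;
  }
  hasAssertion(s, p, o) {
    return this.members.some((ds) => ds.hasAssertion(s, p, o));
  }
  getTargets(s, p) {
    const union = new Set();
    for (const ds of this.members) {
      for (const o of ds.getTargets(s, p)) union.add(o);
    }
    return [...union];
  }
  assert(s, p, o) {
    this.members[0].assert(s, p, o);
  }
}
```

Because the composite exposes the same methods as a member, one composite can itself be placed inside another, much as a composite datasource can be used wherever an individual datasource is expected.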

A set of statements may be visualized as a directed, labelled graph, and much of the Mozilla RDF API is crafted with this visualization in mind. Specifically, the subject of each statement is a node (the source), the object of the statement is a node (the target), and the predicate is a directed arc from the subject node to the object node. In this parlance, a datasource (which is simply a collection of statements) is a possibly disconnected graph. A composite datasource is the graph that is constructed by overlaying the subgraphs of several individual datasources.
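The graph view can be made concrete with two helpers over a list of `[subject, predicate, object]` triples: one lists the labels of arcs leaving a node, the other walks arcs forward from a node. The function names are hypothetical (`arcLabelsOut` is only loosely named after nsIRDFDataSource’s ArcLabelsOut), and the triple-list representation is an assumption.

```javascript
// Hypothetical helpers: each [subject, predicate, object] triple is a
// labelled arc from subject (source) to object (target).

// Distinct predicates on arcs leaving `source`.
function arcLabelsOut(statements, source) {
  const labels = new Set();
  for (const [s, p] of statements) {
    if (s === source) labels.add(p);
  }
  return [...labels];
}

// All nodes reachable from `start` by following arcs forward (breadth-first).
function reachable(statements, start) {
  const seen = new Set([start]);
  const queue = [start];
  while (queue.length > 0) {
    const node = queue.shift();
    for (const [s, , o] of statements) {
      if (s === node && !seen.has(o)) {
        seen.add(o);
        queue.push(o);
      }
    }
  }
  return seen;
}
```

In a statement set containing the arcs `a→b`, `b→c` and `x→y`, the node `x` is not reachable from `a`, illustrating that a datasource may be a disconnected graph.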

Reification. The RDF Model and Syntax Specification discusses how a statement may itself be “reified”, and referred to as a resource. As of this writing, the Mozilla RDF model does not support automatic derivation of “meta statements” that arise from such reification.

Interfaces

Below are the primary interfaces that are used to interact with RDF.

nsIRDFService. The RDF service is a utility interface that serves three primary purposes. First, it is used to manage “named” datasources. A named datasource is a singleton datasource that can be acquired using a simple URI-like name [4]; e.g., rdf:bookmarks. Second, it is used to implement the function that maps a URI to a resource (which is the inverse of the one-to-one resource-to-URI function described above). Third, it is used to implement the function that maps a string value to a literal (similarly, the inverse of the one-to-one literal-to-string function described above).
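Named-datasource lookup can be sketched as a name-to-contract-ID expansion plus a singleton cache. The expansion rule `rdf:NAME` → `@mozilla.org/rdf/datasource;1?name=NAME` is an assumption modelled on the in-memory datasource’s contract ID used in the Example section; `ToyRDFService` and its factory map are hypothetical stand-ins for the XPCOM service manager.

```javascript
// Assumed expansion prefix, modelled on the in-memory datasource's ID.
const CONTRACT_PREFIX = "@mozilla.org/rdf/datasource;1?name=";

// Hypothetical stand-in for the RDF service's named-datasource handling.
class ToyRDFService {
  constructor(factories) {
    this.factories = factories;  // contract ID -> function creating a datasource
    this.singletons = new Map(); // name -> datasource already created
  }

  // Expand "rdf:NAME" into a contract ID.
  contractIDFor(name) {
    if (!name.startsWith("rdf:")) throw new Error("not a named datasource: " + name);
    return CONTRACT_PREFIX + name.slice("rdf:".length);
  }

  // Create the datasource on first use, then always return the same one.
  getDataSource(name) {
    let ds = this.singletons.get(name);
    if (ds === undefined) {
      const factory = this.factories.get(this.contractIDFor(name));
      if (!factory) throw new Error("no component registered for " + name);
      ds = factory();
      this.singletons.set(name, ds);
    }
    return ds;
  }
}
```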

nsIRDFNode. This is an interface for a node in the RDF graph. A node must be either an nsIRDFResource or an nsIRDFLiteral [5]. Objects that implement these interfaces must be acquired from the nsIRDFService.

nsIRDFDataSource. This is the interface that provides access to a collection of “related statements” (or a “subgraph”). This interface includes methods for testing whether a statement is present, enumerating the statements contained in the collection, and adding statements to and removing statements from the set.

nsIRDFCompositeDataSource. This interface is derived from nsIRDFDataSource. An implementation of this interface will typically combine the statements from several datasources together as a collective. Because nsIRDFCompositeDataSource is derived from nsIRDFDataSource, a composite can be queried and modified just like an individual datasource.

nsIRDFObserver. This is an interface that an RDF client implements. The interface allows a client to be notified when a change occurs to the statements in a datasource.
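Observer notification can be sketched as a datasource that calls back every registered observer when a statement is actually added or removed. The callback names follow nsIRDFObserver’s onAssert and onUnassert, but the datasource class, its JSON-encoded statement keys, and its registration methods are hypothetical.

```javascript
// Hypothetical observable datasource; not Mozilla's implementation.
class ObservableDataSource {
  constructor() {
    this.keys = new Set(); // statements stored as JSON-encoded triples
    this.observers = [];
  }

  addObserver(observer) {
    this.observers.push(observer);
  }

  removeObserver(observer) {
    this.observers = this.observers.filter((o) => o !== observer);
  }

  assert(s, p, o) {
    const key = JSON.stringify([s, p, o]);
    if (this.keys.has(key)) return; // nothing changed, so no notification
    this.keys.add(key);
    for (const obs of this.observers) obs.onAssert(this, s, p, o);
  }

  unassert(s, p, o) {
    const key = JSON.stringify([s, p, o]);
    if (!this.keys.delete(key)) return; // statement was absent
    for (const obs of this.observers) obs.onUnassert(this, s, p, o);
  }
}
```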

nsIRDFContainer. This is an interface that allows for simplified access to an RDF container object (a bag, sequence, or alternation). This interface, in conjunction with nsIRDFContainerUtils, provides straightforward, Java vector-esque methods for manipulating and querying RDF container objects.
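Underneath the vector-like methods, an RDF sequence is a resource whose members hang off the ordinal properties rdf:_1, rdf:_2, and so on, as defined by the RDF Model and Syntax Specification. The sketch below shows that encoding, including the renumbering needed to keep ordinals contiguous after a removal; `ToySeq` and its methods are hypothetical and only loosely modelled on the container operations described above.

```javascript
const RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";

// Ordinal property for 1-based position n: rdf:_1, rdf:_2, ...
function ordinalProperty(n) {
  return RDF_NS + "_" + n;
}

// Hypothetical vector-like wrapper over one sequence's ordinal arcs.
// A full container would also carry an rdf:type rdf:Seq statement;
// that is omitted here.
class ToySeq {
  constructor(uri) {
    this.uri = uri;
    this.arcs = new Map(); // ordinal property -> element
  }

  getCount() {
    return this.arcs.size;
  }

  appendElement(element) {
    this.arcs.set(ordinalProperty(this.getCount() + 1), element);
  }

  // 1-based position of element, or -1 if absent.
  indexOf(element) {
    for (let i = 1; i <= this.getCount(); i++) {
      if (this.arcs.get(ordinalProperty(i)) === element) return i;
    }
    return -1;
  }

  // Remove an element and shift later ones down so ordinals stay contiguous.
  removeElement(element) {
    const index = this.indexOf(element);
    if (index === -1) return false;
    const count = this.getCount();
    for (let i = index; i < count; i++) {
      this.arcs.set(ordinalProperty(i), this.arcs.get(ordinalProperty(i + 1)));
    }
    this.arcs.delete(ordinalProperty(count));
    return true;
  }

  // The sequence as statements: [container, rdf:_n, element].
  toStatements() {
    const out = [];
    for (let i = 1; i <= this.getCount(); i++) {
      out.push([this.uri, ordinalProperty(i), this.arcs.get(ordinalProperty(i))]);
    }
    return out;
  }
}
```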

Example

This section provides some sample code that uses JavaScript and XPConnect to interact with the RDF engine, including:

Acquiring the RDF service
Acquiring a datasource
Acquiring RDF resources
Using the RDF resources to perform queries on and alter statements in the datasource

The code below illustrates this process.

Acquire the RDF service. To acquire the RDF service, use the Components object:

var RDF = Components.classes['@mozilla.org/rdf/rdf-service;1'].getService();
RDF = RDF.QueryInterface(Components.interfaces.nsIRDFService);

Create a datasource. Using the Components object, we’ll create an in-memory datasource, which is just a simple “scratch” datasource that will remember the statements we add to it:

var ds = Components.classes['@mozilla.org/rdf/datasource;1?name=in-memory-datasource'].createInstance();
ds = ds.QueryInterface(Components.interfaces.nsIRDFDataSource);

Acquire RDF nodes. Using the RDF service, you can acquire individual RDF resource and literal objects. These are what you use to perform a query on the RDF database.

var homepage = RDF.GetResource('http://people.netscape.com/waterson');
var FV_quality = RDF.GetResource('urn:my-web-vocabulary:quality');
var value = RDF.GetLiteral('tres cool');

Use the RDF nodes to add statements to the datasource. And finally, we “do the deed” using the Assert method of the nsIRDFDataSource interface:

ds.Assert(homepage, FV_quality, value, true);

Query the datasource. Now that we’ve added a statement to the datasource, we can query it to see if it’s really there:

if (ds.HasAssertion(homepage, FV_quality, value, true)) {
  alert('yep, it sure worked.');
}

We can pull a “target” value out given the source and a property:

var target = ds.GetTarget(homepage, FV_quality, true);
target = target.QueryInterface(Components.interfaces.nsIRDFLiteral);
// expect 'tres cool'
alert('target is ' + target.Value + '!');

Or the “source”, given a property and a target:

var source = ds.GetSource(FV_quality, value, true);
source = source.QueryInterface(Components.interfaces.nsIRDFResource);
// expect 'http://people.netscape.com/waterson'
alert('source is ' + source.Value + '!');

Acknowledgements

Dan Brickley and David McCusker both provided valuable inspiration and feedback. Axel Wienberg corrected several of my mathematical missteps, providing clear and precise verbiage for the way resources and URIs interact.

Notes

[1] The intent is not to impress the reader with the author’s ability to generate pseudo-mathematical babble (frankly, I’m pretty self-conscious about writing this given that I’m horrible at formalizing things), nor is the intent to confuse or cloud the issue. There have been several questions about “what is RDF, really?” (e.g., this USENET thread); this is a humble attempt to explain what is really happening in a somewhat formal (but hopefully accessible) way.
[2] Recall that a one-to-one function is a function where, if f(a) = f(b), then a = b. So in the context of resources and URIs (where f maps a resource to a URI), given a specific resource’s URI, there can be no other resource with the same URI, so you’ll always be able to get back to the original resource.
[3] Mathematically speaking, if the representation function is f : R → URI and the parser function is g : URI → R, then for each r in R, g(f(r)) = r. Given a URI u, the canonicalized URI u_c is given by u_c = f(g(u)).
[4] In reality, this is nothing more than a convenience utility that wraps the XPCOM service manager. The “name” of a named datasource is shorthand that is expanded into a ProgID; the ProgID is used to load a component that is assumed to support nsIRDFDataSource as an XPCOM service.
[5] There are two other literal variants: nsIRDFInt (for integer values) and nsIRDFDate (for date values). These are not “officially” part of the public API, and may undergo change as the dust settles around the XML and RDF Schema activity.

Dan Brickley writes (in this USENET post):

The representation of primitive data typing within the RDF model was deferred by the W3C RDF Schema Working Group in anticipation of greater synergy with the XML Schema activity; we can probably anticipate the development of a syntax-neutral set of primitives that will serve the needs of the RDF and XML communities, since RDF compatibility is a constraint on the XML Schema activity. For more details see the RDF home page, and in particular the June 1999 Web Architecture note Describing and Exchanging Data (Berners-Lee, Connolly, Swick) for a discussion of the issues involved here.