rdf: back-end architecture
Contact: Chris Waterson (waterson@netscape.com)
This document provides an overview of the RDF “core” implementation, that is, the implementation of the RDF model within Mozilla. It assumes some familiarity with the RDF model as described in the RDF Model and Syntax Specification. See also the Datasource HOWTO for details on writing a datasource.
- Model and Terminology
- Interfaces
- Example
Model and Terminology
The Mozilla implementation of the RDF model is based on the RDF Model and Syntax Specification. This section attempts to provide a pseudo-mathematical overview of that model [1]. Although there are slight differences between the Mozilla RDF model and the model presented in the RDF Model and Syntax Specification, the belief is that, for all practical purposes, these differences should not affect the use of the RDF model. Any differences that lead to behavior that is functionally incorrect with respect to the Model and Syntax Specification should be logged as a bug (select RDF as the “Component”).
The Mozilla RDF “universe” consists of:
- A set of resources, R. There is a representation function, which is a one-to-one function that maps a resource in R to a string; this string is called the Uniform Resource Identifier (or URI) of the resource, and should conform to RFC2396. There is also a parser function, which is an onto function that maps a URI to a resource.
- A set of literals, L. There is a one-to-one function that maps a literal in L to a string value.
- A set of statements, S. A statement s in S is a triple drawn from R × R × (R ∪ L).
Resources. Because the representation function that maps a resource to a URI is one-to-one, it is possible to identify each resource using its URI. That is, given a URI, it is possible to find a unique resource in R again. [2] Informally, a resource is some sort of “Internet object” that is uniquely identifiable; for example, a web page, an email account, or a USENET news article.
The representation function gives only one canonical URI for each resource. However, as specified in RFC2396, a resource might have many different names; e.g., a web page can be reached by redirection, or an HTTP or filesystem resource’s name may contain an arbitrary number of “.” and “..” segments. Therefore, we introduce a parser function from URI to R which is defined on arbitrary URIs. The parser function may map several URIs to the same resource; the only guarantee is that a resource’s URI as obtained through the representation function is always parsed back to the resource itself. [3]
For efficiency’s sake, the parser function only does local computation, so e.g. redirection is not considered. We realize that this “optimization” does not adhere to the spirit of RFC2396, and may preclude some inferences because resources are not recognized as being “the same”. But it would clearly be impractical to contact a web server to resolve each HTTP URI, just for fear that it might be redirected!
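The relationship between the representation function and the parser function can be illustrated with a toy model. This is a sketch only, not Mozilla code: `parse`, `represent`, and the interning table are hypothetical names, and the sketch leans on the standard `URL` class for purely local normalization (resolving “.” and “..” segments), which may well differ from what the Mozilla parser function actually normalizes.

```javascript
// Toy model only: hypothetical names, not the Mozilla implementation.
// Interning table: canonical URI string -> unique resource object.
const internedResources = new Map();

// Parser function g : URI -> R. It does local computation only (no network
// access), so redirects are never followed.
function parse(uri) {
  // The standard URL parser resolves "." and ".." path segments locally.
  const canonical = new URL(uri).href;
  let resource = internedResources.get(canonical);
  if (!resource) {
    resource = Object.freeze({ uri: canonical });
    internedResources.set(canonical, resource);
  }
  return resource;
}

// Representation function f : R -> URI. It is one-to-one because each
// resource is interned under exactly one canonical URI.
function represent(resource) {
  return resource.uri;
}

const viaDots = parse("http://example.org/docs/./guide/../index.html");
const direct = parse("http://example.org/docs/index.html");
console.log(viaDots === direct);                    // several URIs, one resource
console.log(parse(represent(viaDots)) === viaDots); // g(f(r)) = r
```

Interning by canonical string is what makes it cheap to compare resources by identity; the price, as noted above, is that names which differ only by a redirect are not recognized as the same resource.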
The RDF Model and Syntax Specification explicitly allows for anonymous resources. An anonymous resource is a resource without URI-addressable identity. The Mozilla implementation allows for such resources (e.g., in serialized RDF/XML), but will automatically assign a uniquely generated URI to such a resource.
Literals. As with resources, the function that maps a literal to a string value is one-to-one. Hence, it is possible to identify each literal using its string value. The RDF Model and Syntax Specification explicitly states that the mapping from literal to string value need not be one-to-one; we’ve chosen to make the mapping one-to-one to allow for efficient comparison of literals. Informally, a literal is a primitive value that has no "first-class identity"; for example, a string, a date, or a number.
Statements. A statement consists of a subject, a predicate, and an object. The subject must be a resource. The predicate must be a resource (strictly speaking, the predicate must be a resource that is a property; however, we do not differentiate between a resource that is a property and a resource that is not a property in the Mozilla implementation). The object may be either a resource or a literal. The terms statement and assertion are interchangeable.
The Mozilla RDF implementation factors the set of statements S into subsets. Each subset of statements is called a datasource. For example, there is a datasource that contains statements about mail messages and news articles; there is a datasource that contains statements about the current user’s browsing history; there is a datasource that contains statements about the current user’s bookmarks. Each datasource may be addressed individually. It is possible to query a datasource to determine whether a statement is present. Statements may be added to, removed from, or altered in a datasource.
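The three operations just described (testing for, adding, and removing a statement) can be sketched with a toy in-memory model. Everything here is hypothetical and for illustration only: `ToyDataSource`, `resource`, `literal`, and the `http://example.org/rdf#title` property are invented names, not part of the Mozilla API.

```javascript
// Toy model only: hypothetical names, not the Mozilla implementation.
// Nodes are plain objects: a resource carries a URI, a literal a string value.
const resource = (uri) => ({ kind: "resource", uri });
const literal = (value) => ({ kind: "literal", value });

// URIs and string values identify nodes uniquely, so a statement can be
// keyed by the strings of its three parts.
function statementKey(subject, predicate, object) {
  if (subject.kind !== "resource" || predicate.kind !== "resource") {
    throw new TypeError("subject and predicate must be resources");
  }
  const objectPart = object.kind === "resource" ? object.uri : object.value;
  return JSON.stringify([subject.uri, predicate.uri, object.kind, objectPart]);
}

class ToyDataSource {
  constructor() {
    this.statements = new Map(); // key -> [subject, predicate, object]
  }
  assert(s, p, o) { this.statements.set(statementKey(s, p, o), [s, p, o]); }
  unassert(s, p, o) { this.statements.delete(statementKey(s, p, o)); }
  hasAssertion(s, p, o) { return this.statements.has(statementKey(s, p, o)); }
}

const toyBookmarks = new ToyDataSource();
const page = resource("http://www.mozilla.org/");
const title = resource("http://example.org/rdf#title"); // hypothetical property
toyBookmarks.assert(page, title, literal("Mozilla"));
const presentAfterAssert = toyBookmarks.hasAssertion(page, title, literal("Mozilla"));
toyBookmarks.unassert(page, title, literal("Mozilla"));
const presentAfterUnassert = toyBookmarks.hasAssertion(page, title, literal("Mozilla"));
console.log(presentAfterAssert, presentAfterUnassert);
```

Note that the query uses a freshly created literal rather than the object that was asserted; it still matches because literals are compared by string value.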
The implementation allows datasources to be addressed collectively. That is, the statements from several different datasources may be combined into a composite datasource, in which they may be queried or altered “in the aggregate”.
A set of statements may be visualized as a directed, labelled graph, and much of the Mozilla RDF API is crafted with this visualization in mind. Specifically, the subject of each statement is a node (the source), the object of the statement is a node (the target), and the predicate is a directed arc from the subject node to the object node. In this parlance, a datasource -- which is simply a collection of statements -- is a (possibly unconnected) graph. A composite datasource is the graph that is constructed by overlaying the subgraphs of several individual datasources.
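The overlay idea can be sketched as follows. This is a toy model under stated assumptions, not the Mozilla implementation: `ToyGraph`, `ToyCompositeGraph`, and the `urn:example:` names are hypothetical, and nodes are reduced to plain strings.

```javascript
// Toy model only: hypothetical names, not the Mozilla implementation.
// A datasource is modelled as a set of (source, arc label, target) triples.
class ToyGraph {
  constructor(arcs = []) {
    this.arcs = new Set(arcs.map((arc) => JSON.stringify(arc)));
  }
  hasArc(source, label, target) {
    return this.arcs.has(JSON.stringify([source, label, target]));
  }
}

// A composite overlays several graphs: an arc is present in the composite
// if it is present in at least one member graph.
class ToyCompositeGraph {
  constructor(...members) {
    this.members = members;
  }
  hasArc(source, label, target) {
    return this.members.some((graph) => graph.hasArc(source, label, target));
  }
}

const toyHistory = new ToyGraph([
  ["http://www.mozilla.org/", "urn:example:visited", "yesterday"],
]);
const toyFavorites = new ToyGraph([
  ["http://www.mozilla.org/", "urn:example:title", "Mozilla"],
]);
const overlay = new ToyCompositeGraph(toyHistory, toyFavorites);
console.log(overlay.hasArc("http://www.mozilla.org/", "urn:example:title", "Mozilla"));
console.log(overlay.hasArc("http://www.mozilla.org/", "urn:example:visited", "yesterday"));
```

Because the composite exposes the same `hasArc` method as a single graph, callers need not know whether they are talking to one datasource or to several.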
Reification. The RDF Model and Syntax Specification discusses how a statement may itself be “reified”, and referred to as a resource. As of this writing, the Mozilla RDF model does not support automatic derivation of “meta statements” that arise from such reification.
Interfaces
Below are the primary interfaces that are used to interact with RDF.
nsIRDFService. The RDF service is a utility interface that serves three primary purposes. First, it is used to manage “named” datasources. A named datasource is a singleton datasource that can be acquired using a simple URI-like name [4]; e.g., rdf:bookmarks. Second, it is used to implement the function that maps a URI to a resource (which is the inverse of the one-to-one resource-to-URI function described above). Third, it is used to implement the function that maps a string value to a literal (similarly, the inverse of the one-to-one literal-to-string function described above).
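As a minimal sketch of the first purpose, a named datasource can be requested by its name. The helper name `getBookmarksDataSource` is hypothetical, and `rdfService` is assumed to be an `nsIRDFService` instance already obtained in privileged Mozilla code; the sketch relies on the service's `GetDataSource` method.

```javascript
// Sketch only. `rdfService` is assumed to be an nsIRDFService instance.
function getBookmarksDataSource(rdfService) {
  // "rdf:bookmarks" is the URI-like name of a named (singleton) datasource.
  return rdfService.GetDataSource("rdf:bookmarks");
}
```

Since a named datasource is a singleton, repeated calls are expected to hand back the same datasource rather than a fresh copy.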
nsIRDFNode. This is an interface for a node in the RDF graph. A node must either be an nsIRDFResource or an nsIRDFLiteral [5]. Objects that implement these interfaces must be acquired from the nsIRDFService.
nsIRDFDataSource. This is the interface that provides access to a collection of “related statements” (or a “subgraph”). This interface includes methods that allow testing for the presence of a statement, enumerating the statements contained in the collection, and adding statements to and removing statements from the collection.
nsIRDFCompositeDataSource. This interface is derived from nsIRDFDataSource. An implementation of this interface will typically combine the statements from several datasources together as a collective. Because the nsIRDFCompositeDataSource interface is derived from nsIRDFDataSource, it can be queried and modified just like an individual data source.
nsIRDFObserver. This is an interface that an RDF client implements. The interface allows a client to be notified when a change occurs to the statements in a datasource.
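A client-side observer might look roughly like the sketch below. The method names (`onAssert`, `onUnassert`, `onChange`, `onMove`, `onBeginUpdateBatch`, `onEndUpdateBatch`) are an assumption based on later revisions of nsIRDFObserver.idl and may not match the build you are working with; `makeLoggingObserver` is a hypothetical helper.

```javascript
// Sketch only. The method names are an assumption (later nsIRDFObserver.idl
// revisions); check the IDL of your build before relying on them.
function makeLoggingObserver(log) {
  return {
    onAssert(dataSource, source, property, target) {
      log.push({ kind: "assert", source, property, target });
    },
    onUnassert(dataSource, source, property, target) {
      log.push({ kind: "unassert", source, property, target });
    },
    onChange(dataSource, source, property, oldTarget, newTarget) {
      log.push({ kind: "change", source, property, oldTarget, newTarget });
    },
    onMove(dataSource, oldSource, newSource, property, target) {
      log.push({ kind: "move", oldSource, newSource, property, target });
    },
    // Batch notifications are ignored in this sketch.
    onBeginUpdateBatch(dataSource) {},
    onEndUpdateBatch(dataSource) {},
  };
}
```

In privileged code, such an object would typically be registered on a datasource with `ds.AddObserver(makeLoggingObserver(log))`, where `ds` is an nsIRDFDataSource and `log` is an array.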
nsIRDFContainer. This is an interface that allows for simplified access to an RDF container object (a bag, sequence, or alternation). This interface, in conjunction with nsIRDFContainerUtils, provides straightforward, Java vector-esque methods for manipulating and querying RDF container objects.
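The vector-like style can be sketched as follows. The helper `appendToNewSeq` is hypothetical; it assumes `containerUtils` is an nsIRDFContainerUtils whose `MakeSeq` method returns an nsIRDFContainer offering `AppendElement` and `GetCount`, as recalled from the IDL rather than stated in this document.

```javascript
// Sketch only. Assumes MakeSeq, AppendElement, and GetCount behave as in
// the nsIRDFContainerUtils and nsIRDFContainer IDL files.
function appendToNewSeq(containerUtils, dataSource, seqResource, nodes) {
  // Turn seqResource into an RDF sequence stored in dataSource.
  const seq = containerUtils.MakeSeq(dataSource, seqResource);
  for (const node of nodes) {
    seq.AppendElement(node); // adds the node at the end of the sequence
  }
  return seq.GetCount();
}
```

Behind the scenes, each appended element is still just a statement in the datasource; the container interface only hides the bookkeeping of ordinal properties.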
Example
This section provides some sample code that uses JavaScript and XPConnect to interact with the RDF engine, including:
- Acquiring the RDF service
- Acquiring a datasource
- Acquiring RDF resources
- Using the RDF resources to perform queries on and alter statements in the datasource
The code below illustrates this process.
Acquire the RDF service. To acquire the RDF service, use the Components object:
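A minimal sketch of this step, written as a helper so that the `Components` object is passed in explicitly. The contract ID `@mozilla.org/rdf/rdf-service;1` is an assumption taken from later Mozilla builds (older builds used a ProgID-style name), and `getRdfService` is a hypothetical helper name.

```javascript
// Sketch only. `components` is the XPConnect Components object available to
// privileged code. The contract ID below is an assumption.
function getRdfService(components) {
  return components
    .classes["@mozilla.org/rdf/rdf-service;1"]
    .getService(components.interfaces.nsIRDFService);
}
```

In privileged JavaScript this would be used as `var RDF = getRdfService(Components);`. `getService` is used rather than `createInstance` because the RDF service is a singleton.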
Create a datasource. Using the Components object, we’ll create an in-memory datasource, which is just a simple “scratch” datasource that will remember the statements we add to it:
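A minimal sketch of this step. The contract ID `@mozilla.org/rdf/datasource;1?name=in-memory-datasource` is an assumption taken from later Mozilla builds, and `createInMemoryDataSource` is a hypothetical helper name.

```javascript
// Sketch only. `components` is the XPConnect Components object available to
// privileged code. The contract ID below is an assumption.
function createInMemoryDataSource(components) {
  return components
    .classes["@mozilla.org/rdf/datasource;1?name=in-memory-datasource"]
    .createInstance(components.interfaces.nsIRDFDataSource);
}
```

In privileged JavaScript this would be used as `var ds = createInMemoryDataSource(Components);`. Unlike the RDF service, each call to `createInstance` yields a new, empty datasource.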
Acquire RDF nodes. Using the RDF service, you can acquire individual RDF resource and literal objects. These are what you use to perform a query on the RDF database.
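A minimal sketch of this step, assuming `rdfService` is an nsIRDFService and using its `GetResource` and `GetLiteral` methods. The helper name `acquireNodes`, the `http://example.org/rdf#quality` property URI, and the literal value are hypothetical and chosen only for illustration.

```javascript
// Sketch only. `rdfService` is assumed to be an nsIRDFService instance.
function acquireNodes(rdfService) {
  return {
    // The subject: a resource identified by its URI.
    homepage: rdfService.GetResource("http://www.mozilla.org/"),
    // The predicate: also a resource (hypothetical property URI).
    quality: rdfService.GetResource("http://example.org/rdf#quality"),
    // The object: a literal identified by its string value.
    great: rdfService.GetLiteral("Great"),
  };
}
```

In privileged JavaScript this would be used as `var nodes = acquireNodes(RDF);`, where `RDF` is the service acquired earlier.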
Use the RDF nodes to add statements to the datasource. And finally, we “do the deed” using the Assert method of the nsIRDFDataSource interface:
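A minimal sketch of this step. `addStatement` is a hypothetical helper; `dataSource` is assumed to be an nsIRDFDataSource, and the final argument to `Assert` is the statement's truth value.

```javascript
// Sketch only. `dataSource` is assumed to be an nsIRDFDataSource; subject and
// predicate are nsIRDFResource objects, object is any nsIRDFNode.
function addStatement(dataSource, subject, predicate, object) {
  // The fourth argument is the truth value of the assertion.
  dataSource.Assert(subject, predicate, object, true);
}
```

With nodes acquired from the RDF service, this would be called as `addStatement(ds, homepage, quality, great);` (variable names are illustrative).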
Query the datasource. Now that we’ve added a statement to the datasource, we can query it to see if it’s really there:
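A minimal sketch of this query, using the `HasAssertion` method of nsIRDFDataSource. `isStatementPresent` is a hypothetical helper name, and `dataSource` is assumed to be an nsIRDFDataSource.

```javascript
// Sketch only. `dataSource` is assumed to be an nsIRDFDataSource.
function isStatementPresent(dataSource, subject, predicate, object) {
  // Ask whether the statement is asserted with a truth value of true.
  return dataSource.HasAssertion(subject, predicate, object, true);
}
```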
We can pull a “target” value out given the source and a property:
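A minimal sketch, using the `GetTarget` method of nsIRDFDataSource. `findTarget` is a hypothetical helper name, and `dataSource` is assumed to be an nsIRDFDataSource.

```javascript
// Sketch only. `dataSource` is assumed to be an nsIRDFDataSource.
function findTarget(dataSource, subject, predicate) {
  // Returns an nsIRDFNode for a matching statement, or null if none exists.
  // Callers usually QueryInterface the node to nsIRDFLiteral or
  // nsIRDFResource before reading its Value.
  return dataSource.GetTarget(subject, predicate, true);
}
```

If several statements share the same source and property, `GetTarget` yields only one of the targets; `GetTargets` enumerates all of them.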
Or the “source”, given a property and a target:
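A minimal sketch, using the `GetSource` method of nsIRDFDataSource. `findSource` is a hypothetical helper name, and `dataSource` is assumed to be an nsIRDFDataSource.

```javascript
// Sketch only. `dataSource` is assumed to be an nsIRDFDataSource.
function findSource(dataSource, predicate, object) {
  // Returns an nsIRDFResource that is the subject of a matching statement,
  // or null if no such statement exists.
  return dataSource.GetSource(predicate, object, true);
}
```

This follows the arc backwards in the graph picture: given the arc label and the node it points to, it recovers a node the arc starts from.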
Acknowledgements
Dan Brickley and David McCusker both provided valuable inspiration and feedback. Axel Wienberg corrected several of my mathematical mis-steps, providing clear and precise verbiage for the way resources and URIs interact.
Notes
1. The intent is not to impress the reader with the author’s ability to generate pseudo-mathematical babble (frankly, I’m pretty self-conscious about writing this given that I’m horrible at formalizing things), nor is the intent to confuse or cloud the issue. There have been several questions about “what is RDF, really?” (e.g., this USENET thread); this is a humble attempt to explain what is really happening in a somewhat formal (but hopefully accessible) way.
2. Recall that a one-to-one function is a function where if f(a) = f(b), then a = b. So in the context of resources and URIs (where f maps a resource to a URI), given a specific resource’s URI, there can be no other resource with the same URI, so you’ll always be able to get back to the original resource.
3. Mathematically speaking, if the representation function is f : R → URI and the parser function is g : URI → R, then for each r in R, g(f(r)) = r. Given a URI u, the canonicalized URI uc is given by uc = f(g(u)).
4. In reality, this is nothing more than a convenience utility that wraps the XPCOM service manager. The “name” of a named datasource is shorthand that is expanded into a ProgID; the ProgID is used to load a component that is assumed to support nsIRDFDataSource as an XPCOM service.
5. There are two other literal variants: nsIRDFInt (for integer values) and nsIRDFDate (for date values). These are not “officially” part of the public API, and may undergo change as the dust settles around the XML and RDF Schema activity. Dan Brickley writes (in this USENET post): “The representation of primitive data typing within the RDF model was deferred by the W3C RDF Schema Working Group in anticipation of greater synergy with the XML Schema activity; we can probably anticipate the development of a syntax neutral set of primitives that will serve the needs of the RDF and XML communities, since RDF compatibility is a constraint on the XML Schema activity. For more details see the RDF home page, and in particular the June 1999 Web Architecture note Describing and Exchanging Data (Berners-Lee, Connolly, Swick) for a discussion of the issues involved here.”