Implementing RDF Data Sources - Draft #1
by Robert John Churchill
April 6, 1998

Summary: This document describes how to extend Netscape's Resource Description Framework (RDF) by the addition of new data sources.

Background: RDF, in its simplest form, is a directed labeled graph. Imagine triplets of the following: a node (keyed off of a URL such as "http://people.netscape.com/rjc/"), an arc node (such as "Owner Name"), and the other node or value that the arc points to (such as "Robert John Churchill"). Each triplet is an RDF "assertion".
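
As a rough illustration of the triplet idea, the assertion above could be held in a small C structure. This is only a sketch: the Triple type, kExample, and tripleMatches() are hypothetical names invented for this example and are not part of RDF's actual headers.

```c
#include <string.h>

/* Hypothetical, simplified representation of one RDF assertion. */
typedef struct {
    const char *source;  /* node the arc leaves, keyed off of a URL */
    const char *arc;     /* label of the arc node */
    const char *target;  /* node or value the arc points to */
} Triple;

/* The example assertion from the text. */
static const Triple kExample = {
    "http://people.netscape.com/rjc/", "Owner Name", "Robert John Churchill"
};

/* Return 1 if all three parts of the triple match, else 0. */
int tripleMatches(const Triple *t, const char *source,
                  const char *arc, const char *target)
{
    return strcmp(t->source, source) == 0 &&
           strcmp(t->arc, arc) == 0 &&
           strcmp(t->target, target) == 0;
}
```

A graph is then just a collection of such triplets; a query supplies some of the three parts and asks for the rest.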

Most people, including end users, are familiar with hierarchies: from file systems to Netscape's bookmark window, a hierarchy shows organization as a tree. By defining a standard vocabulary that includes definitions for parent and child relationships, it becomes possible to layer a strict hierarchical view on top of RDF's graph. An example of this is Netscape's HyperTree, which presents different views of data (such as bookmarks, history, and the local file system) from RDF.

Question: How does data make its way into RDF's graph?

Answer: For every type of data, there is an associated data source. For example, files and folders from the local file system are reflected into RDF's graph via a file system data source that responds to queries from RDF's engine. (In the 3/31/1998 Free Source release, look at ns/modules/rdf/src/ for code.)

Implementing a RDF Data Source:

One of the key structures is the RDF_TranslatorStruct:
struct RDF_TranslatorStruct {
    RDFL rdf;
    char *url; /* data source URL */
    void *pdata; /* private data storage */

    /* translator entry points */

    hasAssertionProc hasAssertion;
    assertProc assert;
    unassertProc unassert;
    getSlotValueProc getSlotValue;
    getSlotValuesProc getSlotValues;
    nextItemProc nextValue;
    disposeCursorProc disposeCursor;
    disposeResourceProc disposeResource;
    destroyProc destroy;
    arcLabelsInProc arcLabelsIn;
    arcLabelsOutProc arcLabelsOut;
};
typedef struct RDF_TranslatorStruct *RDFT;


Here are prototypes for the translator entry points:
typedef PRBool     (*hasAssertionProc)(RDFT r, RDF_Resource u, RDF_Resource s, void *v,
                                        RDF_ValueType type, PRBool tv);
typedef PRBool     (*assertProc)(RDFT r, RDF_Resource u, RDF_Resource  s, void *v,
                                        RDF_ValueType type, PRBool tv);
typedef PRBool     (*unassertProc)(RDFT r, RDF_Resource u, RDF_Resource s, void *v,
                                        RDF_ValueType type);
typedef void *     (*getSlotValueProc)(RDFT r, RDF_Resource u, RDF_Resource s,
                                        RDF_ValueType type,  PRBool inversep, PRBool tv);
typedef RDF_Cursor (*getSlotValuesProc)(RDFT r, RDF_Resource u, RDF_Resource s,
                                        RDF_ValueType type, PRBool inversep, PRBool tv);
typedef void *     (*nextItemProc)(RDFT r, RDF_Cursor c);
typedef RDF_Error  (*disposeCursorProc)(RDFT r, RDF_Cursor c);
typedef RDF_Error  (*disposeResourceProc)(RDFT r, RDF_Resource u);
typedef RDF_Error  (*destroyProc)(struct RDF_TranslatorStruct*);
typedef RDF_Cursor (*arcLabelsOutProc)(RDFT r, RDF_Resource u);
typedef RDF_Cursor (*arcLabelsInProc)(RDFT r, RDF_Resource u);

Dictionary:

A "RDFT" is a RDF_Translator reference. It's the interface between RDF and a data source.
A "RDF_Cursor" is a database concept which allows enumeration over the results of a query.
A "RDF_Resource" is a node in RDF's graph.
A "RDF_ValueType" is a RDF basic type (such as RDF_INT_TYPE, RDF_STRING_TYPE, RDF_RESOURCE_TYPE, etc.)

"u" represents the source node
"s" represents the arc node
"v" represents the destination node/value
"tv" represents the truth value of the assertion (true or false)

Every data source implements a public routine which, given a URL representing the data source, allocates its own RDF_Translator, fills in the relevant parts of the structure, and returns it to RDF.

Most data sources will implement:

hasAssertionProc: determine whether an assertion exists
assertProc: make an assertion
unassertProc: remove an assertion
getSlotValueProc: get the destination node/value
getSlotValuesProc: get a cursor to enumerate over the destination node/value pairs
nextItemProc: given a cursor, return the next node/value pair
disposeCursorProc: dispose of a given cursor

Example Code:

As an example, let's implement a file system data source!

First, let's register our data source. (Note: Currently, RDF will need to be modified to call our MakeFSStore() routine as there is no discovery mechanism.)
/*
    Allocate a RDF Translator structure and register our entry points
*/

RDFT
MakeFSStore (char* url)
{
    RDFT ntr;

    ntr = (RDFT) XP_ALLOC(sizeof(struct RDF_TranslatorStruct));
    if (ntr != NULL)
    {
        ntr->url = copyString(url);
        ntr->pdata = NULL;

        /* entry points this data source implements */
        ntr->getSlotValue = fsGetSlotValue;
        ntr->getSlotValues = fsGetSlotValues;
        ntr->hasAssertion = fsHasAssertion;
        ntr->nextValue = fsNextValue;
        ntr->disposeCursor = fsDisposeCursor;

        /* entry points it does not implement; clear them explicitly
           since XP_ALLOC may not zero the memory it returns */
        ntr->assert = NULL;
        ntr->unassert = NULL;
        ntr->disposeResource = NULL;
        ntr->destroy = NULL;
        ntr->arcLabelsIn = NULL;
        ntr->arcLabelsOut = NULL;

        ... do any data source specific initialization here ...

        /* this sample file system data source should "discover"
           all available volumes and add them into the graph */
    }
    return (ntr);
}
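
Because this data source leaves its assert and unassert entry points set to NULL, any caller has to check an entry point before calling through it. The sketch below shows one way a caller might do that. It assumes (this document does not state it) that a NULL entry point means "not supported"; DemoTranslator, makeAssertion, demoTryAssert(), and demoAlwaysTrue() are hypothetical names and the structure is a cut-down stand-in for RDF_TranslatorStruct.

```c
#include <stddef.h>

/* Hypothetical, cut-down translator with only two entry points. */
typedef struct DemoTranslator {
    int (*hasAssertion)(struct DemoTranslator *t, const char *u,
                        const char *s, const char *v);
    int (*makeAssertion)(struct DemoTranslator *t, const char *u,
                         const char *s, const char *v);
} DemoTranslator;

/* A trivial hasAssertion implementation that claims everything exists. */
int demoAlwaysTrue(DemoTranslator *t, const char *u,
                   const char *s, const char *v)
{
    (void) t; (void) u; (void) s; (void) v;
    return 1;
}

/* Call hasAssertion only if the data source supplied one. */
int demoTryHasAssertion(DemoTranslator *t, const char *u,
                        const char *s, const char *v)
{
    if (t == NULL || t->hasAssertion == NULL)
        return 0;
    return t->hasAssertion(t, u, s, v);
}

/* Call makeAssertion only if the data source supplied one;
   a read-only data source leaves it NULL and the request fails. */
int demoTryAssert(DemoTranslator *t, const char *u,
                  const char *s, const char *v)
{
    if (t == NULL || t->makeAssertion == NULL)
        return 0;
    return t->makeAssertion(t, u, s, v);
}
```

With a translator initialized as { demoAlwaysTrue, NULL }, demoTryHasAssertion() answers the query while demoTryAssert() reports failure instead of jumping through a NULL pointer.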


Given a well-known or discovered starting point (such as "C:\" on a Windows machine), RDF might ask for all the contents of that directory by obtaining a cursor from the data source's getSlotValuesProc routine and asking for nodes to be returned. By repeatedly calling the data source's nextItemProc (which takes the cursor as an argument), RDF will get back RDF_Resources for each file/folder in the directory. When the end of the list is reached (indicated by NULL being returned from nextItemProc), RDF will then dispose of the cursor by calling disposeCursorProc.
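
The calling sequence just described (obtain a cursor, pull items until NULL, dispose of the cursor) can be sketched from the caller's side as follows. Everything here is a hypothetical stand-in: DemoSource, DemoCursor, the demo* routines, and the fixed kContents list are simplifications made up for this example, not RDF's real types.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a cursor over a directory's contents. */
typedef struct DemoCursor {
    const char **items;  /* NULL-terminated list being enumerated */
    int count;           /* index of the next item to return */
} DemoCursor;

/* Hypothetical stand-in for the three cursor entry points. */
typedef struct DemoSource {
    DemoCursor *(*getSlotValues)(struct DemoSource *src, const char *dir);
    const char *(*nextValue)(struct DemoSource *src, DemoCursor *c);
    int (*disposeCursor)(struct DemoSource *src, DemoCursor *c);
} DemoSource;

/* Pretend directory listing used by this sketch. */
static const char *kContents[] = { "file1", "file2", "folder", NULL };

DemoCursor *demoGetSlotValues(DemoSource *src, const char *dir)
{
    DemoCursor *c = (DemoCursor *) malloc(sizeof(*c));
    (void) src; (void) dir;
    if (c != NULL) {
        c->items = kContents;
        c->count = 0;
    }
    return c;
}

const char *demoNextValue(DemoSource *src, DemoCursor *c)
{
    (void) src;
    if (c == NULL || c->items[c->count] == NULL)
        return NULL;                 /* end of list */
    return c->items[c->count++];
}

int demoDisposeCursor(DemoSource *src, DemoCursor *c)
{
    (void) src;
    free(c);
    return 0;
}

/* Enumerate a directory as the text describes.
   Returns the number of items seen, or -1 if no cursor was returned. */
int countChildren(DemoSource *src, const char *dir)
{
    DemoCursor *c = src->getSlotValues(src, dir);
    int n = 0;

    if (c == NULL)
        return -1;
    while (src->nextValue(src, c) != NULL)
        n++;
    src->disposeCursor(src, c);
    return n;
}
```

The point of the sketch is the shape of countChildren(): the cursor is always disposed of once enumeration ends, and a NULL cursor is treated as "this data source cannot answer the query".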

Here is pseudo-code for what those three routines might look like:

/*
Construct a cursor iff asked for the contents of a directory
*/

RDF_Cursor
fsGetSlotValues (RDFT rdf, RDF_Resource u, RDF_Resource s,
                 RDF_ValueType type, PRBool inversep, PRBool tv)
{
    RDF_Cursor theCursor = NULL;

    /* asking for the contents of a directory ? */
    if ((((s == gCoreVocab->RDF_parent) && (inversep == PR_TRUE)) ||
         ((s == gCoreVocab->RDF_child) && (inversep == PR_FALSE))) &&
        (type == RDF_RESOURCE_TYPE) && (fsUnitp(u)) && (tv == PR_TRUE))
    {
        /* allocate cursor, fill in needed contents, and return it */

        theCursor = (RDF_Cursor) XP_ALLOC(sizeof(*theCursor));
        if (theCursor != NULL)
        {
            theCursor->u = u;
            theCursor->s = s;
            theCursor->type = type;
            theCursor->inversep = inversep;
            theCursor->tv = tv;
            theCursor->count = 0; /* might want to skip "." and ".." by adjusting count */

            ... store any private data in the cursor here ...
        }
    }
    else if (... asking for enumeration over something else we can answer? ...)
    {
        ... construct a cursor that can respond to this query ...
    }

    return (theCursor);
}

/*
    Obtain the next file/folder in a directory
*/

void *
fsNextValue (RDFT rdf, RDF_Cursor theCursor)
{
    RDF_Resource    r = NULL;
    char            *fileURL;

    /* accept both forms of the "contents of a directory" query */
    if ((theCursor != NULL) &&
        (theCursor->u != NULL) &&
        (((theCursor->s == gCoreVocab->RDF_parent) && (theCursor->inversep == PR_TRUE)) ||
         ((theCursor->s == gCoreVocab->RDF_child) && (theCursor->inversep == PR_FALSE))) &&
        (theCursor->type == RDF_RESOURCE_TYPE) &&
        (theCursor->tv == PR_TRUE))
    {
        /* given that theCursor contains the directory ("u")
           and an index ("count") into its contents */
        ... get nth item (theCursor->count) from a directory ...
        ... construct a properly encoded fileURL!!! ...

        /* create a RDF_Resource for the graph */
        r = RDF_GetResource(NULL, fileURL, PR_TRUE);
        if (r != NULL)
        {
            /* if it's a container (i.e. a directory), mark it as such */
            if (isContainer == PR_TRUE)
            {
                setContainerp(r, PR_TRUE);
            }
        }
        return(r);
    }
    else if (... looking for something else ...)
    {
        /* do the right thing */
        return(...the data asked for, if it exists...);
    }

    /* return NULL if don't know how to answer the query */
    return(NULL);
}

/*
    Free the cursor
*/

RDF_Error
fsDisposeCursor (RDFT rdf, RDF_Cursor theCursor)
{
    if (theCursor != NULL)
    {
        ... remember to also free any private data stored in the cursor ...

        XP_FREE(theCursor);
    }
    return (0);
}

Note: The following two queries express the same relationship. Both should be checked for!

Query #1:  (s == gCoreVocab->RDF_parent) && (inversep == PR_TRUE)  && (type == RDF_RESOURCE_TYPE)
Query #2:  (s == gCoreVocab->RDF_child)  && (inversep == PR_FALSE) && (type == RDF_RESOURCE_TYPE)
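
Since both forms have to be recognized everywhere a query is examined, one option is to fold the test into a single predicate and call it from each entry point. The sketch below uses hypothetical stand-ins (DemoArc, DemoValueType, isChildrenQuery()) in place of RDF_Resource, RDF_ValueType, PRBool, and gCoreVocab so that it compiles on its own.

```c
/* Hypothetical stand-ins for arcs from the core vocabulary and for value types. */
typedef enum { DEMO_PARENT, DEMO_CHILD, DEMO_NAME } DemoArc;
typedef enum { DEMO_RESOURCE_TYPE, DEMO_STRING_TYPE } DemoValueType;

/* Return 1 when the query asks for the children of a node, in either
   of the two equivalent forms; return 0 for any other query. */
int isChildrenQuery(DemoArc s, int inversep, DemoValueType type)
{
    if (type != DEMO_RESOURCE_TYPE)
        return 0;
    return (s == DEMO_PARENT && inversep) ||
           (s == DEMO_CHILD && !inversep);
}
```

Keeping the check in one place makes it harder for one routine to accept Query #1 while another accepts only Query #2.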

Now that our pseudo file system data source can enumerate the contents of a directory and add nodes into RDF's graph, the next step is to implement the getSlotValueProc routine which RDF will call when it wants properties of a node (such as its name).

/*
Answer requests for node ("u")'s property ("s")

Note: always allocate new memory for the answer as RDF will "own" it
*/

void *
fsGetSlotValue (RDFT rdf, RDF_Resource u, RDF_Resource s,
                RDF_ValueType type, PRBool inversep, PRBool tv)
{
    void *data = NULL;

    if ((s == gCoreVocab->RDF_name) && (type == RDF_STRING_TYPE) &&
        (inversep == PR_FALSE) && (tv == PR_TRUE) && (fsUnitp(u)))
    {
        /* asking for the name of a node */

        data = XP_STRDUP(... node's name...);
    }
    else if (...looking for other properties? ...)
    {
        ... set "data" to be the correct result for what's being asked for ...
    }

    return (data);
}
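
The ownership rule in the comment above (always hand back newly allocated memory, because the caller will own it) can be illustrated with a tiny standalone sketch. The names kNodeName, demoCopyString(), and demoGetName() are hypothetical, and plain malloc()/free() stand in for whatever allocator the real code uses.

```c
#include <stdlib.h>
#include <string.h>

/* The data source's own copy of a node name; it must never be handed out. */
static const char kNodeName[] = "Navigator";

/* Return a newly allocated copy of s, or NULL if allocation fails. */
char *demoCopyString(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = (char *) malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

/* Stand-in for a getSlotValue routine answering a name query:
   the result is a fresh copy that the caller is expected to free. */
void *demoGetName(void)
{
    return demoCopyString(kNodeName);
}
```

Returning a pointer to kNodeName itself would let the caller free or modify storage that the data source still depends on.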

Next, we need to implement the hasAssertionProc routine as RDF will use it to determine whether a specified assertion exists or not.

/*
Determine whether a given assertion exists.

Note: This example code answers the question: Does file/folder "u" exist in folder "v" ?
*/

PRBool
fsHasAssertion (RDFT rdf, RDF_Resource u, RDF_Resource s, void* v,
                RDF_ValueType type, PRBool tv)
{
    PRBool exists = PR_FALSE;

    /* hasAssertionProc has no inversep argument; with RDF_parent,
       "u" is the file/folder and "v" is its parent directory */
    if ((s == gCoreVocab->RDF_parent) &&
        (type == RDF_RESOURCE_TYPE) && (tv == PR_TRUE) &&
        (fsUnitp(u)) && (fsUnitp((RDF_Resource) v)))
    {
        /* check to see if the file/folder "u" exists in the directory "v" */

        exists = ... PR_TRUE, if file/folder "u" exists in directory "v"...
    }
    else if (... asking about something else...)
    {
        exists = ... determine the answer ...
    }

    return(exists);
}

Important Note: A file system data source can be implemented in a synchronous manner, but the rules change a bit if the data source needs to operate in an asynchronous manner. If data streams in over the network, use RDF's sendNotifications2() routine to inform the graph of the new assertions (addition, insertion, or deletion) to be made as results arrive.

Note that the file system data source example didn't implement the assertProc or unassertProc. What would implementing these routines mean? An assertProc could provide the ability to create files or folders, and would allow the data source to implement private commands that it had reflected into the HyperTree's contextual menus. An unassertProc could provide the ability to delete files or folders (making sure to confirm the action with the user before doing so, of course!).

The Next Level:

Other routines in the RDF_Translator that a data source might want to implement include:

disposeResourceProc: indicates that a resource is being disposed from RDF's graph
destroyProc: RDF is being shut down (dispose of any outstanding tasks, threads, etc that the data source created)
arcLabelsOutProc: get a cursor to enumerate over all arc nodes coming off of a node
arcLabelsInProc: get a cursor to enumerate over all arc nodes coming into a node

RDF's vocabulary:

RDF defines a standard vocabulary that the RDF engine as well as its clients can use. Important vocabulary items include:

RDF_name
The name of a node. For example, given a file URL of "file://C:/Program Files/Navigator", its RDF_name would be "Navigator"
RDF_parent
Indicates a parent relationship. For example, given a file URL of "file://C:/Program Files/Navigator", its RDF_parent would be "file://C:/Program Files/"
RDF_child
Indicates a child relationship. For example, a file URL of "file://C:/Program Files/Navigator" would be a RDF_child of "file://C:/Program Files/".
RDF_Command
Used to provide reflection of data source specific commands into the HyperTree's contextual menus.
Note: many other vocabulary items exist. (In the 3/31/1998 Free Source release, look at ns/modules/rdf/include/vocab.h)
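
To make the RDF_name and RDF_parent examples concrete, here is a sketch of how a file system data source might derive both values from a file URL by splitting at the last "/". The routines demoSplitPoint(), demoNameOf(), and demoParentOf() are hypothetical helpers written for this example; real code would also have to deal with URL encoding, which this sketch ignores.

```c
#include <stdlib.h>
#include <string.h>

/* Return the index of the first character of the final path component,
   ignoring one trailing '/'. Returns 0 if the URL contains no other '/'. */
static size_t demoSplitPoint(const char *url)
{
    size_t len = strlen(url);

    if (len > 0 && url[len - 1] == '/')
        len--;                       /* ignore a trailing slash */
    while (len > 0 && url[len - 1] != '/')
        len--;
    return len;
}

/* Newly allocated name of the node (e.g. "Navigator"); caller frees. */
char *demoNameOf(const char *url)
{
    size_t start = demoSplitPoint(url);
    size_t end = strlen(url);
    char *name;

    if (end > start && url[end - 1] == '/')
        end--;                       /* leave the trailing slash out of the name */
    name = (char *) malloc(end - start + 1);
    if (name != NULL) {
        memcpy(name, url + start, end - start);
        name[end - start] = '\0';
    }
    return name;
}

/* Newly allocated URL of the parent (e.g. "file://C:/Program Files/"); caller frees. */
char *demoParentOf(const char *url)
{
    size_t split = demoSplitPoint(url);
    char *parent = (char *) malloc(split + 1);

    if (parent != NULL) {
        memcpy(parent, url, split);
        parent[split] = '\0';
    }
    return parent;
}
```

Both helpers return newly allocated strings, in keeping with the rule that RDF owns whatever a data source hands back.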

Data stores:

RDF has two primary data stores: the "local" store which offers persistence across instances of Navigator, and the "remote" store which does not. If neither of these is appropriate for a data source, it can also keep its own private list of assertions.
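
A private list of assertions can be as simple as a linked list of (u, s, v) entries. The sketch below is one possible shape, using plain strings in place of RDF_Resource values; DemoStore, DemoAssertion, and the demo* routines are hypothetical names and not part of RDF.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical private store: a singly linked list of (u, s, v) strings. */
typedef struct DemoAssertion {
    char *u;                     /* source node */
    char *s;                     /* arc node */
    char *v;                     /* destination node/value */
    struct DemoAssertion *next;
} DemoAssertion;

typedef struct {
    DemoAssertion *head;
} DemoStore;

static char *demoDup(const char *str)
{
    size_t len = strlen(str) + 1;
    char *copy = (char *) malloc(len);
    if (copy != NULL)
        memcpy(copy, str, len);
    return copy;
}

static int demoMatches(const DemoAssertion *a, const char *u,
                       const char *s, const char *v)
{
    return strcmp(a->u, u) == 0 && strcmp(a->s, s) == 0 &&
           strcmp(a->v, v) == 0;
}

static void demoFree(DemoAssertion *a)
{
    free(a->u);
    free(a->s);
    free(a->v);
    free(a);
}

/* Return 1 if the assertion is in the store, else 0. */
int demoHasAssertion(const DemoStore *st, const char *u,
                     const char *s, const char *v)
{
    const DemoAssertion *a;
    for (a = st->head; a != NULL; a = a->next)
        if (demoMatches(a, u, s, v))
            return 1;
    return 0;
}

/* Add the assertion, copying its strings; return 1 on success, 0 on failure. */
int demoAssert(DemoStore *st, const char *u, const char *s, const char *v)
{
    DemoAssertion *a;

    if (demoHasAssertion(st, u, s, v))
        return 1;                /* already present */
    a = (DemoAssertion *) malloc(sizeof(*a));
    if (a == NULL)
        return 0;
    a->u = demoDup(u);
    a->s = demoDup(s);
    a->v = demoDup(v);
    if (a->u == NULL || a->s == NULL || a->v == NULL) {
        demoFree(a);
        return 0;
    }
    a->next = st->head;
    st->head = a;
    return 1;
}

/* Remove the assertion; return 1 if it was found, else 0. */
int demoUnassert(DemoStore *st, const char *u, const char *s, const char *v)
{
    DemoAssertion **link = &st->head;

    while (*link != NULL) {
        DemoAssertion *a = *link;
        if (demoMatches(a, u, s, v)) {
            *link = a->next;
            demoFree(a);
            return 1;
        }
        link = &a->next;
    }
    return 0;
}
```

A data source built this way would typically hang its DemoStore off the translator's pdata field and answer hasAssertion, assert, and unassert from it.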

The Future:

It would be useful to implement a plugin mechanism (perhaps using Netscape's XP_COM) so that RDF data sources could be implemented much like Netscape plugins are implemented today, instead of requiring data sources to be built into the core product.

Today, there are data sources for bookmarks, history, and the local file system. What else could be a data source? Well, how about:

and that's just the beginning. Interested? Code up a RDF data source today!