rdf: on making a content model from a graph
Last Updated: 04-December-1998
Overview
A good first place to start is, why? Specifically, why convert an RDF graph to a tree-like content model? The motivation for doing this is very pragmatic:
We need to convert an RDF graph into an NGLayout content model in order to display the RDF content in the NGLayout viewer, or any of the XPFE components that are based on the NGLayout content model (e.g., the tree control and the toolbars).
This document discusses some of the issues surrounding integration of the RDF graph with the NGLayout content model. Specifically, it discusses different generic mechanisms for converting an RDF graph into a tree structure. It ends with a "call for suggestions": recommendations as to what heuristics may be useful in constructing a content model from arbitrary RDF.
Why Converting RDF to a Content Model is Hard
The primary problem that we face when trying to convert an RDF graph into a content model is that the notion of "node-hood" differs between the two structures.
A node in the RDF graph is extremely simple: it has a label (or value). Each of the node's properties is represented by a labeled arc that leads out of the node. Each arc points to another node, which itself has a label and its own property arcs, ad infinitum.
A node in the NGLayout content model is more complicated. In the content tree, a node has a tag: this corresponds exactly to an RDF node's label. A node also has attributes, which are property/value pairs. Finally, a node may have children, which are themselves nodes. [N.B., this isn't quite complete: what about text nodes? They don't have a tag, do they? This'll crop up later on...]
As you'll see below, there is no a priori reason to believe that there should or could be a single direct transformation from an RDF graph to a tree content model. It could be done in a number of interesting ways, and ideally, should be done stylistically using XSL, for example.
That said, we need something today to get up and running. The good news is that this isn't a throw-away exercise: XSL works on a tree-like content model, not on raw RDF. The XSL style system will need a "bootstrap" content model from which to begin transformations: hopefully that's what we'll figure out how to do here.
Why We Need a Generic Algorithm
Certainly, one way to go about this endeavor would be to create a customized graph-to-content model conversion algorithm for each type of RDF graph that we encounter. As you may have guessed, I believe that this is the wrong thing to do.
Specifically, I believe that writing custom graph-to-content model algorithms is the wrong thing to do because it doesn't scale. If you have m data source graphs, and n content models, you'll need m x n graph-to-content model implementations! That's a lot of typing.
The First Cut: GRAPH-TO-TREE
I'd like to propose the following algorithm, called GRAPH-TO-TREE, as a starting point for recursively converting an RDF graph into an NGLayout content model.
GRAPH-TO-TREE. An individual node u in the RDF graph is converted to a node c in the content tree using the following steps.
1. Make c's tag be u's label.
2. For each property arc p that leads out of u to another RDF node v, either
   (I) Add an attribute/value pair to c, such that p = v.
   (II) Recursively construct a node c0 in the content tree such that c0 corresponds to v, and make c0 a direct descendant of c.
   (III) Construct a node c0 in the content tree such that c0 corresponds to p. Make c0 a direct descendant of c. Then recursively construct a node c1 in the content tree such that c1 corresponds to v, and make c1 a direct descendant of c0.
In short, GRAPH-TO-TREE recursively descends the RDF graph, non-deterministically choosing how to transform each property into a corresponding structure in the content model.
It is worthwhile to note that this algorithm can be performed lazily. Because it can generate the content model from any individual node in the RDF graph "on demand", we need not traverse the entire graph to construct a content model. In fact, we can construct the content model incrementally as the content viewer needs it.
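Here is a minimal sketch of the "on demand" idea, not what NGLayout actually does. It assumes a hypothetical graph stored as a dictionary from resource URI to a list of (property, target URI) arcs, and it only ever applies transformation (II). The LazyNode class and the NS:next property are made up for illustration. A node's arcs are not read until somebody asks for its children, so even a graph with a cycle in it can be browsed one level at a time.

```python
class LazyNode:
    """Content node whose children are built the first time they are requested."""

    expanded = 0  # how many RDF nodes have had their arcs read so far

    def __init__(self, graph, uri):
        self.graph = graph
        self.uri = uri
        self.tag = uri.rsplit("#", 1)[-1]  # "x.rdf#a" -> "a"
        self._children = None              # not computed yet

    @property
    def children(self):
        if self._children is None:
            LazyNode.expanded += 1
            # Transformation (II): each arc's target becomes a direct child.
            self._children = [LazyNode(self.graph, target)
                              for _prop, target in self.graph.get(self.uri, [])]
        return self._children


# Hypothetical two-node graph that points back at itself.
GRAPH = {
    "x.rdf#a": [("NS:next", "x.rdf#b")],
    "x.rdf#b": [("NS:next", "x.rdf#a")],
}

root = LazyNode(GRAPH, "x.rdf#a")   # nothing has been traversed yet
first = root.children[0]            # reads the arcs of "a" only
print(first.tag, LazyNode.expanded)
```

An eager recursive descent over the same graph would never terminate; the lazy version only goes as deep as the viewer does.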
Example
To make this a bit clearer, let's look at what each of the three different transformations in step 2 would construct given the following RDF/XML document. Assume this is contained in the document x.rdf.
<?xml version="1.0"?>
<RDF:RDF xmlns:RDF="http://www.w3.org/TR/WD-rdf-syntax"
         xmlns:NS="http://somecompany.com/RDF#">
  <RDF:Description RDF:ID="#foo">
    <NS:title>babulach</NS:title>
    <NS:pointer>
      <RDF:Description RDF:ID="#bar">
        <NS:title>bilch</NS:title>
      </RDF:Description>
    </NS:pointer>
  </RDF:Description>
</RDF:RDF>
The graph for the above fragment looks as follows:
(x.rdf#foo)
  |
  +-----[NS:title]--->("babulach")
  |
  +----[NS:pointer]-->(x.rdf#bar)--+
                                   |
     ("bilch")<-----[NS:title]-----+
If we were to exclusively apply transformation (I) to the graph, the XML-serialized version of the content model would appear like this:
<foo title="babulach" pointer="x.rdf#bar"/>
If we were to exclusively apply transformation (II) to the graph, the XML-serialized version of the content model would appear like this:
<foo>
  babulach
  <bar>bilch</bar>
</foo>
Finally, if we were to exclusively apply transformation (III) to the graph, the XML-serialized version of the content model would appear like this:
<foo>
  <title><babulach/></title>
  <pointer>
    <bar>
      <title><bilch/></title>
    </bar>
  </pointer>
</foo>
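To check my own reading of the three transformations, here is a small sketch of GRAPH-TO-TREE run over the x.rdf graph. Everything in it is an assumption made for illustration: the Res and Node helper types, the dictionary encoding of the graph, and the choose callback that picks a transformation per arc. Following the serializations above, a literal reached through (II) becomes a text child, while a literal reached through (III) becomes an empty element named after the literal. The recursion assumes the graph has no cycles.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Res:
    """An RDF resource, named by its URI (hypothetical helper type)."""
    uri: str


@dataclass
class Node:
    """A content-model element; children are Nodes or plain text strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def local(label):
    # "x.rdf#foo" -> "foo", "NS:title" -> "title"
    return label.rsplit("#", 1)[-1].rsplit(":", 1)[-1]


def graph_to_tree(graph, u, choose):
    c = Node(local(u.uri))                   # step 1: tag comes from the label
    for p, v in graph.get(u.uri, []):        # step 2: one choice per arc
        how = choose(p, v)
        if how == "I":                       # attribute/value pair on c
            c.attrs[local(p)] = v.uri if isinstance(v, Res) else v
        elif how == "II":                    # value becomes a direct child of c
            c.children.append(
                graph_to_tree(graph, v, choose) if isinstance(v, Res) else v)
        else:                                # "III": property element wraps the value
            c1 = graph_to_tree(graph, v, choose) if isinstance(v, Res) else Node(v)
            c.children.append(Node(local(p), children=[c1]))
    return c


def serialize(n):
    # No escaping is done: fine for this example, not for arbitrary text.
    if isinstance(n, str):
        return n
    attrs = "".join(f' {k}="{v}"' for k, v in n.attrs.items())
    inner = "".join(serialize(ch) for ch in n.children)
    return f"<{n.tag}{attrs}>{inner}</{n.tag}>" if inner else f"<{n.tag}{attrs}/>"


GRAPH = {
    "x.rdf#foo": [("NS:title", "babulach"), ("NS:pointer", Res("x.rdf#bar"))],
    "x.rdf#bar": [("NS:title", "bilch")],
}

for name in ("I", "II", "III"):
    tree = graph_to_tree(GRAPH, Res("x.rdf#foo"), lambda p, v, name=name: name)
    print(name, serialize(tree))
```

Whitespace aside, the output has the same shape as the three serializations above. Passing a smarter choose callback is where any heuristics or hints would plug in.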
Which Transformation is "Right"?
The big headache here is that, in step 2 of GRAPH-TO-TREE, you have to choose between three alternatives. Not just on a node-by-node basis, but on a property-by-property basis!
Do 'em all!
None of the transformations are mutually exclusive, so one solution is to simply whack all (or, at least, more than one) of the transformations into the content model.
For example, combining (I) and (III) on our above example would yield the following XML-serialized representation:
<foo title="babulach" pointer="x.rdf#bar">
  <title><babulach/></title>
  <pointer>
    <bar>
      <title><bilch/></title>
    </bar>
  </pointer>
</foo>
As you can see, this leads to redundancy, and potentially places a heavy burden on the down-stream consumer (e.g., the CSS style sheet writer) to exclude content that he or she isn't interested in.
Heuristics
There are several domain-independent heuristics to guess which transformation would be right.
RDF collections. If we encounter an RDF collection node (e.g., rdf:Bag), we can safely assume that we need not add the collection items as attribute/value pairs using transformation (I). We can also probably assume that transformation (III) is excessive, because the (implicitly-ordered) content model already preserves the order of the collection items. So simply using (II) for the collection items should be sufficient.
For example, the following RDF/XML,
<RDF:Bag ID="#foo">
  <RDF:li>Big Judy</RDF:li>
  <RDF:li>Bad Judy</RDF:li>
  <RDF:li>Boo Judy</RDF:li>
</RDF:Bag>
yields the below graph:
(foo.rdf#foo) --[RDF:instanceOf]--> (RDF:bag)
  |
  +--[RDF:_1]-->("Big Judy")
  |
  +--[RDF:_2]-->("Bad Judy")
  |
  +--[RDF:_3]-->("Boo Judy")
Using transformation (II), we'd get a content model like the following:
<foo> Big Judy Bad Judy Boo Judy </foo>
Unfortunately, this is a little bit unnatural: "Big Judy", "Bad Judy", and "Boo Judy" would all be individual text nodes in the content model. The serialization doesn't adequately illustrate this, as they'd all be "lumped" together into a single node were we to write this out as XML and then immediately read it back in. It works a bit better when the items in the collection are themselves resources.
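The "lumping" is easy to see with any DOM implementation. The sketch below uses Python's xml.dom.minidom purely as a stand-in for the NGLayout content model (that substitution is my assumption, not a claim about NGLayout internals). It builds the three text nodes by hand, writes the tree out, and reads it back in.

```python
from xml.dom.minidom import Document, parseString

# Build <foo> with three separate text-node children, as transformation (II) would.
doc = Document()
foo = doc.createElement("foo")
doc.appendChild(foo)
for item in ("Big Judy", "Bad Judy", "Boo Judy"):
    foo.appendChild(doc.createTextNode(item))

before = len(foo.childNodes)          # text nodes in the in-memory tree

# Serialize, then immediately parse the result again.
xml_text = doc.toxml()
reparsed = parseString(xml_text)
after = len(reparsed.documentElement.childNodes)

print(before, after)
```

The in-memory tree holds three children, but after the round trip the parser hands back a single run of text, so the item boundaries are gone.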
Resources vs. literals. We can examine the node at the end of an arc and determine whether it is a resource (which may itself have properties), or a simple literal (which must be a leaf node). For a resource, we can ignore transformation (I), which would create an attribute/value pair. We can do this because it is fairly safe to make the assumption that the resource is a "first class object" that needs its own element in the content model.
A simple literal would probably lead us to use either (I) (creating an attribute/value pair) or (III) (creating an explicit element with a simple value). Use of (II) is somewhat ambiguous, because it assigns the value of the literal to the parent tag: more than one literal property would create concatenated, nonsensical text; e.g.,
<RDF:Description RDF:ID="#foo" NS:prop1="hey" NS:prop2="garth"/>
Would yield:
<foo> hey garth </foo>
As you can see, "hey" and "garth" are indistinguishable.
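Taken together, the two heuristics above narrow the choice without always settling it. A rough sketch, with a made-up function name and boolean parameters, of what they leave open for a single arc:

```python
def candidate_transformations(parent_is_collection, value_is_resource):
    """Return the transformations the two heuristics still allow for one arc."""
    if parent_is_collection:
        # Collection items: (I) is unnecessary and (III) is excessive.
        return {"II"}
    if value_is_resource:
        # A resource is a first-class object, so it gets its own element.
        return {"II", "III"}
    # A literal: (II) would smear several values into one run of text.
    return {"I", "III"}


print(candidate_transformations(True, False))
print(candidate_transformations(False, True))
print(candidate_transformations(False, False))
```

Only the collection case comes back with a single answer; for everything else something still has to break the tie.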
Parameterized Hints
If we can parametrically provide some hints to GRAPH-TO-TREE, we may be able to constrain the problem.
Define a "Tree Property (or Properties)". We can explicitly tell GRAPH-TO-TREE which properties to apply transformations (II) or (III) to: call these the "tree properties". We will apply transformation (I) to all other properties: call these "normal" properties.
For example, to display "bookmarks", we might tell GRAPH-TO-TREE that BM:Item and BM:Folder are tree properties. When GRAPH-TO-TREE encounters either of these properties, it will use either transformation (II) or transformation (III) to create a child element. Any other property will simply be constructed as an attribute/value pair.
Unfortunately, we still need to choose between transformations (II) and (III). In general, it's not clear which of these is appropriate.
As a side note, Mozilla Classic used a highly constrained version of this solution: there was a single "tree property", RDF:child. This was used to represent all containment structure.
Again, the key here is that these "hints" can be provided as parameters to GRAPH-TO-TREE. Remember our goal: one graph-to-content model conversion algorithm.
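As a sketch of what "hints as parameters" might look like, the hypothetical make_chooser below takes the set of tree properties plus the child style to use for them, and returns the per-property decision. The function name, the child_style parameter, and the BM:Name property are all assumptions for illustration. Note that the caller still has to pick between (II) and (III).

```python
def make_chooser(tree_properties, child_style="III"):
    """Build a per-property decision function from the given hints."""
    if child_style not in ("II", "III"):
        raise ValueError("tree properties must use transformation II or III")
    tree_properties = frozenset(tree_properties)

    def choose(prop):
        # Tree properties create child elements; "normal" ones become attributes.
        return child_style if prop in tree_properties else "I"

    return choose


bookmarks = make_chooser({"BM:Item", "BM:Folder"})
print(bookmarks("BM:Folder"))   # a tree property
print(bookmarks("BM:Name"))     # a hypothetical "normal" property
```

The conversion code itself stays the same for every data source; only the arguments to make_chooser change.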
Hack-Alert! What's In There Now
You can (right now, today) view an RDF graph in an HTML-like NGLayout document by applying CSS to it. The way to do this is to create an RDF file on your local hard drive (netlib will be able to tell it's an RDF file by its file extension, .rdf).
The content model is generated using a combination of some of the above mechanisms. Transform (I) is generously applied to any and all nodes, creating tons of attributes. For an RDF node u with property p and value v, a combination of transformations (II) and (III) is used. Specifically, transform (III) is always applied. If v is determined to be a literal (that is, v is not a URI), then transform (II) is also applied.
To make this a bit more concrete, we'll look at what this does to our original example. Below is the resulting XML-serialized content model that will be built if the sample RDF is read into NGLayout.
<foo title="babulach" pointer="x.rdf#bar">
  <title>babulach<babulach/></title>
  <pointer>
    <bar title="bilch">
      <title>bilch<bilch/></title>
    </bar>
  </pointer>
</foo>
You'll notice that this differs distinctly from just applying (III) in that it actually creates a text node for "babulach" and "bilch". This is what actually makes something show up on the screen.
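For the curious, here is a sketch of that combination. It is reconstructed from the description and the serialized output above, not taken from the NGLayout source, and the Res and Node types and the dictionary encoding of the graph are assumptions. The one detail worth noticing is that the text node from (II) lands inside the property element, ahead of the element for the literal.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Res:
    """An RDF resource, named by its URI (hypothetical helper type)."""
    uri: str


@dataclass
class Node:
    """A content-model element; children are Nodes or plain text strings."""
    tag: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def local(label):
    # "x.rdf#foo" -> "foo", "NS:title" -> "title"
    return label.rsplit("#", 1)[-1].rsplit(":", 1)[-1]


def build(graph, uri):
    c = Node(local(uri))
    for p, v in graph.get(uri, []):
        is_literal = not isinstance(v, Res)
        c.attrs[local(p)] = v if is_literal else v.uri   # (I): always
        c0 = Node(local(p))                              # (III): always
        if is_literal:
            c0.children.append(v)                        # (II): text node for a literal
            c0.children.append(Node(v))
        else:
            c0.children.append(build(graph, v.uri))      # assumes no cycles
        c.children.append(c0)
    return c


def serialize(n):
    # No escaping is done: fine for this example, not for arbitrary text.
    if isinstance(n, str):
        return n
    attrs = "".join(f' {k}="{v}"' for k, v in n.attrs.items())
    inner = "".join(serialize(ch) for ch in n.children)
    return f"<{n.tag}{attrs}>{inner}</{n.tag}>" if inner else f"<{n.tag}{attrs}/>"


GRAPH = {
    "x.rdf#foo": [("NS:title", "babulach"), ("NS:pointer", Res("x.rdf#bar"))],
    "x.rdf#bar": [("NS:title", "bilch")],
}

print(serialize(build(GRAPH, "x.rdf#foo")))
```

Whitespace aside, this prints the same tree as the serialization above.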
Demo
So you wanna see it for yourself, eh? Copy and paste the sample code into a file, x.rdf. Fire up NGLayout and type in a file: URL to open the thing. You'll probably get something on your screen like:
-
bilchbabulach
So you're a CSS hack, right? Add a style sheet to it -- insert the following line immediately before the <RDF:RDF> tag:
<?xml-stylesheet href="x.css" type="text/css"?>
And create x.css, for example, as follows:
title {
  display: block;
  padding-left: 8pt;
}
pointer {
  color: red;
}
You should now start to see something a little more interesting:
-
bilch
babulach
Yeah, I know. It's backwards. It's a bug. Anyway, hack it. Let me know what's cool.
Waiting for Guffman...er, XSL
As you can see, this is a sticky, nasty problem. I can't wait for XSL to arrive, because it makes the problem belong to the content provider, not the core RDF engine.
Until then, we're on our own. Suggestions and comments are welcome. Please post them to mozilla-rdf@mozilla.org, our little Mozilla RDF community.