Dan Connolly's tinkering lab notebook

## Map and Territory in RDF APIs

RDF specs and APIs have made a bit of a mess out of a couple pretty basic tools of math and computing: graphs and logic formulas. With the RDF next steps workshop coming up and Pat Hayes re-thinking RDF semantics Sandro thinking out loud about RDF2, I'd like us to think about RDF in more traditional terms. The scala programming language seems to be an interesting framework to explore how they relate to RDF.

The Feb 1999 RDF spec wasnt very clear about the map and the territory. It said that statements are made out of parts in the territory, rather than features on the map, which doesnt make very much sense. RDF APIs seem to inherit this confusion; e.g. from an RDF::Value class for ruby:

''

### Examples:

#### Checking if a value is a resource (blank node or URI reference)

value.resource

Blank nodes and URI references are parts of the map; resources are in the territory.

Likewise in Package org.jrdf.graph:

Resource A resource stands for either a Blank Node or a URI Reference.

The 2004 RDF specs take great pains to clarify these use/mention distinctions, but they also go on at great length.

Let's review Wikipedia on graphs:

In mathematics, a graph is an abstract representation of a set of objects where some pairs of the objects are connected by links. ...  The edges may be directed (asymmetric) or undirected (symmetric) ... and the edges are called directed edges or arcs; ... graphs which have labeled edges are called edge-labeled graphs.

With that in mind, in the swap-scala project, we summarize the RDF abstract syntax as an edge-labelled directed graph with just one or two wrinkles:

package org.w3.swap.rdftrait RDFGraphParts {  type Arc = (SubjectNode, Label, Node)  type Node  type Literal <: Node  type SubjectNode <: Node  type BlankNode <: SubjectNode  type Label <: SubjectNode}

The wrinkles are:

• Arcs can only start from BlankNodes or Labels, i.e. SubjectNodes
• Arcs labels may also appear as Nodes

We use another trait to relate concrete datatypes to these abstract types:

trait RDFNodeBuilder extends RDFGraphParts {  def uri(i: String): Label  type LanguageTag = Symbol  def plain(s: String, lang: Option[LanguageTag]): Literal  def typed(s: String, dt: String): Literal  def xmllit(content: scala.xml.NodeSeq): Literal}

This doesn't pin down what a Label is, but in any concrete implementation, you can build one from a String using the uri method. The RDFNodeBuilder trait is used to implement RDF/XML, RDFa, and turtle parsers that are agnostic to the concrete implementation of an RDF graph.

Now let's look at terms of first order logic:

The set of terms is inductively defined by the following rules:

1. Variables. Any variable is a term.
2. Functions. Any expression f(t1,...,tn) of n arguments (where each argument ti is a term and f is a function symbol of valence n) is a term.

This is represented straightforwardly in scala a la:

package org.w3.swap.logic1 /** * A Term is either a Variable or an FunctionTerm. */sealed abstract class Term { ... }class Variable extends Term { ...}abstract class FunctionTerm() extends Term {  def fun: Any  def args: List[Term]}

The core RDF doesn't cover all of first order logic; it corresponds fairly closely to the conjunctive query fragment:

The conjunctive queries are simply the fragment of first-order logic given by the set of formulae that can be constructed from atomic formulae using conjunction $\wedge$ and existential quantification $\exists$, but not using disjunction $\lor$, negation $\neg$, or universal quantification $\forall$.

We can then excerpt just the relevant parts of the definition of formulas:

The set of formulas is inductively defined by the following rules:

1. Predicate symbols. If P is an n-ary predicate symbol and t1, ..., tn are terms then P(t1,...,tn) is a formula.
2. Binary connectives. If φ and ψ are formulas, then (φ $\rightarrow$ ψ) is a formula. Similar rules apply to other binary logical connectives.
3. Quantifiers. If φ is a formula and x is a variable, then $\forall x \varphi$ and $\exists x \varphi$ are formulas.

Our scala representation follows straightforwardly:

package org.w3.swap.logic1ec sealed abstract class ECFormulacase class Exists(vars: Set[Variable], g: And) extends ECFormulasealed abstract class Ground extends ECFormulacase class And(fmlas: Seq[Atomic]) extends Groundcase class Atomic(rel: Symbol, args: List[Term]) extends Ground

Now that we have scala representations for RDF graphs and conjunctive query formulas, how do we relate them? This is the fun part:

package org.w3.swap.rdflogicimport swap.rdf.RDFNodeBuilderimport swap.logic1.{Term, FunctionTerm, Variable}import swap.logic1ec.{Exists, And, Atomic, ECProver, ECFormula}/** * RDF has only ground, 0-ary function terms. */abstract class Ground extends FunctionTerm {  override def fun = this  override def args = Nil}case class Name(n: String) extends Groundcase class Plain(s: String, lang: Option[Symbol]) extends Groundcase class Data(lex: String, dt: Name) extends Groundcase class XMLLit(content: scala.xml.NodeSeq) extends Ground/** * Implement RDF Nodes (except BlankNode) using FOL function terms */trait TermNode extends RDFNodeBuilder {  type Node = Term  type SubjectNode = Term  type Label = Name  def uri(i: String) = Name(i)  type Literal = Term  def plain(s: String, lang: Option[Symbol]) = Plain(s, lang)  def typed(s: String, dt: String): Literal = Data(s, Name(dt))  def xmllit(e: scala.xml.NodeSeq): Literal = XMLLit(e)}

The abstract RDFGraphBuilder node types are implemented as first order logic terms. For formulas, we use a "holds" predicate:

 object RDFLogic extends ... {  def atom(s: Term, p: Term, o: Term): Atomic = {    Atomic('holds, List(s, p, o))  }  def atom(arc: (Term, Term, Term)): Atomic = {    Atomic('holds, List(arc._1, arc._2, arc._3))  }}

Then all the semantic machinery up to simple entailment between RDF graphs just falls out of conjunctive query.

I haven't done RDFS Entailment yet; the plan is to do basic rules first (N3rules or RIF BLD) and then use that for RDFS, OWL2-RL, and the like.