...making Linux just a little more fun!

<-- prev | next -->

RDF and the Semantic Web

By Jimmy O'Regan

RDF is a framework for defining metadata; data that describes data. It was developed by the W3C, based on work by Ramanathan V. Guha, and was originally used in Netscape Navigator 4.5's Smart Browsing ("What's related?") feature, and by Open Directory. RDF followed from work Guha had done earlier, both on the Cyc project, and on Apple's Hotsauce project.

RDF is currently being used in several areas: MusicBrainz uses it to provide playlist information about albums and songs, as a replacement for formats such as WinAmp's M3U files, or Microsoft's ASX files; Mozilla uses it for several purposes, e.g. recording the details of downloaded files (sample), remembering which action to perform for each MIME type, and so on. RDF is also used in some variants of RSS to provide details about syndicated articles.

RDF is a key component of the Semantic Web. In a May 2001 article in Scientific American, Tim Berners-Lee et al. described this concept:

The Semantic Web is an extension of the current web where information is given well-defined meaning, better enabling computers and people to work in cooperation.

The Semantic Web is the next "big deal" for the Web: the W3C is developing several technologies, based around RDF, which provide a machine-readable way of representing metadata. Various schema are already in existence: RSS 1.0, for summarising the contents of a website (though RSS is mostly used by news sites to provide a "feed" of the news they provide); OWL, which provides a basis for creating new schema while providing compatibility with similar schema; FOAF, which provides a way of describing people and the relationships between them. There are several others as well. Edd Dumbill has even proposed that file metadata in GNOME be represented in RDF.

A brief history of RDF

Cyc is a project to create an artificial intelligence with basic "common sense". It has a large database containing definitions such as: a tree is a kind of plant, a sycamore is a kind of tree, and so on. Cyc is then able to deduce from these definitions that a sycamore is a plant. Though work continues on a commercial version of Cyc, an open source version, OpenCyc, is also available.

Hotsauce was a browser plug-in created by Apple in 1996 that allowed users to use 3D navigation on a website that included an MCF definition. MCF, Meta Content Framework, was a text-based definition that used an RFC822-style format to describe a site. Following the examples I found on the net, I've created an example of my own that describes a simple site layout.

When Guha moved to Netscape, he met with Tim Bray, co-author of XML 1.0, and the two worked on creating an XML-based version of MCF, submitted to the W3C in May 1997. (I've created a simple example of this too).

With the addition of name spaces to XML, RDF began to take its current form. RDF uses XML name spaces to extend its vocabulary using RDF schema, though it's worth noting that although XML is the most common container format for RDF, other formats are in use. The W3C uses another format called N3 (Notation3), which bears greater resemblance to LISP-style languages, such as CycL and KIF.

RDF schema separate the RDF metadata - definitions of how new terms relate to each other - from the "normal" metadata. Instead of defining the relationships among items, such as "A is a child of B, and is a kind of C", as was done in MCF and MCF-XML, these are defined in separate schema, which may be referred to using XML name spaces and reused (though it is still possible to include schema within the document).

RDF Statements

Anyone interested in learning about RDF would do well to read the W3C's RDF Primer and/or "A no-nonsense guide to Semantic Web specs for XML people", but I'll give my best attempt to explain the concepts.

Each RDF statement is called a "triple", meaning it consists of three parts: subject, predicate, and object; the subject is either an RDF URI, or a blank node (I haven't seen a good explanation why these nodes are "blank", so I'll just refer to them as nodes). So, rephrasing the sentence "Linux Gazette is the name of the website at http://linuxgazette.net" looks like this:

http://linuxgazette.net has the property name with the value "Linux Gazette"

Predicates may only be RDF URIs - references to a schema definition of the property - while objects may be URIs, blank nodes, or literals. We can then express the previous triple as an N-Triple, which is the simplest valid form of RDF, like this:

# Note that each URI most be enclosed, and each triple must end with a '.'
<http://linuxgazette.net>  <http://example.org/sample#name> "Linux Gazette".
In XML, the general style is:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

<rdf:Description rdf:about="subject">
  <predicate>Object</predicate>
</rdf:Description>

</rdf:RDF>
so, using XML QNames, which are valid URIs, our example becomes:
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
        xmlns:example="http://example.org/sample#">
	
<rdf:Description rdf:about="http://linuxgazette.net">
 <example:name>Linux Gazette</example:name>
</rdf:Description>

</rdf:RDF>
RDF allows multiple properties to be abbreviated by including them within the same set of tag, so:
<rdf:Description rdf:about="http://linuxgazette.net">
  <example:name>Linux Gazette</example:name>
</rdf:Description>

<rdf:Description rdf:about="http://linuxgazette.net">
  <example:nickname>LG</example:nickname>
</rdf:Description>
is the same as:
<rdf:Description rdf:about="http://linuxgazette.net">
  <example:name>Linux Gazette</example:name>
  <example:nickname>LG</example:nickname>
</rdf:Description>
I've said that objects can be nodes: a quick example of this is:
<rdf:Description rdf:about="http://linuxgazette.net">
  <example:editor>
   <example:name>Ben Okopnik</example:name>
  </example:editor>
</rdf:Description>
These nodes can be given identifiers, and referred to by identifier, for clarity:
<rdf:Description rdf:about="http://linuxgazette.net">
  <example:editor rdf:nodeID="ben"/>
</rdf:Description>

<rdf:Description rdf:nodeID="ben"> <example:name>Ben Okopnik</example:name> </rdf:Description>

FOAF

FOAF stands for "Friend of a Friend". FOAF was designed as a way to represent information about people online, and the relationships between them. E-mail addresses are used as identifiers, though given how much spam we all seem to receive these days, it is also possible to use a sha1sum of an e-mail address, or both.

FOAF is being used by sites such as plink ("People Link") where people can meet each other, and sites like LiveJournal generate FOAF so that users can have their relationships automatically described. There is also SharedID, which is attempting to build a Single Sign-On Service based on FOAF.

Rather than talk about it in nauseating detail, I'll show a simple example, which should be relatively self-explanatory. (This is based on an example generated by FOAF-a-matic).

<rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/">
     
<foaf:Person rdf:nodeID="jimregan">
<foaf:name>Jimmy O'Regan</foaf:name>
<foaf:title>Mr.</foaf:title>
<foaf:givenname>Jimmy</foaf:givenname>
<foaf:family_name>O'Regan</foaf:family_name>
<foaf:nick>jimregan</foaf:nick>

<foaf:mbox rdf:resource="mailto:jimregan@o2.ie"/>
<foaf:mbox_sha1sum>9642c26da203ef143f884488d49194eb0747547c</foaf:mbox_sha1sum>

<foaf:homepage
rdf:resource="http://linuxgazette.net/authors/oregan.html"/>
<foaf:depiction
rdf:resource="http://linuxgazette.net/gx/authors/oregan.jpg"/>
<foaf:phone rdf:resource="tel:+353872441159"/>

<foaf:knows>
  <foaf:Person>
    <foaf:name>Mark Hogan</foaf:name>
    <foaf:mbox_sha1sum>7dbf56320b204be2e2bee161abed3ffc5825b590</foaf:mbox_sha1sum>
    </foaf:Person>
</foaf:knows>

</foaf:Person>

</rdf:RDF>

Most elements in FOAF can be added as many times as you like - many of us have several e-mail addresses, homepages, etc.

Anything you add to your own <foaf:Person> definition, you can add to any other node within a <foaf:knows> node.

<foaf:knows>
<foaf:Person>
<foaf:name>Mark Hogan</foaf:name>
<foaf:mbox_sha1sum>7dbf56320b204be2e2bee161abed3ffc5825b590</foaf:mbox_sha1sum>
<foaf:nick>Sprogzilla</foaf:nick>
<foaf:dateOfBirth>1997-06-29</foaf:dateOfBirth>
</foaf:Person>
</foaf:knows>

More Detail

There are quite a lot of things you can describe using FOAF, from the useful to the amusing (such as the <foaf:dnaChecksum> tag). Let's add a few of the more useful items, such as those generated by LiveJournal:

<foaf:workplaceHomepage rdf:resource="http://www.dewvalley.com"/>
<foaf:schoolHomepage rdf:resource="http://www.lit.ie"/>
<foaf:schoolHomepage rdf:resource="http://www.gairmscoilmhuirethurles.ie/"/>
<foaf:dateOfBirth>1979-07-31</foaf:dateOfBirth>

<!-- Generated by LiveJournal -->
<foaf:page>
  <foaf:Document rdf:about="http://www.livejournal.com/userinfo.bml?user=jimregan">
  <dc:title>LiveJournal.com Profile</dc:title>
    <dc:description>
      Full LiveJournal.com profile, including information such as interests 
      and bio.
    </dc:description>
  </foaf:Document>
</foaf:page>

<foaf:weblog rdf:resource="http://www.livejournal.com/users/jimregan/"/>

<foaf:interest dc:title="charles dickens" rdf:resource="http://www.livejournal.com/interests.bml?int=charles+dickens"/>
<foaf:interest dc:title="foaf" rdf:resource="http://www.livejournal.com/interests.bml?int=foaf" />
<foaf:interest dc:title="guitar" rdf:resource="http://www.livejournal.com/interests.bml?int=guitar" />
<foaf:interest dc:title="linux" rdf:resource="http://www.livejournal.com/interests.bml?int=linux" />
<foaf:interest dc:title="linux gazette" rdf:resource="http://www.livejournal.com/interests.bml?int=linux+gazette" />
<foaf:interest dc:title="neal stephenson" rdf:resource="http://www.livejournal.com/interests.bml?int=neal+stephenson" />
<foaf:interest dc:title="open source" rdf:resource="http://www.livejournal.com/interests.bml?int=open+source" />
<foaf:interest dc:title="terry pratchett" rdf:resource="http://www.livejournal.com/interests.bml?int=terry+pratchett" />
<foaf:interest dc:title="wikipedia" rdf:resource="http://www.livejournal.com/interests.bml?int=wikipedia" />

The <dc:title> and <dc:description> tags are from the Dublin Core schema, so we would have to change our namespace definitions to add the url:

<rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:dc="http://purl.org/dc/elements/1.1/">

There are also several ways of identifying various chat accounts you have online:

<foaf:jabberID>jimregan@jabber.org</foaf:jabberID>
<foaf:icqChatID>113804615</foaf:icqChatID>

and many other, miscellaneous details.

Now, that's all well and good, but what about the less useful stuff? FOAF has tags to add silly things like Myers-Briggs personality types, geekcode blocks, and .plan information:

<foaf:myersBriggs>ENTP</foaf:myersBriggs>
<foaf:geekCode>
GCS/IT/MU/TW/O d-(+)>--- s: a--(-) C++(+++)>++++$
UBLAC++(on)>++++$ P++(+++)>++++ L+++>++++$ E+(+++) W+++>$
N+ o K++ w(++)>-- O- M !V PS+(+++) PE(--) Y+>++ PGP-(+)>+++
t+() !5 X+ !R tv+ b++(+++)>++++$ DI++++ D++ G e h r-
y--**(++++)>+++++
</foaf:geekCode>
<foaf:plan>Get a better job</foaf:plan>

Other Schema

FOAF contains a lot of useful and amusing stuff, but there are many other schemas waiting to be used.

On the amusing side:

<rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:zodiac="http://www.ideaspace.net/users/wkearney/schema/astrology/0.1#"
     xmlns:quaff="http://purl.org/net/schemas/quaffing/">

<zodiac:Sign>Leo</zodiac:Sign>

<quaff:owesBeerTo rdf:nodeID="doyle"/>
<quaff:drankBeerWith rdf:nodeID="doyle"/>

<foaf:knows>
<foaf:Person rdf:nodeID="doyle">
<foaf:name>Jimmy Doyle</foaf:name>
<foaf:mbox_sha1sum>a0610d4a3086354b9ef1daf50d24de232115c965</foaf:mbox_sha1sum>
</foaf:Person>
</foaf:knows>

There are also more useful external schema - geo, to describe the location of a place; and srw to describe the languages you speak, read, write, and master.

<rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:geo="http://www.w3.org/2003/01/geo/wgs84_pos#"
     xmlns:srw="http://purl.org/net/inkel/rdf/schemas/lang/1.1#">


<foaf:based_near>
  <geo:Point>
    <geo:lat>52.6796</geo:lat>
    <geo:long>-7.8221</geo:long>
  </geo:Point>
</foaf:based_near>

<srw:masters>en</srw:masters>
<srw:reads>fr</srw:reads>
<srw:reads>ga</srw:reads>

One of the cool things you can do with geographic locations is to run FoafGeoGraph on your FOAF file. This looks for geographic locations for people, or for links to their FOAF files, from which it generates input for XPlanet which can be used to show your position relative to those of your friends. Nifty, eh?

You can also refer to other documents (though this is part of RDF). For example, there's a schema that represents iCalendar data in RDF. Using this, I can provide my work timetable and some birthdays from my foaf file (work.cal.rdf.txt and birthdays.cal.rdf.txt, generated with this Perl script, which I've lost the link for):

<rdfs:seeAlso rdf:resource="http://linuxgazette.net/misc/oregan/work.cal.rdf.txt" dc:title="Work schedule"/>
<rdfs:seeAlso rdf:resource="http://linuxgazette.net/misc/oregan/birthdays.cal.rdf.txt" dc:title="Birthdays"/>
I've already mentioned that it's possible to depict chat accounts with services such as Jabber, and MSN. It's also possible to represent other accounts. Chat accounts which don't have a specific tag available can still be represented, using the <foaf:OnlineChatAccount> tag, while other accounts use the <foaf:OnlineAccount> tag.
<foaf:holdsAccount>
  <foaf:OnlineChatAccount>
    <foaf:accountName>jimregan</foaf:accountName>
    <foaf:accountServiceHomepage dc:title="irc.freenode.net" rdf:resource="http://www.freenode.net/irc_servers.shtml" />
  </foaf:OnlineChatAccount>
</foaf:holdsAccount>

<foaf:holdsAccount>
  <foaf:OnlineAccount>
    <foaf:accountName>jimregan</foaf:accountName>    
    <foaf:accountServiceHomepage dc:title="Wikipedia" rdf:resource="http://en.wikipedia.org" />
  </foaf:OnlineAccount>
</foaf:holdsAccount>

Specifying a relationship

FOAF was originally designed to described the way in which people were related to each other, though this seems to have become unworkable. It's still possible to convey this information though:

<dcterms seems to be required by rel -->
<rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:dcterms="http://purl.org/dc/terms/"
     xmlns:rel="http://purl.org/vocab/relationship/">

<rel:parentOf rdf:nodeID="markinho"/>

<foaf:knows>
<foaf:Person rdf:nodeID="markinho">
<foaf:name>Mark Hogan</foaf:name>
<foaf:mbox_sha1sum>7dbf56320b204be2e2bee161abed3ffc5825b590</foaf:mbox_sha1sum>
<foaf:nick>Sprogzilla</foaf:nick>
<foaf:dateOfBirth>1997-06-29</foaf:dateOfBirth>
<rdfs:seeAlso rdf:resource="http://linuxgazette.net/105/misc/oregan/mark.rdf.txt"/>
</foaf:Person>
</foaf:knows>

Web of trust

Identity theft is becoming more and more of an issue on the 'net. To protect against this, there is a schema available which describes various aspects of a PGP signature, and to allow a FOAF file to be digitally signed.

(For more information on signing a FOAF document, see Edd Dumbill's page).

<rdf:RDF
     xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
     xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
     xmlns:foaf="http://xmlns.com/foaf/0.1/"
     xmlns:wot="http://xmlns.com/wot/0.1/">

<!-- Signature to verify this document -->
<foaf:Document rdf:about="">
  <wot:assurance rdf:resource="foaf-example.rdf.asc" />
</foaf:Document>

<!-- my signature -->
<wot:pubkeyAddress
rdf:resource="http://linuxgazette.net/104/misc/oregan/jimregan.asc"/>
<wot:hex_id> 773730F8</wot:hex_id>
<wot:length>1024</wot:length>
<wot:fingerprint>D675 279B 24AC BDC3 BAE9 BB7A 666C 30CA 7737 30F8</wot:fingerprint>

Co-Depiction.

Co-depiction is a hot topic in FOAF circles. The logical extension to saying you know someone is to prove it with a photo. Let's expand the first example to show a co-depiction of Mark and me:

<foaf:homepage rdf:resource="http://linuxgazette.net/authors/oregan.html"/>
<foaf:depiction rdf:resource="http://linuxgazette.net/gx/authors/oregan.jpg"/>
<foaf:depiction rdf:resource="http://linuxgazette.net/105/misc/oregan/IMAGE0004.jpg"/>
<foaf:phone rdf:resource="tel:+353872441159"/>

<foaf:knows>
<foaf:Person>
<foaf:name>Mark Hogan</foaf:name>
<foaf:depiction rdf:resource="http://linuxgazette.net/105/misc/oregan/IMAGE0004.jpg"/>
<foaf:mbox_sha1sum>7dbf56320b204be2e2bee161abed3ffc5825b590</foaf:mbox_sha1sum>
</foaf:Person>
</foaf:knows>

Nothing special, just the same <foaf:depiction> tag in two or more <foaf:Person> tags, which makes it easy for a computer to find where two people are co-depicted.

The next step in co-depiction is going to be the addition of SVG outlines, to show who is who in a photo, though unfortunately, no one has yet come up with a way of referring to the SVG data in a computer readable way. (SVG, as an XML-based format, is capable of including RDF metadata without difficulty).

Another problem is that SVG hasn't quite caught on yet, though the latest version of KDE supports it, and future versions of Mozilla should support it.

As it is, though, it can be useful to show humans who's who. Take this picture, which shows my son, his friend Adam, and me in the background. In the future, it will be possible to use an SVG image like this - which contains an invisible outline, with some additional metadata, to give machine readable information, similar to HTML's image maps.

At the moment, we can use an image like this, which contains the same outlines, but is filled with separate colours for each person depicted, and simply say that Mark is represented by the green area, Adam by the red, and me by the blue. (I use the svgdisplay program which comes with KSVG; those without an SVG viewer can look at this PNG to see what I'm talking about).

DOAP

DOAP stands for "Description of a Project". DOAP is an RDF vocabulary, influenced by FOAF, which was created by Edd Dumbill to provide details about an open source software project. At the time of writing, DOAP, as a project, is one week old, and already has impressive momentum.

DOAP is designed as a replacement for several different formats for project description, such as that used by Freshmeat or the GNU Free Software Directory, which complements FOAF. Here's an example which describes LG, generated using DOAP-a-matic (text version):

<?xml version="1.0" encoding="iso-8859-15"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
	 xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#" xmlns:doap="http://usefulinc.com/ns/doap#" 
	 xmlns:foaf="http://xmlns.com/foaf/0.1/" 
	 xmlns:admin="http://webns.net/mvcb/" 
	 xml:lang="en">
  <rdf:Description rdf:about="">
    <admin:generatorAgent rdf:resource="http://www.bonjourlesmouettes.org/doap-a-matic.php"/>
  </rdf:Description>
  <doap:Project>
    <doap:name>Linux Gazette</doap:name>
    <doap:shortname>LG</doap:shortname>
    <doap:homepage rdf:resource="http://linuxgazette.net"/>
    <doap:created>1995-07</doap:created>
    <doap:description>
      Linux Gazette is a monthly Linux webzine, dedicated to providing a free 
      magazine for the Linux community and making Linux just a little more fun!
    </doap:description>
    <doap:shortdesc>Linux Gazette is a monthly Linux webzine.</doap:shortdesc>
    <doap:mailing-list rdf:resource="http://linuxgazette.net/mailman/"/>
    <doap:programming-language>HTML</doap:programming-language>
    <doap:os>Linux</doap:os>
    <doap:license rdf:resource="http://linuxgazette.net/105/misc/oregan/OPL.rdf.txt"/>
    <doap:download-page rdf:resource="http://linuxgazette.net/ftpfiles/"/>
    <doap:download-mirror rdf:resource="http://www.tldp.org/LDP/LGNET/ftpfiles"/>
    <doap:maintainer>
      <foaf:Person>
	<foaf:name>Ben Okopnik</foaf:name>
	<foaf:nick>Ben</foaf:nick>
	<foaf:mbox_sha1sum>30073b332dac6fc812dcdd806ac1e9715ceac46a</foaf:mbox_sha1sum>
      </foaf:Person>
    </doap:maintainer>
  </doap:Project>
</rdf:RDF>

DOAP-a-matic doesn't have support for the OPL, the licence used by LG, but it's simple to define a new one (text version):

<rdf:RDF xmlns="http://web.resource.org/cc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#">

<License rdf:about="http://linuxgazette.net/105/misc/oregan/OPL.rdf.txt">
  <rdfs:label>OPL</rdfs:label>
  <dc:title>Open Publication License</dc:title>
  <rdfs:seeAlso rdf:resource="http://www.opencontent.org/openpub/" />
</License>

</rdf:RDF>

Conclusion

RDF and the Semantic Web are exciting topics, and I hope I've piqued someone's interest with this article. Anyone wishing to find out more may simply Google for RDF, or try the list of bookmarks I accumulated while reading about RDF.

 


[BIO] Jimmy has been using computers from the tender age of seven, when his father inherited an Amstrad PCW8256. After a few brief flirtations with an Atari ST and numerous versions of DOS and Windows, Jimmy was introduced to Linux in 1998 and hasn't looked back.

Jimmy is a father of one, a techno-savvy seven year-old called Mark. In the spare time he enjoys outside of his personal circle of Hell, working in a factory, Jimmy likes to play guitar and edit Wikipedia.

Copyright © 2004, Jimmy O'Regan. Released under the Open Publication license

Published in Issue 105 of Linux Gazette, August 2004

<-- prev | next -->
Tux