XML Bits

Corrections

Last month, I wrote about RDF and the Semantic Web and included a sample FOAF file... that wasn't even valid XML. The worst part is that I had a fixed version ready just before publication, but it arrived too late to be used (partly because, in a sleep-deprived haze, I managed to make a mess of the stuff that was there, and Ben had to go with an earlier version). The corrected version is here.

As well as that, Frank Manola (co-editor of the RDF Primer) wrote to make a correction:

I couldn't help commenting on your statement: "Each RDF statement is called a "triple", meaning it consists of three parts: subject, predicate, and object; the subject is either an RDF URI, or a blank node (I haven't seen a good explanation why these nodes are "blank", so I'll just refer to them as nodes). "
I guess the Primer didn't make the connection explicit enough. Under Figure 6 in the RDF Primer, the text says "Figure 6, which is a perfectly good RDF graph, uses a node without a URIref to stand for the concept of "John Smith's address". This blank node..." The intent was to indicate that a blank node was one without a URIref. In the figure, the blank node is an ellipse with no label, i.e., "blank". The term "nodes" without qualification refers to all those parts of graphs that aren't arcs, including URIrefs, literals, and blank nodes (see the discussion of the graph model in Section 2.2 of the Primer, or Section 3.1 of RDF Concepts and Abstract Syntax).

I also wrote about RSS last month. In the meantime, I found this page, which explains that there are 9 different versions of RSS. Maybe Atom is a better idea than I had thought!

XML and CSS

This is more of a "2 Cent Tip" than anything else, but I've found that a lot of people are unaware that XML can use CSS stylesheets, and that "modern" browsers are able to render this.

To include a stylesheet, just add an xml-stylesheet line to your XML file, right after the XML declaration:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="xml-style.css"?>

After that, it's CSS as usual -- the only thing that's different is the tag names.
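
As a minimal sketch of where the stylesheet line sits in a complete file, here is a small document that borrows the QEL element names from the fortune example later in this article. The file name xml-style.css comes from the lines above; the CSS rules shown in the comment are only an assumption about what such a stylesheet might contain, and the quotation text is made up. One thing worth knowing is that XML elements have no built-in styling, so everything renders inline until the stylesheet says otherwise with display: block.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- The stylesheet instruction goes in the prolog, before the root element.
     type="text/css" marks the stylesheet as CSS rather than XSL. -->
<?xml-stylesheet type="text/css" href="xml-style.css"?>
<!-- Hypothetical contents of xml-style.css (illustrative rules only):
       quotation { display: block; margin: 1em; }
       p         { display: block; }
       author    { display: block; font-style: italic; }
-->
<quotations>
  <quotation>
    <p>An example quotation.</p>
    <author>Someone Quotable</author>
  </quotation>
</quotations>
```

The selectors are plain element names, exactly as they would be for HTML tags.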

One nice thing about this is that if you add a stylesheet to your feed files, people can get an idea of what it'll look like in their feed reader if they choose to subscribe to it.

A drawback, especially with feeds, is that there are various ways of representing embedded HTML/XHTML. "Escaped" HTML in particular looks terrible. Because of the various ways Blogger allows people to post to their blogs, my Blogger Atom feed shows both how nice and how horrible this can look.

That aside, though, for an XML file you're generating, and have complete control over, stylesheets can make the difference (and can spare you from having to learn XSL!).

XML Fortune Cookies

Most Linux users know and love the fortune command, and use it regularly to get a dose of humour, wisdom, or silliness. Now, thanks to A.M. Kuchling's Quotation Exchange Language, it's possible to write your fortunes in XML.

Even if you're not the sort of person who wants to convert everything into XML, QEL has a lot going for it. For one thing, it can generate both standard fortune files and nice HTML. For another, it's politically correct regarding metadata (it'd be a terrible waste of XML if it wasn't, but that hasn't stopped anyone before).

A simple quote in QEL would look something like this:

<quotation>
<p>
So, throughout life, our worst weaknesses and meannesses are usually
committed for the sake of the people whom we most despise.
</p>
<author>Charles Dickens</author>
<source><cite>Great Expectations</cite></source>
</quotation>

With a little bit of wrapping (an XML declaration and a <quotations> root element), to make valid XML of it, it would look like this:

<?xml version="1.0" encoding="UTF-8"?>

<quotations>
<quotation>
<p>
So, throughout life, our worst weaknesses and meannesses are usually 
committed for the sake of the people whom we most despise.
</p>
<author>Charles Dickens</author>
<source><cite>Great Expectations</cite></source>
</quotation>
</quotations>

Using the qtformat tool, which is part of the quotation tools the creator of QEL provides here, we get this (qtformat -f for fortune format):

So, throughout life, our worst weaknesses and meannesses are usually
committed for the sake of the people whom we most despise.
    -- Charles Dickens, _Great Expectations_
%

As I am in the habit of writing about everything I spend more than 10 minutes doing, I, of course, have a prepared example. Again, this uses CSS, so if you have a browser capable of rendering XML, you can see my quotes file nicely laid out. (Otherwise, you'll see the usual XML gibberish -- sorry 'bout that!) Check it out here.

A quick stumble through XSL

I spent longer than usual using Windows this month, and so started using a Windows-based feed reader called FeedReader. (Kudos for using the "Principle of least surprise" in their naming.)

While most feed readers I've encountered support exporting their feed list as OPML, FeedReader doesn't. It does, however, keep its list of feeds in a simple XML file. So, after a quick stumble through the basics of XSL, I managed to whip together something that converted the list FeedReader keeps into OPML.

This is a simple example of a FeedReader file (normally kept in [windows mount point]/Documents\ and\ Settings/Owner/Application\ Data/FeedReader/):


<feeds>
      <item>
         <feedid>63219889590821</feedid>
         <title>Linux Gazette</title>
         <description>An e-zine dedicated to making Linux just a little
         bit more fun.
         Published the first day of every month.</description>
         <feedtype>http</feedtype>
         <archivesize>8888</archivesize>
         <alwaysshowlink>0</alwaysshowlink>
         <htmlurl>http://linuxgazette.net</htmlurl>
         <image>http://linuxgazette.net/gx/2004/newlogo-blank-100-gold2.jpg</image>
         <read>0</read>
         <unreadcount>17</unreadcount>
         <updateperiod>14</updateperiod>
         <LastModified>Fri, 02 Jul 2004 16:42:16 GMT</LastModified>
         <link>http://linuxgazette.net/lg.rss</link>
         <username></username>
         <password></password>
      </item>
</feeds>

I don't care about most of this information -- all I want are the contents of the <title>, <description>, <htmlurl>, and <link> tags, so I can have a simple XSL file (Text version):

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
     
<xsl:template match="/"> 
 <opml version="1.0">
   <head>
     <title>FeedReader Subscriptions</title>
   </head>
   <body>
     <xsl:apply-templates select="//item"/>
   </body>
 </opml>
</xsl:template>

<xsl:template match="item"> 
  <xsl:variable name="title" select="title"/>
  <xsl:variable name="desc" select="description"/>
  <xsl:variable name="site" select="htmlurl"/>
  <xsl:variable name="link" select="link"/>
  <outline title="{$title}" description="{$desc}" xmlUrl="{$link}"
           htmlUrl="{$site}"/>
</xsl:template>

</xsl:stylesheet>

<xsl:template match="/"> matches the document root (the node that sits above the top-level element); I use this to output some header information (in this example, anything that doesn't start with xsl: is copied to the output as-is).

xsl:apply-templates tells the XSL processor to find a matching template for each node it selects -- in this case, I have <xsl:apply-templates select="//item"/>, which selects every element called <item>, no matter how deeply it is nested, and the second template handles each one. If I wanted to be more accurate, I'd use <xsl:apply-templates select="/feeds/item"/>, so it would only match <item> elements directly inside the top-level <feeds> element.

The xsl:variable lines each create a variable holding the contents of the child element named in the select attribute; the variables are then used, inside curly braces, to fill in the attributes of my output.
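
For comparison, here is a sketch of a slightly tighter variant of the same stylesheet. It is not the version I actually used: it assumes the stricter /feeds/item path, drops the variables in favour of putting the expressions straight into the curly braces, and wraps each one in the standard XPath normalize-space() function so that stray line breaks in a description don't end up in the attribute. The xsl:output line is also an addition, there only to ask for indented output.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
     xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<!-- Emit XML and ask the processor to indent it for readability -->
<xsl:output method="xml" indent="yes"/>

<!-- "/" is the document root: write the OPML wrapper once -->
<xsl:template match="/">
 <opml version="1.0">
   <head>
     <title>FeedReader Subscriptions</title>
   </head>
   <body>
     <!-- Only <item> elements that are direct children of <feeds> -->
     <xsl:apply-templates select="/feeds/item"/>
   </body>
 </opml>
</xsl:template>

<!-- One <outline> per feed; each {...} is evaluated as an XPath
     expression relative to the current <item> -->
<xsl:template match="item">
  <outline title="{normalize-space(title)}"
           description="{normalize-space(description)}"
           xmlUrl="{normalize-space(link)}"
           htmlUrl="{normalize-space(htmlurl)}"/>
</xsl:template>

</xsl:stylesheet>
```

Any XSLT 1.0 processor should be able to run this; with xsltproc, for example, you pass the stylesheet first and the FeedReader file second.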

If you want to find out more about XSL, LG had a good article in issue 89: Working with XSLT. XSL is a huge topic, and one I think I'll be using more of.

Jimmy has been using computers from the tender age of seven, when his father inherited an Amstrad PCW8256. After a few brief flirtations with an Atari ST and numerous versions of DOS and Windows, Jimmy was introduced to Linux in 1998 and hasn't looked back.

Jimmy is a father of one, a techno-savvy seven year-old called Mark. In the spare time he enjoys outside of his personal circle of Hell, working in a factory, Jimmy likes to play guitar and edit Wikipedia.

Published in Issue 106 of Linux Gazette, September 2004
