Sunday, August 1, 1999

Parsing XML with LotusScript and Microsoft’s XML object

INTERNET TECHNOLOGIES

By Mark Lawson

These days, XML is a hot topic and, almost daily, people are finding interesting new ways to apply the technology. It seems, however, that as soon as you want to experiment with this format you find yourself steered firmly towards Java.

I wanted to use LotusScript, if possible, to include some XML headline channels on our Web site. This article is not intended as an in-depth tutorial on XML (Extensible Markup Language) or XSL (Extensible Stylesheet Language). Instead, it shows how, with some LotusScript and the free Microsoft XML object (MSXML), you can download, parse, and format XML documents easily, either by tree-walking or by using XSL patterns in a style sheet.

XML headline structure

Headline files are published as channels for inclusion in 'personalized' sites such as My.Netscape or My.Userland and typically contain news items with a link to the full story. You can, however, get the files themselves, which makes them a great starting point for exploring XML, since most of them use the same de facto format, Netscape's RDF. Here is a partial example from a recent DominoPower XML file (these are updated daily):

<?xml version="1.0" ?>

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns="http://my.netscape.com/rdf/simple/0.9/">
  <channel>
    <title>DominoPower News Center</title>
    <link>http://www.dominopower.com/news/news.html</link>
    <description>DominoPower News Center</description>
  </channel>

  <image>
    <title>DominoPower Magazine</title>
    <url>http://www.dominopower.com/images/DominoPowerRSSLogo.gif</url>
    <link>http://www.dominopower.com/</link>
  </image>

  <item>
    <title>Battle against piracy</title>
    <link>http://www.dominopower.com/news/news.html</link>
  </item>

  <item>
    <title>SmartSuite review</title>
    <link>http://www.dominopower.com/news/news.html</link>
  </item>

As you can see, headlines have the typical XML tree structure; just think "categorized view". A Notes view can be formed from the root (rdf:RDF in this case) and each twisty can be formed from a child node which, in turn, can contain other twisties/child nodes and so on. At the end of each branch is the column value. Given this format, what I needed was an agent that could download this sort of file every few hours, parse out the headlines, and create an HTML table I could insert into our site. First, though, I needed MSXML on my machine.
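To make the tree structure concrete, here is a minimal, language-neutral sketch of the same idea: walk from the root to each item node, read its title and link, and emit an HTML table. It is written in Python with the standard library's ElementTree rather than the LotusScript and MSXML this article uses, so treat it as an illustration of the technique and not as the agent itself. The function names (parse_headlines, headlines_to_html) and the trimmed, closed-off copy of the sample are assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from html import escape

# Default namespace declared on the root of the headline sample.
NS = {"rss": "http://my.netscape.com/rdf/simple/0.9/"}

# Trimmed copy of the sample, closed off so that it is well-formed.
SAMPLE = """<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns="http://my.netscape.com/rdf/simple/0.9/">
  <channel>
    <title>DominoPower News Center</title>
    <link>http://www.dominopower.com/news/news.html</link>
    <description>DominoPower News Center</description>
  </channel>
  <item>
    <title>Battle against piracy</title>
    <link>http://www.dominopower.com/news/news.html</link>
  </item>
  <item>
    <title>SmartSuite review</title>
    <link>http://www.dominopower.com/news/news.html</link>
  </item>
</rdf:RDF>"""


def parse_headlines(xml_text):
    """Return a list of (title, link) pairs, one per <item> under the root."""
    root = ET.fromstring(xml_text)
    headlines = []
    # Each <item> is a direct child of the root, like a twisty in a view.
    for item in root.findall("rss:item", NS):
        title = item.findtext("rss:title", default="", namespaces=NS)
        link = item.findtext("rss:link", default="", namespaces=NS)
        headlines.append((title.strip(), link.strip()))
    return headlines


def headlines_to_html(headlines):
    """Render the (title, link) pairs as a one-column HTML table."""
    rows = [
        '<tr><td><a href="{}">{}</a></td></tr>'.format(
            escape(link, quote=True), escape(title)
        )
        for title, link in headlines
    ]
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    print(headlines_to_html(parse_headlines(SAMPLE)))
```

The channel and image nodes are skipped here because only the item nodes carry headlines; a fuller version could read the channel title for a table heading in the same way.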

Installing and loading MSXML

Sadly, I haven't found a way of installing the MSXML component by itself. MSXML is installed only as part of Internet Explorer 5, although an earlier version is included with Internet Explorer 4. It also doesn't come with any documentation, but there are books available (see the Product Availability and Resources section at the end of this article). There's also a comprehensive online tutorial on the Microsoft site. Once MSXML is installed (you'll probably need to fully install IE5), try running the following agent: