XSH, An XML Editing Shell 

http://www.xml.com/lpt/a/998 http://www.xml.com/pub/a/2002/07/10/kip.html

by Kip Hampton July 10, 2002

Introduction 

A few months ago we briefly examined some of the command line utilities available to users of Perl and XML. This month we will continue in that vein by looking at the 300-pound gorilla of Perl/XML command line tools, Petr Pajas' intriguing XML::XSH.

XML::XSH and the xsh executable provide a rich shell environment which makes performing common XML-related tasks as terse and straightforward as using a UNIX shell like bash or csh. Yes, that's right: an XML editing shell. As we will see, it's not as crazy as it seems.

xsh Basics 

Before we look at xsh's advanced tricks, let's get familiar with the environment it provides. We'll begin by starting the xsh shell:

[user@host user]$ xsh -i
-----------------------------------------------------
 xsh - XML Editing Shell version 0.9 (Revision: 1.6)
-----------------------------------------------------
...
xsh scratch:/>

The xsh shell starts in interactive mode and creates a default scratch pad document called new_document.xml, with the ID scratch. The shell prompt takes the form of the current document's ID (scratch, in this case), followed by a colon, and then the current working context within that document expressed as an XPath location (/, in this case). In other words, we can tell from the prompt that we are at the root (/) level of the current XML document, whose ID is scratch.

We can open an existing XML document from the file system in order to figure out how to navigate within and between documents:

xsh scratch:/> open cams=files/camelids.xml
parsing files/camelids.xml
done.
xsh cams:/>

The open command opens the document camelids.xml from the files subdirectory of the directory in which we started the xsh shell, assigns it the ID cams, and changes the working context to the root (/) of that document.

To list the elements contained in the current context we use the ls command.

xsh cams:/> ls
<?xml version="1.0" encoding="iso-8859-1"?>
<camelids>...</camelids>
Found 1 node(s).
xsh cams:/>

Since the current context is the abstract root of the document, we see the XML declaration and the sole top-level <camelids> element. If our document contained processing instructions or a Document Type Definition between the XML declaration and the top-level element, they would appear here, too.

This is where things get interesting. Just like its UNIX shell cousins, many of xsh's commands accept paths as arguments, specifying the context in which that command is evaluated. The difference is that in xsh those paths are XPath expressions which provide access to the contents of the open XML documents, rather than file system paths that provide an interface to the files and directories of the mounted volumes.

So, for example, if we wanted to list all of the <habitat> elements in our camelids document, we need only supply the appropriate XPath expression to the ls command:

xsh cams:/> ls //habitat

This yields:

<habitat>
  Bactrian camels' habitat consists mainly of Asia's deserts.
  The temperature ranges from -29 degrees Celsius in
  the winter to 38 degrees Celsius in the summer.
</habitat>
<habitat>
  Dromedary camels prefer desert conditions characterized by
  a long dry season and a short rainy season.
  Introduction of the dromedary into other climates has
  proven unsuccessful as the camel is sensitive to the
  cold and humidity (Nowak 1991).
</habitat>
<habitat>
  Llamas are found in deserts, mountainous areas, and
  grasslands.
</habitat>
<habitat>
  Guanacos inhabit grasslands and shrublands from sea
  level to 4,000m. Occasionally they winter in forests.
</habitat>
<habitat>
  Vicunas are found in semiarid rolling grasslands and
  plains at altitudes of 3,500-5,750 meters. These lands
  are covered with short and tough vegetation.  Due to
  their daily water demands, vicunas live in areas where
  water is readily accessible. Climate in the habitat is
  usually dry and cold. Nowak (1991), Grizmek (1990).
</habitat>
Found 5 node(s).
xsh cams:/>

Or, if we want our query to be more specific, we can use predicate expressions in our XPath statement. For example,

xsh cams:/> ls //habitat[ancestor::species/@name='Lama guanicoe']

selects just the guanaco's <habitat> element.
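
For readers who want to try the same kind of query outside the shell, here is a rough Python sketch using the standard library's xml.etree.ElementTree. It is not xsh, and the tiny inline document is a made-up stand-in for camelids.xml (only the element names and the two species names come from the session above). ElementTree implements just a subset of XPath and has no ancestor axis, so the predicate is moved onto the species step instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for files/camelids.xml.
SAMPLE = """
<camelids>
  <species name="Camelus dromedarius">
    <natural-history><habitat>desert</habitat></natural-history>
  </species>
  <species name="Lama guanicoe">
    <natural-history><habitat>grasslands and shrublands</habitat></natural-history>
  </species>
</camelids>
"""

root = ET.fromstring(SAMPLE)

# Rough equivalent of: ls //habitat
all_habitats = root.findall(".//habitat")

# Rough equivalent of: ls //habitat[ancestor::species/@name='Lama guanicoe']
# (rewritten without the ancestor axis, which ElementTree does not support)
guanaco = root.findall(".//species[@name='Lama guanicoe']//habitat")

print(len(all_habitats), "habitat element(s) in total")
for node in guanaco:
    print(ET.tostring(node, encoding="unicode").strip())
```

The point of the sketch is only that a predicate narrows a node set; in xsh the same narrowing happens right on the command line.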

Similarly, we can change the command evaluation context within the current document by giving an XPath expression to the cd command:

xsh cams:/> cd //species[@name='Camelus dromedarius']/natural-history
xsh cams:/camelids/species[2]/natural-history>

This causes the context location in our shell prompt to change to reflect the new context to which we have navigated. Commands not explicitly passed an absolute location path will now be evaluated in the context of the <natural-history> element contained in the document's second <species> element (the one whose name attribute is equal to "Camelus dromedarius"). So if we give the ls command with no path specified, we'll see the contents of the new context:

xsh cams:/camelids/species[2]/natural-history> ls
<natural-history>
       <food-habits>...</food-habits>
       <reproduction>...</reproduction>
       <behavior>...</behavior>
       <habitat>...</habitat>
</natural-history>
Found 1 node(s).
xsh cams:/camelids/species[2]/natural-history>
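
The idea of a current context node can also be sketched in Python with xml.etree.ElementTree. This is an illustration only, not how xsh is implemented: the inline document is a hypothetical stand-in for the dromedary entry, and the variable named context simply plays the role that cd gives to the prompt's location.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the dromedary entry in camelids.xml.
SAMPLE = """
<camelids>
  <species name="Camelus dromedarius">
    <natural-history>
      <food-habits/><reproduction/><behavior/><habitat/>
    </natural-history>
  </species>
</camelids>
"""

root = ET.fromstring(SAMPLE)

# Roughly what cd does: pick a node to act as the context for later queries.
context = root.find(".//species[@name='Camelus dromedarius']/natural-history")

# A relative path is now resolved against that context node, not the root.
child_names = [child.tag for child in context]
print(child_names)
print(context.find("habitat") is not None)
```

The same relative path, habitat, finds nothing when evaluated against the document's top-level element, which is why the context matters.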

In addition, xsh provides a way to execute commands on any currently open document without changing the element context by prepending that document's ID and a colon to the XPath expression:

xsh cams:/camelids/species[2]/natural-history> cd /
xsh cams:/> open xmlnews=http://www.xml.com/xml/news.rss
parsing http://www.xml.com/xml/news.rss
done.
xsh xmlnews:/>
xsh xmlnews:/> ls cams:/camelids/species[3]/common-name
<common-name>Llama</common-name>
Found 1 node(s).
xsh xmlnews:/>

Notice that the context changed to the root of the newly opened RSS document once it was parsed into memory, but we still have easy access to the data contained in the camelids document by adding that document's ID (cams) and a colon to the front of the path.

Also note that the location of the file passed to the open command is not limited to files on the local machine; it can also be an HTTP or FTP URL, so long as a well-formed XML document is returned.

To see a list of all the currently open documents and their associated IDs, use the files command:

xsh xmlnews:/> files
cams = files/camelids.xml
xmlnews = http://www.xml.com/xml/news.rss
xsh xmlnews:/>

Closing an open document is as easy as passing its ID to the close command.

xsh xmlnews:/> close xmlnews
closing file http://www.xml.com/xml/news.rss
xsh :>

If we wanted to save a local copy of the xmlnews document before closing it, we would use the saveas command:

xsh xmlnews:/> saveas xmlnews files/xmldotcom_news.rss
xmlnews=new_document1.xml --> files/xmldotcom_news.rss (utf-8)
saved xmlnews=files/xmldotcom_news.rss as files/xmldotcom_news.rss
in utf-8 encoding
xsh xmlnews:/>

We've now reviewed xsh basics: we can start the shell and open, close, and navigate through the contents of XML documents. If this were all there was to xsh, it would still be a winner as an XPath testbed and teaching tool (making it quite useful to users of XSLT and XPathScript, as well as XML::LibXML and the other Perl modules which offer an XPath interface). But xsh bills itself as an XML editing shell, and as we will see, it's that and a fair bit more.

Creating and Editing XML Documents 

We can begin by creating a new XML document:

xsh :> create mynews news-channels

This creates a new document with the ID mynews and the top-level element news-channels, and changes the context to the root of the new document. Let's have a look:

xsh mynews:/> ls
<?xml version="1.0" encoding="utf-8"?>
<news-channels/>
xsh mynews:/>

So far, so good. Now let's add an element to the news-channels element.

xsh mynews:/> cd news-channels
xsh mynews:/news-channels> add element channel into .

We use the add command to add a channel element into the current context element, which is represented by a period character, and we can verify the result by listing the current context:

xsh mynews:/news-channels> ls
<news-channels><channel/></news-channels>
Found 1 node(s).
xsh mynews:/news-channels>

Note that the first argument for the add command must be the type of node being added (element, in this case).

Suppose we need to add a name attribute to the new channel element, as well as an rss-url child element.

xsh mynews:/news-channels> add attribute "name='seepan uploads'" into
     ./channel[1]
xsh mynews:/news-channels> add element rss-url into ./channel[1]

Next, we'll add the URL of the CPAN RSS file as a text node of the rss-url element:

xsh mynews:/news-channels> add text "http://search.cpan.org/recent.rdf"
    into ./channel[1]/rss-url

Let's add another channel element:

xsh mynews:/news-channels> add element channel before //channel[1]
xsh mynews:/news-channels> add attribute "name='perl news'"
    into ./channel[1]
xsh mynews:/news-channels> add element rss-url into ./channel[1]
xsh mynews:/news-channels> add text "http://www.perl.com/pace/perlnews.rdf"
     into ./channel[1]/rss-url

We used the before location expression as the third argument to the add command, specifying the first channel element as the evaluation context. This inserts the new channel into the list as the preceding sibling of the previously created channel.

Again, we can verify this by listing all the channels in the document:

xsh mynews:/news-channels> ls //channel
<channel name="perl news"><rss-url>http://www.perl.com/pace/perlnews.rdf
     </rss-url></channel>
<channel name="seepan uploads"><rss-url>http://search.cpan.org/recent.rdf
     </rss-url></channel>
Found 2 node(s).
xsh mynews:/news-channels>
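
For comparison, the same two-channel document can be built programmatically. The following is a minimal Python sketch using xml.etree.ElementTree, not a description of what xsh does internally. The variable names are illustrative, and the comments map each step loosely onto the xsh commands used above.

```python
import xml.etree.ElementTree as ET

# create mynews news-channels
root = ET.Element("news-channels")

# add element channel into . (plus its attribute, child element and text)
cpan = ET.SubElement(root, "channel", {"name": "seepan uploads"})
ET.SubElement(cpan, "rss-url").text = "http://search.cpan.org/recent.rdf"

# add element channel before //channel[1]
perl = ET.Element("channel", {"name": "perl news"})
ET.SubElement(perl, "rss-url").text = "http://www.perl.com/pace/perlnews.rdf"
root.insert(0, perl)  # index 0 makes it the preceding sibling

names = [channel.get("name") for channel in root.findall("channel")]
print(names)
```

Where the shell expresses the insertion point as a location (into, before) plus an XPath expression, the library version has to spell it out as a parent and an index.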

Careful readers will have noticed the "seepan" typo — we can fix this using map, which applies a block of Perl code to nodes returned by the subsequent XPath expression:

xsh mynews:/news-channels> map { $_ = 'cpan uploads' } //channel[2]/@name
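
The select-then-modify pattern behind map can be approximated in Python as well. This is a sketch under the assumption that we only need to rewrite one attribute; the inline document is a trimmed stand-in for mynews, and ElementTree is used here instead of xsh.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring(
    '<news-channels>'
    '<channel name="perl news"/>'
    '<channel name="seepan uploads"/>'
    '</news-channels>'
)

# ElementTree queries return elements rather than attribute nodes, so select
# the second channel (positions are 1-based, as in XPath) and set its name.
for channel in root.findall("channel[2]"):
    channel.set("name", "cpan uploads")

names = [channel.get("name") for channel in root.findall("channel")]
print(names)
```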

Here's a view of the full contents of our new document, obtained by listing the document's root:

xsh mynews:/news-channels> ls /
<?xml version="1.0" encoding="utf-8"?>
<news-channels>
  <channel name="perl news">
    <rss-url>http://www.perl.com/pace/perlnews.rdf</rss-url>
  </channel>
  <channel name="cpan uploads">
    <rss-url>http://search.cpan.org/recent.rdf</rss-url>
  </channel>
</news-channels>
Found 1 node(s).

Our new document is a bit simplistic, to be sure. But our goal here is just to demonstrate the basics of editing documents with xsh. What we've learned so far can be applied to the most complex XML documents.

To finish up, let's save our new document to disk and quit the shell:

xsh mynews:/news-channels> saveas mynews files/perl_channels.xml
mynews=new_document2.xml --> files/perl_channels.xml (utf-8)
saved mynews=files/perl_channels.xml as files/perl_channels.xml
in utf-8 encoding
xsh mynews:/news-channels>
xsh mynews:/news-channels> exit
[user@host user]$

xsh Scripting 

No shell would be complete without the ability to perform automated or scripted tasks. As a final example, let's create an xsh script, which uses the data contained in the perl_channels.xml document we just created, to fetch all the current Perl news items from all the channels into a single XML document:

quiet;
open sources=files/perl_channels.xml;
create merge news-items;
$i = 0;
foreach sources://rss-url {
    $name = string(.);
    open $i=$name;
    xcopy $i://item into merge:/news-items;
    close $i;
    $i=$i+1;
};
close sources;
saveas merge files/headlines.xml;
close merge;

Looking closer at this script we see that it loads the perl_channels.xml document, iterates over all of its <rss-url> elements, fetches each document from the Web using the open command to grab the URL, and copies all of each channel's <item> elements into a new document. The new document is then saved to disk as headlines.xml before exiting.
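
The merge logic of that script can be sketched in Python, too. To keep the example self-contained it does not touch the network: the two inline feeds are hypothetical stand-ins for the documents the script would fetch from each rss-url, and namespaces (which real RSS 1.0 feeds use) are left out.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical feeds standing in for the documents fetched from each rss-url.
FEEDS = [
    "<rss><channel><item><title>First headline</title></item></channel></rss>",
    "<rss><channel><item><title>Second headline</title></item>"
    "<item><title>Third headline</title></item></channel></rss>",
]

# create merge news-items
merged = ET.Element("news-items")

for source in FEEDS:
    feed = ET.fromstring(source)            # stands in for: open $i=$name
    for item in feed.iter("item"):          # stands in for: $i://item
        merged.append(copy.deepcopy(item))  # copy, leaving the source intact

titles = [item.findtext("title") for item in merged.findall("item")]
print(titles)
```

The xsh version is shorter largely because opening a URL, selecting nodes, and copying them are each a single shell command.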

Starting to see why an XML editing shell isn't such a crazy idea? I know I am.

Going Further 

I've offered a glimpse of the ease and power that xsh provides, but there are many more commands and features available. For example,

xslt doc1 some_stylesheet.xsl doc2

transforms the document with the ID doc1 using the XSLT stylesheet some_stylesheet.xsl and stores the result in a new document with the ID doc2.

Similarly, the command

xupdate myxupdate doc1

alters the content of the doc1 document using the rules contained in the XUpdate document stored in myxupdate.

For a complete list of commands, type help command at the xsh prompt, or help commandname for detailed usage of a specific command.

Conclusions 

I was initially skeptical about the notion of an "XML editing shell". At first glance, it seemed to me to be pushing the file path/XPath metaphor a bit too far; surely it's little more than a technical curiosity? But I was very wrong, and I don't mind admitting it. XML::XSH is an astonishingly powerful tool which has quickly become part of my daily XML work. I highly recommend it.

Resources 

  • Download the sample code.
  • XSH Project Page
  • The XPath Language Specification

XML.com Copyright (c) 1998-2006 O'Reilly Media, Inc

documented on: 2006.10.09