Xsh

XSH - XML Editing Shell http://xsh.sourceforge.net/ http://sourceforge.net/projects/xsh

XSH Documentation http://xsh.sourceforge.net/documentation.html

install XSH from CPAN

Ref:

XSH Requirements http://xsh.sourceforge.net/requirements.html

XSH Download http://xsh.sourceforge.net/download.html

required modules from CPAN:

Parse-RecDescent
Term-ReadLine-Perl
Text-Balanced
Text-Iconv
XML-Filter-BufferText
XML-Filter-DOMFilter
XML-LibXML
XML-LibXML-Common
XML-LibXML-Iterator
XML-LibXML-XPathContext
XML-LibXSLT
XML-NamespaceSupport
XML-NodeFilter
XML-SAX
XML-SAX-Base
XML-SAX-Writer
XML-XUpdate-LibXML

Available Debian packages:

libparse-recdescent-perl - Generates recursive-descent parsers in Perl
libterm-readline-perl-perl - Perl implementation of Readline libraries
libtext-iconv-perl - converts between character sets in Perl
libxml-filter-buffertext-perl - Perl module for putting all characters into a single event
libxml-libxml-common-perl - Perl module for common routines & constants for XML::LibXML et al
libxml-libxml-perl - Perl module for using the GNOME libxml2 library
libxml-libxslt-perl - Perl module for using the GNOME libxslt library
libxml-namespacesupport-perl - Perl module for supporting simple generic namespaces
libxml-nodefilter-perl - Perl module for a generic node-filter class for DOM traversal
libxml-sax-expat-incremental-perl - XML::SAX::Expat subclass for non-blocking (incremental) parsing
libxml-sax-expat-perl - Perl module for a SAX2 driver for Expat (XML::Parser)
libxml-sax-machines-perl - Perl modules for managing collections of SAX processors
libxml-sax-perl - Perl module for using and building Perl SAX2 XML processors
libxml-sax-writer-perl - Perl module for a SAX2 XML writer

Installation log.

$ sudo aptitude install libparse-recdescent-perl libterm-readline-perl-perl libtext-iconv-perl libxml-filter-buffertext-perl libxml-libxml-common-perl libxml-libxml-perl libxml-libxslt-perl libxml-namespacesupport-perl libxml-nodefilter-perl libxml-sax-expat-incremental-perl libxml-sax-expat-perl libxml-sax-machines-perl libxml-sax-perl libxml-sax-writer-perl
The following NEW packages will be installed:
  libxml-filter-buffertext-perl libxml-libxslt-perl libxml-nodefilter-perl
  libxml-sax-expat-incremental-perl libxml-sax-expat-perl
  libxml-sax-machines-perl libxml-sax-writer-perl
0 packages upgraded, 7 newly installed, 0 to remove and 396 not upgraded.
Need to get 182kB of archives. After unpacking 791kB will be used.

% perl -MCPAN -e "install XML::XSH2"
Checking if your kit is complete...
Looks good
 Warning: prerequisite XML::LibXML::Iterator 0 not found.
 Warning: prerequisite XML::XUpdate::LibXML 0.4.0 not found.

Running install for module XML::LibXML::Iterator
OK
  /usr/bin/make test -- OK
Running make install
Installing /usr/local/share/perl/5.8.8/XML/LibXML/Iterator.pm
Installing /usr/local/share/perl/5.8.8/XML/LibXML/NodeList/Iterator.pm
Installing /usr/local/man/man3/XML::LibXML::NodeList::Iterator.3pm
Installing /usr/local/man/man3/XML::LibXML::Iterator.3pm

Running install for module XML::XUpdate::LibXML
ok 1
  /usr/bin/make test -- OK
Running make install
Manifying blib/man1/xupdate.1p
Installing /usr/local/share/perl/5.8.8/XML/XUpdate/LibXML.pm
Installing /usr/local/share/perl/5.8.8/XML/Normalize/LibXML.pm
Installing /usr/local/man/man1/xupdate.1p
Installing /usr/local/man/man3/XML::Normalize::LibXML.3pm
Installing /usr/local/man/man3/XML::XUpdate::LibXML.3pm
Installing /usr/local/bin/xupdate
Writing /usr/local/lib/perl/5.8.8/auto/XML/XUpdate/LibXML/.packlist

Running make for P/PA/PAJAS/XML-XSH2-2.0.2.tar.gz
  Is already unwrapped into directory /vars/cpan/build/XML-XSH2-2.0.2

Failed Test Stat Wstat Total Fail  Failed  List of Failed
t/06wrap.t                56    6  10.71%  32 35 37 40 42 44
Failed 1/8 test scripts, 87.50% okay. 6/364 subtests failed, 98.35% okay.
make: *** [test_dynamic] Error 255
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force

% perl -MCPAN -e "force install XML::XSH2"
make: *** [test_dynamic] Error 255
  /usr/bin/make test -- NOT OK
Running make install
  make test had returned bad status, won't install without force

% perl -MCPAN -e shell
cpan> force install XML::XSH2
make: *** [test_dynamic] Error 255
  /usr/bin/make test -- NOT OK
Running make install
Installing /usr/local/share/perl/5.8.8/XML/XSH2.pm
Installing /usr/local/share/perl/5.8.8/XML/XSH2.pod
Installing /usr/local/share/perl/5.8.8/XML/XSH2/LibXMLCompat.pm
[...]
Installing /usr/local/man/man1/xsh.1p
Installing /usr/local/man/man3/XSH2.3
Installing /usr/local/bin/xsh
Writing /usr/local/lib/perl/5.8.8/auto/XML/XSH2/.packlist
Appending installation info to /usr/local/lib/perl/5.8.8/perllocal.pod
  /usr/bin/make install UNINST=1 -- OK

documented on: 2006.10.29

Using xsh to scrape web pages

http://www.stonehenge.com/merlyn/LinuxMag/col55.html

xsh uses XML::LibXML to parse an XML file into an internal document structure, which we can then manipulate using a mix of Perl syntax and other control structures specifically designed for navigating a document tree.

One activity I find myself frequently attempting is extracting bits of useful information from existing web pages that change over some time period. In an ideal world, everything I would want would be provided via some RSS feed or “wholesale” SOAP web service, but in the world I still live in, I usually end up parsing the “retail” HTML intended for browser views.

Although HTML isn't XML, they both have common roots, and I've been experimenting lately with using XML::LibXML to parse HTML. (See this column, June 2003, for example.) The advantage to using an XML parser to handle the angley-bracketed text is that once the text is parsed, we can use DOM and XPath operations on the result, sometimes resulting in greater speed or flexibility over the traditional data structures built using HTML::Parser.

Recently, I've started playing with the evolving xsh language, which can be briefly described as an XML manipulation shell. xsh uses XML::LibXML to parse an XML file into an internal document structure. Once the document is built, we can manipulate it using a mix of Perl syntax and other control structures specifically designed for navigating a document tree. Following the Unix filesystem metaphor, we can use xsh commands of “cd” and “pwd” much like their Unix counterpart, where the “current directory” is a document node of interest.

Many operations in xsh are specified using XPath, so a working knowledge of XPath is very helpful. For example,

cd /;

focuses the “current node” at the top of the document tree, while:

pwd;

shows that we are currently located there. To get “inside” the root node of the document, we can use:

cd /*;

which finds the one matching node and sets that as the current focus, which is undone by:

cd ..;

The metaphor works nicely because XPath's notation mimics the Unix filesystem for many common operations.

Once we have a node of interest, we can display its location using locate, as in:

locate .; # same as "pwd;"

but I use the locate command more often to print all nodes that match a given XPath expression, as in:

locate //a[@href];

which finds and shows the path to all a nodes that have an href attribute. To display the href directly, I can use a similar expression:

locate //a/@href;

which sets the context to the attribute itself. If I merely wanted a count, I could replace this with:

count //a/@href;
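
For readers who want to try the same kind of query outside xsh, here is a rough Python analogue using only the standard library's xml.etree.ElementTree, which supports a small subset of XPath. It is a sketch, not xsh itself: the sample markup and the function name find_hrefs are invented for illustration, and ElementTree cannot select attributes directly, so the href values are read from the matching elements.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page; it is well-formed so a strict XML parser accepts it.
SAMPLE = """<html><body>
  <p><a href="/one">One</a> and <a name="top">an anchor without href</a></p>
  <div><a href="/two">Two</a></div>
</body></html>"""


def find_hrefs(markup):
    """Return the href of every <a> element that has one, in document order."""
    root = ET.fromstring(markup)
    # ".//a[@href]" plays the role of the XPath expression "//a[@href]".
    return [a.get("href") for a in root.iterfind(".//a[@href]")]


hrefs = find_hrefs(SAMPLE)
print(hrefs)       # roughly what "locate //a/@href" is after
print(len(hrefs))  # roughly what "count //a/@href" is after
```

The anchor that has only a name attribute is skipped, which is the effect of the [@href] predicate.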

Like other traditional shells, xsh can be used both as a scripting language and interactively. I've found that the best way to write an xsh program is to have an editor open on my emerging program, and another window running an interactive session on a sample document similar to the one I'll eventually be parsing. I can start this with:

xsh -I path/to/some/file.xml

and I get the xsh prompt. In interactive mode, an implied semicolon at the end of each line makes entering statements easier.

The real power of xsh is that it can be intertwined with Perl code in your script. At any point, you can invoke Perl with:

eval { ... block of perl code here ... };

The scalar variables are shared between the Perl code and the xsh code, simplifying the integration. And, you can call back to xsh from inside the Perl code using the xsh() function.

While I won't have room to teach all of xsh in this article, I suggest you surf on over to http://xsh.sourceforge.net for further information. In the meanwhile, let me introduce xsh a little more by way of an example.

I noticed the other day that http://www.oreilly.com/animals.html has a list of the O'Reilly “animal” covers, organized by cover title, but as I was scanning through the list, I noticed that a few of the animals were used for more than one title. I was curious about how many animals were reused, so I decided to write a program to extract the information. I started by invoking an xsh shell, entering:

open HTML a = http://www.oreilly.com/animals.html

The HTML flag here tells xsh to use the HTML-string-parsing interfaces, rather than the XML-string-parsing interfaces. Additionally, because XML::LibXML uses Gnome's libxml2, we don't need to use LWP or some external program to fetch the URL.

Once I had the document in memory, I used some simple XPath queries to determine the structure of the web page. For example, I found all tables that weren't nested (didn't contain another table) with:

locate //table[not(.//table)]

Certainly I could have stared at the raw HTML (or even a prettied version) for a long time to find the same information. With xsh, I was simply “exploring a document” using XPath.
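
The "tables that don't contain another table" test can also be written out by hand, which may help when reading the predicate not(.//table). The following Python sketch uses the standard library's ElementTree, which has no not() function, so the filter is an ordinary list comprehension. The sample markup, the id attributes, and the function name leaf_tables are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical page with one nested table and one standalone table.
SAMPLE = """<html><body>
  <table id="outer"><tr><td>
    <table id="inner"><tr><td>Book Title</td></tr></table>
  </td></tr></table>
  <table id="plain"><tr><td>x</td></tr></table>
</body></html>"""


def leaf_tables(markup):
    """Return every <table> that has no <table> descendant."""
    root = ET.fromstring(markup)
    # t.find(".//table") searches only below t, so None means "no nested table".
    return [t for t in root.iter("table") if t.find(".//table") is None]


ids = [t.get("id") for t in leaf_tables(SAMPLE)]
print(ids)
```

The outer table is dropped because it contains the inner one; the inner and standalone tables remain.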

After a bit of experimentation, I ended up with the program shown in [listing one, below]. Note that the entire program consists of a use statement in line 3, followed by a call to xsh() in line 4 of a here-doc-string of the remaining text lines. I really wanted to do something like:

#!/usr/bin/env xsh
... xsh script here ...

but unfortunately, the version of xsh as I write this requires a -l flag to load a script. I'm told that a future version of xsh will work as needed.

The xsh script starts in line 6, with a command to enable recovering mode. Even though XML::LibXML deals relatively well with HTML, many web pages (including the one we're parsing) contain broken entity references. A hint to web page programmers: the text

<a href="/some/place?fred=flintstone&barney=rubble">
click here!</a>

is broken. You need to escape that ampersand as &amp;. Just because nearly every browser error-corrects for this is no excuse to write bad HTML!
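
A strict XML parser makes the problem concrete. This Python sketch, using only the standard library, feeds the broken link and an escaped version to ElementTree; the helper name is_well_formed is invented for the example. It illustrates why a recovering mode is needed for real-world HTML, and is not a description of how libxml2 recovers.

```python
import xml.etree.ElementTree as ET
from html import escape


def is_well_formed(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


url = "/some/place?fred=flintstone&barney=rubble"
broken = '<a href="' + url + '">click here!</a>'
# escape() turns the bare ampersand into the entity reference &amp;
fixed = '<a href="' + escape(url, quote=True) + '">click here!</a>'

print(is_well_formed(broken))
print(is_well_formed(fixed))
# After parsing, the attribute value is the original URL again.
print(ET.fromstring(fixed).get("href") == url)
```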

Line 7 turns on “quiet” mode, which prevents the open in line 8 from announcing its success.

An xsh script can have many documents open at once. XPath expressions can refer to nodes in other documents by prefixing the document name and a colon in front of the traditional XPath expression.

Lines 9 through 18 form a two-level nested foreach loop structure. The foreach beginning in line 9 puts a traditional Perl expression inside curly braces. Each value of the resulting list is placed in turn into $__ (yes, with two underscores for reasons I don't completely understand).

The inner foreach loop uses an XPath expression to define a list of nodes. The “current node” is set to each matching node, and the block of code is then executed. Note that we're looking for all tables that don't contain a nested table, and which have a first row that has a first or second table cell that contains Book Title. The value of $__ is interpolated directly from the variable set in the outer loop. If I were a bit more clever, I might have been able to do without the nested loops, but I didn't care at this point, since the program worked. The final part of the XPath expression finds all table rows after the first row, which is where the real data is found.

Line 13 contains a debugging step… I wanted to see where these rows were actually found as I was developing the program. The xsh script can include Perl-style pound-sign comments, so this is commented out.

Line 14 assigns the string value of the last table cell in the row currently being examined to a scalar $cover. This variable is visible both to further xsh steps as well as included Perl code. I observed that the last cell always contained the animal (or other) cover, hence the capture. Similarly, $subject is set in line 15 to be the string value of the penultimate table cell. The values are automatically de-entitized, so I end up with a plain string here.

Line 16 breaks out into Perl to access a traditional Perl hash named %cover. The keys are the cover animals, while the corresponding values are array references listing all books with that particular animal.

Note the ease with which Perl and xsh code co-exist to produce the result. And, while this could have been written using a more traditional straight invocation of XML::LibXML, I think we're ahead by about five lines of code already in the first 15 lines here.
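
The row-walking and grouping logic of lines 9 through 18 can be restated in Python as a rough sketch. It assumes a simplified, hypothetical table in which the last cell holds the cover and the penultimate cell holds the title; the sample rows and the name books_by_cover are invented, and a dictionary of lists stands in for the Perl hash of array references.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical, simplified stand-in for one of the data tables.
SAMPLE = """<table>
  <tr><td>Book Title</td><td>Animal</td></tr>
  <tr><td>Learning Perl</td><td>Llama</td></tr>
  <tr><td>Google Hacks</td><td>Locking pliers</td></tr>
  <tr><td>Google Pocket Guide</td><td>Locking pliers</td></tr>
</table>"""


def books_by_cover(markup):
    """Map each cover to the list of titles that use it."""
    table = ET.fromstring(markup)
    covers = defaultdict(list)
    for row in table.findall("tr")[1:]:          # skip the header row
        cells = row.findall("td")
        cover = "".join(cells[-1].itertext())    # like string(td[last()])
        title = "".join(cells[-2].itertext())    # like string(td[last() - 1])
        covers[cover].append(title)
    return dict(covers)


covers = books_by_cover(SAMPLE)
# Covers used for more than one title, which was the original question.
reused = {cover: titles for cover, titles in covers.items() if len(titles) > 1}
print(reused)
```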

Now for the fun part. I want to create a new XML output that looks something like this:

<?xml version="1.0" encoding="utf-8"?>
<root>
  ...
  <cover>
    <animal>Lions</animal>
    <book>Java &amp; XML</book>
  </cover>
  <cover>
    <animal>Llama</animal>
    <book>Learning Perl</book>
  </cover>
  <cover>
    <animal>Llama &amp; camel</animal>
    <book>Perl Pocket Reference</book>
  </cover>
  <cover>
    <animal>Locking pliers</animal>
    <book>Google Hacks</book>
    <book>Google Pocket Guide</book>
  </cover>
  ...
</root>

I can do this by walking through the newly created hash and using traditional print operations, but it's more fun to just use xsh. Line 19 creates a new document t1 and gives it a root node of root.

Line 20 uses a Perl-style foreach expression to get the sorted keys of %cover. Note that these animals will be in $__, not $_, and I traced this in line 21 while I was debugging the program.

Line 22 adds a new cover element at the end of the root element. These new nodes are always added last, and line 23 moves our current focus inside this new element.

Lines 24 and 25 create the animal node within the most recent cover node. The value of $__ is automatically re-entitized to be valid XML.

Lines 26 through 30 walk through the titles for the given animal cover, again using a Perl-style foreach loop. The book titles appear in $__, traced in line 27 during debugging.

Each new book element is created at the end of the current node in line 28, and the title text is inserted into this node in line 29. Note that by proper use of the current context node, the various pieces of animal and covers using that animal are brought together cleanly and simply.

We now have a new document which looks just like what we want to display, and we'll do that in lines 32 and 33. The quiet mode is again enforced, although it hasn't changed since line 7, but I consider this just some defensive programming on my part. Line 33 dumps the XML text to standard output in a nice indented fashion, by default.
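
The document-building half of the script (lines 19 through 31) follows a simple pattern: one cover element per animal, with the animal name first and one book element per title. A minimal Python sketch of that pattern, using the standard library's ElementTree and a small hand-written dictionary as assumed input, might look like this. The function name build_cover_document is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed input: cover name mapped to the titles that use it.
covers = {
    "Locking pliers": ["Google Pocket Guide", "Google Hacks"],
    "Llama & camel": ["Perl Pocket Reference"],
}


def build_cover_document(covers):
    """Build <root> with one <cover> per animal, sorted by animal and title."""
    root = ET.Element("root")
    for animal in sorted(covers):
        cover = ET.SubElement(root, "cover")          # new nodes go last
        ET.SubElement(cover, "animal").text = animal
        for title in sorted(covers[animal]):
            ET.SubElement(cover, "book").text = title
    return root


doc = build_cover_document(covers)
# Serializing escapes the ampersand, much as xsh re-entitizes $__.
print(ET.tostring(doc, encoding="unicode"))
```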

As more data shows up on the web both in HTML and XML forms, I can see how this kind of scripting will be helpful to me. Of course, for specialized XML such as RSS or SOAP, other modules will do the job with fewer steps, but nothing stops me from using those modules in xsh programs as well. And xsh also connects with XML::LibXSLT for XSLT processing. Could xsh be the next ASP-like language? Perhaps, with a little more work on caching the parsed tree. Until next time, enjoy!

Example: Listings

00001: #!/usr/bin/perl
00002:
00003: use XML::XSH;
00004: xsh <<'END_XSH';
00005:
00006: recovering 1; # for broken entity recovery (a frequent HTML problem)
00007: quiet; # avoid tracing of open
00008: open HTML animals = "http://www.oreilly.com/animals.html";
00009: foreach {1..2} {
00010:   foreach //table[not(.//table)
00011:                   and contains(tr[1]/td[$__], "Book Title")
00012:                  ]/tr[position() > 1] {
00013:   # pwd;
00014:   $cover = string(td[last()]);
00015:   $subject = string(td[last() - 1]);
00016:   eval { push @{$cover{$cover}}, $subject; }
00017:   }
00018: }
00019: create t1 root;
00020: foreach {sort keys %cover} {
00021:   ## print "animal $__";
00022:   insert element cover into /root;
00023:   cd /root/cover[last()];
00024:   insert element animal into .;
00025:   insert text $__ into animal;
00026:   foreach {sort @{$cover{$__}}} {
00027:     ## print "book $__";
00028:     insert element book into .;
00029:     insert text $__ into book[last()];
00030:   }
00031: }
00032: quiet; # avoid final message from ls
00033: ls /;
00034: END_XSH

XSH, An XML Editing Shell

http://www.xml.com/lpt/a/998 http://www.xml.com/pub/a/2002/07/10/kip.html

by Kip Hampton July 10, 2002

Introduction

A few months ago we briefly examined some of the command line utilities available to users of Perl and XML. This month we will continue in that vein by looking at the 300-pound gorilla of Perl/XML command line tools, Petr Pajas' intriguing XML::XSH.

XML::XSH and the xsh executable provide a rich shell environment which makes performing common XML-related tasks as terse and straightforward as using a UNIX shell like bash or csh. Yes, that's right: an XML editing shell. As we will see, it's not as crazy as it seems.

xsh Basics

Before we look at xsh's advanced tricks, let's get familiar with the environment it provides. We'll begin by starting the xsh shell:

[user@host user]$ xsh -i
-----------------------------------------------------
 xsh - XML Editing Shell version 0.9 (Revision: 1.6)
-----------------------------------------------------
...
xsh scratch:/>

The xsh shell starts in interactive mode, creating a new default scratch pad document, called new_document.xml with the ID scratch. The shell prompt takes the form of the current document's ID (scratch, in this case), followed by a colon, and then the current working context within that document expressed as an XPath location (/, in this case). In other words, we can tell from the prompt that we are at the root (/) level of the current XML document, whose ID is scratch.

We can open an existing XML document from the file system in order to figure out how to navigate within and between documents:

xsh scratch:/> open cams=files/camelids.xml
parsing files/camelids.xml
done.
xsh cams:/>

The open command opens the document camelids.xml from the directory files in the same directory in which we started the xsh shell, assigns it the ID of cams, and changes the working context to the root (/) of that document.

To list the elements contained in the current context we use the ls command.

xsh cams:/> ls
<?xml version="1.0" encoding="iso-8859-1"?>
<camelids>...</camelids>

Found 1 node(s).
xsh cams:/>

Since the current context is the abstract root of the document, we see the XML declaration and the sole top-level <camelids> element. If our document contained processing instructions or a Document Type Definition between the XML declaration and the top-level element, they would appear here, too.

Right about here is where things get interesting. Just like its UNIX shell cousins, many of xsh's commands accept paths as arguments, specifying the context in which that command is evaluated. The difference is that in xsh those paths are XPath expressions which provide access to the contents of the open XML documents, rather than file system paths that provide an interface to the files and directories of the mounted volumes.

So, for example, if we wanted to list all of the <habitat> elements in our camelids document, we need only supply the appropriate XPath expression to the ls command:

xsh cams:/> ls //habitat

This yields:

<habitat>
  Bactrian camels' habitat consists mainly of Asia's deserts.
  The temperature ranges from -29 degrees Celsius in
  the winter to 38 degrees Celsius in the summer.
</habitat>
<habitat>
  Dromedary camels prefer desert conditions characterized by
  a long dry season and a short rainy season.
  Introduction of the dromedary into other climates has
  proven unsuccessful as the camel is sensitive to the
  cold and humidity (Nowak 1991).
</habitat>
<habitat>
  Llamas are found in deserts, mountainous areas, and
  grasslands.
</habitat>
<habitat>
  Guanacos inhabit grasslands and shrublands from sea
  level to 4,000m. Occasionally they winter in forests.
</habitat>
<habitat>
  Vicunas are found in semiarid rolling grasslands and
  plains at altitudes of 3,500-5,750 meters. These lands
  are covered with short and tough vegetation.  Due to
  their daily water demands, vicunas live in areas where
  water is readily accessible. Climate in the habitat is
  usually dry and cold. Nowak (1991), Grizmek (1990).
</habitat>

Found 5 node(s).
xsh cams:/>

Or, if we want our query to be more specific, we can use predicate expressions in our XPath statement. For example,

xsh cams:/> ls //habitat[ancestor::species/@name='Lama guanicoe']

to select just the Guanaco's habitat element.

Similarly, we can change the command evaluation context within the current document by giving an XPath expression to the cd command:

xsh cams:/> cd //species[@name='Camelus dromedarius']/natural-history
xsh cams:/camelids/species[2]/natural-history>

This changes the context location shown in our shell prompt to reflect the new context to which we have navigated. Commands not explicitly passed an absolute location path will now be evaluated in the context of the <natural-history> element contained in the document's second <species> element (the one whose name attribute is equal to "Camelus dromedarius"). Thus, if we give the ls command with no path specified, we'll see the contents of the new context:

xsh cams:/camelids/species[2]/natural-history> ls
<natural-history>
       <food-habits>...</food-habits>
       <reproduction>...</reproduction>
       <behavior>...</behavior>
       <habitat>...</habitat>
</natural-history>

Found 1 node(s).
xsh cams:/camelids/species[2]/natural-history>
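
The idea of a "current context" against which relative paths are evaluated is not specific to xsh. As a rough Python analogue using the standard library's ElementTree, the sketch below selects a context element once and then runs relative queries from it. The sample document is a cut-down guess at the structure of camelids.xml, inferred from the prompts above, and is not the real file.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed stand-in for camelids.xml.
SAMPLE = """<camelids>
  <species name="Camelus bactrianus">
    <natural-history><habitat>...</habitat></natural-history>
  </species>
  <species name="Camelus dromedarius">
    <natural-history>
      <food-habits>...</food-habits>
      <habitat>...</habitat>
    </natural-history>
  </species>
</camelids>"""

root = ET.fromstring(SAMPLE)

# Comparable to: cd //species[@name='Camelus dromedarius']/natural-history
context = root.find(".//species[@name='Camelus dromedarius']/natural-history")

# Comparable to a bare "ls": the children of the current context.
children = [child.tag for child in context]
print(children)

# A relative path sees only the context; a search from the root sees everything.
local_count = len(context.findall("habitat"))
global_count = len(root.findall(".//habitat"))
print(local_count, global_count)
```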

In addition, xsh provides a way to execute commands on any currently open document without changing the element context by prepending that document's ID and a colon to the XPath expression:

xsh cams:/camelids/species[2]/natural-history> cd /
xsh cams:/> open xmlnews=http://www.xml.com/xml/news.rss
parsing http://www.xml.com/xml/news.rss
done.
xsh xmlnews:/>
xsh xmlnews:/> ls cams:/camelids/species[3]/common-name
<common-name>Llama</common-name>

Found 1 node(s).
xsh xmlnews:/>

Notice that the context changed to the root of the newly opened RSS document once it is parsed into memory, but we still have easy access to the data contained in the camelids document by adding that document's ID (cams) and a colon to the front of the path.

Also note that the location of the file passed to the open command is not limited to files on the local machine; it can also be an HTTP or FTP URL, so long as a well-formed XML document is returned.

To see a list of all the currently open documents and their associated IDs, use the files command:

xsh xmlnews:/> files
cams = files/camelids.xml
xmlnews = http://www.xml.com/xml/news.rss
xsh xmlnews:/>

Closing an open document is as easy as passing its ID to the close command.

xsh xmlnews:/> close xmlnews
closing file http://www.xml.com/xml/news.rss
xsh :>

If we wanted to save a local copy of the xmlnews document before closing, we would use the saveas command:

xsh xmlnews:/> saveas xmlnews files/xmldotcom_news.rss
xmlnews=new_document1.xml --> files/xmldotcom_news.rss (utf-8)
saved xmlnews=files/xmldotcom_news.rss as files/xmldotcom_news.rss
in utf-8 encoding
xsh :>

We've now reviewed xsh basics: we can start the shell, open, close, and navigate through contents of XML documents. If this is all there was to xsh, it would still be a winner as an XPath testbed and teaching tool (making it quite useful to users of XSLT and XPathScript, as well as XML::LibXML and the other Perl modules which offer an XPath interface). But xsh bills itself as an XML editing shell, and as we will see, it's that and a fair bit more.

Creating and Editing XML Documents

We can begin by creating a new XML document:

xsh :> create mynews news-channels

This creates a new document with the ID mynews with the top-level element news-channels and changes the context to the root of the new document. Let's have a look:

xsh mynews:/> ls
<?xml version="1.0" encoding="utf-8"?>
<news-channels/>
xsh mynews:/>

So far, so good. Now let's add an element to the news-channels element.

xsh mynews:/> cd news-channels
xsh mynews:/news-channels> add element channel into .

We use the add command to add a channel element into the current context element, which is represented by a period character, and we can verify the result by listing the current context:

xsh mynews:/news-channels> ls
<news-channels><channel/></news-channels>

Found 1 node(s).
xsh mynews:/news-channels>

Note that the first argument for the add command must be the type of node being added (element in this case).

Suppose we need to add a name attribute to the new channel element, as well as an rss-url child element.

xsh mynews:/news-channels> add attribute "name='seepan uploads'" into
     ./channel[1]
xsh mynews:/news-channels> add element rss-url into ./channel[1]

Next, we'll add the URL of the CPAN RSS file as a text node of the rss-url element:

xsh mynews:/news-channels> add text "http://search.cpan.org/recent.rdf"
    into ./channel[1]/rss-url

Let's add another channel element:

xsh mynews:/news-channels> add element channel before //channel[1]
xsh mynews:/news-channels> add attribute "name='perl news'"
    into ./channel[1]
xsh mynews:/news-channels> add element rss-url into ./channel[1]
xsh mynews:/news-channels> add text "http://www.perl.com/pace/perlnews.rdf"
     into ./channel[1]/rss-url

We used the before location expression as the third argument to the add command, specifying the first channel element as the evaluation context. This inserts the new channel into the list as the preceding sibling of the previously created channel.

Again, we can verify this by listing all the channels in the document:

xsh mynews:/news-channels> ls //channel
<channel name="perl news"><rss-url>http://www.perl.com/pace/perlnews.rdf
     </rss-url></channel>
<channel name="seepan uploads"><rss-url>http://search.cpan.org/recent.rdf
     </rss-url></channel>

Found 2 node(s).
xsh mynews:/news-channels>

Careful readers will have noticed the "seepan" typo — we can fix this using map, which applies a block of Perl code to nodes returned by the subsequent XPath expression:

xsh mynews:/news-channels> map { $_ = 'cpan uploads' } //channel[2]/@name

Here's a view of the full contents of our new document, obtained by listing the document's root:

xsh mynews:/news-channels> ls /
<?xml version="1.0" encoding="utf-8"?>
<news-channels>
  <channel name="perl news">
    <rss-url>http://www.perl.com/pace/perlnews.rdf</rss-url>
  </channel>
  <channel name="cpan uploads">
    <rss-url>http://search.cpan.org/recent.rdf</rss-url>
  </channel>
</news-channels>

Found 1 node(s).

Our new document is a bit simplistic, to be sure. But our goal here is just to demonstrate the basics of editing documents with xsh. What we've learned so far can be applied to the most complex XML documents.
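
For comparison, the same sequence of edits (append a channel, insert another before it, then correct the attribute) can be sketched in Python with the standard library's ElementTree. This is an analogue rather than a translation of the xsh session; the helper name make_channel is invented for the example.

```python
import xml.etree.ElementTree as ET


def make_channel(name, url):
    """Build <channel name="..."><rss-url>url</rss-url></channel>."""
    channel = ET.Element("channel", {"name": name})
    ET.SubElement(channel, "rss-url").text = url
    return channel


root = ET.Element("news-channels")

# Like "add element channel into .": the new node becomes the last child.
root.append(make_channel("seepan uploads", "http://search.cpan.org/recent.rdf"))

# Like "add element channel before //channel[1]": insert as the first child.
root.insert(0, make_channel("perl news", "http://www.perl.com/pace/perlnews.rdf"))

# Like the map command: correct the name attribute of the second channel.
root.findall("channel")[1].set("name", "cpan uploads")

names = [channel.get("name") for channel in root.findall("channel")]
print(names)
print(ET.tostring(root, encoding="unicode"))
```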

To finish up, let's save our new document to disk and quit the shell:

xsh mynews:/news-channels> saveas mynews files/perl_channels.xml
mynews=new_document2.xml --> files/perl_channels.xml (utf-8)
saved mynews=files/perl_channels.xml as files/perl_channels.xml
in utf-8 encoding
xsh mynews:/news-channels>
xsh mynews:/news-channels> exit
[user@host user]$

xsh Scripting

No shell would be complete without the ability to perform automated or scripted tasks. As a final example, let's create an xsh script, which uses the data contained in the perl_channels.xml document we just created, to fetch all the current Perl news items from all the channels into a single XML document:

quiet;
open sources=files/perl_channels.xml;
create merge news-items;
$i = 0;
foreach sources://rss-url {
    $name = string(.);
    open $i=$name;
    xcopy $i://item into merge:/news-items;
    close $i;
    $i=$i+1;
};

close sources;
saveas merge files/headlines.xml;
close merge;

Looking closer at this script we see that it loads the perl_channels.xml document, iterates over all of its <rss-url> elements, fetches each document from the Web using the open command to grab the URL, and copies all of each channel's <item> elements into a new document. The new document is then saved to disk as headlines.xml before exiting.
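
The shape of that script (read a list of sources, open each one, copy its items into a single result) can be sketched in Python as follows. To keep the example self-contained it does no network access: the fetch argument, the example.org URLs, and the in-memory FEEDS dictionary are all stand-ins invented for illustration, and the sample feeds deliberately use no XML namespaces, which real RSS/RDF feeds often do.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical feeds keyed by URL, standing in for documents fetched from the Web.
FEEDS = {
    "http://example.org/a.rss":
        "<rss><channel><item><title>A1</title></item>"
        "<item><title>A2</title></item></channel></rss>",
    "http://example.org/b.rss":
        "<rss><channel><item><title>B1</title></item></channel></rss>",
}

SOURCES = """<news-channels>
  <channel name="a"><rss-url>http://example.org/a.rss</rss-url></channel>
  <channel name="b"><rss-url>http://example.org/b.rss</rss-url></channel>
</news-channels>"""


def merge_items(sources_xml, fetch):
    """Copy every <item> from every listed feed into one <news-items> element."""
    sources = ET.fromstring(sources_xml)
    merged = ET.Element("news-items")
    for url_node in sources.iter("rss-url"):
        feed = ET.fromstring(fetch(url_node.text.strip()))
        for item in feed.iter("item"):
            # Deep-copy so the merged tree owns its nodes, as xcopy copies them.
            merged.append(copy.deepcopy(item))
    return merged


merged = merge_items(SOURCES, FEEDS.__getitem__)
titles = [item.findtext("title") for item in merged]
print(titles)
```

Passing the fetch function in as a parameter keeps the merging logic separate from how documents are retrieved, so a real HTTP client could be substituted without changing merge_items.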

Starting to see why an XML editing shell isn't such a crazy idea? I know I am.

Going Further

I've offered a glimpse of the ease and power that xsh provides, but there are many more commands and features available. For example,

xslt doc1 some_stylesheet.xsl doc2

transforms the document with the ID doc1 using the XSLT stylesheet some_stylesheet.xsl and stores the result in a new document with the ID doc2.

Similarly, the command

xupdate myxupdate doc1

alters the content of the doc1 document using the rules contained in the XUpdate document stored in myxupdate.

For a complete list of commands, type help command at the xsh prompt, or help commandname for detailed usage of a specific command.

Conclusions

I was initially skeptical about the notion of an "XML editing shell". At first glance, it seemed to me to be pushing the file path/XPath metaphor a bit too far; surely it's little more than a technical curiosity? But I was very wrong, and I don't mind admitting it. XML::XSH is an astonishingly powerful tool which has quickly become part of my daily XML work. I highly recommend it.

documented on: 2006.10.09