XML::XPathScript And AxKit

From Perl-XML FAQ

http://perl-xml.sourceforge.net/faq/

XML::XPathScript

http://perl-xml.sourceforge.net/faq/#xml_xpathscript

XPathScript is a stylesheet language comparable to XSLT, for transforming XML from one format to another (possibly HTML, but XPathScript also shines for non-XML-like output).

Like XSLT, XPathScript offers a dialect to mix verbatim portions of documents and code. Also like XSLT, it leverages the powerful "templates/apply-templates" and "cascading stylesheets" design patterns, which greatly simplify the design of stylesheets for programmers. The availability of the XPath query language inside stylesheets promotes the use of a purely document-dependent, side-effect-free coding style. But unlike XSLT, which uses its own dedicated control language with an XML-compliant syntax, XPathScript uses Perl, which is terse and highly extensible.

As of version 0.13 of XML::XPathScript, the module can use either XML::LibXML or XML::XPath as its parsing engine. Transformations can be performed either using a shell-based script or, in a web environment, within AxKit.

AxKit

If you're doing a lot of XML transformations (particularly for web-based clients), you should take a long hard look at AxKit. AxKit is a Perl-based (actually mod_perl-based) XML Application server for Apache. Here are some of AxKit's key features:

Data can come from XML or any SAX data source (such as a database query using XML::Generator::DBI)
Stylesheets can be selected based on just about anything (file suffix, UserAgent, QueryString, cookies, phase of the moon …)
Transformations can be specified using a variety of languages including XSLT (LibXSLT or Sablotron), XPathScript (a Perl-based transformation language) and XSP (a tag-based language)
Output formats can be anything you want (including HTML, WAP, PDF, etc.)
Caching of transformed documents can be handled automatically or using your own custom scheme

documented on: 2006.10.09

XML Publishing with AxKit

http://www.gulba.org/pipermail/linux-ssa/2004-June/003114.html

Publisher: O'Reilly
ISBN: 0596002165

"XML Publishing with AxKit" presents web programmers with the knowledge they need to master AxKit, a mod_perl- and Apache-based XML content delivery solution. This book provides detailed information on installing, configuring, and deploying AxKit effectively, and it features a thorough introduction to XSP, which applies the concepts of Server Pages technologies to the XML world. The book also covers integrating AxKit with tools such as Template Toolkit, Apache::Mason, Apache::ASP, and plain CGI, and it contains reference sections on configuration directives, XPathScript, and XSP.

http://www.oreilly.com/catalog/xmlaxkit/

Chapter 3, "Your First XML Web Site," is available online: http://www.oreilly.com/catalog/xmlaxkit/chapter/index.html


Online Perl Programming Books — Practical mod_perl

http://www.linuxtopia.org/online_books/mod_perl_programming_book/

Stas Bekman, Eric Cholet

Your First AxKit Page

http://www.linuxtopia.org/online_books/mod_perl_programming_book/appe_02.html

Now we're going to see how AxKit works, by transforming an XML file containing data about Camelids (note the dubious Perl reference) into HTML.

First you will need a sample XML file. Open the text editor of your choice and type the code shown in Example E-1.

Example E-1. firstxml.xml

 <?xml version="1.0"?>
 <dromedaries>
   <species name="Camel">
     <humps>1 or 2</humps>
     <disposition>Cranky</disposition>
   </species>
   <species name="Llama">
     <humps>1</humps>
     <disposition>Aloof</disposition>
   </species>
   <species name="Alpaca">
     <humps>(see Llama)</humps>
     <disposition>Friendly</disposition>
   </species>
 </dromedaries>

Save this file in your web server document root (e.g., /home/httpd/httpd_perl/htdocs/) as firstxml.xml.

Now we need a stylesheet to transform the XML to HTML. For this first example we are going to use XPathScript, an XML transformation language specific to AxKit. Later we will give a brief introduction to XSLT.

Create a new file and type the code shown in Example E-2.

Example E-2. firstxml.xps

 <%
 $t->{'humps'}{pre} = "<td>";
 $t->{'humps'}{post} = "</td>";
 $t->{'disposition'}{pre} = "<td>";
 $t->{'disposition'}{post} = "</td>";
 $t->{'species'}{pre} = "<tr><td>{\@name}</td>";
 $t->{'species'}{post} = "</tr>";
 %>
 <html>
 <head>
 <title>Know Your Dromedaries</title>
 </head>
 <body>
   <table border="1">
     <tr><th>Species</th>
         <th>No. of Humps</th>
         <th>Disposition</th></tr>
     <%= apply_templates('/dromedaries/species') %>
   </table>
 </body>
 </html>

Save this file as firstxml.xps.

Now to get the original file, firstxml.xml, to be transformed on the server by firstxml.xps, we need to associate that file with the stylesheet. Under AxKit there are a number of ways to do that, with varying flexibility. The simplest way is to edit your firstxml.xml file and, immediately after the <?xml version="1.0"?> declaration, add the following:

<?xml-stylesheet href="firstxml.xps"
                 type="application/x-xpathscript"?>

Now assuming the files are both in the same directory under your httpd document root, you should be able to make a request for firstxml.xml and see server-side transformed XML in your browser. Now try changing the source XML file, and watch AxKit detect the change next time you load the file in the browser.

If Something Goes Wrong

If you don't see HTML in your browser but instead get the source XML, you will need to check your error log. (In Internet Explorer you will see a tree-based representation of the XML, and in Mozilla, Netscape, or Opera you will see all the text of the document joined together.)

AxKit sends out varying amounts of debug information depending on the value of AxDebugLevel (which we set to the maximum value of 10). If you can't decipher the contents of the error log, contact the AxKit users mailing list at axkit-users@axkit.org with details of your problem.

How It Works

The stylesheet above specifies how the various tags work. The ASP <% %> syntax delimits Perl code from HTML. You can execute any code within the stylesheet.

In this example, we use the special XPathScript $t hash reference, which specifies the names of tags and how they should be output to the browser. There are several options for the second level of the hash, and here we see two of those options: pre and post. pre and post specify (respectively) what appears before the tag and what appears after it. These values in $t take effect only when we call the apply_templates( ) function, which iterates over the nodes in the XML, executing the matching values in $t.
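
To make the pre/post mechanism concrete, here is a minimal standalone Perl sketch that imitates the walk over a hand-built tree. It is not XPathScript's implementation: the tree layout, toy_apply_templates, and interpolate are hypothetical names invented for this illustration, elements without a $t entry are assumed to contribute only their children's output, and only the {@attr} form of placeholder is handled.

```perl
use strict;
use warnings;

# Toy document tree (hypothetical layout): each element is a hash with
# name, attrs and children; text nodes are plain strings.
my $doc = {
    name => 'dromedaries', attrs => {}, children => [
        { name => 'species', attrs => { name => 'Camel' }, children => [
            { name => 'humps',       attrs => {}, children => ['1 or 2'] },
            { name => 'disposition', attrs => {}, children => ['Cranky'] },
        ] },
        { name => 'species', attrs => { name => 'Llama' }, children => [
            { name => 'humps',       attrs => {}, children => ['1'] },
            { name => 'disposition', attrs => {}, children => ['Aloof'] },
        ] },
    ],
};

# Same shape as the $t hash in Example E-2.
my $t = {
    humps       => { pre => '<td>', post => '</td>' },
    disposition => { pre => '<td>', post => '</td>' },
    species     => { pre => '<tr><td>{@name}</td>', post => '</tr>' },
};

# Replace {@attr} placeholders with that attribute's value on this node.
sub interpolate {
    my ($text, $node) = @_;
    $text =~ s/\{\@(\w+)\}/exists $node->{attrs}{$1} ? $node->{attrs}{$1} : ''/ge;
    return $text;
}

# Walk a node: emit "pre", then the rendered children, then "post".
# Assumption: elements with no entry in $t add nothing of their own.
sub toy_apply_templates {
    my ($node, $t) = @_;
    return $node unless ref $node;    # text node: emit as-is
    my $tmpl = $t->{ $node->{name} } || {};
    my $out  = interpolate($tmpl->{pre} // '', $node);
    $out .= toy_apply_templates($_, $t) for @{ $node->{children} };
    $out .= interpolate($tmpl->{post} // '', $node);
    return $out;
}

# Roughly what apply_templates('/dromedaries/species') does in Example E-2.
my $rows = join '', map { toy_apply_templates($_, $t) } @{ $doc->{children} };
print "$rows\n";
```

Each species element produces one table row: its pre value opens the row and prints the name attribute, the child templates wrap the humps and disposition text in cells, and its post value closes the row.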

XPath

One of the key specifications being used in XML technologies is XPath. This is a little language used within other languages for selecting nodes within an XML document (just as regular expressions are a language of their own within Perl). At first glance, an XPath looks similar to a Unix directory path. In Example E-2 we can see the XPath /dromedaries/species, which starts at the root of the document, finds the dromedaries root element, then finds the species children of the dromedaries element. Note that unlike Unix directory paths, XPaths can match multiple nodes; so in the case above, we select all of the species elements in the document.
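
The point that one path can match several nodes can be illustrated with a deliberately tiny stand-in for XPath child steps. The sketch below is plain Perl over a hand-built tree, not a real XPath engine: el and select_path are hypothetical helpers, and only absolute paths made of plain element names are supported.

```perl
use strict;
use warnings;

# Toy tree (hypothetical layout): element = { name, children };
# text nodes are plain strings.
sub el {
    my ($name, @children) = @_;
    return { name => $name, children => \@children };
}

my $root = el('dromedaries',
    el('species', el('humps', '1 or 2')),
    el('species', el('humps', '1')),
    el('species', el('humps', '(see Llama)')),
);

# Select nodes by a slash-separated list of child element names.
# Handles only absolute paths of plain names, a small subset of XPath.
sub select_path {
    my ($root, $path) = @_;
    my @steps = grep { length } split m{/}, $path;
    my $first = shift @steps;
    my @current = (defined $first && $root->{name} eq $first) ? ($root) : ();
    for my $step (@steps) {
        # Every matching child of every current node moves on to the next step.
        @current = grep { ref($_) && $_->{name} eq $step }
                   map  { @{ $_->{children} } } @current;
    }
    return @current;
}

my @species = select_path($root, '/dromedaries/species');
printf "matched %d species elements\n", scalar @species;
```

Unlike a directory path, which names a single file, each step here keeps every child with the requested name, so the result is a list of nodes.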

Documenting all of XPath here would take up many pages. The grammar for XPath allows many constructs of a full programming language, such as functions, string literals, and Boolean expressions. What's important to know is that the syntax we are using to find nodes in our XML documents is not just something invented for AxKit!

XPathScript Details

http://www.linuxtopia.org/online_books/mod_perl_programming_book/appe_04.html

XPathScript aims to provide the power and flexibility of XSLT as an XML transformation language, without the restriction of XSLT's XML-based syntax. Unlike XSLT, which has special modes for outputting in text, XML, and HTML, XPathScript outputs only plain text. This makes it a lot easier than XSLT for people coming from a Perl background to learn. However, XPathScript is not a W3C specification, despite being based on XPath, which is a W3C recommendation.

XPathScript follows the basic ASP syntax for introducing code and outputting code to the browser: use <% %> to introduce Perl code, and <%= %> to output a value.

The XPathScript API

Along with the code delimiters, XPathScript provides stylesheet developers with a full API for accessing and transforming the source XML file. This API can be used in conjunction with the delimiters listed above to provide a stylesheet language that is as powerful as XSLT, yet supports all the features of a full programming language such as Perl. (Other implementations, such as Python or Java, also are possible.)

Extracting values

A simple example to get us started is to use the API to bring in the title from a DocBook article. A DocBook article title looks like this:

<article>
 <artheader>
  <title>XPathScript - A Viable Alternative to XSLT?</title>
  ...

The XPath expression to retrieve the text in the <title> element is:

/article/artheader/title/text( )

Putting all this together to make this text into the HTML title, we get the following XPathScript stylesheet:

<html>
<head>
 <title><%= findvalue("/article/artheader/title") %></title>
</head>
<body>
  This was a DocBook Article.
  We're only extracting the title for now!
<p>
The title was: <%= findvalue("/article/artheader/title") %>
</body>
</html>

Again, we see the XPath syntax being used to find the nodes in the document, along with the function findvalue( ). Similarly, a list of nodes can be extracted (and thus looped over) using the findnodes( ) function:

...
<%
for my $sect1 (findnodes("/article/sect1")) {
  print $sect1->findvalue("title"), "<br>\n";
  for my $sect2 ($sect1->findnodes("sect2")) {
    print " + ", $sect2->findvalue("title"), "<br>\n";
    for my $sect3 ($sect2->findnodes("sect3")) {
      print " + + ", $sect3->findvalue("title"), "<br>\n";
    }
  }
}
%>
...

Here we see how we can apply the find* functions to individual nodes as methods, which makes the node the context node to search from. That is, $node->findnodes("title") finds <title> child nodes of $node.

Declarative templates

We saw declarative templates earlier in this appendix, in Section E.2. The $t hash is the key to declarative templates. The apply_templates( ) function iterates over the nodes of your XML file, applying the templates defined in the $t hash reference as it meets matching tags. This is the most important feature of XPathScript, because it allows you to define the appearance of individual tags without having to do your own iteration logic. We call this declarative templating.

The keys of $t are the names of the elements, including namespace prefixes where appropriate. When apply_templates( ) is called, XPathScript tries to find a member of $t that matches the element name.

The following subkeys define the transformation:

pre: Output to occur before the tag
post: Output to occur after the tag
prechildren: Output to occur before the children of this tag are written
postchildren: Output to occur after the children of this tag are written
prechild: Output to occur before every child element of this tag
postchild: Output to occur after every child element of this tag
showtag: Set to a false value (generally zero) to disable rendering of the tag itself
testcode: Code to execute upon visiting this tag
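
One way to picture how these subkeys combine is the following standalone Perl sketch. It is an illustration under stated assumptions, not XPathScript's own code: el and render are hypothetical helpers, the emission order (pre, start tag, prechildren, each child wrapped in prechild and postchild, postchildren, end tag, post) is one reading of the descriptions above, an unset showtag is assumed to hide the tag, and testcode is left out.

```perl
use strict;
use warnings;

# Toy tree (hypothetical layout): element = { name, children };
# text nodes are plain strings.
sub el {
    my ($name, @children) = @_;
    return { name => $name, children => \@children };
}

# Render one node using the subkeys listed above (testcode is omitted).
# Assumed order: pre, [start tag], prechildren, then for each child element
# prechild + child + postchild, then postchildren, [end tag], post.
sub render {
    my ($node, $t) = @_;
    return $node unless ref $node;                 # text node: emit as-is
    my $tmpl = $t->{ $node->{name} } || {};
    my $k    = sub { $tmpl->{ $_[0] } // '' };     # missing subkey => ''
    my $show = $tmpl->{showtag};                   # assumption: unset hides the tag

    my $out = $k->('pre');
    $out .= "<$node->{name}>" if $show;
    $out .= $k->('prechildren');
    for my $child (@{ $node->{children} }) {
        my $is_el = ref $child;                    # wrap child elements only
        $out .= $k->('prechild')  if $is_el;
        $out .= render($child, $t);
        $out .= $k->('postchild') if $is_el;
    }
    $out .= $k->('postchildren');
    $out .= "</$node->{name}>" if $show;
    $out .= $k->('post');
    return $out;
}

my $t = {
    herd    => { pre => '<h1>Herd</h1>',
                 prechildren => '<ul>', postchildren => '</ul>',
                 prechild    => '<li>', postchild    => '</li>' },
    species => { showtag => 0 },                   # text only, no <species> tag
    note    => { showtag => 1 },                   # keep the <note> tag itself
};

my $doc  = el('herd', el('species', 'Camel'), el('species', 'Llama'), el('note', 'wild'));
my $html = render($doc, $t);
print "$html\n";
```

In this sketch the herd template turns its children into list items without any per-child iteration code in the stylesheet, which is the benefit the declarative style is meant to deliver.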

More details about XPathScript can be found on the AxKit web site, at http://axkit.org/
