XPath And XQuery

XPath Essential

What Can XPath Do?

XPath makes it possible for a processor to navigate around the hierarchy of XPath nodes to a particular part or parts of the document and may return a set of such nodes, called a node set, or may return a value which is a string, a number, or a Boolean value (i.e., true or false). A particularly important use of XPath is for matching. For example, if you wanted to transform an XML source document and present selected elements of it as HTML, then an XPath expression can be written to do that. The XPath expression would allow those selected elements (and only those elements), or, more precisely, the nodes which represent those elements, to be matched by the templates or elements in the XSLT stylesheet and their content displayed suitably within the resulting HTML document.

XPath in XSLT

In this general process of using XSLT to create an output tree, XPath fulfills the crucial role of enabling you to choose which elements, attributes, or other parts of the source tree you show to a user in any particular context. XPath expressions and location paths are used to select nodes for inclusion in the result tree. If an XPath location path matches the node you want to output, then the desired output is created. XPath expressions can be used, in effect, to hide information from some or all users. If, for example, the information you held about a company's product included details of the price you paid suppliers, you would not want your customers to be aware of that. Such sensitive or confidential information could be selectively suppressed from the output by omitting those elements from the XPath expressions used to create the output tree.

The XPath Forms of Syntax

XPath has, fairly confusingly, four forms of syntax, which may be used to write expressions and location paths.

You will perhaps notice in XPath expressions a similarity to the way a path, as in a directory listing or in a URL, is written. That similarity to other path notations gives rise to XPath's name. It is a path language for XML, not a path language in XML.

XPath can be viewed as a way to navigate round XML documents. Thus XPath has similarities to a set of street directions. When you are receiving street directions, you need to know what your starting point is. In XPath the starting point is called the context node. The logical parts of the in-memory representation of an XML document are termed nodes. An XPath processor deals with nodes, not with the more familiar elements and attributes.

In XPath the equivalent of a direction is called an axis. XPath has a total of 13 different axes, which we will look at in more detail later. A particularly commonly used axis is the child axis.

<?xml version='1.0'?>
<Invoice>
<CustomerName> John Smith </CustomerName>
<Address> 123 Any Street </Address>
<City> Anytown </City>
</Invoice>

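With the node that represents the <Invoice> element in the document above as the context node, the XPath expression child::*, which selects all child element nodes of the context node, would select the nodes that represent the <CustomerName>, <Address>, and <City> elements.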

If we wanted to select only the <CustomerName> element, we could use a more specific XPath expression child::CustomerName which selects only child elements of the context node that have an element type name of CustomerName.

XPath can also select a <CustomerName> element from our example XML document using yet another syntax

/child::Invoice/child::CustomerName

which can be abbreviated to

/Invoice/CustomerName

Already you have seen three out of the four forms of syntax that can be used in XPath.
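
The idea of a context node and its child elements can be tried out with a short script. The following is a minimal sketch, assuming Python's standard xml.etree.ElementTree module as a stand-in for an XPath processor. ElementTree understands only a subset of the abbreviated syntax (it has no child:: axis notation), so the comments note which unabbreviated expression each call roughly corresponds to. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

INVOICE_XML = """<Invoice>
<CustomerName> John Smith </CustomerName>
<Address> 123 Any Street </Address>
<City> Anytown </City>
</Invoice>"""

# The parsed <Invoice> element plays the role of the context node.
invoice = ET.fromstring(INVOICE_XML)

# Roughly child::* : every child element of the context node.
child_names = [child.tag for child in invoice.findall("*")]

# Roughly child::CustomerName : only children named CustomerName.
customer = invoice.find("CustomerName")

print(child_names)
print(customer.text.strip())
```

Because the search starts at the <Invoice> element, the paths here are relative; the absolute form /Invoice/CustomerName would instead start from the root of the document.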

Use XPath to locate information in XML documents

http://builder.com.com/5100-6389-1054416.html

Baseline Inc. | October 31, 2002

XML is an excellent vehicle for packaging and exchanging data. Parsing and transforming an XML document are common tasks, but what about locating a specific piece of information within an XML document? XPath fills this niche. XPath is a set of syntax rules for addressing the individual pieces of an XML document. If you're familiar with XSLT, you've used XPath, perhaps without realizing it.

An industry standard

XPath is an industry standard developed by the World Wide Web Consortium (W3C). It's used in both the XSLT and XPointer standards. Native XML databases often use it to locate information as well.

XPath follows in the path of the Document Object Model (DOM), whereby each XML document is treated as a tree of nodes. Each node is one of seven types: root, element, attribute, text, namespace, processing instruction, and comment. These are all standard aspects of any XML document. You can see many of these node types in the following sample XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<books>
<book type='hardback'>
<title>Atlas Shrugged</title>
<author>Ayn Rand</author>
<isbn>0525934189</isbn>
</book>
<book type='paperback'>
<title>A Burnt-Out Case</title>
<author>Graham Greene</author>
<isbn>0140185399</isbn>
</book>
</books>

The root element is books (in XPath terms, the root node is the document itself, and books is its only element child); book is an element with the type attribute, and text nodes exist throughout the XML document elements. So how do you easily locate individual pieces of data within the document? XPath is the answer.

Locate what you need

You locate information in an XML document by using location-path expressions. These expressions are made up of steps.

A node is the most common search element you'll encounter. Nodes in the example books XML include book, title, and author. You use paths to locate nodes within an XML document. The slash (/) separates child nodes, with all elements matching the pattern returned. The following XPath statement returns all book elements:

//books/book

A double slash (//) signals that all elements in the XML document that match the search criteria are returned, regardless of location/level within the document. You can easily retrieve all ISBN elements:

/books/book/isbn

The previous code returns the following elements from the sample XML document:

<isbn>0525934189</isbn>
<isbn>0140185399</isbn>

Use square brackets to further concentrate the search. The brackets locate elements with certain child nodes or particular values. The following expression locates all books with the specified title:

/books/book[title='Atlas Shrugged']

You can use the brackets to select all books with author elements as well:

/books/book[author]

The bracket notation lets you use attributes as search criteria. The @ symbol facilitates working with attributes. The following XPath locates all hardback books (all books with the type attribute value hardback):

//book[@type='hardback']

It returns the following element from the sample XML document:

<book type='hardback'>
<title>Atlas Shrugged</title>
<author>Ayn Rand</author>
<isbn>0525934189</isbn>
</book>

The bracket notation is called a predicate in the XPath documentation. Another application of the brackets is specifying the item number to retrieve. For example, the first book element is read from the XML document using the following XPath: /books/book[1]

The sample returns the first book element from the sample XML document:

<book type='hardback'>
<title>Atlas Shrugged</title>
<author>Ayn Rand</author>
<isbn>0525934189</isbn>
</book>
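
The location paths and predicates above can be exercised against the sample document. This is a minimal sketch, assuming Python's standard xml.etree.ElementTree module, which supports a limited subset of XPath. Paths are written relative to the parsed books element because ElementTree does not accept absolute paths on an element; the helper name titles is hypothetical.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """<books>
<book type='hardback'>
<title>Atlas Shrugged</title>
<author>Ayn Rand</author>
<isbn>0525934189</isbn>
</book>
<book type='paperback'>
<title>A Burnt-Out Case</title>
<author>Graham Greene</author>
<isbn>0140185399</isbn>
</book>
</books>"""

books = ET.fromstring(BOOKS_XML)  # context node: the <books> element


def titles(path):
    """Return the titles of the <book> elements selected by a path."""
    return [book.findtext("title") for book in books.findall(path)]


isbns = [isbn.text for isbn in books.findall("./book/isbn")]  # like /books/book/isbn
by_title = titles("./book[title='Atlas Shrugged']")           # child value predicate
with_author = titles("./book[author]")                        # child existence predicate
hardbacks = titles(".//book[@type='hardback']")               # attribute predicate
first_book = titles("./book[1]")                              # position predicate

print(isbns, by_title, with_author, hardbacks, first_book)
```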

Specifying elements by position, name, or attribute is great, but some situations require all elements. Thankfully, the XPath specification supports wildcards to retrieve everything. Every element contained within the root element is easily retrieved with the wildcard (*). The following sample returns all book elements from the sample XML document:

/books/*

You can easily combine location paths with the union operator to select a combination of elements. The following statement retrieves all hardback and paperback books; thus all book elements from the sample XML document:

//books/book[@type='hardback'] | //books/book[@type='paperback']

The pipe (|) is the union operator: it combines the node sets selected by the two paths, much like a logical OR applied to the selection. Selecting individual nodes from an XML document is powerful, but developers must be aware of the path to the node. In addition, XPath provides the logical operators or and and for use within expressions. Also, comparison operators are available via <=, <, >, >=, =, and !=. A single equal sign (=) evaluates equality, while the exclamation mark and equal sign (!=) evaluate inequality.

Reference point

The first character in the statement determines point of reference. Statements beginning with a forward slash (/) are considered absolute, while omitting the slash results in a relative reference. I've used absolute references up to this point, so here's an example of a relative reference:

book/*

The previous statement begins the search at the current reference point. It may appear in a group of statements, so the reference point left by the previous statement is utilized. Also, keep in mind that double forward slashes (//) retrieve every matching element regardless of location within the document.

Context and parent

XPath provides a dot notation to handle selecting the current and parent elements. This is analogous to a directory listing in which a single period (.) represents the current directory and double periods (..) represent the parent directory. In XPath, the single period is used to select the current node, and double periods return the parent of the current node. So, to retrieve all child nodes of the parent of the current node, use:

../*

For example, from any book element in the sample XML document you can step back up to its parent, the books element, with the following XPath expression:

/books/book/..
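
The parent step can be tried in the same way. This is a small sketch, assuming Python's standard xml.etree.ElementTree module, whose path subset includes the double-period step; the shortened sample document is illustrative only.

```python
import xml.etree.ElementTree as ET

books = ET.fromstring(
    "<books>"
    "<book type='hardback'><title>Atlas Shrugged</title></book>"
    "<book type='paperback'><title>A Burnt-Out Case</title></book>"
    "</books>"
)

# Find every title, then step up to its parent with "..".
# Each parent is a <book> element, reported once.
parents = books.findall(".//title/..")
parent_types = [parent.get("type") for parent in parents]

print(parent_types)
```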

Get what you need

The concepts I've touched on in this article are only an introduction to XPath. You can combine them and use them in an XSLT document or XPointer. XPath does provide more power via built-in functions, and it offers an alternate syntax. Check out the XPath specification for more details.

Locate and format XML data with XPath functions

http://builder.com.com/5100-6387-1057652.html

Baseline Inc. | November 8, 2002

XPath allows you to quickly locate and extract information from an XML hierarchy, and it offers extended functionality via its built-in functions, which provide an easy way to work with numeric and textual data. Locating data within an XML document doesn't require thorough knowledge of a traditional development language like C# or Java. The combination of XSLT and XPath provides everything you need to locate and format the data accordingly.

The string and number XPath functions are two examples of this functionality. Here, I'll provide an overview of these functions. However, this is by no means meant to be an exhaustive coverage of XPath functions.

Working with numbers

XPath provides numerous functions that make it easy to work with numbers. Table A provides a sampling:

Table: Table A, XPath number functions
Name	Description
ceiling()	Generates the smallest integer that is not less than the number passed to the function
floor()	Generates the largest integer that is not greater than the number passed to the function
number()	The value passed to the function is converted to a number.
round()	The number passed to the function is rounded to the nearest integer.
sum()	A total value is calculated from the set of numeric values (node set) passed to it.

Let's examine the XML to be used in the examples:

<?xml version="1.0" encoding="ISO-8859-1"?>
<books>
<book type='hardback'>
<title>Atlas Shrugged</title>
<author>Ayn Rand</author>
<isbn>0525934189</isbn>
<price>39.95</price>
</book>
<book type='paperback'>
<title>A Burnt-Out Case</title>
<author>Graham Greene</author>
<isbn>0140185399</isbn>
<price>13.00</price>
</book>
</books>

The number functions may be used to accomplish numerous tasks. For example, the total value of the books (if this were an order) may be easily calculated with the sum() function, like this:

sum(//books/book/price)

This is placed in an accompanying stylesheet to display the results:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<p>Total cost of books = <xsl:value-of select="sum(//books/book/price)"/></p>
<p>Total number of books = <xsl:value-of select="count(//books/book)"/></p>
</xsl:template>
</xsl:stylesheet>

The previous stylesheet may be applied to the sample XML with the following directive:

<?xml-stylesheet href="xslt.xsl" type="text/xsl"?>

This line is added to the XML file to tell the processor what stylesheet to apply to the XML:

<?xml version="1.0" encoding="ISO-8859-1"?>
<?xml-stylesheet href="xslt.xsl" type="text/xsl"?>
<books>
<book type="hardback">
<title>Atlas Shrugged</title>
<author>Ayn Rand</author>
<isbn>0525934189</isbn>
<price>39.95</price>
</book>
<book type="paperback">
<title>A Burnt-Out Case</title>
<author>Graham Greene</author>
<isbn>0140185399</isbn>
<price>13.00</price>
</book>
</books>

The transformation results are displayed in Figure A.

Figure A
Take advantage of simple XPath functions.
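
For readers who want to check the two figures outside an XSLT processor, here is a rough equivalent. It is a sketch only, assuming Python's standard xml.etree.ElementTree module, which has no sum() or count() functions of its own, so the arithmetic is done in Python. Using Decimal for the prices is a choice made for this sketch (XPath 1.0 itself works with floating-point numbers).

```python
from decimal import Decimal
import xml.etree.ElementTree as ET

BOOKS_XML = """<books>
<book type="hardback">
<title>Atlas Shrugged</title>
<price>39.95</price>
</book>
<book type="paperback">
<title>A Burnt-Out Case</title>
<price>13.00</price>
</book>
</books>"""

books = ET.fromstring(BOOKS_XML)

# Analogous to sum(//books/book/price): add up the text of each <price>.
total_cost = sum(Decimal(price.text) for price in books.findall("./book/price"))

# Analogous to count(//books/book): the number of selected <book> nodes.
book_count = len(books.findall("./book"))

print(f"Total cost of books = {total_cost}")
print(f"Total number of books = {book_count}")
```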

The functionality provided by the number functions allows calculations to be accomplished without diving into a programming language like Java or C#. In addition, the XPath text functions allow XML data to be formatted or manipulated as necessary.

Working with text

The XPath string functions are extensive; a partial list is provided in Table B.

Table: Table B, XPath string functions
Name	Description
concat()	Concatenates two or more string values
contains()	Returns a Boolean value indicating whether the first value contains the second (true), or not (false)
normalize-space()	Leading and trailing whitespace is removed from the value, and internal runs of whitespace are collapsed to a single space
starts-with()	Returns a Boolean value indicating whether the first value begins with the second value passed in
string()	Converts a value to a string
string-length()	Returns the length of a string value
substring()	Returns a portion of a string. The first parameter is the string, the second is the starting position of the substring (counting from 1), and the optional third parameter is the length of the substring returned.

The contains() function may be used to determine if an element matches the criteria or not. This example utilizes the contains() function to determine if a certain book has been found:

<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:template match="/">
<html>
<head><title>Listing B</title></head>
<body>
<h1>XPath functions - Listing B</h1>
<p>Selected book:
<xsl:variable name="title" select="//books/book/title"/>
<xsl:if test="contains($title,'Shrug')">
<xsl:value-of select="$title"/>
</xsl:if>
</p>
</body></html>
</xsl:template>
</xsl:stylesheet>

The result of the previous example is displayed in Figure B.

Figure B
Using XPath string functions

The example combines the power of the xsl:if element and the contains() function to locate information. The remaining string functions may be used in a similar fashion. XPath contains Boolean and node-set functions as well, but they are beyond the scope of this article. Look for more about these functions in upcoming articles.
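
To get a feel for what several of the string functions compute, it can help to compare them with everyday string operations in a general-purpose language. The following is a loose analogy in Python, not an XPath implementation; the sample value and variable names are made up for illustration.

```python
raw_title = "  Atlas   Shrugged "

# normalize-space(): trim the ends and collapse runs of whitespace.
normalized = " ".join(raw_title.split())

# contains(): does the first string contain the second?
has_shrug = "Shrug" in normalized

# starts-with(): does the first string begin with the second?
starts_with_atlas = normalized.startswith("Atlas")

# string-length(): the number of characters in the string.
length = len(normalized)

# concat(): join string values together.
label = normalized + " (hardback)"

print(normalized, has_shrug, starts_with_atlas, length, label)
```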

XPath functions offer better translations and less programming

http://builder.com.com/5100-6387-1054415.html

October 9, 2001 By Brian Schaffner

Using XML's Path Language (XPath) functions in your XSLT templates can help reduce the amount of programming that you need to do when it comes time to translate your data. Let's run through some of the functions you can use to gather and manipulate quantities and improve your XSL translations.

Count(node-set)

You can use the count() function to determine the number of nodes in a node-set, which can be useful for processing XML that contains a variable number of items. It's also useful for creating summary data from XML data. Suppose you want to perform a translation that shows the number of items on a customer's order. Here's an example of how you might use the count() function to achieve this:

<xsl:template match="OrderRecord">
  <ItemCount><xsl:value-of select='count(/OrderRecord/Items/LineItem)'/></ItemCount>
</xsl:template>

Not(boolean)

The not() function returns true or false depending on the value of the Boolean expression. Not() will return the inverse value of the Boolean value. For example, not(true()) returns false and not(false()) returns true. This function is useful for performing logic when the condition being tested is a false condition rather than a true condition. We can use this function to change the behavior of our count() example above. In this case, when there are no LineItems, we won't put the ItemCount element in the resulting XML:

<xsl:template match="OrderRecord">
  <xsl:if test="not(count(/OrderRecord/Items/LineItem) = 0)">
    <ItemCount><xsl:value-of select='count(/OrderRecord/Items/LineItem)'/></ItemCount>
  </xsl:if>
</xsl:template>
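
The same count-then-test logic can be sketched outside XSLT. The following assumes Python's standard xml.etree.ElementTree module and an OrderRecord/Items/LineItem layout like the one in the count() example; the function name item_count_element and the two sample orders are hypothetical.

```python
import xml.etree.ElementTree as ET


def item_count_element(order):
    """Return an <ItemCount> element, or None when the order has no line items."""
    count = len(order.findall("./Items/LineItem"))  # count(/OrderRecord/Items/LineItem)
    if not (count == 0):                            # not(count(...) = 0)
        element = ET.Element("ItemCount")
        element.text = str(count)
        return element
    return None


filled_order = ET.fromstring(
    "<OrderRecord><Items><LineItem/><LineItem/></Items></OrderRecord>")
empty_order = ET.fromstring("<OrderRecord><Items/></OrderRecord>")

filled_result = item_count_element(filled_order)
empty_result = item_count_element(empty_order)

print(ET.tostring(filled_result, encoding="unicode"))
print(empty_result)
```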

Sum(node-set)

The sum() function creates a summary value based on the node-set provided. This function can be used to sum the values of nodes such as prices, quantities, and weights. Suppose your translation needs to determine the total amount for an order based on the line-item subtotals. The following example illustrates how to do this:

<xsl:template match="OrderRecord">
  <TotalPrice><xsl:value-of select='sum(/OrderRecord/LineItems/Item/Subtotal)'/></TotalPrice>
</xsl:template>

Round(number)

The round() function is used to round off floating-point numbers to integer values. This function is useful when the floating-point value or values need to be converted to integers because of the target system they are going into or because you need to round off to a different precision (such as converting three-decimal currency to two-decimal currency). Suppose you sell transactions for fractions of a penny, but your invoicing system must bill customers using dollars and whole cents. In this case, you will need to round off the values in order to put them into the invoicing system. To keep from losing any pennies, we'll sum the values first and then round off as a final step. The following example illustrates this:

<xsl:template match="OrderRecord">
  <TotalPrice><xsl:value-of select='round(sum(/OrderRecord/LineItems/Item/Subtotal) * 100) div 100'/></TotalPrice>
</xsl:template>
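
Why sum first and round last? A short experiment makes the lost pennies visible. This is a sketch under stated assumptions: it uses Python's standard xml.etree.ElementTree module, a made-up order with three sub-cent subtotals, and a helper xpath_round that approximates XPath 1.0 round() (nearest integer, with ties going toward positive infinity) as floor(x + 0.5).

```python
import math
import xml.etree.ElementTree as ET

ORDER_XML = """<OrderRecord><LineItems>
<Item><Subtotal>0.004</Subtotal></Item>
<Item><Subtotal>0.004</Subtotal></Item>
<Item><Subtotal>0.004</Subtotal></Item>
</LineItems></OrderRecord>"""


def xpath_round(value):
    """Approximate XPath 1.0 round(): nearest integer, ties toward +infinity."""
    return math.floor(value + 0.5)


order = ET.fromstring(ORDER_XML)
subtotals = [float(s.text) for s in order.findall("./LineItems/Item/Subtotal")]

# round(sum(...) * 100) div 100 : sum first, round once at the end.
sum_then_round = xpath_round(sum(subtotals) * 100) / 100

# Rounding every subtotal to whole cents before adding loses the fractions.
round_then_sum = sum(xpath_round(s * 100) / 100 for s in subtotals)

print(sum_then_round, round_then_sum)
```

With these sample values the first approach bills one cent, while rounding each item first bills nothing.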

Summary

In this article, we looked at four XPath functions that you can use in your XSLT templates. XPath functions provide a nice set of functionality for creating complex transformations. Using these functions can reduce the amount of programming required when a system receives the XML data.

[xsl] XSLT/XPath function references

Date: Tue, 10 Jul 2001 08:14:04 +0200

> I know of the following:
> 1) XSLT functions (and my reference for that is
> http://zvon.org/xxl/XSLTreference/Output/index.html)
>
> 2) XPath
>    2.1) http://www.caucho.com/products/resin/ref/xpath.xtp and
>    2.2) http://www.caucho.com/products/resin/ref/xpath-fun.xtp
>
> 3) And the most commonly used quick reference from MulberryTech (I have the
> pdf file locally stored on my PC)
>
> Questions:
> 1) When and for what are 1 and 2 usually used for? Moreover, what is the
> difference between 2.1 and 2.2
> 2) Are there constraints as to where each of these can and cannot be used?
> Any tutorial on this would suffice. I will dig in into the details.

In our reference (your Ref. 1), the XPath functions (defined in the XPath recommendation) and the XSLT functions (defined in the XSLT recommendation) are shown together (to enable easier navigation). Whether a function belongs to XPath or to XSLT is shown at the top of each page in square brackets.

Ref. no. 2.1 shows XPath patterns, not functions. XPath functions are listed in Ref. 2.2 and XSLT functions are not listed.

To your second question: You can use any of these functions in predicates

<xsl:apply-templates select="//*[any_function_here() ... ]"/>

or tests.

Using [] (was: bug of xpath from XML::XPath)

To: Matt Sergeant, matt@sergeant.org

Hi, Matt:

I noticed a bug in xpath from the XML::XPath module:

$ xpath XmlXPath.test.xml "//a[contains(@onmouseover,'topnext')]"
Found 2 nodes:

Now, when I try to get the first node, it should return one node. But:

$ xpath XmlXPath.test.xml "//a[contains(@onmouseover,'topnext')][1]"
Found 2 nodes:
-- NODE --
<a border="0" onmouseover="iOver('topnext'); iOver('bottomnext'); self.status=nextblurb; return true;" onmouseout="iOut('topnext'); iOut('bottomnext'); self.status=''; return true;" href="l-grub-1-2.html"><img alt="Next" border="0" src="../i/next.gif" name="topnext" /></a>-- NODE --
<a border="0" onmouseover="iOver('topnext'); iOver('bottomnext'); self.status=nextblurb; return true;" onmouseout="iOut('topnext'); iOut('bottomnext'); self.status=''; return true;" href="l-grub-1-2.html"><img alt="Next" border="0" src="../i/next.gif" name="bottomnext" /></a>

However, when I tried to duplicate the error in the 05attrib.t, I failed. Due to my limited knowledge on Xml, and XML::XPath, I don't know where the problem is now. All I can guarantee is that the xml file is well formed. I'm sending the test file and my effort trying to duplicate the error (in 05attrib.t) to you, for you to have a look at your leisure time. There must be somewhere wrong.

Thanks

PS. attachment follows in a separated email. PPS. My XML::XPath is the latest, version 1.13.

Using [] (was: bug of xpath from XML::XPath)

 >   $ xpath XmlXPath.test.xml "//a[contains(@onmouseover,'topnext')]"
 >   Found 2 nodes:
 >
 > Now, when I try to get the first node, it should return one node. But:
 >
 >   $ xpath XmlXPath.test.xml "//a[contains(@onmouseover,'topnext')][1]"
 >   Found 2 nodes:

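You're misunderstanding what that [1] does. It applies to the local context of the <a> tag, that is, to each <a> element's position among the matching <a> children of its own parent, not to the overall result set. You need parentheses around the path to set that context.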

Using [] (was: bug of xpath from XML::XPath)

$ xpath XmlXPath.test.xml "(//a[contains(@onmouseover,'topnext')])[1]"
Found 1 nodes:
-- NODE --
<a border="0" onmouseover="iOver('topnext'); iOver('bottomnext'); self.status=nextblurb; return true;" onmouseout="iOut('topnext'); iOut('bottomnext'); self.status=''; return true;" href="l-grub-1-2.html"><img alt="Next" border="0" src="../i/next.gif" name="topnext" /></a>
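
The difference between a per-parent [1] and the first node of the whole result can be reproduced on a tiny document. This is a sketch assuming Python's standard xml.etree.ElementTree module rather than the xpath command-line tool from the thread. ElementTree has no parenthesized expressions, so the overall first node is taken by indexing the result list; the page document and its name attributes are invented for illustration.

```python
import xml.etree.ElementTree as ET

PAGE_XML = """<page>
<div><a name="topnext"/></div>
<div><a name="bottomnext"/></div>
</page>"""

page = ET.fromstring(PAGE_XML)

# .//a[1] : every <a> that is the first <a> child of its own parent.
# Both links qualify, because each sits in a different <div>.
per_parent_first = [a.get("name") for a in page.findall(".//a[1]")]

# Counterpart of (//a)[1] : the first node of the whole result set.
overall_first = page.findall(".//a")[0].get("name")

print(per_parent_first)
print(overall_first)
```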

A new approach to locating XML data with XQuery

http://builder.com.com/5100-6389-5093696.html

Baseline Inc. | November 3, 2003

By Tony Patton

The rapid adoption of XML throughout the industry has led to an abundance of XML-formatted data. XSLT is the popular method for transforming XML to a required format, but locating data within an XML document is a different story. XPath was developed to easily retrieve items from an XML document, but it requires knowledge of the XML document structure. This fundamental need to locate XML-based information resulted in the development of XQuery (XML Query).

Basically, XQuery is a standard query language developed by the World Wide Web Consortium (W3C). It is analogous to SQL (http://builder.com.com/5100-6388-1051673.html) and its relationship with the backend database, but XQuery is not restricted to XML-based data. XQuery is flexible enough to query a broad spectrum of data sources, including relational databases, XML documents, Web services, packaged applications, and legacy systems. This article provides a quick introduction to the essential aspects of XQuery.

Express yourself

The main aspect of XQuery is that everything is an expression. XQuery is not a programming language, so XQuery scripts (or programs) are expressions. This is where the SQL analogy is appropriate, because SQL statements are basically expressions that interact with backend data, although the expressions may become very complex. Here is a simple example of an XQuery expression:

let $value1 := 0
let $value2 := 1
let $rValue := if ($value1 > $value2) then "true" else "not true"
return $rValue

The preceding few lines represent a simple XQuery expression. It creates and assigns values to variables, utilizes flow control via the if statement, and outputs a value via the return keyword. In this example, let is used to assign values and the dollar sign is prepended to variable names. In addition, the assignment operator is a colon plus the equal sign. The if structure follows the basic syntax of most languages. The return statement marks the point in the expression where a value is returned. The return value may be a simple variable (like the previous example), static text, or a mixture of extracted values and text.

Basic elements

While the central ingredient of XQuery is the expression, these expressions utilize the following common reserved keywords:

for—Process (loop) items within an XML document
let—Create and assign variable values
where—Conditional statement used in conjunction with the for keyword
return—The values returned to the expression originator

A common acronym for these keywords is FLWR, and such an expression is often called a FLWR expression (the final XQuery 1.0 specification adds order by and spells it FLWOR). Here is a basic XML document, which contains a sampling of books:

<?xml version="1.0" encoding="ISO-8859-1"?>
<books>
<book type="paperback">
<title>American Psycho</title>
<author>Bret Easton Ellis</author>
<isbn>0679735771</isbn>
<price>14.00</price></book>
<book type="hardback">
<title>A Burnt-Out Case</title>
<author>Graham Greene</author>
<isbn>0370014995</isbn>
<price>13.00</price></book>
<book type="paperback">
<title>The Information</title>
<author>Martin Amis</author>
<isbn>0679735739</isbn>
<price>14.00</price>
</book></books>

The preceding XML is used in the following XQuery example:

let $doc := document("books.xml")
for $d in $doc/books/book
return
<tr>
<td>{$d/title/text()}</td>
<td>by</td>
<td>{$d/author/text()}</td>
</tr>

In this simple example, every book is returned in the format of its title followed by the text "by" and the author's name. Notice that XPath (http://builder.com.com/5100-6387-1057652.html) notation is used to specify individual nodes in the for statement and in portions of the return statement. Two other noteworthy aspects of this example:

The document function is a standard XQuery function in the draft used here (the final XQuery 1.0 specification names it doc). It is used to access an XML document or node as an element within the expression. In the previous example, its result was assigned to a variable and later processed with XPath expressions.
The values are utilized in the return portion of the expression by enclosing them within curly braces with the appropriate XPath syntax. The current element is accessed using the variable name declared in the for statement.

There are three books in the sample XML, but only two are paperbacks (as recorded in the type attribute of the book element). This attribute may be utilized in a where statement that extends the previous example to output only paperback books:

let $doc := document("books.xml")
for $d in $doc/books/book
where ($d/@type = "paperback")
return
<tr>
<td>{$d/title/text()}</td>
<td>by</td>
<td>{$d/author/text()}</td>
</tr>

The where clause guarantees only paperback items are returned. Again, this is similar to the structure of a SQL statement where a condition is used. These examples provide a peek at the XQuery approach and syntax. It is a powerful language for retrieving necessary data, and industry support is swelling.
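
For readers more at home in a general-purpose language, the for, where, and return clauses map quite naturally onto a filtered loop. The following is a rough Python analogue, not XQuery: it assumes the standard xml.etree.ElementTree module, inlines the sample document instead of reading books.xml, and returns plain strings rather than constructed elements.

```python
import xml.etree.ElementTree as ET

BOOKS_XML = """<books>
<book type="paperback">
<title>American Psycho</title>
<author>Bret Easton Ellis</author>
</book>
<book type="hardback">
<title>A Burnt-Out Case</title>
<author>Graham Greene</author>
</book>
<book type="paperback">
<title>The Information</title>
<author>Martin Amis</author>
</book>
</books>"""

books = ET.fromstring(BOOKS_XML)  # stands in for the document bound to $doc

paperbacks = [
    f"{book.findtext('title')} by {book.findtext('author')}"  # return
    for book in books.findall("./book")                       # for $d in $doc/books/book
    if book.get("type") == "paperback"                        # where $d/@type = "paperback"
]

for line in paperbacks:
    print(line)
```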

Where can I get it?

As with most developing technologies, you must search carefully to locate products that support it. Thankfully, XQuery is seen as an industry standard, so the rush to support it has been quick. You can find it in popular tools such as XML Spy and Corel XMetal. In addition, Microsoft has been quick to provide XQuery support for .NET, and the Java community has been on the bandwagon for some time. The support is overwhelming, so the prospect of XQuery becoming deprecated is remote (I never say never).

Only the beginning

The comparison between XQuery and SQL is good, but notable differences are apparent. A major variation is XQuery's lack of support for updating the data source (this is a basic aspect of SQL), nor can data sources be created on the fly. With this limitation in mind, some vendors have chosen to develop a proprietary approach to updating. This is just one item, but it does show the difference and highlights the fact that this is a relatively new technology (version 1.0) that will continue to evolve.