XML:DB native XML database API and its implementation in Apache Xindice


http://builder.com.com/5100-6387-5098255.html

Peter V. Mikhalenko | November 14, 2003

The XML:DB API is designed to provide a common access mechanism for native XML databases. It enables the construction of applications that store, retrieve, modify and query data held in an XML database. The API is specified in IDL, which leaves implementers free to bind it to any object-oriented language, such as Java or C++.

It is designed to be vendor-neutral so that it can be used with as wide a range of databases as possible. With a native XML solution there is no need to convert XML data to some other data structure; you store and retrieve your data ready to use in any XML processing workflow.

Relational databases, on the other hand, offer fast data retrieval and rest on relational theory, which has a strong mathematical foundation; they are a time-proven technology. However, much of that performance advantage can be lost when relational structures have to be mapped to XML.

In this sense, the XML:DB API plays roughly the same role for XML databases that ODBC, JDBC or Perl DBI play for relational databases.
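To make the JDBC comparison concrete: in the Java binding, a client registers a driver with org.xmldb.api.DatabaseManager (registerDatabase) and then obtains a Collection by passing an xmldb: URI to DatabaseManager.getCollection, much as JDBC code asks DriverManager for a Connection. The Python sketch below imitates only that lookup pattern. ToyDatabase, the "toy" URI scheme and the snake_case method names are hypothetical stand-ins, and a collection is modelled as a plain dict.

```python
# Hypothetical sketch of the JDBC-style lookup pattern; not a real XML:DB binding.


class ToyDatabase:
    """A stand-in driver that serves collections for one URI scheme."""

    def __init__(self, name):
        self.prefix = "xmldb:" + name + "://"
        self.collections = {}  # collection path -> {resource id: content}

    def accepts(self, uri):
        return uri.startswith(self.prefix)

    def get_collection(self, uri):
        # Drop "xmldb:<name>://host:port" and keep the collection path.
        rest = uri[len(self.prefix):]
        path = "/" + rest.split("/", 1)[1] if "/" in rest else "/"
        return self.collections.setdefault(path, {})


class DatabaseManager:
    """A stand-in registry playing the role of JDBC's DriverManager."""

    _drivers = []

    @classmethod
    def register_database(cls, db):
        cls._drivers.append(db)

    @classmethod
    def get_collection(cls, uri):
        # Ask each registered driver in turn whether it handles this URI.
        for db in cls._drivers:
            if db.accepts(uri):
                return db.get_collection(uri)
        raise LookupError("no registered database for " + uri)


DatabaseManager.register_database(ToyDatabase("toy"))
addressbook = DatabaseManager.get_collection(
    "xmldb:toy://localhost:8080/db/addressbook")
addressbook["doc1"] = "<person><name>Ann</name></person>"
```

The point of the indirection is the same as in JDBC: application code names a URI, and whichever registered driver recognises it does the work.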

XML:DB use cases

The API allows you to:

Retrieve a document from the database using a known ID, if you want to work with the result as a DOM Document object.
Retrieve a document from the database using a known ID, if you want to work with the result as text XML.
Retrieve a document from the database using a known ID, if you want to use a SAX content handler to handle the document.
Retrieve a binary BLOB from the database, identified by a known ID. The database needs to determine that the data is binary and return the proper resource type.
Retrieve a document from the database using a known ID, if you want to work with the result as a DOM Node object.
Store a new DOM document in the database using a known ID.
Use a SAX ContentHandler to store a new document in the database using a known ID.
Store a new text XML document in the database using a known ID.
Remove an existing resource from the database using a known ID.
Update an existing DOM document stored in the database.
Update an existing text XML document stored in the database.
Search a collection of documents with an XPath expression, working with the results as DOM Nodes.
Insert multiple DOM documents under the control of a transaction.
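Several of these use cases differ only in the form in which the same stored document is handed back: as text, as a DOM tree, or as SAX events. The following Python sketch models that with an in-memory collection. ToyCollection, its methods and ElementCounter are hypothetical names chosen only to echo the Java API, where an XMLResource offers getContent, getContentAsDOM and getContentAsSAX, and binary data comes back as a BinaryResource.

```python
# Hypothetical in-memory collection illustrating the use cases above.
import xml.dom.minidom
import xml.sax


class ToyCollection:
    def __init__(self):
        self._resources = {}  # id -> str (XML text) or bytes (binary BLOB)

    # Storing and removing by known ID
    def store_text(self, res_id, text):
        xml.dom.minidom.parseString(text)  # reject text that is not well-formed
        self._resources[res_id] = text

    def store_dom(self, res_id, document):
        self._resources[res_id] = document.documentElement.toxml()

    def store_binary(self, res_id, data):
        self._resources[res_id] = bytes(data)

    def remove(self, res_id):
        del self._resources[res_id]

    # Retrieving by known ID
    def resource_type(self, res_id):
        # The database, not the caller, decides whether the data is binary.
        if isinstance(self._resources[res_id], bytes):
            return "BinaryResource"
        return "XMLResource"

    def get_text(self, res_id):
        return self._resources[res_id]

    def get_dom(self, res_id):
        return xml.dom.minidom.parseString(self._resources[res_id])

    def get_sax(self, res_id, handler):
        xml.sax.parseString(self._resources[res_id].encode("utf-8"), handler)


class ElementCounter(xml.sax.ContentHandler):
    """A SAX handler that simply counts start-element events."""

    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1


col = ToyCollection()
col.store_text("wine1", "<wine><province>Beaujolais</province></wine>")
print(col.get_dom("wine1").getElementsByTagName("province")[0].firstChild.data)  # Beaujolais
counter = ElementCounter()
col.get_sax("wine1", counter)
print(counter.count)  # 2
```

Storing a modified DOM under an existing ID overwrites the old content, which is how this sketch covers the update use cases.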
XUpdate update language

Besides the IDL API and its Java interfaces, the XML:DB initiative also defines XUpdate, an update language that is itself written in XML. XUpdate makes extensive use of the XPath expression language, both to select the nodes to update and for conditional processing. It is a purely declarative language, and its design draws on the XSL Transformations (XSLT) specification.

An update is represented by an <xupdate:modifications> element in an XML document. The <xupdate:modifications> element must have a version attribute indicating the version of XUpdate that the update requires; at present, 1.0 is the only version allowed.

XUpdate defines the following elements for use inside it:

xupdate:insert-before
xupdate:insert-after
xupdate:append
xupdate:update
xupdate:remove
xupdate:rename
xupdate:variable
xupdate:value-of
xupdate:if
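The wrapper rules can be checked mechanically. The sketch below, in Python with only the standard library, parses an XUpdate document, verifies that the root is modifications in the XUpdate namespace (http://www.xmldb.org/xupdate) with version="1.0", and lists the instructions it contains. The function name list_instructions is hypothetical, and the check is deliberately shallow: it does not validate the content of each instruction.

```python
import xml.etree.ElementTree as ET

XUPDATE_NS = "http://www.xmldb.org/xupdate"
# Element names taken from the list above.
KNOWN = {"insert-before", "insert-after", "append", "update", "remove",
         "rename", "variable", "value-of", "if"}


def list_instructions(xupdate_text):
    """Check the <xupdate:modifications> wrapper and list its instructions."""
    root = ET.fromstring(xupdate_text)
    if root.tag != "{%s}modifications" % XUPDATE_NS:
        raise ValueError("root element must be xupdate:modifications")
    if root.get("version") != "1.0":
        raise ValueError("version must be 1.0")
    prefix = "{%s}" % XUPDATE_NS
    found = []
    for child in root:  # the default parser keeps only element children here
        if not child.tag.startswith(prefix):
            raise ValueError("unexpected element: " + child.tag)
        name = child.tag[len(prefix):]
        if name not in KNOWN:
            raise ValueError("unknown instruction: " + name)
        found.append((name, child.get("select")))
    return found


doc = """<xupdate:modifications version="1.0"
    xmlns:xupdate="http://www.xmldb.org/xupdate">
  <xupdate:update select="/bottles/wine[2]/province">Champagne</xupdate:update>
  <xupdate:remove select="/bottles/wine[1]"/>
</xupdate:modifications>"""

print(list_instructions(doc))
# [('update', '/bottles/wine[2]/province'), ('remove', '/bottles/wine[1]')]
```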

Content for inserts and appends is built in much the same way as in an XSLT stylesheet. For example, to create an XML comment you would write:

<xupdate:comment>This is the comment</xupdate:comment>

This produces the comment:

<!--This is the comment-->
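The node that xupdate:comment creates is an ordinary XML comment. Assuming the comment instruction is wrapped in an xupdate:append with select="/bottles", its effect can be emulated at the DOM level with Python's xml.dom.minidom. This illustrates the result only, not how an XUpdate processor is implemented.

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString("<bottles><wine/></bottles>")

# Equivalent of <xupdate:comment> inside <xupdate:append select="/bottles">:
# create a comment node and add it as the last child of <bottles>.
comment = doc.createComment("This is the comment")
doc.documentElement.appendChild(comment)

result = doc.documentElement.toxml()
print(result)  # <bottles><wine/><!--This is the comment--></bottles>
```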

Similarly, the update

<xupdate:update select="/bottles/wine[2]/province">Champagne</xupdate:update>

sets the content of the province element in the second wine record to Champagne, so that the document becomes:

<bottles>
  <wine>
    <province>Beaujolais</province>
  </wine>
  <wine>
    <province>Champagne</province>
  </wine>
</bottles>
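The effect of this update can be reproduced with Python's xml.etree.ElementTree, whose path syntax supports a small subset of XPath, including 1-based positional predicates. The starting value Bordeaux for the second province is an assumption, since the document before the update is not shown.

```python
import xml.etree.ElementTree as ET

# Assumed starting document; only the second province will change.
bottles = ET.fromstring(
    "<bottles>"
    "<wine><province>Beaujolais</province></wine>"
    "<wine><province>Bordeaux</province></wine>"
    "</bottles>")

# ElementTree paths are evaluated relative to the root element (<bottles>),
# and positions are 1-based, so this matches /bottles/wine[2]/province.
target = bottles.find("wine[2]/province")

# xupdate:update replaces the element's whole content with the new value.
for child in list(target):
    target.remove(child)
target.text = "Champagne"

result = ET.tostring(bottles, encoding="unicode")
print(result)
```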

The following instructions will look familiar to anyone who knows XSLT:

<xupdate:variable name="province" select="/bottles/wine[1]/province"/>

<xupdate:append select="/bottles">
  <xupdate:element name="wine">
    <xupdate:value-of select="$province"/>
  </xupdate:element>
</xupdate:append>

They bind the province element of the first wine record to a variable named province, then append to /bottles a new wine element whose content is the value of that variable.
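An ElementTree emulation of these two instructions follows. XPath positions start at 1, so the first wine record is wine[1]. The variables dictionary is a hypothetical stand-in for XUpdate's variable bindings, and the sketch assumes that value-of inserts a copy of the selected province node, leaving the original in place.

```python
import copy
import xml.etree.ElementTree as ET

bottles = ET.fromstring(
    "<bottles>"
    "<wine><province>Beaujolais</province></wine>"
    "<wine><province>Champagne</province></wine>"
    "</bottles>")

# <xupdate:variable name="province" select="/bottles/wine[1]/province"/>
variables = {"province": bottles.find("wine[1]/province")}

# <xupdate:append select="/bottles"> with <xupdate:element name="wine">:
# SubElement adds the new element as the last child of <bottles>.
new_wine = ET.SubElement(bottles, "wine")

# <xupdate:value-of select="$province"/>: insert a copy of the bound node.
new_wine.append(copy.deepcopy(variables["province"]))

result = ET.tostring(bottles, encoding="unicode")
print(result)
```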

Reasons to store data in a native XML database

One reason to store data in a native XML database is to avoid the inefficiency and wasted space that result when your data is semi-structured. That is, it has a regular structure, but that structure varies enough that mapping it to a relational database results in either a large number of columns with null values (wasted space) or a large number of tables (inefficient). Although semi-structured data can be stored in object-oriented and hierarchical databases, choosing to store it in a native XML database in the form of an XML document may be a better option.
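A quick count shows how the wasted space arises. The address-book records below are made-up sample data. Flattening them into one wide table needs a column for every field that appears in any record, and every missing field becomes a NULL cell; the alternative is a separate table for each optional field.

```python
# Made-up semi-structured records: same general shape, varying fields.
records = [
    {"name": "Ann", "email": "ann@example.com"},
    {"name": "Bob", "phone": "555-0100", "fax": "555-0101"},
    {"name": "Eve", "email": "eve@example.com", "mobile": "555-0102"},
]

# Option 1: one wide table with a column for every field seen anywhere.
columns = sorted({field for r in records for field in r})
cells = len(records) * len(columns)
nulls = sum(1 for r in records for c in columns if c not in r)

# Option 2: a side table for every field that some record lacks.
optional = [c for c in columns if not all(c in r for r in records)]

print(columns)  # ['email', 'fax', 'mobile', 'name', 'phone']
print(f"{nulls} of {cells} cells would be NULL")  # 7 of 15 cells would be NULL
print(f"or {len(optional)} extra tables, one per optional field")  # or 4 extra tables, ...
```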

A second reason to store data in a native XML database is retrieval speed. Depending on how the native XML database physically stores data, it might be able to retrieve data much faster than a relational database. This is because some storage strategies used by native XML databases store entire documents together physically or use physical (rather than logical) pointers between the parts of the document. This allows the documents to be retrieved either without joins or with physical joins, both of which are faster than the logical joins used by relational databases.

A third reason to store data in a native XML database is that it allows you to exploit XML-specific capabilities, such as executing XML queries. Given that few data-centric applications need this today and that relational databases are implementing XML query languages, this reason is less compelling.

Apache Xindice

Apache Xindice is a native XML database, designed from the ground up to store XML. It is especially valuable when you have very complex XML structures that would be difficult or impossible to map to a more structured database.

Currently, Xindice uses XPath as its query language and XML:DB XUpdate as its update language. It provides an implementation of the XML:DB API in Java, and it can also be accessed from other languages using XML-RPC.
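In the Java API such queries go through an XPathQueryService obtained from a Collection, which runs one XPath expression against the documents in that collection and returns a ResourceSet. The Python sketch below imitates that behaviour over a dict of documents. The query function and sample data are hypothetical, and ElementTree supports only a subset of XPath, so the absolute path /bottles/wine/province is written relative to each root element as ./wine/province.

```python
import xml.etree.ElementTree as ET

# Hypothetical collection: resource ID -> stored document.
collection = {
    "cellar1": "<bottles><wine><province>Beaujolais</province></wine></bottles>",
    "cellar2": "<bottles><wine><province>Champagne</province></wine>"
               "<wine><province>Alsace</province></wine></bottles>",
}


def query(collection, path):
    """Run one path expression against every document in the collection
    and return (resource ID, matching element) pairs."""
    results = []
    for res_id, text in sorted(collection.items()):
        root = ET.fromstring(text)
        for node in root.iterfind(path):
            results.append((res_id, node))
    return results


for res_id, node in query(collection, "./wine/province"):
    print(res_id, node.text)
# cellar1 Beaujolais
# cellar2 Champagne
# cellar2 Alsace
```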

Native XML database technology is a very new area, and Xindice is very much a project in development. The server currently requires only that stored documents be well-formed XML; it does not enforce any schema constraining what can be placed into a document collection. This makes Xindice a semi-structured database and gives you tremendous flexibility in how you store your data, but it also means giving up some common database functionality, such as data types.
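The practical consequence of schema-less storage is that well-formedness is the only admission test. The hypothetical accept function below models that rule in Python (it is not Xindice's code): documents of different shapes are all accepted, a malformed one is rejected, and attribute values come back as plain strings because no data types are attached.

```python
import xml.etree.ElementTree as ET


def accept(text):
    """Schema-less admission rule: any well-formed document is accepted."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


docs = [
    "<wine><province>Alsace</province></wine>",             # one shape
    "<wine vintage='1998'><grape>Riesling</grape></wine>",   # a different shape
    "<wine><province>Alsace</wine>",                         # mismatched tags
]
print([accept(d) for d in docs])  # [True, True, False]

# No data types: the vintage comes back as text, not as a number.
print(repr(ET.fromstring(docs[1]).get("vintage")))  # '1998'
```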

Xindice currently offers three layers of APIs that can be used to develop applications:

The XML:DB XML Database API is used to develop Xindice applications in Java.
The CORBA API is used when accessing Xindice from a language other than Java.
The Core Server API is the internal Java API of the core database engine. This is the lowest level API and is only available to software running in the same Java VM as the database engine itself.