Table of Contents
http://builder.com.com/5100-31-5075453.html
September 29, 2003
By Brian Schaffner
A common problem with XML documents is how to persist them. Storing them in a relational database is often the most logical choice because relational databases are so prevalent.
It's not a simple matter of inserting the XML document into the database; there may be additional considerations. Let's look at some techniques you can use to store XML documents in relational databases.
The simplest and easiest technique is to create a table within the database that has a single large text field where you can store the XML data. Depending on the specific database and the specific XML documents, this field might be a binary large object (BLOB). Some databases require you to store large amounts of data as a BLOB rather than text.
The advantages of this technique are that it's extremely simple to dump the data into the table and equally simple to extract it back out. There are no keys to manage for this table.
Some major drawbacks are that you probably won't be able to do any useful text searching, and you may have difficulties locating a specific document since there's nothing to identify a unique document within the table.
The next most complex solution is to use a keyed table. This is very similar to the document table approach, but this time your table has two fields: a unique key and the XML document. With this technique, you retain much of the simplicity of storing and retrieving whole XML documents. You also introduce a small amount of complexity with managing the unique keys.
A common approach to creating unique keys is to use an MD5 checksum on the XML document. Keep in mind that this approach is insufficient if you are going to have duplicate XML documents in your table. In that case, you may add additional key fields that can be used to uniquely identify the document.
Like the document table, the keyed table is easy to implement. The additional overhead of using the keys is not significant and it solves the problem of finding specific documents within the table. However, like the document table, you will still not be able to perform any useful text searches.
This technique is more complex, but it also gives you more flexibility. With finite discrete tables, you create a set of tables that will store a finite set of discrete XML information. What does that mean? Well, here's an example.
Imagine you have an order document. At the root of the document is the Order element, which contains CustomerInfo, ItemInfo, and ShippingInfo elements. Within the database, you create an OrderDoc table that has an ID field, a CustomerInfoId field, an ItemInfoId field, and a ShippingInfoId field. Then, you also create a CustomerInfo table, an ItemInfo table, and a ShippingInfo table.
These tables have their respective ID fields along with information about the customer, the items, and the shipping data related to the order. Within this table, there may be additional levels of reference. For example, the CustomerInfo table might contain an AddressInfo field, which references an entry from the AddressInfo table.
The advantage of this approach is that it allows you to more closely model the tables to the XML data. This allows you to perform more sophisticated queries against the data. It also makes the data more available, since you don't need an XML parser to read the information.
The downside is that this technique requires a lot more effort to develop and maintain. It means that every document has to be parsed out into discrete components and then stored in the database. If that process is not carefully managed, you could end up with some serious data integrity issues. It also means that when extracting an XML document from the database, you have to assemble the discrete components.