Search an XML File with SQL

12:12AM Mar 03, 2008 in category Coldfusion by Jesse Bethke

There exists many vendors for database solutions, each of them with their own unique medium.  Frequently we have to move data between them or to a desktop application like Excel.  Generally a flat file is done to do this.  The most common is the CSV (Comma seperated values) or tab delimited file.  However, ever increasingly more popular is the XML format.  XML allows for the opportunity to store three dimensional data structures in a flat file. It also provides for a more descriptive nature; for example each data node is at least somewhat self aware of the data it represents.

XML On Its Own

I was first pushed into a XML format because I needed to develop a small web application that needed to dynamically pull from data.  I did not have the resources to utilize a database solution.  I used Coldfusion to parse the XML documents and XPath to search them.

Unfortunately, the solution was nearsighted in that the data collection grew to expand over 10 XML files; each nearly a megabyte large.  I am of course talking of USD's South Dakota Chronology Database that I developed in 2005. Back then I developed a very impressive interpreter class that would convert my custom query language into XPath that would allow for complex case-insensitive searching of any XML document. I'd even started to develop some phonetic searching features.  That query interpreter is also utilized by Real SD, Richardson Collection, the old Research Gateway, and a handful of my own solutions.

Looking back, I wish I knew then what I know now.  XML is still a good solution for small web applications, but can be made even more useful when you can utilize some of Coldfusion's query object components.

XML as a Query Object

A query object is a coldfusion object that gets returned from a database query.  However, Coldfusion allows you to populate a query with your own data.  This makes it a uniform dataset for all sorts of data manipulation.

Sample XML Data

The upcoming examples will use this simple set of XML Data. You could of course read in a much more complex XML file and parse it with xmlParse().

<cfxml variable="inlineXML">
    <?xml version="1.0" encoding="ISO-8859-1" ?>
    <xmlItems>
        <item>
            <name>Bouncy Ball</name>
            <color>Red</color>
            <funLevel>5</funLevel>
        </item>
        <item>
            <name>Fast Bike</name>
            <color>Navy</color>
            <funLevel>6</funLevel>           
        </item>
        <item>
            <name>Crayons</name>
            <color>Multi</color>
            <funLevel>4</funLevel>           
        </item>
        <item>
            <name>Book</name>
            <color>Navy</color>
            <funLevel>4</funLevel>           
        </item>
    </xmlItems>
</cfxml>

Convert XML Data into a Query Object

This code is overly simplified for the small data set, but you get the idea. Essentially it gets a list of item nodes and loops through them. Each item's data value is inserted into the query object.

<cfscript>
	productQuery = QueryNew("name, color, funLevel");
	items = XMLSearch(inlineXML, "//item");
	QueryAddRow(productQuery, arrayLen(items));
	for(i = 1; i lte arrayLen(items); i = i+1) {
		QuerySetCell(productQuery, "name", items[i].name.xmlText, i);
		QuerySetCell(productQuery, "color", items[i].color.xmlText, i);
		QuerySetCell(productQuery, "funLevel", items[i].funLevel.xmlText, i);
	}
</cfscript>

What if you have relationships in your data? Well, in such case you need to intelligently split up your XML data and put one entity type in one query, and another entity in another query object. For instance, suppose you wanted your colors in a separate query. You'll see in a minute how you could query across multiple query objects just as if they were tables in a database.

Query of Queries

Coldfusion has a magnificent framework that allows you to query a query object and get back another query object using the SQL language. This brief example nearly explains itself.

<cfquery name="colorQuery" dbType="query">
    SELECT
    	color, 
    	COUNT(color) as colorCount
    FROM
    	productQuery
    GROUP BY
    	color
</cfquery>

This query gets a list of the colors and the number of products in each color. You could also have gotten products back, but this is more interesting.

Take notice though of the FROM statement in the query. It selects from productQuery. ProductQuery is the name we assigned our query object. This could also be extended to use JOIN statements to select from multiple query objects. So you could merge XML data with results from a database.

Using the Query

This makes searching the XML data much easier. But it also makes utilizing the data infinitely easier. You can see where I'm headed with this if you have ever tried to traverse an XML DOM. While Coldfusion's compliance with E4X standards makes it easier, it is still by no means trivial. Coldfusion query objects are however. For example, the following code will list all the colors and the number of products in each color.

<cfoutput query="colorQuery">
    #color# (#colorCount#)<br/>
</cfoutput>

How about a flashy pie chart showing the distribution per color:

<cfchart format="flash" show3d="yes">
    <cfchartseries 
    	type="pie"
        query="colorQuery"
        itemColumn="color"
        valueColumn="colorCount"  />
</cfchart>

Doing the same with straight up XML would boggle the mind.

Other Potential Uses

As you may have guessed, there is more potential here. You could also pull your CSV files in and then search them with SQL.

If you are an application developer, you have written a couple dozen document parsers. Each into a unique format. If you are using Coldfusion I'd recommend trying to stuff it into a query object. You'll appreciate it later.

In regards to Performance

I'm not terribly certain how Coldfusion indexes its query objects. It goes without saying that querying a well indexed database is faster than querying a query object. As far as XPath vs a query, I'm not 100% certain. With a small xml file it is negligible. The biggest hit would probably be in parsing into the query. I'd highly recommend that if it is big then store it in the application scope and only parse it on application start (as oppose to each request).

Comments:

ummm... this looks interesting to me haven't heard before about searching XML file with SQL.

Posted by Free SMS on May 12, 2008 at 05:31 AM CDT #

"XML file with SQL" Very interesting!

Posted by ral on June 03, 2008 at 01:30 AM CDT #

Post a Comment:
  • HTML Syntax: Allowed

Calendar

« September 2008
SunMonTueWedThuFriSat
 
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
    
       
Today
The views and opinions expressed in this page are strictly those of the page author. The contents of this page have not been reviewed or approved by The University of South Dakota.