VSM Cover Story

Exploit Yukon's XML Data Type

Delve into SQL Server 2005's new native XML data type. Learn to add XML columns to a table, populate and index the columns, and understand basic XQuery syntax.

Technology Toolbox: XML, SQL Server 2005 or SQL Server Express beta 2 (February CTP or later), Visual Studio 2005 (February CTP or later) or Visual Basic Express beta 2, XPath 2.0, XQuery 1.0

Combining hierarchical XML content and relational columns in database tables is akin to mixing oil and water—the two data structures are unnatural partners. The original approach was to store XML documents as character large objects (CLOBs), which requires adding columns containing extracted atomic values or full-text searches to locate specific document instances, nodes, or values. Shredding documents into a set of relational columns is another alternative, but involves complex hierarchical-to-relational mapping. All major players in the relational database management market now have strategies for implementing Part 14 of the ANSI SQL 2003 standard. Part 14, "XML-Related Specifications (SQL/XML)," defines a modified XML Infoset data type that has sets of operators, publishing functions, and rules for mapping SQL to XML components but doesn't support association with an XML schema. The next ANSI standard, commonly called ANSI SQL 200n, will be based on the XQuery 1.0/XPath 2.0 data model and will enable validating XML data type instances against optional schemas.

SQL Server 2005's new native XML data type goes beyond the SQL 2003 standard by enabling XQuery 1.0 and XPath 2.0 expressions to retrieve, restructure, and update multiple strongly typed XML document instances. The XML data type is a variation on the varbinary(max) data type that stores UTF-16-encoded XML documents or fragments as condensed binary large objects (BLOBs). The XML column's structure enables the SQL Server 2005 query optimizer to generate query plans that minimize execution time. In this article, I'll show you how to add XML columns to a table, populate the columns with document instances generated by an SQLXML FOR XML AUTO, TYPE query, and add XmlSchemaCollections. I'll also explain basic XQuery syntax and provide sample expressions to return selected document instances, nodes, or atomic values and update XML column content.

You'll need the February 2005 Community Technical Preview (CTP) or later of SQL Server 2005—or SQL Server Express (SQLX) and SQL Server Express Manager (XM)—with the Northwind sample database installed to execute some of this article's T-SQL batch statements and XQuery expressions. A compatible version of Visual Studio 2005 or Visual Basic Express is required to explore the code of the XQuery.sln Windows form demonstration project that I'll describe shortly.

Generic T-SQL batch statements for creating a table with an XML column or adding an XML column to an existing table are the same as those for any other relational data type:

CREATE TABLE Table(KeyColumn datatype 
PRIMARY KEY, xmlColumn xml [[NOT] 
	NULL][, ?]); 

ALTER TABLE Table ADD xmlColumn xml 
	[NULL]; 

Using the Northwind Orders table as an example, this code adds OrdersXML1 and OrdersXML2 columns:

ALTER TABLE Orders ADD OrderXML1 xml 
	NULL, 
OrderXML2 xml NULL;

Populate XML columns with INSERT or UPDATE statements that pass the well-formed XML content as a string or a T-SQL variable of the nvarchar(max) data type with this generic batch:

INSERT Table(xmlColumn) VALUES 
	(N'XMLContent') WHERE KeyColumn = 
	value

DECLARE @xmlVar nvarchar(max)
SET @xmlVar = N'XMLContent' 
UPDATE Table SET xmlColumn = @xmlVar
WHERE KeyColumn = value

The nvarchar(max) data type is one of three new SQL Server 2005 Large Value types that enable storing up to 2^31-1 data bytes. The nvarchar(max), varchar(max), and varbinary(max) data types replace the ntext, text, and image data types, which remain for backward compatibility only.

By default, XML column content can be one or more fragments or a document with a root element; XML columns treat content as "just a bunch of nodes" (JABON). This batch adds a single Order instance to the OrderXML1 column:

DECLARE @xmlVar nvarchar(max);
SET @SalesOrder = N'<?xml version="1.0"
	encoding="utf-16"?>
<Order>
	<OrderID>10262</OrderID>
	<CustomerID>RATTC</CustomerID>
	<EmployeeID>8</EmployeeID>
	<OrderDate>1996-07-
		22T08:00:00Z</OrderDate>
	<!--Intervening elements obmitted -->
	<ShipCountry>USA</ShipCountry>
</Order>' ;
UPDATE Orders SET OrderXML1 = 
	@SalesOrder
WHERE OrderID = 10262

If a document or fragment isn't well-formed, attempts to insert the content return an error message.

You can automate XML column instance INSERTs or UPDATEs with a FOR XML AUTO, ELEMENTS, TYPE query. The TYPE directive, which is new in SQL Server 2005, returns a stream of the XML data type. This batch adds the same instance as the preceding example:

DECLARE @OrderID int; 
SET @OrderID = 10262; 
DECLARE @OrderXML xml; 
SET @OrderXML = (SELECT * FROM Orders 
	AS [Order] 
	WHERE OrderID = @OrderID 
	FOR XML AUTO, ELEMENTS, TYPE); 
UPDATE Orders SET OrderXML1 = @OrderXML  
	WHERE OrderID = @OrderID;

Alternatively, you can generate customized instances from SqlRecord or SqlDataReader objects with an XmlWriter, which replaces the .NET Framework 1.x's XmlTextWriter class. The XQuery demonstration project's CreateOrderXML2Doc function—in the XQueryOrders.vb partial class file—creates complex SalesOrder instances for the OrderXML2 column. These instances have qualified elements and attributes in four namespaces (see Figure 1 and Listing 1).

Associate XML Schemas With XML Columns
Content that's contained in XML columns is untyped by default. Strong typing enables instance validation and the XQuery 1.0 and XPath 2.0 Data Model's type system. You type instances by importing one or more XML schemas into an XmlSchemaCollection with this command:

CREATE XML SCHEMA COLLECTION 
	xscCollection
AS N'<xsd:schema?></xsd:schema>
[<xsd:schema?></xsd:schema>[?]]'

February CTP and later XmlSchemaCollections require instances to contain time-zone extensions—either Z or {+ | -}HH:MM to xsd:dateTime values, which requires FOR XML AUTO, ELEMENTS, TYPE queries to include T-SQL CONVERT functions to update datetime field values. (SQL Server 2005's datetime data type doesn't support time-zone extensions.) The XQuery project's btnFillOrderXML1 event handler includes T-SQL examples that add time-zone offsets.

XmlSchemaCollections support the "<xsd:import namespace="anyURI">" element; you add the imported schema(s) to the string (see Listing 2). XmlSchemaCollections have database scope, so you must associate the collection with the table's XML column by this generic batch, which assumes that the XML column exists:

ALTER TABLE Table DROP COLUMN xmlColumn;
ALTER TABLE Table ADD xmlColumn xml 
	([DOCUMENT] xscCollection) [[NOT 
	]NULL];

You must drop an existing XML column before you add an XmlSchemaCollection. However, you can add XML schemas for new namespaces or add new components to existing namespaces with the ALTER XML SCHEMA COLLECTION statement. The optional DOCUMENT directive enforces conformance to the XML Infoset model, which requires a single root node. If you don't add DOCUMENT, fragments are permitted—assuming that the schema supports fragments. SQL Server 2005's XML engine validates instances when executing UPDATE, INSERT, or XQuery modify operations.

If you don't have a schema for the XML column content, VS 2005's XML editor does a respectable job of inferring schemas from sample document instances, including imported schemas to support multiple namespaces. In most cases, you must tweak the xs:datatype, minOccurs, maxOccurs, and other attribute values to conform the schemas to the document structure. Importing XML schemas into an XmlSchemaCollection shreds them, so make sure to save a copy of the source documents (see Listing 3). You can generate a readable version of the schema by executing this query as a single line in SQL Server Management Studio (SSMS) or XM:

SELECT 
xml_schema_namespace
	(N'schema',N'xscCollection')
[.query('/xs:schema[@targetNamespace=
	"xsURI"]')]

The optional XQuery expression is required if the schema has multiple namespaces; in this case, you specify the default namespace. (The emphasized brackets are required). The demonstration project's ReadXmlSchemaCollection function executes this query to return a semi-readable version (without formatting white space) of the OrderXML2 column's schema collection to an XmlReader:

SELECT xml_schema_namespace(N'dbo',
	N'OrderXML2SchemaColl') 
	.query('/xs:schema[@targetNamespace= 
	"http://www.northwind.com/schemas/SalesOrder"]')

An XmlTextWriter formats the schema with element indents and line breaks (see Figure 2), and saves the schema in the ?\bin\debug or ?\bin\release folder as OrderXML2SchemaColl.xsd.

You can't index XML columns with conventional clustered or non-clustered relational indexes; XML columns require indexes that point to nodes and atomic values and preserve document order. The SQL Server development team defined a special structure—named ORDPATH—to tokenize element and attribute markup as integers in XML indexes. Creating a PRIMARY XML index requires a clustered index on the table's primary key, which participates in a back-join with the XML index. You add the XML index with the generic statement:

CREATE PRIMARY XML INDEX pidx_name ON 
Table (xmlColumn); 

After you add the PRIMARY XML index, you can add one or more specialized secondary indexes—PATH, VALUE, and PROPERTY—by referencing the PRIMARY index. These two batches create and drop all four indexes on the OrderXML2 column:

CREATE PRIMARY XML INDEX pidx_OrderXML2 
	ON Orders (OrderXML2); 
CREATE XML INDEX sidx_path_OrderXML2
	ON Orders (OrderXML2) USING XML 
		INDEX 
	pidx_OrderXML2 FOR PATH; 
CREATE XML INDEX sidx_value_OrderXML2
	ON Orders (OrderXML2) USING XML 
		INDEX 
	pidx_OrderXML2 FOR VALUE; 
CREATE XML INDEX sidx_prop_OrderXML2
	ON Orders (OrderXML2) USING XML 
		INDEX 
	pidx_OrderXML2 FOR PROPERTY; 

DROP INDEX sidx_prop_OrderXML2 ON 
	Orders; DROP INDEX 
	sidx_value_OrderXML2 ON Orders; 
	DROP INDEX sidx_path_OrderXML2 ON 
	Orders;  DROP INDEX pidx_OrderXML2 
	ON Orders; 

Unlike XmlSchemaCollections, you can add and drop XML indexes without dropping the associated XML column. Prepare for long execution times when adding XML indexes to columns that have many rows of large, complex instances.

XML indexes are much larger than their relational counterparts, and PRIMARY XML indexes often exceed the size of the compressed content in the XML column. Secondary indexes, which improve the performance of specific types of XQuery expressions, are somewhat smaller than the PRIMARY index. The database engine's query optimizer includes XML indexes when it generates the query plan. As you would expect, XML indexes slow INSERT, UPDATE, DELETE, and XQuery modify() operations. You can determine the size (number of pages) and fragmentation percentage of XML indexes by executing this query:

SELECT * FROM 
	sys.dm_db_index_physical_stats 
('Table', '*' , DEFAULT, 'DETAILED')

sys.dm_db_index_physical_stats replaces the DBCC SHOWCONTIG statement and returns a rowset (see Figure 3). The sys.dm_* data management functions, which appear under the SSMS Object Explorer's \Databases\DatabaseName\Views\System Views node, are new in SQL Server 2005.

Execute XPath Queries Against XML Columns
The SQL Server 2005 built-in XQuery engine conforms to the W3C July 2004 XQuery 1.0 and XPath 2.0 working draft. Final recommendations aren't expected until early 2006 at best. This delay led the VS 2005 team to remove the System.Xml.Query namespace from the .NET Framework beta 2. XQuery is the only way to query, reshape, and modify data in XML columns, so waiting for the W3C recommendation to arrive wasn't an option for the SQL Server team.

Microsoft's XQuery implementation embeds methods that execute XQuery expressions within T-SQL SELECT statements (see Table 1). The query(), value(), and exist() methods use this generic syntax, where rel is an abbreviation of "relational":

SELECT xmlColumn.{query | value | exist}
(?{XPath20Expression | FLWORExpression}') 
FROM Table 
[WHERE {XQueryExistsExpr | relColumn1 = 
	XQueryValueExpr | relColumn1 = 
		{relColumn2 | 
	SqlLiteral}}] [AS Alias] 
[GROUP BY relExpr] 
[ORDER BY relExpr].

As an example, this simple T-SQL query and XPath selector returns all Order instances:

SELECT OrderXML1.query('/') AS Result 
FROM Orders 

If you've populated the OrderXML1 column with the XQuery demonstration project, you can execute the queries by typing them in the upper textbox or the query window of SSMS or XM. The XQuery.txt file in the project's ?\bin folder contains all XQuery expression examples. The project's form has three radio buttons that add prebuilt queries to the upper textbox.

This T-SQL query with an XQuery FLWOR expression and an XPath 2.0 exist method expression as the WHERE clause constraint (Sample XQuery Expression 1) returns multiple Order instances that have a U.S. destination:

SELECT OrderXML1.query(
	'for $Result in /Order 
	return $Result') AS Result 
FROM Orders 
WHERE OrderXML1.exist
	('/Order[data(ShipCountry) 
	="USA"]') = 1;

The FLWOR acronym—unsurprisingly pronounced "flower"—represents the XQuery "for ... [let] ... where ... order by ... return" iterator syntax for an individual document instance. (The SQL Server 2005 XQuery implementation omits support for the let keyword.) The "for $Result in /Order return $Result" FLWOR expression is equivalent to the XPath /Order selector. The exist method returns 1 (true) or 0 (false), depending on the truth of the [data(ShipCountry)="USA"] XPath 2.0 predicate. The XPath 2.0 data() function returns the typed value of its argument, which in this case is xs:string for a typed XML column.

FLWOR syntax is similar to SQL, but it's important to differentiate the scope of keywords that are common to FLWOR and SQL, such as where/WHERE and order by/ORDER BY, or have a similar effect, such as for ? in/SELECT. XPath 2.0 and FLWOR expressions act on nodes of an individual instance of the XML column's document collection. You can visualize the collection as a nodes dimension added to the two-dimensional, column-row structure of relational tables (see Figure 4). SQL's SELECT and other keywords act on all rows of the table.

Failure to take advantage of SQL keywords, such as WHERE clauses, can lead to performance issues. As an example, this query returns 820 empty nodes to an XmlReader before reaching the first of 10 desired nodes specified by the predicate:

SELECT OrderXML1.query
	('(/Order[./OrderID >=  
	11067 and ./OrderID <= 11077])') 
AS [Order] FROM Orders;

This query with the predicate expression in a WHERE constraint returns only the desired rows:

SELECT OrderXML1.query('/Order') 
FROM Orders 
WHERE OrderXML1.exist
	('/Order/OrderID[.>= 11068 
	and .<= 11077]') = 1;

Type the preceding expressions or paste XQuery2A and 2B from the XQuery.txt file in an SSMS or XM query window to examine the nodes returned.

FLWOR expressions' construction feature lets you emulate XSL transforms by re-shaping the returned instances (see Figure 5). This query, a simplified version of Sample XQuery Expression 2, uses the sql:column() function and a join based on a value() expression to return instances with the Customers table's Phone value and, if the Fax value isn't NULL, Fax value:

SELECT OrderXML1.query
	('for $Result in /Order 
return 
<Order>
	<OrderDate>{data($Result/
		OrderDate)}</OrderDate>
	<OrderID>{data($Result/
		OrderID)}</OrderID>
	<ShipName>{data($Result/
		ShipName)}</ShipName>
	<ShipPhone>
		{sql:column("Customers.Phone")}
	</ShipPhone>
	{if (not(empty(sql:column
		("Customers.Fax")))) then 
	<ShipFax>{sql:column
		("Customers.Fax")}</ShipFax>
	else ()} 
</Order>') 
FROM Orders, Customers WHERE 
	Customers.CustomerID = 
	OrderXML1.value
		('(/Order/CustomerID)[1]', 
		'nvarchar(5)'); 

The modify() method lets you alter the instance content with Microsoft's implementation of a proprietary XML Data Manipulation Language (XML DML). This expression updates the ShipVia value of typed OrderXML1 instances to 1:

UPDATE Orders SET 
OrderXML1.modify
	('replace value of 
	(/Order/ShipVia)[1] with 
	xs:int(1)') 
WHERE ShipCountry = 'USA';

For untyped instances, replace xs:int(1) with 1.

The preceding sample XQuery expressions only scratch the surface of the capabilities of four of the five XQuery data access methods. Test-drive the XQuery demonstration project, explore its 1,500 lines of VB 2005 code, edit the sample queries, and try writing and executing a few XQuery expressions from scratch. (Check out the illustrated help file at www.oakleaf.ws/xquery2005/). Add and remove XML indexes to test their effect on XQuery execution time. I believe you'll agree with my conclusion that the SQL Server 2005 team has created an elegant and performant implementation of the native XML column type and its XQuery 1.0/XPath 2.0 subset.

comments powered by Disqus
Upcoming Events

.NET Insight

Sign up for our newsletter.

I agree to this site's Privacy Policy.