In-Depth
Simplify XML Data Validation
The second-generation .NET Framework builds on the tools of the first to provide better, more standards-compliant XML data validation.
Technology Toolbox: XML, .NET Framework 2.0
Version 1.x of the .NET Framework gave you some capable tools for validating XML data. .NET Framework 2.0 augments the original capabilities significantly.
Version 2 builds on the foundation of .NET 1.x's XML validation features, providing new classes and methods that are not only standards-compliant, but also easy to use. I'll show you how to take advantage of some of these features, which translate to better-performing, more standards-compliant XML handling in your applications. I'll also walk you through several scenarios in which you validate an XML document against an XSD schema programmatically (see Table 1).
The feature-rich XML support in .NET Framework 2.0 means you can perform XML data validation against a Document Type Definition (DTD) or an XML schema. You specify the validation settings as well as a ValidationEventHandler method using the XmlReaderSettings object. You then perform the validation during the reading and parsing operations of the factory-created XmlReader object.
Let's review some XML basics before diving into how the new features in .NET Framework 2.0 can help you validate XML data more easily. XML data is considered correct if it is both well-formed and valid. Being well-formed requires the XML data to be syntactically correct; otherwise, the XML parser will raise an error. XML data is valid when its elements, attributes, and their content conform to the structure and data types declared in the schema or DTD.
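The well-formedness half of that distinction can be checked without any schema at all. The following is a minimal C# sketch, not one of the article's listings: the class name WellFormedSketch and the sample strings are made up for illustration. It relies only on the fact that a non-validating XmlReader throws an XmlException when it meets syntactically incorrect XML.

```csharp
using System;
using System.IO;
using System.Xml;

static class WellFormedSketch
{
    // Returns true when the text parses as well-formed XML.
    // No schema or DTD is involved, so this says nothing about validity.
    public static bool IsWellFormed(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml)))
            {
                while (reader.Read()) { }   // parse to the end of the input
            }
            return true;
        }
        catch (XmlException)
        {
            // The parser raises XmlException for syntax errors such as
            // mismatched or unclosed tags.
            return false;
        }
    }

    static void Main()
    {
        // Properly nested tags.
        Console.WriteLine(IsWellFormed("<authors><author>Ann</author></authors>"));
        // The <author> element is never closed.
        Console.WriteLine(IsWellFormed("<authors><author>Ann</authors>"));
    }
}
```

A document that passes this check can still be invalid; deciding that requires a schema, which is what the rest of the article covers.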
XML functionality in .NET is handled primarily by the classes present in a handful of namespaces: System.Xml, System.Xml.Schema, System.Xml.XPath, System.Xml.Xsl, and System.Xml.Serialization.
The System.Xml namespace is probably the most significant of these namespaces if only because it includes classes such as XmlDocument, XmlNodeReader, XmlReader, and XmlReaderSettings, which are critical in validating XML data. An XML file is usually validated for its conformance to a particular schema or a DTD. The XML schema file is usually an XML-Data Reduced (XDR) or XML Schema Definition (XSD) file. XSD schema-based validation is the industry-accepted standard and will be the method of XML validation in this article. I won't explain how to validate XML data using DTDs because those are not typically used outside legacy applications.
Validate XML Data With XSD
Validation is the process of enforcing rules on the XML content through an XSD schema, a DTD, or an XDR schema. An XML document contains elements, attributes, and values of primitive data types. An XSD schema defines elements, attributes, and the relationships among them. It conforms to the World Wide Web Consortium (W3C) XML Schema recommendation.
.NET Framework 2.0 classes support the W3C XML schema recommendation. The classes that are commonly employed to validate the XML document are XmlReader, XmlReaderSettings, XmlSchemaSet, and XmlNodeReader.
Validating an XML document with an XSD schema requires five steps. First, define a ValidationEventHandler method. Second, create an instance of the XmlReaderSettings class, which lets you specify the set of options the XmlReader object will support; these options govern how the reader parses XML data. Note that the XmlReaderSettings class, used with the XmlReader.Create factory method, renders the XmlValidatingReader class used with .NET 1.x obsolete. Third, associate the XmlReaderSettings object with the already defined ValidationEventHandler method. Fourth, set XmlReaderSettings' ValidationType property to ValidationType.Schema (early prerelease builds of .NET Framework 2.0 exposed this switch as a Boolean XsdValidate property). Fifth, add an XSD schema to the XmlReaderSettings object through its Schemas property. Once you complete these steps, the XmlReader object will validate the XML document automatically while parsing the XML using the Read method.
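The five steps can be sketched in C# as follows. This is an illustrative sketch rather than one of the article's listings, and it assumes the released .NET Framework 2.0 API, in which schema validation is switched on through the ValidationType property. The inline schema and sample documents are hypothetical stand-ins for Authors.xsd and Authors.xml, and the class name FiveStepSketch is made up.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class FiveStepSketch
{
    // Hypothetical schema standing in for Authors.xsd (not the article's listing).
    const string Xsd =
        @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
            <xs:element name='authors'>
              <xs:complexType>
                <xs:sequence>
                  <xs:element name='author' maxOccurs='unbounded'>
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name='name' type='xs:string'/>
                        <xs:element name='royalty' type='xs:int'/>
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:schema>";

    // Validates the XML text and returns every problem the handler was told about.
    public static List<string> Validate(string xml)
    {
        List<string> problems = new List<string>();

        // Step 1: define the validation event handler.
        ValidationEventHandler handler = delegate(object sender, ValidationEventArgs e)
        {
            problems.Add(e.Severity + ": " + e.Message);
        };

        // Step 2: create the settings object.
        XmlReaderSettings settings = new XmlReaderSettings();

        // Step 3: associate the handler with the settings.
        settings.ValidationEventHandler += handler;

        // Step 4: switch on XSD validation.
        settings.ValidationType = ValidationType.Schema;

        // Step 5: add the schema; null means "use the schema's own target namespace".
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));

        // Validation happens as a side effect of Read.
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return problems;
    }

    static void Main()
    {
        string good = "<authors><author><name>Ann</name><royalty>10</royalty></author></authors>";
        string bad = "<authors><author><name>Ann</name><royalty>ten</royalty></author></authors>";

        Console.WriteLine("Problems in valid document: " + Validate(good).Count);
        foreach (string problem in Validate(bad))
        {
            Console.WriteLine(problem);
        }
    }
}
```

The schema is read from a string only to keep the sketch self-contained; the Schemas.Add method also accepts the path or URI of a schema file.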
Use the ValidationEventHandler event to register an event handler that receives notifications about XSD schema validation errors and warnings. Note that validation errors do not stop parsing; parsing stops only if the XML document is not well-formed. However, if you fail to provide a callback function, the reader throws an exception (an XmlSchemaValidationException) at the first validation error. Trapping validation errors through the callback mechanism therefore lets you discover all of them in a single pass.
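The difference between the two behaviors is easier to see side by side. The C# sketch below is an illustration under stated assumptions, not the article's code: the schema and the sample document with two bad royalty values are hypothetical, as is the class name HandlerSketch. It catches XmlSchemaException, the base class of XmlSchemaValidationException, when no handler is registered.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class HandlerSketch
{
    // Hypothetical schema standing in for Authors.xsd.
    const string Xsd =
        @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
            <xs:element name='authors'>
              <xs:complexType>
                <xs:sequence>
                  <xs:element name='author' maxOccurs='unbounded'>
                    <xs:complexType>
                      <xs:sequence>
                        <xs:element name='name' type='xs:string'/>
                        <xs:element name='royalty' type='xs:int'/>
                      </xs:sequence>
                    </xs:complexType>
                  </xs:element>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:schema>";

    // Two authors, each with a royalty value that is not an integer.
    public const string TwoBadRoyalties =
        "<authors>" +
        "<author><name>Ann</name><royalty>ten</royalty></author>" +
        "<author><name>Bob</name><royalty>high</royalty></author>" +
        "</authors>";

    static XmlReaderSettings NewSettings()
    {
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        return settings;
    }

    // With a handler: each error is recorded and parsing continues to the end.
    public static List<string> CollectErrors(string xml)
    {
        List<string> errors = new List<string>();
        XmlReaderSettings settings = NewSettings();
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            if (e.Severity == XmlSeverityType.Error) errors.Add(e.Message);
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }

    // Without a handler: the first validation error surfaces as an exception.
    public static bool ThrowsWithoutHandler(string xml)
    {
        try
        {
            using (XmlReader reader = XmlReader.Create(new StringReader(xml), NewSettings()))
            {
                while (reader.Read()) { }
            }
            return false;
        }
        catch (XmlSchemaException)   // base class of XmlSchemaValidationException
        {
            return true;
        }
    }

    static void Main()
    {
        Console.WriteLine("Errors collected with a handler: " + CollectErrors(TwoBadRoyalties).Count);
        Console.WriteLine("Exception without a handler: " + ThrowsWithoutHandler(TwoBadRoyalties));
    }
}
```

Filtering on e.Severity keeps warnings out of the error list. The reader reports warnings only when the ReportValidationWarnings flag is added to the settings' ValidationFlags property.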
So far, I've described the steps used to validate XML data with the .NET Framework 2.0. Now let's put that knowledge to use. Begin by creating an XML document named Authors.xml (see Listing 1) and an XSD for Authors.xml called Authors.xsd (see Listing 2).
From here, it's a simple matter to use Authors.xsd to validate the contents of the Authors.xml file (see Listing 3). Begin by declaring variables to hold the paths of the XML and XSD schema files, then create an instance of the XmlReaderSettings object and associate a validation event handler callback method with it. Next, set the ValidationType property to ValidationType.Schema, which signals the XmlReader object to validate the XML data as it parses. The property defaults to ValidationType.None, so the XmlReader object doesn't validate XML data unless you ask it to. Finally, add the Authors.xsd file to the Schemas collection of the XmlReaderSettings object and invoke the static Create method of the XmlReader class, passing in the path of the Authors.xml file and the XmlReaderSettings object.
Create XmlReader Objects
The Create method returns an instance of the XmlReader object, which can validate against a DTD or an XML schema when parsing the document. Because you pass the XmlReaderSettings object to the Create method, the resulting XmlReader object honors the options configured on that object. Next, invoke the Read method of the XmlReader object in a While loop, which enables the entire XML file to be read and validated. Any validation errors invoke the ValidationEventHandler method. Inside this method, the handler appends each validation error message to a StringBuilder object. If you don't provide a validation event handler, the reader throws an exception (an XmlSchemaValidationException) when a validation error occurs. If the validation is successful, you get a message indicating that the validation has completed successfully (see Figure 1).
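A file-based version of this walk-through might look like the C# sketch below. It only approximates the approach described here and is not a reproduction of Listing 3. The class name FileValidationSketch and the success message text are assumptions, and the program expects Authors.xml and Authors.xsd (or two paths given on the command line) to exist.

```csharp
using System;
using System.Text;
using System.Xml;
using System.Xml.Schema;

static class FileValidationSketch
{
    // Validates the file at xmlPath against the schema at xsdPath and
    // returns either the accumulated error messages or a success message.
    public static string ValidateFile(string xmlPath, string xsdPath)
    {
        StringBuilder messages = new StringBuilder();

        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            // Keep appending; a validation error does not stop the parse.
            messages.AppendLine(e.Message);
        };
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, xsdPath);   // null: use the schema's own target namespace

        using (XmlReader reader = XmlReader.Create(xmlPath, settings))
        {
            while (reader.Read()) { }   // reading drives validation
        }

        return messages.Length == 0
            ? "Validation completed successfully."
            : messages.ToString();
    }

    static void Main(string[] args)
    {
        // Assumed defaults: both files sit in the current directory.
        string xmlPath = args.Length > 0 ? args[0] : "Authors.xml";
        string xsdPath = args.Length > 1 ? args[1] : "Authors.xsd";
        Console.WriteLine(ValidateFile(xmlPath, xsdPath));
    }
}
```

Returning the collected text instead of printing inside the handler keeps the validation routine reusable from a console program, a form, or a Web page.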
So far, you've seen how to perform validation using the standalone XSD schema file through the XmlReaderSettings class. This approach is an excellent way to validate XML data, but it doesn't provide a neat and efficient way of reusing XML schemas. This is where the XmlSchemaSet class comes into play. This class not only lets you create a cache of XML schemas, but it also enables you to compile multiple schemas for the same target namespace into a single logical schema. The XmlSchemaSet class replaces the XmlSchemaCollection class, the preferred class for caching schemas in .NET Framework 1.x. The new XmlSchemaSet class provides much better standards compliance and better performance.
For example, assume you want to validate XML data using the new XmlSchemaSet class (see Listing 4). After you create an instance of the XmlSchemaSet class, invoke its Add method to add the Authors.xsd schema to the XmlSchemaSet object. Once you add the schema, set the Schemas property of the XmlReaderSettings object to the XmlSchemaSet object, and invoke the Read method of the XmlReader object in a loop to parse the XML data. As in the previous example, the parser stops only if the XML data is not well-formed. Because the reader doesn't stop at each validation error, you find all the validation errors in one pass, without having to parse the XML document repeatedly.
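The reuse benefit shows up when one compiled XmlSchemaSet serves several validation runs. The C# sketch below is an illustration rather than Listing 4: the inline schema is a hypothetical stand-in for Authors.xsd, and the class and method names are made up.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class SchemaSetSketch
{
    // Hypothetical schema standing in for Authors.xsd.
    const string Xsd =
        @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
            <xs:element name='authors'>
              <xs:complexType>
                <xs:sequence>
                  <xs:element name='author' type='xs:string' maxOccurs='unbounded'/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:schema>";

    // Build and compile the schema cache once.
    public static XmlSchemaSet BuildSchemaSet()
    {
        XmlSchemaSet schemas = new XmlSchemaSet();
        schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        schemas.Compile();   // compiles everything in the set into one logical schema
        return schemas;
    }

    // Validate one document against the shared, already compiled set.
    public static int CountErrors(string xml, XmlSchemaSet schemas)
    {
        int errors = 0;
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas = schemas;   // reuse the cache instead of reloading the XSD
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            if (e.Severity == XmlSeverityType.Error) errors++;
        };
        using (XmlReader reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }

    static void Main()
    {
        XmlSchemaSet schemas = BuildSchemaSet();
        string[] documents = new string[]
        {
            "<authors><author>Ann</author><author>Bob</author></authors>",
            "<authors><editor>Cy</editor></authors>"   // <editor> is not declared in the schema
        };
        foreach (string document in documents)
        {
            Console.WriteLine("Errors: " + CountErrors(document, schemas));
        }
    }
}
```

Calling Compile up front is optional, since the reader compiles the set on demand, but doing it once makes the cost explicit and surfaces schema errors before any document is read.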
Validate XML Data in a DOM Object
So far, you've learned how to validate XML data as you read it with the XmlReader class. Another common scenario you need to consider is when you have data stored in an XmlDocument object. In .NET Framework 1.x, the only type of validation you could perform in this case was load-time validation, which you did by passing a validating reader object, such as an XmlValidatingReader object, into the Load method. Unfortunately, if you made changes to the XmlDocument afterward, you had no way to ensure that the data still conformed to the schema. You can get around this by using the XmlNodeReader class, which reads data stored in an XmlNode object. Using this class enables you to validate a DOM object by passing the XmlNodeReader to the Create method (see Listing 5). A careful look at the listing reveals how an XmlNodeReader object created over the XmlDocument object (which in turn is loaded from the Authors.xml document) has XML schema validation support layered on top of it while reading. Note that this ability to layer XML schema validation on top of an XmlNodeReader object is a new feature introduced with .NET Framework 2.0.
Load the Authors.xml file into an XmlDocument and modify it in memory by adding an attribute called "test". Then pass the modified XmlDocument to an XmlNodeReader, which is in turn passed to the factory-created XmlReader object. When the validating reader parses the document, it validates the in-memory content, including any changes you made. Because the attribute you added is invalid according to the XSD schema, validation fails (see Figure 2).
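The DOM scenario can be sketched in C# as shown below. This is an illustration and not Listing 5. The schema and document are hypothetical stand-ins for Authors.xsd and Authors.xml, the class name is made up, and placing the "test" attribute on the root element is an assumption, because the text doesn't say which element receives it.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

static class DomValidationSketch
{
    // Hypothetical schema standing in for Authors.xsd; it declares no attributes.
    const string Xsd =
        @"<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
            <xs:element name='authors'>
              <xs:complexType>
                <xs:sequence>
                  <xs:element name='author' type='xs:string' maxOccurs='unbounded'/>
                </xs:sequence>
              </xs:complexType>
            </xs:element>
          </xs:schema>";

    // Loads the XML into a DOM, optionally edits it in memory, then validates the DOM.
    public static List<string> ValidateDom(string xml, bool addTestAttribute)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);

        if (addTestAttribute)
        {
            // In-memory change made after loading (assumed to target the root element).
            doc.DocumentElement.SetAttribute("test", "1");
        }

        List<string> errors = new List<string>();
        XmlReaderSettings settings = new XmlReaderSettings();
        settings.ValidationType = ValidationType.Schema;
        settings.Schemas.Add(null, XmlReader.Create(new StringReader(Xsd)));
        settings.ValidationEventHandler += delegate(object sender, ValidationEventArgs e)
        {
            errors.Add(e.Message);
        };

        // Layer a validating reader over a reader that walks the DOM nodes.
        XmlNodeReader nodeReader = new XmlNodeReader(doc);
        using (XmlReader reader = XmlReader.Create(nodeReader, settings))
        {
            while (reader.Read()) { }
        }
        return errors;
    }

    static void Main()
    {
        string xml = "<authors><author>Ann</author></authors>";
        Console.WriteLine("Errors before the edit: " + ValidateDom(xml, false).Count);
        foreach (string error in ValidateDom(xml, true))
        {
            Console.WriteLine(error);
        }
    }
}
```

Because the validating reader wraps the node reader, nothing is serialized back to text; the in-memory tree is validated directly.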
This article only scratches the surface of what's possible with the XML validation features new in .NET Framework 2.0. You can use the foundation supplied by this article to construct sophisticated XML validation mechanisms for tasks such as validating XML data returned from an XML Web service or validating XML data before passing it on to XML-aware applications.
About the Author
Thiru Thangarathinam works at Intel in Chandler, Ariz. He specializes in architecting, designing, and developing .NET-based distributed enterprise-class applications. He has coauthored several books on .NET-related technologies. He has also been a frequent contributor to leading technology-related publications. Reach him at [email protected].