In-Depth

A Guide To VB 2008

VB 2008 includes some terrific new XML functionality that will make you both more effective and more productive when working with XML in your applications.

Technology Toolbox: VB.NET, XML

Visual Basic 2008 (aka VB9) introduces a host of new features, including extension methods, Language Integrated Query (LINQ), anonymous types, and lambda expressions, to name just a few. I've covered several of these features previously (see Table 1, "Quick Guide to VB 2008," for a short description of what each feature does and where you can access the previous coverage), so I won't cover all of them in depth in this column. But there's one truly outstanding set of features in VB9: the integrated support for XML. That support is extensive, and it makes working with XML both easy and powerful.

One of the more empowering XML features in VB9 is XML literals. XML literals allow you to write XML in the VB code editor and, most importantly, have that code recognized as XML. Typically, you assign an XML literal to an XElement variable:

Dim el As XElement = _
	<book>War and Peace</book>

The XElement comes from the System.Xml.Linq assembly, and it provides a simplified way of working with XML. The XElement can contain multiple XML elements and attributes.

VB 2008 lets you omit declaring a variable's type and instead have its type inferred (Option Infer On). This makes declaring the XElement much simpler:

Dim el = <book>War and Peace</book>

The el variable is of the type XElement. VB 2008 also has XML literal support for other types of XML nodes, including XDocument, XComment, XCData, and XProcessingInstruction. An XDocument is inferred if it has the XML declaration at the start of the literal:

Dim doc = <?xml version="1.0" 
	encoding="UTF-8"?>
	<books></books>

Dim myComment = _
	<!-- an XML Comment-->

Dim cdata = <![CDATA[cdata section
	can contain html characters <> & etc.]]>

Dim instruction = _
	<?xml-stylesheet type="text/xsl" 
	href="mytransform.xsl"?>

XML literals are, as the name implies, the literal representation of the XML as it would appear in a file (or close to it). For example, if you wanted to name a book "War & Peace" instead of "War and Peace," you'd need to encode the & as &amp; inside the XML literal:

' won't compile
Dim el = _
	<book>War & Peace</book>

' will compile
Dim el = _
	<book>War &amp; Peace</book>

Alternatively, you can use embedded expressions inside the XML literal. The embedded expression is enclosed by <%= and %>, and the expression must return a value or null reference; as such it can be a call to a function, or property, or any expression that returns a value, but cannot be a call to a Sub. In this example the embedded expression returns the String "War & Peace":

Dim el = _
	<book><%= "War & Peace" %></book>

Note that the value is XML-encoded for you because you set it through an embedded expression. Be wary of this and make sure you don't encode the value beforehand, or it will end up encoded twice. This is particularly important if you're getting values back from a Web control.
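The encoding behavior described above can be checked with a short console program. This is only a sketch: the module name and the BuildBook helper are illustrative names that don't appear elsewhere in this article.

```vb
Imports System
Imports System.Xml.Linq

Module EmbeddedExpressionEncoding
    ' Hypothetical helper: builds an element whose text comes from
    ' an embedded expression.
    Function BuildBook(ByVal title As String) As XElement
        Return <book><%= title %></book>
    End Function

    Sub Main()
        Dim el As XElement = BuildBook("War & Peace")
        ' Serialized form: the ampersand is escaped on output.
        Console.WriteLine(el.ToString())    ' <book>War &amp; Peace</book>
        ' In-memory value: the original, unescaped string.
        Console.WriteLine(el.Value)         ' War & Peace

        ' Encoding by hand first leads to double encoding.
        Dim twice As XElement = BuildBook("War &amp; Peace")
        Console.WriteLine(twice.ToString()) ' <book>War &amp;amp; Peace</book>
    End Sub
End Module
```

The last call shows why a value that has already been encoded, for example by a Web control, should be decoded before it's passed to an embedded expression.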

Typically you'll use embedded expressions for reading values back from controls:

Dim book = <book title=<%= txtBoxTitle.Text %>>
	<author><%= txtBoxAuthor.Text %></author>
	</book>

Note how the embedded expression for the title attribute isn't enclosed in quotation marks; however, the resulting output will be.

Embedded expressions can also be LINQ queries. LINQ lets you create XML literals inside the query and populate their values using more embedded expressions nested inside your query:

Dim books = <books>
	<%= From bk In mybooks Select _
	<book><%= bk.Title %></book> _
	%>
	</books>

Note that you need a line continuation character inside an embedded expression when dealing with code statements continued on the next line, but not between XML literals themselves.

You can use any type for the value of an XElement. When an object is added to the XElement, its ToString method is called to retrieve the element's new value. The string, as mentioned previously, is encoded, so a string such as "<book />" is written out as "&lt;book /&gt;". This means only expressions that return an XNode (from which XElement derives) or an enumerable collection of XNodes can insert child elements. Unfortunately, there's no interface inside the XLINQ library for getting the XML representation from an object. Yet the XElement Add method does the seemingly bizarre thing of accepting any IEnumerable and, if the items aren't XNodes, calling ToString on each of them. For an array of strings, such as {"abc", "def", "ghi"}, the result is "abcdefghi." For an array of integers {1, 2, 3}, the result is "123." The reason for this is that the enumerable could in fact be a mix of XNodes and types meant to be converted to a string. I think this would've been more empowering if there were an interface that would let you return an XNode; perhaps the XLINQ team will do something like this in a future release.
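Here is a small sketch of the Add behavior just described. The Joined helper and the "values" element name are made up for illustration; only XElement.Add and the Value property come from the framework.

```vb
Imports System
Imports System.Xml.Linq

Module AddEnumerableDemo
    ' Hypothetical helper: adds arbitrary content to a fresh element.
    ' The parameter is typed As Object so the call binds to Add(Object).
    Function Joined(ByVal items As Object) As XElement
        Dim el As New XElement("values")
        el.Add(items)
        Return el
    End Function

    Sub Main()
        ' Each item is converted to a string and concatenated.
        Console.WriteLine(Joined(New String() {"abc", "def", "ghi"}).Value) ' abcdefghi
        Console.WriteLine(Joined(New Integer() {1, 2, 3}).Value)            ' 123

        ' A string that looks like markup is stored as text, not as a child.
        Dim text As XElement = Joined("<book />")
        Console.WriteLine(text.HasElements) ' False
        Console.WriteLine(text.ToString())  ' <values>&lt;book /&gt;</values>
    End Sub
End Module
```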

Return values of Nothing change the XML output. For attribute values, the attribute is omitted if the expression is Nothing. For example, consider this statement:

<book title=<%= var1 %>></book> 

In this case, your output is <book></book> when var1 is Nothing. For element values, the tag becomes self-closing:

<book><%= var1 %></book>

The output is <book /> when var1 is Nothing.

An empty string isn't considered to be Nothing. The previous two examples will output <book title=""></book>, and <book></book> if you put in an empty string.
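The attribute case is easy to verify in code. The following is a sketch with an illustrative BookWithTitle helper; it checks only whether the attribute exists, rather than relying on the exact serialized text.

```vb
Imports System
Imports System.Xml.Linq

Module NothingVersusEmpty
    ' Hypothetical helper: the title attribute is set through an
    ' embedded expression, so Nothing and "" behave differently.
    Function BookWithTitle(ByVal title As String) As XElement
        Return <book title=<%= title %>></book>
    End Function

    Sub Main()
        ' Nothing: the attribute is left out entirely.
        Dim missing As XElement = BookWithTitle(Nothing)
        Console.WriteLine(missing.Attribute("title") Is Nothing) ' True

        ' Empty string: the attribute is present with an empty value.
        Dim blank As XElement = BookWithTitle("")
        Console.WriteLine(blank.Attribute("title") IsNot Nothing) ' True
        Console.WriteLine(blank.Attribute("title").Value.Length)  ' 0
    End Sub
End Module
```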

If you want to keep your XML as clean as possible and don't want to include empty values for attributes, you can check whether the string is empty by testing whether its length is zero. If it is, insert Nothing instead. I use an extension method for this:

<Runtime.CompilerServices.Extension()> _
Function EmptyAsNull( _
	ByVal value As String) As String
	If value IsNot Nothing _
		AndAlso value.Length = 0 Then
		value = Nothing
	End If
	Return value
End Function

In this case, the embedded expression becomes:

<book title=<%= var1.EmptyAsNull() %>></book>

Another interesting use of embedded expressions is for the element's name:

<<%= var1 %>></>

Note that the closing tag has no name specified, because the element's name isn't known until the expression is evaluated. This works with simple XML, but it becomes more problematic when dealing with XML namespaces: you'll need to use an XName for the name, along with the relevant XML namespace.

Master XML Namespaces
XML namespaces might be the most confusing part about working with XML in .NET. Each XElement and XAttribute stores its fully qualified name, which is made up of a local name and a namespace name. This pair of XML nodes has the same fully qualified name:

<book xmlns="books.com"></book>
<a:book xmlns:a="books.com"></a:book>

The XML for those two nodes is different, but each node still has the same fully qualified name of {books.com}book. Whether you're using VB, XPath, XLINQ, XSLT, or any other XML parsing language, those nodes are identifiable by the same code. So unless you're using regular expressions or some other text-based search that doesn't parse namespaces properly, the difference is mainly aesthetic. There are times when you will want to control the XML output, either for aesthetics or because of textual parsing. I'll show you how you can do that, but it helps if you understand how namespaces work with XNodes before tackling that problem.
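As a quick check of that claim, this sketch parses both forms and compares their names. The SameName helper is an illustrative name; XElement.Parse and the Name property are part of System.Xml.Linq.

```vb
Imports System
Imports System.Xml.Linq

Module QualifiedNameDemo
    ' Hypothetical helper: parses two fragments and compares the
    ' fully qualified names of their root elements.
    Function SameName(ByVal xml1 As String, ByVal xml2 As String) As Boolean
        Return XElement.Parse(xml1).Name = XElement.Parse(xml2).Name
    End Function

    Sub Main()
        Dim defaultForm As String = "<book xmlns=""books.com""></book>"
        Dim prefixedForm As String = "<a:book xmlns:a=""books.com""></a:book>"

        ' Both print the same expanded name, regardless of prefix.
        Console.WriteLine(XElement.Parse(defaultForm).Name.ToString())  ' {books.com}book
        Console.WriteLine(XElement.Parse(prefixedForm).Name.ToString()) ' {books.com}book
        Console.WriteLine(SameName(defaultForm, prefixedForm))          ' True
    End Sub
End Module
```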

XAttribute and XElement have similar constructors, allowing a string for the name or an XName. An XName stores both the local name and the XML namespace. You have three basic options for constructing the book element outside of VB's XML literals:

' pass in the local name prefixed 
' with the namespace enclosed in {} 's
Dim book = New XElement( _
	"{books.com}book")

' use an XName
Dim xn As XName = _
	XName.Get("book", "books.com")
Dim book = New XElement(xn)

'use an XNamespace and an XName
Dim xs As XNamespace = _
	XNamespace.Get("books.com")
Dim xn As XName = _
	xs.GetName("book")
Dim book = New XElement(xn)

The most efficient way is to use an XNamespace and an XName. The first example uses {}'s and requires parsing at runtime. The second example calls XName.Get, which requires a lookup internally to get the XNamespace from a dictionary. In the third example, you have the XNamespace and XName available for easy and efficient re-use on any further nodes. All three examples create the same output:

<book xmlns="books.com" />

To have that XML formatted using a prefix such as "a:book" you need to add an XML namespace attribute to the element:

book.Add(New _
	XAttribute( _
	XNamespace.Xmlns.GetName( _
	"a"), "books.com"))

This creates the output formatted with a namespace prefix:

<a:book xmlns:a="books.com" />

You added an xmlns declaration to the XML, so the XElement will use it for that namespace in that element and any child elements. If you were to now add another xmlns declaration for the same namespace but with a "b" prefix, the "b" prefix would be used (for more information about XML namespaces, visit w3.org/TR/REC-xml-names).

Internally, there are a number of factors at play here. XNamespaces are stored in a dictionary, allowing no duplicates. The prefix used for a namespace isn't stored, but is calculated for the given XElement. You need to call the XElement's GetPrefixOfNamespace method to determine the prefix. This method works its way back up the XML tree, reading the attributes from last to first in each node until it finds an xmlns declaration for the namespace. Hence, the last xmlns attribute added wins.
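The tree walk can be seen with a parent and child element. This sketch covers only the single-declaration case; the PrefixFor helper and the element names are illustrative.

```vb
Imports System
Imports System.Xml.Linq

Module PrefixLookupDemo
    ' Hypothetical helper: wraps the XElement.GetPrefixOfNamespace call.
    Function PrefixFor(ByVal el As XElement, ByVal ns As XNamespace) As String
        Return el.GetPrefixOfNamespace(ns)
    End Function

    Sub Main()
        Dim ns As XNamespace = XNamespace.Get("books.com")
        Dim root As New XElement(ns.GetName("books"))
        Dim child As New XElement(ns.GetName("book"))
        root.Add(child)

        ' No xmlns:prefix declaration anywhere in the tree yet.
        Console.WriteLine(PrefixFor(child, ns) Is Nothing) ' True

        ' Declare the prefix on the root only; the child finds it
        ' by walking up to its parent.
        root.Add(New XAttribute(XNamespace.Xmlns.GetName("a"), "books.com"))
        Console.WriteLine(PrefixFor(child, ns)) ' a
    End Sub
End Module
```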

When you write an XElement or XDocument out to XML, it uses the same logic, but does so in a more efficient manner. Rather than traversing back up the tree as it writes each element, it instead caches the prefixes for each new namespace it encounters. The effect is the same; the last xmlns declaration for any namespace wins. For element namespaces, the default namespace is redefined if no matching prefix is found. The output of <book xmlns="books.com" /> is a typical example. For elements, the output doesn't include any prefixes you haven't defined. The story is different for attributes. Attributes don't inherit the element's namespace, so each attribute that has an XML namespace other than "" must be given an xmlns prefix and a corresponding xmlns must be defined within the scope of the containing element. These generated prefixes follow the pattern of p1, p2, p3 … pn. Hence, you should generally avoid using pn, where n is any integer, for your namespace declarations.

Ideally, you should add all the required xmlns declarations yourself to the root level node to generate aesthetically preferable XML output.

So far, I've shown you the way XML namespaces work inside the XLINQ framework. This is the same for both VB and C#. But VB makes this a lot easier and clearer. Consider this example of creating a book node and adding an xmlns prefix based on the .NET 3.5 Framework:

'the framework way
Dim xs As XNamespace = _
	XNamespace.Get("books.com")
Dim xn As XName = xs.GetName( _
	"book")
Dim book = New XElement(xn)
book.Add(New _
	XAttribute( _
	XNamespace.Xmlns.GetName( _
	"a"), "books.com"))

Now compare that with the VB9 way:

Dim book = <a:book xmlns:a="books.com"/>

The VB9 version is much more readable. But it gets even better. VB9 lets you declare XML namespaces in Imports statements. This means you can define the XML namespace once at the start of the code file and then re-use it throughout all your XML declarations. You can also define the Imports at the project level for the entire project by adding it to the Imports section on the References tab of the project's properties:

'declared before any code in a code file
Imports <xmlns:a="books.com">

'inside a method in the same file
Dim book = <a:book />

When you use Imports, the VB compiler keeps track of the XML Namespaces used and adds the xmlns declarations for you.

One issue to be aware of occurs when you add XElements into another XElement. For example, if you were to create two separate book elements and then add them to a books element, the namespace declarations would be repeated in each element:

Dim books = <a:books/>
Dim book1 = <a:book>my first book</a:book>
Dim book2 = <a:book>my second book</a:book>
books.Add(book1, book2)

The output for these declarations looks like this:

<a:books xmlns:a="books.com">
	<a:book xmlns:a="books.com">my first book</a:book>
	<a:book xmlns:a="books.com">my second book</a:book>
</a:books>

To preserve the prefix, VB adds the xmlns attribute to the first element of each variable. When the books element is written out, the XElement preserves those attributes. But there's an easy fix: Use embedded expressions to add the elements, rather than calling the Add method:

Dim book1 = _
	<a:book>my first book</a:book>
Dim book2 = _
	<a:book>my second book</a:book>
Dim books = _
	<a:books>
	<%= book1 %>
	<%= book2 %>
	</a:books>

VB keeps track of the namespaces it uses inside a literal, so it removes those same namespaces from any child nodes added as an embedded expression. That difference changes the output to this:

<a:books xmlns:a="books.com">
	<a:book>my first book</a:book>
	<a:book>my second book</a:book>
</a:books>

Should you ever need to change or remove the prefix used, you can remove any xmlns attributes from the element and then add your own xmlns attribute. Add these two lines of code to the previous example to change it to use a default namespace instead of the "a" prefix:

books.Attribute( _
	XNamespace.Xmlns.GetName( _
	"a")).Remove()
books.Add(New XAttribute( _
	"xmlns", "books.com"))

Your new output looks like this:

<books xmlns="books.com">
	<book>my first book</book>
	<book>my second book</book>
</books>

One final tip on namespaces: If you use Imports to declare an XML namespace, you can use VB's GetXmlNamespace operator to get an XNamespace reference:

Dim ns = GetXmlNamespace(a)
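A sketch of how that reference might be used with the framework calls shown earlier follows. The BookName helper is an illustrative name, and the Imports line must sit at the top of the same file.

```vb
Imports System
Imports System.Xml.Linq
Imports <xmlns:a="books.com">

Module GetXmlNamespaceDemo
    ' Hypothetical helper: builds the XName for a:book from the
    ' namespace declared in the Imports statement above.
    Function BookName() As XName
        Dim ns As XNamespace = GetXmlNamespace(a)
        Return ns.GetName("book")
    End Function

    Sub Main()
        Console.WriteLine(BookName().ToString()) ' {books.com}book

        ' The name matches an element created with an XML literal.
        Dim viaLiteral As XElement = <a:book/>
        Console.WriteLine(viaLiteral.Name = BookName()) ' True
    End Sub
End Module
```

This is handy when you mix XML literals with framework calls such as Element or Descendants, which take an XName.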

Read XML in VB
VB adds even more features when it comes to reading XML, including axis properties and schema-based IntelliSense. Axis properties allow you to navigate the nodes of an XDocument or XElement with a natural XML-looking kind of syntax. Assume each book in the books example has an authors element, and each authors element has one or more author elements. With axis properties, you can get the entire collection of authors by using the child axis to navigate the tree:

Dim authors = _
	books.<book>.<authors>.<author>

Alternatively, you can use the descendant axis property, which is symbolized by three periods:

Dim authors = books...<author>

In each of these examples, authors is an IEnumerable(Of XElement). VB adds a couple of extensions to make it easy to work with these collections of XElements: the Value extension, which returns the value of the first element, and an indexer extension, which lets you access elements in the collection by index. For example, authors(2) is the third author. There's also the @ axis property, which lets you read an attribute's value:

Dim isbn = books.<book>.@isbn

Note that the @ axis property returns the value for the first element of the collection of elements.
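The axis properties above can be tried against a small sample document. Everything in the sample is an assumption made for illustration: the author names, the isbn attribute and its values, and the helper function names.

```vb
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Xml.Linq

Module AxisPropertyDemo
    ' Hypothetical sample data in the shape the text describes.
    Function SampleBooks() As XElement
        Return <books>
                   <book isbn="111">
                       <authors>
                           <author>Ann</author>
                           <author>Bob</author>
                       </authors>
                   </book>
                   <book isbn="222">
                       <authors>
                           <author>Cy</author>
                       </authors>
                   </book>
               </books>
    End Function

    ' Descendant axis: every author element at any depth.
    Function AllAuthors(ByVal books As XElement) As IEnumerable(Of XElement)
        Return books...<author>
    End Function

    ' Attribute axis on a collection: reads from the first book only.
    Function FirstIsbn(ByVal books As XElement) As String
        Return books.<book>.@isbn
    End Function

    Sub Main()
        Dim books As XElement = SampleBooks()
        Dim viaChild As IEnumerable(Of XElement) = books.<book>.<authors>.<author>
        Dim viaDescendants As IEnumerable(Of XElement) = AllAuthors(books)

        Console.WriteLine(viaChild.Count())        ' 3
        Console.WriteLine(viaDescendants.Value)    ' Ann
        Console.WriteLine(viaDescendants(2).Value) ' Cy
        Console.WriteLine(FirstIsbn(books))        ' 111
    End Sub
End Module
```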

If your XML uses namespaces, you need to define them as Imports to use axis properties. Once you have defined the namespace Import, you can then use axis properties using the prefix:

Dim authors = books...<bk:author>

It gets even better. If you add a schema to your project when using axis properties, you get IntelliSense automatically. You don't get the IntelliSense for XML literals (writing XML), but you do for axis properties, such as when reading and navigating the XML (see Figure 1).

XLINQ provides a nicer, simpler model for working with XML than previous versions of the .NET Framework provided. VB9 adds a wealth of language features and strong IDE support on top of that, which makes VB the ideal language choice when it comes to working with XML from inside .NET. The many additions the VB team has made are impressive, to put it mildly.
