Ask Kathleen

Convert XPath into LINQ to XML

Learn how to convert XPath statements in XmlNode.SelectNodes expressions to LINQ to XML for better maintainability and performance. Also, drill down on the performance implications of using LINQ to XML instead of XPath against XmlDocument objects.

Technologies mentioned in this article include VB.NET, C#, and XML.

Q We had a contractor do some XML work for us that we're having a hard time maintaining, partly because no one understands the complex XPath statements he used in XmlNode.SelectNodes calls. I'm trying to rewrite this in LINQ to XML so that we have an easier time maintaining it. However, I'm getting the error "Namespace Manager or XsltContext needed. This query has a prefix, variable, or user-defined function." How can I make this work?

A The problem is managing XML namespaces correctly in LINQ to XML. You didn't mention whether you're working in VB or C#, so I'll show you how to solve this in C# with the .NET classes, and then show you the same solution with the Visual Basic shortcuts.

I'll use the example XML structure in Listing 1 (an expanded online version of this listing incorporates additional fields, customers, and orders). This XML declares a namespace without a prefix, so all unprefixed elements are in the namespace http:kadgen/CustomersAndOrders. An element's real name includes the namespace, and your queries need to reflect this.
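To make the namespace point concrete, here's a minimal C# sketch. The XML is a hypothetical, trimmed stand-in for Listing 1 (only the namespace and the Order and CustomerID names come from this column), and NamespaceDemo and its methods are illustrative names. Because the root declares a default namespace, looking up Order by local name alone finds nothing; the element's real name is the expanded {namespace}Order form.

```csharp
using System.Linq;
using System.Xml.Linq;

public static class NamespaceDemo
{
    // Hypothetical stand-in for Listing 1: the default namespace applies
    // to every unprefixed element, including Order and CustomerID.
    public const string Xml =
        @"<CustomersAndOrders xmlns='http:kadgen/CustomersAndOrders'>
            <Order><CustomerID>C1</CustomerID></Order>
            <Order><CustomerID>C2</CustomerID></Order>
          </CustomersAndOrders>";

    // Looks up Order by local name only, i.e. in no namespace.
    public static int CountWithoutNamespace() =>
        XDocument.Parse(Xml).Descendants("Order").Count();

    // Looks up Order by its full, namespace-qualified name.
    public static int CountWithNamespace()
    {
        XNamespace co = "http:kadgen/CustomersAndOrders";
        return XDocument.Parse(Xml).Descendants(co + "Order").Count();
    }

    // The expanded name LINQ to XML actually stores for the first Order.
    public static string FirstOrderName() =>
        XDocument.Parse(Xml).Root.Elements().First().Name.ToString();
}
```

The first count is zero and the second is two, which is exactly the mismatch that trips up a straight port of prefixed XPath.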

When you work with the SelectNodes method, you resolve namespaces with a namespace manager. If you don't supply a namespace manager that defines every prefix your XPath uses, you receive that error. With the namespace manager in place, the code looks like this:

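string xPath = 
   "//co:Order[.//co:ShipRegion[. = '" 
   + state + "']]";
XmlDocument xmlDoc = 
   new XmlDocument();
xmlDoc.Load(fileName); // fileName (assumed): path to your XML file
XmlNamespaceManager nsmgr = 
   new XmlNamespaceManager(xmlDoc.NameTable);
nsmgr.AddNamespace("co",
   "http:kadgen/CustomersAndOrders");
XmlNodeList nodeList = 
   xmlDoc.SelectNodes(xPath, nsmgr);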
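Here's a self-contained sketch of the SelectNodes approach that you can run, assuming a small hypothetical XML sample in place of Listing 1 (the ShipTo wrapper element is invented for illustration, as are the class and method names). It runs the same XPath with and without a namespace manager; without one, SelectNodes throws an XPathException carrying the "Namespace Manager or XsltContext needed" message.

```csharp
using System.Xml;
using System.Xml.XPath;

public static class SelectNodesDemo
{
    // Hypothetical sample data; the real Listing 1 has more fields.
    const string Xml =
        @"<CustomersAndOrders xmlns='http:kadgen/CustomersAndOrders'>
            <Order><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><ShipTo><ShipRegion>WA</ShipRegion></ShipTo></Order>
          </CustomersAndOrders>";

    // Orders with a ShipRegion descendant equal to the given state.
    static string OrdersIn(string state) =>
        "//co:Order[.//co:ShipRegion[. = '" + state + "']]";

    // Runs the query with a namespace manager that maps the co prefix.
    public static int CountOrders(string state)
    {
        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(Xml);
        var nsmgr = new XmlNamespaceManager(xmlDoc.NameTable);
        nsmgr.AddNamespace("co", "http:kadgen/CustomersAndOrders");
        return xmlDoc.SelectNodes(OrdersIn(state), nsmgr).Count;
    }

    // Runs the same query without a namespace manager and returns the
    // exception message ("" would mean no exception was thrown).
    public static string ErrorWithoutManager(string state)
    {
        var xmlDoc = new XmlDocument();
        xmlDoc.LoadXml(Xml);
        try
        {
            xmlDoc.SelectNodes(OrdersIn(state));
            return "";
        }
        catch (XPathException ex)
        {
            return ex.Message;
        }
    }
}
```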

You can convert this to work with an XDocument using the XPathSelectElements extension method in the System.Xml.XPath namespace:

XDocument xDoc = XDocument.Load(fileName);
IEnumerable<XElement> elements = 
   xDoc.XPathSelectElements(xPath, nsmgr);

This is an easy conversion, but you're stuck with the same complex XPath statements.
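A runnable sketch of the XPathSelectElements conversion, under the same assumptions (a hypothetical sample standing in for Listing 1, with illustrative names). XPathSelectElements accepts any IXmlNamespaceResolver, so an XmlNamespaceManager built over a fresh NameTable is enough when there's no XmlDocument around.

```csharp
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class XPathOnXDocumentDemo
{
    // Hypothetical sample data; the ShipTo wrapper is an assumption.
    const string Xml =
        @"<CustomersAndOrders xmlns='http:kadgen/CustomersAndOrders'>
            <Order><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><ShipTo><ShipRegion>WA</ShipRegion></ShipTo></Order>
          </CustomersAndOrders>";

    // Same XPath as the SelectNodes version, run against an XDocument.
    public static int CountOrders(string state)
    {
        XDocument xDoc = XDocument.Parse(Xml);
        var nsmgr = new XmlNamespaceManager(new NameTable());
        nsmgr.AddNamespace("co", "http:kadgen/CustomersAndOrders");
        string xPath = "//co:Order[.//co:ShipRegion[. = '" + state + "']]";
        return xDoc.XPathSelectElements(xPath, nsmgr).Count();
    }
}
```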

To rewrite this as a LINQ expression, you need to switch to naming elements with instances of the XName class. The XName and XNamespace classes don't have public constructors because .NET optimizes them through atomization. Instead, implicit conversion operators create XName objects from strings. You can create an XName instance from a string using the expanded notation:

XName xName = "{http:kadgen/CustomersAndOrders}Order";

If you had to write that every time you accessed an element, you'd go crazy! Instead, create a variable, perhaps at the class level, that holds an XNamespace instance, and use it wherever you need the namespace:

XNamespace co = "http:kadgen/CustomersAndOrders";
var q = from order 
   in xDoc.Descendants(
   co + "Order") 
   where order.Descendants(
   co + "ShipRegion").First().Value 
   == state
   select order;

You can use the resulting query in a foreach loop or convert it with the ToArray and ToList extension methods.
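Here's a minimal runnable version of that query, assuming a hypothetical sample with three orders rather than the real Listing 1 (class and method names are illustrative). It shows both ways of consuming the deferred query: iterating it with foreach and materializing it with ToList.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class OrdersByStateDemo
{
    static readonly XNamespace co = "http:kadgen/CustomersAndOrders";

    // Hypothetical sample: customer C1 ships twice to OR, C2 once to WA.
    const string Xml =
        @"<CustomersAndOrders xmlns='http:kadgen/CustomersAndOrders'>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C2</CustomerID><ShipTo><ShipRegion>WA</ShipRegion></ShipTo></Order>
          </CustomersAndOrders>";

    // The query from the column; nothing runs until it is enumerated.
    public static IEnumerable<XElement> OrdersIn(XDocument xDoc, string state) =>
        from order in xDoc.Descendants(co + "Order")
        where order.Descendants(co + "ShipRegion").First().Value == state
        select order;

    // Consumes the query with foreach, collecting each order's customer ID.
    public static List<string> CustomerIdsViaForeach(string state)
    {
        var ids = new List<string>();
        foreach (XElement order in OrdersIn(XDocument.Parse(Xml), state))
            ids.Add((string)order.Element(co + "CustomerID"));
        return ids;
    }

    // Consumes the query by materializing it with ToList.
    public static int CountViaToList(string state) =>
        OrdersIn(XDocument.Parse(Xml), state).ToList().Count;
}
```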

Visual Basic lets you define the namespace and prefix in an Imports statement:

Imports _
   <xmlns:co="http:kadgen/CustomersAndOrders">

VB also supplies shortcuts on the three main XML axes: child, descendant, and attribute. If a schema is available, VB offers IntelliSense specific to your XML structure along these axes. These shortcuts simplify the VB query:

Dim q = From order _
   In xDoc...<co:Order> _
   Where order...<co:ShipRegion>.Value _
   = state

LINQ to XML makes working with namespaces easier than any previous .NET XML API, because you use namespaces directly while atomization optimizes them in the background. This is true in both C# and Visual Basic, although VB takes the syntax simplification one step further.
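Atomization is what makes the XNamespace approach cheap: the expanded-notation string and the XNamespace-plus-local-name forms yield the very same XName instance. This minimal sketch (class and method names are illustrative) checks that directly.

```csharp
using System.Xml.Linq;

public static class AtomizationDemo
{
    // Class-level namespace, created once and reused for every name.
    static readonly XNamespace co = "http:kadgen/CustomersAndOrders";

    // True when both ways of naming the element produce one shared instance.
    public static bool SameInstance()
    {
        XName fromString = "{http:kadgen/CustomersAndOrders}Order";
        XName fromNamespace = co + "Order";
        return ReferenceEquals(fromString, fromNamespace);
    }

    // The two parts an XName carries.
    public static string LocalName() => (co + "Order").LocalName;
    public static string NamespaceName() => (co + "Order").NamespaceName;
}
```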

Q I'm currently using a lot of XPath against XmlDocument objects. What are the performance implications of converting these into LINQ to XML?

A Many LINQ to XML statements run significantly faster than their XPath counterparts. But this depends on how the query is constructed, so plan to check the performance of frequently used queries against the kinds of XML files you use.
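For those checks, a small Stopwatch-based helper is often enough. This is a hypothetical sketch, not the harness from the column's download: it runs the query once to warm up, times a number of further runs, and returns the average in milliseconds.

```csharp
using System;
using System.Diagnostics;

public static class QueryTimer
{
    // Sum of every run's result, exposed so the work has an observable effect.
    public static int LastResult { get; private set; }

    // Warm-up run first (JIT, first-use costs), then `iterations` timed runs.
    public static double AverageMilliseconds(Func<int> runQuery, int iterations)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException(nameof(iterations));

        int sink = runQuery();                  // warm-up run, not timed
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            sink += runQuery();                 // each call must force enumeration
        stopwatch.Stop();

        LastResult = sink;
        return stopwatch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Pass a lambda that forces the query to run, such as () => query.Count(). Because LINQ queries are deferred, timing a query you never enumerate measures almost nothing.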

In addition to being faster in many scenarios, LINQ is easier to tune for performance. It's hard to figure out what complex XPath statements are doing, whereas LINQ to XML queries are easier to understand, and delayed execution means you can break them up into logical chunks without incurring a performance penalty.

I'll use this non-trivial XPath statement to explore performance. It extracts all the customers who have shipped to a specific state from the XML in Listing 1:

Dim xPath As String = _
   "//co:Customer[@CustomerID = " & _
   "//co:ShipRegion[. = '" & _
   state & "']//ancestor::co:Order" & _
   "/co:CustomerID]"
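To have a baseline to compare against, here's a C# sketch that runs this XPath against an XDocument. The original statement was cut off after ancestor::co:Order, so the closing /co:CustomerID] step is an assumption based on the LINQ versions that follow, which match an Order's CustomerID child against a Customer's CustomerID attribute. The sample XML (customers C1 to C3, with C1 shipping twice to OR) and all names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml;
using System.Xml.Linq;
using System.Xml.XPath;

public static class CustomerXPathDemo
{
    // Hypothetical stand-in for Listing 1.
    const string Xml =
        @"<CustomersAndOrders xmlns='http:kadgen/CustomersAndOrders'>
            <Customer CustomerID='C1'><CompanyName>Alpha Co</CompanyName></Customer>
            <Customer CustomerID='C2'><CompanyName>Beta Co</CompanyName></Customer>
            <Customer CustomerID='C3'><CompanyName>Gamma Co</CompanyName></Customer>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C2</CustomerID><ShipTo><ShipRegion>WA</ShipRegion></ShipTo></Order>
          </CustomersAndOrders>";

    // Returns the CustomerID attribute of every Customer the XPath selects.
    public static List<string> CustomerIdsShippingTo(string state)
    {
        var nsmgr = new XmlNamespaceManager(new NameTable());
        nsmgr.AddNamespace("co", "http:kadgen/CustomersAndOrders");
        string xPath = "//co:Customer[@CustomerID = " +
            "//co:ShipRegion[. = '" + state + "']//ancestor::co:Order" +
            "/co:CustomerID]";   // assumed tail, see the note above
        return XDocument.Parse(Xml)
            .XPathSelectElements(xPath, nsmgr)
            .Select(c => (string)c.Attribute("CustomerID"))
            .ToList();
    }
}
```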

You can write this query in LINQ using three different approaches: a join, a nested query, or two adjacent queries. Throw in a few variations, such as using extension methods, and there are tons of possibilities. Performance is not equivalent: the LINQ versions range from about three times to about 30 times faster than the XPath version. A nested query looks a bit like a subquery in SQL Server:

Dim elements = _
   From customer In xDoc...<co:Customer> _
   Where (From order In xDoc...<co:Order> _
      Where order...<co:ShipRegion>.First().Value _
      = "OR" And _
      order.<co:CustomerID>.First().Value = _
      customer.@CustomerID _
      Select 1).Count() > 0 _
   Select customer.<co:CompanyName>.First().Value
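For comparison, here's how the nested query might look in C#, assuming the same kind of hypothetical sample and illustrative names. One deliberate change: it uses Any() in place of Count() > 0, which stops at the first matching order instead of counting them all.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class NestedQueryDemo
{
    static readonly XNamespace co = "http:kadgen/CustomersAndOrders";

    // Hypothetical stand-in for Listing 1.
    const string Xml =
        @"<CustomersAndOrders xmlns='http:kadgen/CustomersAndOrders'>
            <Customer CustomerID='C1'><CompanyName>Alpha Co</CompanyName></Customer>
            <Customer CustomerID='C2'><CompanyName>Beta Co</CompanyName></Customer>
            <Customer CustomerID='C3'><CompanyName>Gamma Co</CompanyName></Customer>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C2</CustomerID><ShipTo><ShipRegion>WA</ShipRegion></ShipTo></Order>
          </CustomersAndOrders>";

    // For each customer, scans all orders for one that ships to the state
    // and belongs to that customer.
    public static List<string> CompanyNamesShippedTo(string state)
    {
        XDocument xDoc = XDocument.Parse(Xml);
        var names =
            from customer in xDoc.Descendants(co + "Customer")
            where (from order in xDoc.Descendants(co + "Order")
                   where order.Descendants(co + "ShipRegion").First().Value == state
                      && order.Element(co + "CustomerID").Value
                         == (string)customer.Attribute("CustomerID")
                   select order).Any()
            select customer.Element(co + "CompanyName").Value;
        return names.ToList();
    }
}
```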

You can solve the same problem with a join:

Dim ns As XNamespace = GetXmlNamespace(co)
Dim elements = _
   From region In xDoc...<co:ShipRegion> _
   Join customer In xDoc...<co:Customer> _
      On region.Ancestors( _
      ns + "Order").<co:CustomerID>.Value _
      Equals customer.@CustomerID _
   Where region.Value = "OR" _
   Select customer.<co:CompanyName>.Value _
   Distinct

Both produce the same results, but performance testing (available in the download) shows the join to be nearly 10 times faster than the nested query. The nested query rescans every order for every customer, while LINQ's Join builds a hash-based lookup over one sequence once and matches against it. The join is also easier to read and shorter, so it wins hands down.
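A C# sketch of the same join, under the same assumptions (hypothetical sample, illustrative names). Because a customer with two Oregon orders matches twice, it ends with Distinct(), as the C# versions later in this column do.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class JoinQueryDemo
{
    static readonly XNamespace co = "http:kadgen/CustomersAndOrders";

    // Hypothetical stand-in for Listing 1.
    const string Xml =
        @"<CustomersAndOrders xmlns='http:kadgen/CustomersAndOrders'>
            <Customer CustomerID='C1'><CompanyName>Alpha Co</CompanyName></Customer>
            <Customer CustomerID='C2'><CompanyName>Beta Co</CompanyName></Customer>
            <Customer CustomerID='C3'><CompanyName>Gamma Co</CompanyName></Customer>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C2</CustomerID><ShipTo><ShipRegion>WA</ShipRegion></ShipTo></Order>
          </CustomersAndOrders>";

    // Joins each ShipRegion's enclosing Order to the matching Customer.
    public static List<string> CompanyNamesShippedTo(string state)
    {
        XDocument xDoc = XDocument.Parse(Xml);
        var names =
            from region in xDoc.Descendants(co + "ShipRegion")
            join customer in xDoc.Descendants(co + "Customer")
                on region.Ancestors(co + "Order").First().Element(co + "CustomerID").Value
                equals (string)customer.Attribute("CustomerID")
            where region.Value == state
            select customer.Element(co + "CompanyName").Value;
        // One result per matching order, so remove repeated company names.
        return names.Distinct().ToList();
    }
}
```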

You can simplify this query by splitting it into two parts, which also makes debugging easier. First, determine the customer IDs for the orders you're interested in, and then determine what company names belong to these customer IDs with a join:

Dim ns As XNamespace = GetXmlNamespace(co)
Dim selectedCustIDs = _
   From order In xDoc...<co:Order> _
   Where order...<co:ShipRegion>.Value = "OR" _
   Select order...<co:CustomerID>.Value
Dim elements = _
   From customer _
   In xDoc.Descendants(ns + "Customer") _
   Join selectedCustId _
   In selectedCustIDs _
   On customer.@CustomerID _
   Equals selectedCustId _
   Select customer.<co:CompanyName>.Value _
   Distinct

There are a few variations, including using the Contains method on the sequence of selected customer IDs, but the join is faster.
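For reference, here's a sketch of the Contains variation in C#, again against a hypothetical sample with illustrative names. Because selectedCustIDs is a deferred query rather than a collection, each Contains call re-runs it for every customer; that is one plausible reason the join, which builds its lookup once, comes out ahead.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class ContainsQueryDemo
{
    static readonly XNamespace co = "http:kadgen/CustomersAndOrders";

    // Hypothetical stand-in for Listing 1.
    const string Xml =
        @"<CustomersAndOrders xmlns='http:kadgen/CustomersAndOrders'>
            <Customer CustomerID='C1'><CompanyName>Alpha Co</CompanyName></Customer>
            <Customer CustomerID='C2'><CompanyName>Beta Co</CompanyName></Customer>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C2</CustomerID><ShipTo><ShipRegion>WA</ShipRegion></ShipTo></Order>
          </CustomersAndOrders>";

    public static List<string> CompanyNamesShippedTo(string state)
    {
        XDocument xDoc = XDocument.Parse(Xml);

        // Deferred: re-evaluated every time Contains enumerates it.
        var selectedCustIDs =
            from order in xDoc.Descendants(co + "Order")
            where order.Descendants(co + "ShipRegion").First().Value == state
            select order.Descendants(co + "CustomerID").First().Value;

        // Each customer appears once, so no Distinct is needed here.
        return (from customer in xDoc.Descendants(co + "Customer")
                where selectedCustIDs.Contains((string)customer.Attribute("CustomerID"))
                select customer.Element(co + "CompanyName").Value).ToList();
    }
}
```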

Execution is deferred, so defining the first query doesn't run it; the second query simply composes it, and nothing executes until you iterate the second query. Assuming you iterate only the second query, the work done is essentially that of the earlier single join, and performance is about the same. You get increased readability and easier debugging for free.
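You can observe the deferred execution directly. This sketch (hypothetical sample and names) counts how many Order elements the first query touches: the count is still zero after both queries are defined, and only iterating the second query drives the first.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class DeferredDemo
{
    static readonly XNamespace co = "http:kadgen/CustomersAndOrders";

    // Hypothetical stand-in for Listing 1: three orders in total.
    const string Xml =
        @"<CustomersAndOrders xmlns='http:kadgen/CustomersAndOrders'>
            <Customer CustomerID='C1'><CompanyName>Alpha Co</CompanyName></Customer>
            <Customer CustomerID='C2'><CompanyName>Beta Co</CompanyName></Customer>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C2</CustomerID><ShipTo><ShipRegion>WA</ShipRegion></ShipTo></Order>
          </CustomersAndOrders>";

    // Number of Order elements the first query has actually examined.
    public static int OrdersVisited { get; private set; }
    // Snapshot of OrdersVisited taken after both queries are defined.
    public static int VisitsBeforeIteration { get; private set; }

    // Pass-through used only to observe when the first query runs.
    static XElement CountVisit(XElement order)
    {
        OrdersVisited++;
        return order;
    }

    public static List<string> RunDemo(string state)
    {
        OrdersVisited = 0;
        XDocument xDoc = XDocument.Parse(Xml);

        // First query: customer IDs of orders shipped to the state.
        var selectedCustIDs =
            from order in xDoc.Descendants(co + "Order").Select(o => CountVisit(o))
            where order.Descendants(co + "ShipRegion").First().Value == state
            select order.Element(co + "CustomerID").Value;

        // Second query: composes the first one without running it.
        var names =
            from customer in xDoc.Descendants(co + "Customer")
            join id in selectedCustIDs
                on (string)customer.Attribute("CustomerID") equals id
            select customer.Element(co + "CompanyName").Value;

        VisitsBeforeIteration = OrdersVisited;  // still 0: neither query has run
        return names.Distinct().ToList();       // iterating the second query runs the first
    }
}
```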

You can also write this query in C#:

XNamespace ns = "http:kadgen/CustomersAndOrders";
var selectedCustIDs = 
   from order in 
      xDoc.Descendants(ns + "Order")
   where order.Descendants(
      ns + "ShipRegion").First().Value == "OR"
   select order.Descendants(
      ns + "CustomerID").First().Value;
var elements = ( 
   from customer in 
      xDoc.Descendants(ns + "Customer")
   join selectedCustId in selectedCustIDs
      on customer.Attribute(
      "CustomerID").Value equals selectedCustId
   select customer.Elements(
      ns + "CompanyName").First()).Distinct();

That's harder to read, but you can clean it up a good bit by writing a few extension methods:

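using System.Linq;
using System.Xml.Linq;

public static class XElementExtensions // class name is illustrative
{
   public static XElement FirstDescendant(
      this XElement element, 
      XNamespace ns, string name)
   {
      return element.Descendants(
         ns + name).FirstOrDefault(); // null if none
   }

   public static XElement 
      FirstElement(this XElement element, 
      XNamespace ns, string name)
   {
      return element.Elements(
         ns + name).FirstOrDefault(); // null if none
   }
}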

These extension methods let you describe the intent of your selection along an axis in a single method call:

var selectedCustIDs = 
   from order in
      xDoc.Descendants(ns + "Order")
   where order.FirstDescendant(
      ns, "ShipRegion").Value == "OR"
   select order.FirstDescendant(
      ns, "CustomerID").Value;
var elements = (
   from customer in 
      xDoc.Descendants(ns + "Customer")
   join selectedCustId 
      in selectedCustIDs
   on customer.Attribute(
      "CustomerID").Value equals selectedCustId
   select customer.FirstElement(
      ns, "CompanyName")).Distinct();
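Putting the pieces together, here's a self-contained, runnable sketch of the extension-method version. The extension class name, the sample XML, and the CompanyQuery wrapper are illustrative assumptions; the query itself follows the column.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XElementAxisExtensions
{
    // First descendant with the given name, or null.
    public static XElement FirstDescendant(this XElement element, XNamespace ns, string name) =>
        element.Descendants(ns + name).FirstOrDefault();

    // First child element with the given name, or null.
    public static XElement FirstElement(this XElement element, XNamespace ns, string name) =>
        element.Elements(ns + name).FirstOrDefault();
}

public static class CompanyQuery
{
    static readonly XNamespace ns = "http:kadgen/CustomersAndOrders";

    // Hypothetical stand-in for Listing 1.
    const string Xml =
        @"<CustomersAndOrders xmlns='http:kadgen/CustomersAndOrders'>
            <Customer CustomerID='C1'><CompanyName>Alpha Co</CompanyName></Customer>
            <Customer CustomerID='C2'><CompanyName>Beta Co</CompanyName></Customer>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C2</CustomerID><ShipTo><ShipRegion>WA</ShipRegion></ShipTo></Order>
          </CustomersAndOrders>";

    public static List<string> CompanyNamesShippedTo(string state)
    {
        XDocument xDoc = XDocument.Parse(Xml);
        var selectedCustIDs =
            from order in xDoc.Descendants(ns + "Order")
            where order.FirstDescendant(ns, "ShipRegion").Value == state
            select order.FirstDescendant(ns, "CustomerID").Value;
        var elements = (
            from customer in xDoc.Descendants(ns + "Customer")
            join selectedCustId in selectedCustIDs
                on (string)customer.Attribute("CustomerID") equals selectedCustId
            select customer.FirstElement(ns, "CompanyName")).Distinct();
        return elements.Select(e => e.Value).ToList();
    }
}
```

Distinct works on the XElement results here because the join yields the same CompanyName element instance each time a customer matches, and XElement compares by reference.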

C# and VB both let you squeeze out a bit more performance if you cache your XName objects so they're created only once.
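One way to do that in C# is to hold the names in static readonly fields, so each XName is built once when the class initializes and every query reuses it. A minimal sketch with illustrative names and a hypothetical sample:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class CachedNames
{
    // Initialized once, in textual order, when the class is first used.
    static readonly XNamespace Co = "http:kadgen/CustomersAndOrders";
    static readonly XName OrderName = Co + "Order";
    static readonly XName ShipRegionName = Co + "ShipRegion";
    static readonly XName CustomerIdName = Co + "CustomerID";

    // Hypothetical sample data.
    const string Xml =
        @"<CustomersAndOrders xmlns='http:kadgen/CustomersAndOrders'>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C1</CustomerID><ShipTo><ShipRegion>OR</ShipRegion></ShipTo></Order>
            <Order><CustomerID>C2</CustomerID><ShipTo><ShipRegion>WA</ShipRegion></ShipTo></Order>
          </CustomersAndOrders>";

    // Same query shape as before, with no per-call XName construction.
    public static List<string> CustomerIdsIn(string state) =>
        (from order in XDocument.Parse(Xml).Descendants(OrderName)
         where order.Descendants(ShipRegionName).First().Value == state
         select order.Element(CustomerIdName).Value).ToList();
}
```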

I'm not ready to guarantee that LINQ to XML statements will always be faster than their XPath equivalents, but in my experience, once I get things tuned, LINQ to XML is at least as fast and usually faster.

About the Author

Kathleen is a consultant, author, trainer and speaker. She’s been a Microsoft MVP for 10 years and is an active member of the INETA Speaker’s Bureau where she receives high marks for her talks. She wrote "Code Generation in Microsoft .NET" (Apress) and often speaks at industry conferences and local user groups around the U.S. Kathleen is the founder and principal of GenDotNet and continues to research code generation and metadata as well as leveraging new technologies springing forth in .NET 3.5. Her passion is helping programmers be smarter in how they develop and consume the range of new technologies, but at the end of the day, she’s a coder writing applications just like you. Reach her at
