Save Time With LINQ Queries -- Visual Studio Magazine

Developer Product Briefs

Save Time With LINQ Queries

See how the LINQ syntax, specifically DLinq and XLinq, can increase your productivity and reduce the possibility for error.

By Bill Wagner
05/01/2006

Suppose different departments in your business spoke different languages: English in software development, French in accounting, German in human resources, and Spanish in marketing. This scenario would be disastrous for productivity, regardless of which language was your favorite. That's why you rarely encounter it: most businesses standardize on a single language and vocabulary to make communication more efficient.

Software is different. We have created entirely different languages to process data based on its location. We have C# and VB.NET for general computing. We've created SQL to query and modify data stored in a relational database. We've created XSLT, DOM, and XQuery to process data stored in an XML document.

Think about this for a minute: The tools you choose for development are based on the location of the data, not the kind of data you are manipulating. For example, an Employee object in memory is processed differently from an Employee record in a database, and you've even got different methods to process the same Employee information in an XML document. Why?

This problem, a superset of Object Relational Mapping (ORM), is the justification behind Language Integrated Query (LINQ). The concept behind LINQ is simple: You can use your favorite general-purpose programming language to manipulate data regardless of its location.

While this concept is simple, you need to familiarize yourself with the new technology emerging around it: the language extensions that support LINQ; DLinq, which translates LINQ queries into code that can execute in the context of a database; and XLinq, which handles queries and manipulation of data in an XML document.

So the LINQ framework developers are still concerned with where data resides, but you don't have to worry about it. You can write code in your favorite language, and the LINQ assemblies do the hard work for you. In this article you'll see how LINQ queries can save you time by simplifying the set of tools you use to manipulate data.

Language Extensions
To make LINQ technologies work, the C# and VB.NET teams have added many language features that support querying objects. Here is a simple C# example that queries an array for numbers divisible by 5:

int [] numbers = ReadNumbers();
var numsDivFive = 
  from n in numbers 
  where n % 5 == 0 
  select n.ToString();

These two statements introduce several new language features, so look at them one at a time. Start with from n in numbers, which defines an enumeration across a collection. It means roughly the same as foreach (int n in numbers): it declares a variable (n) that takes on the value of each element in turn, and it names the collection being enumerated (numbers).

Next, look at select n.ToString(). It defines the value produced for each element in the result. In this case, the code converts each number n to a string by calling its ToString() method.

Finally, examine the Where clause: where n % 5 == 0. It defines a predicate that is tested for each and every element.
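The snippet above relies on a ReadNumbers() helper the article doesn't show. The following self-contained sketch assumes a hard-coded array in its place and the System.Linq namespace of the released .NET Framework, rather than the 2006 preview namespaces. It also shows the method-call form that the query expression corresponds to.

```csharp
using System;
using System.Linq;

// Hard-coded input stands in for the article's ReadNumbers() helper.
int[] numbers = { 3, 5, 12, 20, 25, 31, 40 };

// Query expression syntax, as in the article.
var numsDivFive =
    from n in numbers
    where n % 5 == 0
    select n.ToString();

// Equivalent method-call syntax: Where filters, Select projects.
var viaMethods = numbers
    .Where(n => n % 5 == 0)
    .Select(n => n.ToString());

Console.WriteLine(string.Join(", ", numsDivFive));        // prints "5, 20, 25, 40"
Console.WriteLine(numsDivFive.SequenceEqual(viaMethods)); // prints "True"
```

Both forms build the same pipeline; the C# compiler rewrites the query expression into the method calls.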

Where clauses are critical to understanding LINQ. You can write any Boolean filter condition, and LINQ will translate it into executable code. Fully understanding how where clauses work will help you recognize the power of the LINQ syntax. Where clauses are built on two concepts: extension methods and lambda expressions.

Extension methods are static methods that you write to extend the public interface of an existing type without modifying the type itself. The Where method used in the previous code sample is one of the standard sequence operators, and Microsoft has delivered its source with the LINQ preview builds:

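// Containing class shown for completeness: extension methods
// must be declared in a non-nested static class.
public static class Sequence
{
    public static IEnumerable<T> Where<T>(
        this IEnumerable<T> source, Func<T, bool> predicate) 
    {
        foreach (T element in source) 
        {
            if (predicate(element)) yield return element;
        }
    }
}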

By placing the this modifier on the source parameter, the Where method acts as though it were a member of any type that implements IEnumerable<T>, effectively extending the interface of IEnumerable<T>. Strict method-resolution rules prevent extension methods from interfering with regular methods: if a type already has an applicable instance method, the compiler chooses it over any extension method. (A full discussion of these rules is beyond the scope of this article.)
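Here is a minimal sketch of both points, using hypothetical names (TakeEvery, Counter, Describe) invented for illustration. TakeEvery is a sequence operator written the same way as Where; the two Describe methods show that when a type already has a matching instance method, the compiler calls it instead of the extension method.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] data = { 10, 11, 12, 13, 14, 15 };

// Called as if TakeEvery were a member of IEnumerable<int>.
List<int> everySecond = data.TakeEvery(2).ToList();
Console.WriteLine(string.Join(", ", everySecond)); // prints "10, 12, 14"

// The instance method wins over the extension method with the same signature.
var counter = new Counter();
Console.WriteLine(counter.Describe()); // prints "instance method"

// Hypothetical type with its own Describe() method.
public class Counter
{
    public string Describe() => "instance method";
}

// Extension methods must live in a non-nested, non-generic static class.
public static class Extensions
{
    // Yields every step-th element, starting with the first one.
    public static IEnumerable<T> TakeEvery<T>(this IEnumerable<T> source, int step)
    {
        int index = 0;
        foreach (T element in source)
        {
            if (index % step == 0) yield return element;
            index++;
        }
    }

    // Never chosen for counter.Describe(): the instance method takes priority.
    public static string Describe(this Counter counter) => "extension method";
}
```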

The second parameter of Where, the predicate, is a lambda expression. A lambda expression is an anonymous function; you can think of lambda expressions as a way of passing functions as data. In .NET, lambda expressions are built using delegates. The delegate Func is defined in sequence.cs as:

public delegate T Func<A0, T>(A0 arg0);

The Func delegate represents any function that takes a single argument of type A0 and returns a value of type T. The compiler translates the where clause into a method and creates a delegate from it. In a sense, the compiler writes this method for you:

bool MyPredicate(int n)
{
    return n % 5 == 0;
}

The compiler also translates the where clause of your query into a call to the Where extension method, passing a Func delegate built from that method:

numbers.Where(new Func<int, bool>(MyPredicate));

Delegates are not magic. What's new is that the C# team added powerful compiler extensions that translate query expressions into delegates for you.
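To see that these forms are interchangeable, this sketch passes the same predicate to Where as a named method wrapped in an explicit delegate, as a C# 2.0 anonymous method, and as a lambda expression. It assumes the released framework, where Func<T, TResult> lives in the System namespace rather than in the preview's sequence.cs.

```csharp
using System;
using System.Linq;

int[] numbers = { 1, 5, 10, 14, 15 };

// 1. A named method, wrapped explicitly in a Func<int, bool> delegate.
var named = numbers.Where(new Func<int, bool>(MyPredicate));

// 2. A C# 2.0 anonymous method converted to the same delegate type.
var anonymous = numbers.Where(delegate (int n) { return n % 5 == 0; });

// 3. A lambda expression, which the compiler also turns into a delegate.
var lambda = numbers.Where(n => n % 5 == 0);

Console.WriteLine(string.Join(", ", lambda)); // prints "5, 10, 15"
Console.WriteLine(named.SequenceEqual(lambda) && anonymous.SequenceEqual(lambda)); // prints "True"

// The method the compiler, in a sense, writes for you.
static bool MyPredicate(int n) => n % 5 == 0;
```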

Finally, you've arrived at the "dreaded" var keyword and local type inference. This keyword can seem confusing if you think of var as meaning untyped, a placeholder that can hold any type. It isn't: a variable declared with var is strongly typed, and the compiler infers its type from the initializer. To read such a declaration, mentally substitute the compile-time type of the right-hand side for var.
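A short sketch of that inference, using the released System.Linq namespace. The commented-out line is an assignment the compiler rejects, which would not happen if var meant "untyped."

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 5, 7, 10 };

// The compiler infers IEnumerable<string> from the select clause.
var numsDivFive = from n in numbers where n % 5 == 0 select n.ToString();

// Because the inferred type is a real static type, no cast is needed here...
IEnumerable<string> explicitlyTyped = numsDivFive;

// ...and string members such as PadLeft are checked at compile time.
foreach (var s in numsDivFive)
    Console.WriteLine(s.PadLeft(3, '0')); // prints "005", then "010"

// numsDivFive = 42; // would not compile: int is not IEnumerable<string>
```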

In the first example, the type is IEnumerable<string>. The select clause produces a sequence of strings, so the compiler infers that numsDivFive must be an IEnumerable<string>. The var keyword was added because some queries create new types for you. More importantly, these types are anonymous: the compiler gives each new type a name without telling you what it is.

For example, this snippet of code returns a sequence of a new type:

List<Customer> customers = GetCustomerList();
var orders =
    from c in customers
    from o in c.Orders
    where o.Total < 500.00M
    select new { c.CustomerID, o.OrderID, o.Total };

The concept in this example is rather simple. The query examines all orders from all customers using a compound from clause. If the total cost of an order is less than 500, the query adds that order to the resulting collection. But it doesn't return the entire order. It returns a new type that contains three public properties: the customer ID, the order ID, and the total cost. This new type was created by the compiler. Here's the definition the compiler creates:

public class ????? {
  private int customerID;
  public int CustomerID
  { 
    get { return customerID; }
    set { customerID = value; }
  }
  private int orderID;
  public int OrderID
  {
    get { return orderID; }
    set { orderID = value; }
  }
  private decimal total;
  public decimal Total
  {
    get { return total; }
    set { total = value; }
  }
}

It's essentially the same code you'd have created yourself. The only difference is that you don't know the name of this new type; it's an anonymous type. So, what type do you specify for orders? It's IEnumerable<?????>, but what replaces the question marks? The compiler knows, but you don't.

This is why the C# language designers added the var keyword. You can use var to declare local variables when you don't know the name of the anonymous type that the compiler created for you. Variables declared with var must be initialized at the point they are declared, because the initializer is the only way the compiler can determine their type.
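Because GetCustomerList() isn't shown, this runnable sketch substitutes a small, hypothetical data set built from anonymous types. It uses the released C# 3.0 form of the compound from clause (one from keyword per generator) and then shows two properties of anonymous types in the shipped compiler: within one assembly, identical shapes share a single generated type, and that type compares by value. Unlike the preview-era class shown above, released anonymous types have read-only properties.

```csharp
using System;
using System.Linq;

// Hypothetical in-memory data standing in for GetCustomerList().
var customers = new[]
{
    new { CustomerID = "ALFKI", Orders = new[] {
        new { OrderID = 1, Total = 814.50M },
        new { OrderID = 2, Total = 320.00M } } },
    new { CustomerID = "BONAP", Orders = new[] {
        new { OrderID = 3, Total = 99.95M } } },
};

// Compound from clause: one from per level of the customer/order hierarchy.
// There is no type name you could write for orders, so var is required.
var orders =
    from c in customers
    from o in c.Orders
    where o.Total < 500.00M
    select new { c.CustomerID, o.OrderID, o.Total };

foreach (var row in orders)
    Console.WriteLine($"{row.CustomerID} {row.OrderID}"); // prints "ALFKI 2", then "BONAP 3"

// Same property names, types, and order: one compiler-generated type,
// whose Equals compares property values rather than references.
var first = new { CustomerID = "ALFKI", OrderID = 2 };
var second = new { CustomerID = "ALFKI", OrderID = 2 };
Console.WriteLine(first.GetType() == second.GetType()); // prints "True"
Console.WriteLine(first.Equals(second));                // prints "True"

// first.OrderID = 3; // would not compile: released anonymous types are read-only
// var pending;       // would not compile: var requires an initializer
```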

Now you know the basic syntax for LINQ queries. The language designers have added powerful new features to C# that enable you to write SQL-like queries on any arbitrary sequence. Simply put, if you can foreach it, you can query it. You can also use many other query operators: order by, group by, skip, distinct, union, and intersect. These new features add to your favorite programming language the query capability that's normally associated with relational data.
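A quick tour of those operators on two small integer arrays, as a sketch using the released System.Linq method names (OrderBy, GroupBy, Skip, Distinct, Union, Intersect). In C#, only orderby and group ... by have query-keyword forms; the rest are called as extension methods. The element order noted in the comments is what LINQ to Objects produces in practice; Distinct and Union are not formally documented to preserve order.

```csharp
using System;
using System.Linq;

int[] a = { 4, 1, 3, 1, 5 };
int[] b = { 5, 6, 1 };

// orderby and group ... by have query-expression keywords.
var sorted = from n in a orderby n select n;   // 1, 1, 3, 4, 5
var byParity = from n in a group n by n % 2;   // keys 0 and 1

// The remaining operators are called directly as extension methods.
var skipped = a.Skip(2);                       // 3, 1, 5
var distinct = a.Distinct();                   // 4, 1, 3, 5
var union = a.Union(b);                        // 4, 1, 3, 5, 6
var intersect = a.Intersect(b);                // 1, 5

foreach (var g in byParity)
    Console.WriteLine($"{g.Key}: {string.Join(", ", g)}");
// prints "0: 4", then "1: 1, 3, 1, 5"
```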

DLinq: Bridging the Object/Relational Gap
So far you've received a whirlwind tour of the new language features in LINQ that enable you to query objects in memory. DLinq was designed to address several issues arising from the differences between relational data and object data. One of its general goals is to provide better programming language support for relational expressions. The current .NET APIs you use for working with relational data raise several concerns, many of which revolve around loosely bound arguments that prevent the compiler from catching errors in your database access code.

Check out this sample code containing some common database access practices (I've added comments where runtime errors would be generated if you made any mistakes):

using System;
using System.Data.SqlClient;

// Connection string is a literal string. 
// It may or may not be correct for the current database
SqlConnection c = new SqlConnection(...);
c.Open();
// Queries are literal strings. Runtime errors.
SqlCommand cmd = new SqlCommand(
  @"SELECT c.Name, c.Phone
       FROM Customers c
       WHERE c.City = @p0", c);
// Loosely bound arguments. Runtime errors
cmd.Parameters.AddWithValue("@p0", "London");
SqlDataReader dr = cmd.ExecuteReader();
while (dr.Read()) {
    // loosely typed results introduce runtime errors.
    // No compile time type checks on types of results
    string name = dr.GetString(0);
    string phone = dr.GetString(1);
    // Compiles, but throws at runtime: the query returns only two columns.
    DateTime date = dr.GetDateTime(2);
}
dr.Close();
c.Close();

Contrast that with a similar LINQ version:

// Classes describe data.
[Table(Name="Customers")]
public class Customer 
{
  // Attributes map fields to table columns:
  [Column(Id=true)]
  public int Id;
  [Column]
  public string Name;
  [Column]
  public string Phone;
  [Column]
  public string City;
}

// Classes describe database connections:
public class Northwind: DataContext
{
    // Tables are like collections
    public Table<Customer> Customers;
    // ...
}

// Strongly typed connection, but connection
// string is still a string.
Northwind db = new Northwind(...);
// Same familiar LINQ syntax
var contacts =
    from c in db.Customers
    where c.City == "London"
    // strongly typed results. No casts or runtime errors
    select new { c.Name, c.Phone };

Resolving problems with loosely bound arguments is only the beginning; you need to think about other important issues. The three biggest concerns with current .NET APIs are object equality, relationships between tables, and database updates.

Object equality is a serious concern. C# and VB.NET classes use reference semantics unless you override Equals (and GetHashCode) to provide value semantics. This means two object variables are equal if they refer to the same object in memory. However, relational databases define equality by comparing the values of the primary keys.
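A minimal sketch of the two equality models, using hypothetical PlainCustomer and KeyedCustomer classes: the first keeps C#'s default reference semantics, while the second overrides Equals and GetHashCode to compare by a key field, which is closer to how a relational database defines equality.

```csharp
using System;

var a = new PlainCustomer { CustomerID = "ALFKI" };
var b = new PlainCustomer { CustomerID = "ALFKI" };
// Reference semantics: same data, but two distinct objects, so not equal.
Console.WriteLine(a.Equals(b)); // prints "False"

var c = new KeyedCustomer { CustomerID = "ALFKI" };
var d = new KeyedCustomer { CustomerID = "ALFKI" };
// The overridden Equals compares the key, the way a database compares primary keys.
Console.WriteLine(c.Equals(d)); // prints "True"

// Hypothetical class that keeps the default (reference) equality.
public class PlainCustomer
{
    public string CustomerID = "";
}

// Hypothetical class that defines equality by its key field.
public class KeyedCustomer
{
    public string CustomerID = "";

    public override bool Equals(object? obj) =>
        obj is KeyedCustomer other && other.CustomerID == CustomerID;

    // Equal objects must produce equal hash codes.
    public override int GetHashCode() => CustomerID.GetHashCode();
}
```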

In DLinq, the DataContext class provides the bridge between these models. It works by keeping track of the database records that you have retrieved through the data context. Whenever a query returns a record that the DataContext is already tracking, the DataContext returns a reference to the existing object instead of creating a new one. This means the DataContext class manages the conversion between the database's concept of object identity (primary key) and the Common Language Runtime's (CLR's) concept of object identity (object reference). So if your query requests an object that is already present, the query can short-circuit, and no database communication takes place.
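The bookkeeping described here is commonly called the identity map pattern. The following is a minimal in-memory sketch of that pattern, not DLinq's actual implementation: a dictionary of hypothetical rows stands in for the database, and a second dictionary keyed by primary key guarantees that loading the same key twice returns the same object reference and touches the "database" only once. The Customer class and the Load helper are hypothetical.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical "database" table: primary key -> phone number.
var rows = new Dictionary<string, string> { ["ALFKI"] = "030-0074321" };

// Identity map: primary key -> the single object materialized for that key.
var identityMap = new Dictionary<string, Customer>();
int databaseHits = 0;

Customer Load(string key)
{
    // Short-circuit: return the tracked object if this key was loaded before.
    if (identityMap.TryGetValue(key, out var cached)) return cached;

    databaseHits++; // reached only when the key isn't tracked yet
    var customer = new Customer(key, rows[key]);
    identityMap[key] = customer;
    return customer;
}

Customer first = Load("ALFKI");
Customer second = Load("ALFKI");
Console.WriteLine(ReferenceEquals(first, second)); // prints "True"
Console.WriteLine(databaseHits);                    // prints "1"

// Hypothetical entity class; it keeps the default reference equality.
public class Customer
{
    public Customer(string customerID, string phone)
    {
        CustomerID = customerID;
        Phone = phone;
    }

    public string CustomerID { get; }
    public string Phone { get; }
}
```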

The second major concern with current .NET APIs is managing relationships between different object types. In relational data, entity relationships are defined by links between primary and foreign keys. When you create object models, you manage the relationships using containment. For example, a customer might contain a List<Order>.

DLinq manages relationships using the EntitySet and EntityRef classes and the Association attribute, as demonstrated by these two class definitions:

[Table(Name="Customers")]
public class Customer
{
	[Column(Id=true)]
	public string CustomerID;
	...
	private EntitySet<Order> _Orders;
	[Association(Storage="_Orders", OtherKey="CustomerID")]
	public EntitySet<Order> Orders {
		get { return this._Orders; }
		set { this._Orders.Assign(value); }
	}
}


[Table(Name="Orders")]
public class Order
{
	[Column(Id=true)]
	public int OrderID;
	[Column]
	public string CustomerID;
	private EntityRef<Customer> _Customer;    
	[Association(Storage="_Customer", ThisKey="CustomerID")]
	public Customer Customer {
		get { return this._Customer.Entity; }
		set { this._Customer.Entity = value; }
	}
}

The Customer class contains a set of Orders: an EntitySet object holds the many orders that relate to the one customer. The Association attribute on the Customer's Orders property describes the foreign key relationship, naming the field in the Orders table (OtherKey="CustomerID") that maps to the primary key of the Customers table. The Order class contains an EntityRef that references the customer who placed the order; here, the Association attribute's ThisKey names the field in the Orders table that maps to the CustomerID in the Customers table. Together, these classes and attributes let the low-level DLinq libraries follow relationships and build complex sets of objects, providing the mapping between the database schema and the class definitions you create.
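To make the two shapes concrete without a database, this sketch uses LINQ to Objects rather than DLinq: flat, hypothetical "rows" linked by a CustomerID foreign key are turned into customer objects that contain their orders. The group join performs by hand the key matching that the Association attributes above describe declaratively.

```csharp
using System;
using System.Linq;

// Relational shape: two flat, hypothetical "tables" linked by a CustomerID foreign key.
var customerRows = new[] { new { CustomerID = "ALFKI" }, new { CustomerID = "BONAP" } };
var orderRows = new[]
{
    new { OrderID = 1, CustomerID = "ALFKI" },
    new { OrderID = 2, CustomerID = "ALFKI" },
    new { OrderID = 3, CustomerID = "BONAP" },
};

// Object shape: a group join collects each customer's orders by matching keys,
// the relationship an Association attribute describes declaratively.
var customers =
    from c in customerRows
    join o in orderRows on c.CustomerID equals o.CustomerID into orders
    select new { c.CustomerID, Orders = orders.ToList() };

foreach (var customer in customers)
    Console.WriteLine($"{customer.CustomerID}: {customer.Orders.Count} order(s)");
// prints "ALFKI: 2 order(s)", then "BONAP: 1 order(s)"
```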

Updating the database is the third major issue with current .NET APIs. The DataContext class provides methods for submitting updates and for rolling back changes. The DataContext.SubmitChanges() method submits changes to the database using an optimistic concurrency model; if it detects a concurrency conflict, SubmitChanges throws a concurrency exception. The RejectChanges() method discards all pending changes. Other DLinq classes and methods support explicit database transactions as well.
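Optimistic concurrency means no locks are held while you work; instead, each update checks at submit time whether the data changed since you read it. The following sketch shows that general technique with a hypothetical in-memory store and a version number per row. It is an illustration only, not DLinq's SubmitChanges implementation; Submit, the store, and the version scheme are all assumptions.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical store: primary key -> (column value, row version).
var store = new Dictionary<string, (string Phone, int Version)>
{
    ["ALFKI"] = ("030-0074321", 1)
};

// Optimistic update: succeeds only if the row is unchanged since it was read.
void Submit(string key, string newPhone, int versionWhenRead)
{
    var current = store[key];
    if (current.Version != versionWhenRead)
        throw new InvalidOperationException($"Concurrency conflict on {key}");
    store[key] = (newPhone, current.Version + 1); // each write bumps the version
}

// Two users read the same row (version 1) without taking any locks.
int readByAlice = store["ALFKI"].Version;
int readByBob = store["ALFKI"].Version;

Submit("ALFKI", "030-1111111", readByAlice); // succeeds; version becomes 2
try
{
    Submit("ALFKI", "030-2222222", readByBob); // stale version, so it is rejected
}
catch (InvalidOperationException e)
{
    Console.WriteLine(e.Message); // prints "Concurrency conflict on ALFKI"
}
Console.WriteLine(store["ALFKI"].Phone); // prints "030-1111111"
```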

The DataContext class inside DLinq does lots of work to map between the database model and the object model correctly. It manages the relationships between table definitions and class definitions. It manages the different definitions of equality between the two worlds. It manages the different mechanisms for relationships between tables and relationships between objects. Finally, it manages updates, concurrency, and transactions (using other DLinq classes). In short, the DLinq portions of LINQ provide what most people think of when they discuss ORM alternatives.

XLinq: Processing XML Documents
Just mention XQuery or XML DOM programming to most C# or VB.NET developers, and they'll break out in a cold sweat (me included). Neither is particularly intuitive, and each requires learning an entirely new syntax or programming model.

XLinq improves the situation in two major ways. First, XLinq brings the power of XQuery and XPath into .NET languages, reducing the new syntax you need to learn in order to manipulate XML data. Second, XLinq exposes XQuery and XPath features in a manner consistent with the general LINQ syntax, rather than following the W3C DOM programming model. As a result, the XLinq model is more natural and easier to use for most developers.

Here is code used to manipulate an XML document without LINQ:

// Use an imperative model for all instructions:
// Document centric: (Can't create elements without a document)
XmlDocument doc = new XmlDocument();
XmlElement contacts = doc.CreateElement("contacts");
foreach (Customer c in customers)
{
  // Different syntax needed for different data locations.
  if (c.Country == "USA") {
    XmlElement e = doc.CreateElement("contact");
    XmlElement name = doc.CreateElement("name");
    name.InnerText = c.CompanyName;
    e.AppendChild(name);
    // Memory intensive.  Extra objects created to support structure.
    XmlElement phone = doc.CreateElement("phone");
    phone.InnerText = c.Phone;
    e.AppendChild(phone);
    contacts.AppendChild(e);
  }
}
doc.AppendChild(contacts);

The same XML fragment can be created more simply using XLinq:

// More of a declarative model:
// Element centric:
XElement contacts = new XElement("contacts",
    // Syntax matches other LINQ queries
    from c in customers
    where c.Country == "USA"
    select new XElement("contact",
        // Smaller and faster.
        new XElement("name", c.CompanyName),
        new XElement("phone", c.Phone)
    )
);

By comparing these two snippets of code, you gain perspective on the goals of the XLinq feature set. Most obviously, the XLinq version is much smaller, and its nested structure mirrors the structure of the resulting XML document. Finally, the XLinq version is more efficient, in terms of both speed and memory. The deferred execution used by the LINQ query operators is partly responsible for this advantage: a query doesn't run until you enumerate its results, and it then examines source elements one at a time, only as each result is requested.
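Deferred execution is easy to observe with a counter. In this sketch (LINQ to Objects, with a hypothetical predicateCalls counter added for illustration), defining the query evaluates nothing, an element added afterward is still seen, and First() stops pulling elements as soon as the predicate succeeds.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var source = new List<int> { 1, 2, 3 };
int predicateCalls = 0; // hypothetical counter to observe when the predicate runs

// Defining the query runs nothing yet; it only captures the source and predicate.
var evens = source.Where(n => { predicateCalls++; return n % 2 == 0; });
Console.WriteLine(predicateCalls); // prints "0"

// An element added after the query is defined is still seen when it runs.
source.Add(4);

// First() pulls elements one at a time and stops at the first match.
Console.WriteLine(evens.First());  // prints "2"
Console.WriteLine(predicateCalls); // prints "2"

// Full enumeration tests every element, including the late addition.
Console.WriteLine(string.Join(", ", evens)); // prints "2, 4"
```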

XLinq also makes it easier to transform XML data from one format to another. Suppose you want to search a customer list and create a contact list containing the primary contact for each customer. Your source document might follow this format:

<CustomerList>
   <Customer>
      <First>Josh</First>
      <Last>Holmes</Last>
      <Address/>
      <Phone/>
      <OrderList/>
   </Customer>
   . . .
</CustomerList>

Data for several of the elements under each customer doesn't appear in this example, but you'll note placeholders for the missing data. Your output document might look like this:

<Contacts>
   <Contact>
      <FirstName>Josh</FirstName>
      <LastName>Holmes</LastName>
   </Contact>
   . . . 
</Contacts>

The XLinq query to accomplish your transformation is simple:

new XElement("Contacts",
   from c in customerList.Elements("Customer")
   select new XElement("Contact",
          new XElement("FirstName", (string) c.Element("First")),
          new XElement("LastName", (string) c.Element("Last"))) );

Because XElements are first-class objects, manipulating the XML structure is much simpler. You simply select the elements from the original data source and place them into a new XElement structure.
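To try the transformation end to end, this sketch parses a trimmed copy of the sample input with XElement.Parse. It assumes the System.Xml.Linq namespace that XLinq shipped in as LINQ to XML; the second customer is invented for illustration.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// The article's sample input, plus a second, hypothetical customer.
XElement customerList = XElement.Parse(@"
<CustomerList>
  <Customer><First>Josh</First><Last>Holmes</Last><Address/><Phone/><OrderList/></Customer>
  <Customer><First>Ada</First><Last>Lovelace</Last><Address/><Phone/><OrderList/></Customer>
</CustomerList>");

// The transformation from the article: pick values out, rebuild under new names.
XElement contacts = new XElement("Contacts",
    from c in customerList.Elements("Customer")
    select new XElement("Contact",
        new XElement("FirstName", (string)c.Element("First")),
        new XElement("LastName", (string)c.Element("Last"))));

Console.WriteLine(contacts); // prints the indented <Contacts> document
```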

But isn't the purpose of XML to provide a persistent storage model? Sooner or later, you'll need to read and write XML documents with XLinq. Saving your data is incredibly simple: XElement.Save() does all the work, and you need to supply only the path name. Creating an XElement from stored XML data is just as simple; you use XElement.Load(string pathname) to read a file, or XElement.Parse(string xml) to read a string of XML text, and either one creates an in-memory XElement tree.
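A minimal round trip using the released System.Xml.Linq API: the tree is saved to a temporary file (the location is an assumption for illustration), loaded back with XElement.Load, and compared with XNode.DeepEquals.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

XElement contacts = XElement.Parse(
    "<Contacts><Contact><FirstName>Josh</FirstName><LastName>Holmes</LastName></Contact></Contacts>");

// Hypothetical location: a fresh temporary file.
string path = Path.GetTempFileName();
XElement reloaded;
try
{
    contacts.Save(path);            // serialize the tree to the file
    reloaded = XElement.Load(path); // parse it back into a new in-memory tree
}
finally
{
    File.Delete(path);              // remove the temporary file
}

// XNode.DeepEquals compares names, attributes, and content, not references.
Console.WriteLine(XNode.DeepEquals(contacts, reloaded)); // prints "True"
```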

This article has only scratched the surface of the code being developed for the LINQ project. LINQ includes query operators that provide many of the capabilities SQL developers take for granted, and operations that are beyond what SQL developers would expect. The DLinq and XLinq libraries also have more functionality: more complex mapping, deferred query execution, namespace and alias support, and more.

Of course, the LINQ syntax does have pitfalls. ORM is not a simple problem. The new language features move both C# and VB.NET into the realm of functional programming, while keeping their object-oriented roots. It's a difficult transition for many developers to make. But, for developers who work through the shift and learn the new methods for solving problems, LINQ will greatly improve their productivity.

About the Author
Bill Wagner, cofounder of SRT Solutions, has developed commercial software for the past 20 years, leading the design on many successful engineering and enterprise Microsoft Windows products. He now spends his time facilitating .NET adoption in clients' product and enterprise development. Bill's principal strengths include the C# language, the core framework, Smart Clients, and SOA and design. In 2003, Microsoft recognized Bill's expertise and appointed him Regional Director for Michigan. In 2005, he was reappointed and also awarded Microsoft C# Most Valuable Professional (MVP) status. A frequent speaker and internationally recognized author, Bill has been a contributing editor, editorial board member, and columnist for more than a decade. Addison-Wesley released his latest book, Effective C#, in 2004. He is a founding member of the Great Lakes .NET User Group and the Ann Arbor .NET Developers Group, and actively contributes to the Ann Arbor Computer Society.
