Save Time With LINQ Queries
See how the LINQ syntax, specifically DLinq and XLinq, can increase your productivity and reduce the possibility for error.
by Bill Wagner
May 1, 2006
Suppose different departments in your business spoke different languages:
English in software development, French in accounting, German in human resources,
and Spanish in marketing. This scenario would be disastrous for productivity,
regardless of which language was your favorite. So you probably won't encounter
that situation often. Instead, most businesses standardize with a single language
and vocabulary to make communication more efficient.
Software is different. We have created entirely different languages to process
data based on its location. We have C# and VB.NET for general computing. We've
created SQL to modify data that has been stored in a relational database. We've
created XSLT, DOM, and XQuery to process data that has been stored in an XML
document.
Think about this for a minute: The tools you choose for development are based
on the location of the data, not the kind of data you are manipulating. For
example, an Employee object in memory is processed differently from an Employee
record in a database, and you've even got different methods to process the
same Employee information in an XML document. Why?
This problem, a superset of Object Relational Mapping (ORM), is the justification
behind Language Integrated Query (LINQ). The concept behind LINQ is simple:
You can use your favorite general-purpose programming language to manipulate
data regardless of its location.
While this concept is simple, you need to familiarize yourself with the new
technology emerging around it. You need to learn about three things: the
language extensions that support LINQ; DLinq, which translates LINQ queries
into code that can execute in the context of a database; and XLinq, which
handles queries and manipulation of objects in an XML document.
So the LINQ framework developers are still concerned with where data resides,
but you don't have to worry about it. You can write code in your favorite language,
and the LINQ assemblies do the hard work for you. In this article you'll see
how LINQ queries can save you time by simplifying the set of tools you use
to manipulate data.
Language Extensions
To make LINQ technologies work, the C# and VB.NET teams have added many features
to the languages that support querying objects. Here is a simple C# example
that queries an array for numbers divisible by 5:
int[] numbers = ReadNumbers();
var numsDivFive =
    from n in numbers
    where n % 5 == 0
    select n.ToString();
These two statements introduce several new language features, so look at them
one at a time. Start with from n in numbers, which defines an enumeration
across a collection. It more or less has the same meaning as foreach (int
n in numbers). This statement declares a variable that will be used for
each and every element in the collection (n) and the collection being
enumerated (numbers).
Next look at select n.ToString(). It defines the object that is returned
from the enumeration. In this case, the code transforms n, or the
number, to a string by using the ToString() method.
Finally, examine the Where clause: where n % 5 == 0. It defines a
predicate that is tested for each and every element.
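To try the three clauses yourself, here is a minimal sketch. It assumes the query syntax and the System.Linq namespace as they later shipped with C# 3.0, which may differ from the preview builds, and it uses a hard-coded array as a hypothetical stand-in for ReadNumbers(), whose body this article doesn't show.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DivisibleByFive
{
    // Hypothetical stand-in for the article's ReadNumbers() method.
    public static int[] ReadNumbers()
    {
        return new[] { 3, 5, 10, 12, 25, 31 };
    }

    public static IEnumerable<string> Query(int[] numbers)
    {
        // from:   declares the range variable n over the source sequence
        // where:  predicate tested for every element
        // select: projection applied to each element that passes
        return from n in numbers
               where n % 5 == 0
               select n.ToString();
    }

    public static void Main()
    {
        foreach (string s in Query(ReadNumbers()))
            Console.WriteLine(s);   // prints 5, 10, and 25 on separate lines
    }
}
```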
Where clauses are critical to understanding LINQ: you can write any type of
filter or condition, and LINQ translates it into executable code. Fully
understanding how Where clauses work will help you recognize the power of
the LINQ syntax. Where clauses are built on two concepts: extension
methods and lambda expressions.
Extension methods are methods that you write to extend the public interface of a class. The Where method used in the previous code sample is one of the
standard sequence operators, and Microsoft has delivered its source with the
LINQ preview builds:
public static IEnumerable<T> Where<T>(
    this IEnumerable<T> source, Func<T, bool> predicate)
{
    foreach (T element in source)
    {
        if (predicate(element)) yield return element;
    }
}
By placing the this modifier on the source parameter, the
Where method acts as though it is a member of any class that implements IEnumerable<T>,
and extends the signature of IEnumerable<T>. Strict rules on the method
call resolution prevent extension methods from interfering with regular methods. (A
full discussion about these rules is beyond the scope of this article.)
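Writing an extension method of your own follows the same pattern. The sketch below is only an illustration: EveryOther is a hypothetical operator invented for this example, not one of the standard sequence operators, and the code assumes the extension method syntax as it shipped in C# 3.0.

```csharp
using System;
using System.Collections.Generic;

public static class SequenceExtensions
{
    // Hypothetical operator: yields every second element, starting with the first.
    // The "this" modifier on source is what makes it an extension method.
    public static IEnumerable<T> EveryOther<T>(this IEnumerable<T> source)
    {
        bool take = true;
        foreach (T element in source)
        {
            if (take) yield return element;
            take = !take;
        }
    }
}

public static class ExtensionDemo
{
    public static void Main()
    {
        string[] names = { "Ann", "Bob", "Cy", "Dee" };

        // Instance-style call, even though string[] declares no such method.
        foreach (string name in names.EveryOther())
            Console.WriteLine(name);   // Ann, then Cy

        // The same call written as an ordinary static method call.
        foreach (string name in SequenceExtensions.EveryOther(names))
            Console.WriteLine(name);   // Ann, then Cy
    }
}
```

Both loops call the same static method; the instance-style form is a compiler convenience.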
The second parameter of Where, the predicate, is a delegate, and the Where
clause you write supplies it as a lambda expression. A lambda expression is
a function written inline; you can think of lambda expressions as a way of
passing functions as data. In .NET, lambda expressions are built using
delegates. The delegate Func is defined in sequence.cs as:
public delegate T Func<A0, T>(A0 arg0);
This Func delegate represents any function that takes a single argument and
returns a value. The compiler translates the predicate in the Where clause
into a method, and creates a delegate from it. In a sense, the compiler
writes this method for you:
bool MyPredicate(int n)
{
    return n % 5 == 0;
}
The compiler also translates your query into a method call using the Func
delegate definition:
numbers.Where( new Func<int, bool>(MyPredicate));
Delegate functions are not magic. But the C# team added some powerful compiler
extensions to translate query expressions into delegate functions.
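The translation can be checked by writing the same filter three ways. This is a sketch that assumes the shipped C# 3.0 syntax and the Func<T, TResult> delegate from the System namespace, rather than the preview's sequence.cs definition; the class and method names are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PredicateForms
{
    // The kind of method the compiler generates for the predicate.
    static bool MyPredicate(int n)
    {
        return n % 5 == 0;
    }

    // 1. Query expression syntax.
    public static IEnumerable<int> AsQuery(int[] numbers)
    {
        return from n in numbers where n % 5 == 0 select n;
    }

    // 2. Method call with a lambda expression.
    public static IEnumerable<int> AsLambda(int[] numbers)
    {
        return numbers.Where(n => n % 5 == 0);
    }

    // 3. Method call with an explicit delegate over a named method.
    public static IEnumerable<int> AsDelegate(int[] numbers)
    {
        return numbers.Where(new Func<int, bool>(MyPredicate));
    }

    public static void Main()
    {
        int[] numbers = { 4, 5, 15, 22 };
        Console.WriteLine(string.Join(",", AsQuery(numbers)));     // 5,15
        Console.WriteLine(string.Join(",", AsLambda(numbers)));    // 5,15
        Console.WriteLine(string.Join(",", AsDelegate(numbers)));  // 5,15
    }
}
```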
Finally you've arrived at the "dreaded" var keyword, and local type
inference. This keyword can seem complicated if you view it as untyped, and
assume a var variable can hold a value of any type. Var is not untyped;
instead, the variable is strongly typed, and the compiler infers that type
for you. To read such a declaration, substitute the compile-time type of the
right-hand side for var.
In the first example, the type is IEnumerable<string>. The select clause
returns a sequence of strings, so the compiler infers that numsDivFive must
be an IEnumerable<string>. The var keyword was added because some queries
create new types for you. More importantly, these types are anonymous; the
compiler assigns the new type a name without letting you know what it is.
For example, this snippet of code returns a sequence of a new type:
List<Customer> customers = GetCustomerList();
var orders =
    from c in customers,
         o in c.Orders
    where o.Total < 500.00M
    select new {c.CustomerID, o.OrderID, o.Total};
The concept in this example is rather simple. It examines all orders from
all customers using a compound From clause. If the total cost of an order is
less than 500, the query adds that order to the resulting sequence. But
it doesn't return the entire order. It returns a new type that contains three
public properties: the customer ID, the order ID, and the total cost. This
new type was created by the compiler. Here's roughly the definition the
compiler creates:
public class ????? {
    private int customerID;
    public int CustomerID
    {
        get { return customerID; }
        set { customerID = value; }
    }
    private int orderID;
    public int OrderID
    {
        get { return orderID; }
        set { orderID = value; }
    }
    private decimal total;
    public decimal Total
    {
        get { return total; }
        set { total = value; }
    }
}
It's essentially the same code you'd have created yourself. The only difference
is that you don't know the name of this new type; it's an anonymous type. So,
what type do you specify for orders? It's IEnumerable<?????>, but what
replaces the question marks? The compiler knows, but you don't.
This is why the C# language designers added the var keyword. You can use var
to declare local variables when you don't know the name of the anonymous type
that the compiler created for you. Variables declared with var must be initialized
at the point they are declared, because it's the only way the compiler knows
what type they might be.
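Here is a runnable sketch of an anonymous-type query. It assumes the shipped C# 3.0 compiler, where a compound From clause is written as two consecutive from clauses and the properties of an anonymous type are read-only. The Customer and Order classes and the sample data are hypothetical and kept to the bare minimum.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class Order
{
    public int OrderID;
    public decimal Total;
}

public class Customer
{
    public string CustomerID;
    public List<Order> Orders = new List<Order>();
}

public static class SmallOrders
{
    // Hypothetical sample data standing in for GetCustomerList().
    public static List<Customer> GetCustomerList()
    {
        var alfki = new Customer { CustomerID = "ALFKI" };
        alfki.Orders.Add(new Order { OrderID = 1, Total = 120.50M });
        alfki.Orders.Add(new Order { OrderID = 2, Total = 900.00M });
        var bonap = new Customer { CustomerID = "BONAP" };
        bonap.Orders.Add(new Order { OrderID = 3, Total = 499.99M });
        return new List<Customer> { alfki, bonap };
    }

    public static List<int> SmallOrderIds(List<Customer> customers)
    {
        // var is required here: the element type has no name you can write.
        var orders =
            from c in customers
            from o in c.Orders
            where o.Total < 500.00M
            select new { c.CustomerID, o.OrderID, o.Total };

        var ids = new List<int>();
        foreach (var item in orders)
        {
            // item is strongly typed: CustomerID is a string, OrderID an int.
            Console.WriteLine("{0} {1}", item.CustomerID, item.OrderID);
            ids.Add(item.OrderID);
        }
        return ids;
    }

    public static void Main()
    {
        SmallOrderIds(GetCustomerList());   // prints "ALFKI 1" then "BONAP 3"
    }
}
```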
Now you know the basic syntax for LINQ queries. The language designers have
added powerful new features to C# that enable you to write SQL-like queries
on any arbitrary sequence. Simply put, if you can foreach it, you can query
it. You can also use many extension methods: order by, group by, skip, distinct,
union, and intersect. These new features add to your favorite programming language
the query capability that's normally associated with relational data.
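A quick tour of those operators over two small arrays might look like the following sketch. It uses the method names from the shipped System.Linq namespace (OrderBy, Skip, Distinct, Union, Intersect) plus a group by clause; the arrays and the OperatorTour class are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OperatorTour
{
    public static List<string> Run()
    {
        int[] a = { 5, 3, 5, 8, 1 };
        int[] b = { 8, 9, 3 };
        var lines = new List<string>();

        lines.Add("order by: " + string.Join(",", a.OrderBy(n => n)));                 // 1,3,5,5,8
        lines.Add("skip: " + string.Join(",", a.Skip(2)));                             // 5,8,1
        lines.Add("distinct: " + string.Join(",", a.Distinct().OrderBy(n => n)));      // 1,3,5,8
        lines.Add("union: " + string.Join(",", a.Union(b)));                           // 5,3,8,1,9
        lines.Add("intersect: " + string.Join(",", a.Intersect(b)));                   // 3,8

        // group by: bucket the numbers by parity and count each bucket.
        var byParity =
            from n in a
            group n by n % 2 == 0 into g
            select new { Even = g.Key, Count = g.Count() };
        foreach (var g in byParity)
            lines.Add("group by: even=" + g.Even + " count=" + g.Count);

        return lines;
    }

    public static void Main()
    {
        foreach (string line in Run())
            Console.WriteLine(line);
    }
}
```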
DLinq: Bridging the Object/Relational Gap
So far you've received a whirlwind tour of the new language features in LINQ
that enable you to query objects in memory. DLinq was designed to manage several
of the issues relating to the differences between relational data and object
data. One of its general goals is to provide better programming language support
for relational expressions. The current .NET APIs you use for working with
relational data raise several concerns, many of which revolve around loosely
bound arguments that prevent the compiler from catching errors in your database
access code.
Check out this sample code containing some common database access practices
(I've added comments where runtime errors would be generated if you made any
mistakes):
// Connection string is a literal string.
// It may or may not be correct for the current database.
SqlConnection c = new SqlConnection(...);
c.Open();
// Queries are literal strings. Runtime errors.
SqlCommand cmd = new SqlCommand(
    @"SELECT c.Name, c.Phone
      FROM Customers c
      WHERE c.City = @p0", c);
// Loosely bound arguments. Runtime errors.
cmd.Parameters.AddWithValue("@p0", "London");
SqlDataReader dr = cmd.ExecuteReader();
while (dr.Read()) {
    // Loosely typed results introduce runtime errors.
    // No compile-time type checks on types of results.
    string name = dr.GetString(0);
    string phone = dr.GetString(1);
    // The query selects only two columns, so this line
    // compiles but fails at run time.
    DateTime date = dr.GetDateTime(2);
}
dr.Close();
Contrast that with a similar LINQ version:
// Classes describe data.
[Table(Name="Customers")]
public class Customer
{
    // Attributes specify column attributes:
    [Column(Id=true)]
    public int Id;
    [Column]
    public string Name;
    [Column]
    public string Phone;
    // City is used by the query below.
    [Column]
    public string City;
}
// Classes describe database connections:
public class Northwind: DataContext
{
    // Tables are like collections
    public Table<Customer> Customers;
    ...
}
// Strongly typed connection, but connection
// string is still a string.
Northwind db = new Northwind(...);
// Same familiar LINQ syntax
var contacts =
    from c in db.Customers
    where c.City == "London"
    // Strongly typed results. No casts or runtime errors.
    select new { c.Name, c.Phone };
Resolving problems with loosely bound arguments is only the beginning; you
need to think about other important issues. The three biggest concerns with
current .NET APIs are object equality, relationships between tables, and database
updates.
Object equality is a serious concern. C# and VB.NET use reference semantics,
unless you override certain methods to provide value semantics. This means
two object variables are equal if they refer to the same object in memory.
However, relational databases define equality based on comparing the values
of the primary keys.
In DLinq, the DataContext class provides the bridge between these models.
It works by keeping track of the database records that you have retrieved through
the data context. Anytime a query would generate the same results again, the
DataContext class returns a reference to the same object. This means the DataContext
class manages the conversion between the database's concept of object identity
(primary key) and the Common Language Runtime's (CLR's) concept of object identity
(object reference). So if your query requests an object that is already present,
the query short-circuits and no database communications take place.
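The idea of mapping primary keys to object references can be sketched with a dictionary. This is a simplified, hypothetical illustration of the concept only; IdentityCache and Materialize are invented names, and nothing here reflects how the DataContext class is actually implemented.

```csharp
using System;
using System.Collections.Generic;

public class Customer
{
    public string CustomerID;
    public string Name;
}

// Hypothetical stand-in for the identity tracking a data context performs.
public class IdentityCache
{
    private readonly Dictionary<string, Customer> tracked =
        new Dictionary<string, Customer>();

    // Called for each row a query returns: reuse the tracked object
    // when the primary key has been seen before.
    public Customer Materialize(string customerId, string name)
    {
        Customer existing;
        if (tracked.TryGetValue(customerId, out existing))
            return existing;   // same primary key, same object reference

        var created = new Customer { CustomerID = customerId, Name = name };
        tracked.Add(customerId, created);
        return created;
    }
}

public static class IdentityDemo
{
    public static void Main()
    {
        var cache = new IdentityCache();
        Customer first = cache.Materialize("ALFKI", "Alfreds Futterkiste");
        Customer second = cache.Materialize("ALFKI", "Alfreds Futterkiste");
        Console.WriteLine(ReferenceEquals(first, second));   // True
    }
}
```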
The second major concern with current .NET APIs is managing relationships
between different object types. In relational data, entity relationships are
defined by links between primary and foreign keys. When you create object models,
you manage the relationships using containment. For example, a customer might
contain a List<Order>.
DLinq manages relationships using the EntitySet and EntityRef classes and the
Association attribute, as demonstrated by these two class definitions:
[Table(Name="Customers")]
public class Customer
{
    [Column(Id=true)]
    public string CustomerID;
    ...
    private EntitySet<Order> _Orders;
    [Association(Storage="_Orders", OtherKey="CustomerID")]
    public EntitySet<Order> Orders {
        get { return this._Orders; }
        set { this._Orders.Assign(value); }
    }
}
[Table(Name="Orders")]
public class Order
{
    [Column(Id=true)]
    public int OrderID;
    [Column]
    public string CustomerID;
    private EntityRef<Customer> _Customer;
    [Association(Storage="_Customer", ThisKey="CustomerID")]
    public Customer Customer {
        get { return this._Customer.Entity; }
        set { this._Customer.Entity = value; }
    }
}
The Customer class contains a set of Orders. An EntitySet object contains
the many orders that relate to the one customer. The Association attribute
on the Orders property of a customer describes the foreign key relationship
that determines which field in the Orders table maps to the primary key in
the Customer table. The Order class contains an EntityRef that references the
customer to which this order refers. In this case, the Association attribute
describes the key in the Orders table that maps to the CustomerID in the Customer
table. These classes and the related attributes help the low-level DLinq libraries
follow relationships to build complex sets of objects that provide the mapping
between the database schema and the class definitions you create.
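The key matching that an Association attribute describes can be seen in miniature with an in-memory group join. This sketch is an analogy only: it assumes the shipped C# 3.0 join syntax, uses hypothetical CustomerRow and OrderRow classes, and does not show how DLinq resolves relationships internally.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical flat records, as they would appear in relational tables.
public class CustomerRow { public string CustomerID; }
public class OrderRow { public int OrderID; public string CustomerID; }

public static class RelationshipDemo
{
    // Matches each order's foreign key to a customer's primary key and
    // returns one "customer: order ids" line per customer.
    public static List<string> OrdersByCustomer(CustomerRow[] customers, OrderRow[] orders)
    {
        var query =
            from c in customers
            join o in orders on c.CustomerID equals o.CustomerID into customerOrders
            select new { c.CustomerID, Orders = customerOrders };

        var lines = new List<string>();
        foreach (var entry in query)
            lines.Add(entry.CustomerID + ": " +
                      string.Join(",", entry.Orders.Select(o => o.OrderID)));
        return lines;
    }

    public static void Main()
    {
        var customers = new[]
        {
            new CustomerRow { CustomerID = "ALFKI" },
            new CustomerRow { CustomerID = "BONAP" }
        };
        var orders = new[]
        {
            new OrderRow { OrderID = 1, CustomerID = "ALFKI" },
            new OrderRow { OrderID = 2, CustomerID = "BONAP" },
            new OrderRow { OrderID = 3, CustomerID = "ALFKI" }
        };
        foreach (string line in OrdersByCustomer(customers, orders))
            Console.WriteLine(line);   // "ALFKI: 1,3" then "BONAP: 2"
    }
}
```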
Updating the database is the third major issue plaguing .NET APIs. The DataContext
class provides the methods for handling updates, or rolling back changes. The
DataContext.SubmitChanges() method submits changes to the database, and implements
an optimistic concurrency model for the database transactions. If there are
concurrency errors, the SubmitChanges method throws a Concurrency exception.
The RejectChanges() method discards all changes. Classes and methods support
database transactions as well.
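For readers new to optimistic concurrency, the general idea is that a write succeeds only if the row is unchanged since it was read. The sketch below illustrates that idea with a hypothetical in-memory store and a version number per row; VersionedStore and its methods are invented for this example, and the sketch makes no claim about how SubmitChanges detects conflicts.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical in-memory "table": each row carries a version number.
public class VersionedStore
{
    private class Row { public string Value; public int Version; }

    private readonly Dictionary<int, Row> rows = new Dictionary<int, Row>();

    public void Insert(int id, string value)
    {
        rows[id] = new Row { Value = value, Version = 1 };
    }

    // Returns the version so the caller can present it when writing back.
    public int Read(int id, out string value)
    {
        Row row = rows[id];
        value = row.Value;
        return row.Version;
    }

    // The write succeeds only if nobody changed the row since it was read.
    public bool TryUpdate(int id, string newValue, int versionRead)
    {
        Row row = rows[id];
        if (row.Version != versionRead) return false;   // conflict detected
        row.Value = newValue;
        row.Version++;
        return true;
    }
}

public static class ConcurrencyDemo
{
    public static void Main()
    {
        var store = new VersionedStore();
        store.Insert(1, "555-0100");

        string seenByA, seenByB;
        int versionA = store.Read(1, out seenByA);   // two users read the same row
        int versionB = store.Read(1, out seenByB);

        Console.WriteLine(store.TryUpdate(1, "555-0111", versionA));   // True
        Console.WriteLine(store.TryUpdate(1, "555-0122", versionB));   // False: stale read
    }
}
```

No locks are held between the read and the write, which is why the model is called optimistic; a real implementation would report the conflict to the caller, for example by throwing an exception.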
The DataContext class inside DLinq does lots of work to map between the database
model and the object model correctly. It manages the relationships between
table definitions and class definitions. It manages the different definitions
of equality between the two worlds. It manages the different mechanisms for
relationships between tables and relationships between objects. Finally, it
manages updates, concurrency, and transactions (using other DLinq classes).
In short, the DLinq portions of LINQ provide what most people think of when
they discuss ORM alternatives.
XLinq: Processing XML Documents
Just mention XQuery or XML DOM programming to most C# or VB.NET developers,
and they'll break out in a cold sweat (me included). These types of programming
aren't particularly intuitive, and both require an entirely new syntax.
XLinq improves the situation in two major ways. First, XLinq replicates the
power of XQuery and XPath in .NET languages, simplifying the syntax you need
to learn in order to manipulate XML data. Second, XLinq enables XQuery and
XPath features in a manner that is consistent with the general LINQ syntax.
This means that XLinq programming is different from programming with the W3C
DOM model. So the XLinq model is more natural and easier for most developers.
Here is code used to manipulate an XML document without LINQ:
// Use an imperative model for all instructions:
// Document centric: (Can't create elements without a document)
XmlDocument doc = new XmlDocument();
XmlElement contacts = doc.CreateElement("contacts");
foreach (Customer c in customers)
    // Different syntax needed for different data locations.
    if (c.Country == "USA") {
        XmlElement e = doc.CreateElement("contact");
        XmlElement name = doc.CreateElement("name");
        name.InnerText = c.CompanyName;
        e.AppendChild(name);
        // Memory intensive. Extra objects created to support structure.
        XmlElement phone = doc.CreateElement("phone");
        phone.InnerText = c.Phone;
        e.AppendChild(phone);
        contacts.AppendChild(e);
    }
doc.AppendChild(contacts);
The same XML fragment can be created more simply using XLinq:
// More of a declarative model:
// Element centric:
XElement contacts = new XElement("contacts",
    // Syntax matches other LINQ queries
    from c in customers
    where c.Country == "USA"
    select new XElement("contact",
        // Smaller and faster.
        new XElement("name", c.CompanyName),
        new XElement("phone", c.Phone)
    )
);
By comparing these two snippets of code, you gain perspective on the goals
of the XLinq feature set. Most obviously, the XLinq version of this code is
much smaller. The XLinq syntax also better suggests the XML structure of the
resulting document. Finally, the XLinq version is more efficient, both in terms
of speed and size. The LINQ query operators' use of deferred execution when
enumerating a set of elements is responsible in part for this advantage. When a
LINQ query executes, it examines only as many elements of the source collection
as it needs, testing each one against the Where clause as it is requested.
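Deferred execution is easy to observe by counting how often the predicate runs. This sketch assumes the shipped System.Linq operators; the Examined counter and the DeferredDemo class are hypothetical, added only to make the behavior visible.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredDemo
{
    // Counts how many elements the predicate has examined so far.
    public static int Examined;

    public static IEnumerable<int> BuildQuery(IEnumerable<int> source)
    {
        // Building the query runs nothing; the lambda executes only
        // when someone enumerates the result.
        return source.Where(n => { Examined++; return n % 5 == 0; });
    }

    public static void Main()
    {
        int[] numbers = { 1, 5, 7, 10, 12, 15 };
        IEnumerable<int> query = BuildQuery(numbers);
        Console.WriteLine(Examined);   // 0: nothing has run yet

        int first = query.First();
        Console.WriteLine(first);      // 5
        Console.WriteLine(Examined);   // 2: enumeration stopped at the first match
    }
}
```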
XLinq also makes it easier to transform XML data from one format to another.
Suppose you want to search a customer list and create a contact list containing
the primary contact for each customer. Your source document might follow this
format:
<CustomerList>
    <Customer>
        <First>Josh</First>
        <Last>Holmes</Last>
        <Address/>
        <Phone/>
        <OrderList/>
    </Customer>
    . . .
</CustomerList>
Data for several of the elements under each customer doesn't appear in this
example, but you'll note placeholders for the missing data. Your output document
might look like this:
<Contacts>
    <Contact>
        <FirstName>Josh</FirstName>
        <LastName>Holmes</LastName>
    </Contact>
    . . .
</Contacts>
The XLinq query to accomplish your transformation is simple:
new XElement("Contacts",
    from c in customerList.Elements("Customer")
    select new XElement("Contact",
        new XElement("FirstName", (string) c.Element("First")),
        new XElement("LastName", (string) c.Element("Last"))));
Because XElements are first class objects, manipulating the XML structure
is much simpler. You simply select the elements from the original data source,
and place them into a new XElement structure.
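The transformation can be run end to end with a small program. This sketch assumes the System.Xml.Linq namespace as it shipped with .NET 3.5, which may differ from the preview's namespace; the ContactTransform class and the inline sample document are made up for the example.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class ContactTransform
{
    public static XElement ToContacts(XElement customerList)
    {
        // Project each Customer element into a smaller Contact element.
        return new XElement("Contacts",
            from c in customerList.Elements("Customer")
            select new XElement("Contact",
                new XElement("FirstName", (string)c.Element("First")),
                new XElement("LastName", (string)c.Element("Last"))));
    }

    public static void Main()
    {
        XElement customerList = XElement.Parse(
            @"<CustomerList>
                <Customer>
                  <First>Josh</First>
                  <Last>Holmes</Last>
                  <Address/>
                  <Phone/>
                  <OrderList/>
                </Customer>
              </CustomerList>");

        // Writes the Contacts element with one Contact child for Josh Holmes.
        Console.WriteLine(ToContacts(customerList));
    }
}
```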
But isn't the purpose of XML to provide a persistent storage model? One day,
you'll be reading and writing XML documents using XLinq. Saving your data is
incredibly simple; XElement.Save() does all the work. You need to supply the
path name only. Creating an XElement from stored XML data is just as simple;
you use XElement.Load(string pathname) or XElement.Parse(string xmlnode) to
create an in-memory XElement tree from persistent storage.
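A round trip through a file might look like this sketch. It assumes the shipped System.Xml.Linq API; the file name contacts-demo.xml and the PersistDemo class are placeholders chosen for the example.

```csharp
using System;
using System.IO;
using System.Xml.Linq;

public static class PersistDemo
{
    public static XElement RoundTrip(XElement original, string path)
    {
        original.Save(path);           // write the tree to disk
        return XElement.Load(path);    // read it back into a new tree
    }

    public static void Main()
    {
        XElement contacts = XElement.Parse(
            "<Contacts><Contact><FirstName>Josh</FirstName></Contact></Contacts>");

        // Placeholder file in the temp directory; any writable path works.
        string path = Path.Combine(Path.GetTempPath(), "contacts-demo.xml");
        XElement reloaded = RoundTrip(contacts, path);

        // The reloaded tree is a separate object with the same content.
        Console.WriteLine(XNode.DeepEquals(contacts, reloaded));   // True
        File.Delete(path);
    }
}
```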
This article has only scratched the surface of the code being developed for
the LINQ project. LINQ includes query operators that provide many of the capabilities
SQL developers take for granted, and operations that are beyond what SQL developers
would expect. The DLinq and XLinq libraries also have more functionality: more
complex mapping, deferred query execution, namespace and alias support, and
more.
Of course, the LINQ syntax does have pitfalls. ORM is not a simple problem.
The new language features move both C# and VB.NET into the realm of functional
programming, while keeping their object-oriented roots. It's a difficult transition
for many developers to make. But, for developers who work through the shift
and learn the new methods for solving problems, LINQ will greatly improve their
productivity.
About the Author
Bill Wagner, cofounder of SRT Solutions, has developed commercial software
for the past 20 years, leading the design on many successful engineering and
enterprise Microsoft Windows products. He now spends his time facilitating
.NET adoption in clients' product and enterprise development. Bill's principal
strengths include the C# language, the core framework, Smart Clients, and SOA
and design. In 2003, Microsoft recognized Bill's expertise and appointed him
Regional Director for Michigan. In 2005, he was reappointed and also awarded
Microsoft C# Most Valuable Professional (MVP) status. A frequent speaker and
internationally recognized author, Bill has been a contributing editor,
editorial board member, and columnist for more than a decade. Addison-Wesley
released his latest book, Effective C#, in 2004. He is a founding member of
the Great Lakes .NET User Group and the Ann Arbor .NET Developers Group, and
actively contributes to the Ann Arbor Computer Society.