LINQ Changes How You Will Program
Extension methods bring together old and new ways of working with data, and open doors to new language opportunities.
Technology Toolbox: VB.NET
- By Bill McCarthy
LINQ will revolutionize the way Visual Studio developers think about and write code. At the most basic level, using LINQ-style queries in your code transitions your code from imperative- to declarative-style programming: you no longer say how things are done step-by-step, but instead move to stating what your goal is. LINQ also represents bigger, yet more subtle changes in the way we code. Delayed evaluation, functional programming, and language translation through expression trees are all part of the fabric of LINQ and significantly alter the landscape for VB and C#.
Consider this simple query that selects Customers with the last name "Smith":
Dim query = From c in Customers _
Where c.LastName = "Smith"
You might recall from my "Beautify Your Code with Extensions" article [Programming Techniques, VSM May 2007] that this query calls the Where extension method whose first parameter most closely matches the type of the source, that is, the most derived match among the Where methods in scope. If the System.Linq namespace has been imported either at the project level or at the start of the code file, and Customers is a List(Of Customer), the closest matching Where extension is found in the System.Linq.Enumerable class.
The System.Linq.Enumerable class contains all the LINQ extension methods for types that implement IEnumerable(Of T). The extensions include Any, All, Max, Min, Join, GroupBy, GroupJoin, OrderBy, ThenBy, Select, SelectMany, and Where, to name a few. A query on a List(Of Customer) typically uses these extension methods. The Where method used in this case has this signature:
Public Shared Function Where(Of TSource) ( _
source As IEnumerable(Of TSource), _
predicate As Func(Of TSource, Boolean) _
) As IEnumerable(Of TSource)
The query is compiled to use the Where method of System.Linq.Enumerable:
Dim query = Customers.Where( _
Function(c As Customer) c.LastName = "Smith" )
Notice the Function(c As Customer) c.LastName = "Smith" part of the query expression. This is a lambda expression, also referred to as an inline function. In this case, the lambda expression is compiled to match the required signature of the predicate argument in the Where extension method.
The predicate parameter has the type Func(Of TSource, Boolean), a delegate type constructed from the generic delegate Func(Of T, TResult) by setting TResult to Boolean. That means the function must take a TSource and return a Boolean.
Putting this all together, the Where method called would have this concrete signature:
Public Shared Function Where ( _
source As IEnumerable(Of Customer), _
predicate As Func(Of Customer, Boolean) _
) As IEnumerable(Of Customer)
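To see that the query expression and the extension-method call really are the same thing, here is a small console sketch (VB 10 or later syntax). The Customer class and the sample names are assumptions made for the example. It stores the lambda in a variable of type Func(Of Customer, Boolean), then runs the query three ways: as a query expression, as an extension-method call, and as an ordinary call to the Shared method Enumerable.Where.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Assumed shape of Customer for this sketch.
Public Class Customer
    Public Property FirstName As String
    Public Property LastName As String
End Class

Module WhereBindingDemo
    ' The lambda stored in a variable of the constructed delegate type
    ' Func(Of Customer, Boolean): takes a Customer, returns a Boolean.
    Public ReadOnly IsSmith As Func(Of Customer, Boolean) =
        Function(c As Customer) c.LastName = "Smith"

    ' Made-up sample data.
    Public Function SampleCustomers() As List(Of Customer)
        Return New List(Of Customer) From {
            New Customer With {.FirstName = "John", .LastName = "Smith"},
            New Customer With {.FirstName = "Mary", .LastName = "Jones"},
            New Customer With {.FirstName = "Jane", .LastName = "Smith"}
        }
    End Function

    ' 1. Query-expression form.
    Public Function ViaQuery(customers As List(Of Customer)) As Integer
        Dim query = From c In customers
                    Where c.LastName = "Smith"
        Return query.Count()
    End Function

    ' 2. Extension-method form, which is what the compiler produces.
    Public Function ViaExtensionCall(customers As List(Of Customer)) As Integer
        Return customers.Where(IsSmith).Count()
    End Function

    ' 3. The same method called as an ordinary Shared method.
    Public Function ViaSharedCall(customers As List(Of Customer)) As Integer
        Return Enumerable.Where(customers, IsSmith).Count()
    End Function

    Sub Main()
        Dim customers = SampleCustomers()
        Console.WriteLine(ViaQuery(customers))          ' 2
        Console.WriteLine(ViaExtensionCall(customers))  ' 2
        Console.WriteLine(ViaSharedCall(customers))     ' 2
    End Sub
End Module
```

All three forms bind to the same Enumerable.Where overload, so they return the same two Smiths.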
An interesting feature of this approach is that even though you've defined the query to search on the LastName property, nothing has happened yet. The lambda is compiled as a delegate, and that delegate is called only by the enumerator of the IEnumerable(Of T) the Where method returns. So it's only when you iterate over the query that the lambda gets called on each customer to test whether the customer's LastName is Smith.
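One way to watch the delayed evaluation happen is to have the predicate count its own calls. This is a minimal sketch, assuming VB 10 or later (for the multiline lambda); the Customer class, the PredicateCalls counter, and BuildQuery are hypothetical names for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Class Customer
    Public Property LastName As String
End Class

Module DeferredDemo
    ' How many times the predicate has run so far (hypothetical counter).
    Public PredicateCalls As Integer

    Public Function BuildQuery(customers As List(Of Customer)) As IEnumerable(Of Customer)
        ' A multiline lambda so the predicate can record each call.
        Return customers.Where(Function(c As Customer)
                                   PredicateCalls += 1
                                   Return c.LastName = "Smith"
                               End Function)
    End Function

    Sub Main()
        Dim customers As New List(Of Customer) From {
            New Customer With {.LastName = "Smith"},
            New Customer With {.LastName = "Jones"},
            New Customer With {.LastName = "Smith"}
        }
        PredicateCalls = 0
        Dim query = BuildQuery(customers)
        Console.WriteLine(PredicateCalls)   ' 0: defining the query runs nothing
        Dim matches = query.Count()         ' iterating runs the predicate
        Console.WriteLine(matches)          ' 2
        Console.WriteLine(PredicateCalls)   ' 3: once per customer
    End Sub
End Module
```

The counter stays at zero until Count iterates the query, which is exactly the point at which the delegate starts being called.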
If you were to look inside the System.Linq.Enumerable class, you would find nested enumerator classes specialized for different kinds of expressions. In this case, the enumerator class is <WhereIterator>d__0(Of TSource). The name doesn't matter, because it's generated by the compiler and you never reference it directly. The key thing to note is that it implements both IEnumerable(Of TSource) and IEnumerator(Of TSource), with TSource being Customer in this case.
Because this iterator class is both an IEnumerable and an IEnumerator, it can return a reference to itself when IEnumerable.GetEnumerator is called. If GetEnumerator is called a second time or from a different thread, a fresh copy in its initial state is returned instead. The lambda function is stored in a field called predicate, and each call to IEnumerator.MoveNext advances through the source, calling the lambda on each item until one matches.
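The compiler-generated iterator class is easiest to picture by writing an equivalent yourself. This sketch assumes VB 11 or later (Visual Studio 2012), which added Iterator functions; MyWhere is a hypothetical name, not a framework API. Much like the Enumerable version, the VB compiler turns the Iterator function into a hidden class that implements both IEnumerable(Of T) and IEnumerator(Of T) and keeps the source and predicate in fields.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Runtime.CompilerServices

Public Class Customer
    Public Property LastName As String
End Class

Module IteratorDemo
    ' Hypothetical hand-written counterpart of Enumerable.Where.
    ' The compiler rewrites this Iterator function into a state-machine
    ' class; the loop below runs piece by piece, one MoveNext at a time.
    <Extension()>
    Public Iterator Function MyWhere(Of T)(source As IEnumerable(Of T),
                                           predicate As Func(Of T, Boolean)) As IEnumerable(Of T)
        For Each item In source
            If predicate(item) Then
                Yield item   ' pauses here until the next MoveNext call
            End If
        Next
    End Function

    Sub Main()
        Dim customers As New List(Of Customer) From {
            New Customer With {.LastName = "Smith"},
            New Customer With {.LastName = "Jones"}
        }
        Dim smiths = customers.MyWhere(Function(c) c.LastName = "Smith")
        ' Count() calls GetEnumerator each time, so the same
        ' sequence can be enumerated more than once.
        Console.WriteLine(smiths.Count())   ' 1
        Console.WriteLine(smiths.Count())   ' 1
    End Sub
End Module
```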
This delayed evaluation lets you declare a query once and reuse it. For example, assume you change the original query to take its input from a text box:
Dim query = From c in Customers _
Where c.LastName = TextBoxName.Text
You can now reuse this query. The value in the text box isn't read until the query is iterated, and then it is read again each time the lambda tests a customer. Assume, for example, that there are multiple Smiths in your customer list when you run this code:
For Each c As Customer In query
    TextBoxName.Text = "Smith"
Next
The first item returned matches whatever the text box contained when the loop started, but "Smith" is the name looked for in all subsequent items. This is because the lambda in this case is compiled as a function, similar to this:
Function Lambda1(ByVal c As Customer) As Boolean
    Return c.LastName = TextBoxName.Text
End Function
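The text box scenario can be reproduced in a console program. In this sketch (VB 10 or later syntax) a plain module field, SearchText, is a hypothetical stand-in for TextBoxName.Text, and the Customer class and names are made up. The loop changes the search text after the first match, just as the loop above does.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Class Customer
    Public Property FirstName As String
    Public Property LastName As String
End Class

Module ChangingSearchDemo
    ' Hypothetical stand-in for TextBoxName.Text.
    Public SearchText As String

    Public Function RunSearch(customers As List(Of Customer)) As List(Of String)
        ' The lambda reads SearchText each time it tests a customer.
        Dim query = From c In customers
                    Where c.LastName = SearchText
        Dim found As New List(Of String)
        For Each cust As Customer In query
            found.Add(cust.FirstName & " " & cust.LastName)
            ' Equivalent of TextBoxName.Text = "Smith" inside the loop.
            SearchText = "Smith"
        Next
        Return found
    End Function

    Sub Main()
        Dim customers As New List(Of Customer) From {
            New Customer With {.FirstName = "John", .LastName = "Smith"},
            New Customer With {.FirstName = "Ann", .LastName = "Jones"},
            New Customer With {.FirstName = "Jane", .LastName = "Smith"}
        }
        SearchText = "Jones"
        For Each name In RunSearch(customers)
            Console.WriteLine(name)
        Next
        ' Prints "Ann Jones", then "Jane Smith". John Smith was tested while
        ' SearchText was still "Jones"; Jane Smith was tested after the change.
    End Sub
End Module
```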
This approach comes with some advantages and potential disadvantages. You might get unexpected results if the value used in the query expression (the TextBoxName.Text property, in this case) can change during the iteration. In some cases, the approach can even cause a runtime exception: if you modify the original List(Of Customer) by adding or removing items while the query is being iterated, the list's enumerator throws an InvalidOperationException.
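Here is a minimal sketch of that failure, assuming VB 10 or later syntax and a made-up Customer class. Adding to the list inside the loop invalidates the list's enumerator, which the Where iterator is still using, so the next MoveNext throws.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Class Customer
    Public Property LastName As String
End Class

Module ModifyWhileIteratingDemo
    ' Returns the name of the exception raised, or "none" if the loop completes.
    Public Function AddWhileIterating(customers As List(Of Customer)) As String
        Dim query = From c In customers
                    Where c.LastName = "Smith"
        Try
            For Each cust As Customer In query
                ' Changing the source list while the query is iterating it.
                customers.Add(New Customer With {.LastName = "Smith"})
            Next
            Return "none"
        Catch ex As InvalidOperationException
            Return ex.GetType().Name
        End Try
    End Function

    Sub Main()
        Dim customers As New List(Of Customer) From {
            New Customer With {.LastName = "Smith"},
            New Customer With {.LastName = "Jones"}
        }
        Console.WriteLine(AddWhileIterating(customers))   ' InvalidOperationException
    End Sub
End Module
```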
Of course, these problems are no different from the issues you need to deal with today when iterating a list; the difference is that the issues might be less obvious when using queries. The key thing to note here is that this query expression is not evaluated until the query is iterated over:
Where c.LastName = _
TextBoxName.Text
This is in contrast to a function call, where the TextBoxName.Text value is evaluated once, when it is passed as the argument. The new code is declarative in nature.
Queries are Compositional
Queries aren't evaluated until they are iterated, so you can build a query on a query. For example, you might decide to get all Smiths with the first name "John" from the original query:
Dim query2 = From c in query _
Where c.FirstName = "John"
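With a List(Of Customer), query2 would be another Where iterator that uses the Where iterator from the first query as its source. Adopting this approach results in cascading iterators. (Later versions of the .NET Framework optimize this case by merging the two predicates into a single iterator, although each predicate is still invoked as a separate delegate.) Here it would be more efficient to express the query expression like this: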
Where c.LastName = "Smith" AndAlso c.FirstName = "John"
Here, the expression is compiled into a single lambda function. But in cases where you use a nested From clause, such as selecting invoices from each customer, separating the query into two parts can often improve readability:
Dim invoices = From c in Customers _
Where c.LastName = "Smith" _
From inv In c.Invoices _
Where inv.Date > searchDate _
Select inv
You could write the query as two separate queries:
Dim smiths = From c in Customers _
Where c.LastName = "Smith"
Dim invoices = From c In smiths _
From inv In c.Invoices _
Where inv.Date > searchDate _
Select inv
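To check that the one-query and two-query forms select the same invoices, here is a runnable sketch (VB 10 or later syntax). The Invoice and Customer classes, the invoice numbers, and the dates are all assumptions for illustration; the article doesn't define them.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

' Assumed shapes of Invoice and Customer for this sketch.
Public Class Invoice
    Public Property Number As Integer
    Public Property [Date] As DateTime

    Public Sub New(invoiceNumber As Integer, invoiceDate As DateTime)
        Number = invoiceNumber
        Me.[Date] = invoiceDate
    End Sub
End Class

Public Class Customer
    Public Property LastName As String
    Public Property Invoices As New List(Of Invoice)
End Class

Module InvoiceQueryDemo
    Public Function SampleCustomers() As List(Of Customer)
        Dim smith As New Customer With {.LastName = "Smith"}
        smith.Invoices.Add(New Invoice(1, New DateTime(2007, 1, 10)))
        smith.Invoices.Add(New Invoice(2, New DateTime(2007, 6, 1)))
        Dim jones As New Customer With {.LastName = "Jones"}
        jones.Invoices.Add(New Invoice(3, New DateTime(2007, 7, 1)))
        Return New List(Of Customer) From {smith, jones}
    End Function

    ' One query with a nested From clause.
    Public Function NestedQuery(customers As List(Of Customer),
                                searchDate As DateTime) As List(Of Invoice)
        Dim invoices = From c In customers
                       Where c.LastName = "Smith"
                       From inv In c.Invoices
                       Where inv.Date > searchDate
                       Select inv
        Return invoices.ToList()
    End Function

    ' The same query split into two parts.
    Public Function TwoPartQuery(customers As List(Of Customer),
                                 searchDate As DateTime) As List(Of Invoice)
        Dim smiths = From c In customers
                     Where c.LastName = "Smith"
        Dim invoices = From c In smiths
                       From inv In c.Invoices
                       Where inv.Date > searchDate
                       Select inv
        Return invoices.ToList()
    End Function

    Sub Main()
        Dim customers = SampleCustomers()
        Dim searchDate As New DateTime(2007, 3, 1)
        Console.WriteLine(NestedQuery(customers, searchDate).Count)    ' 1 (invoice 2)
        Console.WriteLine(TwoPartQuery(customers, searchDate).Count)   ' 1 (invoice 2)
    End Sub
End Module
```

Note that the second part still needs a From over c.Invoices: smiths yields customers, not invoices.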
If you are writing heavily nested queries, breaking them down into their compositional parts also makes debugging easier. When execution is paused, the debugger lets you evaluate an IEnumerable(Of T) in both VB and C#, so in the two-part version above you can easily see how many Smiths there are.
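Going back to the John Smith example, the cascaded query-on-a-query form and the single AndAlso form return the same customers; they differ only in how many iterators and delegate calls are involved. This sketch (VB 10 or later syntax, with a made-up Customer class and sample data) checks that both forms agree.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq

Public Class Customer
    Public Property FirstName As String
    Public Property LastName As String
End Class

Module CompositionDemo
    Public Function SampleCustomers() As List(Of Customer)
        Return New List(Of Customer) From {
            New Customer With {.FirstName = "John", .LastName = "Smith"},
            New Customer With {.FirstName = "Jane", .LastName = "Smith"},
            New Customer With {.FirstName = "John", .LastName = "Jones"}
        }
    End Function

    ' Query on a query: two cascaded Where calls.
    Public Function Cascaded(customers As List(Of Customer)) As List(Of Customer)
        Dim query = From c In customers
                    Where c.LastName = "Smith"
        Dim query2 = From c In query
                     Where c.FirstName = "John"
        Return query2.ToList()
    End Function

    ' Single query: one lambda tests both conditions.
    Public Function Combined(customers As List(Of Customer)) As List(Of Customer)
        Dim query = From c In customers
                    Where c.LastName = "Smith" AndAlso c.FirstName = "John"
        Return query.ToList()
    End Function

    Sub Main()
        Dim customers = SampleCustomers()
        Console.WriteLine(Cascaded(customers).Count)   ' 1
        Console.WriteLine(Combined(customers).Count)   ' 1
    End Sub
End Module
```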
So far I've only talked about LINQ queries where the lambda expressions are compiled as functions. If you are using LINQ to SQL, you don't want these expressions to be functions called on the client side: you would be fetching all the data, then running the functions one by one as the entire result set is iterated. Instead, what you want, and what you get, is a translation of the lambda expression to T-SQL.
For example, assume you were to write a query similar to the John Smith query shown earlier. The generated T-SQL looks like this:
SELECT [t0].[FirstName], [t0].[LastName]
FROM [dbo].[Customers] AS [t0]
WHERE ([t0].[FirstName] = @p0)
AND ([t0].[LastName] = @p1)
-- @p0: Input String (Size = 4; Prec = 0; Scale = 0) [John]
-- @p1: Input String (Size = 5; Prec = 0; Scale = 0) [Smith]
LINQ to SQL generates the same query whether you write this as one query or as a query on a query. In other words, LINQ to SQL optimizes compositional queries. You can write the query like this:
Dim finalquery = From c in Customers _
Where c.FirstName = "John" _
AndAlso c.LastName = "Smith"
Or, you can write it like this:
Dim query1 = From c in Customers _
Where c.FirstName = "John"
Dim finalquery = From c in query1 _
Where c.LastName = "Smith"
Either way, LINQ to SQL generates the same T-SQL.
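Running LINQ to SQL needs a database, but you can look at what a provider receives by using the in-memory EnumerableQuery provider that AsQueryable supplies. In this sketch (VB 10 or later syntax; Customer, the sample data, and the MethodChain helper are hypothetical), query1 and finalquery are written with method calls so that exactly two Where calls end up in the tree. The provider sees both Where calls nested in one expression tree; folding them into a single WHERE clause is the job of LINQ to SQL's translator, which isn't shown here.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Linq.Expressions

Public Class Customer
    Public Property FirstName As String
    Public Property LastName As String
End Class

Module QueryableCompositionDemo
    ' query1 and finalquery, composed one Where at a time.
    Public Function BuildFinalQuery(customers As List(Of Customer)) As IQueryable(Of Customer)
        Dim query1 = customers.AsQueryable().Where(Function(c) c.FirstName = "John")
        Dim finalquery = query1.Where(Function(c) c.LastName = "Smith")
        Return finalquery
    End Function

    ' Hypothetical helper: follows the chain of method calls in a query's
    ' expression tree from the outermost call inward and returns their names.
    Public Function MethodChain(query As IQueryable) As List(Of String)
        Dim names As New List(Of String)
        Dim node As Expression = query.Expression
        While TypeOf node Is MethodCallExpression
            Dim callNode = DirectCast(node, MethodCallExpression)
            names.Add(callNode.Method.Name)
            node = callNode.Arguments(0)   ' the source the call was applied to
        End While
        Return names
    End Function

    Sub Main()
        Dim customers As New List(Of Customer) From {
            New Customer With {.FirstName = "John", .LastName = "Smith"},
            New Customer With {.FirstName = "John", .LastName = "Jones"},
            New Customer With {.FirstName = "Jane", .LastName = "Smith"}
        }
        Dim finalquery = BuildFinalQuery(customers)
        Console.WriteLine(String.Join(" <- ", MethodChain(finalquery)))  ' Where <- Where
        Console.WriteLine(finalquery.Count())                            ' 1
    End Sub
End Module
```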
The mechanism by which the LINQ to SQL extensions generate the T-SQL is complex. The key, though, is that the lambda function in this case is passed to the extension methods as an expression tree, not as a delegate.
An expression tree is a symbolic way of describing query expressions. This description is not executable code; rather, it's the information needed to compile executable code. This description allows for different providers to compile the query as appropriate. As such, expression trees provide the information about what the expression is meant to do, not how it does it. This is declarative coding; the actual implementation is up to the query provider.
Assume you write the original query with this Where clause:
c.LastName = "Smith"
This creates an expression with a BinaryExpression at the root of the tree. The BinaryExpression has a Method property that stores a MethodInfo, the String.Equals method in this case. The BinaryExpression also includes Left and Right properties, which hold the Expression representations of the operands. Here, the Left expression is a MemberExpression for the property access c.LastName, and the Right expression is a ConstantExpression with the value "Smith" (see Figure 1).
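You can inspect those parts in code. This sketch builds c.LastName = "Smith" by hand with the System.Linq.Expressions factory methods, so that its shape is exactly the one just described, then reads back the root, Left, and Right nodes. The Customer class and the module name are assumptions; the sketch uses VB 10 or later syntax.

```vbnet
Imports System
Imports System.Linq.Expressions

Public Class Customer
    Public Property LastName As String
End Class

Module TreeShapeDemo
    ' Builds the tree for Function(c) c.LastName = "Smith".
    Public Function BuildWhereTree() As Expression(Of Func(Of Customer, Boolean))
        Dim c = Expression.Parameter(GetType(Customer), "c")
        Dim body = Expression.Equal(Expression.Property(c, "LastName"),
                                    Expression.Constant("Smith"))
        Return Expression.Lambda(Of Func(Of Customer, Boolean))(body, c)
    End Function

    Sub Main()
        Dim tree = BuildWhereTree()
        Dim root = DirectCast(tree.Body, BinaryExpression)
        Dim leftNode = DirectCast(root.Left, MemberExpression)
        Dim rightNode = DirectCast(root.Right, ConstantExpression)
        ' ToString() so VB prints the enum name rather than its numeric value.
        Console.WriteLine(root.NodeType.ToString())   ' Equal
        Console.WriteLine(leftNode.Member.Name)       ' LastName
        Console.WriteLine(rightNode.Value)            ' Smith
    End Sub
End Module
```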
Figure 1. Create an Expression Tree for a Where Clause.
This figure describes the expression tree for a simple Where clause for this query:
From c In Customers Where c.LastName = "Smith"
The expression tree is an Expression(Of Func(Of Customer, Boolean)) and represents the lambda function Function(c) c.LastName = "Smith".
Expression trees give you an easy way to compose queries by building upon an existing expression (see Figure 2). The magic of expression trees comes from the way the VB compiler generates them for you automatically from your query expressions or lambda functions. The VB compiler knows to compile a lambda to an expression tree rather than a delegate based on the parameter type of the extension method that is in scope. For LINQ to SQL, the extension method is in the System.Linq.Queryable class. The Queryable class echoes the features of the System.Linq.Enumerable class; however, its predicate parameters are typed as Expression(Of Func(Of T, TResult)) rather than as Func(Of T, TResult). In other words, the predicates become expressions that describe the function, rather than the function itself as a delegate.
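The same lambda text compiles to two very different things depending on its target type, which is exactly how the choice between Enumerable and Queryable plays out. In this sketch (VB 10 or later syntax; Customer and the field names are assumptions), one field receives the lambda as a Func delegate and the other as an Expression(Of Func(...)); the tree can be inspected as data and turned into a delegate with Compile.

```vbnet
Imports System
Imports System.Linq.Expressions

Public Class Customer
    Public Property LastName As String
End Class

Module LambdaTargetDemo
    ' Target type is a delegate: the compiler emits executable code.
    Public ReadOnly AsDelegate As Func(Of Customer, Boolean) =
        Function(c) c.LastName = "Smith"

    ' Target type is Expression(Of ...): the compiler emits code that builds
    ' a tree describing the same lambda.
    Public ReadOnly AsTree As Expression(Of Func(Of Customer, Boolean)) =
        Function(c) c.LastName = "Smith"

    Sub Main()
        Dim smith As New Customer With {.LastName = "Smith"}
        Console.WriteLine(AsDelegate(smith))            ' True
        Console.WriteLine(AsTree.Parameters(0).Name)    ' c
        ' The tree is data; Compile turns it into a delegate when needed.
        Dim compiled As Func(Of Customer, Boolean) = AsTree.Compile()
        Console.WriteLine(compiled(smith))              ' True
    End Sub
End Module
```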
Figure 2. Create an Expression Tree for the Lambda Function.
The process for creating an expression tree for the lambda function isn't dissimilar to what you must do to create one for a Where clause:
Function(c) c.FirstName = "John" AndAlso c.LastName = "Smith"
Note how this expression builds upon Figure 1 by replacing the root with another BinaryExpression and making the original expression the new Left operand.
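Composing a tree this way takes one more factory call. This sketch (VB 10 or later syntax; Customer and the module name are assumptions) builds the Figure 1 tree, builds a second comparison, and puts an AndAlso node on top so the original tree becomes the Left operand, as the caption describes. It then compiles the result to check that it behaves like the lambda.

```vbnet
Imports System
Imports System.Linq.Expressions

Public Class Customer
    Public Property FirstName As String
    Public Property LastName As String
End Class

Module AndAlsoTreeDemo
    Public Function BuildTree() As Expression(Of Func(Of Customer, Boolean))
        Dim c = Expression.Parameter(GetType(Customer), "c")
        ' Figure 1's tree: c.LastName = "Smith"
        Dim lastNameTest = Expression.Equal(Expression.Property(c, "LastName"),
                                            Expression.Constant("Smith"))
        ' The added condition: c.FirstName = "John"
        Dim firstNameTest = Expression.Equal(Expression.Property(c, "FirstName"),
                                             Expression.Constant("John"))
        ' New root; the original tree becomes its Left operand.
        Dim root = Expression.AndAlso(lastNameTest, firstNameTest)
        Return Expression.Lambda(Of Func(Of Customer, Boolean))(root, c)
    End Function

    Sub Main()
        Dim tree = BuildTree()
        Dim root = DirectCast(tree.Body, BinaryExpression)
        Console.WriteLine(root.NodeType.ToString())        ' AndAlso
        Console.WriteLine(root.Left.NodeType.ToString())   ' Equal
        Dim test = tree.Compile()
        Console.WriteLine(test(New Customer With {.FirstName = "John", .LastName = "Smith"}))  ' True
        Console.WriteLine(test(New Customer With {.FirstName = "Jane", .LastName = "Smith"}))  ' False
    End Sub
End Module
```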
The System.Linq.Queryable.Where extension has this signature:
Public Shared Function Where(Of TSource)( _
ByVal source As IQueryable(Of TSource), _
ByVal predicate As Expression( _
Of Func(Of TSource, Boolean)) _
) As IQueryable(Of TSource)
Not only is the predicate typed as an Expression, but the source is typed as IQueryable(Of T) instead of IEnumerable(Of T). IQueryable(Of T) inherits IEnumerable(Of T) and, through the non-generic IQueryable interface, adds three properties: ElementType, Expression, and Provider.
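The three properties are easy to look at on an in-memory query. In this sketch (VB 10 or later syntax; Customer and the sample data are assumptions), AsQueryable wraps a List in the framework's EnumerableQuery(Of T), and Where binds to Queryable.Where because the source is an IQueryable(Of Customer).

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Linq.Expressions

Public Class Customer
    Public Property LastName As String
End Class

Module QueryablePropertiesDemo
    Public Function MakeQuery() As IQueryable(Of Customer)
        Dim customers As New List(Of Customer) From {
            New Customer With {.LastName = "Smith"},
            New Customer With {.LastName = "Jones"}
        }
        ' Binds to Queryable.Where: the source is IQueryable(Of Customer).
        Return customers.AsQueryable().Where(Function(c) c.LastName = "Smith")
    End Function

    Sub Main()
        Dim query = MakeQuery()
        Console.WriteLine(query.ElementType.Name)                                  ' Customer
        Console.WriteLine(query.Expression.NodeType.ToString())                    ' Call
        Console.WriteLine(TypeOf query.Provider Is EnumerableQuery(Of Customer))   ' True
        Console.WriteLine(query.Count())                                           ' 1
    End Sub
End Module
```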
The extension methods in System.Linq.Queryable call on the IQueryable's Provider to do the work: methods that return a query, such as Where, call the Provider's CreateQuery method, passing it the expression tree, and get an IQueryable back. The Provider does the necessary translation of the expression. For LINQ to SQL, the provider is the System.Data.Linq.Table(Of TEntity) class.
There is one other standard provider. System.Core.dll, which contains the System.Linq.Enumerable and System.Linq.Queryable classes, also comes with a default IQueryProvider, EnumerableQuery(Of T). The EnumerableQuery(Of T) provider translates an expression tree to executable code, MSIL. You get this provider implicitly when you call the AsQueryable extension on an IEnumerable or IEnumerable(Of T) data source. For example, if you have a List(Of Customer), you can call the AsQueryable extension on it to return an IQueryable(Of Customer) with an EnumerableQuery(Of Customer) as the Provider. This allows you to pass in expression trees to the extension methods instead of delegates, which makes it possible to create a dynamic query for things such as a UI. The sample application shows you how to build an expression tree by constructing a dynamic query based on user input.
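The article's sample application isn't reproduced here, but the core idea of a dynamic query can be sketched in a few lines. WhereEquals is a hypothetical helper (VB 10 or later syntax) that adds a "property = value" filter where the property name arrives at run time, for example from a combo box; it assumes the chosen property is a String. Because the predicate is an Expression, the call binds to Queryable.Where and works with any provider, including EnumerableQuery.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Linq.Expressions

Public Class Customer
    Public Property FirstName As String
    Public Property LastName As String
End Class

Module DynamicQueryDemo
    ' Hypothetical helper: filters on whichever String property the user picked.
    Public Function WhereEquals(Of T)(source As IQueryable(Of T),
                                      propertyName As String,
                                      value As String) As IQueryable(Of T)
        Dim x = Expression.Parameter(GetType(T), "x")
        ' Expression.Property throws ArgumentException if T has no such property.
        Dim body = Expression.Equal(Expression.Property(x, propertyName),
                                    Expression.Constant(value, GetType(String)))
        Dim predicate = Expression.Lambda(Of Func(Of T, Boolean))(body, x)
        Return source.Where(predicate)
    End Function

    Sub Main()
        Dim customers As New List(Of Customer) From {
            New Customer With {.FirstName = "John", .LastName = "Smith"},
            New Customer With {.FirstName = "Jane", .LastName = "Smith"},
            New Customer With {.FirstName = "John", .LastName = "Jones"}
        }
        ' Imagine these property/value pairs came from the UI.
        Dim query = customers.AsQueryable()
        query = WhereEquals(query, "LastName", "Smith")
        query = WhereEquals(query, "FirstName", "John")
        For Each cust In query
            Console.WriteLine(cust.FirstName & " " & cust.LastName)   ' John Smith
        Next
    End Sub
End Module
```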
You can build your own expression tree and pass it to any of the Queryable extensions. Taking the example of searching for the last name Smith, you can create the expression tree as follows:
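' build an expression tree for the lambda function
' Function( c As Customer ) c.LastName = "Smith"
' (requires Imports System.Linq.Expressions)
' create a parameter expression for the c As Customer
Dim exParam = Expression.Parameter( _
    GetType(Customer), "c")
' create a property expression for c.LastName
Dim exProperty = Expression.Property( _
    exParam, "LastName")
' create a constant expression for the string "Smith"
Dim exConst = Expression.Constant( _
    "Smith")
' create the equal expression using
' the property and constant
Dim exEquals = _
    Expression.Equal(exProperty, exConst)
' create the lambda expression from the parts
Dim exLambda = Expression.Lambda( _
    Of Func(Of Customer, Boolean)) _
    (exEquals, exParam)
' create the query (AsQueryable returns the source unchanged
' if Customers is already an IQueryable, such as a LINQ to SQL Table)
Dim query = _
    Customers.AsQueryable().Where(exLambda)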
The Queryable extension methods provide a means for you to translate from one boundary to another. The LINQ to SQL provider translates your VB code to T-SQL through an expression tree, and the EnumerableQuery provider translates an expression to MSIL. You can use other providers to translate code to different Web services or different data stores or anything that has some form of query language.
As your code is translated in this manner, it becomes more declarative and you lose precise control over when and how it runs. For example, consider the text box case I discussed earlier. With a List(Of Customer), changing the search text while iterating immediately changes the rest of the iteration, because the function is called for each item. With LINQ to SQL, you don't get this behavior: the query, including the text box value at that moment, is translated to T-SQL and evaluated at the start of the iteration.
The advantage of having the providers decide how to evaluate the lambdas is that you get more efficient code, as is the case with the generated T-SQL. Soon, we're likely to see providers that also do operations in parallel, taking advantage of multi-core processors. Again, you'll relinquish the control of "how" to them and focus on specifying the "what." The empowerment is from letting go, by letting providers choose what's best. The change to using a declarative functional programming style has many subtleties about it that change the problem boundaries and the way you will write code. You'd be right to feel a little uncertain: The subtleties are all too often the cause of unwanted side effects. But by understanding how these things work, you can learn to trust them and let them do the work for you. It might take time, but in the end I'm sure you'll find the declarative styles encouraged by LINQ empowering.