Practical .NET

To Think in LINQ

If you start "thinking in LINQ" you'll get more done with less code, and what you write will be simpler than using SQL.

Switching to LINQ and the Entity Framework (EF) means that code you'll write for your data-driven applications will be simpler than the equivalent SQL. I admit: You can't do everything with LINQ that you can with SQL (though, when you need to, you can execute SQL from the EF). But SQL doesn't generate objects from rows in tables, either. Here's a look at how to do the typical things you'll need to do in an application, but done with LINQ+EF with some hints about "thinking in LINQ" and a few traps to avoid.

Retrieving Data Across Multiple Tables
In an EF model based on the Northwind database, this code would retrieve all the Customers that have a home office in Berlin, sort them by the name of the contact person and display the result in a GridView:

Dim nw As New northwindEntities
Dim res = From c In nw.Customers
          Where c.City = "Berlin"
        Order By c.ContactName
        Select c

GridView1.DataSource = res

This is already simpler than the code you'd need with SQL: You don't need to create and configure three different ADO.NET objects (or a DataSet), you don't need to manage opening/closing connections and, especially, you don't need the code to convert the retrieved rows into the objects that the rest of your application uses.

But what if you wanted to retrieve Customer objects based on the purchases the customer had made? Thinking like a SQL developer, you could use the Join keyword. This example joins the Customers collections to the Orders collections on their CustomerIds and retrieves the Customer objects where a related Order object has freight charges:

Dim res = From c In nw.Customers
          Join o In nw.Orders
            On c.CustomerID Equals o.CustomerID
          Where o.Freight > 0
          Select c

But if you're joining two tables, you probably want data from both tables. To do that, create a new object with properties containing values from both objects. What you don't need to do is give this object a name -- all you care about are the properties that hold the values from the two collections you're joining: You create an anonymous object.

This example creates that anonymous object (you can see that the New keyword isn't followed by a class name) with two properties: Name and Emp. The anonymous object's Name property is set to the CompanyName property from the Customer object, and the anonymous object's Emp property is set to the EmployeeId from the Order object:

Dim res = From c In nw.Customers
          Join o In nw.Orders
            On c.CustomerID Equals o.CustomerID
          Where o.Freight > 0
          Select New With {.Name = c.CompanyName, 
            .Emp = o.EmployeeID}

If you're uncomfortable with not knowing the name of the anonymous object, you can define your own class to hold your result and set its properties from your code. This example assumes that I've defined a class named OrderDto with two properties called Name and Emp:

Dim res = From c In nw.Customers
          Join o In nw.Orders
            On c.CustomerID Equals o.CustomerID
          Where o.Freight > 0 
          Select New OrderDto With {.Name = c.CompanyName, 
            .Emp = o.EmployeeID}

Thinking in LINQ
Using an explicit Join in a LINQ query is an example of thinking "in SQL" rather than "in LINQ+EF." For instance, the Order object has a Customer navigation property that links the Order object to the Customer object to which the Order belongs. So in my LINQ code, rather than using an explicit Join to attach the Customer to the Orders table, I can query the Orders collection and retrieve the Customer information through the Orders navigation property:

Dim res = From o In nw.Orders
          Where o.Freight > 0
          Select New With {.Name = o.Customer.CompanyName, 
            .Emp = o.EmployeeID}

This gives the same result as the SQL-like version with the explicit Join, but requires about 33 percent less typing (and better IntelliSense support).

This query just retrieves the EmployeeID -- not much help to users who would rather see the Employee name. Fortunately, the Order entity has a navigation property that points to the Employee that placed the Order. You can follow that navigation property to get to the Employee's last name. This example does that and is considerably easier to write than a query joining the Customers, Orders and Employees tables:

Dim res = From o In nw.Orders
          Where o.Freight > 0
          Select New With {.Name = o.Customer.CompanyName,
            .Emp = o.Employee.LastName}

These examples lead to a couple of insights about "thinking in LINQ" (and the EF): First, think in terms of objects and properties, rather than rows and columns; second, only use Join when there isn't a navigation property. In fact, if the navigation property you want doesn't exist, consider adding the appropriate Navigation property to your EF model. The result will be code that's easier to both write and read.

Returning LINQ Results
A LINQ query always returns a collection (which may have 0, 1 or many objects). However, the collections returned from LINQ+EF aren't exactly like other collections. For instance, you can write code like this to return the result of a LINQ query from a method:

Dim res = From o In nw.Orders
          Select o 
Return res

Done from a method in an application to some other method in the same application, this code probably won't have a problem. Done from a Windows Communication Foundation (WCF) service, on the other hand, the code reveals a LINQ trap.

There are a couple of reasons for this. One is that a LINQ to Entities query isn't processed until you actually manipulate the results. Processing a LINQ query in an application separate from the EF model that generated the query is a bad idea. The other reason is that the EF constantly monitors the changes made to LINQ objects (for instance, changing the value in a property). That monitoring isn't going to work across applications.

Putting those two traps together means it's not a good idea to return the results from a LINQ query from a method. It's a better idea to transfer the data you need from the EF objects to a named class and return that named class.

The following method contains a LINQ query that instantiates the OrderDTO class, sets its properties from entities in the model and then returns the results, after converting the result to a List with an extension method:

Public Function OrdersWithFreight() As List(Of OrderDto)
Dim nw As New northwndModel.northwindEntities

Dim res = From o In nw.Orders
          Where o.Freight > 0
          Select New OrderDto With {.OrderId = o.OrderId,
            .CompanyName = o.Customer.CompanyName,
            .EmployeeName = o.Employee.LastName}
Return res.ToList

End Function

Simplifying Queries
We've all seen SQL queries that took several seconds to puzzle out (all of those queries were written by someone else, of course). LINQ includes the Let keyword, which can simplify your queries by assigning a value to, effectively, a variable within the LINQ statement.

For instance, let's say you want to find all the Orders that have an Order_Detail with a very small or very large value (in this case, "very small" means less than $10 and "very large" means more than $100). The problem is that calculating the value of an Order_Detail means multiplying the quantity by the price and then subtracting any discount. Rather than repeating that calculation twice (once to compare to the minimum value and once to compare to the maximum value), the Let clause allows you to store the results of the calculation in a "query variable" to use elsewhere in the query.

The following example uses the Let keyword twice to ensure that the code is relatively easy to read for anyone who comes after me. It also includes the Order By keyword and navigation properties to sort the results by Customer number (and still without writing a Join):

Dim res = From ol In nw.Order_Details
          Let Totprice = ol.UnitPrice() * ol.Quantity
          Let FinalPrice = Totprice - (Totprice * ol.Discount)
          Where (FinalPrice < 10.0 Or FinalPrice > 100)
          Order By ol.Order.Customer.CustomerID
          Select ol.Order

The Power of Extension Methods
You may have recognized that the previous query is going to return duplicate results: If an order has multiple Order_Details that meet the criteria, the Order will appear in the result once for each matching Order_Detail.

Extension methods can solve this problem as handily as the ToList method converted a query to a List. Technically, extension methods aren't part of LINQ -- they're just methods that attach themselves to objects that implement a specific interface.

Fortunately, there are dozens of extension methods that attach themselves to the collections that a LINQ query produces. One of them is the Distinct method, which eliminates duplicates in a LINQ result. To have each order appear only once, I'd add the Distinct extension method to the previous query:

Dim res = (From ol In nw.Order_Details
          Let Totprice = ol.UnitPrice() * ol.Quantity
          Let FinalPrice = Totprice - (Totprice * ol.Discount)
          Where (FinalPrice < 10.0 Or FinalPrice > 100)
          Order By ol.Order.Customer.CustomerID
          Select ol.Order).Distinct

I may find, however, that after the duplicates are removed, the results may not be in the specified order (the Distinct method doesn't guarantee that the results order is maintained). The solution is simple, though: tack an OrderBy extension method onto the end of my query, following the Distinct method. The OrderBy method must be passed as a lambda expression that returns the property to sort on (in this case, the CustomerID):

res = (From ol In nw.Order_Details
      Let Totprice = ol.UnitPrice() * ol.Quantity
      Let FinalPrice = Totprice - (Totprice * ol.Discount)
      Where (FinalPrice < 10.0 Or FinalPrice > 100)
      Select ol.Order).Distinct.
        OrderBy(Function(ol) ol.Customer.CustomerID)

By the way, don't be concerned that using the extension methods to process the result of the LINQ query will cause the EF to generate a SQL query that returns all the rows, converts them all into objects, removes the duplicate objects and then sorts them. The SQL statement generated by the LINQ statement will include the Distinct keyword and the OrderBy clause. In other words, it will be exactly the SQL statement you would write.

Using extension methods, however, leads to another LINQ trap: Not all extension methods can be folded into a SQL statement the way Distinct and OrderBy can, because SQL doesn't have the matching functionality.

For instance, the SkipWhile method, which allows you to skip over objects at the start of the collection that meet some condition, doesn't have a SQL equivalent. This LINQ query, which skips over the initial Order_Details that don't have a discount, won't work (and I'm not suggesting this a reasonable query, by the way -- it's just an example):

Dim res = (From od In nw.Order_Details
           Order By od.Discount
           Select od).
SkipWhile(Function(od) od.Discount = 0)

That's not a problem, as it turns out. All you have to do is convert the result to something that isn't SQL related -- a List or an array, for instance. Once you've done that you can use any LINQ-related extension method. This example converts the results to an array before using the SkipWhile extension method:

Dim res = (From od In nw.Order_Details
           Order By od.Discount
           Select od).ToArray

Dim ret = res.SkipWhile(Function(od) od.Discount = 0)

This is what you'd have to do anyway to accomplish this code's goals: use SQL to retrieve the data and write some code to iterate over the initial records (although, unless you needed all of the rows for some other purpose, the smart thing to do would be to just include a Where clause that rejects those records where Discount = 0).

So another rule for using LINQ: Get to know your LINQ extension methods and be willing to combine them to get the result you want. When you want to use some extension method that SQL doesn't support, consider converting your result into some type of object collection that LINQ will work with and then applying the extension method.

The Power of Groups
One of the jewels in the SQL crown is the Group By keywords, which create a collection of anonymous objects. (I realize that the next few queries don't do anything that couldn't be done without grouping -- I'm going to make a point first and then use the Group By keyword to do something that you'd actually want to do.) This is a case where the C# syntax is significantly different from the Visual Basic syntax, so I'll do the Visual Basic syntax first and then follow with the C# code.

Here's a basic grouping query that groups Orders by the Customer object they belong too:

Dim res = From ord In nw.Order
          Group By custId = ord.CustomerID
          Into CustOrders = Group

The Group By keyword creates a collection of anonymous objects, one for each different CustomerID value. The Group By and the Into keyword work together to define properties of the anonymous object -- one property called custId and one called CustOrders. The custId property is set to the CustomerID for the grouped Orders, while the CustOrders property, thanks to the Group keyword, contains all the Orders that share the CustomerID.

Note that there's no Select in the previous query -- it would be difficult to refer to the anonymous object generated by the Group By, so don't try.

The result, as with any LINQ query, is a collection of these anonymous objects. Which means that, like SQL, the output of most LINQ queries can be used as the input to another LINQ query. A second query can use the result to find the anonymous object for a specific customer, like this:

Dim res2 = (From custOrd In res
           Where custOrd.custId = "ALFKI"
           Select custOrd)

As with any LINQ query, res2 will be a collection -- but as you've probably realized, there will be only one item in the collection: the anonymous object with a custId of ALFKI. Rather than work with the collection, you can use the First extension method to extract the first (and only) item in the result:

Dim res3 = res.First

Finally, all the Orders in the CustOrders collection can be processed with code like this:

For Each ord In res3.CustOrders
  MessageBox.Show(ord.Employee.FirstName)
Next

Rather than retrieve the collection in two separate queries, however, you can collapse all that code into a single query that creates the anonymous objects and then selects one of the anonymous objects based on the CustID property. That query would look like this:

Dim res = (From ord In nw.Orders
          Group By custID = ord.CustomerID
          Into CustOrders = Group
          Where custID = "ALFKI").First

The C# is different. First, you must explicitly declare the anonymous object you intend to return. In defining that object, you can use the Key keyword to refer to the value that you grouped on (CustomerID, in this example). You don't need the Group keyword to define the property that holds all of the related records (all the Orders, in this case) -- that's done automatically. So, in C#, this code will create the collection of anonymous objects with two properties, select the object with its key set to ALFKI and extract the first item:

var res = (from ord in nw.Orders
          group ord by ord.CustomerID
            into CustOrders
          where CustOrders.Key == "ALFKI"
          select new {custId = CustOrders.Key, 
            cOrders = CustOrders }).First();

There's no real benefit in collapsing the two queries into one, by the way. The EF doesn't generate the SQL for a LINQ query until one of the resulting collections is processed. In the Group By examples in the article, it wouldn't be until the For...Each loop processed the CustOrders collection that a SQL statement would be generated. That SQL statement would reflect both LINQ queries. That leads to another way of "thinking in LINQ": If you feel your code is easier to read or maintain by having separate LINQ queries, don't hesitate to break your LINQ query down into multiple steps.

Summarizing Data
Nothing's been done in these sample queries that couldn't be done just as easily without grouping. Where grouping can be useful is in generating summary values for related records. To get a summarized value, just add a new property to your anonymous object to hold the summarized value and use an extension method (or equivalent) to generate the result. This example adds a property to an anonymous object that holds the sum of the freight for all the Orders that share a CustomerID:

Dim res = From ord In nw.Orders
          Group By custID = ord.CustomerID
            Into SumOfFreight = Sum(ord.Freight)

For Each co In res
  MessageBox.Show(co.custID & ": " & co.Sum)
Next

In C#, the syntax for specifying the summary property is a little more ornate. You still need to specify the property you're adding to the anonymous object to hold the summarized value. However, to calculate the summary value you must run the extension method on the grouping name, passing a lambda expression to specify the property used in the summary. This example defines a property called SumOfFreight and sums up the Freight property on the grouped Orders:

res = from ord in nw.Orders
      group ord by ord.CustomerID
        into CustOrders
      select new {custId = CustOrders.Key, 
        SumOfFreight = CustOrders.Sum(o=>o.Freight)};

If you want to select an anonymous object based on the summary values, just use the Where clause to find the groups you want. This example finds all the customers with a total freight greater than $5,000:

Dim res = From ord In nw.Orders
          Group By custID = ord.CustomerID
            Into SumOfFreight = Sum(ord.Freight),
              AverageOfFreight = Average(ord.Freight)
          Where SumOfFreight > 5000

In C#, because the where keyword precedes the select keyword, you'll use a summary expression in the where clause to find objects based on a summary value. This example calculates the sum of the Freight property twice (once to find groups where the total is greater than $5,000 and once to set the SumOfFreight property):

var res = from ord in nw.Orders
          group ord by ord.CustomerID
            into CustOrders
          where CustOrders.Sum(o=>o.Freight) > 5000
          select new {custId = CustOrders.Key, 
            SumOfFreight = CustOrders.Sum(o=>o.Freight)}; 

Rather than repeat the summary expression, this would be a good place to use the Let keyword:

var res2 = from ord in nw.Orders
           group ord by ord.CustomerID
             into CustOrders
           let FreightSum = CustOrders.Sum(o=>o.Freight)
           where FreightSum > 100
           select new { custId = CustOrders.Key, 
             SumOfFreight = FreightSum };

While I think that navigation properties simplify queries, I'm willing to admit that summarizing in LINQ and selecting on summary values isn't significantly simpler than it is in SQL. However, I don't think it's any more complicated than SQL and, because it's object- and property-based, you don't have to switch your brain over from "thinking in .NET objects" to "thinking in SQL."

Outer Joins
Earlier I looked at Joins and suggested that they weren't as important in LINQ as they were in SQL. However, in LINQ you can use the Join keyword with the grouping Into keyword to create a collection of grouped objects. This lets you do the LINQ version of the SQL Outer Join. Often the reason you use an Outer Join is to find those records that don't have a matching row by checking for nulls in the missing rows.

In Visual Basic, a LINQ Group Join creates a set of anonymous objects with all the properties of the first collection referenced in the Join and, potentially, another property holding the related objects from the second collection in the Join. To find the objects that don't have a match, you just test to see if there's anything in the grouped records collection.

The following example, for instance, finds all the Customers that don't have any Orders. The query creates a set of anonymous objects with properties from the Customer object (because the Customers collection appears first in the Join) and adds a property called Ords that holds the related objects (because Orders appears in the second part of the Join). The query checks to see if the count of the Ords group is greater than 0 to determine if a Customer has Orders:

Dim res = From c In nw.Customers
          Group Join o In nw.Orders
          On c.CustomerID Equals o.CustomerID 
          Into Ords = Group
          Where Ord.Count = 0
          Select c

The C# code is slightly different, omitting the two uses of the Group keyword:

var res = from c in nw.Customers
          join o in nw.Orders
          on c.CustomerID equals o.CustomerID 
          into Ords
          where Ords.Count() == 0
          select c;

You don't have to use LINQ. But LINQ will let you access your data as objects with code -- and, I suspect, simpler code than SQL. Barring some good reason to make the jump, you have to ask yourself why you're not using LINQ and the EF.

comments powered by Disqus

Featured

  • Microsoft Revamps Fledgling AutoGen Framework for Agentic AI

    Only at v0.4, Microsoft's AutoGen framework for agentic AI -- the hottest new trend in AI development -- has already undergone a complete revamp, going to an asynchronous, event-driven architecture.

  • IDE Irony: Coding Errors Cause 'Critical' Vulnerability in Visual Studio

    In a larger-than-normal Patch Tuesday, Microsoft warned of a "critical" vulnerability in Visual Studio that should be fixed immediately if automatic patching isn't enabled, ironically caused by coding errors.

  • Building Blazor Applications

    A trio of Blazor experts will conduct a full-day workshop for devs to learn everything about the tech a a March developer conference in Las Vegas keynoted by Microsoft execs and featuring many Microsoft devs.

  • Gradient Boosting Regression Using C#

    Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of the gradient boosting regression technique, where the goal is to predict a single numeric value. Compared to existing library implementations of gradient boosting regression, a from-scratch implementation allows much easier customization and integration with other .NET systems.

  • Microsoft Execs to Tackle AI and Cloud in Dev Conference Keynotes

    AI unsurprisingly is all over keynotes that Microsoft execs will helm to kick off the Visual Studio Live! developer conference in Las Vegas, March 10-14, which the company described as "a must-attend event."

Subscribe on YouTube