Practical .NET

Grouping Results in LINQ

Complex queries can be solved with a combination of simpler LINQ queries, anonymous objects and the Group/By/Into clauses. Using Group/By/Into will also give you the best possible performance for those complex queries.

Let's take an interesting problem: Your users want to find the 10 customers who have bought the most from you. To do this, you need to group all the sales orders for each customer, total the value for those sales orders for each customer, sort by the total value of those sales orders for each customer, and (finally) pick off the first 10. However, you can't just present just the resulting customer information -- the user also wants to be able to look at each customer's individual sales orders.

This isn't an unusual request. Often users don't want the object -- they want a group of objects that meet a set of conditions. In addition, sometimes the data you're working with isn't a single value on individual objects -- it's a summary value generated from a group of objects that share some characteristic (customer number, in my case). A simple Where clause isn't going to do the job here.

You could, of course, write some procedural code: Retrieve all the SalesOrder objects, sort them, and then loop through those sorted objects, noting when some value changes (customer number, in my example) and, finally, generating a summary result. Or you can let LINQ do all the work by using the Group/By/Into clauses (I discussed Group and Into in an earlier column where I walked, line by painful line, through the LINQ equivalent of a SQL outer join).

And using Group/By/Into is what you want to do from a performance point of view. Using Group/Into will cause LINQ and Entity Framework to work together to turn the grouping and summarizing work over to your database server. Not only does that transfer the processing to another CPU, your database engine is going to be able to do the grouping/summarizing work faster than you could (and with less chance of introducing a bug).

Showing how to exploit Group/By/Into is the point of this column. However, the LINQ code is sufficiently different between Visual Basic and C# that it makes sense to discuss the two solutions independently. I'll do the C# version first, but if you're a Visual Basic programmer you should just skip to the next section.

Grouping in C#
The first step is to process the collection of SalesOrders, using the group clause to organize the SalesOrder objects into a new collection held in a class that implements the IGrouping interface. Each item in the collection will itself be a collection of the SalesOrder objects that share a common value. The by clause allows you to specify the property whose value is to be used to organize the SalesOrders and the into clause lets you name the generated collection. This example organizes SalesOrders into a collection called TotaledOrders using the SalesOrder CustomerId property:

CustomerOrdersContext db = new CustomerOrdersContext();
var OrdersByCustomer = from so in db.SalesOrders
                       group so by so.CustomerId 
                       into TotaledOrders

Each of the IGrouping collections in the TotaledOrders collection has all of the members you'd expect a collection to have (they each have a Count method, for example). In addition, IGrouping collections have a property called Key that holds the shared value (the value from the CustomerId property, in this case).

However, to support the user's request, I don't want just this "collection of SalesOrders collections" -- I want a collection of objects with properties holding the shared CustomerId, the total value of all the SalesOrders in each collection and the SalesOrders themselves. To do that, in my query's select clause, I create an anonymous object with these properties:

  • Orders: A List of all of the Orders in each collection
  • CustomerId: The shared SalesOrders CustomerId which I can get from each collection's Key property
  • TotaledValue: The sum of the TotalValue property for all of the orders in each collection

Putting that together with the rest of the code, the code that creates the collection of anonymous objects I want looks like this:

var OrdersByCustomer = from so in db.SalesOrders
                       group so by so.CustomerId into TotaledOrders
                       select new {
                                    CustomerId = TotaledOrders.Key,
                                    TotaledValue = TotaledOrders.Sum(s => s.TotalValue),
                                    Orders = TotaledOrders.ToList()
                                  };

The next section is the material I've just covered, but in Visual Basic. If you've read this section, you'll probably want to skip the next section.

Grouping in Visual Basic
The first step is to process the collection of SalesOrder and use the Group clause to organize the SalesOrder objects into a collection of anonymous objects. The By clause allows you to specify the property to be used to organize the SalesOrders and the Into clause allows you to both name the properties on the resulting anonymous objects and assign values to those properties.

When defining these properties, the Group keyword references the collection of SalesOrders that share the same value (the same CustomerId, in this case). You can use the Group keyword to define a property holding all of the related SalesOrders. Your anonymous object will automatically acquire a property that holds the shared value; that property will have the same name as the property holding the shared value (that is, my anonymous object will automatically have a property called CustomerId that holds the shared CustomerId value). You can also perform summary calculations (Sum, Average, Max and so on) on the related orders and use those values to set properties on the anonymous object.

The following code organizes SalesOrders into a collection of anonymous objects, each of which has three properties:

  • TotaledOrders: The SalesOrders that share a common value in their CustomerId propert
  • TotaledValue: The sum of the TotalValue property of the SalesOrders in the group
  • CustomerId: The value shared by all the SalesOrders in TotaledOrders (automatically generated)

Here's the code:

Dim db As New CustomerOrdersEntities
Dim OrdersByCustomer = From so In db.SalesOrders
                       Group so By so.CustomerId
                       Into TotaledOrders = Group,
                                            TotaledValue = Sum(so.TotalValue)

In this query, I don't need a Select clause. Instead, LINQ will automatically return the collection of anonymous objects.

Selecting the Top Objects
Now that I've created my collection of anonymous objects, I can sort it by the TotaledValue property in descending order so that the objects with the largest total sales appear first. I can then use the Take function to pull off the first 10 items. In Visual Basic that second query looks like this:

Dim SelectedCustomers = (From sos In OrdersByCustomer
                         Order By sos.TotaledValue Descending
                         Select sos).Take(10)

In C#, the code looks like this:

var SelectedCustomers = (from sos in OrdersByCustomer
                         orderby sos.TotaledValue descending
                         select sos).Take(10);

You don't have to do this in two separate queries -- you could combine the two queries into a single, more complicated query, as shown in Listing 1 (the equivalent C# version would be more complicated).

Listing 1: The Combined LINQ Statements in Visual Basic
Dim OrdersByCustomer = (From sos In
                         (From so In db.SalesOrders
                          Group so By so.CustomerId
                          Into TotaledOrders = Group,
                                               TotaledValue = Sum(so.TotalValue)
                         )
                        Order By sos.TotaledValue Descending
                        Select sos).Take(10)

But why bother (other than to ensure some job security)? No SQL will be sent to the database server until you start processing the individual objects, which won't happen until after the second query. At that point, LINQ/Entity Framework will collapse the two queries into a single SQL statement, anyway. I even checked: Both the generated SQL and the objects returned are identical regardless of whether you use two LINQ queries or one. As far as I'm concerned, leaving the two queries separate makes for more readable code.

Regardless of how many queries I use, I now have what the user wanted: The top 10 customers by purchases with all their orders. The code looks pretty slick, too … as long as no one asks how long it took me to figure out that these were the right LINQ queries. That's the beauty of only presenting the final result: Hopefully, you always look like a genius.

About the Author

Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.

comments powered by Disqus

Featured

  • Compare New GitHub Copilot Free Plan for Visual Studio/VS Code to Paid Plans

    The free plan restricts the number of completions, chat requests and access to AI models, being suitable for occasional users and small projects.

  • Diving Deep into .NET MAUI

    Ever since someone figured out that fiddling bits results in source code, developers have sought one codebase for all types of apps on all platforms, with Microsoft's latest attempt to further that effort being .NET MAUI.

  • Copilot AI Boosts Abound in New VS Code v1.96

    Microsoft improved on its new "Copilot Edit" functionality in the latest release of Visual Studio Code, v1.96, its open-source based code editor that has become the most popular in the world according to many surveys.

  • AdaBoost Regression Using C#

    Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of the AdaBoost.R2 algorithm for regression problems (where the goal is to predict a single numeric value). The implementation follows the original source research paper closely, so you can use it as a guide for customization for specific scenarios.

  • Versioning and Documenting ASP.NET Core Services

    Building an API with ASP.NET Core is only half the job. If your API is going to live more than one release cycle, you're going to need to version it. If you have other people building clients for it, you're going to need to document it.

Subscribe on YouTube