Practical .NET

Subqueries in LINQ

When I'm doing batch processing against a database, I use ADO.NET and SQL. But, when I'm creating an online, transactional application, I use Entity Framework (which you can think of as a wrapper around ADO.NET) and LINQ. While LINQ is missing a lot of the features that are part of SQL, LINQ is more than good enough for most of my online queries.

You'll notice, in that last sentence, I said "most," not "all."

There are times when using LINQ that I miss some of SQL's features ... and the SQL feature that I miss most is subqueries. Fortunately for me, there are tools you can use in creating LINQ queries that give you almost all of the benefits of a subquery.

One caveat: I realize that the sample queries in this column are very simple and certainly could have been solved without using subqueries. My goal is to demonstrate syntax and features and I decided creating complex queries (ones that would actually force you to use subqueries) made that syntax harder to see.

The Role of the Subquery
Typically, there are three places I use subqueries when writing SQL. First, in the Select clause, I'll use a subquery to add a new item to the select list. This query adds the total sales in Canada to the information about each Customer in Canada:

Select custId, totalPurchases, (Select Sum(totalValue) 
			           From Orders
		                 Where region = "Canada") As TotalCanadaSales
From Customers
Where region = "Canada"

Second, I use subqueries in my Join clause, to create a table I can run a query against. This query builds a virtual table of Canadian orders for me to use in my main query:

Select custId, orderDate
From (Select *
        From Orders 
      Where region = "Canada") As CanadaCustomers

Finally, I use subqueries in the Where clause, to select the rows I want to process. This query returns only information about Customers in Canada:

Select custId, orderDate
  From Orders
Where custId In (Select custId
		    From Customers
		  Where region = "Canada")

If you thought that many of those queries could be rewritten using Join statements ... well, you're not wrong there, either. However, writing a query as a set of subqueries lets me break complex queries down into smaller parts that I can test independently of each other (and, usually, using subqueries has no impact on my query's performance). Interestingly, the LINQ alternatives also often allow me to build up my queries out of smaller pieces that I can test before using.

Restricting Rows with Joins and Cascading Queries
Which leads to your first option in creating subqueries in LINQ. LINQ doesn't execute your query until you "realize" it. You can realize a query by calling some method on it (ToList is a good example) or by working with individual objects in the result (by using a foreach loop to process each object in the query's result). This means that, like SQL, you can assemble complex LINQ queries out of simpler queries. Effectively, you create a hierarchy of queries that build on each other, as these two queries do:

var CanadaCustomers = from o in db.Orders
		      where o.region = "Canada"
		      select o;

var result = from ord in CanadaCustomers
	     select new {ord.custId, ord.orderDate};

Another option is to use LINQ's Join keyword (something we don't have to do very often in LINQ because of Entity Framework's navigation properties). Using the Join keyword, I could have written my previous query like this by embedding the first query inside the second one:

var result = from ord in db.Orders
             join osub in (from o in db.Orders
		            where o.region = "Canada"
		            select o)
                on ord.orderId equals osub.orderId 
	      select new {ord.custId, ord.orderDate};

By the way, for the on clause in this example it's essential that the range variable that you're using in the join (osub, in my example) be on the right-hand side of the keyword equals.

Having said that, in most cases, I think it's easier to maintain multiple, separate queries than it is to deal with one big query. The "multiple query" always lets me test each query individually to make sure they're doing what I want; with the nested version of the query I can't test the inner query without dismantling my main query.

Extending the Select List and Limiting the Source Collection
If you want to add more results to your output, you can do that with LINQ's join keyword also. I've discussed that here using the SQL-like syntax. If you prefer the method-based syntax, you can accomplish the same goal by using the dbCollectionEntry object and its query method (I've covered that also). In fact, I'd claim that LINQ is more flexible than SQL in this respect: In SQL, with a subquery, I can only add a single value to my Select list; with LINQ I can fill properties with collections of objects.

Where I want to use subqueries to limit the objects returned, I can combine the SQL-based syntax with the method-based syntax by calling the Where method on the collection in my From clause, as in this example:

var CanadaCustomers = from c in db.Customers.Where(cust => cust.region == "Canada")
		       select new {c.custId, c.name}; 

As with subqueries in SQL, I can test the results of applying the Where method to my Customers collection independently and, once I know it's returning what I want, build the rest of the LINQ statement around it.

Technically speaking LINQ doesn't support subqueries (at least, not yet). There exist, however, lots of options for achieving the same results.

About the Author

Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.

comments powered by Disqus

Featured

  • Random Forest Regression and Bagging Regression Using C#

    Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of the random forest regression technique (and a variant called bagging regression), where the goal is to predict a single numeric value. The demo program uses C#, but it can be easily refactored to other C-family languages.

  • Compare New GitHub Copilot Free Plan for Visual Studio/VS Code to Paid Plans

    The free plan restricts the number of completions, chat requests and access to AI models, being suitable for occasional users and small projects.

  • Diving Deep into .NET MAUI

    Ever since someone figured out that fiddling bits results in source code, developers have sought one codebase for all types of apps on all platforms, with Microsoft's latest attempt to further that effort being .NET MAUI.

  • Copilot AI Boosts Abound in New VS Code v1.96

    Microsoft improved on its new "Copilot Edit" functionality in the latest release of Visual Studio Code, v1.96, its open-source based code editor that has become the most popular in the world according to many surveys.

  • AdaBoost Regression Using C#

    Dr. James McCaffrey from Microsoft Research presents a complete end-to-end demonstration of the AdaBoost.R2 algorithm for regression problems (where the goal is to predict a single numeric value). The implementation follows the original source research paper closely, so you can use it as a guide for customization for specific scenarios.

Subscribe on YouTube