Practical .NET
Reduce Trips to the Database Service with EF.E's Future Queries
One way to slow down your application is to make multiple trips to the database. Here's a way to cut those trips in half (at least) and it has to do with the EntityFramework.Extended library.
In an earlier column this month I discussed one feature in the LoreSoft EntityFramework.Extended library: the ability to do batch updates and deletes while reducing the number of data accesses you have to make. But, as I also pointed out in that column, the EntityFramework.Extended library supports three other sets of functionality, one of which is almost as useful: the ability to piggyback queries on top of each other. I'll discuss the piggyback feature in this column and you can research the other two features (caching the results of Entity Framework/LINQ queries and producing an audit log of client-side changes) on your own at the library's GitHub site.
The Goal
Some background: By default, Entity Framework uses lazy loading. This means that, when you create a LINQ query, no SQL is issued to the database, no rows are retrieved, and no objects are created from those rows. It's only when the query is realized that all of that activity takes place. There are numerous ways to realize a query: iterating through a query's result, attempting to retrieve a single value from a query's result, calling the ToList method on the query's results and so on.
One of the benefits of lazy loading is that you can break complex queries down into a series of simpler queries and stack those queries on top of each other. The following code, for example, eventually generates a result of all of the Premium Customers with a credit limit greater than $5,000. The result, when it's realized, will be held in the variable custLast:
Dim db As New CustomerOrdersEntities
Dim custBase = From c In db.Customers
Where c.CustCreditStaus = "Premium"
Select c
Dim custLast = From c in custBase
Where c.CreditLimit > 5000
Select c
The two queries are tied together because the output of one query (custBase) is the source for the other query (custLast). It's not the case that the first query retrieves more objects than you want and that the second query then filters them down to the final result (that would be terrifically inefficient). In fact, no actual database activity takes place until the code does something with the custLast query result -- that's when the queries are realized. At that point the two queries are collapsed into a single SQL statement, which is sent to the database to retrieve the objects in a single trip.
That "single trip" feature is important because, often, the cost of the trip to the database can take more time than the actual data retrieval (this is especially true if you're calling a Web Service over the Internet). However, to take advantage of this feature the queries involved have to build on each other. What if you wanted to send two (or more) unrelated queries to the database in a single trip and get back two (or more) unrelated sets of results?
You can do this in ADO.NET (as I discussed in a tip last year). Because Entity Framework is just a big wrapper for ADO.NET, you should be able to do it in Entity Framework and LINQ also … but you can't, at least not with the set of LINQ methods that come with the Microsoft .NET Framework. This is where EntityFramework.Extended steps in.
To use this functionality you'll need to use NuGet to add EntityFramework.Extended to your project (I'll just call it EF.E from here on in). You'll also need to add an imports/using statement to your class file for EntityFramework.Extensions so that the compiler can find the EF.E methods.
Piggybacking Queries
EF.E provides three methods to let you issue (and retrieve results from) multiple unrelated queries in one trip to the database: Future, FutureFirstOrDefault and FutureCount.
The easiest method to use is FutureCount. The method returns the number of items in a query the next time you realize some other query. This code, for example, first uses FutureCount to retrieve the number of objects in the Customers collection … but it doesn't go to the database yet. The code next retrieves all the Orders with delivery dates later than today … but, again, not yet. Neither of these unrelated queries are realized by the end of this code:
Dim custCount As Integer?
custCount = db.Customers.FutureCount
Dim ords = From o In db.Orders
Where o.DeliverDate > DateTime.Now
Select o
But when the following code executes and iterates through the collection of Orders, Entity Framework finally issues the statement to retrieve the Order objects:
For Each ord In ords
'...code to work with Order objects
Next
But, at the same time that the Orders are retrieved, the unrelated query to retrieve the number of Customers is also issued: Both queries are realized and you get two results for the price of one trip to the database. If I now retrieve the Value of the nullable integer holding the count of Customers, no access to the database takes place:
Dim res as Integer
res = custCount.Value
You can also use FutureCount with existing queries to retrieve the number of objects without realizing the query, as this code does:
Dim custs = From c In db.Customers
Where c.CustCreditStatus = "Premium"
Select c
Dim custCount = custs.FutureCount
At this point in that code, neither the custs collection or custCount variable are populated. However, I can realize both queries by reading the Value property of the custCount variable (the output from a call to FutureCount defaults to a nullable type that will have a Value property):
Dim res As Integer
res = custCount.Value
If I use the custs result at this point, I'll find my objects already retrieved and waiting for me to use them.
Stacking Queries
If you move up from the FutureCount method to the Future method, you can combine any arbitrary set of queries that you want into a single trip to the database. This example uses Future to combine one Future query (which retrieves Premium Customers) with another, non-Future query (which retrieves Orders that are to be delivered in the future):
Dim custs = (From c In db.Customers
Where c.CustCreditStatus = "Premium"
Select c).Future
Dim ords = From o In db.Orders
Where o.DeliverDate > DateTime.Now
Select o
Now, when the ords result is realized, both sets of SQL statements are issued in a single trip to the database and both results are realized. This call to ToList will do the trick:
Dim ordAct = ords.ToList
If I now process the custs result in a For Each loop, custs is already populated, so entering the loop won't generate a second trip to the database:
For Each c In custs
'...work with Customer objects...
Next
As you might imagine, the third EF.E method -- FutureFirstOrDefault -- returns only the first object retrieved from the database (when, of course, you finally get around to realizing some other query). You can also combine more than just the two queries I've used in my examples.
As I've said before, the two slowest things you can do in a data-driven application is issue a call to another computer or read/write a hard disk. EF.E Future methods won't eliminate those trips (and won't have any direct effect on your reads) but can cut the number of trips in half (at least).
About the Author
Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.