Practical .NET

Making Your Data Updates and Deletes More Efficient

By adding one open source library to your application (and adding two methods to your LINQ statements) you can cut the time spent on database updates and deletes in half.

As I've said on numerous occasions (and summed up here), speeding up business applications comes down to reducing the number of trips to the database and optimizing your server-side data access once you're there. And, as I've discussed in more recent columns, you can simplify your code and optimize your client-side data access by separating your data retrieval code from your update code (the CQRS pattern). As a result, I've been writing columns on how, exactly, you could implement all of that using the base versions of Entity Framework and LINQ.

But, in one of those columns, a reader pointed out that I've ignored an open source set of Entity Framework extensions that makes it easier for you to achieve that separation: the LoreSoft Entity Framework Extended Library (available through NuGet as EntityFramework.Extended -- I'll call it EF.E). The EF.E library actually includes four more-or-less separate sets of functionality, but, for me, the jewel of the crown is the library's support for batch updates and deletes without data retrieval (I'll discuss one of the other sets in a later column).

The Problem
Some background first: The SQL statement to delete a row in the Customers table that has its Id column set to "A123" looks like this:

Delete
  From Customers
  Where Id = 'A123'

If you use ADO.NET directly to issue this command, you make just one trip to the database to send the statement (after, of course, fussing around with Connection, Command and Parameter objects). If, on the other hand, you use Entity Framework and LINQ (I'll call it EF+LINQ) to manage your ADO.NET code and generate your SQL statement, the default pattern is different. With EF+LINQ you retrieve the corresponding object, remove it from its collection, and then call SaveChanges. All together, that means you've made two trips to the database: one to retrieve the object and one to issue the SQL statement when you call the SaveChanges method.

The multiple retrieval issue is one problem. Another problem with EF+LINQ is batch updates. If you wanted to set the credit limit to $2,000 for all Customer rows where the CustCreditStatus column was set to "Premium," you'd use this SQL statement:

Update Customers
  Set CreditLimit = 2000
  Where CustCreditStatus = 'Premium'

Again, with ADO.NET, you just issue that statement to the database server and assume that the server will optimize the heck out of it.

If you let EF+LINQ mange the process, on the other hand, you'll first have to retrieve all the Premium Customer rows and then iterate through all of the resulting objects, updating each one individually. And, you'll note, at this point you haven't even called SaveChanges, but you've already made one trip to the database. Furthermore, you've retrieved (potentially) an unlimited collection of Customer objects -- that's got to hurt.

Furthermore, when you do finally call SaveChanges, Entity Framework isn't smart enough to issue the single SQL Update statement I used as my example. Instead, Entity Framework will generate a SQL statement for each individual Customer object. You'll still only make one trip to the database on the call to SaveChanges because Entity Framework will bundle up all of the Update statements into a single transmission wrapped inside a transaction. However, each of those potentially unlimited number of Update statements will be executed individually because there's no way for the database engine to optimize these updates. If you have a lot of Premium Customer rows, you'll have an operation with a response time that should be measured with a calendar, not a stop watch (assuming that your application doesn't just time out waiting for a response).

Of course, if you're using Entity Framework for an online, transaction-based application where you're manipulating a couple of dozen rows at a time, the batch update problem isn't an issue. And, as I've pointed out in a previous column, you can avoid that first trip at the cost of writing some additional code. However, it would be nice if you could do batch updates with EF+LINQ and avoid that first retrieval without writing extra code.

The Solution
Good news! EF.E addresses both the "delete/update without retrieval" and the "batch update" problem. The beautiful thing about EF.E is that it really just extends what you'd do normally.

For example, to delete without retrieval you begin by calling the standard LINQ Where method from the collection containing the object you want to delete. You pass, as usual, a lambda expression that returns true for any row that you want to select. Now, to delete those rows using EF.E, you add a call to the EF.E Delete method after the Where method.

This code, for example, deletes all the Customers whose first name is Peter:

Dim db As New CustomerOrdersEntities
db.Customers.Where(Function(c) c.FirstName = "Peter").
             Delete()

If you prefer to use the LINQ "SQL-like" syntax, just call the Delete method from your LINQ statement (just as you would if you were using First or Distinct). Because it's only the Where clause that matters, your LINQ statement doesn't even need to include a Select clause. This code, using the LINQ SQL-like syntax does the same thing as my previous example:

Dim res As Integer
res = (From c In db.Customers
       Where c.FirstName = "Peter").Delete()

In both of these examples there's no need to retrieve the relevant Customer objects first. If you were to grab the SQL statement sent to the database, you'd see that it's exactly what you would write yourself:

Delete
  From Customers
  Where FirstName = 'Peter'

Effectively, the LINQ statement (regardless of how you write it), provides the Where clause that the Delete method will use in the single SQL statement it sends to the database. Given that single SQL statement, the database engine will delete all of the specified Customer rows in as optimized fashion as it can manage. In addition to saving a trip to the database, you also have a method for performing batch deletes.

The Delete method itself, by the way, returns an integer result that tells you how many items were deleted (which might be more than one because, after all, there might be multiple customers whose first name is Peter). Returning an integer result instead of a collection of objects also means that the Delete method will probably be the last method called in your LINQ statement.

To perform batch updates, you call the EF.E Update method from a LINQ Where method that specifies the items to be updated (as with the EF.E Delete method). The difference with the Update method is that you must pass the method an object from the collection you're searching. Any properties on that object that you set to a value will be used to update the corresponding row in the table.

This code, for example, sets the CreditLimit of all Customers with a CustCreditStatus of Premium to 2000:

res = (From c In db.Customers
       Where c.CustCreditStatus = "Premium").
       Update(Function(c) New Customer With {.CreditLimit = 2000})

This code does the same thing using the LINQ method-based syntax:

res = db.Customers.Where(Function(c) c.CustCreditStatus = "Premium").
                   Update(Function(c) New Customer With {.CreditLimit = 1000})

Only the columns corresponding to properties you set in the object created in the Update method's lambda expression will be updated -- any properties you don't set will be left alone. This means that if you have property you want to set to its default, then you must explicitly set that property's value. This code, for example, sets the CreditLimit property and then goes on to set the CustCreditStatus to nothing:

res = db.Customers.Where(Function(c) c.CustCreditStatus = "Premium").
                   Update(Function(c) New Customer With {.CreditLimit = 1000, 
                                                         .CustCreditStatus = Nothing})

Like the Delete statement, the Update method returns an integer telling you how many rows were updated and must follow the Where clause.

There is one significant limitation with the Update method: You must create the object used in the lambda expression inside the lambda expression. You can't, for example, create a Customer object outside of the lambda expression and then return it from the lambda expression. You also can't pass the Update method a lambda expression of more than one line or pass a reference to another method in your application. Putting this all together, it means you must know what properties you're going to set when you write your Update method's code. There's no way, at run time, to analyze the return values from your UI, determine what properties your user changed and then construct an object in your lambda expression that sets only those properties.

Some final notes: While my Where clauses have been very simple, for both the Delete and the Update methods you can make your Where clause as complicated as you want. As your Where clause becomes more complicated, though, you might want to check that integer result to see if it's non-zero -- a zero result would mean that you didn't find anything to delete.

While I normally stay away from add-ins in these columns, my reader was right: EF.E provides a much simpler way to avoid retrieving records than my previous examples did. And, while there are commercial products that support batch updates with Entity Framework, EF.E will let you do it for free. Personally, I like free.

About the Author

Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.

comments powered by Disqus

Featured

Subscribe on YouTube