Practical .NET

Splitting Tables in Entity Framework 6 to Improve Performance

There are some occasions when using Entity Framework can really hurt you: When you have tables with hundreds of columns or tables with large payloads. Here's how to get EF6 to do the right thing.

The first rule of speeding up your application is "Manage your database access." Unfortunately, using Entity Framework (EF) means giving up some control around managing access -- specifically, in generating your SQL. As a reader pointed out in a recent column, taking all the defaults of EF can result in an application that runs very slowly, indeed.

The issue the reader raised is that, by default, Entity Framework generates SQL statements to retrieve all of the columns that correspond to the properties on your entity classes. Often, that will be more data than you need. There are going to be times when, for example, all you want is data from a customer's first and last name columns, but the entity you're using has many more properties than just those two columns. Entity Framework, not being able to read your mind, will retrieve the columns in the table corresponding to every property on your entity, not just the first and last names.

While, on the face of it, that seems inefficient, the result usually falls into a "no harm, no foul" zone: You probably can't measure the difference between retrieving just the two columns you want and retrieving all of the columns.

When You Might Have a Problem
But to make that "no harm, no foul" claim, I have to make three assumptions. First, I'm assuming you're building an online transactional application where you don't retrieve many rows at a time (typically, less than a dozen rows) -- that is, a "transaction's worth" of rows. As the number of rows being retrieved increases, eventually the difference between getting two columns and getting "all the columns" will become great enough to make a difference. However, my assumption is going to be true in virtually every case: Entity Framework is targeted toward online transactional applications -- you shouldn't be using it for batch processing or reporting applications where you're processing hundreds or thousands of rows at a time.

My other two assumptions are reasonable, but, unlike my first assumption, not always true. My second assumption is that "all the columns" is some small number -- a couple of dozen columns, for example. If you have a table with hundreds of columns, then it's certainly possible that you'll notice the difference between retrieving a few columns and retrieving "all the columns," even if you're only retrieving a "transaction's worth" of rows.

My third assumption is that your table doesn't include any binary large object (blob) columns (in Microsoft SQL Server, these are columns with the varbinary, image and xml data types). If you have a blob column defined in your entity class and are retrieving that column when you don't want it then you could very easily have a performance problem (though much will also depend on how big the object stored in the blob column is). It's even possible that you can have a problem if you have enough character large object columns (clob: text, ntext, nvarchar(max), text, varchar(max)).

In these scenarios, however, the fix to your problem is sufficiently simple and probably sufficiently localized that I wouldn't apply it. Instead, I'd build the application with a simple entity object that retrieves all of the columns and see if your performance is acceptable. Only if you're not getting the performance you want should you apply the following fix to the affected parts of your application.

Fixing the Problem
Effectively, you want to implement two data retrieval patterns when getting data from the columns that make up your problem tables. For most of the properties on your entity object, you want the Entity Framework default: eager loading. With eager loading, the data from the column is retrieved when you process the entity object.

For the blob columns (or for the hundreds of columns you don't need) you want a different approach: lazy loading. With lazy loading, the data is retrieved from the column when you read the corresponding properties, not when you touch the entity. Sadly, Entity Framework doesn't have a LazyLoading attribute that you can apply to properties to defer retrieving their data.

However, Entity Framework does support lazy loading between business entities that are connected through an association. By leveraging that feature you can achieve the result you want by splitting your table across two or more entities joined by an association.

With your table split into two (or more) related entities you can retrieve the properties you want eagerly and lazily defer retrieving the properties you don't want until you need them. You'll need at least two entities: one with the columns you want to retrieve eagerly; and one on the other side of an association, with the properties you'll retrieve lazily. For the "hundreds of columns" problem, you'll need to look at your application to decide which columns you use together and group your entities' properties to support that usage.

For this example, I'm going to take the easier of the two problems and look at a table that has a blob column. I've altered the Microsoft AdventureWorks database so the Customers table includes a column called Picture of type image (presumably this holds a picture of the customer). In this column, I'm going to implement this solution Code First using EF6. (I'll return to this problem later this month to solve it by using the EF designer.)

A Code First Solution
In the Code First solution I write my two Customer entity classes and segregate my Picture property (which I want to load lazily) from the properties I want to load eagerly. Both classes must have the customer's ID property, which identifies a customer. The two classes look like this, initially:

Public Class CustomerEager
  <Key>
  Public Property ID As Integer
  Public Property LastName As String
  ...rest of properties...
End Class

Public Class CustomerLazy
  <Key>
  Public Property ID As Integer
  Public Property Picture As Byte()
End Class

The next step is to extend the CustomerEager class (the class with the properties I want most of the time) with a navigation property that connects to the entity I want to load "lazily." In this example, I've called the property CustomerBLOBs and the data type it returns is my CustomerLazy class:

Public Class CustomerEager
  <Key>
  Public Property ID As Integer
  Public Property LastName As String
  ...rest of properties...

  Public Overridable Property CustomerBLOBS As CustomerLazy
End Class

In my DbContext object, I'm going to assume I'll only access my CustomerLazy entities through the navigation property on my CustomerEager entity. As a result, I only need to set up a property for my CustomersEager entity collection in my DbContext object:

Public Class AdventureWorksLTEntitiesRevised
  Inherits DbContext

Public Property CustomersEager() As DbSet(Of CustomerEager)

The real work is in establishing the relationship between the two entities and the table they represent. I do that in my DbContext OnModelCreating method. First, I tell Entity Framework to generate CustomerLazy and CustomerEager entity objects from the same table: Customers. That code looks like this:

Protected Overrides Sub OnModelCreating(modelBuilder As DbModelBuilder)
  MyBase.OnModelCreating(modelBuilder)

  modelBuilder.Entity(Of CustomerLazy)().ToTable("Customers")
  modelBuilder.Entity(Of CustomerEager)().ToTable("Customers")

But, in the same method, I also need to establish that the association in the CustomerEager entity made through the CustomerBLOBs property has two characteristics. First, it's a one-to-one relationship with the object at the other end of the relationship. Second, the object at the end of the other relationship is always present. That's what this code does:

  modelBuilder.Entity(Of CustomerEager).
    HasRequired(Function(c) c.CustomerBLOBs).
    WithRequiredPrincipal()
End Sub

Loading Eagerly and Lazily
With the changes in place, I can now process all the customers in my database eagerly and retrieve the picture only when I need it (lazily). Code like the following will process all of the customers, only occasionally retrieving the entity with the blob property:

Dim db As New AdventureWorksLTEntitiesRevised
For Each c As CustomerEager In db.CustomersEager
  '...process CustomerEager properties...
  If c.LastName = "Vogel" Then
    '...process c.CustomerBLOBs.Photo
  End If
Next

However, this code will result in two trips to the database: one to retrieve the CustomerEager entity and one to retrieve the CustomerLazy data. That makes sense when you only need the lazy entity infrequently.

However, when you know that you do need the data from both entities, you'd like to avoid that second trip to the database. In that scenario, you want to retrieve your lazy entity eagerly (of course, if you need all the data all the time, you shouldn't have split your tables). Entity Framework will eagerly load your lazy entity if you use the Entity Framework Include method when retrieving your eager object. Include causes Entity Framework to perform an eager load of the object at the other end of the navigation property whose name you pass to the Include method. This example does an eager load of the entity pointed to by the eager entity's CustomerBLOBs property:

Dim cust = (From c In db.CustomersEager.Include("CustomerBLOBs")
            Where c.LastName = "Vogel"
            Select c).First
...process both CustomerEager and CustomerLazy properties

These steps are only necessary if you're in either of the two scenarios I described at the start of this column -- and, even then, only if you actually have a performance problem. And if you do, splitting tables and using the Entity Framework support for eager and lazy loading across associations will give you eager loading of the data you want and lazy loading of data that might otherwise hurt your performance. In other words: everything you want.

About the Author

Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.

comments powered by Disqus

Featured

  • AI for GitHub Collaboration? Maybe Not So Much

    No doubt GitHub Copilot has been a boon for developers, but AI might not be the best tool for collaboration, according to developers weighing in on a recent social media post from the GitHub team.

  • Visual Studio 2022 Getting VS Code 'Command Palette' Equivalent

    As any Visual Studio Code user knows, the editor's command palette is a powerful tool for getting things done quickly, without having to navigate through menus and dialogs. Now, we learn how an equivalent is coming for Microsoft's flagship Visual Studio IDE, invoked by the same familiar Ctrl+Shift+P keyboard shortcut.

  • .NET 9 Preview 3: 'I've Been Waiting 9 Years for This API!'

    Microsoft's third preview of .NET 9 sees a lot of minor tweaks and fixes with no earth-shaking new functionality, but little things can be important to individual developers.

  • Data Anomaly Detection Using a Neural Autoencoder with C#

    Dr. James McCaffrey of Microsoft Research tackles the process of examining a set of source data to find data items that are different in some way from the majority of the source items.

  • What's New for Python, Java in Visual Studio Code

    Microsoft announced March 2024 updates to its Python and Java extensions for Visual Studio Code, the open source-based, cross-platform code editor that has repeatedly been named the No. 1 tool in major development surveys.

Subscribe on YouTube