Practical .NET

Building the Object Model You Want with Entity Framework

When it comes to inheritance, relational database theory and object-oriented programming have more in common than you might think. Understanding that overlap is critical in designing the object model that will generate the database design your application needs.

I’ve discussed the dependency inversion principle elsewhere. It says the first step in building your application is deciding what interface would make it easy for users to use your application, and then what API would make it easy to build the application. Later, when you build the classes behind that API, you’ll repeat the process by deciding what API would make it easy to build those classes that are used directly by your application. Following this process, you’ll eventually arrive at the entity classes that work with your database.

At this point, regardless of whether you’re working in a code-first mode (generating your database as an "implementation detail" from your object model) or in a database-first mode (matching your object model to an already existing database schema), you have two ways of looking at the problem: The relational perspective and the object-oriented perspective.

The Relational Perspective
A common problem you’ll run into at this point is known in relational database theory as the supertype/subtype problem. The traditional example used to illustrate this problem is a collection of employees, some of whom are paid a fixed amount per week and some of whom are paid by the hour. Both groups share some information (employee id, name and so on) but each group has unique information (the salary for those paid a fixed amount and the hourly rate for those paid by the hour). There are three table designs that support storing this information:

  1. Separate tables for hourly and salaried employees.
  2. One table for all employees with a flag column that identifies the employee type.
  3. One table to hold common information (id, name and so on) and two additional tables: one for the related hourly information and one for the salaried information.

Of course, this problem applies to any group of business entities that have a variety of types but share common information -- products, customers, transaction types and so on.

The decision about which database design to use to address this problem isn’t a technical decision strictly driven by relational database theory. It’s a performance decision, driven primarily by usage. If your use cases seldom have you processing all the employees together, you should use solution 1; if you frequently process employees together, solution 2 is your best choice; if your use cases fall somewhere in the middle, then solution 3 might be your best choice.

There are three additional issues. First, by "performance," I don’t mean just response time -- ease of programming and ease of use will also factor into this decision. If you pick option 1, but users have to -- directly or indirectly -- choose which table to use, then you’ve created a new problem by using solution 1.

Second, option 2 raises some issues within third normal form because it’s possible to have two identical rows that differ only by their type flag (that is, one row has its type flag set to "Hourly" while another row is identical in every way except its type flag is set to "Salary"). Because the type flag column won’t be part of the primary key, you’ve violated third normal form.

Third, "usage" changes. While you may not, initially, process both groups separately, later in the life of your organization you may find yourself often treating them as a single group. Your initial database design may not meet your current needs. The concern here is that a change to your table design can ripple through your whole application. The good news is that Entity Framework (EF) can make this problem almost trivial to fix.

The OO Perspective
In an object-oriented (OO) world, faced with Salaried/Hourly employees, you might immediately start thinking about using inheritance: Define an EmployeeBase class with two derived classes, HourlyEmployee and SalariedEmployee. You’d do this to share methods or properties common to both HourlyEmployee and SalariedEmployee by putting those methods or properties in the EmployeeBase class.

You also use inheritance when you want all employees to look alike. There are times when you want to write a method that will process all employees, regardless of whether they’re hourly or salaried. If the various employee classes all inherit from an EmployeeBase class, then a parameter declared as EmployeeBase can work with all of those classes. This is similar to the relational strategy where you put both kinds of employees in a single table: All employees now have the same set of columns and are easier to process as a group.

The EF Perspective
EF supports all three relational strategies:

  1. Separate tables for hourly and salaried employees. EF calls this Table Per Concrete class (TPC).
  2. One table for all employees with a flag column that identifies the employee type: Table Per Hierarchy (TPH).
  3. One table to hold common information (id, name and so on), and separate tables for the related hourly and salaried information: Table Per Type (TPT).

In all three strategies, regardless of the number of tables, your object model has three classes: the BaseEmployee class plus the HourlyEmployee and SalariedEmployee classes that inherit from them. You get all the benefits of inheritance from the OO perspective and your choice of the best performance from the relational perspective. With TPT and TPH, you can even switch between the two designs without affecting the rest of your application. With TPC you get … problems, quite frankly.

To implement any of the designs, you need the three entity classes as shown in Listing 1. The EmployeeBase has the columns/properties common to all employees, while HourlyEmployee and SalariedEmployee have the columns/properties unique to each class.

Listing 1: Three Entity Classes Including a Shared Base Class
Public MustInherit Class EmployeeBase
  Public Property Id As Integer
  Public Property Name As String
  '... more properties/columns ...
End Class

Public Class HourlyEmployee
  Inherits EmployeeBase
  Public Property Salary As Decimal
End Class

Public Class SalariedEmployee
  Inherits EmployeeBase
  Public Property HourlyRate As Decimal
End Class

With that in place, you’re ready to have EF implement the table design you want.

Implementing Table Per Class
TPC has only two tables (one for each of the two concrete classes: HourlyEmployees and SalariedEmployees). In the OnModelCreating method of the DbContext, you specify how each of the classes is to be mapped to a table, dictating that inherited properties are to be duplicated in each table. For HourlyEmployees, that code looks like this (a similar entry is required for the SalariedEmployee table):

Protected Overrides Sub OnModelCreating(modelBuilder As DbModelBuilder)
  modelBuilder.Entity(Of HourlyEmployee)().Map(Sub(m)
                                                 m.MapInheritedProperties()
                                                 m.ToTable("HourlyEmployee")
                                               End Sub)

This strategy works well as long as you define one DbSet property for HourlyEmployees and one for SalariedEmployees in the DbContext class, like this:

Public Class EmployeeContext
  Inherits DbContext

  Public Property HourlyEmployees As DbSet(Of HourlyEmployee)
  Public Property SalariedEmployees As DbSet(Of SalariedEmployee)

But imagine you have a Shift table that assigns an employee to a particular shift and doesn’t care whether the employee is hourly or salaried. To support this you’d have to add two navigation properties to the Shift class, one for each type of employee.

You could, instead, define a single DbSet property for Employees, like this:

Public Class EmployeeContext
  Inherits DbContext

  Public Property EmployeeBases As DbSet(Of EmployeeBase)

This "solution" creates its own set of problems, though. At the very least, you’d have to give up having the database generate the values for the Id property and have to manage those values yourself. For a longer and better discussion of the problems with TPC, see "Inheritance with EF Code First: Part 3 -- Table Per Concrete Type."

Implementing Table Per Type
TPT, on the other hand, avoids those problems. This solution will have three tables, one for each of the classes (EmployeeBase, HourlyEmployee and SalariedEmployee). For that design, you can reduce the DbContext class to that single entry for EmployeeBase without incurring new problems.

For this solution, the DbContext object’s OnModelCreating method just maps the two concrete types (SalariedEmployee and HourlyEmployee) to their respective tables:

Protected Overrides Sub OnModelCreating(modelBuilder As DbModelBuilder)
  modelBuilder.Entity(Of HourlyEmployee)().ToTable("HourlyEmployees")
  modelBuilder.Entity(Of SalariedEmployee)().ToTable("SalaryEmployees")

EF will take care of generating an EmployeeBase table and establishing foreign/primary key relationships between the three tables. Now, if you want to have a Shift object refer to an employee, you can just tie Shift object to the EmployeeBase class, like this:

Public Class Shift
  Public Property Id As Integer
  Public Property EmpId As Integer
  <ForeignKey("EmpId")>
  Public Property Employee As EmployeeBase
End Class

Implementing Table Per Hierarchy
Finally, you can create a TPH design with a single table. As with TPT, the DbContext object still has only a single DbSet property referring to EmployeeBase. The OnModelCreating method, though, must define a new flag column that distinguishes between the rows that represent HourlyEmployees and SalariedEmployees (the specified column will automatically be added to the base table). The code in the DbContext’s OnModelCreating method looks like this:

Protected Overrides Sub OnModelCreating(modelBuilder As DbModelBuilder)
  modelBuilder.Entity(Of EmployeeBase)().
              Map(Of HourlyEmployee)(Function(m) m.Requires("EmployeeType").HasValue("Hourly")).
              Map(Of SalariedEmployee)(Function(m) m.Requires("EmployeeType").HasValue("Salary"))

As with TPT, references from the Shift object can simply refer to EmployeeBase without caring if the Employee is Hourly or Salaried. Furthermore, because both TPT and TPH have identical DbSet properties, you can switch between these designs without disturbing the application.

TPH isn’t all sunshine and unicorns, though: TPH creates the opportunity for data integrity problems. In the EmployeeBases table, the Salary column (used only by SalariedEmployee) and HourlyRate column (used only by HourlyEmployee) now accept NULLs. As long as you use your EF model, if you fail, for example, to set the HourlyRate property in the HourlyEmployee object, the HourlyRate column will default to 0 and the Salary column will be set to NULL. This is what you want (assuming you intended to skip setting the HourlyRate property, of course).

However, any application that doesn’t use your EF model can create a row where both Salary and HourlyRate are NULL or where both are set to some value. This is not what you want. To prevent this you’ll want, at the very least, to add conditional CHECK constraints to the table to ensure that the right columns (and only the right columns) are set to NULL, based on the EmployeeType column.

Choosing the Right Design
For best performance, with EF as with relational design, a single table holding all the rows/classes (TPH) is going to give you the best performance when processing all employees because it avoids joins between tables. Performance won’t get worse as your object model either gets deeper (more classes in the inheritance chain) or broader (more concrete classes). If performance matters more than anything else (and you protect yourself against data integrity problems) this is your best choice … assuming you frequently process both types together.

If TPH doesn’t make sense for you (and given the extra problems that TPC brings), TPT is your best alternative. The one case where TPC makes sense is if you never, ever process the two types of employees together (for example, the Shift object is used only with HourlyEmployees). However, you’d better get this decision right -- switching away from TPC to either TPT or TPH is the kind of change that ripples through your whole application

TPT also gives you great tools for handling volatility in your database design. With TPT you can add additional tables or modify the tables for individual types without any impacting the other types. In addition, with TPT, when you need to make a change that affects all of the types you can just change the base table. However, processing all employees is going to involve lots of outer joins, which has performance implications. These problems will only get worse as your object model either gets deeper or broader.

There’s no completely right answer here: Whatever choice you make, three months down the line you’ll hit a problem that would’ve been easier to solve if you’d only picked the other design. So, no matter what you do, you’ll be wrong. I find some comfort in that.

About the Author

Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.

comments powered by Disqus

Featured

Subscribe on YouTube