Practical .NET

Creating the Right Database Initializer for Entity Framework

If you want to treat your database design as an 'implementation detail' that just falls out of getting your object model right, then Entity Framework gives you four choices. Picking the right one, however, may mean creating your own.

When you commit to working with Entity Framework in a code-first manner, you have four options that control when your database schema is created. In my opinion, none of them is perfect.

Why None of the Existing Initializers Are Right
For example, the default option is to create a database initializer class that inherits from CreateDatabaseIfNotExists. With this option the first line of code that accesses the database triggers the initializer. The initializer checks to see if the database exists and, if it doesn't, the initializer creates the database and populates it with tables that correspond to your object model. If your initializer includes a Seed method that adds data to your table, it's run also.

This option is the most benign of the four (that is, the least likely to do harm) and, in my opinion, the most useless: It's just not clear to me when this option makes sense. About the best you can say for CreateDatabaseIfNotExists is that it will throw an exception if, as you enhance your application, your object model and database schema are different. It won't do anything about that difference, though.

In my experience, if I'm making changes to an application, then I'm usually making changes to the objects the application uses and, as a result, I want changes made to my database schema. The very last thing I need is an exception telling me that I'm doing what I'm trying to do.

Of the provided initializers, my favorite option is DropCreateDatabaseAlways. This option means that when I press F5 to debug my application, the database is always deleted and its schema recreated. What I like about this option is that data from my last test run is blown away and (assuming I put some code in the initializer's Seed method) a fresh set of my data loaded in. In an automated testing environment, ensuring that my data is what I think it is matters to me so this is my preferred option ... of the ones provided by Entity Framework.

Which leads to my objection to the remaining two options: DropCreateDatabaseWhenModelChanges and MigrateDatabaseToLatestVersion. The DropCreateDatabaseWhenModelChanges option drops and creates my database ... but only if my database schema and object model are different. This does ensure my database schema matches my object model and that I have fresh data ... but only when my object model and database differ. And, quite frankly, most of my changes don't involve changes to my object model. As far as I'm concerned, MigrateDatabaseToLatestVersion just makes the problem worse: not only is it only triggered when I have a change to my object model, it doesn't run the Seed method when the database is first created (or whenever there are pending migrations, for that matter).

These last two options get me into debugging faster but, without fresh data, I feel that I'm just doing the wrong thing quicker.

So I created my own initializer: It only drops and recreates my schema when my object model changes. However, it always deletes any existing data and reruns my Seed method to populate the schema with fresh data.

Creating a Custom Initializer
The first step is creating a custom initializer that implements the IDatabaseInitializer interface. You'll want to be able to pass any DbContext object to your initializer, so the initializer must be set up as a generic class constrained to accept only DbContext classes (or anything that inherits from DbContext). That class begins like this:

class LoadDataDropCreateWhenChange<T> : IDatabaseInitializer<T> 
                                        where T : DbContext
{

The interface requires a single method called InitializeDatabase that accepts a DbContext object and returns void. This is the method that Entity Framework calls when that first line of code in your application accesses the database.

What I'm going to do in this method is access the database associated with the DbContext object. Once I have the database, I can check to see if it (a) exists and (b) matches my object model. If either of those condition are true I delete the database and recreate it from my object model using methods on the Database object. After that, I call the Seed method that I'll (eventually) build into my initializer.

Here's that code:

public void InitializeDatabase(T context)
{
  Database db = context.Database;

  if (!db.CompatibleWithModel(false))
  {
    db.Delete();
    db.Create();
    Seed(context);
  }

One caveat here: for CompatibleWithModel to work, my database must contain the Migration tables that hold information about my database changes. As a result, your database must be created using the Create method on the Database object, as I do here.

Clearing Out the Data
When I don't want to drop and recreate the database, I still want to clear the data out of my tables and run my Seed method. To speed up that process, I first create a transaction:

else
{
   DbContextTransaction tran = db.BeginTransaction();

Once the transaction is started, I retrieve the PropertyInfo objects for the DbSet properties that I added to my DbContext object. However, the DbContext object has some properties in addition to my DbSet properties so I have to filter for the DbSet properties. This code does just that:

   IEnumerable<PropertyInfo> dbSetProperties = 
      context.GetType()
        .GetProperties()
        .Where(p => p.PropertyType.Name.StartsWith("DbSet"));

Now that I have the PropertyInfo object for my DbSet properties, I can just loop through them, pulling the name from each property and using that in an SQL Delete command:

foreach (PropertyInfo dbProp in dbSetProperties)
{
  string tableName = dbProp.Name;
  if (tableName != null)
  {
      context.Database.ExecuteSqlCommand("Delete from " + tableName);
  }
}

This reveals an important limitation in my initializer: If you're using the TableName attribute to override your property name or counting on pluralization to match a property called Customers to a table called customer, this simple version of the code won't deal with that.

Finally, I commit the transaction and, as before, call my Seed method:

  tran.Commit();
  Seed(context);
}

Finally, of course, I need to provide a Seed method that developers using my initializer can override if they want to load their tables with some sample data. That method looks like this:

protected virtual void Seed(T context)
{

}

Obviously, I'm making a trade-off here: I'm betting that deleting the rows in each individual table is faster than dropping and recreating the whole database. That bet, I believe, usually pays off for me because, typically, I have many tables but, for unit testing purposes, relatively few rows in each table. My experience is that dropping and recreating all the tables every time is more expensive than deleting, from all the tables, what few rows I have.

Besides, if it appears that I'm losing my bet, I can just switch to my initializer to inherit from some other initializer and see which pain I prefer.

Using the Initializer
To use this initializer, all a developer needs to do is create a class that inherits from my LoadDataDropCreateWhenChange class, specifying the data context it is to be used with. The developer can then choose to override the Seed method to load the schema with some test data.

Here's an example of my initializer being used with a DbContext object called CustomerModel, along with a Seed method that loads the Customers table with a single row:

class MyInitializer: LoadDataDropCreateWhenChange<CustomerModel>
{
  protected override void Seed(CustomerModel context)
  {
    Customer cust = new Customer
    {
      Id = 1,
      FirstName = "Peter",
      // ... additional properties ... 
    };
    context.Customers.Add(cust);
    context.SaveChanges();
    base.Seed(context);
  }        
}

To have the Initializer invoked, you need to pass it to the DbContext's SetInitializer method along with a reference to your DbContext object. The most obvious place to do that is in the DbContext object's constructor. This example, using a DbContext object called CustomerModel, leverages my initializer:

class CustomerModel: DbContext
{
  public CustomerModel()
  {
    Database.SetInitializer<CustomerModel>(new MyInitializer());            
  }

And now you have an initializer that works the way you want it to.

About the Author

Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.

comments powered by Disqus

Featured

  • Full Stack Hands-On Development with .NET

    In the fast-paced realm of modern software development, proficiency across a full stack of technologies is not just beneficial, it's essential. Microsoft has an entire stack of open source development components in its .NET platform (formerly known as .NET Core) that can be used to build an end-to-end set of applications.

  • .NET-Centric Uno Platform Debuts 'Single Project' for 9 Targets

    "We've reduced the complexity of project files and eliminated the need for explicit NuGet package references, separate project libraries, or 'shared' projects."

  • Creating Reactive Applications in .NET

    In modern applications, data is being retrieved in asynchronous, real-time streams, as traditional pull requests where the clients asks for data from the server are becoming a thing of the past.

  • AI for GitHub Collaboration? Maybe Not So Much

    No doubt GitHub Copilot has been a boon for developers, but AI might not be the best tool for collaboration, according to developers weighing in on a recent social media post from the GitHub team.

  • Visual Studio 2022 Getting VS Code 'Command Palette' Equivalent

    As any Visual Studio Code user knows, the editor's command palette is a powerful tool for getting things done quickly, without having to navigate through menus and dialogs. Now, we learn how an equivalent is coming for Microsoft's flagship Visual Studio IDE, invoked by the same familiar Ctrl+Shift+P keyboard shortcut.

Subscribe on YouTube