Eliminate Database Dependencies in Test-Driven Development

How to avoid the end-to-end integration test problem with the Repository pattern.

Many developers are tasked with writing unit tests as test-first design and development becomes commonplace, even on teams that aren't strictly practicing Agile methodologies. Test-driven development (TDD) can produce cleaner code by requiring project teams to first write unit tests that fail, then program just enough code for a needed function, retest, refactor and repeat the cycle. If you haven't written code using TDD, starting from a failing test sounds awkward. But it's this extra bit of thought about what you want to achieve that gives you a clearer understanding of what you need to accomplish.

If you're writing an n-tier application using Visual Studio and you're using TDD, it's not uncommon that your unit tests for business-tier functionality read and write to a database. In order for these tests to run, you need a running database with the most up-to-date schema along with any supporting data.

What if you're writing unit tests for a service-oriented application and your code needs to make calls to a Windows Communication Foundation (WCF) service running on a different machine? Or it makes calls to a mainframe or some other back-end system? At this point, is it even practical to write integration tests to these systems?

The solution to this end-to-end integration problem is to use the Repository pattern to "mock out" your database or other back-end systems and decouple your application logic from these dependencies. Martin Fowler describes the Repository pattern by saying it "mediates between the domain and data-mapping layers using a collection-like interface for accessing domain objects" ("Patterns of Enterprise Application Architecture," Addison-Wesley Professional, 2002). If your business tier depends on your Repositories only through interfaces, you can drop in mocked versions from your tests and the database versions at run time. In the end, you'll have a way to test your business-tier functionality without requiring a running database or other back-end system. As a side effect, your code will be easier to maintain because you will have decreased the level of coupling in your application.

In this article, I'll show you how to leverage the Repository pattern to eliminate the dependency on SQL Server from your unit tests and, in the process, improve your application's testability. The unit-testing features that I discuss in this article are all available in Visual Studio 2008 Professional Edition and in all Visual Studio 2008 Team Editions. The sample application is written in Visual Studio 2008 Team Suite and requires an instance of SQL Server or SQL Express. See the readme.txt for information about how to set up the database and where to find the connection string.

Too Many Moving Parts
When you're writing unit tests, the simplest implementation is to have your unit tests read and write to the database as part of the tests. This makes sense, right? Your tests are exercising your business tier (domain model, service layer, façade), presentation layer (Model-View-Controller [MVC], Model-View-Presenter [MVP]) and data-access objects.

Sure, it's a good test and it's thorough, but you'll run into some subtle problems. For example, let's say that you're working on a method in your business tier that checks to see if a user with a username exists. If the user exists, the business-tier method should return true and if the user doesn't exist, it should return false. Your unit test would look something like this:

[TestMethod]
public void UsernameExists()
{
    UserFacade facade = new UserFacade();

    string userNameThatShouldExist = "benday";
    string usernameThatShouldNotExist = "benday2";

    Assert.IsTrue(
        facade.UsernameExists(userNameThatShouldExist),
        "User name should exist.");

    Assert.IsFalse(
        facade.UsernameExists(usernameThatShouldNotExist),
        "User name should not exist.");
}

The UserFacade method will look something like this:

public bool UsernameExists(string username)
{
    UserDataAccess da = new UserDataAccess();

    User user = da.GetByUserName(username);

    if (user != null)
    {
        return true;
    }
    else
    {
        return false;
    }
}

It's a good test and it verifies the functionality for both the "exists" and "doesn't exist" cases, but here's where it starts to get a little sticky. For this test to be reliable, before it runs you need to put the database in a known state where the username "benday" exists and username "benday2" doesn't. It's simple enough to make that happen if you can quickly create and delete users. But what if that wasn't the case? What if the UserLogin table has a foreign key constraint that requires you to create a Person record before you create the UserLogin record?

You have to create a Person record before you can create the UserLogin record, and you have to delete the UserLogin record before you can delete the Person record. It's no big deal, but now you're doing extra work to set up the test, and it's work that isn't exactly related to what you're trying to test. Then there's the problem of making sure your business-tier and data-access-tier code is in sync with the database schema. Again, not a huge annoyance, but it's another distraction that takes you away from writing your unit test and your app's code.

Making this work with a database is too easy. Let's make your architecture more complicated. Let's say that you're writing a service-oriented application using WCF. In order to make that UsernameExists() check, the UserFacade object has to call out to a Web service. Now you have to make sure that your Web service is deployed and running and that it thinks that the "benday" username exists and that the "benday2" username doesn't exist. This isn't terrible, but it's adding a lot more moving parts to your test. What if you're sharing an instance of the Web service with other developers? If you start adding and removing users, is that going to mess up anyone else's tests? Is there any chance that two people could run the tests at roughly the same time and cause you to get false "passes" or false "failures" in your unit tests? Plus, if that shared Web service goes down, you and the other developers are blocked.

As your application gets more complicated, you'll probably add more and more unit tests with setup requirements, and after a short time it will become clear that you're doing way too much work to make your unit tests run. You designed your tests and your app with tight coupling and now you're hitting the end-to-end integration problem.

If you can remove the direct dependency on the UserDataAccess class, you can eliminate the need for the database. The code for UserFacade's UsernameExists() method is made up of two parts: data access and the logic to decide if the user exists. For the purpose of your unit test, you really only care about the logic portion, and there's no need to exercise the data-access code.

Decouple with an Interface and the Repository Pattern
A great way to separate the UsernameExists() method from the database-access code is to introduce an interface and apply the Repository pattern. The purpose of the Repository pattern is to isolate domain model (business object) functionality from data-access functionality. A Repository provides a simple interface for saving and retrieving business objects and encapsulates the implementation details of how the work gets done.


Figure 1. Avoidable Test Methods. For a unit test to be reliable, you need to put the database in a known state, which can make for extra work if a foreign key relationship exists from UserLogin to Person.

Here's a design of a Repository interface for the User object:

public interface IUserRepository : IRepository<User>
{
    User GetByUserName(string username);
}

public interface IRepository<T> 
    where T : IInt32Id
{
    IList<T> GetAll();
    T GetById(int id);
    void Save(T saveThis);
    void Delete(T deleteThis);
}

public interface IInt32Id
{
    int Id { get; set; }
}

In an n-tier app that uses the domain model pattern and saves to the database, you need a way to distinguish one object from another. This is known as object identity. The identity value for an object is usually the same as the primary key column for the table that stores the data.

When you're building an n-tier application, try to have all the objects use the same property to describe their identity. This approach keeps your object model clean and consistent. You can enforce this consistency by creating an interface that allows access to the identity property for your objects. In this article's sample application this is the IInt32Id interface, and it defines a single property named Id that's an integer.

All of the domain model objects implement IInt32Id, which helps you maintain a clean and consistent design for your repositories. In order to get good reuse, an IRepository<T> interface defines the CRUD operations for any business object in the application. If a business object needs Repository methods that aren't in IRepository<T>, you can create an object-specific interface and add the method there. In the case of the UsernameExists() feature, an IUserRepository interface extends IRepository<User> and adds a GetByUserName() method.
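The User class itself isn't listed in this article. As a minimal sketch, assuming it carries only the UserName, FirstName and LastName properties that the test setup uses later, a domain object that satisfies the IInt32Id constraint could look like this (IInt32Id is repeated so the snippet compiles on its own):

```csharp
// Repeated from the listing above so this snippet compiles on its own.
public interface IInt32Id
{
    int Id { get; set; }
}

// Assumed shape of the User domain object; only the properties
// referenced elsewhere in this article are included.
public class User : IInt32Id
{
    // Identity value. A newly created object starts at 0 until
    // a repository assigns a real value.
    public int Id { get; set; }

    public string UserName { get; set; }
    public string FirstName { get; set; }
    public string LastName { get; set; }
}
```

Because User implements IInt32Id, it can be used as the type argument for IRepository<T> and for any generic repository built on it.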

Now that you've defined the Repository interfaces, you can refactor UserFacade to use this interface for data access instead of the UserDataAccess object:

public UserFacade(IUserRepository repository)
{
    m_repositoryInstance = repository;
}
private IUserRepository m_repositoryInstance;

public bool UsernameExists(string username)
{
    // UserDataAccess da = new UserDataAccess();
    // User user = da.GetByUserName(username);

    User user = m_repositoryInstance.GetByUserName(username);

    if (user != null)
    {
        return true;
    }
    else
    {
        return false;
    }
} 

While the sample code above looks very similar to the original code, the UserFacade object no longer cares what data-access implementation is used. As long as you supply an instance of IUserRepository on the constructor, the UsernameExists() method has what it needs to do its work. Hiding this logic behind an interface and passing it in on the constructor gives you an opportunity to use a mock implementation of IUserRepository.

In unit testing, a mock is an object that implements an interface and provides a simplified version instead of the complete implementation that you'd use in production. In this case, you can pass in a mock object that implements IUserRepository and provides database functionality in-memory instead of using an actual database.

In-Memory Repository
It's amazing how much in-memory database functionality you can create with just an instance of List<T> and some Language Integrated Query (LINQ). The hardest thing to implement is generating IDENTITY values, and even that's just a matter of incrementing an integer variable. If you design your mock implementation to use generics, you can almost instantly create a mock Repository that can emulate database functionality for any of your business objects:

public class RepositoryMock<T> 
    : IRepository<T> where T : IInt32Id
{
    private int m_currentIdValue = 0;

    public RepositoryMock()
    {
        Items = new List<T>();
    }

    protected List<T> Items { get; set; }

    public void Save(T saveThis)
    {
        if (saveThis == null)
        {
            throw new ArgumentNullException(
                 "saveThis", "Argument cannot be null.");
        }

        if (saveThis.Id == 0)
        {
            saveThis.Id = ++m_currentIdValue;
        }

        if (Items.Contains(saveThis) == false)
        {
            Items.Add(saveThis);
        }
    }

    public T GetById(int id)
    {
        return (from item in Items
                where item.Id == id
                select item).FirstOrDefault();
    }

    public IList<T> GetAll()
    {
        return Items;
    }

    public void Delete(T deleteThis)
    {
        Items.Remove(deleteThis);
    }
}

In the code sample above, you can see that everything revolves around the Items property. This property contains all the items in the "database" and is the target for any LINQ queries. The Save() method takes an instance of a business object and checks whether its Id property is 0. If the value is 0, the mock Repository simulates an INSERT by incrementing the m_currentIdValue variable and assigning the new value to the Id of the object. After the identity value has been set, the Save() method adds the object to the Items collection if it isn't already there, so saving the same object twice doesn't create a duplicate.

The GetById() method uses the Id property on the IInt32Id interface and issues a LINQ query against the Items collection to find the first object with the matching Id value. If nothing matches, it returns null.

To create UsernameExists() unit-test functionality, you need to implement the GetByUserName() method that's defined on IUserRepository. To do this, you simply create a class called InMemoryUserRepository and make the new class extend from RepositoryMock<User>. From there, you write a LINQ query that looks for matching Users in the Items collection by UserName:

public class InMemoryUserRepository :
    RepositoryMock<User>, IUserRepository
{
    public InMemoryUserRepository()
    {

    }
    public User GetByUserName(string username)
    {
        if (username == null)
        {
            return null;
        }
        else
        {
            User match = (from user in Items
                          where user.UserName == username
                          select user).FirstOrDefault();

            return match;
        }
    }
}

With this in-memory implementation, you no longer have to worry about whether the database is running, whether it's populated with stale data, or whether any foreign key requirements are satisfied. The setup for the UsernameExists() test is as simple as adding instances of User to the Items collection via calls to Save().

Here's the setup logic for the UsernameExists() test:

[TestInitialize]
public void SetupTest()
{
    m_userRepositoryInstance =
        new InMemoryUserRepository();

    // add the "benday" user to the 'database'
    m_userRepositoryInstance.Save(
        new User()
        {
            UserName = "benday",
            FirstName = "Ben",
            LastName = "Day"
        });
}
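
With the setup in place, the test itself only has to hand the in-memory repository to UserFacade. The listing below is a sketch, not the sample application's actual test class: the fixture name UserFacadeFixture is made up, and the domain types are condensed stand-ins for the ones shown earlier so that the listing compiles on its own against the Visual Studio unit-testing framework.

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.VisualStudio.TestTools.UnitTesting;

// Condensed stand-ins for the article's types (assumption: only the
// members this test touches are included).
public class User
{
    public int Id { get; set; }
    public string UserName { get; set; }
}

public interface IUserRepository
{
    void Save(User saveThis);
    User GetByUserName(string username);
}

public class InMemoryUserRepository : IUserRepository
{
    private readonly List<User> m_items = new List<User>();

    public void Save(User saveThis)
    {
        m_items.Add(saveThis);
    }

    public User GetByUserName(string username)
    {
        return m_items.FirstOrDefault(u => u.UserName == username);
    }
}

public class UserFacade
{
    private readonly IUserRepository m_repositoryInstance;

    public UserFacade(IUserRepository repository)
    {
        m_repositoryInstance = repository;
    }

    public bool UsernameExists(string username)
    {
        return m_repositoryInstance.GetByUserName(username) != null;
    }
}

[TestClass]
public class UserFacadeFixture // hypothetical fixture name
{
    private IUserRepository m_userRepositoryInstance;

    [TestInitialize]
    public void SetupTest()
    {
        // Fresh in-memory 'database' for every test run.
        m_userRepositoryInstance = new InMemoryUserRepository();
        m_userRepositoryInstance.Save(new User { UserName = "benday" });
    }

    [TestMethod]
    public void UsernameExists()
    {
        // The facade receives the repository through its constructor.
        UserFacade facade = new UserFacade(m_userRepositoryInstance);

        Assert.IsTrue(facade.UsernameExists("benday"),
            "User name should exist.");
        Assert.IsFalse(facade.UsernameExists("benday2"),
            "User name should not exist.");
    }
}
```

Because [TestInitialize] runs before every test method, each test starts from the same known state without any cleanup code.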

Additional Benefits of the Repository Pattern
While this approach certainly makes it easier to test this method, refactoring UserFacade to use the Repository pattern also makes it easier to change how or where you store your data when you're running the application in production. Let's say that you want to convert your data store from a database to a Web service. All you'd have to do is write a Web service implementation of IUserRepository and pass it in to the UserFacade. Voila! Without changing any code in UserFacade, you're saving and retrieving from a service.
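
To make that swap concrete, here's a rough sketch of a service-backed repository. Everything service-related is hypothetical: IUserService and its FindUser() operation stand in for whatever proxy your Web service exposes, and IUserRepository is condensed to the single method that UsernameExists() needs. A full implementation would also have to cover the IRepository<User> members.

```csharp
using System;

// Condensed stand-in for the article's User class.
public class User
{
    public int Id { get; set; }
    public string UserName { get; set; }
}

// Condensed: the article's IUserRepository also inherits the CRUD
// members of IRepository<User>.
public interface IUserRepository
{
    User GetByUserName(string username);
}

// Hypothetical service contract; stands in for a generated service proxy.
public interface IUserService
{
    // Assumed to return null when no user matches.
    User FindUser(string username);
}

// Repository that forwards lookups to the service instead of a database.
public class ServiceUserRepository : IUserRepository
{
    private readonly IUserService m_service;

    public ServiceUserRepository(IUserService service)
    {
        if (service == null)
        {
            throw new ArgumentNullException("service");
        }

        m_service = service;
    }

    public User GetByUserName(string username)
    {
        // Mirror the in-memory repository: a null name never matches,
        // so there is no need to call the service.
        if (username == null)
        {
            return null;
        }

        return m_service.FindUser(username);
    }
}
```

UserFacade would then be constructed with a ServiceUserRepository instead of the database-backed repository, and its UsernameExists() code would stay untouched.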

Using a mock implementation of the Repository is also helpful when dealing with dependencies in your project schedule. If your team is working on a large system, the people writing the code for data access might adhere to a different schedule than other parts of the team. Describing the contract for what's going to happen in your application via an interface is usually much faster than waiting for it to actually be written. Using interfaces and repositories, you can quickly describe the interfaces and then write mock implementations that give you enough functionality so that you can write the rest of the application.

Now that the database is eliminated from the UsernameExists() test, you might be wondering how to test the data-access logic. It's simple: Write unit tests against the database implementation of the IUserRepository. This approach actually makes for a better and cleaner test because you're testing the data-access logic directly instead of via some other piece of functionality. Unfortunately, when you write these tests, you'll still have to worry about the logic to set up and initialize the database with adequate test data. This is a bit of a pain, but at least you only have to worry about it for the Repository tests instead of for every test that uses a Repository.

Serious About Test
Over the last few years, Microsoft has made a huge push to provide developers with tools that make it easy to write unit tests and do TDD. If you wanted to write unit tests in Visual Studio 2005, you needed to use a third-party framework such as NUnit or purchase one of the Team System editions. With Visual Studio 2008, the unit-testing functionality is available in the Professional Edition.

Microsoft is continuing its efforts to provide better tools within the IDE by adding new testing features to Visual Studio 2010's Team editions. One of the big ones for unit testing and TDD is a new productivity feature called Test Impact Analysis (TIA). TIA will show the developer which unit tests need to be run in order to verify that the edited code still works. This is a clear sign that Microsoft has accepted that TDD is a best practice and that developers should take it seriously.

Beyond the strictly TDD world, Team System 2010 has major improvements for Web Testing and Load Testing, as well as a whole new set of features targeted at Manual Testers (Microsoft's name for the traditional QA role). Manual Testers will now be able to manage all their test cases using Team Foundation Server (TFS), and then record tests against running applications. Once the tests have been recorded, they can be played back against future builds, thus giving the Manual Tester a way to automate a lot of their testing work.

In parallel with this push for better testing features in Visual Studio, the .NET development community has started to put lots of effort into using patterns like Repository, Strategy and Data Mapper, and to take design for testability seriously. It's all about writing high-quality code and saying "my code works," with the ability to back that assertion up.
