Getting Started

Learn to Read a Data Model

Data models provide important insights into your company's business. Learn how to read one, and pick up some tips for creating strong data models of your own.

Technology Toolbox: Visual Basic, C#

Reading a data model can be a little intimidating, and comprehending some of the larger models can be comparable to reading the schematic of a 747 airliner.

Yet, new developers are often thrown into situations where reading a data model is essential to doing their jobs well. This article shows you how to read a data model to attain a basic understanding of a business application's core building blocks.

When you look at a data model, the first question to ask is whether it is a logical or a physical model. Managers, analysts, and business people typically want to read the logical model, because it illustrates the business flow within the organization. Developers, however, prefer the physical model, because they write the programs that actually update the tables it describes. The goal of a logical model is to define the business rules for a particular aspect of the business; in many respects it resembles a flowchart. Database administrators often take a logical model and deconstruct it to gain additional insight and to improve performance and efficiency. If your goal is to understand the business, use the logical model. The physical model, by contrast, is a view of the actual layout of the physical elements, or tables, in a database.

A data model tells a story, or rather a lot of short stories that are somehow related. There is a trick to reading a large data model. I'll share this trick with you in this article, but first it's important to understand the basic meaning of the boxes and lines you build a model with (see Figure 1).

Each box in the model is called an entity, or table. An entity represents a unique item within your business: a customer, an employee, a department. It might help to think of entities as the nouns in a story. For example, consider the text, "The customer placed an order." In this story, the customer and the order are two distinct entities. The data modeler typically starts by going over existing documentation and interviewing the business people who live the process on a daily basis. From this research, a picture, or story, begins to emerge.

Several rules govern which fields of data you gather into each of these entities. If you work with data modelers much, you'll hear terms like "First Normal Form," "Third Normal Form," and so on. Don't worry too much about the formal definitions. The thought behind these rules is that the data fields (or columns) within a table should relate only to the entity that table represents. If you hear the term "Normalized," it means the model follows the guidelines of these "Normal Forms."

Design a Model
Assume you want to design an entity that represents a customer of your company. You need to keep a couple of caveats in mind. First, you don't want to add a field capturing the division within your company that this customer does business with. That information has nothing to do with the customer; rather, it describes how your company views that customer, so you shouldn't include it here.

You also want to avoid combining multiple pieces of data within one field. It breaks modeling rules to cram an entire address into a single address column, and it makes the data much harder to work with. Separating the data elements into distinct columns lets you manage the quality of the data you save. In this example, combining all the address fields into one makes it difficult to enforce a business rule that every address in the customer table must have a zip code.
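To make this concrete, here is a minimal sketch of how the customer entity might look as a physical table. The column names and data types are illustrative assumptions, not the actual columns of the sample model in Figure 1; the point is that each address element gets its own column, the zip code rule becomes enforceable with a NOT NULL constraint, and no division column appears at all.

    -- Hypothetical customer table; column names and types are assumptions.
    CREATE TABLE customer (
        customer_number INT          NOT NULL PRIMARY KEY, -- uniquely identifies one customer
        customer_name   VARCHAR(100) NOT NULL,
        street_address  VARCHAR(100) NOT NULL,             -- each address element in its own column...
        city            VARCHAR(50)  NOT NULL,
        state           CHAR(2)      NOT NULL,
        zip_code        CHAR(10)     NOT NULL,             -- ...so "every address has a zip code" is enforceable
        phone_number    VARCHAR(20)  NULL
        -- No division column: that describes how the company views
        -- the customer, not the customer itself.
    );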

Some tables in the model represent not a distinct entity, but a relationship between two entities. For example, the requirements for a given app might dictate that more than one employee will work on an order, so the app needs to capture each employee who works on a given order, as well as the date and time the work occurs. These tables are called associative entities, and you use them to connect two entities that have a relationship to each other.

Certain restrictions apply, however. You cannot put the network_id of the employee who took the order on the order table, because that wouldn't allow more than one employee to be associated with the order. You also cannot put the order_number on the employee table, because an employee can work on more than one order. The associative table lets you connect many different employees to many different orders. In the sample model, the Employee_Order table does exactly that, associating many employees with many orders (see Figure 1).
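Here is a rough sketch of how the Employee_Order associative entity might be declared. The column names and types are assumptions, and the foreign key declarations assume the employee and orders tables sketched elsewhere in this article; the key idea is that the table carries the primary keys of both parent tables plus the attributes of the relationship itself, such as the date and time the work occurred.

    -- Hypothetical associative table connecting employees and orders (many-to-many).
    -- Assumes the employee and orders tables already exist.
    CREATE TABLE Employee_Order (
        order_number INT      NOT NULL, -- the order being worked on
        employee_id  INT      NOT NULL, -- the employee doing the work
        work_date    DATETIME NOT NULL, -- when the work occurred: an attribute of the relationship
        PRIMARY KEY (order_number, employee_id, work_date),
        FOREIGN KEY (order_number) REFERENCES orders (order_number),
        FOREIGN KEY (employee_id)  REFERENCES employee (employee_id)
    );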

A well-designed model includes a primary key for every entity. A primary key is typically a column that uniquely identifies one row of data in the table. For example, assume you have an employee table. A good candidate for the primary key in your organization might be the identifier the employee uses to log on to the network; in most environments, this moniker is unique per employee. The unique identifier can also be a meaningless, system-generated number. Some database experts argue that the identifier should never be based on real data at all, not least because real data might have to change in the future.
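As an example, here is a sketch of an employee table that takes the surrogate-key approach. The column names are assumptions, and SQL Server's IDENTITY syntax is assumed for the generated number: the system-generated employee_id serves as the primary key, while network_id still carries a unique constraint, so the one-login-per-employee rule holds even though the login itself could change later without breaking any references.

    -- Hypothetical employee table with a meaningless, system-generated primary key.
    CREATE TABLE employee (
        employee_id   INT IDENTITY(1,1) NOT NULL PRIMARY KEY, -- surrogate key; no business meaning
        network_id    VARCHAR(30)       NOT NULL UNIQUE,      -- natural candidate key; can change safely
        first_name    VARCHAR(50)       NOT NULL,
        last_name     VARCHAR(50)       NOT NULL,
        department_id INT               NOT NULL              -- relates the employee to a department
    );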

By themselves, the tables in a model describe nothing more than the data you collect for each entity, as well as how you identify each item uniquely. The lines in the data model tell you a story by showing you how these entities are related to each other.

It's All Connected
In a good model, every entity is connected to at least one other entity, and most are connected to many. The reason is simple: everything in your business is connected to something else. Customers place orders, orders are taken by employees, and employees work within a department. The lines on your model defining these relationships are called foreign keys. Think of a foreign key as a contract between two entities. It's one entity saying to the other, "Whenever I refer to an item in your table, I promise to store only your primary key value for that item."

If you want to create a new order in the model, the customer must already exist in the customer table. In the sample model, the order table has a foreign key to the customer table; the customer_number must already be present in the customer table before the database will let an employee save the order.
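Here's a sketch of how that foreign key might be declared. The table is named orders here only to avoid the SQL reserved word ORDER, and the column names are assumptions; the REFERENCES clause is the contract: the database refuses any order whose customer_number doesn't already exist in the customer table.

    -- Hypothetical orders table; assumes the customer table sketched above.
    CREATE TABLE orders (
        order_number    INT      NOT NULL PRIMARY KEY,
        customer_number INT      NOT NULL, -- must match an existing customer
        order_date      DATETIME NOT NULL,
        FOREIGN KEY (customer_number) REFERENCES customer (customer_number)
    );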

Together, these lines help you enforce data integrity. Without them, you could enter an order for the customer "Acme Anvils" and make up a fake customer number, but the shipping department would have no address for the company when it went to ship the order. The commitment also helps keep your existing data healthy and intact; for example, you cannot delete a customer that has orders.

A data model also lets you manage your data far more efficiently. Imagine a database where every order in the order table carried a full set of customer data: address, phone number, key contacts, and so on. That's a lot of redundant data. Now imagine a customer calls to say the phone number has changed. You would have to hunt through every table that copies customer data, find that customer, and update the phone number in each place. With any luck, everyone spelled "Rays Copier Service" exactly the same way when they entered the orders. Of course, the better approach is not to depend on luck, but to reduce redundant data in the first place. A data model helps you do that.
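A short sketch of the payoff, assuming the hypothetical tables above: because the phone number lives in exactly one place, the change touches a single row, and every order still sees the current number through its customer_number foreign key.

    -- Update the phone number once, in the one table that owns it.
    UPDATE customer
    SET    phone_number = '555-0142'   -- hypothetical new number
    WHERE  customer_number = 1001;     -- hypothetical customer number for Rays Copier Service

    -- Every order finds the current number through the relationship.
    SELECT o.order_number, c.customer_name, c.phone_number
    FROM   orders o
    JOIN   customer c ON c.customer_number = o.customer_number;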

That's about it for what you need to know when creating your own data model. At the outset, I promised I'd also show you a trick for reading a large data model. The trick is to read the model a bit at a time. No one, not even an experienced data modeler, can look at a medium-to-large model and understand all the business rules at a glance. The best way to understand what the model is trying to communicate is to break it up into pieces, or subject areas. This is really what an enterprise data model is: a collection of related subject areas, or stories. If your database group doesn't already have one, ask for a diagram of the logical model containing just the tables in the subject area you want to understand. Next, pick any entity in the model and follow the lines to its closest neighbors; generally, the tables with the most lines attached are the core elements of the model. Once you can read all these "short stories," the model as a whole will start to make sense.

Take a look at the sample model again. Treated as a story, the data model might read like this:

All employees work within one department, and each department belongs to only one division. Each employee is identified by his or her network login. Customers place orders, and the orders are taken by an employee. After every order is taken, you keep a record of each change made when an employee works on the order. You keep only one address on file for the customer. The business can track orders by customer, employee, department, and division.
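Read as SQL, the last sentence of that story might look something like this sketch. The department and division tables aren't shown in this article, so their names and columns, like the other identifiers here, are assumptions; the point is that starting at an order, you can walk the foreign keys out to the customer, the employees who worked the order, their department, and its division.

    -- Hypothetical query walking the relationships in the story:
    -- orders -> customer, orders -> Employee_Order -> employee -> department -> division.
    SELECT o.order_number,
           c.customer_name,
           e.network_id AS employee_login,
           d.department_name,
           dv.division_name
    FROM   orders o
    JOIN   customer       c  ON c.customer_number = o.customer_number
    JOIN   Employee_Order eo ON eo.order_number   = o.order_number
    JOIN   employee       e  ON e.employee_id     = eo.employee_id
    JOIN   department     d  ON d.department_id   = e.department_id
    JOIN   division       dv ON dv.division_id    = d.division_id;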

About the Author

Steven Berringer is a developer and MCDBA. He has developed with the Microsoft toolset for 12 years and received an Honorable Mention in the 2003 SQL Server Magazine Innovator Awards.
