Inside the TFS Databases (An Occasional Series)
Get to know the ins and outs of TfsWorkItemTracking in the first part of this series covering the most useful aspects of the TFS database.
- By Jeff Levinson
Based on many of the posts out there, it seems that a number of you are interested in the underlying Team Foundation Server (TFS) databases. The problem with this is that they're not documented and there are no relational constraints (or very few) so you can't figure it out that way, and there seems to be a lot of confusing data. On top of that, when you customize the process templates, it can get even more confusing.
Microsoft doesn't provide support for end users with regard to the database -- and for good reasons. It also doesn't guarantee forward compatibility (which is why we never recommend making changes to the existing database tables...ever). So as a warning, look but don't touch.
(Seriously, do not make any changes to the existing schema and do not ever edit the data manually. Doing so can corrupt multiple records. This article is for informational purposes so you can create reports from the relational data and look at the data directly and understand what you're looking at.)
However, if you need to write custom reports, you sort of have to understand the underlying data -- especially because not everything is reported to the data warehouse. In addition, maybe you need instant verification through reporting and the data warehouse is updated only every hour (at minimum) or longer depending on your environment. In that case, there's a good reason to write reports off of the transactional databases. In this series, I'll examine the pieces of the TFS databases that I hope you'll find most useful.
In this first part I'll spend some time in the TfsWorkItemTracking database. This database holds all of the information regarding, as you might guess, work items. There are a number of reasons why you may want to report off of this database. First, it contains the long text fields such as the description field or any other HTML type fields there may be. As an example, we'll use a simple bug that I created (Figure 1). As you can see, the bug is in the proposed state and has some basic information with it (the bug number is 389 -- remember this as this is the key in the database).
In the TfsWorkItemTracking database, run the following set of queries with SQL Server Management Studio (as another warning, if you're going to do this, create a log-in with read-only permissions only; this will prevent you from doing something accidentally):
WHERE ID = 389
WHERE ID = 389
WHERE ID = 389
This will return you data from the WorkItemsAre and WorkItemsLatest tables but no data from the WorkItemsWere table. The WorkItemsAre and WorkItemsLatest tables are virtual mirrors of each other, but you'll notice in Figure 2 that there's a key difference in the results: the Revised Date (which, as far as I can tell, is never used in this particular table) in the WorkItemsLatest table. In general, you can pick either table from which to return the latest version; I use the WorkItemsLatest table.
If you look at the data in the WorkItemsLatest table, you'll find almost all of the data (we'll cover what you don't find in a minute) with cryptic column names like Fld10027 with the value "By accident, duh." Obviously, this field name isn't helpful, so how do you resolve that to an actual field name? By using the Fields table, of course. Running this query will return you the name of the column "How Found":
WHERE FldID = 10027
To actually return the query with all of the valid names as column headers requires a fairly complex query. This structure gives the process template its flexibility. As you add fields, these fields are mapped to new columns in the work item tables and entries are made in the Fields table. This is just one reason not to fool with the tables.
The WorkItemsWere table stores changes to the work items. If I go back and make a change to the Bug (I've activated it and assigned it to "Steve") and rerun the first set of queries, I'll now have the results shown in Figure 3. As you can see, the WorkItemsWere table now contains my original entry (the work item in the proposed state) with a revision number of 1. The WorkItemsLatest and WorkItemsAre tables contain the current version of the work item. As you make more changes, the historical versions of the work items will be noted in the WorkItemsWere table. It's important to note that the Revision number (Rev column) is a key value. The primary key for any given version of a document is the ID + Rev number.
Now, about those other fields that weren't stored here. If I go back into bug 389 and make a change to the Symptoms value and save it, you'll note another revision in the work items tables, though nothing has really changed. That's because HTML fields are stored as ntext fields and are in a different table -- the WorkItemLongTexts fields. If you run the query below
WHERE ID = 389
you'll get the results shown in Figure 4. There are a couple of things about this to note, the first of which is that you don't know which fields these apply to. Again, this is based on the fields table. If you update the query in Figure 4 so that it's joined with the Fields table, you'll get more readable results:
FROM WorkItemLongTexts a
join Fields b on a.FldID = b.FldID
WHERE a.ID = 389
Again, one of the keys to note here is the Rev number. Note that it says 1 and 3 for the Symptom field. There is no Rev 2 because no changes were made to this field during the revision 2 (when I changed the work item to Active and reassigned it to Steve).
In this article, I've briefly shown you some of the key tables in the TfsWorkItemTracking database. In the next article in this series I'll show you how to map work items to projects and get complete records of any given version, and I'll examine some of the very helpful views available to you.
If you have specific things you would like to know about the database, feel free to drop me an e-mail at Jeff.Levinson@nwcadence.com and I'll try to address them in a future article.
Jeff Levinson is the Application Lifecycle Management practice lead for Northwest Cadence specializing in process and methodology. He is the co-author of "Pro Visual Studio Team System with Database Professionals" (Apress 2007), the author of "Building Client/Server Applications with VB.NET" (Apress 2003) and has written numerous articles. He is an MCAD, MCSD, MCDBA, MCT and is a Team System MVP. He has a Masters in Software Engineering from Carnegie Mellon University and is a former Solutions Design and Integration Architect for The Boeing Company. You can reach him at Jeff.Levinson@nwcadence.com.