Inside VSTS

Inside the TFS Databases, Part 2

Jeff continues his exploration of the TfsWorkItemTracking database by showing you how to retrieve a record from a given point in time using some helpful SQL statements.

In the first installment of this series, I worked through some of the basic structure of the TfsWorkItemTracking database. I'll continue with this examination in this column and provide some helpful SQL statements to get you started.

To start, let's look at how you associated work items to projects since there is no project information in the WorkItems tables. The relationship between projects and work items is made through the work item classification (Areas and Iterations). This information is stored in several places: the TreeNodes, TreeIDPairs and the xxTree tables. While the TreeNodes and TreeIDPairs contain valuable information, the xxTree is the easiest source to work with for area and iteration information.

The schema for xxTree is shown in Listing 1. These are the important columns to note:

  • ID: This is the primary key for the node.
  • Parent ID: 0 if this is the top level node; otherwise, it refers to the PK of the parent node.
  • Node Name: This is the name of the node.
  • Node Type: This is either "TF Server" for the server name, "Team Project" for the root node in any area or iteration hierarchy, or "Area Level x" or "Iteration Level x." These last two refer to the column in which the path to that node is stored.
  • Team Project: This is the project which the node is a part of.

Looking back in the WorkItemsAre table, you'll note an "AreaID" column. This column maps back to the ID column of the xxTree table and allows you to classify the work item as part of the project. In addition, the IterationID in the WorkItemsAre table allows you to classify the work item as part of a given iteration. Note that while I refer to the WorkItemsAre table, these columns exist in each of the work items tables (WorkItemsAre, WorkItemsWere, WorkItemsLatest).

The following shows a simple select statement to return the work items as part of a given project. While this is a simplistic statement, it gives you the starting point to construct your own queries:

DECLARE @project nvarchar(255) = 'Test Project'

SELECT	a.[Team Project],
	b.Title
FROM	xxTree a
JOIN	WorkItemsAre b on a.ID = b.AreaID
WHERE	a.[Team Project] = @project

At this point, you can now query the work items and place them in relationship to areas, iterations and projects, but you still can't retrieve a complete work item. What do I mean by this? Well, you can't retrieve any fields in your work item type that are denoted as PlainText or HTMLText because these are stored in a different table -- the WorkItemLongTexts table. In this section, I'll show you how to retrieve that information. But before I dive into the table, you might be asking the question, "Why do I need to get these pieces of information?"

A few weeks ago a customer asked me a simple question: If I put all of my information into TFS, how do I get it out? My first thought was, "Why do you need to get it out since anyone can look at it?" But I realized that was ridiculous; business people and executives like to read Word documents or printed documents. This brings up a couple of things. You have to report off of the WorkItemTracking database -- the long text fields aren't even reported to the TfsWarehouse database. The types of queries you write here will be about recreating a requirements document or a use case. In addition, you have to deal with formatting and a few other issues.

The WorkItemLongTexts table is a very simple table (the schema is shown in Listing 2). There are four critical columns in this table:

  • ID: The primary key of the long text.
  • FldID: The field that the long text belongs to (you'll notice that by default there's FldID 52 and 54 for almost all items in this table which correspond to the Description and History fields, respectively).
  • Words: The text of the field. Nnote the HTML tags -- which you may or may not have -- for some of the FldID 54 fields (<P>, <OL>, <LI>, etc.).
  • Rev: The revision number of the work item that this text belongs to. Every version of the long texts is stored separately.

Before you can do anything else, you have to resolve the FldID column to a real value so you can filter on something more intelligent than an ID. The Fields table stores the relevant information for this value. The query seen below will retrieve the field associated with the long text:

select	x.ID,
	y.Name,
	x.Words
from	WorkItemLongTexts x
join	Fields y on x.FldID = y.FldID

Next, you need to be able to retrieve a specific version of the long text fields so you can construct a complete work item. For all practical purposes, getting a specific version of the long texts is difficult, the reason being that the Rev field in the WorkItemLongTexts table only stores the revision when a change is made to one of the long text fields. Take the following example:

  1. Add a new work item (Rev 1).
  2. Make a change to the description field (Rev 2 for the work item and the long text).
  3. Make a change to the state (Rev 3 for the work item).
  4. Make another change to the state (Rev 4 for the work item).
  5. Make a change to the description field (Rev 5 for the work item and description field).

Now, go back and try to reconstruct the work item at Rev 4. The issue is that the WorkItemLongTexts only stores Revs 2 and 5. Essentially, you have to write a query that says if the specific revision number doesn't exist, take the next lowest revision number. Not the simplest thing to do.

But let's come back to this later. For the moment, let's work on the latest version of a work item. The first step is to get the actual latest version of a long text value. To do that, I'll use the query shown in Listing 3. The inner query simply retrieves the values needed to ensure that when we pull back the Word field, it will be the Word field associated with the latest revision for a given work item and field. The outer query retrieves the Word field associated with that version.

However, this still doesn't help us because the format isn't usable in a reporting scenario; everything has to be on one line so that it's read as a single record. To do that, we're going to do a cheap man's pivot. The difficulty is that long text fields can't be pivoted using the PIVOT SQL statement so we have to do it manually -- but we can make it faster than in previous versions of SQL Server by using a Common Table Expression (CTE). To do this, use the SQL shown in Listing 4. This query now returns one work item per row with the long text fields included. Now you have a suitable query for writing reports.

Let's step back to the specific version of a work item. There are really two ways to query for it: by a specific revision number or by a date (i.e., give me the work item as of this date). Querying by a revision number is by far the easier prospect of the two. The query shown in Listing 5 will do the trick but requires just a bit of explanation. The CTE requests the latest version of the long text that's less than or equal to the latest revision. The reason for this is that since the long text is only incremented when it's changed, you need to carry forward an "older" (i.e., unrevised) version to the current revision that you're looking for. Since there is a revision made for any change to a work item (the changed date is incremented), all revisions are available in the WorkItemsWere table and so no special handling is required here.

However, there's one specific problem with this query: You can't request the latest version. Now, I'm not saying this is the prettiest SQL ever and I'm open to suggestions on how to improve it, but the SQL for querying everything is shown in Listing 6. The solution is essentially simple -- union the WorkItemsAre and WorkItemsWere. Note that if you union them as I've shown, the result set is ordered by the revision number.

Back to the date issue. There's no easy way to do this within a single SQL statement. The reason for this is that on top of the previous issue of the long texts not being updated for every change to a work item, there's the issue of the dates being full dates. That is, they store their values as "mm/dd/yy hh:mm:ss.ms." Requesting a specific date -- if more than one change was made on that date -- is difficult. You'd have to prompt the user to enter a specific date and time.

Also, the queries I've been using won't work for this because you have to work from the WorkItemsAre and WorkItemsWere table backward since a union between the two tables is the only way to ensure that you don't miss a specific revision. Once you have the revision number you're looking for, you can plug that value into the previous queries to get what you need.

This article should have given you a general understanding of the basics of the WorkItemTracking database. You can now retrieve a given record at a given point in time with all of the necessary fields. In a future article, I'll write about creating a report with this creation and some ideas for formatting which always seems to trip people up. If you have any questions or ideas on how to improve on this process, let me know at [email protected].

About the Author

Jeff Levinson is the Application Lifecycle Management practice lead for Northwest Cadence specializing in process and methodology. He is the co-author of "Pro Visual Studio Team System with Database Professionals" (Apress 2007), the author of "Building Client/Server Applications with VB.NET" (Apress 2003) and has written numerous articles. He is an MCAD, MCSD, MCDBA, MCT and is a Team System MVP. He has a Masters in Software Engineering from Carnegie Mellon University and is a former Solutions Design and Integration Architect for The Boeing Company. You can reach him at [email protected].

comments powered by Disqus

Featured

  • AdaBoost Binary Classification Using C#

    Dr. James McCaffrey from Microsoft Research presents a C# program that illustrates using the AdaBoost algorithm to perform binary classification for spam detection. Compared to other classification algorithms, AdaBoost is powerful and works well with small datasets, but is sometimes susceptible to model overfitting.

  • From Core to Containers to Orchestration: Modernizing Your Azure Compute

    The cloud changed IT forever. And then containers changed the cloud. And then Kubernetes changed containers. And then microservices usurped monoliths, and so it goes in the cloudscape. Here's help to sort it all out.

  • The Well-Architected Architect on Azure

    In the dynamic field of cloud computing, the architect's role is increasingly pivotal as they must navigate a complex landscape, considering everything from the overarching architecture and individual service configurations to the various trade-offs involved. Here's help.

  • Windows Community Toolkit Update Improves Controls

    The Windows Community Toolkit advanced to version 8.1, adding new features, improving existing controls and making dependency changes.

Subscribe on YouTube