Practical .NET
Managing Temporal Data in SQL Server 2016 and Azure SQL Database
SQL Server 2016 and Azure SQL Database both give you tables that automatically keep track of the history of your data. Here's how to retrieve that historical data.
In an earlier column, I showed how to create a temporal table in SQL Server 2016 or Azure SQL Database.
Once you've defined a temporal table, SQL Server automatically keeps a history of all the changes that the rows in the table go through, storing old versions of the data in a "historical" table separate from the original table. In addition, SQL Server adds time-based columns to the two tables specifying when each row was the "current" row. All of this means you can retrieve the state of your data for any point in time, a feature that can be useful in analyzing trends, recovering from unfortunate design decisions or debugging production problems.
The good news here are the new keywords in SQL that support querying for rows in a point of time. The bad news is that Entity Framework doesn't directly support temporal tables and LINQ doesn't include corresponding keywords (though, of course, this may change with .NET Core). To take advantage of temporal tables, you'll need to write SQL statements.
Retrieving Data for a Point in Time
The key change in SQL is the FOR SYSTEM_TIME clause that you use with the From clause to retrieve historical rows. FOR SYSTEM_TIME supports four sub-clauses that allow you to retrieve data at a point in time or retrieve all the historical rows in a period (or just all the historical rows).
To get rows for a point in time, you use the AS OF sub-clause, passing the date you're interested in. This example retrieves the rows from the SalesOrders table that were current at a specific time on Dec. 31:
Select *
From dbo.Orders
FOR SYSTEM_TIME AS OF '2017-10-31 17:46:38.1500000'
As is often the case with datetimes in SQL Server, you must be aware that a bare date (like "2017-10-31" in my sample query) includes a time: It's midnight in the morning of Oct. 31 (‘2017-10-31 00:00:00.00'). AS OF takes that time seriously, so a query like this retrieves the rows as if they were first thing in the morning of Oct. 31:
Select *
From dbo.Orders
FOR SYSTEM_TIME AS OF '2017-10-31'
This means that if you want to know what the data looked like at the end of the business day, you'll need to pick an appropriate time. Assuming that you live on the east coast of North America, are storing your times in UTC format and want to know what your rows looked like at 5 p.m. local time, then you'll need a query like this:
Select *
From dbo.Orders
FOR SYSTEM_TIME AS OF '2017-10-31 22:00:00'
The FOR SYSTEM_TIME clause is used on the FROM clause so it applies uniformly to all the tables involved in the query. If you use a JOIN to include related temporal tables, then you'll see the rows that were current in those related tables at the specified time. There are some potential complexities here. If, for example, a row is added to a table after 5 p.m., then it wasn't in the table at 5 p.m. and won't appear in the result of my previous query (on the other hand, a row deleted after 5 p.m. was in the table at 5 p.m. and will appear in the result). This may require you to include OUTER JOINs to retrieve all the data you want.
Retrieving History
Of course, sometimes you're not interested in the data as it existed at some point, but want to look at how the data changed over time. The simplest way to retrieve those rows is to use the ALL sub-clause, which retrieves all the historical rows. I suspect that, most of the time, you'll be doing that for selected rows rather than for a whole table. Therefore, to see the history for Order A123, you might use a query like this:
Select *
From dbo.Orders
FOR SYSTEM_TIME ALL
Where Id = "A123"
Of course, without an ORDER BY clause you can't guarantee the order in which you'll get your rows. In fact, the default order seems to be inherently confusing: the current row at the top followed by the historical rows from the oldest to the newest. A better query that orders the rows from more recent to oldest looks like this:
Select *
FROM dbo.Orders
FOR SYSTEM_TIME ALL
Where Id = "A123"
Order by StartTime Desc
One warning: You may not get all of the historical records with ALL (or any of the sub-clauses, as far as that goes). If you've made multiple updates to a single row in a single transaction, each of those changes is stored as a separate row in the historical table. However, when you use FOR SYSTEM_TIME, those rows will be combined into a single row in the result you get back. If you really do want to see all of the historical rows, then you'll need to query the history table itself (which is a good reason to give your history table a name, as I did in my previous column).
Of course, if your temporal table has been around for any length of time, then you probably don't want all the historical rows -- you'll want to restrict the history to a specific period. You have three sub-clauses that let you restrict the historical rows you're retrieving: BETWEEN…AND, FROM…TO and CONTAINED IN.
Both BETWEEN…AND/FROM…TO retrieve rows that existed between the start and end date. The difference between the two is that BETWEEN…AND will include rows that were changed on the end date while FROM…TO will exclude them. This query, using FROM…TO returns all the historical rows between midnight on the morning of Oct. 1 and up to (but not including) midnight, the morning of Nov. 1:
Select *
From dbo.Orders
FOR SYSTEM_TIME FROM '2017-10-31' To '2017-11-1'
Where Id = "A123"
Order by StartTime Desc
This query, on the other hand, adds in any historical rows that were changed on midnight, Nov. 1:
Select *
From dbo.Orders
FOR SYSTEM_TIME BETWEEN '2017-10-31' AND '2017-11-1'
Where Id = "A123"
Order by StartTime Desc
CONTAINED IN is more restrictive than the other two: It retrieves the rows that existed only during the specified period -- in other words, rows that were changed both after and before the two dates. Rows that existed before or after the two endpoints won't appear in the result (this makes CONTAINED IN good for finding "phantom" rows). CONTAINED IN is a function, so its syntax is different from the previous sub-clauses. This query finds all the rows changed at least twice during October while omitting historical rows as they were before the start and after the end of the month:
Select *
From dbo.Orders
FOR SYSTEM_TIME CONTAINED IN ('2017-10-31', '2017-11-1')
Where Id = "A123"
Order by StartTime Desc
If you've ever tried to maintain an audit table for tracking changes made to some other table and struggled with the SQL statements that would let you extract the data you wanted (especially when using JOINs), then your life has just gotten simpler. This change also reflects that the world doesn't hold still. To misquote someone famous, real-world data is "a big ball of wobbly, timey wimey stuff" … and now you can query it.
About the Author
Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.