Speed Up Your Application with SQL Server In-Memory Tables
I've said it before and I'll say it again: If you want to speed up your application, don't look at your code, look at your data access. The two slowest things you can do in application development are to issue a call to another computer, and read or write to your hard disk. When you go to your database server, you're doing both of those things (though, to be fair, SSD drives help with the second problem).
The ultimate solution to this problem is, of course, caching: Store data at the client rather than accessing it from your database. However, if your data is volatile and being updated from multiple clients or applications, then caching can create as many problems as it solves because the data held at each of the individual clients can be different. In that scenario, it becomes difficult to determine what the "right" data is.
Server-Side Caching with Memory-Optimized Tables
The second-best solution is, when you go to your database server, hold your data in memory on your database server rather than on your hard disk. This reduces the caching problem by providing a single point of contact for all updates: The data in memory at the server is the "right" data. And that's where SQL Server's In-Memory OLTP (online transaction processing) tables come in.
In-Memory OLTP tables (I'll just call them memory-optimized tables from here on) were introduced in SQL Server 2014 and further enhanced in SQL Server 2016, 2017 and Azure SQL Database (though, in Azure, they're only available in the Premium tier). Memory-optimized tables are, as the name suggests, held completely in memory, with disk access reduced to an afterthought. Eliminating the disk-based component of the table allowed the SQL Server team to redesign how data access is handled (including locking and latching), resulting in substantially improved performance. There's also a memory-optimized table type you can use in stored procedures that takes advantage of these performance gains.
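That table type is handy for passing sets of rows into stored procedures without touching disk. As a sketch (the type and column names here are my own invention, not from the product documentation), you might define one like this:

Create Type dbo.OrderLineType As Table (
    OrderId Int Not Null Index ix_OrderId Nonclustered,
    ProductId Int Not Null,
    Quantity Int Not Null )
With (Memory_Optimized = On);

A memory-optimized table type must declare at least one index, which is why the Index clause appears on the OrderId column.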
"Holding data in memory" suggests that the data in those tables could be lost if your server goes down. Fortunately, by default, copies of all transactions are kept in the transaction log so if your server does go down, the tables with their current data at the time of failure are recreated when your server comes back up. You can also create a non-durable version of the table that only recreates the table structure on a restart.
You can even combine temporal tables and memory-optimized tables. In that scenario, the history of the table is kept on disk while the current data stays in memory. SQL Server handles moving historical data to disk, moving data more frequently when lots of changes have been made (to keep temporal data from swamping memory) and moving it less frequently when changes are infrequent.
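As a sketch of that combination (the table and column names are hypothetical), a system-versioned memory-optimized table looks like an ordinary temporal table with the Memory_Optimized option added:

Create Table dbo.StockLevel (
    Id Int Not Null Primary Key Nonclustered,
    QuantityOnHand Decimal(8,2) Not Null,
    SysStartTime DateTime2 Generated Always As Row Start Not Null,
    SysEndTime DateTime2 Generated Always As Row End Not Null,
    Period For System_Time (SysStartTime, SysEndTime) )
With (Memory_Optimized = On,
      System_Versioning = On (History_Table = dbo.StockLevelHistory));

The history table named here is an ordinary disk-based table, which is what keeps the historical rows out of memory.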
In SQL Server 2014 there were a number of restrictions on memory-optimized tables: you couldn't alter their structure after they were created; many SQL operators weren't supported (Outer Joins, subqueries in Select clauses, the Distinct keyword, and the Not and Or operators, among others); and only the Latin collation was supported. Those restrictions (and others) were removed in SQL Server 2016.
There are still some limitations on memory-optimized tables: you can't use data compression with them; replication is only supported between other memory-optimized tables; server- and database-level triggers are not supported; and more. However, from a developer's point of view, it's difficult to tell the difference between a disk-based table and a memory-optimized table.
There are still two significant limitations on memory-optimized tables. The first is, obviously, size: Your database may have tables that are critical to speeding up your application but are far too large to be held in memory. There are strategies you can consider for working around that, and I'll come back to them at the end of this article.
The other limitation worth mentioning is the way memory-optimized tables handle indexes. First, they don't support clustered indexes, the typical default for a table's primary key. You have to declare your primary key index either as a hash index or as a nonclustered index. Use a hash index when most queries using the index will retrieve a single value (that is, the usage pattern for a primary key index). Use a nonclustered index to support queries that retrieve a range of records. "Range-based" queries include those where the index would be used in an Order By clause or in Where clauses that don't use the equals sign (for example, !=, <, >, >=).
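To make the distinction concrete, here's a sketch (table, column and index names are mine) that declares both kinds of index -- a hash index on the primary key for single-value lookups and a nonclustered index to support range queries:

Create Table dbo.Product (
    Id Int Not Null
       Primary Key Nonclustered Hash With (Bucket_Count = 1000000),
    Sku NChar(12) Not Null,
    ListDate DateTime Not Null
       Index ix_ListDate Nonclustered )
With (Memory_Optimized = On);

With a hash index you must also pick a Bucket_Count; Microsoft's guidance is to set it at (or somewhat above) the number of distinct key values you expect.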
With either kind of index, large numbers of duplicates (more than 100 rows with the same value) reduce performance -- undercutting your primary reason for using a memory-optimized table. Where an index will have more than 100 entries for the same value, Microsoft's recommended best practice is to create a nonclustered index and add additional columns to the index to reduce duplication. That makes sense: Once you add additional columns to a hash index, the index is useless unless you specify values for all of the columns in the index.
Creating and Designing for Memory-Optimized Tables
Before adding a memory-optimized table to your database, you must create a file group that can hold memory-optimized data (you're only allowed one such file group) and add a file to the group. Those commands look like this:
Alter Database Customers
Add FileGroup custmo1_fg Contains Memory_Optimized_Data;
Alter Database Customers
Add File (name='custmo1_con1', filename='C:\LocalData\custmo1_con1')
To FileGroup custmo1_fg;
After that, creating a memory-optimized table is easy: You add a With clause to the end of your Create Table command specifying that you want the Memory_Optimized option turned on. To create a table to hold transactions I could use this DDL:
Create Table [dbo].[Transaction] (
Id Int Not Null
Primary Key Nonclustered,
OrderId NChar(8) Not Null,
Date DateTime Not Null,
QuantityOnHand Decimal(8,2) Not Null )
With (Memory_Optimized = On);
By default, setting the Memory_Optimized option creates a durable table. If you're willing to lose all of your data when SQL Server shuts down you can add a Durability option of SCHEMA_ONLY.
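As a sketch (the table name is my own), a non-durable table for data you can afford to lose -- session state, for example -- would look like this:

Create Table dbo.SessionCache (
    SessionId UniqueIdentifier Not Null Primary Key Nonclustered,
    Payload NVarChar(4000) Null )
With (Memory_Optimized = On, Durability = Schema_Only);

On a restart, SQL Server recreates the table's structure but the rows are gone.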
Obviously, moving a large table from disk to memory may not be an option. However, there are a couple of ways to address this problem.
One approach is to use Pareto's 80/20 rule: That rule applies when, for example, 80 percent of your business comes from 20 percent of your products. In this approach, you can keep your more frequently used rows in a memory-optimized table for fast access while keeping less-frequently used rows on disk. A variation on this approach is to use the memory-optimized table to hold just key columns rather than key rows. An in-memory inventory table might consist of just the product Id and the quantity-on-hand, for example.
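That key-columns-only variation might look like this sketch (the names are hypothetical):

Create Table dbo.InventoryOnHand (
    ProductId Int Not Null
       Primary Key Nonclustered Hash With (Bucket_Count = 500000),
    QuantityOnHand Decimal(8,2) Not Null )
With (Memory_Optimized = On);

The full product table stays on disk; this narrow table handles the high-frequency reads and updates of the quantity-on-hand.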
Another option is to take an "eventual consistency" approach and implement the producer/consumer pattern. In this pattern you would create a memory-optimized table that just holds transactions. Clients would insert rows into this table as they perform updates while a separate program (or programs) would run along behind, executing the transactions and deleting them from the table.
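A minimal version of that transaction table might look like this (the names and columns are my own sketch):

Create Table dbo.PendingTransaction (
    Id BigInt Identity(1,1) Not Null Primary Key Nonclustered,
    OrderId NChar(8) Not Null,
    Action NVarChar(20) Not Null,
    EnteredAt DateTime2 Not Null Default SysUtcDateTime() )
With (Memory_Optimized = On);

Clients insert into this table and return immediately; the consumer process reads the oldest rows, applies them to the disk-based tables and then deletes them.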
Really, the only limitation on memory-optimized tables is your ingenuity.
Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.