When You're Your Own DBA: Speeding Up SQL Server
I'm supposed to be a software developer/consultant. However, I'm also the guy people look at when the application isn't running "fast enough." With business applications, speeding up the application always means speeding up data access. Sometimes tweaking my application's SQL or the way my code uses my SQL will fix the problem ... and sometimes it doesn't.
When it doesn't, I have to pretend to be a DBA. Here are the most common problems I've found that I couldn't fix with better SQL (and are also problems that I've been able to fix, of course). A real DBA, I'm sure, could tell you more.
Two caveats before I begin, though.
First: Understanding why these changes make a difference requires some explanation of how SQL Server works. Be patient through the start of this column: I do have a point and I will get to it. Second: For simplicity's sake, in this discussion, I'm going to be referring to "workers" when many smarter people than I would be distinguishing between workers, tasks and threads.
Keep Connections Short to Distribute the Load
SQL requests come to SQL Server through connections, which are assigned to schedulers. Once a connection is assigned to a scheduler that connection can't be moved.
Requests are then broken down into one or more tasks with each task assigned, through the scheduler, to a worker. Workers, in turn, each manage a thread. Each CPU available to SQL Server is assigned a scheduler. Schedulers divide up the available threads evenly among themselves (i.e. if you have eight CPUs and 255 threads in the thread pool, each scheduler gets, roughly, 32 threads). Since each thread is managed by a worker, each scheduler in my example would have about 32 workers.
This system with a significant Achilles heel. If a connection is kept open for a long time, it can hog that scheduler. The easiest way to ensure that the workload is distributed more-or-less evenly across all your schedulers is to keep connections short (you're probably already doing that).
Workers handle all the processing associated with their task. When a new request is received, a worker is assigned to it and the worker is added to SQL Server's runnable list. If all the workers on the runnable list have been assigned tasks, new requests can't start processing. You don't, therefore, want to tie up workers any longer than you have to.
Unfortunately, getting to the end of a task quickly (and, therefore, freeing up the worker) isn't as easy as it sounds. Workers can become idle, typically because they're waiting on some locked resource or for some data to be fetched from the disk, at which point they go on the waiter list. It would be too bad if a scheduler wasn't accepting new requests because all of its workers were sitting on the waiter list.
Keep Windows Out of It
SQL Server's manages these workers (and their threads) almost completely independently of Windows. This reduces context switching and prevents Windows from preemptively interrupting SQL Server tasks with higher-priority tasks.
To keep Windows out of this process, moving workers on and off the waiter list is handled by the workers themselves. When a worker frees up a resource, it's the worker's responsibility to scan the waiter list for any other workers that were waiting on that resource and move them back to the runnable list. I/O requests are handled in a similar way.
This all works really well, as long as you don't hand any threads over to Windows. If you're using an extended procedure (a procedure that calls into a method in a DLL), that's exactly what happens. Extended procedures can't fully participate in this process because SQL Server launches them on their own thread that it turns over to Windows. As a result, the thread can be preemptively suspended if Windows finds a higher priority task (you're also exposed to, potentially, many more context switches). The worker also is hampered in its ability to move other workers off the waiter list as it frees up resources (and also to acknowledge completed I/O requests). And it's not just extended procedures: OLE object calls (sp_OA* calls), distributed queries and queries to linked servers also create this problem.
If your application sporadically runs slowly it may be because SQL Server's scheduling process is being hamstrung by the occasional process that uses these offending technologies. You should be tracking down these processes and eliminating them wherever possible. If, for example, you're using extended procedures you should be moving to CLR Integration.
Manage Waits and Queues
Essentially, reducing the time that a worker spends on the waiter list is key. But if a query is waiting on a locked resource, it may not be the query's fault.
For example, your worker may be sitting on the waiter list because some other worker is locking the resource your worker needs (by resources, I typically mean "rows in a table"). If that other worker is keeping its lock short, that's not a problem. In fact, if the other worker has a lock that is compatible with whatever your worker wants to do, your worker may not need to go on the waiter list at all.
Unfortunately, fundamentally, any lock that that is worth having (i.e. anything at the default isolation level of Read_Uncommitted or above) is going to be incompatible with a lock from other workers that are trying to write the same rows. To put it another way: If your queries sometimes run quickly and sometimes don't, it may not be those queries' problem -- some other query may be holding a resource that everyone needs, holding it with an lock that's incompatible with everything else, and holding it for a long time.
The key issue in resolving this problem is to determine which worker is doing the blocking. Using sys.dm_os_waiting_tasks will give you details of the resource each worker is waiting on and help you determine if there's a common resource some other worker is locking.
One solution is to speed up the blocking worker so it holds the resource for less time. Some of that may be done through good database design: It's possible that you don't have the indexes that the blocking query needs. Alternatively, the problem may be a lack of good database maintenance: The statistics on those indexes may be out of date, causing the optimizer to make poor choices when building a query plan. Using Update Statistic (or running sp_updatestats) can help here.
In theory Auto Update Statistics should keep your statistics up-to-date (it's turned on by default but you should check). I prefer to turn Auto Update Statistics off and turn Auto Update Statistics Async on. The Async version updates your statistics after the query has run rather than before and, all things considered, I'd rather have my query execute before doing any "statistics-related" housekeeping.
Another possibility is that your indexes are badly fragmented so that they can't be used efficiently. Using sys.dm_db_index_physical_stats will let you know if your indexes need to be defragmented. A value of over 5 percent indicates that you should use Alter Index Reorganize and a value over 30 percent indicates you should use Alter Index Rebuild.
If you can't speed the blocking query up, changing the blocking query's isolation level to use Snapshot is also worth trying (this will let read operations run in parallel, though at the expense of making more demands on tempdb -- see my suggestions on tempdb below).
Another typical cause for lots of workers on the waiter list is long I/O times as workers wait for pages to be read from your disk system. Speeding up your hard drives is an obvious solution (is it time to get that solid-state drive?) ... but probably not something you can control. Putting tempdb in its own file and on its own, fast drive is something you can do and can help a lot here (you may not realize it but SQL Server uses tempdb a lot -- it's not just for your #temptables). Sizing tempdb so that it only occasionally needs to grab more space is also a good idea.
If it's just reads that are taking a long time, the most usual cause is poor indexing, which forces more pages to be read from the hard drive than should be necessary (though, to be fair, query time can also be extended through unnecessary data conversions). The fixes to indexes that I suggested in the previous section can also help here.
If it's your updates that are running slow, a lot of WriteLog wait types indicates that you're waiting on writes to SQL Server's transaction log. There's practically nothing that will slow down performance of a transactional database more generally than a slow transaction log. If your log files aren't on a separate server from the rest of your data files, now's the time to move them (putting just your transaction log on an SSD can make a big difference).
In the "only updates" run slow scenario, you'll want to look at the number of page splits that are occurring. Adding new pages at the end of your table (the typical case when a table is clustered on an Identity column whose value ascends with any new row) isn't a problem. However, inserting new rows can result in updates made elsewhere in non-clustered indexes (think of the impact of adding new customers in Akron and Honolulu and where they would go in the index on the City column). Those can generate expensive page splits.
Setting your index's FillFactor to leave more empty space in earlier index pages can sharply reduce page splits. Don't go crazy here, though, or you'll end up with reading lots of index pages to find the row you want because there's so much empty space in each index. I've found that setting an index's FillFactor to 90 percent reduces my page splists without slowing down index lookups appreciably.
I'm assuming here, by the way, that you've got Auto Grow set to on to ensure that your database files can grab new space when they get full (and have Auto Shrink turned off because it takes time, messes indexes up, and why would you want to give back that file space anyway?).
If you have a table where new columns aren't always added at the end (as is the case with, for example, an intersection or union table that supports many-to-many relationships), FillFactor is worth considering there, also.
As I suggested at the start of this column, sometimes it's your query's fault that it's running slowly. Sometimes it's not. You should, at least, make sure -- before you start tweaking your queries or table design -- that SQL Server is set up to do the right thing by your queries.
Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.