Roll Your Own Data Snapshot Utility
Determine which processes use the most database server resources (CPU and physical IO) over a given period of time.
Technology Toolbox: SQL Server
Databases are often the primary cause of poor application performance.
A common first step to diagnosing poor database performance is to run the system stored procedure, sp_who2, a SQL Server utility for determining what processes are running on a given database server. This utility is useful for highlighting any blocked processes that are causing slow application performance. However, the utility falls short in one significant regard: It doesn't identify explicitly which processes are using the database server's primary resources (CPU and physical IO).
I'll show you how to create a dba_ActivitySnapshot utility, which overcomes this limitation (Listing 1). This utility extracts information about processes running on the database server at a given point in time (the first snapshot), then waits for a given time period to pass, and, finally, extract information about the running processes again (the second snapshot). By comparing the two snapshots, it's possible to determine what processes are using the most data-base server resources (as measured by CPU and physical IO usage).
This utility extracts data about the running processes by inspecting the system view sys.sysprocesses. This view queries the underlying system table sysprocesses, which is the same table that the sp_who2 stored procedure looks up information from.
Like sp_who2, the dba_ActivitySnapshot snapshot is a stored procedure. This stored procedure accepts two parameters. The first parameter, @delayBetweenSnapshots, is the time delay between the snapshots. The second parameter, @debugMode, indicates whether the utility should also display debug information.
The first parameter uses the format HH:MM:SS, where HH represents hours, MM represents minutes, and SS represents seconds. Setting this value to five or ten seconds would help you determine the most resource intensive processes quickly. Setting the second parameter to a value of 1 causes the utility to display the contents of both snapshots. This can be useful for discovering processes that you don't capture in both snapshots.
You build the identifier of the stored procedure dynamically using the SQL Server values for server name, database name, and the name of the routine. Applying this pattern to all your stored procedures makes it much easier to identify the routine that is reporting any errors/informational messages. The output uses this format:
You can prevent any unnecessary locking by setting the transaction isolation level to "read uncommitted." The first part of the stored procedure declares two table variables: @ProcessesAtStart and @ProcessesAtEnd. These variables hold the results of the two snapshots, which are the processes running at a given time.
Each snapshot selects relevant fields from the sys.sysprocesses view. These fields include the current CPU and physical IO cumulative usage for a given process (Table 1). Note that the snapshot ignores SQL process ids (spids) that have a value of 50 or lower because those are internal SQL Server processes.
p> After the first snapshot is taken, the stored procedure suspends execution until the interval you defined in @delayBetweenSnapshots has passed. You perform this wait using the WAITFOR statement. A second snapshot is taken after the defined interval.
Finally, you calculate the CPU and physical IO activity that occurred during the time period specified. The key to the snapshot table is a concatenation of the process id (spid) and the execution context id (ecid). You use this concatenated key to join the two snapshots together.
Display the Output
The dba_ActivitySnapshot utility displays the output as two on-screen grids (Figure 1). The first grid shows which processes have used the most CPU cycles, and the second grid shows which processes have used the most physical IO in the given time period. Note that the utility reports only on those processes that have increased their CPU or physical IO activity. You sort the grids by CPU activity and physical IO activity, respectively.
The grid enhances the value of the returned data by displaying the underlying SQL used by the processes. Examining this SQL can help you discern why the performance is slow.
You get the underlying SQL by calling into the function, dba_GetSQLForSpid. The code for this function is straightforward (Listing 2). It accepts one parameter, the SQL Server process id (@spid) of a running process.
You use the passed process id to obtain the sql_handle field from the sys.sysprocesses system view. This field is a handle to the currently running T-SQL, which you can retrieve by passing the sql_handle to the Dynamic Management Function, dm_exec_sql_text. The sql_handle relates to the second snapshot, so it contains the latest T-SQL for the running process.
You create both the stored procedure (dba_ActivitySnapshot) and the function (dba_GetSQLForSpid) in the master database because they are system-wide routines.
The database snapshot utility works with SQL Server 2005. You can also make it work with SQL Server 2000 by changing any references to sys.sysprocesses to master.dbo.sysprocesses. Additionally, you should also comment out the call to the function, dba_GetSQLForSpid. If you use SQL Server 2000, sp3 or greater, you can make a call to the fn_get_sql function if you want to obtain the SQL for the running process.
Executing the snaphot utility from SQL Server Management Studio takes only a single line of code:
This utility is useful for giving you a picture of your database usage, but you do need to keep in mind a handful of caveats as you use it. For example, processes that enter and leave between the two snapshots aren't recorded. Also, processes that end before second snapshot or start after the first snapshot aren't recorded either. It's possible to diminish or eliminate these errors by reducing the value of the time period between the snapshots. That said, you'll probably use this utility only when your database is experiencing poor performance, a common cause of which is long running queries. In that light, the limitations I've mentioned shouldn't be much of a factor in helping you find bottlenecks.
Used as described, this utility should give you a better understanding of what is happening on your database server when users report poor performance issues, helping you to identify and address the problems encountered. This snapshot utility accomplishes what it sets out to do, but there's also room to augment or expand on its functionality. For example, you might also make it write details of CPU/physical IO usage to transient storage and have the utility run more regularly, taking a snapshot every 10 seconds for five minutes. This would highlight areas of conflict, including areas where you should examine your SQL code for better performance. It would also provide elementary system monitoring. If you were to express the CPU and physical IO usages as percentages, you would get a strong indication of resource usage.