Speeding Up Queries: How to Think About Query Plans
Often, designing databases and speeding up SQL Server queries are treated as a kind of abstract problem: A just "Find this problem, apply this solution" approach to problem solving. In practice, picking the right solution requires having a deep understanding of how your application and its users actually interact with your data. A generic solution that's applied blindly can often just create new and more exciting problems. A good example of this is in looking for solutions to manage query plans.
It's worth recognizing that many SQL statements aren't, in fact, planned in any meaningful sense. One of the first things that the SQL Server optimizer does when it sees a query is determine if a query is "trivial." If a query is deemed trivial, the optimizer skips most of the planning process.
However, if a query is planned, the optimizer recognizes that it would be silly to plan it again. This is a good thing because, in our applications, the same queries do keep coming around again and again. So, once a plan is created, SQL Server flags it with a couple of identifiers that tie the plan to that query (or one very much like it) and puts that plan in a cache. When that query (or one very much like it) shows up again, then the query optimizer can skip the planning process, as it does with trivial plans.
By the way, this is worth remembering when someone tells you that a major benefit of using stored procedures is that they don't require any planning. In fact, SQL Server tries to plan only queries that aren't in stored procedures if they're (a) worth planning and (b) are running for the first time. Over time, as plans build up in the cache, only a small number of queries will require planning (ideally, less than 15 percent).
This is obviously a good thing ... but at least two things can go wrong.
For example, imagine the index on the Gender column of a table listing all valid drivers in Saudi Arabia. It's only recently that women have been allowed to drive in Saudi Arabi, so the number of women in the column is going to be very small.
An index on the Gender column is, obviously, a great index to use to find all the women drivers -- that index will reduce the number of rows to process to some very small number. Equally obviously, an index on the Gender column would be a terrible index to use to find male drivers because it would return virtually the whole table -- SQL Server would be better off just to scan the whole table.
But let's say that the first query used against the table is one to find all the women drivers. A plan is generated and added to the cache ... and will be used the next time the query is executed. If this query is usually used to find women drivers, this isn't a problem. If, however, the query is used later to find the male drivers, then only the value used against the Gender column will be changed. As a result, the plan for the women drivers will probably be used again ... and execution time is going to suffer.
Much, therefore, depends on whether the query used to generate the plan uses "typical" parameters and whether the query that generates the plan is a "typical" example of the query. After all, if you don't search for male drivers very often, then it doesn't matter if that query runs slowly, provided the "typical" query for female drivers runs quickly.
Understanding Your Data
There are three ways to address this problem, but picking the right one depends on how your application uses the data.
First, do nothing. If you assume the first version of the query seen by the optimizer is the most common version of the query, then the query plan that's generated will give the best results for the most "typical" query. It doesn't matter what value that first query uses, it's going to give the best result for your most common queries. Queries using "atypical" parameters will run slowly, but life's just like that.
Second, use Optimize For to specify the parameter value to be used when planning the query. Optimize For lets you pick a value that represents the "typical" query so that the plan will be optimized for the most common runs. This makes sense if the "atypical" values are used frequently enough that they may be the version used to generate the plan (but aren't used frequently enough to be allowed to control the plan).
Third, use With Recompile with your query. This causes the query to be re-planned each time it runs. The assumption here is that the cost of re-planning is less than the cost of running with an "atypical" parameter. This makes sense if queries with "atypical" parameters crop up frequently enough that you can't afford to let them run slowly.
My point here is that there isn't an abstract "right" answer. It's only by understanding both which parameter values aren't "typical" and how "typical" each query is that you can make the right decisions. In other words, knowing your users is every bit as important as knowing your technology.
About the Author
Peter Vogel is a system architect and principal in PH&V Information Services. PH&V provides full-stack consulting from UX design through object modeling to database design. Peter tweets about his VSM columns with the hashtag #vogelarticles. His blog posts on user experience design can be found at http://blog.learningtree.com/tag/ui/.