Make Code Robust With Flexible Queries
Stored procedures don't provide the flexibility old-style concatenated SQL does. By abstracting the filter and creating joined tables dynamically, you build a system that is robust in the face of data structure changes.
Technology Toolbox: VB.NET, SQL Server 2000, XML
In the past, if you needed extensive flexibility in data selection, you'd probably concatenate the query as a text string in your application. However, this concatenated Transact-SQL (T-SQL) is vulnerable to SQL injection attacks and platform-dependent, which suggests you should use stored procedures as a best practice. But this presents a new problem: What if you can't use stored procedures because you need to offer dozens or hundreds of filtering options? Stored procedures just don't provide the flexibility old-style concatenated SQL does.
I'll show you how to resolve this issue with platform-independent query abstractions you can express in the dialect of your back end. These abstractions also let you infer relations, rather than express them directly, to simplify your .NET code and make it more robust. I'll extend these ideas to include child queries in a single database call, and I'll show you how to parameterize the queries for protection against SQL injection. I've included a WinForms application for testing this system that you can download. It also illustrates how the query is broken down for the abstraction (see Figure 1).
Stored procedures execute under the privileges of the person who created them, so you can reduce the permissions granted to users of your application. This benefit is not available when you pass SQL as text in the solution I'll describe, so I suggest you use this article's solution only for SELECT queries and when you require a good deal of flexibility. Restricting non-stored procedure usage to SELECT queries lets you limit user permissions to SELECT access.
In this scenario, performance is secondary. The T-SQL won't be as fast as possible, but the queries will provide solid performance, assuming you're using the appropriate indexes. Parameterizing your SQL queries protects you against SQL injection attacks because SQL Server doesn't execute values passed as parameters unless you use sp_executesql, which I don't recommend doing.
Use the phrase query to refer to the entire set of SQL statementsthe entire T-SQL batchsent as a group to the server. This batch might consist of different types of SELECT statements. Support child retrieval and isolate the joins required for the filter (WHERE clause) from those required for the display columns by selecting the primary keys for the main table into a SQL Server temp table. SQL Server automatically manages temp tables, which are prefixed by a pound sign (#). The query always includes selections from the main table, and might optionally include statements to retrieve child data (see Listing 1 for a sample of these T-SQL statements). The filter criteria include support for a parenthetical segment, allowing you to control execution by overriding operator precedence when necessary.
Shift the Burden
Core to this approach is an abstraction for the query. An abstraction is a set of classes that work together to provide the behavior you're modeling. In addition to the parts of the SQL query itself, you'll want to shift the burden for detecting relations, and ultimately for creating JOIN clauses, from the programmer to the system. Inferring relations depends on the definition of tables and relations in your database. The abstraction also must support parameterization of the query. All these tasks force the resulting abstraction to consist of many classes (see the BaseQuery project in the downloadable code). The core Query class contains the main table (as a string), an array of strings for each child table, a strongly typed collection of DisplayColumn objects, and a Filter object (see Figure 2 for an illustration of the relations between these classes). The DisplayColumn objects simply contain table and column names.
The Filter class is more complex; it includes a collection of BaseFilterFragment objects, which can contain any combination of derived FilterItem objects or FilterParenFragment objects. FilterParenFragments have an additional collection of BaseFilterFragments, allowing you to nest parentheses to any depth. FilterItem objects define a comparison expression for the filter. Each FilterItem contains a column name, a table name, an operator, and a value. FilterItems can also contain the preceding conjugation. You can extend this structure to support operators with multiple values (such as BETWEEN and IN). You can also extend it to support comparisons with other columns.
Both filters and display columns allow you to select from many tables. SQL expresses the relation between tables as JOIN clauses. You don't define these joins in the approach shown here; the system infers them. You could infer the database relationships directly from the database at run time, but this would incur an unnecessary performance penalty. At least two other approaches exist. If you're using code generation, you can hard-code this information into your application. If you're not, you can use a technique based on XML extracted and saved to the user's machine, which is the approach I describe here. I used information schemas to extract this information, so it shouldn't take much to modify it for any SQL-92-compatible database (sorry, Access isn't SQL-92-compliant). I used a complete extraction tool I'd written already, and simplified the output with additional Extensible Stylesheet Language Transformations (XSLT) (see Listing 2 for a portion of the resulting structure).
Locate the Closest Relation
Determining the closest relation for each table is a graph theory problem. You can think of each table as a node with a line extending to its parent and child tables. You want the closest relation, so you need to expand the search in a concentric manner. First, look for any parent or child relations, then repeat the process for grandchildren and grandparents, and so on. If you only take each path to its conclusion, you might find a distant complex join before a simple one, and you'd be dependent on the ordering of the structure. You must determine the tables of interest, then cycle to a maximum depth through the adjacent nodes for each table (see Listing 3). The result is a structure of RelationInfo objects; again, you abstract the reusable stuff from the actual SQL generation.
You can easily avoid duplicates in the list of tables you're searching for, but a table still might appear twice in the join. For example, if you're interested in the Customers table and its Orders child and OrdersDetail grandchild, the Orders table appears twice. It appears once as the requested relation and once on the path to OrdersDetail. The simplest solution is to allow the duplicates, then remove them. Be sure to remove them in the same order you're building the JOIN clause, so tables are available in the JOIN when they're needed.
Armed with the structure defining your query, you're ready to dive into creating the T-SQL. A common base class allows you to interchange providers, perhaps based on a configuration setting. Each provider, such as your SQL provider, provides access to specific functionality. It also allows you to use the same query information to access data through completely different mechanisms, including custom caches and XML data.
The actual query creation is a string concatenation process. A divide-and-conquer approach with many discrete methods makes it easier to track down specific locations during debugging, but the concatenation is straightforward if the structure is well-built. For example, building the WHERE clause includes a loop containing code to add each portion of the clause:
With CType(filterItems(i), FilterItem)
paramInfo = New parameterInfo(paramInfoColl.Count, .Value)
sqlString &= " [" & .Table & "].[" & .Column & "] " & _
GetOperator(.Operator) & " @" & paramInfo.Name
Use ParamInfoColl to retrieve values for the parameters of your SQLCommand object when you retrieve data.
You can extend these ideas to control the children you retrieve or extend the Filter or FilterItem objects to include Top or OrderBy properties. You might also want to limit the tables available for selection to those with legal joins to the main table, rather than throwing an exception later. On the surface, this approach seems more complex than simply concatenating a string. But by abstracting the filter and creating the joined tables dynamically, you've built a system that is more robust in the face of data structure changes. Retrieving children in the same round trip improves your performance; forcing all SELECT statements through a single location makes debugging easier; and most important, using parameterized queries protects your system against current SQL injection attacks.
Kathleen is a consultant, author, trainer and speaker. She’s been a Microsoft MVP for 10 years and is an active member of the INETA Speaker’s Bureau where she receives high marks for her talks. She wrote "Code Generation in Microsoft .NET" (Apress) and often speaks at industry conferences and local user groups around the U.S. Kathleen is the founder and principal of GenDotNet and continues to research code generation and metadata as well as leveraging new technologies springing forth in .NET 3.5. Her passion is helping programmers be smarter in how they develop and consume the range of new technologies, but at the end of the day, she’s a coder writing applications just like you. Reach her at firstname.lastname@example.org.