Developer Product Briefs

SQLCLR for the .NET Developer

Learn about SQLCLR, writing database objects in managed code, and configuring SQL Server to use code located within .NET assemblies.

SQLCLR for the .NET Developer
Learn about SQLCLR, writing database objects in managed code, and configuring SQL Server to use code located within .NET assemblies.
by Jason Follas

May 1, 2006

The latest version of Microsoft's flagship database, SQL Server 2005, introduces an enormous number of enhancements to the product's security, scalability, performance, and programmability features. One such major enhancement (to the delight of .NET developers everywhere, who often struggle with writing complex Transact-SQL) is SQLCLR, and with it, the promise of being able to write .NET code that will run on the database itself.

SQLCLR is a specialized version of the .NET Common Language Runtime (CLR) that is hosted within the SQL Server 2005 database engine. Unlike the CLR provided by the "regular" .NET Framework that is managed by the operating system, SQLCLR is managed entirely by the database engine, and thus allows SQL Server to continue to do the things it does well, such as manage its own memory and processes and provide security. SQLCLR is also commonly referred to as "CLR Integration."

However, just because you can run complex business logic on the data tier does not mean you always should. The database is an integral and often critical part of any architecture, and you should take precautions to ensure that the server's valuable resources are not misused by tasks that could easily be offset to a middle tier.

SQLCLR becomes valuable, though, in cases where a large amount of "throw-away" data would normally be transferred across the network for processing by business logic (because you eliminate the bottleneck caused by the network), or where you need to use functionality that is already provided by the .NET Framework but would be difficult or impossible to re-create using Transact-SQL alone.

Comparison to Transact-SQL
SQL Server's native language, Transact-SQL (T-SQL), is an interpreted language that in itself never actually compiles into native code. It is optimized for set-based data operations (for example, one action that is applied to multiple rows of data at one time instead of individually), and has direct access to the data store through the Query Processor.

Now, a lot of database folks might argue that T-SQL does compile when you save it or before a batch of dynamic SQL executes. There are some important compile-like actions that take place during this time, such as syntax checking, resolving references to database objects, and creating query execution plans. But my point is that T-SQL itself remains to be interpreted at run time, which becomes particularly pertinent when comparing T-SQL expressions that do not perform data access to their SQLCLR counterparts.

Despite (or perhaps because of) all its optimizations, T-SQL is not an all-purpose programming language. For example, T-SQL does not support object-oriented programming concepts, such as interfaces, classes, or inheritance, nor does it support language constructs that are normally taken for granted, including arrays, collections, for-each loops, and bit-shifting (you do take bit-shifting for granted, don't you?).

On the other hand, code that is executed by the CLR (managed code) is first compiled into Microsoft Intermediate Language (MSIL). At run time, segments of MSIL are Just-In-Time (JIT) compiled into native machine code just prior to execution. The resulting native code is cached for a period of time, so subsequent calls to the same routine will not need to be JIT compiled.

Managed code is created using modern-day languages, such as Visual Basic .NET and C#, and these languages support all the modern-day programming constructs. Also, an incredible amount of functionality is already provided by the .NET Framework's Base Class Library (BCL), and can readily be used by managed code.

This might sound like SQLCLR is a prime candidate to fully replace T-SQL, but here's the catch: SQLCLR does not provide direct access to the data store. So, in order to perform data operations, managed code must make use of ADO.NET to execute T-SQL. This is slower than using T-SQL directly, but faster than running the same code on another machine because the data does not need to be transferred across a network.

Enabling CLR Integration
Following the new mantra of being "Secured by Default," SQL Server 2005 does not ship with SQLCLR enabled. Therefore, a DBA must explicitly enable CLR Integration before it can be used.

There are two ways to accomplish this task. The first, and probably the easiest for non-database folks, is to use the SQL Server Surface Area Configuration tool (see Figure 1):

  1. Run the tool from the Start menu.
  2. Select "Surface Area Configuration for Features."
  3. Locate the instance of SQL Server that you would like to modify.
  4. Click on "CLR Integration" from the "Database Engine" node.
  5. Check the "Enable CLR Integration" checkbox.
  6. Click on the OK button.

The second method, and probably preferred by hardcore database users who tend to live inside SQL Server Management Studio, is to execute a few T-SQL commands (as a user that is a member of the sysadmin role):

SP_CONFIGURE "clr enabled", 1 
GO 
RECONFIGURE 
GO 

Once CLR Integration is enabled on your server, you need to import your Assembly into the SQL Server database before you can use it. You can do this manually through SQL Server Management Studio, or you can let Visual Studio 2005 do the heavy lifting for you.

To manually import an assembly, you need to first navigate to the Programmability -> Assemblies branch of your database in the Object Explorer tree (see Figure 2). Then, right-click on the Assemblies folder and choose "New Assembly." You'll see a dialog that allows you to select an assembly and set properties (see Figure 3).

Of particular interest on this form is the Permission Set dropdown. This is part of Code Access Security (CAS), which is used to prevent unknown code within the assembly from performing a privileged action. SQL Server provides three built-in sets of permissions that the assembly (as a whole) can run under:

  • Safe: Most restrictive policy. Only internal computations are allowed.
  • External Access: Same as Safe, with the additional capability to access the network, local files, and the registry.
  • Unsafe: Unrestricted access to resources (internal and external to SQL Server) as well as the ability to call unmanaged code.

(Note: Be sure to read the next section on importing unsigned assemblies using the External Access and Unsafe permission sets.) Clicking on OK imports the assembly into the database and displays it in the Object Explorer's list of assemblies.

If you are writing your own managed code using Visual Studio 2005, the easier way to get your assembly into the database is to use the Deploy feature. First, create a new project for your SQLCLR code using the "SQL Server Project" template (see Figure 4). Next, select an existing (or create a new) database reference when prompted (see Figure 5). The resulting project will have the necessary references established, and will also be set to deploy the assembly to the selected database.

If you then view the project's properties (see Figure 6), you'll see a "Database" section that contains the Connection String, as well as a dropdown list for the Permission Level (which you can change to Safe, External Access, or Unsafe, according to the needs of your project).

To deploy your assembly, simply choose "Deploy {projectname}" from the Build menu. Subsequent deployments will first drop any objects that were automatically created as part of the previous deployment, import the assembly, and then re-create the various SQL objects found in your project.

Additional Security Precautions for Unsigned Assemblies
In your development environment, you will likely be producing an unsigned assembly. Before the assembly can be imported using the External Access and Unsafe permission sets, the database administrator must perform a few additional actions.

First, the database itself must have the Trustworthy property turned on in order to tell the server that it can trust the database and its contents. You can do this by executing this T-SQL DDL (substitute your database's name for 'adventureworks'):

alter database adventureworks set trustworthy on

Next, the user login performing the import must be granted the "External Access Assembly" and/or "Unsafe Assembly" permission at the server level. From SQL Server Management Studio, you can right-click on the server node in the Object Explorer, and choose Properties. The "Permissions" page will show a list of roles and logins, and the explicit permissions for each selected login (see Figure 7). Grant "External access assembly" and/or "Unsafe assembly" according to your needs.

Scalar Functions
A scalar function is one that accepts parameters, executes some logic, and then returns a single scalar value. This concept is not new to SQL Server, for it's quite possible to create scalar functions using T-SQL alone. However, functions written in managed code have the advantage of being JIT compiled into native machine code, and can also take advantage of a vast amount of pre-written (often complex) functionality provided by the BCL. This allows for the creation of functions that were either difficult or impossible to do using T-SQL alone, such as cryptography or complex string manipulation using regular expressions.

You write a scalar function in .NET as a static method of a class that returns a scalar type that can be converted to one of SQL Server's native scalar types. Unlike some other SQLCLR objects, a scalar function is not required to be annotated with a SQL attribute. However, if you are using Visual Studio 2005 to deploy your assembly, you can use this attribute to automate the creation (wiring-up) of the scalar function during deployment:

Microsoft.SqlServer.Server.SqlFunctionAttribute

As a simple real-world example, suppose that as part of analyzing sales data, you have the need to calculate the percentage of the 29.5-day lunar cycle for a given date/time value. That is, 0% represents the exact moment of the New Moon, 50% would be the Full Moon, and 99.9% would be just prior to the New Moon (which starts the next cycle).

One way to write this function in C# is to use the code in Listing 1. Compile and deploy the assembly, after which you can use the function just like any other scalar function as part of a query:

SELECT OrderDate, 
       SubTotal, 
       dbo.MoonPhase(OrderDate) 
         AS PctOfCycle
FROM   Sales.SalesOrderHeader

The results are shown in Listing 2.

Note: If you are manually importing the assembly into the database using SQL Server Management Studio, you also need to manually execute a CREATE FUNCTION statement to explicitly wire-up SQL Server to the function in the assembly (see Listing 3).

In this case, the T-SQL function signature matches the C# function signature (with equivalent native T-SQL data types), and the EXTERNAL NAME part of the DDL contains the Namespace, Class Name, and Method Name of the function from the assembly.

The Workhorse of SQL Server
Stored procedures are the workhorse of SQL Server. These are T-SQL programs that can perform a series of operations, and can return data back to the caller in four different ways: return values, output parameters, messages, and tabular data.

You create a stored procedure in .NET as a static method of a class that either returns void or an integer. This return value is accessible to the caller as the @return_status output parameter (ADO.NET). Like the scalar function, SQLCLR stored procedures are not required to be decorated with a SQL attribute. However, if you are using Visual Studio 2005 to deploy your assembly, you can use this attribute to automate the creation (wiring-up) of the stored procedure during deployment:

Microsoft.SqlServer.Server.SqlProcedureAttribute

Output parameters are useful for returning discrete non-integer values back to the calling application after the stored procedure is finished executing. These are usually used to convey some sort of state information that is secondary to whatever action the stored procedure is performing.

When you use output parameters, your calling code will initialize a value and pass it into the stored procedure as a parameter marked using direction output. The stored procedure can use this value just like any other type of input parameter, and any changes to the variable's value within the stored procedure will be passed back to the calling code.

In the stored procedure code, you must indicate that the output parameter is to be passed by reference (in C#, use the ref or out keyword; in VB, use the ByRef keyword).

Suppose you need a stored procedure that will conditionally log a message if a provided address is a microsoft.com host. The stored procedure accepts a message string and an address string as parameters. If the supplied address is an IP address (determined by a regular expression), then the stored procedure will attempt to perform a reverse DNS lookup to obtain a fully qualified hostname before checking whether the hostname ends in "microsoft.com". The caller will need to have access to the resulting DNS lookup data, so the address parameter is an output parameter (see Listing 4).

Note: While this is a good example of something you can do in SQLCLR that you can't do using T-SQL alone, it also demonstrates how you could misuse SQLCLR by performing tasks on the database that are better performed in middle-tier logic. In this case, the hostname resolution is probably not an appropriate use of the database's CPU time.

Once you deploy the stored procedure (using the External Access permission set, because the code will need to access the network to resolve the hostname), you can execute it from other .NET code using an ADO.NET Command object, or even invoke it entirely through T-SQL (see Listing 5). Here are the results:

tide150.microsoft.com
1

Note: As with the scalar function, if you are manually importing the assembly into the database using SQL Server Management Studio, you also need to manually execute a CREATE PROCEDURE statement to explicitly wire-up SQL Server to the method in the assembly (see Listing 6).

Output parameters provide a mechanism for you to return discrete values from your stored procedures, but they do not lend themselves to returning streams of data back to your calling code through the open Connection object. SQLCLR, however, does provide a mechanism to do just this by means of the SqlPipe object (accessed through the SqlContext.Pipe property).

For example, to send text to the caller (equivalent to the T-SQL Print command), you would use the SqlPipe.Send(string) method (see Listing 7). The Send() method has several overloads, and you can use it to return tabular data to the caller through a DataReader object instead of a string (see Listing 8).

Take particular note of this example, because a few key concepts are introduced.

First, notice the ConnectionString. Because this code is executing on the SQL Server, and in fact is executing within the context of a given database where the assembly has been deployed, there is no need to use an elaborate ConnectionString that lists the server and database names (in fact, in the interest of reuse, we wouldn't want to do that anyway). Simply use "context connection=true" to access the current database.

Next, take note of what I mentioned earlier: SQLCLR does not have direct access to the data store, and must use ADO.NET in order to execute T-SQL. In this case, the code simply creates a Command object using dynamic SQL, and then invokes the ExecuteReader() method to create a DataReader object.

Finally, you pass the live (unread) DataReader to the SqlPipe.Send() method, and the caller will receive the tabular data in the exact same way as if it had executed the SQL statement itself (in other words, there is no distinction between what this stored procedure will return and what would be returned by a T-SQL stored procedure that simply executes the SELECT statement).

This article only begins to explore the functionality offered by CLR Integration. Besides scalar functions and stored procedures, you can also use SQLCLR to create table value functions, triggers, user-defined aggregates, and user-defined types (the last two do not have T-SQL equivalents).

Also, in the interest of scope and space, there was some implementation information that was intentionally left out of this article. Be aware that the .NET Framework 2.0 uses Host Protection Attributes (HPAs) to flag the behavior of certain classes and methods. As a result, some functionality provided by the Framework cannot be used in SQLCLR because it could negatively affect performance or reliability (SQL Server will discover these HPAs at run time and raise an error).

The SQL Server Books Online (BOL) serves as a good reference for digging deeper into SQLCLR. There is also a wealth of information on the Internet in the form of MSDN whitepapers, blog posts, podcast interviews, and other technical articles (see Resources).

About the Author
Jason Follas has more than 10 years of professional experience developing for the Microsoft platform using Visual Basic and C#. He is a Technical Architect for Perficient Inc., serving the Toledo and Detroit markets, and a Microsoft Certified Professional (MCAD, MCSD). Jason was recently appointed President of the Northwest Ohio .NET User Group.

comments powered by Disqus

Featured

Subscribe on YouTube