VSM Cover Story
Optimize LINQ to SQL Performance
Persisting business objects to SQL Server 2005 or later with object/relational
mapping with LINQ to SQL can exact a performance penalty. Learn how to take maximum advantage of LINQ's strongly typed query capabilities without
overtaxing the database server or losing front-end agility.
Technology Toolbox: VB .NET, SQL Server 2005 SP2, Other: Visual Studio 2008 Beta 2 or later.
Language Integrated Query (LINQ) makes its formal debut in the RTM versions of .NET Framework 3.5 and Visual Studio 2008 with initial LINQ to Objects, LINQ to SQL, LINQ to DataSets, and LINQ to XML implementations. It's a good bet that Microsoft's first Object/Relational Mapping (O/RM) tool to make it past the beta stage, LINQ to SQL, will also be first on most .NET developer's test-drive lists.
LINQ to SQL's graphical O/R Designer simplifies defining entity objects that your project persists in SQL Server 2000+ relational tables. LINQ to SQL's ability to deliver business objects with strongly typed properties from a high-performance data access layer (DAL) that needs little developer-added code is the primary incentive for considering its use in data-intensive .NET projects.
Strongly typed DataSets incur significant overhead when your projects fill TableAdapters from and SaveChanges to relational tables; similarly, LINQ to SQL engages in substantial manipulation of raw SqlClient.DataReader rows when hydrating and dehydrating (persisting) entities. Moving to business objects also introduces the concept of object associations, which represent primary-key/foreign-key relationships between the persisting tables. If your project needs a single property value from an associated entity in a many:one association (m:1, called an EntityRef), it must load the entire instance. If you need to retrieve an entity from a one:many association (1:n, called an EntitySet), you usually must load all EntitySet members. Although LINQ to SQL's IdentityManager keeps only a single copy of an entity in memory, retrieving large or intricate objects to populate integer property values or gathering many child entities to get the one you want can result in many server round-trips and levy a serious performance tax. I'll show you how eager loading, lazy loading, and substituting stored procedures for auto-generated dynamic SQL affect data retrieval time. I'll also show you how to minimize round-trips to the server.
LINQ to SQL's claim to fame is its ability to translate strongly typed LINQ query expressions or chains of Standard Query Operators (SQOs) into dynamic T-SQL statements for CREATE, RETRIEVE, UPDATE, and DELETE (CRUD) operations. LINQ to SQL uses complex expression trees to convert LINQ's From Alias As DataSource Where ConstraintExpression OrderBySortExpression Select ColumnProjection to its T-SQL equivalent. For information on how LINQ's expression trees work, check out Bill McCarthy's article, "LINQ Changes How You Will Program" (September 2007). If you're new to LINQ to SQL, see my earlier "Layer Data Access with LINQ to SQL" (July 2007) and "Test-Drive VB9 and DLinq" (April 2006) articles. You can download the VB 9.0 sample code for the four performance test harnesses here.
Manage Associated Entities
ADO.NET DataSets provide an in-memory representation of relational tables and the relationships between the tables. It's a common practice to fill TableAdapters with T-SQL SELECT queries that return the desired rowset from the uppermost member of the relational hierarchy and related records from lower members. For example, if you want to provide all available information in the Northwind database about orders placed by U.S. customers, you load a subset of the rows from the Customers, Orders, and Order Details tables, which have successive 1:n relationships. The SELECT query returns 13, 122, and 352 rows, respectively, for a total of 487 rows. These numbers strike me as typical for multi-row hierarchical queries displayed in grids.
It's highly likely that you'll need all rows from Northwind tables that have a m:1 relationship to the Orders, Order Details, and Products tables because these tables have relatively few rows. So, you load the entire Employees, Shippers, Products, Suppliers, and Categories tables into their TableAdapters. You would probably need a small percentage of the corresponding tables for a large organization, but returning only the needed rows requires joins or subselects that complicate the main SELECT query. This query returns 74 of the 77 rows of the Products table that represent products purchased by U.S. customers:
SELECT DISTINCT P.*
FROM [Products] AS P
INNER JOIN [Order Details] AS D
ON P.ProductID = D.ProductID
INNER JOIN Orders AS O
ON O.OrderID = D.OrderID
INNER JOIN Customers AS C
ON O.CustomerID = C.CustomerID
WHERE C.Country = @Country
ORDER BY P.ProductID
Filling the DataTables requires eight queries -- one for each table -- so the database takes a substantial initial hit, but the DataSet is disconnected until you save changes, if any.
As I mentioned earlier, business objects have associations with child, sibling, and parent objects and collections; these associations correspond to relationships between tables of the underlying persistence data store. LINQ to SQL's top-level DataContext object provides access to the underlying Table<TEntity> objects that you create by dragging table nodes from the Server or Database Explorer to the graphical O/R Designer's surface. The DataContext generates partial classes for a DAL that's similar to a DAL based on a strongly typed DataSet. The primary difference is that, by default, the DataContext doesn't load associated entities from the data store into memory concurrently with the target collection. LINQ to SQL retrieves the entity with a new connection when a databound control needs to display one or more of the entity's properties in a text box or grid cell, or a LINQ query requires their values explicitly. For example, substituting proper names for primary key codes in a grid control usually requires a server round-trip to load an entire EntityRef for each distinct name displayed. Displaying a single child entity in a 1:n relationship requires retrieving the entire related EntitySet with another round-trip. The connections and round-trips generated by this "feature," which LINQ to SQL calls deferred loading and is more commonly called lazy loading, can impose an excessive load on the persistence database, especially if the data isn't cached.
Explore the Eager Loading Option
The alternative to lazy loading is called eager loading, which retrieves the associated EntityRefs and EntitySets along with the target entity. Eager loading requires explicit declaration of the entities to load by creating a new DataLoadOptions object, populating it with one or more LoadWith(Of EntityType)(AssociationLambda) expressions to define what EntitySet or EntityRef to load, and then attaching it as the DataContext.LoadOptions property value. You must attach the DataLoad options before executing the first LINQ to SQL query, which occurs when you set the BindingSource.Data Source property value to the IBindingList<T> object created by invoking the Table<TEntity>.ToBindingList() method. This event handler retrieves Orders and Order_Details collections with the Customers collection:
Private Sub Form_Load(ByVal sender As
System.Object, ByVal e As System.EventArgs)
dcNwind = New NorthwindDataContext
Dim dlOptions As New DataLoadOptions
dlOptions.LoadWith(Of Customer) _
dlOptions.LoadWith(Of Order) _
dcNwind.LoadOptions = dlOptions
bsCustomers.DataSource = _
This snippet loads Customers, Orders, and Order_Details BindingSource components, as well as DataGridView controls with the number of queries and loading times shown in Test 1A, Test 1B, Test 2A, and Test 2B (see Figure 1, and Table 1, below). Eager loading (see Test 2A and Test 2B in Table 1) reduces the number of database round-trips markedly. The OrdersBindingSource.DataSource is the Customer.Orders EntitySet, and the Order_DetailsBindingsource.Data Source is the Order.Order_Details EntitySet.
|Load Time, s.
||ALL ||91 ||830 ||922 ||1.489
|3A ||Orders-Customer-Order_Details ||Eager ||USA ||13 ||122 ||2 ||0.636
|3B ||Same ||ALL ||91 ||830 ||2 ||3.185
|4A ||Stored Procedures ||Lazy ||USA ||13 ||122 ||136 ||0.337
|4B ||Same ||ALL ||91 ||830 ||922 ||2.344
|5A ||Stored Procedures ||Eager ||USA ||13 ||122 ||136 ||0.682
|5B ||Same ||ALL ||91 ||830 ||922 ||2.720
|6A ||Typed DataSet (for comparison) ||N/A ||USA ||13 ||122 ||3 ||0.085
|6B ||Same ||ALL ||91 ||830 ||3 ||0.211
Table 1: Compare DataGridView Loading Times.
The DynamicSQLTestHarness.sln and StoredProcedureTestHarness.sln WinForm projects (see Figure 1) enable measuring LINQ to SQL's data load time, as well as the number of queries with different configurations. The auto-generated dynamic T-SQL queries, eager loaded in Test 2 and Test 3, minimize round-trips to the server. Top-down loading (Test 2) for two EntitySet (1:n) associations combines the fastest response time with a nominal number of round-trips. Reducing the number of queries by loading the Customers grid from an Orders EntityRef (m:1) and the Order_Details grid from an Orders EntitySet minimizes the number of round-trips, but delivers the worst response time; the complex T-SQL query (see Listing 1) produces an unusually large resultset. Stored procedures generate an excessive number of round-trips when processing EntitySet associations. Test 2 delivers the optimum combination of load time and round-trips.
If you have no more than one 1:n relationship in the dlOptions.LoadWith() expressions, you can decrease the number of round-trips to two--one to populate the Customer grid and the other to fill the Order and Order_Detail grids. This snippet illustrates what you need to do to change the query hierarchy so that Order.Customer EntityRef, an m:1 relationship, provides the association between Customer and Order entities. It also ensures that the only 1:n relationship is the one that exists between Order and Order_Detail:
Dim dlOptions As New DataLoadOptions
dlOptions.LoadWith(Of Order) _
dlOptions.LoadWith(Of Order) _
dcNwind.LoadOptions = dlOptions
This change auto-generates a simple SELECT query to load the CustomerDataGridView and a complex query with multiple LEFT OUTER JOINs to return a 32-column resultset with 2,155 rows that populates the associations and loads the Order and Order_Details DataGridViews (see Listing 1). The resultset, which takes more than three seconds for a local SQL Server 2005 instance to deliver, weighs in at 1.3MB of Unicode; that's three times the combined size of the Customers, Orders, and Order Details tables (see Table 1, Test 3B). Settling on a compromise of 92 queries that take a total of about 1.2 seconds to process is probably the best approach in this case (see Table 1, Test 2B).
||Retrieve Time, s.
||Load Time, s.
||Dynamic SQL, No Preload
||Dynamic SQL, Preload Customer,
||Stored Procedure, No Preload
||Stored Procedure, Preload Customer, Employee, Shipper
Table 2: Preload EntityRefs.
The LinqDataSource automates server-side paging of GridViews and other ASP.NET controls that enable or support paging. This table lists timing, and query (round-trip) values are for a 10-line GridView with three columns that contain three EntityRef values from m:1 associations: CustomerID replaced by Customer.CompanyName, EmployeeID by Employee.LastName, and ShipVia by Shipper.CompanyName (see Figure 2). The baseline 17 round-trips consist of one query for the row count, one for the 10 Order rows, seven for Customer.CompanyName, six for Employee.LastName and three for Shipper.CompanyName. Three stored procedures provide the preload entities, which have a different type than the Order rows; therefore, they aren't recognized as valid for replacing key values. This causes the three extra round-trips in Tests 2A and 2B. Stored procedures with preloaded EntityRefs deliver optimum performance. Test 4 provides the best performance.
If your DBAs require stored procedures for all database operations, including data retrieval, you're in for a major performance hit when your project hydrates objects that have EntitySet associations. In a partial class, you can override the functions that generate parameterized prepared statements for execution by the sp_executesql system stored procedure (see Listing 2). But in doing so you lose the principal benefit of eager loading - -a dramatically reduced number of round-trips to the persistence data store -- with stored procedures. However, there's a way out of the performance dilemma for displaying proper names from EntityRefs instead of primary key values to make grids more readable: You can preload the EntityRef objects that supply the values before you populate the grid (see Listing 3). The System.Data.Linq.IdentityManager recognizes that the needed EntityRefs are in memory and doesn't reload them when populating the grids.
In any case, you need to consider the approaches you take with LINQ to SQL carefully. As long as you don't hydrate more objects than you need at one time, and as long as you apply this article's workarounds to minimize server round-trips, LINQ to SQL can compete with other .NET O/R mapping tools on execution speed, data store connections, and resource consumption. You can also take advantage of other techniques, such as compiling LINQ to SQL queries to bypass the expression tree, or using stored procedure workarounds to support the LinqDataSource's server-side paging and sorting features in GridViews. I recommend exploring those options, as well as watching upcoming issues of VSM for more information on these topics. Also, it's important to keep in mind that this article's data and code are for VS 2008 beta 2, so download the sample code and run your own tests with this article's four test harnesses.