In-Depth

Using SQL Bulk Insert with the .NET Framework

SQL Bulk Insert has been tuned over the years to work well with the Microsoft .NET Framework. This tutorial shows you how to take advantage of its power.

SQL Bulk Insert is not a new technology. In fact, I've used it since SQL Server 6.5, when it was a cumbersome implementation from Microsoft that required a great deal of preparation to use the Bulk Copy Program, otherwise known as BCP.

To use BCP, developers and database administrators were required to set up elaborate BCP instruction files that outlined all the BCP flags, columns and data types. Additionally, error handling was limited to outputting the results to a log file. Today's use of BCP -- or SQL Bulk Insert, as it's referred to now -- is much simpler and tightly integrated with the Microsoft .NET Framework. The migration of the Bulk Insert process into the integrated coding environment has opened up many more effective uses of Bulk Insert for .NET developers. The goal of this article is not just to show developers how to use the SQL Bulk Insert component within the .NET Framework, but to show how it can be effectively integrated into an automated system to greatly improve flexibility, code reuse, speed and efficiency.

Get Organized
The first step may sound obvious, but many developers rush straight to the final step of a solution, which results in wasted time and disorganized code. In this article I'll use the following components:

  • Generic collections, found in the System.Collections.Generic namespace
  • A DataObject class with custom attributes
  • SqlBulkCopy, found in the System.Data.SqlClient namespace

Why Use Generics?
I'm sure many of you will ask, "Why use a generic list?" The main reasons are type safety, usability, supportability and flexibility. In the ever-changing world of business-focused information technology, I always strive to design these key attributes into my code from the beginning; it's simply not enough to create a DataTable from a data provider. However, when a custom DataObject class and custom attributes are used, these components can be reused and offer much more flexibility.

Use Case
When it comes to getting your data, I'll leave that part to you. In this scenario, consider that a generic list of data objects is loaded from another database where users have requested that a large amount of data be copied to a separate database. In Listing 1, I created a simple DataObject class with some basic properties. Each of these properties should correspond to the columns you're pushing to your database. It's important to consider your data types at this step, because the Bulk Insert process is sensitive to incompatible data types.

Listing 1. A simple DataObject class for a generic list.

public class DataObject {
  public Int32 ID { get; set; }
  public String Description { get; set; }
  public Int32 UseCode { get; set; }
  public Boolean Active { get; set; }
  public DateTime UpdateDate { get; set; }
}

Use a DataAdapter to fill a DataTable from a basic SQL statement, like this:

SELECT ID, [Desc], UseCode, Active, UpdateDate FROM SomeTable
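For reference, a minimal sketch of that fill might look like the following (sourceConnectionString is a placeholder of my own, not something defined in the article):

DataTable dt = new DataTable();
using ( SqlConnection conn = new SqlConnection( sourceConnectionString ) ) {
  // The adapter opens and closes the connection for us during Fill.
  using ( SqlDataAdapter adapter = new SqlDataAdapter(
    "SELECT ID, [Desc], UseCode, Active, UpdateDate FROM SomeTable", conn ) ) {
    adapter.Fill( dt );
  }
}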

The key to the flexibility and reuse of this code is the use of the DataObject class and a generic list. Listing 2 demonstrates the creation of this list from a DataTable. This is where additional logic can be added to handle business requirements when transforming the data between sources. In this scenario, I had to remove leading spaces in the description.

Listing 2. Filling the DataObject list.

public List<DataObject> FillDataObjectList( DataTable dt ) {
  try {
    List<DataObject> myList = new List<DataObject>();
    foreach ( DataRow row in dt.Rows ) {
      DataObject myObject = new DataObject();
      myObject.ID = Convert.ToInt32( row[ "ID" ] );
      // The description has leading spaces, so remove them.
      myObject.Description = row[ "Desc" ].ToString().TrimStart();
      myObject.UseCode = Convert.ToInt32( row[ "UseCode" ] );
      myObject.Active = Convert.ToBoolean( row[ "Active" ] );
      myObject.UpdateDate = Convert.ToDateTime( row[ "UpdateDate" ] );
      myList.Add( myObject );
    }
    return myList;
  }
  catch ( Exception ) {
    // Rethrow, preserving the original stack trace.
    throw;
  }
}

Before we pass this object to our Bulk Insert logic, it's useful to expand the DataObject class to include some custom attributes. The use of these attributes, shown in Listing 3, allows the subsequent Bulk Insert code to read our object and adapt to changes in it, or to accept other objects that carry the same custom attributes. I won't go into details about implementing custom attributes, because there are plenty of samples publicly available (a minimal sketch follows Listing 3). Additionally, the downloadable code for this article contains all the samples needed.

The changes to the DataObject class are shown in Listing 3. Each property is assigned a custom attribute that reflects the column name in the destination table. Additionally, the class itself has a custom attribute that reflects the destination table's name.

Listing 3. A DataObject class with custom attributes for a generic list.

[DataTable( "SomeTable" )]
public class DataObject {
  [DataColumn( "ID", DbType.Int32 )]
  public Int32 ID { get; set; }
  [DataColumn( "Desc", DbType.String )]
  public String Description { get; set; }
  [DataColumn( "UseCode", DbType.Int32 )]
  public Int32 UseCode { get; set; }
  [DataColumn( "isActive", DbType.Boolean )]
  public Boolean Active { get; set; }
  [DataColumn( "UpdateDate", DbType.DateTime )]
  public DateTime UpdateDate { get; set; }
}
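The article's downloadable code contains the attribute implementations; for readers following along without it, here's a minimal sketch of what the two attribute classes might look like. This is my own illustration and may differ from that code; in particular, the DbType-to-Type mapping is an assumption made to match the dt.Columns.Add call shown in the next section.

[AttributeUsage( AttributeTargets.Class )]
public class DataTableAttribute : Attribute {
  // Applied as [DataTable( "SomeTable" )]; the compiler adds the Attribute suffix.
  public String TableName { get; private set; }
  public DataTableAttribute( String tableName ) { TableName = tableName; }
}

[AttributeUsage( AttributeTargets.Property )]
public class DataColumnAttribute : Attribute {
  public String ColumnName { get; private set; }
  public DbType DbColumnType { get; private set; }
  public DataColumnAttribute( String columnName, DbType columnType ) {
    ColumnName = columnName;
    DbColumnType = columnType;
  }
  // Maps the declared DbType to the runtime Type a DataTable column needs.
  public Type ColumnType {
    get {
      switch ( DbColumnType ) {
        case DbType.Int32: return typeof( Int32 );
        case DbType.Boolean: return typeof( Boolean );
        case DbType.DateTime: return typeof( DateTime );
        default: return typeof( String );
      }
    }
  }
}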

Set Up the SqlBulkCopy
Now that I have the DataObject class set up with custom attributes for the column names, data types and destination table name, and have loaded the generic list of data objects from the source, I'm ready to set up the SqlBulkCopy method. What sets this implementation apart from others is its use of those custom attributes. The following code shows that the method signature takes a list of data objects and a list of property names. (Note that the property name list is critical to loading the Bulk Insert object, so add your properties in the order in which they appear in your destination table.)

public Boolean BulkInsertDataObjectList<T>( 
  List<T> myDataObjectList, 
  List<String> propertyList ) {

Within the BulkInsertDataObjectList method are two crucial blocks of code that use the custom attributes. The first one, shown in the following code, loops through the passed property list to dynamically create the DataTable and find the corresponding property in the DataObject:

PropertyInfo[] props = typeof( T ).GetProperties();
for ( int i = 0; i < propertyList.Count; i++ ) {
  DataColumnAttribute col = 
    GetColumnNameFromProperty<T>( propertyList[ i ] );
  dt.Columns.Add( col.ColumnName, col.ColumnType );
}
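The GetColumnNameFromProperty<T> helper isn't shown above; a plausible implementation simply reads the DataColumnAttribute off the named property via reflection (again, the downloadable code may differ):

private DataColumnAttribute GetColumnNameFromProperty<T>( String propertyName ) {
  // Find the property by name, then pull the DataColumnAttribute from Listing 3.
  PropertyInfo prop = typeof( T ).GetProperty( propertyName );
  return (DataColumnAttribute)Attribute.GetCustomAttribute(
    prop, typeof( DataColumnAttribute ) );
}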

The next step in this dynamic process is to load the data table. A simple loop is needed to create data rows for each data object in the list that was passed in:

foreach ( T rec in myDataObjectList ) {
  DataRow row = dt.NewRow();
  for ( int x = 0; x < propertyList.Count; x++ ) {
    row[ x ] = typeof( T ).GetProperty( propertyList[ x ] ).GetValue( rec );
  }
  dt.Rows.Add( row );
}

Incorporating custom attributes is what makes this design truly flexible: it allows the SQL Bulk Insert method to "learn" what's required to correctly insert the data. You can now pass any object to this method, as long as its table and column attributes are set.
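To round out the method, here's a hedged sketch of how it might finish: read the destination table name from the class-level attribute, map each column by name and hand the populated DataTable to SqlBulkCopy. (The connectionString variable is assumed here, and the article's downloadable code may structure this differently.)

DataTableAttribute tableAttr = (DataTableAttribute)Attribute.GetCustomAttribute(
  typeof( T ), typeof( DataTableAttribute ) );
using ( SqlConnection conn = new SqlConnection( connectionString ) ) {
  conn.Open();
  using ( SqlBulkCopy bulkCopy = new SqlBulkCopy( conn ) ) {
    bulkCopy.DestinationTableName = tableAttr.TableName;
    // Map each staging column to the destination column of the same name.
    foreach ( DataColumn c in dt.Columns ) {
      bulkCopy.ColumnMappings.Add( c.ColumnName, c.ColumnName );
    }
    bulkCopy.WriteToServer( dt );
  }
}
return true;

Calling the method is then just a matter of supplying the list and the property names in destination-column order, for example:

List<String> props = new List<String>() {
  "ID", "Description", "UseCode", "Active", "UpdateDate" };
BulkInsertDataObjectList<DataObject>( myList, props );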

SqlRowsCopied Event
The Bulk Insert component has a useful event that should not be overlooked: the SqlRowsCopied event. It's a valuable way to provide feedback to logs, or perhaps to a UI from which a user kicked off the process. This event, when paired with the NotifyAfter property, reports the number of rows inserted to any subscribers throughout the entire insert process. I've used it for log tracking as well as UI progress meters:

bulkCopy.NotifyAfter = 10000;
bulkCopy.SqlRowsCopied += 
  new SqlRowsCopiedEventHandler( bulkCopy_SqlRowsCopied );
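A minimal handler might simply report progress; SqlRowsCopiedEventArgs.RowsCopied carries the running total:

void bulkCopy_SqlRowsCopied( object sender, SqlRowsCopiedEventArgs e ) {
  // Fires after every 10,000 rows, per the NotifyAfter setting above.
  Console.WriteLine( "{0} rows copied...", e.RowsCopied );
}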

When it comes to inserting data quickly and with minimal impact on server performance, SQL Bulk Insert can typically load a database table in seconds. In my test runs, I was able to insert 700,000 records in less than 20 seconds.

High Performance and Flexibility
In every project, your design goals should be to maximize performance, flexibility and code reuse. Designing software for the moment just isn't feasible. SQL Bulk Insert is a powerful way to deliver high performance when moving large amounts of data. Designing flexibility into these types of methods will help you in the long run.

About the Author

Erik Bartlow has programmed with Microsoft technologies since 1992. He has specialized in C# and ASP.NET for the past eight years. Additionally, he has extensive expertise in database design, DHTML and object-oriented programming. Currently with Hewlett-Packard, he's worked in the past with a range of businesses, from small IT consulting firms to large corporations.
