Code Focused

Database Synchronization with the Microsoft Sync Framework

Use the Microsoft Sync Framework to synchronize data changes between SQL Server 2008 Standard, SQL Server 2008 Express and SQL Server Compact Edition 3.5.

Despite the availability of wireless and cellular networks, many applications must support data presentation and collection services while offline or in remote offices. The Microsoft Sync Framework is a highly flexible framework for synchronizing files and data between a client and a master data store. With great flexibility often comes complexity and confusion, however.

Last spring, I lead the Allen Park Neighborhood Center (APNC) team at the annual Lansing Michigan Give Camp during a very sleep-deprived weekend. At the event, more than 100 developers and designers donated their time to create a dozen Web sites and database applications for local charities. The APNC has a service program in which volunteers canvas the entire neighborhood every spring. They survey residents on numerous community health and well-being measures and then work to connect people with available services. The APNC needed a way to collect these surveys while out in the neighborhood and then synchronize the data back to the SQL Server database.

This type of scenario occurs at many businesses and non-profit organizations. A central SQL Server 2008 Standard (or Enterprise) instance will synchronize to multiple SQL Server 2008 Express instances at remote locations, perhaps connected over slower network lines, with the need to take the data offline on laptops or tablets at each remote location. Once the data is updated on the laptop, it needs to be synchronized with the SQL Server 2008 Express instance, and the Express instance is then synchronized with the central SQL Server 2008 Standard/Enterprise instance. The code in this article is a sample database synchronization library for this common scenario - the need to synchronize SQL Server between server editions and offline databases. The sample application is written as a hybrid C# and Visual Basic client.

Synchronization might appear relatively straightforward, but it can quickly become complicated and problematic, driving the need for a tool such as the Microsoft Sync Framework. The Microsoft Sync Framework 2.1 Software Development Kit (SDK) is available here.

Developer Challenges
Synchronization typically refers to file or database row-level synchronization. Most of us have used applications with offline synchronization such as Microsoft Outlook or Live Mesh, or third-party applications such as Dropbox or Evernote. These applications work well and can lull us into believing that synchronization is easy, when it's quite the opposite. Synchronization is hard for the developer, particularly when multiple nodes are independently updating a master database.

How can records be created at each node so that primary keys remain truly unique across all nodes? If multiple nodes attempt to update the same row, how are physical and logical conflicts resolved? The obvious conflict is if two nodes modify the same record. Often, both updates are permitted, if they impact different columns. But what if one node updates the first name and another node updates the last name in the same row? It's not a physical column conflict, but it's reasonable to believe that the nodes were not referring to the same person. Therefore, the updates present a logical conflict. These conflicts need to be detected and addressed.

Synchronization loops can occur in poorly designed architectures when one node synchronizes a change with one or more nodes. The same update is ultimately received back by that node as new data from another node, which is unaware that the receiving node already has the update. This problem is avoided by carefully defining the synchronization topology; a hub and spoke arrangement is often used to prevent loops.

Each table to be synchronized must have a unique primary key for each row. New records must generate primary key values that are unique across all nodes of the master database. If the schema for all nodes is exactly the same, the commonly used Identity Seed integer primary keys can cause problems. Each node would generate the same ID value upon its next insert. GUIDs are an obvious choice for guaranteed unique keys, but they're difficult to work with and their 16-byte size can bloat the database. The recommended option is to create a composite primary key for each table, consisting of a constant Node ID (per node), plus the Identity Seed integer column. This method is used in the tables of the database included in the accompanying code download, which can be accessed at VisualStudioMagazine.com/Kunk0612. (For more information, see the MSDN Library page, "Selecting an Appropriate Primary Key for a Distributed Environment.")

Tracking Tables Method
In order for any SQL database to synchronize with an offline application, both the remote and master databases need the ability to track changes made to the database between synchronization actions. The Microsoft Sync Framework providers currently support the Tracking Tables method.

SQL Server 2008 offers two options for collecting change tracking information within the database engine, using built-in features that offer reliability and performance advantages over tracking tables. I'll briefly introduce these options, but they're not fully compatible with the Microsoft Sync Framework database providers at this time. They can be used by writing a custom synchronization provider (see the MSDN Library page, "Synchronizing Data Stores by Using Custom Providers").

In order for SQL Server to track the changes made to the database, the database is modified to add a table with the _tracking suffix for each table to be synchronized, plus the three tables (schema_info, scope_config and scope_info) in a process called provisioning the database.

Figure 1 shows the tables in the Neighborhood_TrackingTables database after it has been provisioned. The database originally had four tables named Neighbor, Question, Response and Street. The provisioning process also adds three triggers to each original table as shown under the Street table in Figure 1.


[Click on image for larger view.]
Figure 1. Database, with tracking tables, for a database that originally had four tables.

The Tracking Tables method has the obvious disadvantage of requiring schema modifications to the database, which might conflict with existing schema triggers or not even be possible, in some cases. Despite these potential disadvantages, it's the only tracking method supported out of the box by the Microsoft Sync Framework.

SQL Server 2008 Change Data Capture has the advantage of automatically maintaining parallel tracking tables with change metadata, and filling these tables via asynchronous threads that parse the database logs to populate the tracking tables. This approach provides very low impact to core database performance. It also provides a full history of all data changes made - and not just the net result of those changes - making it useful for populating data warehouses and data marts. This tracking feature is available only in the Enterprise edition of SQL Server 2008.

Change Tracking
All versions of SQL Server 2008, including SQL Express, have the Change Tracking feature, which can be used in the Microsoft Sync Framework with a custom provider. Change Tracking differs from Change Data Capture in that it doesn't maintain a history of discrete changes. Using Change Tracking to track changes offers advantages over tracking tables:

  • It isn't necessary to create triggers, timestamp columns, or any other additional columns or tables.
  • Changes are tracked at commit time rather than as the rows are modified, for greater reliability and performance (compared to triggers).
  • Change Tracking data can be given an automatic expiration time span, preventing bloating of the database.

SQL Server 2008 Change Tracking isn't enabled by default. Queries for changes should always use snapshot isolation transactions in order to have consistent results. The T-SQL to enable snapshot isolation transactions and Change Tracking with a two-day expiration of tracking data is shown here (note that the expiration period should be a multiple of the expected synchronization frequency to ensure changes are not lost before the next synchronization):

ALTER DATABASE {database name} SET ALLOW_SNAPSHOT_ISOLATION ON 
ALTER DATABASE {database name} SET CHANGE_TRACKING = OFF
ALTER DATABASE {database name} SET CHANGE_TRACKING = ON (CHANGE_RETENTION = 5 DAYS, AUTO_CLEANUP = ON) 

Change Tracking must be enabled for each table to be tracked. Only tables that are actively synchronized should have Change Tracking enabled. Databases have numerous tables, so I recommend using the following query to generate an ALTER statement string for each table (then copy the ALTER statements for the desired tables to a new query window to be executed):

SELECT 'ALTER TABLE ' + name + ' ENABLE CHANGE_TRACKING'
FROM sys.tables WHERE [type] = 'U' AND name <> 'sysdiagrams'

For more information on Change Tracking in SQL Server 2008, see the MSDN Library pages, "Enable and Disable Change Tracking" and "Obtaining Changes by Using the Change Tracking Functions".

Synchronization Application
In order to make adding Microsoft Sync Framework synchronization to an application as easy as possible, I created a custom class library to perform the synchronizations. In the common APNC scenario, you'd write your application to update the SQL Server Standard/Enterprise database when running local to that server, or the SQL Server 2008 Express database when in a remote office, or the SQL Server Compact Edition database when offline.

In each case, the application functions as a standard database maintenance application with little concern for the synchronization process. The custom library DoSync function is invoked when the databases need to be synchronized as a separate application function. The DoSync method resides in the SynchronizationHelper class and uses a custom DatabaseSyncInfo class to hold the needed configuration settings.

The database synchronization providers are found in the Microsoft.Synchronization.Data namespace. SQL Server 2008 Standard and SQL Server 2008 Express use the same SqlServer.SqlSyncProvider class. The database synchronization provider for SQL Server Compact Edition version 3.5 uses the SqlServerCe.SqlCeSyncProvider class. Because the SQL Server 2008 Standard and Express versions both use the same database synchronization provider, the library supports the bidirectional synchronization scenarios shown in Table 1 .

Local Database Remote Database
SQL Server 2008 Standard or above SQL Server 2008 Standard or above
SQL Server 2008 Standard or above SQL Server 2008 Express
SQL Server 2008 Standard or above SQL Server Compact Edition 3.5
SQL Server 2008 Express SQL Server 2008 Standard or above
SQL Server 2008 Express SQL Server 2008 Express
SQL Server 2008 Express SQL Server Compact Edition 3.5

Table 1 Bidirectional synchronization scenarios available with the custom library.

The application assumes the local database has all the tables and data that need to be synchronized. For the first synchronization, it's best if the remote database is an exact copy of the local database, but it doesn't need to be - it can be empty or a copy of the local database with modified data.

At each synchronization, DoSync will create tracking tables in the local database and the base and tracking tables in the remote database as needed, and then synchronize the data between both databases.

In Figure 2, the SyncSQL2008_Standard_Express_Compact35 project is a C# project. That's because the project is an enhanced composite of several sample projects written in C#, which are available at the Microsoft Sync Framework Development Center. Keeping the code in C# makes it easier to update the library to include new functionality from other samples.


[Click on image for larger view.]
Figure 2. The custom synchronization library is referenced by a Visual Basic project in Solution Explorer.

The Visual Basic project DBSync_SQLServer demonstrates instantiating the DataSyncInfo class with the synchronization settings and calling DoSync. Sample code for calling DoSync from Visual Basic is shown in Listing 1.

The DatabaseSyncInfo object needs to have the local and remote database connection strings and driver types, the synchronization ScopeName of "Neighborhood" and a comma-delimited list of the tables to be included in the synchronization scope - in this case, SyncTables. The ScopeName represents the set of objects to synchronize in the database. The library can synchronize different portions of a database with different ScopeNames depending on the needs of the application.

By default, when testing your app in Visual Studio under the Debug configuration, your SQL Server Compact Edition database file is in the bin\debug folder, not the root folder of your application. Keep this in mind when connecting to your SQL Server Compact Edition data- base to review results during testing.

To make referencing the Microsoft Sync Framework assemblies easier, they've all been copied into an Assemblies folder in the C# project.

Database Synchronization Project
My interest in database synchronization began with the APNC charity project. The application tracks "interactions," which is any contact with a neighborhood member. Information on the quantity and quality of interactions is used to solicit funding for various APNC programs. If you're interested in using your .NET development skills to assist APNC in its community services, the project is located on github at github.com/JoeKunk/Interactions.

Database synchronization is difficult, but the Microsoft Sync Framework goes a long way to hide much of the complexity. To implement SQL Server database synchronization between SQL Server editions (including SQL Server Compact Edition), simply reference the SyncSQL2008_Standard_Express_Compact35 project in your Visual Basic or C# application and instantiate SynchronizationHelper with a properly initialized instance of DatabaseSyncInfo for settings information.

My hope is that the Sync Framework team releases an update to the database providers that fully supports SQL Server Change Tracking, in order to eliminate the need to modify the database schema to implement database synchronization. Good luck in your database synchronization adventures!

comments powered by Disqus

Featured

Subscribe on YouTube