Build Big-Data Apps in SQL Azure with Federation -- Visual Studio Magazine

Get ready to scale out SQL Azure databases beyond today's 50GB limit with the Transact-SQL and ADO.NET elastic sharding features, which are coming in the 2011 SQL Azure Federation Community Technology Previews.

By Roger Jennings
03/01/2011

An urban myth that relational databases and SQL can't reach Internet scale (terabytes or even petabytes of data) has fostered a growing "NoSQL" developer community and a raft of new entity-attribute-value data stores, also known as key-value stores. The Microsoft SQL Azure team gave credence to the myth by limiting the maximum size of the initial Business Edition of the cloud database to just 10GB. SQL Azure adds a pair of secondary data replicas to assure high availability, and the team cited performance issues with replication as the early size-limiting factor.

Today, you can rent a 50GB SQL Azure database for $499.95 per month, but the SQL Azure Team isn't talking publicly about future scale-up options. Instead, Microsoft recommends that you scale out your SQL Azure databases by partitioning them horizontally into smaller instances, called shards, running on individual SQL Azure database instances, and group the shards into federations.

Sharding increases database capacity and query performance, because each added SQL Azure database brings its own memory and virtual CPU. Microsoft Software Architect Lev Novik first described SQL Azure sharding details in his "Building Scale-Out Applications with SQL Azure" session at the Microsoft Professional Developers Conference 2010, held last October in Redmond.

Horizontal partitioning isn't new to SQL Server. Horizontally partitioning SQL Server 7.0 and later tables to multiple files and filegroups improves performance by reducing average table and index size. Placing each filegroup on an individual disk drive speeds T-SQL queries. Partitioning also streamlines backup and maintenance operations by reducing their time-window length.

SQL Server 2005 automated the process with the CREATE PARTITION FUNCTION command, which lets you automatically map the rows of a table or index into specified partitions based on the values of a specified column. You design a CREATE PARTITION SCHEME to determine how to assign partitioned files to filegroups. SQL Server partitioned views make rows of all partitions appear as a single table; distributed partitioned views enable partitioning data across multiple linked servers, not just filegroups, for scaling out. SQL Server 2000 introduced updateable distributed views with distributed transactions, and SQL Server 2000 SP3 used OLE DB to optimize query-execution plans for distributed partitioned views.

A group of linked servers that participates in distributed partitioned views is called a "federation." The partitioning column, whose values determine the partition to which the row belongs, must be part of the primary key and can't be an identity, timestamp or default column.

New Taxonomy
Scaling out SQL Azure with federated database instances follows a pattern similar to that for on-premises SQL Server, but is subject to several important limitations. For example, SQL Azure doesn't support linked servers, CREATE PARTITION FUNCTION, CREATE PARTITION SCHEME, cross-database joins, distributed (cross-database) transactions, OLE DB or the T-SQL NewSequentialID function. These restrictions require architectural changes for SQL Azure federations, starting with this new taxonomy:

Federation consists of a collection of all SQL Azure database instances that contain partitioned data having the same schema. The T-SQL script in "How to Create a Federation with Customers, Orders and OrderItems Tables" creates an Orders_Federation with a schema based on three tables of the Northwind sample database.
Federation Members comprise the collection of SQL Azure databases that contain related tables with partitioned data, called Federated Tables. A Federation Member also can contain replicated lookup tables (Products) that provide supplementary data that's not dependent on the Federation Key.
Federation Key is the column (CustomerID) whose value determines how data is partitioned among Federated Tables. Each Federated Table must include the Federation Key in its primary key, and the key's data type can be a big integer (bigint, a 64-bit signed integer) or a GUID (uniqueidentifier). For example, the Orders and OrderItems tables have composite primary keys (OrderID + CustomerID and OrderID + ProductID + CustomerID, respectively).
Atomic Unit (AU) is a cluster of a single parent table (Customers) row and all related rows of its dependent tables (Orders and OrderItems). AU clusters can't be separated in the partitioning (sharding) process or when moving data between Federation Members.
Federation Root is the initial database that contains metadata for specifying the Partitioning (sharding) Method, range of valid values for the Federation Key, and minimum/maximum Federation Key value ranges for each Federation Member.
Partitioning Method determines whether the Federation Key is generated by the application or the data tier. For this article's example, the data tier uniqueidentifier data type provides random 128-bit (16-byte) GUID values, which balance additions across multiple Federation Members. Sequential bigint values are easier to read, but require a feature similar to the SQL Server Denali Sequence object to generate identity values that are unique over multiple Federated Tables.

The SQL Azure Team plans to release SQL Azure Federation features in piecemeal fashion starting with a Community Technology Preview (CTP) of version 1 in 2011. The current plan is for the CTP 1 to support partitioning by uniqueidentifier FederationKey values only; a post-CTP 1 drop will add bigint FederationKeys (see Table 1).

Table 1. Sample Values of a sys.federation_member_columns View for Orders_Federation from SalesDB

The following abbreviated system view specifies partitioning 100,000 Customer records into 50 Federation Members numbered 51 through 100 with sequential bigint Federation Key values starting at 1. If the Federation Root name is SalesDB, the Federation Members will be named SalesDB_51 through SalesDB_100.

federation_id	member_id	federation_key_name	range_low	range_high
1	51	CustomerID	-9,223,372,036,854,775,808	2000
1	52	CustomerID	2000	4000
1	53	CustomerID	4000	6000
1	...	CustomerID	...	...
1	100	CustomerID	98000	NULL

For this example, range_low values are inclusive and range_high values are exclusive to the given Federation Member. NULL represents the maximum bigint value +1.

Bigint values range from -2^63 (-9,223,372,036,854,775,808) to 2^63-1 (9,223,372,036,854,775,807) so negative CustomerID values (if present) would partition into SalesDB_51 and values > 100,000 would fall into SalesDB_100. Starting member_id values at 51 enables future sharding with negative Federation Key values to Federation Members SalesDB_01 to SalesDB_50. Random uniqueidentifier (GUID) values require accommodating the entire domain of a 128-bit integer, 00000000-0000-0000-0000-000000000000 to FFFFFFFF-FFFF-FFFF-FFFF-FFFFFFFFFFFF. An advantage of GUID Federation Keys is that they eliminate hotspots in a shard holding recently entered records. Note that this information is preliminary and might change before the version 1 CTP releases.
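The range rules in Table 1 can be sketched in a few lines of C#. This is only an in-memory illustration of the lookup, not part of any SQL Azure library: the MemberRange class, the RangeRouter class and both of its methods are hypothetical names, and the layout (50 members, IDs 51 through 100, 2,000 keys each) is rebuilt from the preliminary sample values above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical model of one row of the range table; not a SQL Azure API.
public sealed class MemberRange
{
    public int MemberId;
    public long RangeLow;     // inclusive
    public long? RangeHigh;   // exclusive; null means no upper bound
}

public static class RangeRouter
{
    // Rebuilds the sample layout: 50 members, IDs 51-100, 2,000 keys each.
    public static List<MemberRange> BuildSampleRanges()
    {
        var ranges = new List<MemberRange>();
        for (int i = 0; i < 50; i++)
        {
            ranges.Add(new MemberRange
            {
                MemberId = 51 + i,
                // The first member also absorbs every negative key.
                RangeLow = i == 0 ? long.MinValue : i * 2000L,
                // The last member is open-ended (NULL in the view).
                RangeHigh = i == 49 ? (long?)null : (i + 1) * 2000L
            });
        }
        return ranges;
    }

    // Returns the member database name for a bigint Federation Key.
    public static string FindMember(List<MemberRange> ranges, long key)
    {
        foreach (MemberRange r in ranges)
        {
            bool belowHigh = !r.RangeHigh.HasValue || key < r.RangeHigh.Value;
            if (key >= r.RangeLow && belowHigh)
            {
                return "SalesDB_" + r.MemberId;
            }
        }
        throw new ArgumentOutOfRangeException("key");
    }

    public static void Main()
    {
        List<MemberRange> ranges = BuildSampleRanges();
        Console.WriteLine(FindMember(ranges, 1999));    // SalesDB_51
        Console.WriteLine(FindMember(ranges, 2000));    // SalesDB_52
        Console.WriteLine(FindMember(ranges, -5));      // SalesDB_51
        Console.WriteLine(FindMember(ranges, 250000));  // SalesDB_100
    }
}
```

Key 2000 lands in SalesDB_52 rather than SalesDB_51 because range_high is exclusive, and any key at or above 98000 falls into the open-ended last member.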

New T-SQL Keywords for Scripting SQL Azure Federation
The following new T-SQL syntax will support SQL Azure Federations in the version 1 CTP (see "How to Create a Federation with Customers, Orders and OrderItems Tables"):

CREATE FEDERATION Federation_Name(Federation_Key_Name RANGE uniqueidentifier)
DROP FEDERATION Federation_Name
CREATE TABLE Table_Name FEDERATE ON (Federation_Key_Name) 
USE FEDERATION Federation_Name (Fed_Key_Value) WITH FILTERING=ON
USE FEDERATION Federation_Name (Fed_Key_Value) WITH FILTERING=OFF
ALTER FEDERATION Federation_Name SPLIT AT(Fed_Key_Value)
ALTER FEDERATION Federation_Name DROP AT(Fed_Key_Value)

How to Create a Federation with Customers, Orders and OrderItems Tables

Here's a preview of a T-SQL script to create a traditional (Northwind-style) sample sales orders schema with federated Customers, Orders and OrderItems tables and a Products reference table (not federated) for sharding. The script has been tested (with Federation-specific and SQL Azure-specific keywords commented) in SQL Server 2008 R2 Management Studio:

-- Connect to 'master'. 
CREATE DATABASE SalesDB (EDITION='BUSINESS', 
  MAXSIZE=50GB)

-- Connect to 'SalesDB' and create Federation.
CREATE FEDERATION Orders_Federation (RANGE 
  uniqueidentifier)
GO

/* Connect to the 'Orders_Federation' federation member 
    covering the all-zeros CustomerID value. */
USE FEDERATION Orders_Federation(00000000-0000-0000-0000-000000000000)
  WITH RESET
GO

-- Create Customers
CREATE TABLE Customers(
  CustomerID uniqueidentifier NOT NULL,
  CompanyName nvarchar(50) NOT NULL
  -- ... Additional non-key fields
)
FEDERATE ON (CustomerID)
GO
   
CREATE CLUSTERED INDEX IX_Customers 
  ON Customers(CompanyName)
ALTER TABLE Customers ADD CONSTRAINT PK_Customers 
  PRIMARY KEY NONCLUSTERED (CustomerID)
GO
-- Create a federated Orders table
CREATE TABLE Orders(
  CustomerID uniqueidentifier NOT NULL,
  OrderID uniqueidentifier NOT NULL,
  OrderDate datetime NOT NULL
  -- ... Additional non-key fields
)
FEDERATE ON (CustomerID)
GO

/* Note that CustomerID, the federation key, must be
   part of every unique index; non-unique indexes 
   don't need to include it. SQL Azure requires a 
   clustered index on each table, but clustered indexes
   on uniqueidentifier columns cause a performance hit
   from page splitting, so this script clusters on 
   other columns.
*/
CREATE CLUSTERED INDEX IX_OrderDate 
  ON Orders(OrderDate)
ALTER TABLE Orders ADD CONSTRAINT PK_Orders 
  PRIMARY KEY NONCLUSTERED (OrderID, CustomerID)
GO

/* Create a reference (replicated, not federated) 
  Products table. */
CREATE TABLE Products(
  ProductID uniqueidentifier NOT NULL,
  SupplierID uniqueidentifier NOT NULL,
  ProductName nvarchar(50) NOT NULL,
  Package nvarchar(50) NOT NULL
  -- ... Additional non-key fields
)
GO

CREATE CLUSTERED INDEX IX_Products_ProductName_Package
  ON Products(ProductName, Package)
ALTER TABLE Products ADD CONSTRAINT PK_Products 
  PRIMARY KEY NONCLUSTERED (ProductID)
GO

-- Create a federated OrderItems table.
CREATE TABLE OrderItems(
  CustomerID uniqueidentifier NOT NULL,
  OrderID uniqueidentifier NOT NULL,
  ProductID uniqueidentifier NOT NULL,
  OrderItemDate datetime NOT NULL,
  -- ... Additional non-key fields
  CONSTRAINT FK_OrderItems_Orders_Customers 
    FOREIGN KEY (OrderID, CustomerID)
    REFERENCES Orders(OrderID, CustomerID),
  CONSTRAINT FK_OrderItems_Products FOREIGN KEY 
    (ProductID) REFERENCES Products(ProductID))
FEDERATE ON (CustomerID)
GO

CREATE CLUSTERED INDEX IX_OrderItemDate ON 
  OrderItems(OrderItemDate)
ALTER TABLE OrderItems ADD CONSTRAINT PK_OrderItems 
  PRIMARY KEY NONCLUSTERED 
  (OrderID, CustomerID, ProductID)
GO

The preceding code was adapted from code found in a posting by Cihan Biyikoglu, Microsoft program manager for SQL Azure. The SQL Azure team had not released the SQL Azure Federation version 1 Community Technology Preview (CTP) at publication. Post-version 1 CTPs are expected to support Federation Keys of the bigint data type. See Biyikoglu's Dec. 11, 2010, posting, "How to scale out an app with SQL Azure Federations ... The Quintessential Sales DB with Customer and Orders", for more information.

-- R.J.

The FILTERING=ON option restricts the visibility of AUs to the single cluster identified by Fed_Key_Value; FILTERING=OFF makes all AUs in the Federation Member visible for bulk operations. The ALTER FEDERATION ... SPLIT AT(Fed_Key_Value) instruction lets you balance the size of Federation Members that have grown substantially larger than the average by moving AUs greater than Fed_Key_Value into a new Federation Member. SPLIT AT and DROP AT operations work with AUs exclusively.
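To picture what a SPLIT AT operation does to the range metadata, here's a minimal C# sketch. It's a hypothetical in-memory model, not SQL Azure code: the SplitSketch class and its SplitAt method are invented for illustration, and the sketch assumes the split value becomes the inclusive lower bound of the new member, matching the range_low convention in Table 1. The actual boundary behavior might differ when the CTP ships.

```csharp
using System;
using System.Collections.Generic;

public static class SplitSketch
{
    // Each member is modeled as { low, high }, meaning the range [low, high).
    // Hypothetical in-memory model; not a SQL Azure API.
    public static List<long[]> SplitAt(List<long[]> members, long splitValue)
    {
        var result = new List<long[]>();
        foreach (long[] m in members)
        {
            if (splitValue > m[0] && splitValue < m[1])
            {
                // The member that contains the split point becomes two members.
                result.Add(new long[] { m[0], splitValue });
                result.Add(new long[] { splitValue, m[1] });
            }
            else
            {
                // All other members keep their ranges.
                result.Add(m);
            }
        }
        return result;
    }

    public static void Main()
    {
        var members = new List<long[]>
        {
            new long[] { 0, 2000 },
            new long[] { 2000, 4000 }
        };

        foreach (long[] m in SplitAt(members, 3000))
        {
            Console.WriteLine("[" + m[0] + ", " + m[1] + ")");
        }
        // Prints:
        // [0, 2000)
        // [2000, 3000)
        // [3000, 4000)
    }
}
```

Because the split point is a Federation Key value and every row of an AU shares one key value, a split on a key boundary never divides an AU between two members.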

The version 1 CTP ADO.NET sharding library will include sample Microsoft .NET Framework-based code for connecting to the Federation Root, which automatically routes the connection to the appropriate Federation Member based on information from the sys.federation_member_columns view, and retrieves AUs as ADO.NET DataSets. Here's the generic code for instructions that open a connection to a Federation Member whose member_id is specified by dbname_postfix:

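// "Database" is the connection-string keyword SqlConnection accepts;
// SQL Azure servers are addressed as servername.database.windows.net.
SqlConnection cn = new SqlConnection(
  "Server=tcp:servername.database.windows.net;" +
  "Database=SalesDB_" + dbname_postfix +
  ";User ID=username;Password=password;" +
  "Trusted_Connection=False;Encrypt=True");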
cn.Open();
...

The code in Listing 1 automatically routes the connection to the appropriate Federation Member for a particular FederationKey value by connecting to the Federation Root database and executing a T-SQL USE FEDERATION FederationName(FederationKey) instruction (see "How to Get Data from the Appropriate Atomic Unit in a Federation Member").

How to Get Data from the Appropriate Atomic Unit in a Federation Member

SQL Azure Federations version 1 Community Technology Preview (CTP) will include a sample ADO.NET sharding library and code examples to manipulate federated data. The following procedure creates an ADO.NET DataSet from an Atomic Unit consisting of the Customers record and all Orders and OrderItems records for the customer specified by the customerId parameter. The only code that differs from that for populating DataSets from conventional SQL Azure tables is the using block that executes the USE FEDERATION command:

private void GetData(SqlGuid customerId)
{
  SqlGuid fedKeyValue = customerId;

  // Create a Connection to the Federation Root
  SqlConnection connection = 
    new SqlConnection("Server=tcp:sqlAzureURI;" +
    "Database=SalesDB;User ID=mylogin@myserver;" +
    "Password=mypassword");
  connection.Open();

  // Create a DataSet.
  DataSet data = new DataSet();
  // Route to Specific Customer
  using (SqlCommand command = connection.CreateCommand())
  {
    command.CommandText = 
      "USE FEDERATION Orders_Federation(" +
      fedKeyValue.ToString() + ") WITH RESET";
    command.ExecuteNonQuery();
  }
 
  // Populate data from Orders table to the DataSet.
  SqlDataAdapter masterDataAdapter = 
    new SqlDataAdapter(@"SELECT * FROM Orders WHERE
    (customerId=@customerId1)", connection);
  masterDataAdapter.SelectCommand.Parameters.Add(
    "@customerId1", SqlDbType.UniqueIdentifier);
  masterDataAdapter.SelectCommand.Parameters[0].Value =
    customerId;
  masterDataAdapter.Fill(data, "Orders");
 
  // Add data from the OrderItems table to the DataSet.
  SqlDataAdapter detailsDataAdapter = 
    new SqlDataAdapter(@"SELECT * FROM OrderItems WHERE
    (customerId=@customerId1)", connection);
  detailsDataAdapter.SelectCommand.Parameters.Add(
    "@customerId1", SqlDbType.UniqueIdentifier);
  detailsDataAdapter.SelectCommand.Parameters[0].Value
    = customerId;
  detailsDataAdapter.Fill(data, "OrderItems");
   
  connection.Close();
  connection.Dispose();

  ...
}

The preceding code was adapted from code found in a posting by Cihan Biyikoglu, Microsoft program manager for SQL Azure. The SQL Azure team had not released the SQL Azure Federation version 1 Community Technology Preview (CTP) at publication. Post-version 1 CTPs are expected to support Federation Keys of the bigint data type. See Biyikoglu's Dec. 11, 2010 posting, "How to scale out an app with SQL Azure Federations ... The Quintessential Sales DB with Customer and Orders", for more information.

-- R.J.

At this point, you can issue parameterized select queries with CommandText such as SELECT * FROM Orders WHERE (CustomerId=@customerId). Note that SQL Azure supports USE FEDERATION but not USE DatabaseName instructions.

.NET projects based on CTP version 1 that require rows from more than one shard to populate a DataSet from a SELECT query will require client-side code that executes UNION ALL queries to aggregate the AUs. Obtaining aggregate values such as SUM(), AVG(), MIN(), MAX() and COUNT() with CTP version 1 also will require multiple queries that increment accumulator variables with values from individual Federation Members.
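The accumulator approach can be sketched as follows. This is an illustration only, under the assumption that the client has already run one aggregate query per Federation Member and collected each result; the PartialAggregate struct and the FanOut class are hypothetical names, not part of the CTP sharding library.

```csharp
using System;
using System.Collections.Generic;

// Partial result returned by one Federation Member (hypothetical type).
public struct PartialAggregate
{
    public long Count;
    public decimal Sum;
    public decimal Min;
    public decimal Max;
}

public static class FanOut
{
    // Folds per-member results into federation-wide accumulators.
    public static PartialAggregate Combine(IEnumerable<PartialAggregate> parts)
    {
        var total = new PartialAggregate
        {
            Min = decimal.MaxValue,
            Max = decimal.MinValue
        };
        foreach (PartialAggregate p in parts)
        {
            if (p.Count == 0) continue;  // an empty member contributes nothing
            total.Count += p.Count;
            total.Sum += p.Sum;
            total.Min = Math.Min(total.Min, p.Min);
            total.Max = Math.Max(total.Max, p.Max);
        }
        return total;
    }

    // AVG must come from the combined SUM and COUNT, not from member averages.
    public static decimal Average(PartialAggregate total)
    {
        if (total.Count == 0)
        {
            throw new InvalidOperationException("No rows to average.");
        }
        return total.Sum / total.Count;
    }

    public static void Main()
    {
        var parts = new List<PartialAggregate>
        {
            new PartialAggregate { Count = 2, Sum = 30m, Min = 10m, Max = 20m },
            new PartialAggregate { Count = 3, Sum = 150m, Min = 40m, Max = 60m }
        };

        PartialAggregate total = Combine(parts);
        Console.WriteLine(total.Count);      // 5
        Console.WriteLine(total.Sum);        // 180
        Console.WriteLine(Average(total));   // 36
    }
}
```

Note that averaging the two member averages (15 and 50) would give 32.5 rather than the correct 36, which is why each member should return SUM() and COUNT() and leave the division to the client.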

Automagical Online Partitioning and Other New Features
A primary SQL Azure selling point is minimizing -- and ultimately eliminating -- all common physical database management tasks by subscribers. The SQL Azure team wants to manage scaling federations out and in, as well as leveling the sizes of Federation Member databases.

According to Lev Novik and other SQL Azure team members, CTP versions later than version 1 will add at least the following features to SQL Azure Federations:

A new ALTER FEDERATION Federation_Name MERGE (Fed_Key_Value) instruction to enable repartitioning when scale-down is appropriate for reducing database subscription expense.
Auto repartitioning to enable SQL Azure to manage repartitioning the federated databases online with SPLIT and MERGE operations based on a policy you specify, such as Federated Table size or query execution time.
Fan-out queries so that a single query will be able to process results gathered across many Federation Members.
Schema management to allow multiversion schema deployment and management across Federation Members.
Multicolumn Federation Keys to enable federation on, for example, CustomerID+AccountID.

Java developers have had the Hibernate.Shards library for several years; it's now matured to version 3.0.0 beta 2 on the JBoss.org Web site. Developer Oren Eini, more commonly known by the alias "Ayende Rahien," rejuvenated the NHibernate Shard project for .NET last fall. SharpFellow blogger John Rayner demonstrated NHibernate Shards with SQL Azure and provided source code in his "Sharding into the Cloud" post. Rayner wrote in his post: "NHibernate.Shards is an extension to the well-known O/RM, which allows a logical database to be partitioned across multiple physical databases and servers. It's a port of the Hibernate.Shards project, as with lots of things in NHibernate. I thought it would be interesting to see how well it worked against SQL Azure. It turned out to be not interesting at all ... just plain easy!"

Microsoft has most of its object/relational mapping (O/RM) eggs in the Entity Framework basket. You can expect the teams that own data connectivity to SQL Server and SQL Azure -- including the ADO.NET Team that's responsible for Entity Framework -- to expand their repertoire to handle SQL Azure Federation. The Microsoft Azure Marketplace DataMarket is a classic big-data application, so it's a good bet that Pablo Castro (a Microsoft software architect in the SQL Server Group) and his WCF Data Services team are hard at work updating the RESTful OData API. Synchronizing SQL Azure Federations across datacenter boundaries (geolocations) and with on-premises SQL Server databases will keep Liam Cavanagh (Microsoft senior program manager for SQL Azure Data Sync and Microsoft Sync Framework) and his team busy for at least a few months. Watch these groups' blogs closely for signs of progress on Federation-enabled connectivity for SQL Azure.

Don't believe everything you hear from the "NoSQL" crowd about the demise of SQL databases for big data. Microsoft intends to protect its database turf in the cloud from naysayers who claim it won't scale past 50GB. The company's approach is to automate the sharding process and its management for scaling out and back in to minimize human intervention. Get ready to take advantage of SQL Azure Federations by downloading the CTP 1 as soon as it's available, and then plan your route to high-scalability relational data nirvana.

Credits: Thanks to Cihan Biyikoglu, program manager for SQL Azure, for providing important technical insights about SQL Azure Federations and the forthcoming CTPs; visit his "SQL Azure -- Your Data in the Cloud" blog for up-to-date information on SQL Azure Federation developments. You'll find additional references to his posts, as well as other sharding approaches for SQL Azure, Hibernate and NHibernate, in the OakLeaf Systems Resource Links for SQL Azure Federations and Sharding Topics post.
