SQL Azure Federation: Is it "NoSQL" Azure too?
With all the noise (not to mention the funk) around HTML5 and Silverlight at PDC 2010, we could be forgiven for missing the numerous keynote announcements about Windows Azure and SQL Azure. And with all
those announcements, we could be forgiven for missing the ostensibly more arcane stuff in the breakout sessions. But one of those sessions covered material that was very important.
That session was Lev Novik's "Building Scale-out Database Applications with SQL Azure," and given the generic title, we could be further forgiven for not knowing how important the session was. But it was important. Because it covered the new Federation (aka "sharding") features coming to SQL Azure in 2011. Oh, and by the way, you'd also be forgiven for not knowing what sharding was. Perhaps a little recent history would help.
When SQL Azure was first announced, its databases were limited in size to 10GB, in the pro version of the service. That's big enough for lots of smaller Web apps, but not for bigger ones, and definitely not for Data Warehouses. Microsoft's answer at the time to criticism of this limitation was that developers were free to "shard" their databases. Translation: you could create a bunch of Azure databases, treat each one as a partition of sorts, and take it upon yourself to take the query you needed to do, and divide it up into several sub queries -- each one executing against the correct "shard" -- and then merge all the results back again.
To be honest, while that solution would work and has architectural merit, telling developers to build all that plumbing themselves was pretty glib, and not terribly actionable. Later, Microsoft upped the database size limitation to 50GB, which mitigated the criticism, but it didn't really fix the problem, so we've been in a bit of a holding pattern. Sharding is sensible, and even attractive. But the notion that a developer would have to build out all the sharding logic herself, in any environment that claims to be anything "as a service," was far-fetched at best.
That's why Novik's session was so important. In it, he outlined the explicit sharding support, to be known as Federation, coming in 2011 in SQL Azure, complete with new T-SQL keywords and commands like CREATE/USE/ALTER FEDERATION and CREATE TABLE...FEDERATE ON. Watch the session and you'll see how elegant and simple this is. Effectively, everything orbits around a Federation key, which then corresponds to a specific key in each table, and in turn determines what part of those tables' data goes in which Federation member ("shard"). Once that's set up, queries and other workloads are automatically divided and routed for you and the results are returned to you as a single rowset. That's how it should have been along. Never mind that now.
Sharding does more than make the 50GB physical limit become a mere detail, and the logical size limit effectively go away. It also makes Azure databases more elastic, since shards can become more or less granular as query activity demands. If a shard is getting too big or queried too frequently, it can be split into two new shards. Likewise, previously separate shards can be consolidated, if the demand on them decreases. With SQL Azure Federation, we really will have Database as a Service... instead of Database as a Self-Service.
But it goes beyond that. Because with Federation, SQL Azure gains a highly popular feature of so-called document-oriented NoSQL databases like MongoDB, which features sharding as a foundational feature. Plus, SQL Azure's soon-to-come support for decomposing a database-wide query into a series of federation member-specific ones, and then merging the results back together, starts to look a bit like the MapReduce processing in various NoSQL products and Google Hadoop. When you add to the mix SQL Azure's OData features and its support for a robust RESTful interface for database query and manipulation, suddenly staid old relational SQL Azure is offering many of the features people know and love about their NoSQL data stores.
While NoSQL databases proclaim their accommodation of "Internet scale," SQL Azure seems to be covering that as well, while not forgetting the importance of enterprise scale, and enterprise compatibility. Federating (ahem!) the two notions of scale in one product seems representative of the Azure approach overall: cloud computing, with enterprise values.
Posted by Andrew J. Brust on 11/23/2010