Redmond Diary

By Andrew J. Brust

Blog archive

SQL Azure Federation: Is it "NoSQL" Azure too?

With all the noise (not to mention the funk) around HTML5 and Silverlight at PDC 2010, we could be forgiven for missing the numerous keynote announcements about Windows Azure and SQL Azure. And with all those announcements, we could be forgiven for missing the ostensibly more arcane stuff in the breakout sessions. But one of those sessions covered material that was very important.

That session was Lev Novik's "Building Scale-out Database Applications with SQL Azure," and given the generic title, we could be further forgiven for not knowing how important the session was. But it was important. Because it covered the new Federation (aka "sharding") features coming to SQL Azure in 2011. Oh, and by the way, you'd also be forgiven for not knowing what sharding was. Perhaps a little recent history would help.

When SQL Azure was first announced, its databases were limited in size to 10GB, in the pro version of the service. That's big enough for lots of smaller Web apps, but not for bigger ones, and definitely not for Data Warehouses. Microsoft's answer at the time to criticism of this limitation was that developers were free to "shard" their databases. Translation: you could create a bunch of Azure databases, treat each one as a partition of sorts, and take it upon yourself to take the query you needed to do, and divide it up into several sub queries -- each one executing against the correct "shard" -- and then merge all the results back again.

To be honest, while that solution would work and has architectural merit, telling developers to build all that plumbing themselves was pretty glib, and not terribly actionable. Later, Microsoft upped the database size limitation to 50GB, which mitigated the criticism, but it didn't really fix the problem, so we've been in a bit of a holding pattern. Sharding is sensible, and even attractive. But the notion that a developer would have to build out all the sharding logic herself, in any environment that claims to be anything "as a service," was far-fetched at best.

That's why Novik's session was so important. In it, he outlined the explicit sharding support, to be known as Federation, coming in 2011 in SQL Azure, complete with new T-SQL keywords and commands like CREATE/USE/ALTER FEDERATION and CREATE TABLE...FEDERATE ON. Watch the session and you'll see how elegant and simple this is. Effectively, everything orbits around a Federation key, which then corresponds to a specific key in each table, and in turn determines what part of those tables' data goes in which Federation member ("shard"). Once that's set up, queries and other workloads are automatically divided and routed for you and the results are returned to you as a single rowset. That's how it should have been along. Never mind that now.

Sharding does more than make the 50GB physical limit become a mere detail, and the logical size limit effectively go away. It also makes Azure databases more elastic, since shards can become more or less granular as query activity demands. If a shard is getting too big or queried too frequently, it can be split into two new shards. Likewise, previously separate shards can be consolidated, if the demand on them decreases. With SQL Azure Federation, we really will have Database as a Service... instead of Database as a Self-Service.

But it goes beyond that. Because with Federation, SQL Azure gains a highly popular feature of so-called document-oriented NoSQL databases like MongoDB, which features sharding as a foundational feature. Plus, SQL Azure's soon-to-come support for decomposing a database-wide query into a series of federation member-specific ones, and then merging the results back together, starts to look a bit like the MapReduce processing in various NoSQL products and Google Hadoop. When you add to the mix SQL Azure's OData features and its support for a robust RESTful interface for database query and manipulation, suddenly staid old relational SQL Azure is offering many of the features people know and love about their NoSQL data stores.

While NoSQL databases proclaim their accommodation of "Internet scale," SQL Azure seems to be covering that as well, while not forgetting the importance of enterprise scale, and enterprise compatibility. Federating (ahem!) the two notions of scale in one product seems representative of the Azure approach overall: cloud computing, with enterprise values.

Posted by Andrew J. Brust on 11/23/2010


comments powered by Disqus

Featured

  • AI for GitHub Collaboration? Maybe Not So Much

    No doubt GitHub Copilot has been a boon for developers, but AI might not be the best tool for collaboration, according to developers weighing in on a recent social media post from the GitHub team.

  • Visual Studio 2022 Getting VS Code 'Command Palette' Equivalent

    As any Visual Studio Code user knows, the editor's command palette is a powerful tool for getting things done quickly, without having to navigate through menus and dialogs. Now, we learn how an equivalent is coming for Microsoft's flagship Visual Studio IDE, invoked by the same familiar Ctrl+Shift+P keyboard shortcut.

  • .NET 9 Preview 3: 'I've Been Waiting 9 Years for This API!'

    Microsoft's third preview of .NET 9 sees a lot of minor tweaks and fixes with no earth-shaking new functionality, but little things can be important to individual developers.

  • Data Anomaly Detection Using a Neural Autoencoder with C#

    Dr. James McCaffrey of Microsoft Research tackles the process of examining a set of source data to find data items that are different in some way from the majority of the source items.

  • What's New for Python, Java in Visual Studio Code

    Microsoft announced March 2024 updates to its Python and Java extensions for Visual Studio Code, the open source-based, cross-platform code editor that has repeatedly been named the No. 1 tool in major development surveys.

Subscribe on YouTube