Under the Hood With SQL Server 2012
SQL Server 2012 will offer a variety of new features centered on big data, business intelligence and cloud computing.
SQL Server 2012 is a large, complex product that represents Microsoft's vision of where data is headed.
If that vision were to be summed up in one word, that word would be: big.
To meet the challenges of big data in an ever-more connected world, on Tuesday Microsoft announced the release to manufacturing for SQL Server 2012 as well as an "evaluation version" for download, with general availability coming April 1.
With this release, Microsoft is touting SQL Server 2012's ability to handle "mission critical" operations in terms of performance, scale, security, high availability and disaster recovery. Additionally, Microsoft claims that SQL Server 2012 users have access to the "cloud on your terms" with SQL Server 2012, either private clouds (via virtualization and management) or public clouds (via SQL Azure).
The management part of the private cloud deployments is linked to Microsoft's System Center 2012 suite, where it's possible to deploy private clouds using a wizard in Virtual Machine Manager. Microsoft also has tools to synchronize and move data back and forth between private and public clouds with its Data Sync technology. Data Sync enables bidirectional synchronization and is currently at the technology preview stage, said Doug Leland, general manager of Microsoft's business platform marketing group, in a phone interview on Monday. Finally, SQL Server 2012 is optimized to produce "breaking insights," in part because of its leveraging of "big data" and Microsoft's tools.
Big Data and Hadoop
Leland noted three general trends that accompany this release of SQL Server. First, there is the data explosion, particularly with the use of unstructured data. Next, he cited a reduction in cost and computing that is "reshaping data processing." He described it as the "save everything era," allowing organizations to save signals data and even data in the wild and attain insight from them. Finally, he said that there has been a wave of innovation that is creating new business models. He described SQL Server 2012 as the platform designed to "embrace structured and unstructured data" and "unlock insights."
Microsoft previously announced its "big data" embrace at the October PASS meeting, where the company committed to backing open-source Apache Hadoop as a core component of its data platform and enabling interoperability with Windows Azure and Windows Server. Now, Microsoft is announcing a second preview for Apache Hadoop that will add new features and expand its capacity. It will be a new technical preview available by invitation only, according to Leland.
An example of a customer using these big data technologies is Klout, which offers a service that ranks influence on social media sites such as Twitter, Facebook and others. From a registered population of 100 million, Klout collect billions of signals across social media sites. The traffic is captured in Hadoop and moved into SQL Server Analysis Services, and the query results are graphed via PowerPivot and Power View. SQL Server Analysis Services provides the multidimensional online analysis processing (MOAP), while the Excel's PowerPivot and Power View provide the front-end tools that empower data workers, Leland explained.
Saptak Sen, a Microsoft senior product manager, explained that Hadoop allows users to apply a structure to unstructured data at query time. He said that most people use Hadoop as a "cheap shoebox" and create a product out of a cache flow. For instance, he described a medical company that used Hadoop to flag certain claims instead of using personnel to process forms. Yahoo uses Hadoop to process clickstream data to determine what sort of ad categories to create. They create cubes of data with a single cube consisting of 24 terabytes, he said.
Microsoft has a family of in-memory technologies, called "Xvelocity," for SQL Server 2012 that Microsoft depicts as enabling fast query processing. Leland said that the technology has enabled "dramatic" speed improvements among customers testing SQL Server 2012. Xvelocity is for business intelligence (BI) and data warehousing uses.
One of those Xvelocity-branded technologies is ColumnStore Indexing, which "grabs your tables and makes them into a column," according to Dandy Weyn, a Microsoft senior technology product manager, at the workshop. The columns get cached in memory. He said that ColumnStore Indexing, which is a read-only feature, takes up less space than a row-based index and uses compression via Microsoft's VertiPaq technology to speed up performance. ColumnStore Indexing is specifically designed for data warehouse applications. Weyn said that it can be compared with Oracle's column-store compression technology, but "you'd need to have a full rack appliance to take advantage of the techniques needed for data warehousing."
Microsoft's ColumnStore Indexing is not supported on SQL Azure yet, according to Greg Leake, a technical product manager on the SQL Server marketing team. It's available to users of Enterprise edition of SQL Server 2012.
AlwaysOn isn't a specific feature in SQL Server 2012. It's Microsoft's branding for high availability and disaster recovery technologies, according to Aaron Bertrand, a Microsoft MVP. Those technologies include "failover clustering," "availability groups," and "active secondaries," among others.
SQL Server 2012 has "native support for failover clustering," according to Weyn. It supports multisubnet clustering, where nodes will use an "OR" subset that will allow a failover to another subnet node.
Availability groups provide protection at the database level. It's an alternative to database mirroring. Multiple databases can be grouped together into an availability group. Users specify an availability group "listener," which will move over to another server if the first server fails, Weyn explained. Users need to specify failover clustering for this technology to work, he added. Availability groups can specify active secondaries, which are used for "offloading read workloads to a secondary instance," according to Microsoft's AlwaysOn FAQ. It's possible to have multiple availability groups on a single server, and you can have multiple secondaries, too. AlwaysOn can be enabled using PowerShell scripts, and it's possible for users to define their failover policies, Weyn said.
Development and Management Tools
Microsoft is releasing SQL Server Data Tools with SQL Server 2012. Those tools are free, but if users already have Visual Studio, SQL Server Data Tools will "meld seamlessly within that installation in Visual Studio 2010," Leake said. The tools will support versioning and team-based development. Moreover, they work with any past editions of SQL Server, Leake added. The tools work both with on-premises installations of SQL Server and with the cloud-based SQL Azure. It's also possible to disconnect from the network and use LocalDB, an in-memory database, and then upload any changes later to the live database, he explained.
SQL Server Data Tools include a new SQL Server object explorer, a SQL language service, buffered declarative editing, a table designer (with "intellisense" code completion) and an isolated local database runtime, Leake explained. The tools follow the SQL Azure release cycle, so they will be updated every three to four months, he added. SQL Server Data Tools are designed for database administrators and developers. Microsoft also has a tool for systems operators called "SQL Server Management Studio." This sys-op toolset is only available to those with the developer edition; it's not available with the Express edition.
In addition, Microsoft recently released a new management pack for SQL Azure that adds management capabilities, Leake said. Microsoft has a Web-based management portal that works with Windows Azure, which will produce "rich cloud-based data reports." SQL Azure is based on the SQL Server 2012 engine, Leake explained.
At the workshop, Leake was asked about modeling in SQL Server Data Tools. He said that "our team is not moving into the database modeling area" and that it was "opened up for third parties" to address that capability. Another question was about the status of "Project Barcelona," a Microsoft effort to enable metadata management in SQL Server. Leake said that the team is still working on Project Barcelona, but that those capabilities won't ship with SQL Server 2012.
Business Intelligence Tools
The two main BI tools that work with SQL Server 2012 are the Excel 2010-based PowerPivot and Power View. The later tool is an "interactive visualization program based on ad hoc queries," according to Sean Boon, a senior program manager for SQL Server business intelligence. A Microsoft TechNet article describes Power View as a "Reporting Services Add-in for Microsoft SharePoint Server 2010 Enterprise Edition." It's browser-based, using Silverlight, and gets "launched from SharePoint Server 2010."
PowerPivot is an Excel 2010 addition for BI purposes that's designed for "Excel users who know how to structure, analyze, and calculate multidimensional data in workbooks and PivotTables," according to an MSDN description.
Both tools connect with SQL Server Reporting Services and integrate with SharePoint 2010 for data sharing. SharePoint isn't required to use PowerPivot, but Microsoft conceives of SharePoint as a way to control data within an organization. It's possible to open a PowerPivot workbook in a browser via Excel Web access or a SharePoint WebPart, according to Boon.
IT pros can use a dashboard to monitor PowerPivot on SharePoint to check for CPU usage. It can be tracked over time, so that heavily used items can be flagged to become a managed item, Boon explained.
IT pros can also create alerts on data for Power View. They can set up execution logs devoted to the alerts. Power View is designed to create interactive reports quickly. "It's really about asking questions and being able to present that data and having a high level of interactivity," Boon said. Power View has an "Office-like look and feel" in a browser, he added. The data visualizations also are fully interactive after being exported to the Microsoft PowerPoint presentation program. Users get a link in PowerPoint called "click to interact" that enables this interactive capability.
SQL Server 2012 includes a number of services to help clean up the data. A data quality services feature allows users to correct data. For instance, the abbreviation, "in.," can be standardized as "inch." SQL Server 2012 also has master data services, which is used to categorize objects, such as "products" and "customers." It's designed for data that doesn't change too often. Both of those services are maintained through SQL Server 2012's integration services. Microsoft also has a data quality services in SQL Server 2012 that is "a knowledge-driven data quality solution." For instance, it can be set to determine that if the city is Los Angeles, then the state should be California, Boon explained.
While SQL Server 2012 handles various data feeds, it's also possible to connect with Microsoft's Windows Azure Marketplace or third-party data reference providers. Microsoft's marketplace is a portal that sells data feeds, which can be used to supplement a data mashup.