Microsoft hasn't been standing still since it became the new commercial steward of the R programming language by acquiring R specialist Revolution Analytics a year ago. The company has been busy revamping its acquired R offerings, including the open source version.
That free download, along with a free developer edition, highlighted a reorganization and rebranding of Microsoft's R products, now called Microsoft R Server.
Along with upgrading and maintaining the open source version, Microsoft exec Joseph Sirosh detailed other open source-related initiatives the company has made surrounding R in a blog post yesterday. For example, he noted that Microsoft became a founding member of the R Consortium, formed "to provide support to the R community, the R Foundation and groups and individuals, using, maintaining and distributing R software." Such moves are furthering Microsoft's transition from a bastion of proprietary, insular software to a champion of the open source movement.
"Microsoft R Open enhances the performance of R with multi-threaded processor optimized computations provided by Intel Math Kernel Libraries (MKL) delivering large speedups especially in matrix-oriented computations," Sirosh said. "It also makes it easier to build reliable applications with R on Windows, Mac and Linux by simplifying the management of R package versions. Microsoft R Open is 100 percent compatible with all R scripts and packages, and just like R is open source and free to download, use and share."
The R language is popular in the Big Data world because of its statistics capabilities and facility for predictive analytics. The new Microsoft platform consists of the acquired company's offering, Revolution R Enterprise (RRE) for Hadoop, Linux and Teradata -- described as a Big Data-capable R distribution for servers, Hadoop clusters and data warehouses -- along with Microsoft quality enhancements, support and purchasing options.
In another blog post yesterday, David Smith of Revolution Analytics said that, in addition to its new name, "Microsoft R Server includes an updated R engine (R 3.2.2), new fuzzy matching algorithms, the ability to write to databases via ODBC and a streamlined install experience."
Sirosh expounded on the new features and provided further details about the company's R revamp. "Microsoft R Server is a broadly deployable enterprise-class analytics platform based on R that is supported, scalable and secure," he said. "Supporting a variety of Big Data statistics, predictive modeling and machine learning capabilities, R Server supports the full range of analytics -- exploration, analysis, visualization and modeling. By using and extending open source R, Microsoft R Server is fully compatible with R scripts, functions and CRAN packages, to analyze data at enterprise scale. It also addresses the in-memory limitations of open source R by adding parallel and chunked processing of data in Microsoft R Server, enabling users to run analytics on data much bigger than what fits in main memory."
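To make the "chunked processing" idea concrete, here is a minimal Python sketch of computing a statistic over data that is never held in memory all at once. It is only an illustration of the general technique, not Microsoft R Server's implementation; the function names `read_in_chunks` and `chunked_mean` are hypothetical.

```python
from typing import Iterable, Iterator, List


def read_in_chunks(values: Iterable[float], chunk_size: int) -> Iterator[List[float]]:
    """Yield successive lists of at most chunk_size values."""
    chunk: List[float] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == chunk_size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly shorter, chunk.
        yield chunk


def chunked_mean(values: Iterable[float], chunk_size: int = 1000) -> float:
    """Compute the mean while holding only one chunk in memory at a time."""
    total = 0.0
    count = 0
    for chunk in read_in_chunks(values, chunk_size):
        total += sum(chunk)
        count += len(chunk)
    if count == 0:
        raise ValueError("no data")
    return total / count


# A generator stands in for a data source too large to load at once.
print(chunked_mean((float(i) for i in range(1, 1_000_001)), chunk_size=10_000))
```

Because each chunk contributes only a running total and a count, the per-chunk work could in principle also be handed to separate workers and combined afterward, which is the general idea behind pairing chunking with parallelism.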
Sirosh said Microsoft wants to make Microsoft R Server the enterprise standard for cross-platform analytics in the cloud or on-premises.
"Delivering Microsoft R Server across multiple platforms allows our enterprise customers to standardize advanced analytics on one core tool, regardless of whether they are using Hadoop (Hortonworks, Cloudera and MapR), Linux (Red Hat and SUSE) or Teradata," Sirosh said. "For Windows, Microsoft R Server will be included in SQL Server 2016 as SQL Server R Services -- and the combined bundle is less expensive than RRE standalone. Until SQL Server 2016 is released, Revolution R Enterprise for Windows remains available as a standalone product." The company hasn't specified exactly when SQL Server 2016 will be released, but, barring a name change, it should be this year.
Microsoft, famously characterized by its former CEO as being all about "developers, developers, developers," is also offering a free Microsoft R Server Developer Edition, via Visual Studio Dev Essentials, including all the features found in the commercial version.
That developer edition will also be included in the new Microsoft Data Science Virtual Machine as a pre-installed and pre-configured component. That VM is described as "a Windows Server 2012-based custom virtual machine image on the Azure marketplace containing several popular tools that can be used by data scientists and developers for advanced analytics."
Microsoft R Server is also available free to students for academic use through the company's Microsoft DreamSpark program.
"Advanced and predictive analytics is about developing and testing new models," the company quoted IDC analyst Dan Vesset as saying. "But it's also about their incorporation by developers into production deployments of decision support and automation solutions that can benefit the whole organization. With its new offerings for the R ecosystem, Microsoft is playing an important role in bringing analyst modeling and productivity tools as well as deployment tools to a broader audience."
Posted by David Ramel on 01/13/2016 at 10:00 AM
Less than a month after the first public preview of SQL Server 2016, Microsoft has released an update that for the first time puts the flagship relational database into "rapid preview" cadence.
The counterpart cloud offering, Microsoft Azure, has already been following the model of releasing quicker Community Technology Previews (CTPs), and now the on-premises SQL Server 2016 is following suit.
"With the release of SQL Server 2016 CTP 2.1, for the first time customers can experience the rapid preview model for their on-premises SQL Server 2016 development and test environments," exec Tiffany Wissner said in a blog post yesterday. "This born in the cloud model means customers don't have to wait for traditional CTPs that are released after several months for the latest updates from Microsoft engineering, and can gain a faster time to production. The frequent updates are also engineered to be of the same quality as traditional major CTPs, so customers don't have to be concerned about build quality."
Microsoft also released previews and general availability releases of several other data-related products, including SQL Server Management Studio (SSMS), which for the first time gets its own preview release separate from the main SQL Engine release cadence. "Our goal is to update this frequently with new features, fixes and support for the newest SQL Server features in SQL Server Engine and Azure SQL Database," the SQL Server engineering team announced in another blog post yesterday. The standalone SSMS has also adopted the rapid preview model.
In the SQL Server 2016 CTP 2.1 (version 2.0 was the first public preview, despite what the versioning number suggests), the Stretch Database functionality introduced last month has been improved. It archives historical data in the Azure cloud, silently migrating it to an Azure SQL Database.
Other functionality was also improved, concerning: Query Store, which deals with the handling of historical query plans; Temporal, which lets users handle and analyze database records as they're changed over time; and Columnstore Index, which received performance boosts in seek functionality and scanning of partitioned tables. More details on these improvements are available in yet another blog post published yesterday.
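For readers unfamiliar with the temporal idea, the following Python sketch shows one way record history can be kept so that a value can be looked up "as of" an earlier point in time. It is a conceptual illustration only and does not reflect how SQL Server implements the feature; the `TemporalTable` class and its methods are hypothetical, and the sketch assumes updates arrive in increasing time order.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Version:
    value: Any
    valid_from: int
    valid_to: Optional[int] = None  # None means this version is still current


class TemporalTable:
    """Keeps every version of each record along with its validity period."""

    def __init__(self) -> None:
        self._history: Dict[str, List[Version]] = {}

    def upsert(self, key: str, value: Any, at: int) -> None:
        # Assumes 'at' is later than any earlier update for this key.
        versions = self._history.setdefault(key, [])
        if versions:
            versions[-1].valid_to = at  # close the previously current version
        versions.append(Version(value, valid_from=at))

    def as_of(self, key: str, at: int) -> Any:
        """Return the value that was current at time 'at', or None."""
        for version in self._history.get(key, []):
            if version.valid_from <= at and (version.valid_to is None or at < version.valid_to):
                return version.value
        return None


prices = TemporalTable()
prices.upsert("widget", 10, at=1)
prices.upsert("widget", 12, at=5)
print(prices.as_of("widget", 3), prices.as_of("widget", 5))
```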
For the new standalone SSMS, key improvements include: a new lightweight Web installer; automatic update monitoring; fixes in response to top customer requests concerning row editing, Table Designer and database and table property dialogs; a new option to skip the prompt asking users if they want to save T-SQL files; updated import/export wizards; and bug fixes to improve support for Azure SQL Database.
Other preview and general availability releases (you guessed it, detailed in yet another blog post) include:
- The Limited Public Preview of Azure SQL Data Warehouse.
- General availability of Azure AD Connect and Connect Health.
- General availability of Azure Application Gateway.
- Microsoft Intune Conditional Access and Mobile Application Management for the Outlook app.
- General availability of the new Microsoft Power BI Content Pack and connector.
- General availability of Key Vault across all regions (except Australia).
Microsoft invited users to download the SQL Server 2016 preview or test it via an Azure virtual machine (VM) and provide feedback on their experiences.
Posted by David Ramel on 06/25/2015 at 8:33 AM
Look out Microsoft. New Big Data services with a taste of SQL highlighted a bevy of new offerings added to the Oracle Cloud Platform yesterday.
At a live event, exec Larry Ellison was on hand to introduce more than 25 new Software-as-a-Service (SaaS), Platform-as-a-Service (PaaS) and Infrastructure-as-a-Service (IaaS) cloud products.
Among these is the Oracle Big Data Cloud Service, providing an Apache Hadoop-based distribution from Cloudera Inc., dedicated high-performance hardware and networking, and security via Kerberos and Apache Sentry.
Combined with the company's new SQL-on-Hadoop Big Data SQL Cloud Service, it forms an enterprise-oriented offering that Oracle calls a comprehensive Big Data Management System. The system extends Oracle's SQL-based implementation to enable unified queries and security across Hadoop and its traditional counterpart database technology, NoSQL.
"Oracle Big Data Cloud Service and Big Data SQL Cloud Service provides a high-performance, secure platform for running diverse workloads on Hadoop and NoSQL databases to help enterprises acquire and organize Big Data," the company said in a news release.
Oracle said that by leveraging the well-known SQL query language, its solution provides familiar technology, tools and training to help organizations address "the widening Big Data skills gap."
"While experts can easily work with data in Hadoop and NoSQL databases, most of your organization is not familiar with these new environments," an Oracle solution brief states. "But most people do know SQL, and it's the main way that business applications already access data."
"Your analysts can use their existing SQL skills to access new data, and your existing applications require no changes to access data in Hadoop," the brief continued. "Big Data SQL also extends the security capabilities of Oracle Database to data in Hadoop and NoSQL, so you can use your existing policies and processes to keep your data secure."
In addition to supplying familiar technology and the ability to use incumbent skills, Oracle said its new solution uses its SmartScan technology to reduce data movement, which it characterized as one of the major impediments to speedy Big Data analytics, even in the cloud.
On the more traditional database front, Oracle offered another tilt at Microsoft and its Azure Big Data services. The company announced the Oracle Database Cloud - Exadata Service, providing a cloud host that gives users the same functionality, performance and availability of its familiar on-premises Oracle Database packaged with the Exadata appliance. The company describes Exadata as "a modern architecture featuring scale-out industry-standard database servers, scale-out intelligent storage servers, state-of-the-art PCI flash storage servers and an extremely high-speed InfiniBand internal fabric that connects all servers and storage."
The cloud offering is 100 percent compatible with its on-premises counterpart, the company said, which paves the way for a smooth cloud migration or transition to a hybrid implementation providing the best of both worlds.
Another service introduced yesterday is the Oracle Archive Storage Cloud Service, which squarely targets a comparable Amazon Web Services Inc. (AWS) offering at one-tenth the price. You can read more about that at our sister publication, AWSInsider.net. You can also learn more about all the new services in the John K. Waters article, "Oracle 'Completes' its Cloud Platform," on sister site ADT Mag.
Posted by David Ramel on 06/23/2015 at 1:59 PM
Amazon Web Services Inc. (AWS) unveiled a new Microsoft SQL Server Enterprise Edition offering for the Amazon Elastic Compute Cloud (EC2).
A blog post authored by exec Jeff Barr yesterday said the new, pre-configured Amazon Machine Image (AMI) improves upon the Standard Edition by adding more computing power and memory. Standard allows for using up to 16 cores and 128 GiB of memory, while Enterprise can go up to 32 cores and 244 GiB of memory available in an extra-large instance.
The Enterprise Edition comes with SQL Server Enterprise Edition 2012 and SQL Server Enterprise Edition 2014, available in several regions, as explained in the AWS Marketplace.
Barr highlighted the following new and unique features of the offering:
- High availability lets users configure a primary database and up to four active, readable secondary databases into an Always-On availability group.
- Self-service business intelligence through Power View, used to interactively explore and visualize data.
- Data quality services let organizational and third-party reference data be used to profile, cleanse and match your own data.
- Online change functionality lets users restore files and file groups, alter schemas and make indexing changes while a database remains online.
"You can run the AMI on-demand or you can purchase an EC2 Reserved Instance with a one- or three-year term," Barr said.
In another data-related move, AWS on the same day announced it had added support for the enormously popular Apache Spark project to its Amazon Elastic MapReduce (Amazon EMR) service. "Amazon EMR is a Web service that makes it easy for you to process and analyze vast amounts of data using applications in the Hadoop ecosystem, including Hive, Pig, HBase, Presto, Impala and others," the company said. "We're delighted to officially add Spark to this list. Although many customers have previously been installing Spark using custom scripts, you can now launch an Amazon EMR cluster with Spark directly from the Amazon EMR Console, CLI or API."
Spark is an open-source, distributed processing framework often used for Big Data workloads. It leverages in-memory caching and optimized execution to boost performance over older Hadoop ecosystem components such as MapReduce, supporting general batch processing, streaming Big Data analytics, machine learning, graph databases, and interactive, ad hoc queries, according to the AWS Spark page.
Among Spark's many components is Spark SQL for low-latency, interactive SQL queries.
Posted by David Ramel on 06/17/2015 at 12:11 PM
With prior partnerships in place with Hortonworks Inc. and Cloudera Inc., Microsoft has now teamed up with MapR Technologies Inc., the final member of the "big three" distributors of Apache Hadoop-based software.
"Today, we are excited to announce that MapR will also be available in the summer as an option for customers to deploy Hadoop from the Azure Marketplace," Microsoft exec T.K. "Ranga" Rengarajan said in a blog post yesterday.
Hortonworks had been the primary Hadoop partner in the Azure cloud. The companies teamed up in 2011 to eventually offer the Azure-based HDInsight service, featuring the Hortonworks Data Platform (HDP) as the Hadoop distribution providing the software foundation. Hortonworks also developed the Hortonworks Data Platform for Windows, letting Windows users in on the traditionally Linux-based Hadoop ecosystem. HDP is also available as a virtual machine (VM) option in the Azure Marketplace.
Microsoft then added Cloudera to the Azure mix in October 2013, putting Cloudera Enterprise in the Azure Marketplace as another Hadoop-based option.
Now, sometime this summer, the MapR Hadoop-based distribution will join the Azure Hadoop party.
"MapR is a leader in the Hadoop community that offers the MapR Distribution including Hadoop, which includes MapR-FS, an HDFS and POSIX compliant file store, and MapR-DB, a NoSQL key value store," Rengarajan said. "The distribution also includes core Hadoop projects such as Hive, Impala, SparkSQL, and Drill, and MapR Control System, a comprehensive management system."
MapR said its distribution on the Azure Marketplace will let users:
- Deploy MapR directly from the Azure Marketplace.
- Transfer data between MapR and Microsoft SQL Server services within Azure.
- Deploy MapR-DB, the MapR in-Hadoop NoSQL database, to support a wide variety of real-time use cases and deployment scenarios.
"As part of this agreement, MapR will fully support deployment of its top-ranked NoSQL database, MapR-DB on Azure," the company said yesterday in a news release. "MapR-DB offers advanced operational features such as multi-master table replication, where business users can analyze data across geographic regions while maintaining low latency and automatic synchronization with a centralized table for analytics and BI."
MapR also yesterday announced that a new version of its distribution, MapR 5.0, will be available in 30 days. "The latest MapR release auto synchronizes storage, database and search indices to support complex, real-time applications to increase revenue, reduce operational costs and mitigate risk," the company said. "MapR 5.0 also includes comprehensive security auditing, Apache Drill support, and the latest Hadoop 2.7 and YARN features."
Rengarajan noted that the Hortonworks distribution was also just updated, to HDP 2.3. That, he said, will also be available in the Azure Marketplace this summer.
Posted by David Ramel on 06/11/2015 at 11:59 AM
Well, Microsoft lied to us. They didn't provide the public preview of SQL Server 2016 "this summer" as promised -- they delivered today.
"The first public preview of SQL Server 2016 is now available for download," the SQL Server 2016 Preview page says today. "It is the biggest leap forward in Microsoft's data platform history with real-time operational analytics, rich visualizations on mobile devices, built-in advanced analytics, new advanced security technology, and new hybrid cloud scenarios."
The new edition of the company's flagship relational database features enhanced in-memory performance, Always Encrypted security technology developed in the company's research unit, "stretch databases" that move data back and forth to the cloud when needed, built-in advanced analytics and more. You can read more about the new release in my earlier report.
Company exec T.K. "Ranga" Rengarajan provided more details in a blog post today. "Unique in this release of SQL Server, we are bringing capabilities to the cloud first in Microsoft Azure SQL Database such as Row-Level Security and Dynamic Data Masking and then bringing the capabilities, as well as the learnings from running these at hyper-scale in Microsoft Azure, back to SQL Server to deliver proven features at scale to our on-premises offering," he said. "This means all our customers benefit from our investments and learnings in Azure."
Row-Level Security lets administrators control access to data based on user characteristics. Security is implemented inside a database, so no modifications are required to an application. With Dynamic Data Masking, real-time data obfuscation is supported so data requesters can't access unauthorized data. Rengarajan said this helps protect sensitive data even when it's not encrypted.
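The two ideas can be illustrated with a small Python sketch: a row filter driven by a characteristic of the requesting user, and a masking step applied to a sensitive column before results are returned. This is a conceptual model only, not SQL Server's mechanism or syntax; the `User` class, the `region` attribute, the masking format and the function names are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class User:
    name: str
    region: str
    can_see_unmasked: bool = False


def mask_email(email: str) -> str:
    """Keep the first character and the top-level domain; hide the rest."""
    local, _, domain = email.partition("@")
    if not domain:
        return "xxxx"
    return f"{local[:1]}xxx@xxxx.{domain.rsplit('.', 1)[-1]}"


def query_customers(rows: List[Dict[str, str]], user: User) -> List[Dict[str, str]]:
    # Row-level filter: a user sees only rows for his or her own region.
    visible = [row for row in rows if row["region"] == user.region]
    if user.can_see_unmasked:
        return [dict(row) for row in visible]
    # Masking: the stored value is untouched; only the returned copy is obfuscated.
    return [{**row, "email": mask_email(row["email"])} for row in visible]


customers = [
    {"name": "Alice", "region": "EU", "email": "alice@example.com"},
    {"name": "Bob", "region": "US", "email": "bob@example.org"},
]
print(query_customers(customers, User("analyst", "EU")))
```

The point of the sketch is that both checks live in the data-access layer, so the calling code does not change, which mirrors the article's note that no application modifications are required.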
Rengarajan highlighted the following additional capabilities in the Community Technology Preview 2:
- PolyBase -- More easily manage relational and non-relational data with the simplicity of T-SQL.
- Native JSON support -- Allows easy parsing and storing of JSON and exporting relational data to JSON.
- Temporal Database support -- Tracks historical data changes with temporal database support.
- Query Data Store -- Acts as a flight data recorder for a database, giving full history of query execution so DBAs can pinpoint expensive/regressed queries and tune query performance.
- MDS enhancements -- Offer enhanced server management capabilities for Master Data Services.
- Enhanced hybrid backup to Azure -- Enables faster backups to Microsoft Azure and faster restores to SQL Server in Azure Virtual Machines. Also, you can stage backups on-premises prior to uploading to Azure.
He also said ongoing preview updates were coming soon. "New with SQL Server 2016, customers will have the opportunity to receive more frequent updates to their preview to help accelerate internal development and test efforts," Rengarajan said. "Instead of waiting for CTP3, customers can choose to download periodic updates to CTP2 gaining access to new capabilities and features as soon as they are available for testing. More details will be shared when the first preview update is available."
He didn't say when that would be. Just as well, as the company seems to have trouble following its promised timelines.
Posted by David Ramel on 05/27/2015 at 1:23 PM
After stewarding the open source project from incubation to its new 1.0 release, MapR Technologies Inc. added Apache Drill for SQL-based Big Data analytics to its Apache Hadoop distribution.
The company -- one of the "big three" Hadoop vendors along with Hortonworks Inc. and Cloudera Inc. -- this week announced the general availability of the open source Apache Drill 1.0 and its inclusion in the MapR Hadoop distribution.
Drill is a low-latency query engine based on ANSI SQL standards that facilitates self-service, interactive analytics at Big Data scales, including up to petabyte scale (1 PB is equal to 1 million GB). One of its key features is that it doesn't depend on traditional database schemas that describe how data is categorized. Discovering such schemas on the fly makes for quicker analytics, the company said.
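As a rough illustration of what discovering a schema on the fly can mean, the Python sketch below reads newline-delimited JSON records and records which fields and value types it encounters, without any schema declared up front. It is not Drill's algorithm; the function name `discover_schema` and the sample records are made up for the example.

```python
import json
from typing import Dict, Set


def discover_schema(json_lines: str) -> Dict[str, Set[str]]:
    """Map each field name to the set of Python type names seen for it."""
    schema: Dict[str, Set[str]] = {}
    for line in json_lines.splitlines():
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


sample = '{"id": 1, "name": "sensor-a"}\n{"id": 2, "tags": ["x", "y"]}'
print(discover_schema(sample))
```

Records with differing fields are handled as they arrive, which is the property that lets a schema-free engine query semi-structured data without a separate modeling step.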
MapR engineers including Jacques Nadeau and Steven Phillips have taken the lead on the open source project, which was incubated at the Apache Software Foundation (ASF) in September 2012 with the goal of wedding the familiar workings of relational databases with the huge new scalability demanded by the Big Data era and the agility of Hadoop systems and their heavy use of NoSQL databases.
"The project has been on the fast track in the last nine months since the developer preview in August 2014, delivering seven significant iterative releases, each adding exciting new features and most importantly, improving on the stability, scale and performance required for broader enterprise deployments," MapR exec Neeraja Rentachintala said in a blog post Tuesday.
In addition to SQL queries, the tool can work with varying types of data, including files, NoSQL databases and more complex types of data such as JSON and Parquet.
"Drill enables interactivity with data from both legacy transactional systems and new data sources, such as Internet of Things (IOT) sensors, Web click-streams and other semi-structured data, along with support for popular business intelligence (BI) and data visualization tools," MapR said in a news release. "Drill provides reliability and performance at Hadoop scale with integrated granular security and governance capabilities required for multi-tenant data lakes or enterprise data hubs."
Upcoming features planned for future editions of Drill include more functionality centered on JSON, SQL, complex data functions and new file formats, Rentachintala said.
Posted by David Ramel on 05/22/2015 at 5:44 AM
Syncfusion Inc. has updated its Big Data Platform, unique for its claim to be "the one and only Hadoop distribution designed and optimized for Windows" and free for even commercial use.
"We have fine-tuned the entire Big Data Platform experience, from the download to the end result," said exec Daniel Jebaraj in a statement.
A key update to the platform lets developers handle multiple-node Hadoop clusters on Windows. With a point-and-click cluster management tool, developers can create, monitor and otherwise manage multiple-node jobs running in C#, Java, Pig, Hive, Python and Scala.
Syncfusion says developers can create clusters in just minutes using commodity machines that run Windows 7, Windows Server 2008 and later Windows versions.
"It is still completely free, and typically installs in less than 15 minutes (for a starter cluster) with absolutely no manual configuration," Jebaraj said. "Developers can start with either a 5- or 10-node cluster, and scale as they need in order to grow their business. Between the platform's updates, capabilities, and support options, developers will be able to take their work further than ever."
The company also listed the following features of the new platform:
- Free commercial support for clusters with up to five nodes.
- Optional paid support with service level agreements for larger clusters.
- Unlimited personal commercial support for Syncfusion Plus members.
- A set of C# samples demonstrating use under different scenarios.
- A unique, local, single-node distribution of Hadoop, complete with an interactive development environment and no dependencies outside the Microsoft .NET Framework, facilitating the development and testing of solutions prior to deployment.
In addition to on-premises installations, the company said users can run their own Hadoop clusters on virtual machines supplied by cloud service providers such as Microsoft Azure and Amazon Web Services (AWS), with customization functionality not found in other cloud-based Hadoop services.
The new platform is available now for download.
Posted by David Ramel on 05/20/2015 at 12:35 PM
Native support for JSON in the upcoming SQL Server 2016 was buried among the many goodies announced earlier this month for the flagship RDBMS, but it's clearly an important feature for data developers, who this week got more details on the new functionality.
The item titled "Add native support for JSON to SQL Server, a la XML (as in, FOR JSON or FROM OPENJSON)" is the No. 1 requested feature on the Connect site used to garner feature requests for users of SQL Server and Windows Azure SQL Database.
With more than 1,000 votes and leading other items by more than 140 votes, the item posted more than four years ago reads:
It may have taken a while, but Microsoft -- as it's increasingly doing on many fronts these days -- has listened and responded to its customers. Microsoft's Jovan Popovic wrote a recent blog post with extensive details about the new support.
For one thing, "native support" doesn't mean the same thing as introducing a new native JSON type, as was done for XML.
And, instead of getting its own type like XML, it will be represented by the existing NVARCHAR type, used for representing variable-length strings. Popovic said Microsoft studied the issue and decided to go with the NVARCHAR type for many reasons concerning issues such as migration, cross-feature compatibility and client-side support.
"Our goal is to create simpler but still useful framework for processing JSON documents," Popovic said.
Thus the company is focusing on functionality -- such as export/import and prebuilt functions for JSON processing -- and query optimization, rather than storage.
Not to say that this strategy is set in stone. As I said, Microsoft is really listening to customers these days, and things could change.
"We know that PostgreSQL has a native type and JSONB support, but in this version we want to focus on the other things that are more important (do you want to see SQL Server with native type but without built-in functions that handle JSON -- I don’t think so :) )," Popovic said. "However, we are open for suggestions and if you believe the native type will help, you can create request on connect site so we can discuss it there."
(Some readers didn't even wait to open a Connect item. "This seems like a massively crippled feature and until there is native support I don't believe it offers anything even remotely similar to PostgreSQL," a reader named Phillip wrote in a blog comment.)
Anyway, Popovic explains the nitty-gritty details on the company's initial focus for JSON functionality, detailing how to export JSON with the FOR JSON clause (as was suggested in the original Connect request), how to transform JSON text to relational tables with the OPENJSON function and so on.
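The two directions described here, relational rows out to JSON text and JSON text back into rows, can be sketched in a few lines of Python. This only mirrors the concept; it is not T-SQL and does not reproduce the behavior of FOR JSON or OPENJSON. The function names `rows_to_json` and `json_to_rows` and the sample data are hypothetical.

```python
import json
from typing import Any, List, Sequence, Tuple


def rows_to_json(columns: Sequence[str], rows: Sequence[Sequence[Any]]) -> str:
    """Export direction: turn tabular rows into a JSON array of objects."""
    return json.dumps([dict(zip(columns, row)) for row in rows])


def json_to_rows(text: str, columns: Sequence[str]) -> List[Tuple[Any, ...]]:
    """Import direction: project a JSON array of objects onto named columns.

    A property missing from an object becomes None in the resulting row.
    """
    return [tuple(obj.get(col) for col in columns) for obj in json.loads(text)]


columns = ["id", "name"]
rows = [(1, "Ann"), (2, "Raj")]
text = rows_to_json(columns, rows)
print(text)
print(json_to_rows(text, columns))
```

Note that the JSON travels as an ordinary string in both directions, which parallels the decision to carry JSON in an existing string type rather than a dedicated one.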
"Someone might say -- this will not be fast enough, but we will see," Popovic said. "Built-in JSON parser is the fastest way to process JSON in database layer. You might use CLR type or CLR parsers as external assemblies, but this will not be better than the native code that parses JSON."
Popovic said the JSON functionality will be rolled out over time in the SQL Server 2016 previews. SQL Server 2016 CTP2 is planned to include the ability to format and export data as a JSON string, while SQL Server 2016 CTP3 is expected to incorporate the ability to load JSON text in tables, extract values from JSON text, index properties in JSON text stored in columns, and more, he said.
The SQL Server team will be publishing more details about the huge new release of SQL Server 2016 as the days count down to the first public preview, expected this summer.
If you can't wait, those wacky wonks on Hacker News managed to stay somewhat on-topic in a discussion about the new JSON support, and noted database expert Aaron Bertrand goes into really extensive detail in a blog post on the SQL Sentry site.
Posted by David Ramel on 05/19/2015 at 10:33 AM
Fresh from last week's Build developer conference in San Francisco, Microsoft executives appeared at the company's first-ever Ignite conference in Chicago and provided more details about the company's new Azure SQL Data Warehouse.
The company yesterday demoed the first "sneak peek" at the new "elastic data warehouse in the cloud," and today exec Tiffany Wissner penned a blog post to highlight specific functionalities.
Wissner explained how Azure SQL Data Warehouse expands upon the company's flagship relational database management system (RDBMS), SQL Server.
"Azure SQL Data Warehouse is a combination of enterprise-grade SQL Server augmented with the massively parallel processing architecture of the Analytics Platform System (APS), which allows the SQL Data Warehouse service to scale across very large datasets," Wissner said. "It integrates with existing Azure data tools including Power BI for data visualization, Azure Machine Learning for advanced analytics, Azure Data Factory for data orchestration and movement as well as Azure HDInsight, our 100 percent Apache Hadoop service for Big Data processing."
Just over a year ago, Microsoft introduced the APS as a physical appliance wedding its SQL Server Parallel Data Warehouse (PDW) with HDInsight. APS is described by Microsoft as "the evolution of the PDW product that now supports the ability to query across the traditional relational data warehouse and data stored in a Hadoop region -- either in the appliance or in a separate Hadoop cluster."
Wissner touted the pervasiveness of SQL Server as a selling point of the new solution, as enterprises can leverage developer skills and knowledge acquired from its years of everyday use.
"The SQL Data Warehouse extends the T-SQL constructs you're already familiar with to create indexes, partitions, functions and procedures which allows you to easily migrate to the cloud," Wissner said. "With native integrations to Azure Data Factory, Azure Machine Learning and Power BI, customers are able to quickly ingest data, utilize learning algorithms, and visualize data born either in the cloud or on-premises."
PolyBase in the cloud is another attractive feature, Wissner said. PolyBase was introduced with PDW in 2013 to integrate data stored in the Hadoop Distributed File System (HDFS) with SQL Server, one of many emerging SQL-on-Hadoop solutions.
"SQL Data Warehouse can query unstructured and semi-structured data stored in Azure Storage, Hortonworks Data Platform, or Cloudera using familiar T-SQL skills making it easy to combine data sets no matter where it is stored," Wissner said. "Other vendors follow the traditional data warehouse model that requires data to be moved into the instance to be accessible."
Although Wissner didn't identify any of those "other vendors," Microsoft took pains to position Azure SQL Data Warehouse as an improvement upon the Redshift cloud database offered by Amazon Web Services Inc. (AWS), which Microsoft is challenging for public cloud supremacy.
One of the advantages Microsoft sees in its product over Redshift is the ability to save costs by pausing cloud compute instances. This was mentioned at Build and echoed today by Wissner.
"Dynamic pause enables customers to optimize the utilization of the compute infrastructure by ramping down compute while persisting the data and eliminating the need to backup and restore," Wissner said. "With other cloud vendors, customers are required to back up the data, delete the existing cluster, and, upon resume, generate a new cluster and restore data. This is both time consuming and complex for scenarios such as data marts or departmental data warehouses that need variable compute power."
Again parroting the Build message, Wissner also discussed Azure SQL Data Warehouse's ability to separate compute and storage services, scaling each up or down independently and immediately as needed.
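To put rough numbers on the pause-and-separate pitch, here is a minimal back-of-the-envelope sketch in Python. It assumes a simple billing model in which compute is charged per running hour and storage is charged per terabyte per month regardless of whether compute is paused. The function name, the rates and the workload figures are all hypothetical and are not Microsoft pricing.

```python
HOURS_PER_DAY = 24


def monthly_cost(compute_rate_per_hour, storage_rate_per_tb_month, storage_tb,
                 active_hours_per_day, days=30, can_pause=True):
    """Estimate one month's bill under a simple, assumed billing model.

    Compute is billed per running hour; storage is billed per TB-month
    whether or not compute is paused. All rates are illustrative only.
    """
    # If the service cannot pause, compute is billed around the clock.
    hours_per_day = active_hours_per_day if can_pause else HOURS_PER_DAY
    compute = compute_rate_per_hour * hours_per_day * days
    storage = storage_rate_per_tb_month * storage_tb
    return compute + storage


# Hypothetical departmental warehouse: 5 TB of data, queried 8 hours a day.
always_on = monthly_cost(2.0, 100.0, 5, 8, can_pause=False)
with_pause = monthly_cost(2.0, 100.0, 5, 8, can_pause=True)
savings = always_on - with_pause

print(f"always on: {always_on:.2f}, with pause: {with_pause:.2f}, saved: {savings:.2f}")
```

Under these made-up numbers the storage charge is identical in both cases; the whole difference comes from the 16 hours a day when compute is switched off.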
"With SQL Data Warehouse you are able to quickly move to the cloud without having to move all of your infrastructure along with it," Wissner concluded. "With the Analytics Platform System, Microsoft Azure and Azure SQL Data Warehouse, you can have the data warehouse solution you need on-premises, in the cloud or a hybrid solution."
Users interested in trying out the new offering, expected to hit general availability this summer, can sign up to be notified when that happens.
Posted by David Ramel on 05/05/2015 at 12:04 PM
Forget all that techy Azure Big Data stuff -- Microsoft found a new way to put databases to work that's really interesting: guessing your age from your photo.
Threatening to upstage all the groundbreaking announcements at the Build conference is a Web site where you provide a photo and Microsoft's magical machinery consults a database of face photos to guess the age of the subjects.
Tell me you didn't (or won't) visit How-Old.net (How Old Do I Look?) and provide your own photo, hoping the Azure API would say you look 10 years younger than you are?
I certainly did. But it couldn't find my face (I was wearing a bicycle helmet in semi-profile), and then I had to get back to work. But you can bet I'll be back. So will you, right?
(Unless you're one of those fine-print privacy nuts.)
Why couldn't Ballmer come up with stuff like this? Could there be a better example of how this isn't your father's Microsoft anymore?
Microsoft machine learning (ML) engineers Corom Thompson and Santosh Balasubramanian explained in a Wednesday blog post how they were fooling around with the company's new face-recognition APIs. They sent out a bunch of e-mails to garner perhaps 50 testers.
"We were shocked," they said. "Within a few hours, over 35,000 users had hit the page from all over the world (about 29k of them from Turkey, as it turned out -- apparently there were a bunch of tweets from Turkey mentioning this page). What a great example of people having fun thanks to the power of ML!"
They said it took just a day to wire the solution up, listing the following components:
- Extracting the gender and age of the people in the pictures.
- Obtaining real-time insights on the data extracted.
- Creating real-time dashboards to view the results.
Their blog post gives all the details about the tools used and their implementation, complete with code samples. Go read it if you're interested.
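For a feel of what those three steps boil down to, here is a tiny Python sketch. It is not the engineers' code and it calls no Microsoft API: the `detections` list stands in for whatever a face-detection service might return, and the record fields and the `summarize` function are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical output of the extraction step: one record per detected face.
detections = [
    {"gender": "female", "age": 31},
    {"gender": "male", "age": 45},
    {"gender": "female", "age": 27},
    {"gender": "male", "age": 39},
]


def summarize(records):
    """Aggregate per-face records into the kind of totals a dashboard shows."""
    totals = defaultdict(lambda: {"count": 0, "age_sum": 0})
    for record in records:
        bucket = totals[record["gender"]]
        bucket["count"] += 1
        bucket["age_sum"] += record["age"]
    # Turn running sums into a count and an average age per gender.
    return {
        gender: {"count": b["count"], "avg_age": b["age_sum"] / b["count"]}
        for gender, b in totals.items()
    }


summary = summarize(detections)
for gender, stats in sorted(summary.items()):
    print(gender, stats["count"], stats["avg_age"])
```

In a real-time setup the same aggregation would run continuously over incoming records rather than over a fixed list.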
Me? It's Friday afternoon and the boss is 3,000 miles away -- I'm finding a better photo of myself and going back to How-Old.net. I'm sure I don't look a day over 29.
In fact, I'll do it now. Hold on.
OK, it says I look seven years older than I am. I won't even give you the number. Stupid damn site, anyway ...
Posted by David Ramel on 05/01/2015 at 1:00 PM
A new Azure SQL Data Warehouse preview offered as a counter to Amazon's Redshift headed several data-related announcements at the opening of the Microsoft Build conference today.
Also being announced were Azure Data Lake and "elastic databases" for Azure SQL Database, further demonstrating the company's focus on helping customers implement and support a "data culture" in which analytics are used for everyday business decisions.
"The data announcements are interesting because they show an evolution of the SQL Server technology towards a cloud-first approach," IDC analyst Al Hilwa told this site. "A lot of these capabilities like elastic query are geared for cloud approaches, but Microsoft will differentiate from Amazon by also offering them for on-premises deployment. Other capabilities like Data Lake, elastic databases and Data Warehouse are focused on larger data sets that are typically born in the cloud. The volumes of data supported here builds on Microsoft's persistent investments in datacenters."
Azure SQL Data Warehouse will be available as a preview in June, Microsoft announced during the Build opening keynote. It was designed to provide petabyte-scale data warehousing as a service that can elastically scale to suit business needs. In comparison, the Amazon Web Services Inc. (AWS) Redshift -- unveiled more than two years ago -- is described as "a fast, fully managed, petabyte-scale data warehouse solution that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools."
Microsoft pointed out what it said are numerous advantages that Azure SQL Data Warehouse provides over AWS Redshift, such as the ability to independently adjust compute and storage, as opposed to Redshift's fixed compute/storage ratio. Concerning elasticity, Microsoft described its new service as "the industry’s first enterprise-class cloud data warehouse as a service that can grow, shrink and pause in seconds," while it could take hours or days to resize a Redshift service. Azure SQL Data Warehouse also comes with a hybrid configuration option for hosting in the Azure cloud or on-premises -- as opposed to cloud-only for Redshift -- and offers pause/resume functionality and compatibility with true SQL queries, the company said. Redshift has no support for indexes, SQL UDFs, stored procedures or constraints, Microsoft said.
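The fixed-ratio argument is easier to see with numbers. The Python sketch below assumes a generic node-based warehouse in which every node brings a fixed bundle of compute and storage, so the cluster must be sized for whichever need is larger. The node sizes and workload figures are hypothetical and do not describe actual Redshift or Azure configurations.

```python
import math


def nodes_for_fixed_ratio(compute_needed, storage_tb_needed,
                          compute_per_node, storage_tb_per_node):
    """Nodes required when compute and storage can only be bought together."""
    by_compute = math.ceil(compute_needed / compute_per_node)
    by_storage = math.ceil(storage_tb_needed / storage_tb_per_node)
    # The larger of the two requirements dictates the cluster size.
    return max(by_compute, by_storage)


# Hypothetical workload: lots of data (40 TB), little query power (4 units).
# Hypothetical node: 2 compute units and 2 TB of storage each.
compute_needed, storage_needed = 4, 40
nodes = nodes_for_fixed_ratio(compute_needed, storage_needed, 2, 2)
provisioned_compute = nodes * 2
idle_compute = provisioned_compute - compute_needed

print(f"nodes: {nodes}, compute provisioned: {provisioned_compute}, idle: {idle_compute}")
```

With independent scaling, the same workload would be provisioned with 4 units of compute and 40 TB of storage, leaving none of the compute idle in this simplified model.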
Enterprises can use the new offering in conjunction with other Microsoft data tools such as Power BI, Azure Machine Learning, Azure HDInsight and Azure Data Factory.
Speaking of other data offerings, the Azure Data Lake repository for Big Data analytics workloads provides one system for storing structured or unstructured data in native formats. It follows a trend -- disparaged by some analysts -- pioneered by companies such as Pivotal Software Inc. with its Business Data Lake. Because it can work with the Hadoop Distributed File System (HDFS), it can be integrated with a range of other tools in the Hadoop/Big Data ecosystem, including Cloudera and Hortonworks Hadoop distributions and Microsoft's own Azure HDInsight and Azure Machine Learning.
For straight SQL-based analytics, Microsoft introduced the concept of elastic databases for Azure SQL Database, its cloud-based SQL Database-as-a-Service (DBaaS) offering. Azure SQL Database elastic databases reportedly provide one pool to help enterprises manage multiple databases and provision services as needed.
The elastic database pools let enterprises pay for all database usage at once and facilitate the running of centralized queries and reports across all data stores. The elastic databases support full-text search, column-level access rights and instant encryption of data. They "allow ISVs and software-as-a-service developers to pool capacity across thousands of databases, enabling them to benefit from efficient resource consumption and the best price and performance in the public cloud," Microsoft said in a news release.
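Why pooling helps can be shown with a little arithmetic. The Python sketch below assumes three tenant databases whose usage peaks at different times; the tenant names and the usage figures (in arbitrary capacity units per time slot) are invented for illustration and do not reflect how Azure measures or bills capacity.

```python
# Hypothetical usage per time slot for three tenant databases.
usage = {
    "tenant_a": [10, 80, 20, 10],
    "tenant_b": [70, 10, 15, 20],
    "tenant_c": [5, 15, 90, 10],
}

# Provisioned separately, each database must be sized for its own peak.
separate_capacity = sum(max(series) for series in usage.values())

# In a shared pool, capacity only has to cover the busiest combined slot.
pooled_capacity = max(sum(slot) for slot in zip(*usage.values()))

print(f"separate: {separate_capacity}, pooled: {pooled_capacity}")
```

Because the peaks do not coincide, the pool in this example needs 125 units instead of 240. If every tenant peaked at the same moment, the two figures would be equal and pooling would save nothing.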
Posted by David Ramel on 04/29/2015 at 12:26 PM