Azure Data Lake Store, Services and U-SQL Make Debut

Microsoft fills in the gaps in its Big Data offering with previews of its Azure Data Lake and Language Services, which include the brand-new U-SQL language for crunching the coming explosion of data.

Microsoft announced updates to key technologies in its Big Data portfolio, as well as new offerings -- specifically, Azure Data Lake and Language services -- for developers to wrap their code around. Scott Guthrie, executive vice president of the Microsoft Cloud and Enterprise group, announced them on his blog the day before the company's online AzureCon conference, taking place this week.

"Azure Data Lake Store solves the big data challenges of volume, variety, and velocity by enabling you to store data of any type, at any size, and process it at any scale," writes Guthrie. The gist of his announcement is the proliferation of data sources, including the immense real-time data crunching involved with the Internet of Things, which the new hyper-scale HDFS repository was built to gobble up and manage.

Built to work with the Azure Data Lake Store, which is currently in preview, is the Azure Data Lake Analytics service, used to manage and crunch data in Data Lake Store repositories right from within the Azure Management portal. And the new jewel in the crown is U-SQL, which appears to be a superset of SQL that adds some flexibility for dealing with data at scale.

"We designed U-SQL from the ground-up as an evolution of the declarative SQL language with native extensibility through user code written in C#," said Michael Rys, Principal Program Manager, Microsoft Big Data, in a separate blog post that offers more detail on the language. "This unifies both paradigms, unifies structured, unstructured, and remote data processing, unifies the declarative and custom imperative coding experience, and unifies the experience around extending your language capabilities."

Rys writes that U-SQL is based on his group's experience working internally with SCOPE (a declarative and extensible scripting language for analyzing massive data sets, which borrows from SQL in the way it lays out data prior to analysis), as well as with the T-SQL, ANSI SQL and Hive languages, and that it can be extended directly through C# code. Visual Studio developers will be able to tap right into U-SQL through the Azure Management portal, with new tools to come that will allow U-SQL jobs to be debugged and optimized natively.

Some commenters on Rys's blog post question U-SQL's ability to scale, citing the lack of examples. "How well does [U-SQL] scale? Say you wanted to run an algorithm on 100 TB of data, or a PB of data? Is that a use case this is designed for? Or, say, if you need a teraflop of computing power on an algorithm on 10 TB of data, is that a possibility?" asks Xsistor.

"Assuming you are adding enough processing capacity, U-SQL can scale over very large files. There will be a demo by us later this week that shows a bit more data than my simple example above (that was just highlighting the language functionality)," responded Rys.

Guthrie also announced that Microsoft's HDInsight service, another component of the Azure Data Lake Services toolset and one that's generally available now, supports Ubuntu Linux. Selecting the Ubuntu option in the Data + Analytics section of the Azure Management portal allows for creation of Linux-based HDInsight clusters. Besides Hadoop, it also supports "clusters pre-configured for workloads like Storm, Spark, HBase, etc.," writes Guthrie. These Hadoop clusters can then be managed via Apache Ambari and customized with Hadoop components or other apps beyond the default ones installed under HDInsight.

More details will be revealed during this week's AzureCon.

About the Author

You Tell 'Em, Readers: If you've read this far, know that Michael Domingo, Visual Studio Magazine Editor in Chief, is here to serve you, dear readers, and wants to get you the information you so richly deserve. What news, content, topics, issues do you want to see covered in Visual Studio Magazine? He's listening at [email protected].
