MapR Bolsters Hadoop Distro with Apache Drill for SQL-Based Big Data Analytics

After stewarding the open source project from incubation to its new 1.0 release, MapR Technologies Inc. added Apache Drill for SQL-based Big Data analytics to its Apache Hadoop distribution.

The company -- one of the "big three" Hadoop vendors along with Hortonworks Inc. and Cloudera Inc. -- this week announced the general availability of the open source Apache Drill 1.0 and its inclusion in the MapR Hadoop distribution.

Drill is a low-latency query engine based on ANSI SQL standards that facilitates self-service, interactive analytics at Big Data scales, including up to petabyte scale (1 PB is equal to 1 million GB). One of its key features is that it doesn't depend on traditional database schemas that describe how data is categorized. Discovering such schemas on the fly makes for quicker analytics, the company said.

MapR engineers including Jacques Nadeau and Steven Phillips have taken the lead on the open source project, which entered incubation at the Apache Software Foundation (ASF) in September 2012 with the goal of wedding the familiar workings of relational databases to the massive scalability demanded by the Big Data era and the agility of Hadoop systems, with their heavy use of NoSQL databases.

"The project has been on the fast track in the last nine months since the developer preview in August 2014, delivering seven significant iterative releases, each adding exciting new features and most importantly, improving on the stability, scale and performance required for broader enterprise deployments," MapR exec Neeraja Rentachintala said in a blog post Tuesday.

The Apache Drill Project Timeline (source: MapR Technologies Inc.)

Beyond standard SQL queries, the tool can work with varied types of data, including flat files and NoSQL databases, as well as complex formats such as JSON and Parquet.
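
To get a sense of what schema-free querying looks like, here's a rough sketch of a Drill query run directly against a raw JSON file. The file path and field names are hypothetical, but the pattern -- treating a file as a table via a storage plugin such as dfs, with dot notation for nested fields -- follows Drill's ANSI SQL dialect:

    -- Query a JSON file directly; no schema definition or ETL step required.
    -- Drill discovers the structure, including nested fields, at read time.
    SELECT t.name, t.address.city AS city
    FROM dfs.`/data/customers.json` AS t
    WHERE t.age > 30
    LIMIT 10;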

"Drill enables interactivity with data from both legacy transactional systems and new data sources, such as Internet of Things (IOT) sensors, Web click-streams and other semi-structured data, along with support for popular business intelligence (BI) and data visualization tools," MapR said in a news release. "Drill provides reliability and performance at Hadoop scale with integrated granular security and governance capabilities required for multi-tenant data lakes or enterprise data hubs."

Upcoming features planned for future editions of Drill include more functionality centered on JSON, SQL, complex data functions and new file formats, Rentachintala said.

Posted by David Ramel on 05/22/2015 at 5:44 AM


Syncfusion Updates Windows-Centric Big Data Platform

Syncfusion Inc. has updated its Big Data Platform, unique for its claim to be "the one and only Hadoop distribution designed and optimized for Windows" and free even for commercial use.

"We have fine-tuned the entire Big Data Platform experience, from the download to the end result," said exec Daniel Jebaraj in a statement.

A key update to the platform lets developers handle multiple-node Hadoop clusters on Windows. With a point-and-click cluster management tool, developers can create, monitor and otherwise manage multiple-node jobs running in C#, Java, Pig, Hive, Python and Scala.

Syncfusion says developers can create clusters using commodity machines that run Windows 7, Windows Server 2008 and later Windows versions in just minutes.

"It is still completely free, and typically installs in less than 15 minutes (for a starter cluster) with absolutely no manual configuration," Jebaraj said. "Developers can start with either a 5- or 10-node cluster, and scale as they need in order to grow their business. Between the platform's updates, capabilities, and support options, developers will be able to take their work further than ever."

The company also listed the following features of the new platform:

  • Free commercial support for clusters with up to five nodes.
  • Optional paid support with service level agreements for larger clusters.
  • Unlimited personal commercial support for Syncfusion Plus members.
  • A set of C# samples demonstrating use under different scenarios.
  • A unique, local, single-node distribution of Hadoop, complete with an interactive development environment and no dependencies outside the Microsoft .NET Framework, facilitating the development and testing of solutions prior to deployment.

In addition to on-premises installations, the company said users can run their own Hadoop clusters on virtual machines supplied by cloud service providers such as Microsoft Azure and Amazon Web Services (AWS), with customization functionality not found in other cloud-based Hadoop services.

The new platform is available now for download.

Posted by David Ramel on 05/20/2015 at 12:35 PM


How the New JSON Support Will Work in SQL Server 2016

Native support for JSON in the upcoming SQL Server 2016 was buried among the many goodies announced earlier this month for the flagship RDBMS, but it's clearly an important feature for data developers, who this week got more details on the new functionality.

The item titled "Add native support for JSON to SQL Server, a la XML (as in, FOR JSON or FROM OPENJSON)" is the No. 1 requested feature on the Connect site, which Microsoft uses to garner feature requests from users of SQL Server and Windows Azure SQL Database.

With more than 1,000 votes, leading other items by more than 140 votes, the item posted more than four years ago reads:

While JSON import and export is possible in SQL Server using horribly complex T-SQL code or CLR integration using the JavaScript JSON methods, such methods are system-resource intensive. It would be nice if MS could integrate JSON into SQL Server the same ways they do XML: say, a FOR JSON clause, an OPENJSON statement, etc.

It may have taken a while, but Microsoft -- as it's increasingly doing on many fronts these days -- has listened and responded to its customers. Microsoft's Jovan Popovic wrote a recent blog post with extensive details about the new support.

For one thing, "native support" doesn't mean the same thing as introducing a new native JSON type, as was done for XML.

As a refresher, JSON -- standing for JavaScript Object Notation -- uses text name/value (or attribute/value) pairs to represent data, serving as a data transmission technology that's easier to read and less complicated than XML.

And, instead of getting its own type like XML, JSON will be represented by the existing NVARCHAR type, used for variable-length strings. Popovic said Microsoft studied the issue and decided to go with the NVARCHAR type for reasons concerning migration, cross-feature compatibility and client-side support.

"Our goal is to create simpler but still useful framework for processing JSON documents," Popovic said.

Thus the company is focusing on functionality -- such as export/import and prebuilt functions for JSON processing -- and query optimization, rather than storage.
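
In practical terms, that means JSON documents will live in ordinary NVARCHAR columns. Here's a minimal sketch of what that storage model could look like, assuming a hypothetical Orders table and the kind of built-in validation function (ISJSON) Microsoft has described; the final syntax could change before release:

    -- JSON stored as plain NVARCHAR(MAX); a CHECK constraint keeps it well-formed.
    CREATE TABLE dbo.Orders (
        OrderId      INT PRIMARY KEY,
        OrderDetails NVARCHAR(MAX)
            CONSTRAINT CK_Orders_Json CHECK (ISJSON(OrderDetails) > 0)
    );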

SQL Server 2016: JSON Support (source: SQL Sentry Inc.)

That's not to say the strategy is set in stone. As I said, Microsoft is really listening to customers these days, and things could change.

"We know that PostgreSQL has a native type and JSONB support, but in this version we want to focus on the other things that are more important (do you want to see SQL Server with native type but without built-in functions that handle JSON -- I don’t think so :) )," Popovic said. "However, we are open for suggestions and if you believe the native type will help, you can create request on connect site so we can discuss it there."

(Some readers didn't even wait to open a Connect item. "This seems like a massively crippled feature and until there is native support I don't believe it offers anything even remotely similar to PostgreSQL," a reader named Phillip wrote in a blog comment.)

Anyway, Popovic's post gets into the nitty-gritty of the company's initial focus for JSON functionality, detailing how to export JSON with the FOR JSON clause (as suggested in the original Connect request), how to transform JSON text into relational tables with the OPENJSON function and so on.
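
Going by the syntax floated in the original Connect request and in Popovic's post, the two headline pieces might look roughly like this (table and column names are hypothetical, and the exact grammar was still subject to change in the previews):

    -- Export: shape relational rows as JSON text.
    SELECT OrderId, CustomerName
    FROM dbo.Orders
    FOR JSON PATH;

    -- Import: shred JSON text into a relational rowset.
    DECLARE @json NVARCHAR(MAX) =
        N'[{"OrderId":1,"CustomerName":"Contoso"},{"OrderId":2,"CustomerName":"Fabrikam"}]';

    SELECT *
    FROM OPENJSON(@json)
    WITH (
        OrderId      INT           '$.OrderId',
        CustomerName NVARCHAR(100) '$.CustomerName'
    );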

"Someone might say -- this will not be fast enough, but we will see," Popovic said. "Built-in JSON parser is the fastest way to process JSON in database layer. You might use CLR type or CLR parsers as external assemblies, but this will not be better than the native code that parses JSON."

Popovic said the JSON functionality will be rolled out over time in the SQL Server 2016 previews. SQL Server 2016 CTP2 is planned to include the ability to format and export data as a JSON string, while SQL Server 2016 CTP3 is expected to add the ability to load JSON text into tables, extract values from JSON text, index properties in JSON text stored in columns, and more, he said.
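
The value-extraction and indexing pieces slated for CTP3 would presumably pair a scalar function with SQL Server's existing computed-column machinery. A hedged sketch, reusing the hypothetical Orders table from above and a JSON_VALUE-style function of the sort Popovic described:

    -- Sketch: surface one JSON property as a computed column, then index it.
    ALTER TABLE dbo.Orders
        ADD CustomerName AS JSON_VALUE(OrderDetails, '$.CustomerName');

    CREATE INDEX IX_Orders_CustomerName ON dbo.Orders (CustomerName);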

The SQL Server team will be publishing more details about the huge new release of SQL Server 2016 as the days count down to the first public preview, expected this summer.

If you can't wait, those wacky wonks on Hacker News managed to stay somewhat on-topic in a discussion about the new JSON support, and noted database expert Aaron Bertrand goes into really extensive detail in a blog post on the SQL Sentry site.

Posted by David Ramel on 05/19/2015 at 10:33 AM


Microsoft Provides Sneak Peek at Azure SQL Data Warehouse

Fresh from last week's Build developer conference in San Francisco, Microsoft executives appeared at the company's first-ever Ignite conference in Chicago and provided more details about the company's new Azure SQL Data Warehouse.

The company yesterday demoed the first "sneak peek" at the new "elastic data warehouse in the cloud," and today exec Tiffany Wissner penned a blog post to highlight specific functionalities.

Wissner explained how Azure SQL Data Warehouse expands upon the company's flagship relational database management system (RDBMS), SQL Server.

"Azure SQL Data Warehouse is a combination of enterprise-grade SQL Server augmented with the massively parallel processing architecture of the Analytics Platform System (APS), which allows the SQL Data Warehouse service to scale across very large datasets," Wissner said. "It integrates with existing Azure data tools including Power BI for data visualization, Azure Machine Learning for advanced analytics, Azure Data Factory for data orchestration and movement as well as Azure HDInsight, our 100 percent Apache Hadoop service for Big Data processing."

Working with Azure SQL Data Warehouse (source: Microsoft)

Just over a year ago, Microsoft introduced the APS as a physical appliance wedding its SQL Server Parallel Data Warehouse (PDW) with HDInsight. APS is described by Microsoft as "the evolution of the PDW product that now supports the ability to query across the traditional relational data warehouse and data stored in a Hadoop region -- either in the appliance or in a separate Hadoop cluster."

Wissner touted the pervasiveness of SQL Server as a selling point of the new solution, as enterprises can leverage developer skills and knowledge acquired from its years of everyday use.

"The SQL Data Warehouse extends the T-SQL constructs you're already familiar with to create indexes, partitions, functions and procedures which allows you to easily migrate to the cloud," Wissner said. "With native integrations to Azure Data Factory, Azure Machine Learning and Power BI, customers are able to quickly ingest data, utilize learning algorithms, and visualize data born either in the cloud or on-premises."

PolyBase in the cloud is another attractive feature, Wissner said. PolyBase was introduced with PDW in 2013 to integrate data stored in the Hadoop Distributed File System (HDFS) with SQL Server, one of many emerging SQL-on-Hadoop solutions.

"SQL Data Warehouse can query unstructured and semi-structured data stored in Azure Storage, Hortonworks Data Platform, or Cloudera using familiar T-SQL skills making it easy to combine data sets no matter where it is stored," Wissner said. "Other vendors follow the traditional data warehouse model that requires data to be moved into the instance to be accessible."

Although Wissner didn't identify any of those "other vendors," Microsoft took pains to position Azure SQL Data Warehouse as an improvement upon the Redshift cloud database offered by Amazon Web Services Inc. (AWS), which Microsoft is challenging for public cloud supremacy.

One of the advantages Microsoft sees in its product over Redshift is the ability to save costs by pausing cloud compute instances. This was mentioned at Build and echoed today by Wissner.

"Dynamic pause enables customers to optimize the utilization of the compute infrastructure by ramping down compute while persisting the data and eliminating the need to backup and restore," Wissner said. "With other cloud vendors, customers are required to back up the data, delete the existing cluster, and, upon resume, generate a new cluster and restore data. This is both time consuming and complex for scenarios such as data marts or departmental data warehouses that need variable compute power."

Again parroting the Build message, Wissner also discussed Azure SQL Data Warehouse's ability to separate compute and storage services, scaling them independently up and down immediately as needed.

"With SQL Data Warehouse you are able to quickly move to the cloud without having to move all of your infrastructure along with it," Wissner concluded. "With the Analytics Platform System, Microsoft Azure and Azure SQL Data Warehouse, you can have the data warehouse solution you need on-premises, in the cloud or a hybrid solution."

Users interested in trying out the new offering, expected to hit general availability this summer, can sign up to be notified when that happens.

Posted by David Ramel on 05/05/2015 at 12:04 PM


Microsoft Finds New Use for Database: Guessing Your Age

Forget all that techy Azure Big Data stuff -- Microsoft found a new way to put databases to work that's really interesting: guessing your age from your photo.

Threatening to upstage all the groundbreaking announcements at the Build conference is a Web site where you provide a photo and Microsoft's magical machinery consults a database of face photos to guess the age of the subjects.

Tell me you didn't (or won't) visit How-Old.net (How Old Do I Look?) and provide your own photo, hoping the Azure API would say you look 10 years younger than you are?

I certainly did. But it couldn't find my face (I was wearing a bicycle helmet in semi-profile), and then I had to get back to work. But you can bet I'll be back. So will you, right? (Unless you're one of those fine-print privacy nuts.)

Why couldn't Ballmer come up with stuff like this? Could there be a better example of how this isn't your father's Microsoft anymore?

Microsoft machine learning (ML) engineers Corom Thompson and Santosh Balasubramanian explained in a Wednesday blog post how they were fooling around with the company's new face-recognition APIs. They sent out a bunch of e-mails to garner perhaps 50 testers.

How Old Do I Look? (source: Microsoft)

"We were shocked," they said. "Within a few hours, over 35,000 users had hit the page from all over the world (about 29k of them from Turkey, as it turned out -- apparently there were a bunch of tweets from Turkey mentioning this page). What a great example of people having fun thanks to the power of ML!"

They said it took just a day to wire the solution up, listing the following components:

  • Extracting the gender and age of the people in the pictures.
  • Obtaining real-time insights on the data extracted.
  • Creating real-time dashboards to view the results.

Their blog post gives all the details about the tools used and their implementation, complete with code samples. Go read it if you're interested.

Me? It's Friday afternoon and the boss is 3,000 miles away -- I'm finding a better photo of myself and going back to How-Old.net. I'm sure I don't look a day over 29.

In fact, I'll do it now. Hold on.

OK, it says I look seven years older than I am. I won't even give you the number. Stupid damn site, anyway ...

Posted by David Ramel on 05/01/2015 at 1:00 PM


Azure SQL Data Warehouse Leads New Microsoft Data Products

A new Azure SQL Data Warehouse preview offered as a counter to Amazon's Redshift headed several data-related announcements at the opening of the Microsoft Build conference today.

Also being announced were Azure Data Lake and "elastic databases" for Azure SQL Database, further demonstrating the company's focus on helping customers implement and support a "data culture" in which analytics are used for everyday business decisions.

"The data announcements are interesting because they show an evolution of the SQL Server technology towards a cloud-first approach," IDC analyst Al Hilwa told this site. "A lot of these capabilities like elastic query are geared for cloud approaches, but Microsoft will differentiate from Amazon by also offering them for on-premises deployment. Other capabilities like Data Lake, elastic databases and Data Warehouse are focused on larger data sets that are typically born in the cloud. The volumes of data supported here builds on Microsoft's persistent investments in datacenters."

Azure SQL Data Warehouse will be available as a preview in June, Microsoft announced during the Build opening keynote. It was designed to provide petabyte-scale data warehousing as a service that can elastically scale to suit business needs. In comparison, the Amazon Web Services Inc. (AWS) Redshift -- unveiled more than two years ago -- is described as "a fast, fully managed, petabyte-scale data warehouse solution that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools."

Azure SQL Data Warehouse (source: Microsoft)

Microsoft pointed out what it said are numerous advantages that Azure SQL Data Warehouse provides over AWS Redshift, such as the ability to independently adjust compute and storage, as opposed to Redshift's fixed compute/storage ratio. Concerning elasticity, Microsoft described its new service as "the industry’s first enterprise-class cloud data warehouse as a service that can grow, shrink and pause in seconds," while it could take hours or days to resize a Redshift service. Azure SQL Data Warehouse also comes with a hybrid configuration option for hosting in the Azure cloud or on-premises -- as opposed to cloud-only for Redshift -- and offers pause/resume functionality and compatibility with true SQL queries, the company said. Redshift has no support for indexes, SQL UDFs, stored procedures or constraints, Microsoft said.

Enterprises can use the new offering in conjunction with other Microsoft data tools such as Power BI, Azure Machine Learning, Azure HDInsight and Azure Data Factory.

Speaking of other data offerings, the Azure Data Lake repository for Big Data analytics project workloads provides one system for storing structured or unstructured data in native formats. It follows the trend -- disparaged by some analysts -- pioneered by companies such as Pivotal Software Inc. and its Business Data Lake. It can work with the Hadoop Distributed File System (HDFS) so it can be integrated with a range of other tools in the Hadoop/Big Data ecosystem, including Cloudera and Hortonworks Hadoop distributions and Microsoft's own Azure HDInsight and Azure Machine Learning.

For straight SQL-based analytics, Microsoft introduced the concept of elastic databases for Azure SQL Database, its cloud-based SQL Database-as-a-Service (DBaaS) offering. Azure SQL Database elastic databases reportedly provide one pool to help enterprises manage multiple databases and provision services as needed.

The elastic database pools let enterprises pay for all database usage at once and facilitate the running of centralized queries and reports across all data stores. The elastic databases support full-text search, column-level access rights and instant encryption of data. They "allow ISVs and software-as-a-service developers to pool capacity across thousands of databases, enabling them to benefit from efficient resource consumption and the best price and performance in the public cloud," Microsoft said in a news release.
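
Microsoft didn't publish the full elastic query DDL with the announcement, but the idea -- one database in the pool querying another's tables as if they were local -- might look something like this sketch (all names are hypothetical, and the syntax is a guess based on the external-table pattern used elsewhere in SQL Server):

    -- In the "head" database: register a remote Azure SQL Database
    -- (assumes a database-scoped credential named RemoteDbCred already exists).
    CREATE EXTERNAL DATA SOURCE RemoteOrdersDb
    WITH (TYPE = RDBMS,
          LOCATION = 'myserver.database.windows.net',
          DATABASE_NAME = 'OrdersDb',
          CREDENTIAL = RemoteDbCred);

    -- Expose one of its tables locally...
    CREATE EXTERNAL TABLE dbo.Orders (
        OrderId    INT,
        CustomerId INT
    )
    WITH (DATA_SOURCE = RemoteOrdersDb);

    -- ...and run a centralized report across the pool.
    SELECT CustomerId, COUNT(*) AS OrderCount
    FROM dbo.Orders
    GROUP BY CustomerId;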

Posted by David Ramel on 04/29/2015 at 12:26 PM


Survey: SQL Server and Oracle Holding Off Unstructured Big Data

Despite all the publicity around Big Data and Apache Hadoop, a new database deployment survey indicates traditional, structured relational database management systems (RDBMS) still reign among enterprises, with SQL Server dueling Oracle for the overall lead.

Also, the new survey commissioned by Dell Software shows the use of traditional structured data is growing even faster than unstructured data, so the findings aren't just an example of a larger installed user base being eroded by upstart technologies.

"Although the growth of unstructured data has garnered most of the attention, Dell's survey shows structured data growing at an even faster rate," the company said in a statement. "While more than one-third of respondents indicated that structured data is growing at a rate of 25 percent or more annually, fewer than 30 percent of respondents said the same about their unstructured data."

Dell commissioned Unisphere Research to poll some 300 database administrators and other corporate data staffers for a report titled "The Real World of the Database Administrator."

What Database Brands Do You Have Running in Your Organization? (source: Dell Software)

"Although advancements in the ability to capture, store, retrieve and analyze new forms of unstructured data have garnered significant attention, the Dell survey indicates that most organizations continue to focus primarily on managing structured data, and will do so for the foreseeable future," the company said.

In fact, Dell said, more than two-thirds of enterprises reported that structured data constitutes 75 percent of the data being managed, while almost one-third of organizations reported not managing unstructured data at all -- yet.

There are many indications that organizations will widely adopt the new technologies, Dell said, as they need to support new analytical use cases.

But for the present, some 78 percent of respondents reported they were running mission-critical data on the Oracle RDBMS, closely followed by Microsoft SQL Server at about 72 percent. After MySQL and IBM DB2, the first NoSQL database to crack the most-used list is MongoDB.

Also, the survey stated, "Clearly, traditional RDBMSs shoulder the lion's share of data management in most organizations. And since more than 85 percent of the respondents are running Microsoft SQL Server and about 80 percent use Oracle, the evidence is clear that most companies support two or more DBMS brands."

Looking to the future, Dell highlighted two specific indicators of the growing dependence on NoSQL databases, especially in large organizations:

  • Approximately 70 percent of respondents using MongoDB are running more than 100 databases, 30 percent are running more than 500 databases, and nearly 60 percent work for companies with more than 5,000 employees.
  • Similarly, 60 percent of respondents currently using Hadoop are running more than 100 databases, 45 percent are running more than 500 databases, and approximately two-thirds work for companies with more than 1,000 employees.

But despite the big hype and big plans on the part of big companies, the survey indicated that Big Data isn't making quite as big an impact on enterprises as might be expected.

Instead, even newer upstart technologies are seen as being more disruptive.

"Most enterprises believe that more familiar 'new' technologies such as virtualization and cloud computing will have more impact on their organization over the next several years than 'newer' emerging technologies such as Hadoop," the survey states. "In fact, Hadoop and NoSQL do not factor into many companies' plans over the next few years."

Posted by David Ramel on 04/21/2015 at 9:07 AM


SQL Is Popular, Pays Well, Big New Survey Reveals

SQL skills pay well and the technology is among the most popular as indicated by a big new developer survey from Stack Overflow, which tracked everything from caffeine consumption to indentation preferences.

While Objective-C was reported as the most lucrative technology to learn -- garnering an average salary of $98,828 in the U.S. -- SQL wasn't far behind, coming in at No. 5 on the list with an average reported salary of $91,431 in the U.S.

In terms of popularity, SQL was at No. 2 in the rankings, listed by 48 percent of respondents, second only to the ubiquitous JavaScript, listed by 54.4 percent of respondents.

These results are in line with previous such surveys from a few years ago, showing SQL isn't losing much ground in the technology wars.

"These results are not unbiased," Stack Overflow warned about the new survey. "Like the results of any survey, they are skewed by selection bias, language bias, and probably a few other biases. So take this for what it is: the most comprehensive developer survey ever conducted. Or at least the only one that asks devs about tabs vs. spaces."

Stack Overflow, in case you didn't know, is the go-to place for coders to get help with their problems. The site reports some 32 million monthly visitors.

Using this unique status, the site polled 26,086 people from 157 countries in February, posing a list of 45 questions.

One of the key areas of inquiry concerned salaries, of course. The survey found that behind Objective-C, the most lucrative skills were Node.js, C#, C++ and SQL.

Who Makes What (source: Stack Overflow)

But if it's purchasing power you're interested in, Ukraine is tops -- at least according to the metric of how many Big Macs you can buy on your salary.

It also might help to work remotely, as coders who don't have to fight commute traffic earn about 40 percent more than those who never work from home.

On the technology front, Apple's young Swift language was the most loved, Salesforce the most dreaded, and Android the most wanted (devs who aren't developing with the tech but have indicated interest in doing so).

Most Popular Technologies (source: Stack Overflow)

Interestingly, the popular Java programming language didn't make the top 10 list of most loved languages, or most dreaded, though it was in the middle of the pack for most wanted and came in at No. 3 in overall technology popularity.

Other survey highlights include:

  • JavaScript is the most popular technology, followed by SQL, Java, C# and PHP.
  • NotePad++ is the most popular text editor, followed by Sublime Text, Vim, Emacs and Atom.io.
  • 1,900 respondents reported being mobile developers, with 44.6 percent working with Android, 33.4 percent working with iOS and 2.3 percent working with Windows Phone (and 19.8 percent not identifying with any of those).
  • The biggest age group is 25-29, where 28.5 percent of respondents lie. Only 1.9 percent of respondents were over 50.
  • Only 5.8 percent of respondents reported themselves as being female.
  • Most developers (41.8 percent) reported being self-taught, with only 18.4 percent having earned a master's degree in computer science or a related field.
  • Most respondents identified themselves as full-stack developers (6,800). Two reported being farmers.
  • And, oh, by the way, Norwegian developers consume the most caffeinated beverages per day (3.09), and tabs were the more popular indentation technique, preferred by 45 percent of respondents. Spaces were popular with 33.6 percent of respondents.

But there's more to that story.

"Upon closer examination of the data, a trend emerges: Developers increasingly prefer spaces as they gain experience," the survey stated. "Stack Overflow reputation correlates with a preference for spaces, too: users who have 10,000 rep or more prefer spaces to tabs at a ratio of 3 to 1."

I'm a spaces guy myself. How about you?

Posted by David Ramel on 04/14/2015 at 1:06 PM


Analyst: SQL Server Not Enough in Modern Data World

Forrester Research Inc. analyst Boris Evelson said existing approaches to business intelligence (BI) need updating in the modern world, and that converging them with Big Data technologies requires more than traditional DBMSes such as SQL Server can provide.

BI is alive and well in the age of Big Data and will continue to enjoy a thriving market, Evelson said, but the world is constantly changing and more innovation is needed.

"Some of the approaches in earlier-generation BI applications and platforms started to hit a ceiling a few years ago," Evelson said in a blog post today. "For example, SQL and SQL-based database management systems (DBMS), while mature, scalable and robust, are not agile and flexible enough in the modern world where change is the only constant."

But Big Data can provide those agile and flexible alternatives in a convergence with BI. "In order to address some of the limitations of more traditional and established BI technologies, Big Data offers more agile and flexible alternatives to democratize all data, such as NoSQL, among many others," the analyst said.

While the emergence of NoSQL data stores as a necessary replacement for traditional DBMS in some Big Data scenarios is well-known, the research firm's suggested solution is somewhat less obvious.

Forrester proposes using a flexible hub-and-spoke data platform to meld the BI and Big Data worlds, Evelson said in publicizing a new research report titled, "Boost Your Business Insights By Converging Big Data And BI." The research builds on previous themes explored by Forrester, such as a 2013 report that features the hub-and-spoke pattern prominently in a discussion of Big Data patterns.

The new report describes such an architecture as featuring the following components:
  • Hadoop-based data hubs/lakes to store and process the majority of the enterprise data.
  • Data discovery accelerators to help profile and discover definitions and meanings in data sources.
  • Data governance that differentiates the processes you need to perform at the ingest, move, use and monitor stages.
  • BI that becomes one of many spokes of the Hadoop-based data hub.
  • A knowledge management portal to front-end multiple BI spokes.
  • Integrated metadata for data lineage and impact analysis.

Evelson isn't the first Forrester analyst to hint at big changes in the datacenter as Big Data matures and gets integrated with other tools.

"Enterprises that have a more complete data platform story, as well as a vision, are more likely to succeed in the coming years and also have a competitive advantage if they get onto this bandwagon of data platform, which includes Hadoop, Big Data, NoSQL as well as traditional databases -- all integrated," Forrester analyst Noel Yuhanna told ADTMag last year. "Because that's where you see customers that are more successful, having all those data types together and managed together and provided together in a manner that will be helpful for businesses to operate."

That theme is echoed in the new research, which identifies three key layers upon which the hub-and-spoke system should be based. The three layers are labeled cold, warm and hot, expressing the trade-off between analytic speed and power on one hand and expense on the other. The cold layer holds most enterprise data in the Hadoop framework, which can be slower than databases such as SQL Server but costs less to operate. The warm layer uses a DBMS for somewhat faster queries at a somewhat higher price. The hot layer is for speedy analysis with in-memory tools, where cost might not be as important as the benefits gleaned from real-time, interactive data processing.

"But at the end of the day, while new terms are important to emphasize the need to evolve, change and innovate, what's infinitely more imperative is that both strive to achieve the same goal: transform data into information and insight," Evelson said. "Alas, while many developers are beginning to recognize the synergies and overlaps between BI and Big Data, quite a few still consider and run both in individual silos."

Posted by David Ramel on 03/27/2015 at 12:25 PM


Idera Unveils New SQL Server Suites

Idera Inc. announced new suites for managing SQL Servers, providing performance monitoring, automated maintenance, security features and more.

Known for its performance monitoring solutions, the Houston-based company renamed its primary management suite from SQL Suite to SQL Management Suite and launched three new software packages for specific functionalities.

"These suites combine top Idera products and are designed to help DBAs tackle complementary tasks more effectively as part of their daily job maintaining SQL Servers," the company said in a blog post this week.

The three new offerings are SQL Performance Suite, SQL Maintenance Suite and SQL Security Suite.

The flagship SQL Management Suite includes many of the tools found in the other packages for performance monitoring, alerting and diagnostics; compliance management for monitoring, auditing and alerting of activity and changes to data; security and permissions management; backup and restoration of SQL data; and defragmentation.

Some of the other suites add tools for their specific functionality -- for example, a business intelligence (BI) manager for tracking the health of SQL Server Analysis Services (SSAS), SQL Server Reporting Services (SSRS) and SQL Server Integration Services (SSIS).

"Our customers are responsible for maintaining and managing complex SQL environments and we listen to them to ensure our products consistently meet their performance, compliance and administrative needs," said CEO Randy Jacobs. "With most customers typically purchasing more than one product, we developed the new SQL Suites to improve the buying process and increase solution value for DBAs."

Idera, which offers a 14-day trial of its suites, didn't provide pricing information.

Posted by David Ramel on 03/19/2015 at 9:23 AM


Microsoft Updates PHP Driver for SQL Server

Continuing the company's shift to openness and interoperability, Microsoft yesterday released an updated PHP driver for its flagship Relational Database Management System (RDBMS), SQL Server.

The new driver supports PHP 5.6 and eases working with the Open Database Connectivity (ODBC) interface, described by Wikipedia as "a standard programming language middleware API for accessing database management systems."

"This driver allows developers who use the PHP scripting language to access Microsoft SQL Server and Microsoft Azure SQL Database, and to take advantage of new features implemented in ODBC," Microsoft said in a blog post. "The new version works with Microsoft ODBC Driver 11 or higher."

In January 2013, Microsoft introduced version 11 of its ODBC drivers for SQL Server. "Microsoft is adopting ODBC as the de-facto standard for native access to SQL Server and Windows Azure SQL Database," the company said at the time. "We have provided longstanding support for ODBC on Windows and, in the SQL Server 2012 time frame, released support for ODBC on Linux (Red Hat Enterprise Linux 5 and 6, and SUSE Enterprise Linux)."

Key new features in the Windows version included driver-aware connection pooling, connection resiliency and asynchronous execution (polling method).

Posted by David Ramel on 03/10/2015 at 8:00 AM


Microsoft Azure Data Updates Continue Open Source Trend

One day after Big Data player Pivotal Software Inc. changed its business model by open sourcing core technologies, Microsoft today announced related product updates with a definite open source slant.

The "new and enhanced" data services include an Azure HDInsight preview that runs on Linux, and the general availability of Storm on HDInsight, Azure Machine Learning, and Informatica technology on the Microsoft Azure cloud.

"Just about every interesting innovation that's going on -- in data today, in machine learning and other areas -- has its roots in an open source ecosystem," Pivotal CEO Paul Maritz said yesterday at a live streaming event.

Perhaps an exaggeration, but the underlying meaning was grokked by Microsoft years ago, and the company is in the middle of a swing to openness and interoperability, led by new CEO Satya Nadella and top lieutenants such as T. K. "Ranga" Rengarajan, head of the data platform.

"Azure Machine Learning reflects our support for open source," stated a blog post today authored by Rengarajan and machine learning exec Joseph Sirosh. "The Python programming language is a first-class citizen in Azure Machine Learning Studio, along with R, the popular language of statisticians." Microsoft acquired stewardship of the R language earlier this year.

Data developers can now use the Machine Learning Marketplace to discover appropriate APIs and prebuilt services for common concerns such as recommendation engines, anomaly detection and forecasting.

The open source story continues with Storm for Azure HDInsight. Azure HDInsight is Microsoft's cloud service based on 100 percent Apache Hadoop technology, developed as open source under the Apache Software Foundation.

"Storm is an open source stream analytics platform that can process millions of data 'events' in real time as they are generated by sensors and devices," Microsoft said. "Using Storm with HDInsight, customers can deploy and manage applications for real-time analytics and Internet-of-Things (IoT) scenarios in a few minutes with just a few clicks. We are also making Storm available for both .NET and Java and the ability to develop, deploy and debug real-time Storm applications directly in Visual Studio. That helps developers to be productive in the environments they know best." Microsoft added Storm integration last fall.

Of course, there's nothing more open source than the Linux OS, and Azure HDInsight is now available as a preview project running on Ubuntu clusters. Ubuntu is a popular Linux distribution, described by Microsoft as "the leading scale-out Linux."

Adding Linux support in addition to Windows "is particularly compelling for people that already use Hadoop on Linux on-premises like on Hortonworks Data Platform, because they can use common Linux tools, documentation, and templates and extend their deployment to Azure with hybrid cloud connections," Microsoft said.

Also, to increase customer options for leveraging technology from Microsoft partners, the Redmond software giant announced that Informatica data integration technology will be available in the Azure Marketplace.

"Today, Informatica is announcing the availability of its Cloud Integration Secure Agent on Microsoft Azure and Linux Virtual Machines as well as an Informatica Cloud Connector for Microsoft Azure Storage," Informatica exec Ronen Schwartz said in a blog post today. "Users of Azure data services such as Azure HDInsight, Azure Machine Learning and Azure Data Factory can make their data work with access to the broadest set of data sources including on-premises applications, databases, cloud applications and social data."

All the Microsoft news comes during the Strata + Hadoop World conference underway in San Jose, Calif.

"These new services are part of our continued investment in a broad portfolio of solutions to unlock insights from data," Microsoft said. "They can help businesses dramatically improve their performance, enable governments to better serve their citizenry, or accelerate new advancements in science. Our goal is to make Big Data technology simpler and more accessible to the greatest number of people possible: Big Data pros, data scientists and app developers, but also everyday businesspeople and IT managers."

Posted by David Ramel on 02/18/2015 at 10:27 AM

