Amazon Web Services Inc. (AWS) unveiled a new Microsoft SQL Server Enterprise Edition offering for the Amazon Elastic Compute Cloud (EC2).
A blog post authored by exec Jeff Barr yesterday said the new, pre-configured Amazon Machine Image (AMI) improves upon the Standard Edition by adding more computing power and memory. Standard allows for using up to 16 cores and 128 GiB of memory, while Enterprise can go up to 32 cores and 244 GiB of memory available in an extra-large instance.
The offering includes SQL Server 2012 Enterprise Edition and SQL Server 2014 Enterprise Edition, available in several regions, as explained in the AWS Marketplace.
Barr highlighted the following new and unique features of the offering:
- High availability lets users configure a primary database and up to four active, readable secondary databases into an Always-On availability group.
- Self-service business intelligence through Power View, used to interactively explore and visualize data.
- Data quality services let organizational and third-party reference data be used to profile, cleanse and match your own data.
- Online change functionality lets users restore files and file groups, alter schemas and make indexing changes while a database remains online.
"You can run the AMI on-demand or you can purchase an EC2 Reserved Instance with a one- or three-year term," Barr said.
In another data-related move, AWS on the same day announced it had added support for the enormously popular Apache Spark project to its Amazon Elastic MapReduce (Amazon EMR) service. "Amazon EMR is a Web service that makes it easy for you to process and analyze vast amounts of data using applications in the Hadoop ecosystem, including Hive, Pig, HBase, Presto, Impala and others," the company said. "We're delighted to officially add Spark to this list. Although many customers have previously been installing Spark using custom scripts, you can now launch an Amazon EMR cluster with Spark directly from the Amazon EMR Console, CLI or API."
Spark is an open-source, distributed processing framework often used for Big Data workloads. It leverages in-memory caching and optimized execution to boost performance over older Hadoop ecosystem components such as MapReduce, supporting general batch processing, streaming Big Data analytics, machine learning, graph databases, and interactive, ad hoc queries, according to the AWS Spark page.
Among Spark's many components is Spark SQL for low-latency, interactive SQL queries.
Posted by David Ramel on 06/17/2015 at 12:11 PM
With prior partnerships in place with Hortonworks Inc. and Cloudera Inc., Microsoft has now teamed up with MapR Technologies Inc., the final member of the "big three" distributors of Apache Hadoop-based software.
"Today, we are excited to announce that MapR will also be available in the summer as an option for customers to deploy Hadoop from the Azure Marketplace," Microsoft exec T.K. "Ranga" Rengarajan said in a blog post yesterday.
Hortonworks had been the primary Hadoop partner in the Azure cloud. The companies teamed up in 2011 to eventually offer the Azure-based HDInsight service, featuring the Hortonworks Data Platform (HDP) as the Hadoop distribution providing the software foundation. Hortonworks also developed the Hortonworks Data Platform for Windows, letting Windows users in on the traditionally Linux-based Hadoop ecosystem. HDP is also available as a virtual machine (VM) option in the Azure Marketplace.
Microsoft then added Cloudera to the Azure mix in October 2013, putting Cloudera Enterprise in the Azure Marketplace as another Hadoop-based option.
Now, sometime this summer, the MapR Hadoop-based distribution will join the Azure Hadoop party.
"MapR is a leader in the Hadoop community that offers the MapR Distribution including Hadoop, which includes MapR-FS, an HDFS and POSIX compliant file store, and MapR-DB, a NoSQL key value store," Rengarajan said. "The distribution also includes core Hadoop projects such as Hive, Impala, SparkSQL, and Drill, and MapR Control System, a comprehensive management system."
MapR said its distribution on the Azure Marketplace will let users:
- Deploy MapR directly from the Azure Marketplace.
- Transfer data between MapR and Microsoft SQL Server services within Azure.
- Deploy MapR-DB, the MapR in-Hadoop NoSQL database, to support a wide variety of real-time use cases and deployment scenarios.
"As part of this agreement, MapR will fully support deployment of its top-ranked NoSQL database, MapR-DB on Azure," the company said yesterday in a news release. "MapR-DB offers advanced operational features such as multi-master table replication, where business users can analyze data across geographic regions while maintaining low latency and automatic synchronization with a centralized table for analytics and BI."
MapR also yesterday announced that a new version of its distribution, MapR 5.0, will be available in 30 days. "The latest MapR release auto synchronizes storage, database and search indices to support complex, real-time applications to increase revenue, reduce operational costs and mitigate risk," the company said. "MapR 5.0 also includes comprehensive security auditing, Apache Drill support, and the latest Hadoop 2.7 and YARN features."
Rengarajan noted that the Hortonworks distribution was also just updated, to HDP 2.3. That, he said, will also be available in the Azure Marketplace this summer.
Posted by David Ramel on 06/11/2015 at 11:59 AM
Well, Microsoft lied to us. They didn't provide the public preview of SQL Server 2016 "this summer" as promised -- they delivered today.
"The first public preview of SQL Server 2016 is now available for download," the SQL Server 2016 Preview page says today. "It is the biggest leap forward in Microsoft's data platform history with real-time operational analytics, rich visualizations on mobile devices, built-in advanced analytics, new advanced security technology, and new hybrid cloud scenarios."
The new edition of the company's flagship relational database features enhanced in-memory performance, Always Encrypted security technology developed in the company's research unit, "stretch databases" that move data back and forth to the cloud when needed, built-in advanced analytics and much more. You can read more about the new release in my earlier report.
Company exec T.K. "Ranga" Rengarajan provided more details in a blog post today. "Unique in this release of SQL Server, we are bringing capabilities to the cloud first in Microsoft Azure SQL Database such as Row-Level Security and Dynamic Data Masking and then bringing the capabilities, as well as the learnings from running these at hyper-scale in Microsoft Azure, back to SQL Server to deliver proven features at scale to our on-premises offering," he said. "This means all our customers benefit from our investments and learnings in Azure."
Row-Level Security lets administrators control access to data based on user characteristics. Security is implemented inside a database, so no modifications are required to an application. With Dynamic Data Masking, real-time data obfuscation is supported so data requesters can't access unauthorized data. Rengarajan said this helps protect sensitive data even when it's not encrypted.
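Both features enforce policy inside the data layer rather than in application code. The Python sketch below models that idea in memory only: it is not T-SQL and not how SQL Server implements either feature, and the table, the region-based filter predicate and the SSN mask format are all hypothetical examples. Rows that fail the predicate simply disappear from results, and masking is applied to an output copy, so the stored values never change.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class User:
    name: str
    region: str
    unmasked: bool  # hypothetical stand-in for a permission to see raw values

# Hypothetical table: each row maps column names to values.
ORDERS: List[dict] = [
    {"id": 1, "region": "east", "customer": "Ann", "ssn": "123-45-6789"},
    {"id": 2, "region": "west", "customer": "Bob", "ssn": "987-65-4321"},
    {"id": 3, "region": "east", "customer": "Cy", "ssn": "555-12-3456"},
]

def region_predicate(user: User, row: dict) -> bool:
    """Row-level filter: users see only rows tagged with their own region."""
    return row["region"] == user.region

def mask_ssn(value: str) -> str:
    """Partial mask that exposes only the last four characters."""
    return "XXX-XX-" + value[-4:]

def run_query(user: User, table: List[dict],
              predicate: Callable[[User, dict], bool],
              masks: Dict[str, Callable[[str], str]]) -> List[dict]:
    """Apply the row filter, then mask columns on the way out; stored rows are untouched."""
    result = []
    for row in table:
        if not predicate(user, row):
            continue  # filtered rows are silently omitted, not reported as errors
        out = dict(row)  # copy, so masking never alters the stored row
        if not user.unmasked:
            for column, mask in masks.items():
                out[column] = mask(out[column])
        result.append(out)
    return result

analyst = User("analyst", "east", unmasked=False)
for row in run_query(analyst, ORDERS, region_predicate, {"ssn": mask_ssn}):
    print(row["id"], row["ssn"])
```

Running it prints rows 1 and 3 with only the last four SSN digits visible; a user created with unmasked=True would see the raw values, but still only for their own region.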
Rengarajan highlighted the following additional capabilities in the Community Technology Preview 2:
- PolyBase -- More easily manage relational and non-relational data with the simplicity of T-SQL.
- Native JSON support -- Allows easy parsing and storing of JSON and exporting relational data to JSON.
- Temporal Database support -- Tracks the full history of changes to data over time.
- Query Data Store -- Acts as a flight data recorder for a database, giving full history of query execution so DBAs can pinpoint expensive/regressed queries and tune query performance.
- MDS enhancements -- Offer enhanced server management capabilities for Master Data Services.
- Enhanced hybrid backup to Azure -- Enables faster backups to Microsoft Azure and faster restores to SQL Server in Azure Virtual Machines. Also, you can stage backups on-premises prior to uploading to Azure.
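The idea behind temporal support can be sketched in a few lines of Python. This is a toy model under stated assumptions, not SQL Server's implementation: timestamps are integers from a hypothetical, strictly increasing clock, the class and method names are made up for illustration, and deletes aren't modeled. An update closes the current version's validity interval instead of overwriting it, which is what makes an "as of" lookup possible.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

OPEN_ENDED = float("inf")  # marks the version that is still current

@dataclass
class Version:
    value: str
    valid_from: int
    valid_to: float = OPEN_ENDED

@dataclass
class TemporalTable:
    """Toy model of a versioned table: updates close the old version instead of overwriting it."""
    versions: Dict[str, List[Version]] = field(default_factory=dict)

    def upsert(self, key: str, value: str, ts: int) -> None:
        history = self.versions.setdefault(key, [])
        if history and history[-1].valid_to == OPEN_ENDED:
            history[-1].valid_to = ts  # the previous version stops being valid at ts
        history.append(Version(value, ts))

    def current(self, key: str) -> Optional[str]:
        history = self.versions.get(key, [])
        if history and history[-1].valid_to == OPEN_ENDED:
            return history[-1].value
        return None

    def as_of(self, key: str, ts: int) -> Optional[str]:
        """Return the value valid at time ts, using half-open intervals [valid_from, valid_to)."""
        for version in self.versions.get(key, []):
            if version.valid_from <= ts < version.valid_to:
                return version.value
        return None

prices = TemporalTable()
prices.upsert("widget", "9.99", ts=1)
prices.upsert("widget", "12.49", ts=5)
print(prices.as_of("widget", 3), prices.current("widget"))
```

The half-open intervals mean exactly one version answers for any moment after the first write, and nothing answers for a moment before it.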
He also said ongoing preview updates were coming soon. "New with SQL Server 2016, customers will have the opportunity to receive more frequent updates to their preview to help accelerate internal development and test efforts," Rengarajan said. "Instead of waiting for CTP3, customers can choose to download periodic updates to CTP2 gaining access to new capabilities and features as soon as they are available for testing. More details will be shared when the first preview update is available."
He didn't say when that would be. Just as well, as the company seems to have trouble following its promised timelines.
Posted by David Ramel on 05/27/2015 at 1:23 PM
After stewarding the open source project from incubation to its new 1.0 release, MapR Technologies Inc. added Apache Drill for SQL-based Big Data analytics to its Apache Hadoop distribution.
The company -- one of the "big three" Hadoop vendors along with Hortonworks Inc. and Cloudera Inc. -- this week announced the general availability of the open source Apache Drill 1.0 and its inclusion in the MapR Hadoop distribution.
Drill is a low-latency query engine based on ANSI SQL standards that facilitates self-service, interactive analytics at Big Data scales, including up to petabyte scale (1 PB is equal to 1 million GB). One of its key features is that it doesn't depend on traditional database schemas that describe how data is categorized. Discovering such schemas on the fly makes for quicker analytics, the company said.
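Discovering structure while reading, rather than requiring a declared schema first, can be illustrated with a short Python sketch. It is not Drill's algorithm; it simply walks hypothetical JSON records, flattens nested objects into dotted paths and collects the types observed for each field. Arrays are treated as opaque leaf values to keep it short.

```python
import json
from collections import defaultdict
from typing import Dict, Iterable, List, Set

def _walk(value, path: str, schema: Dict[str, Set[str]]) -> None:
    """Record the type of every leaf, flattening nested objects into dotted paths."""
    if isinstance(value, dict):
        for key, child in value.items():
            _walk(child, f"{path}.{key}" if path else key, schema)
    else:
        schema[path].add(type(value).__name__)

def infer_schema(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Discover fields and their observed types while reading; nothing is declared up front."""
    schema: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        _walk(json.loads(line), "", schema)
    return {path: sorted(types) for path, types in schema.items()}

# Hypothetical sensor records whose shape drifts from one record to the next.
records = [
    '{"id": 1, "sensor": {"temp": 21.5}}',
    '{"id": 2, "sensor": {"temp": 22, "unit": "C"}, "tag": "roof"}',
]
print(infer_schema(records))
```

The sensor.temp field comes back as ['float', 'int'] because the two records disagree, the kind of type drift a schema-on-read engine has to tolerate.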
MapR engineers including Jacques Nadeau and Steven Phillips have taken the lead on the open source project, which was incubated at the Apache Software Foundation (ASF) in September 2012 with the goal of wedding the familiar workings of relational databases with the huge new scalability demanded by the Big Data era and the agility of Hadoop systems and their heavy use of NoSQL databases.
"The project has been on the fast track in the last nine months since the developer preview in August 2014, delivering seven significant iterative releases, each adding exciting new features and most importantly, improving on the stability, scale and performance required for broader enterprise deployments," MapR exec Neeraja Rentachintala said in a blog post Tuesday.
In addition to SQL queries, the tool can work with varying types of data, including files, NoSQL databases and complex, nested data formats such as JSON and Parquet.
"Drill enables interactivity with data from both legacy transactional systems and new data sources, such as Internet of Things (IOT) sensors, Web click-streams and other semi-structured data, along with support for popular business intelligence (BI) and data visualization tools," MapR said in a news release. "Drill provides reliability and performance at Hadoop scale with integrated granular security and governance capabilities required for multi-tenant data lakes or enterprise data hubs."
Upcoming features planned for future editions of Drill include more functionality centered on JSON, SQL, complex data functions and new file formats, Rentachintala said.
Posted by David Ramel on 05/22/2015 at 5:44 AM
Syncfusion Inc. has updated its Big Data Platform, unique for its claim to be "the one and only Hadoop distribution designed and optimized for Windows" and free even for commercial use.
"We have fine-tuned the entire Big Data Platform experience, from the download to the end result," said exec Daniel Jebaraj in a statement.
A key update to the platform lets developers handle multiple-node Hadoop clusters on Windows. With a point-and-click cluster management tool, developers can create, monitor and otherwise manage multiple-node jobs running in C#, Java, Pig, Hive, Python and Scala.
Syncfusion says developers can create clusters in just minutes, using commodity machines that run Windows 7, Windows Server 2008 or later Windows versions.
"It is still completely free, and typically installs in less than 15 minutes (for a starter cluster) with absolutely no manual configuration," Jebaraj said. "Developers can start with either a 5- or 10-node cluster, and scale as they need in order to grow their business. Between the platform's updates, capabilities, and support options, developers will be able to take their work further than ever."
The company also listed the following features of the new platform:
- Free commercial support for clusters with up to five nodes.
- Optional paid support with service level agreements for larger clusters.
- Unlimited personal commercial support for Syncfusion Plus members.
- A set of C# samples demonstrating use under different scenarios.
- A unique, local, single-node distribution of Hadoop, complete with an interactive development environment and no dependencies outside the Microsoft .NET Framework, facilitating the development and testing of solutions prior to deployment.
In addition to on-premises installations, the company said users can run their own Hadoop clusters on virtual machines supplied by cloud service providers such as Microsoft Azure and Amazon Web Services (AWS), with customization functionality not found in other cloud-based Hadoop services.
The new platform is available now for download.
Posted by David Ramel on 05/20/2015 at 12:35 PM
Native support for JSON in the upcoming SQL Server 2016 was buried among the many goodies announced earlier this month for the flagship RDBMS, but it's clearly an important feature for data developers, who this week got more details on the new functionality.
The item titled "Add native support for JSON to SQL Server, a la XML (as in, FOR JSON or FROM OPENJSON)" is the No. 1 requested feature on the Connect site used to garner feature requests for users of SQL Server and Windows Azure SQL Database.
With more than 1,000 votes and leading other items by more than 140 votes, the item posted more than four years ago reads:
It may have taken a while, but Microsoft -- as it's increasingly doing on many fronts these days -- has listened and responded to its customers. Microsoft's Jovan Popovic wrote a recent blog post with extensive details about the new support.
For one thing, "native support" doesn't mean the same thing as introducing a new native JSON type, as was done for XML.
And instead of getting its own type like XML, JSON will be represented by the existing NVARCHAR type, used for variable-length Unicode strings. Popovic said Microsoft studied the issue and decided to go with the NVARCHAR type for many reasons, involving issues such as migration, cross-feature compatibility and client-side support.
"Our goal is to create simpler but still useful framework for processing JSON documents," Popovic said.
Thus the company is focusing on functionality -- such as export/import and prebuilt functions for JSON processing -- and query optimization, rather than storage.
Not to say that this strategy is set in stone. As I said, Microsoft is really listening to customers these days, and things could change.
"We know that PostgreSQL has a native type and JSONB support, but in this version we want to focus on the other things that are more important (do you want to see SQL Server with native type but without built-in functions that handle JSON -- I don’t think so :) )," Popovic said. "However, we are open for suggestions and if you believe the native type will help, you can create request on connect site so we can discuss it there."
(Some readers didn't even wait to open a Connect item. "This seems like a massively crippled feature and until there is native support I don't believe it offers anything even remotely similar to PostgreSQL," a reader named Phillip wrote in a blog comment.)
Anyway, Popovic explains the nitty-gritty details on the company's initial focus for JSON functionality, detailing how to export JSON with the FOR JSON clause (as was suggested in the original Connect request), how to transform JSON text to relational tables with the OPENJSON function and so on.
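The two directions Popovic describes, exporting relational rows as JSON and shredding JSON text back into rows, can be mirrored in Python. The function names here are hypothetical and the code is only an analogy for what FOR JSON and OPENJSON do, not their syntax or detailed behavior. Note that the JSON travels as an ordinary str, paralleling the decision to store JSON in NVARCHAR rather than in a dedicated type.

```python
import json
from typing import List, Optional, Sequence, Tuple

def rows_to_json(columns: Sequence[str], rows: Sequence[tuple]) -> str:
    """Export: format relational rows as a JSON array held in an ordinary string."""
    return json.dumps([dict(zip(columns, row)) for row in rows])

def json_to_rows(text: str, columns: Sequence[str]) -> List[Tuple[Optional[object], ...]]:
    """Import: shred a JSON array back into tuples; absent properties become None (NULL)."""
    return [tuple(obj.get(col) for col in columns) for obj in json.loads(text)]

doc = rows_to_json(["id", "name"], [(1, "Ann"), (2, "Bob")])
print(doc)                                  # [{"id": 1, "name": "Ann"}, {"id": 2, "name": "Bob"}]
print(json_to_rows(doc, ["name", "email"]))  # [('Ann', None), ('Bob', None)]
```

Because the document is plain text, any code that can handle a string can store or transmit it; the cost is that every query has to parse it, which is why the speed of the built-in parser matters.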
"Someone might say -- this will not be fast enough, but we will see," Popovic said. "Built-in JSON parser is the fastest way to process JSON in database layer. You might use CLR type or CLR parsers as external assemblies, but this will not be better than the native code that parses JSON."
Popovic said the JSON functionality will be rolled out over time in the SQL Server 2016 previews. SQL Server 2016 CTP2 is planned to include the ability to format and export data as JSON strings, while SQL Server 2016 CTP3 is expected to incorporate the ability to load JSON text into tables, extract values from JSON text, index properties in JSON text stored in columns, and more, he said.
The SQL Server team will be publishing more details about the huge new release of SQL Server 2016 as the days count down to the first public preview, expected this summer.
If you can't wait, those wacky wonks on Hacker News managed to stay somewhat on-topic in a discussion about the new JSON support, and noted database expert Aaron Bertrand goes into really extensive detail in a blog post on the SQL Sentry site.
Posted by David Ramel on 05/19/2015 at 10:33 AM
Fresh from last week's Build developer conference in San Francisco, Microsoft executives appeared at the company's first-ever Ignite conference in Chicago and provided more details about the company's new Azure SQL Data Warehouse.
The company yesterday demoed the first "sneak peek" at the new "elastic data warehouse in the cloud," and today exec Tiffany Wissner penned a blog post to highlight specific functionalities.
Wissner explained how Azure SQL Data Warehouse expands upon the company's flagship relational database management system (RDBMS), SQL Server.
"Azure SQL Data Warehouse is a combination of enterprise-grade SQL Server augmented with the massively parallel processing architecture of the Analytics Platform System (APS), which allows the SQL Data Warehouse service to scale across very large datasets," Wissner said. "It integrates with existing Azure data tools including Power BI for data visualization, Azure Machine Learning for advanced analytics, Azure Data Factory for data orchestration and movement as well as Azure HDInsight, our 100 percent Apache Hadoop service for Big Data processing."
Just over a year ago, Microsoft introduced the APS as a physical appliance wedding its SQL Server Parallel Data Warehouse (PDW) with HDInsight. APS is described by Microsoft as "the evolution of the PDW product that now supports the ability to query across the traditional relational data warehouse and data stored in a Hadoop region -- either in the appliance or in a separate Hadoop cluster."
Wissner touted the pervasiveness of SQL Server as a selling point of the new solution, as enterprises can leverage developer skills and knowledge acquired from its years of everyday use.
"The SQL Data Warehouse extends the T-SQL constructs you're already familiar with to create indexes, partitions, functions and procedures which allows you to easily migrate to the cloud," Wissner said. "With native integrations to Azure Data Factory, Azure Machine Learning and Power BI, customers are able to quickly ingest data, utilize learning algorithms, and visualize data born either in the cloud or on-premises."
PolyBase in the cloud is another attractive feature, Wissner said. PolyBase, one of many emerging SQL-on-Hadoop solutions, was introduced with PDW in 2013 to integrate data stored in the Hadoop Distributed File System (HDFS) with SQL Server.
"SQL Data Warehouse can query unstructured and semi-structured data stored in Azure Storage, Hortonworks Data Platform, or Cloudera using familiar T-SQL skills making it easy to combine data sets no matter where it is stored," Wissner said. "Other vendors follow the traditional data warehouse model that requires data to be moved into the instance to be accessible."
Although Wissner didn't identify any of those "other vendors," Microsoft took pains to position Azure SQL Data Warehouse as an improvement upon the Redshift cloud database offered by Amazon Web Services Inc. (AWS), which Microsoft is challenging for public cloud supremacy.
One of the advantages Microsoft sees in its product over Redshift is the ability to save costs by pausing cloud compute instances. This was mentioned at Build and echoed today by Wissner.
"Dynamic pause enables customers to optimize the utilization of the compute infrastructure by ramping down compute while persisting the data and eliminating the need to backup and restore," Wissner said. "With other cloud vendors, customers are required to back up the data, delete the existing cluster, and, upon resume, generate a new cluster and restore data. This is both time consuming and complex for scenarios such as data marts or departmental data warehouses that need variable compute power."
Again parroting the Build message, Wissner also discussed Azure SQL Data Warehouse's ability to separate compute and storage services, scaling them independently up and down immediately as needed.
"With SQL Data Warehouse you are able to quickly move to the cloud without having to move all of your infrastructure along with it," Wissner concluded. "With the Analytics Platform System, Microsoft Azure and Azure SQL Data Warehouse, you can have the data warehouse solution you need on-premises, in the cloud or a hybrid solution."
Users interested in trying out the new offering, expected to hit general availability this summer, can sign up to be notified when that happens.
Posted by David Ramel on 05/05/2015 at 12:04 PM
Forget all that techy Azure Big Data stuff -- Microsoft found a new way to put databases to work that's really interesting: guessing your age from your photo.
Threatening to upstage all the groundbreaking announcements at the Build conference is a Web site where you provide a photo and Microsoft's magical machinery consults a database of face photos to guess the age of the subjects.
Tell me you didn't (or won't) visit How-Old.net (How Old Do I Look?) and provide your own photo, hoping the Azure API would say you look 10 years younger than you are?
I certainly did. But it couldn't find my face (I was wearing a bicycle helmet in semi-profile), and then I had to get back to work. But you can bet I'll be back. So will you, right?
(Unless you're one of those fine-print privacy nuts.)
Why couldn't Ballmer come up with stuff like this? Could there be a better example of how this isn't your father's Microsoft anymore?
Microsoft machine learning (ML) engineers Corom Thompson and Santosh Balasubramanian explained in a Wednesday blog post how they were fooling around with the company's new face-recognition APIs. They sent out a bunch of e-mails to garner perhaps 50 testers.
"We were shocked," they said. "Within a few hours, over 35,000 users had hit the page from all over the world (about 29k of them from Turkey, as it turned out -- apparently there were a bunch of tweets from Turkey mentioning this page). What a great example of people having fun thanks to the power of ML!"
They said it took just a day to wire the solution up, listing the following components:
- Extracting the gender and age of the people in the pictures.
- Obtaining real-time insights on the data extracted.
- Creating real-time dashboards to view the results.
Their blog post gives all the details about the tools used and their implementation, complete with code samples. Go read it if you're interested.
Me? It's Friday afternoon and the boss is 3,000 miles away -- I'm finding a better photo of myself and going back to How-Old.net. I'm sure I don't look a day over 29.
In fact, I'll do it now. Hold on.
OK, it says I look seven years older than I am. I won't even give you the number. Stupid damn site, anyway ...
Posted by David Ramel on 05/01/2015 at 1:00 PM
A new Azure SQL Data Warehouse preview offered as a counter to Amazon's Redshift headed several data-related announcements at the opening of the Microsoft Build conference today.
Also being announced were Azure Data Lake and "elastic databases" for Azure SQL Database, further demonstrating the company's focus on helping customers implement and support a "data culture" in which analytics are used for everyday business decisions.
"The data announcements are interesting because they show an evolution of the SQL Server technology towards a cloud-first approach," IDC analyst Al Hilwa told this site. "A lot of these capabilities like elastic query are geared for cloud approaches, but Microsoft will differentiate from Amazon by also offering them for on-premises deployment. Other capabilities like Data Lake, elastic databases and Data Warehouse are focused on larger data sets that are typically born in the cloud. The volumes of data supported here builds on Microsoft's persistent investments in datacenters."
Azure SQL Data Warehouse will be available as a preview in June, Microsoft announced during the Build opening keynote. It was designed to provide petabyte-scale data warehousing as a service that can elastically scale to suit business needs. In comparison, the Amazon Web Services Inc. (AWS) Redshift -- unveiled more than two years ago -- is described as "a fast, fully managed, petabyte-scale data warehouse solution that makes it simple and cost-effective to efficiently analyze all your data using your existing business intelligence tools."
Microsoft pointed out what it said are numerous advantages that Azure SQL Data Warehouse provides over AWS Redshift, such as the ability to independently adjust compute and storage, as opposed to Redshift's fixed compute/storage ratio. Concerning elasticity, Microsoft described its new service as "the industry’s first enterprise-class cloud data warehouse as a service that can grow, shrink and pause in seconds," while it could take hours or days to resize a Redshift service. Azure SQL Data Warehouse also comes with a hybrid configuration option for hosting in the Azure cloud or on-premises -- as opposed to cloud-only for Redshift -- and offers pause/resume functionality and compatibility with true SQL queries, the company said. Redshift has no support for indexes, SQL UDFs, stored procedures or constraints, Microsoft said.
Enterprises can use the new offering in conjunction with other Microsoft data tools such as Power BI, Azure Machine Learning, Azure HDInsight and Azure Data Factory.
Speaking of other data offerings, the Azure Data Lake repository for Big Data analytics project workloads provides one system for storing structured or unstructured data in native formats. It follows the trend -- disparaged by some analysts -- pioneered by companies such as Pivotal Software Inc. and its Business Data Lake. It can work with the Hadoop Distributed File System (HDFS) so it can be integrated with a range of other tools in the Hadoop/Big Data ecosystem, including Cloudera and Hortonworks Hadoop distributions and Microsoft's own Azure HDInsight and Azure Machine Learning.
For straight SQL-based analytics, Microsoft introduced the concept of elastic databases for Azure SQL Database, its cloud-based SQL Database-as-a-Service (DBaaS) offering. Azure SQL Database elastic databases reportedly provide one pool to help enterprises manage multiple databases and provision services as needed.
The elastic database pools let enterprises pay for all database usage at once and facilitate the running of centralized queries and reports across all data stores. The elastic databases support full-text search, column-level access rights and instant encryption of data. They "allow ISVs and software-as-a-service developers to pool capacity across thousands of databases, enabling them to benefit from efficient resource consumption and the best price and performance in the public cloud," Microsoft said in a news release.
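The efficiency argument for pooling rests on the fact that databases rarely peak at the same time. A minimal Python sketch, using made-up hourly load figures for three hypothetical tenant databases, compares sizing each database for its own peak against sizing one shared pool for the busiest combined hour. It assumes every series covers the same hours and says nothing about how Azure actually meters or allocates pooled capacity.

```python
from typing import Dict, List, Tuple

def provisioning(usage_by_db: Dict[str, List[int]]) -> Tuple[int, int]:
    """Return (capacity if each database is sized for its own peak, capacity for one shared pool)."""
    separate = sum(max(series) for series in usage_by_db.values())
    # zip(*...) lines the series up hour by hour, so the pool is sized for the busiest combined hour.
    pooled = max(sum(hour) for hour in zip(*usage_by_db.values()))
    return separate, pooled

# Illustrative hourly load (arbitrary units) for tenants whose peaks don't coincide.
usage = {
    "tenant_a": [10, 80, 10, 10],
    "tenant_b": [10, 10, 80, 10],
    "tenant_c": [70, 10, 10, 10],
}
print(provisioning(usage))  # (230, 100)
```

With these illustrative numbers, separate provisioning needs 230 units while the pool needs 100; real savings depend entirely on how correlated the workloads are, and tenants that all peak together gain nothing.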
Posted by David Ramel on 04/29/2015 at 12:26 PM
Despite all the publicity around Big Data and Apache Hadoop, a new database deployment survey indicates traditional, structured relational database management systems (RDBMS) still reign among enterprises, with SQL Server dueling Oracle for the overall lead.
Also, the new survey commissioned by Dell Software shows the use of traditional structured data is growing even faster than unstructured data, so the findings aren't just an example of a larger installed user base being eroded by upstart technologies.
"Although the growth of unstructured data has garnered most of the attention, Dell's survey shows structured data growing at an even faster rate," the company said in a statement. "While more than one-third of respondents indicated that structured data is growing at a rate of 25 percent or more annually, fewer than 30 percent of respondents said the same about their unstructured data."
Dell commissioned Unisphere Research to poll some 300 database administrators and other corporate data staffers in a report titled "The Real World of the Database Administrator."
"Although advancements in the ability to capture, store, retrieve and analyze new forms of unstructured data have garnered significant attention, the Dell survey indicates that most organizations continue to focus primarily on managing structured data, and will do so for the foreseeable future," the company said.
In fact, Dell said, more than two-thirds of enterprises reported that structured data constitutes 75 percent of the data being managed, while almost one-third of organizations reported not managing unstructured data at all -- yet.
There are many indications that organizations will widely adopt the new technologies, Dell said, as they need to support new analytical use cases.
But for the present, some 78 percent of respondents reported they were running mission-critical data on the Oracle RDBMS, closely followed by Microsoft SQL Server at about 72 percent. After MySQL and IBM DB2, the first NoSQL database to crack the most-used list is MongoDB.
Also, the survey stated, "Clearly, traditional RDBMSs shoulder the lion's share of data management in most organizations. And since more than 85 percent of the respondents are running Microsoft SQL Server and about 80 percent use Oracle, the evidence is clear that most companies support two or more DBMS brands."
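The survey's conclusion about multiple brands follows from inclusion-exclusion. Writing N for all respondents, S for those running SQL Server and O for those running Oracle, and treating the approximate percentages as exact (an assumption, since the survey says "more than 85" and "about 80"):

```latex
|S \cup O| = |S| + |O| - |S \cap O| \le N
\;\Longrightarrow\;
\frac{|S \cap O|}{N} \;\ge\; \frac{|S|}{N} + \frac{|O|}{N} - 1 \;\approx\; 0.85 + 0.80 - 1 = 0.65
```

Because the number of respondents running either product can't exceed N, at least roughly 65 percent of them must be running both.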
Looking to the future, Dell highlighted two specific indicators of the growing dependence on NoSQL databases, especially in large organizations:
- Approximately 70 percent of respondents using MongoDB are running more than 100 databases, 30 percent are running more than 500 databases, and nearly 60 percent work for companies with more than 5,000 employees.
- Similarly, 60 percent of respondents currently using Hadoop are running more than 100 databases, 45 percent are running more than 500 databases, and approximately two-thirds work for companies with more than 1,000 employees.
But despite the big hype and big plans on the part of big companies, the survey indicated that Big Data isn't making quite as big an impact on enterprises as might be expected.
Instead, more familiar technologies are expected to have the bigger impact.
"Most enterprises believe that more familiar 'new' technologies such as virtualization and cloud computing will have more impact on their organization over the next several years than 'newer' emerging technologies such as Hadoop," the survey states. "In fact, Hadoop and NoSQL do not factor into many companies' plans over the next few years."
Posted by David Ramel on 04/21/2015 at 9:07 AM
SQL skills pay well and the technology is among the most popular as indicated by a big new developer survey from Stack Overflow, which tracked everything from caffeine consumption to indentation preferences.
While Objective-C was reported as the most lucrative technology to learn -- garnering an average salary of $98,828 in the U.S. -- SQL wasn't far behind, coming in at No. 5 on the list with an average reported salary of $91,431 in the U.S.
These results are in line with similar surveys from a few years ago, showing SQL isn't losing much ground in the technology wars.
"These results are not unbiased," Stack Overflow warned about the new survey. "Like the results of any survey, they are skewed by selection bias, language bias, and probably a few other biases. So take this for what it is: the most comprehensive developer survey ever conducted. Or at least the only one that asks devs about tabs vs. spaces."
Stack Overflow, in case you didn't know, is the go-to place for coders to get help with their problems. The site reports some 32 million monthly visitors.
Drawing on that audience, the site polled 26,086 people from 157 countries in February, posing a list of 45 questions.
One of the key areas of inquiry concerned salaries, of course. The survey found that behind Objective-C, the most lucrative skills were Node.js, C#, C++ and SQL.
But if it's purchasing power you're interested in, Ukraine is tops -- at least according to the metric of how many Big Macs you can buy on your salary.
It also might help to work remotely, as coders who don't have to fight commute traffic earn about 40 percent more than those who never work from home.
On the technology front, Apple's young Swift language was the most loved, Salesforce the most dreaded, and Android the most wanted (that is, sought by devs who aren't developing with the tech yet but have expressed interest in doing so).
Interestingly, the popular Java programming language didn't make the top 10 lists of most loved or most dreaded languages, though it was in the middle of the pack for most wanted and came in at No. 3 in overall technology popularity.
Other survey highlights include:
- Notepad++ is the most popular text editor, followed by Sublime Text, Vim, Emacs and Atom.
- 1,900 respondents reported being mobile developers, with 44.6 percent working with Android, 33.4 percent working with iOS and 2.3 percent working with Windows Phone (and 19.8 percent not identifying with any of those).
- The largest age group is 25-29, accounting for 28.5 percent of respondents. Only 1.9 percent of respondents were over 50.
- Only 5.8 percent of respondents identified as female.
- The largest share of developers (41.8 percent) reported being self-taught, with only 18.4 percent having earned a master's degree in computer science or a related field.
- The most common self-identified role was full-stack developer (6,800 respondents). Two reported being farmers.
And, oh, by the way, Norwegian developers consume the most caffeinated beverages per day (3.09), and tabs were the more popular indentation technique, preferred by 45 percent of respondents. Spaces were preferred by 33.6 percent of respondents.
But there's more to that story.
"Upon closer examination of the data, a trend emerges: Developers increasingly prefer spaces as they gain experience," the survey stated. "Stack Overflow reputation correlates with a preference for spaces, too: users who have 10,000 rep or more prefer spaces to tabs at a ratio of 3 to 1."
I'm a spaces guy myself. How about you?
Posted by David Ramel on 04/14/2015 at 1:06 PM
Forrester Research Inc. analyst Boris Evelson said existing approaches to business intelligence (BI) need updating in the modern world, and converging them with Big Data technologies requires more than traditional DBMSes such as SQL Server can provide.
BI is alive and well in the age of Big Data and will continue to enjoy a thriving market, Evelson said, but the world is constantly changing and more innovation is needed.
"Some of the approaches in earlier-generation BI applications and platforms started to hit a ceiling a few years ago," Evelson said in a blog post today. "For example, SQL and SQL-based database management systems (DBMS), while mature, scalable and robust, are not agile and flexible enough in the modern world where change is the only constant."
But Big Data can provide those agile and flexible alternatives in a convergence with BI. "In order to address some of the limitations of more traditional and established BI technologies, Big Data offers more agile and flexible alternatives to democratize all data, such as NoSQL, among many others," the analyst said.
While the emergence of NoSQL data stores as a necessary replacement for traditional DBMS in some Big Data scenarios is well-known, the research firm's suggested solution is somewhat less obvious.
Forrester proposes using a flexible hub-and-spoke data platform to meld the BI and Big Data worlds, Evelson said in publicizing a new research report titled "Boost Your Business Insights By Converging Big Data And BI." The research builds on previous themes explored by Forrester, such as a 2013 report that features the hub-and-spoke pattern prominently in a discussion of Big Data patterns.
The new report describes such an architecture as featuring the following components:
- Hadoop-based data hubs/lakes to store and process the majority of enterprise data.
- Data discovery accelerators to help profile and discover definitions and meanings in data sources.
- Data governance that differentiates the processes you need to perform at the ingest, move, use and monitor stages.
- BI that becomes one of many spokes of the Hadoop-based data hub.
- A knowledge management portal to front-end multiple BI spokes.
- Integrated metadata for data lineage and impact analysis.
Evelson isn't the first Forrester analyst to hint at big changes in the datacenter as Big Data matures and gets integrated with other tools.
"Enterprises that have a more complete data platform story, as well as a vision, are more likely to succeed in the coming years and also have a competitive advantage if they get onto this bandwagon of data platform, which includes Hadoop, Big Data, NoSQL as well as traditional databases -- all integrated," Forrester analyst Noel Yuhanna told ADTMag last year. "Because that's where you see customers that are more successful, having all those data types together and managed together and provided together in a manner that will be helpful for businesses to operate."
That theme is echoed in this new research, which identifies three key layers on which the hub-and-spoke system should be based. These layers are labeled cold, warm and hot, expressing the trade-off between speedy, powerful analytics and the associated expense. The cold layer holds most enterprise data in the Hadoop framework, which can be slower than databases such as SQL Server but costs less to operate. The warm layer uses a DBMS for somewhat faster queries at somewhat higher cost. The hot layer is for speedy analysis with in-memory tools, where cost might matter less than the benefits gleaned from real-time, interactive data processing.
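To make the cold/warm/hot trade-off concrete, here is a minimal Python sketch, not taken from the Forrester report, that routes a workload to the cheapest tier still able to meet its latency requirement. The `Tier` enum, the `choose_tier` function and the latency cutoffs are all hypothetical; the report does not specify any thresholds.

```python
from enum import Enum


class Tier(Enum):
    """The three layers described above, ordered from cheapest to costliest."""
    COLD = "Hadoop-based data hub (lowest cost, slowest)"
    WARM = "DBMS such as SQL Server (faster, costlier)"
    HOT = "in-memory tools (fastest, most expensive)"


# Hypothetical latency cutoffs in seconds; real values would depend on the
# workload and budget and are NOT specified by the report.
HOT_MAX_LATENCY_S = 1.0
WARM_MAX_LATENCY_S = 60.0


def choose_tier(required_latency_s: float) -> Tier:
    """Return the cheapest tier that can meet the required query latency."""
    if required_latency_s < 0:
        raise ValueError("required latency must be non-negative")
    if required_latency_s <= HOT_MAX_LATENCY_S:
        return Tier.HOT   # interactive, real-time analysis
    if required_latency_s <= WARM_MAX_LATENCY_S:
        return Tier.WARM  # faster queries at moderate cost
    return Tier.COLD      # bulk storage where cost matters most


if __name__ == "__main__":
    for latency in (0.5, 30.0, 3600.0):
        print(latency, choose_tier(latency).name)
```

The design simply checks the tiers from fastest to slowest and stops at the first one whose cutoff covers the requirement, so anything that can tolerate long waits falls through to the inexpensive Hadoop layer.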
"But at the end of the day, while new terms are important to emphasize the need to evolve, change and innovate, what's infinitely more imperative is that both strive to achieve the same goal: transform data into information and insight," Evelson said. "Alas, while many developers are beginning to recognize the synergies and overlaps between BI and Big Data, quite a few still consider and run both in individual silos."
Posted by David Ramel on 03/27/2015 at 12:25 PM