Microsoft Announces SQL Server 2014

Microsoft today announced SQL Server 2014, designed with "cloud-first principles" and featuring built-in in-memory OLTP and a focus on real-time, Big Data-style analytics. No specific release date was provided in the announcement.

"Our Big Data strategy to unlock real-time insights continues with SQL Server 2014," said Quentin Clark, corporate vice president with the Data Platform Group, in a blog post. "We are embracing the role of data--it dramatically changes how business happens. Real-time data integration, new and large data sets, data signals from outside LOB systems, evolving analytics techniques and more fluid visualization and collaboration experiences are significant components of that change."

The news came with a slew of other big product announcements at the TechEd North America conference in New Orleans, such as Windows Server 2012 R2 and System Center 2012 R2. All will be available in preview later this month.

A key feature of SQL Server 2014 is its in-memory online transaction processing (OLTP) technology, which stems from "Hekaton," a project that has been in the works for several years, Clark said. Developed in conjunction with Microsoft Research, Hekaton greatly improves transaction processing speeds and reduces latency by working with in-memory data rather than disk-based data.

Microsoft touted the benefits of the "conscious design choice" to build the Hekaton technology into SQL Server 2014, with no need for a separate data engine. "Other vendors are either introducing separate in-memory optimized caches or building a unification layer over a set of technologies and introducing it as a completely new product," said Dave Campbell, Microsoft technical fellow, when Hekaton was announced as a coming component of SQL Server 2014 last November. "This adds complexity, forcing customers to deploy and manage a completely new product or, worse yet, manage both a 'memory-optimized' product for the hot data and a 'storage-optimized' product for the application data that is not cost-effective to reside primarily in memory," Campbell said.

Clark picked up on that theme in today's announcement. "For our customers, 'in the box' means they don’t need to buy specialized hardware or software and can migrate existing applications to benefit from performance gains," he said.
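
For a flavor of what that looks like to a developer, here's a minimal sketch of declaring a memory-optimized table, based on syntax from early SQL Server 2014 previews (the table, columns and bucket count are hypothetical, and details could still change before release):

    -- Requires a database with a memory-optimized filegroup.
    CREATE TABLE dbo.ShoppingCart (
        CartId INT NOT NULL
            PRIMARY KEY NONCLUSTERED HASH WITH (BUCKET_COUNT = 1000000),
        UserId INT NOT NULL,
        CreatedDate DATETIME2 NOT NULL
    )
    WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA);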

Clark also emphasized the embrace of cloud computing, noting how SQL Server 2014 will work seamlessly with the cloud-based Windows Azure to reduce operating expenditures for mission-critical applications. "Simplified cloud backup, cloud disaster recovery and easy migration to Windows Azure Virtual Machines are empowering new, easy to use, out-of-the-box hybrid capabilities," he said.
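
The cloud backup piece builds on SQL Server's ability to back up directly to a URL in Windows Azure blob storage, introduced in SQL Server 2012 SP1. A rough sketch of that flow (the storage account, container and credential names are invented for illustration):

    -- One-time setup: store the Azure storage account name and access key.
    CREATE CREDENTIAL AzureBackupCred
        WITH IDENTITY = 'mystorageaccount',
        SECRET = '<storage access key>';

    -- Back up straight to a Windows Azure blob.
    BACKUP DATABASE SalesDb
        TO URL = 'https://mystorageaccount.blob.core.windows.net/backups/SalesDb.bak'
        WITH CREDENTIAL = 'AzureBackupCred';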

The Microsoft exec also noted SQL Server 2014 will include improvements to the AlwaysOn feature, supporting "new scenarios, scale of deployment and ease of adoption."
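
Microsoft hasn't detailed those scenarios yet, but the existing AlwaysOn surface area suggests what "scale of deployment" means in practice. Adding a secondary replica to an availability group today looks roughly like this (the server, group and endpoint names are hypothetical):

    ALTER AVAILABILITY GROUP SalesAG
        ADD REPLICA ON 'SQLNODE3'
        WITH (
            ENDPOINT_URL = 'TCP://sqlnode3.contoso.com:5022',
            AVAILABILITY_MODE = ASYNCHRONOUS_COMMIT,
            FAILOVER_MODE = MANUAL
        );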

As mentioned, Microsoft provided no release date, but that detail was bound to be foremost in the minds of many users, such as one named Patrick who posted the very first reader comment on Clark's blog post: "Are there some dates (other than 2014)?"

What do you think of the big news about SQL Server 2014? Comment here or drop me a line.

Posted by David Ramel on 06/04/2013 at 9:03 AM


Still Time to Become a Data Scientist?

A podcast posted yesterday on the IEEE Spectrum site asked "Is Data Science Your Next Career?" That's a question I've been exploring recently in research for an article on the Big Data skills shortage.

"Opportunities abound, and universities are meeting them with new programs," the podcast states. But I was wondering--in view of the transient nature of IT industry fads or hype cycles or whatever you want to call them--would a data developer "going back to school" or getting training and experience to capitalize on the Big Data craze run out of time? That is: What's the likelihood of a developer getting Big Data training, certification and so on only to find out the need for these skills has greatly diminished? That's a question I put to several experts in the Big Data field.

"Very low likelihood," said Jon Rooney, director of developer marketing at Big Data vendor Splunk Inc. "There appears to be ongoing demand in the space as companies scratch the surface with Big Data. As Big Data technologies evolve to incorporate more established standards, developers skilled in these languages and frameworks can leverage those skills broadly, thus keeping them in demand."

"I see this as exceedingly unlikely," said Will Cole, product manager at the developer resource site, Stack Overflow. "Possibly if someone decides to go back to school. However, the Web and mobile are growing and APIs are getting more open. As long as the flow of data and the increase of scale continues, we're all going to need [machine learning] specialists and data scientists."

"No, we don't think so," said Joe Nicholson, vice president of marketing at Big Data vendor Datameer Inc. "But it's a matter of focusing on skills that will add value as the technology and market matures. Again, it's really about better understanding the use cases in marketing, customer service, security and risk, operations, etc., and how best to apply the technology and functionality to those use cases that will add value over time. Big Data analytics is in its early stages, but the problems it is addressing are problems that have been around a long time. How do we get a true, 360-degree view of customers and prospects, how do we identify and prevent fraud, how do we protect our IT infrastructure from intrusion or how do we correlate patient data to better understand clinical trial data."

Bill Yetman, senior director of engineering at Ancestry.com, was much more succinct and definitive in his answer: "No."

So let's go with that. There's still time, so get on board!


What are you doing to capitalize on the Big Data trend? Share your experiences here or drop me a line.

Posted by David Ramel on 05/29/2013 at 1:15 PM


Tracking Big Data Trends at Stack Overflow

You know Stack Overflow, of course (a recent Slashdot.org posting was titled "Developers May Be Getting 50 Percent of Their Documentation From Stack Overflow").

So, while doing research for an upcoming article, I learned that StackOverflow.com (which says it gets more than 20 million visitors per month) could provide an interesting take on trends such as the move to Big Data, both from a job-seeking/recruiting point of view and by measuring the number of questions about the technology.

From the jobs/career aspect: "I ran a quick query through our database of 106,000 developer profiles (worldwide) and found that of these, less than 1 percent (only 951) have listed Hadoop as one of their technologies," reported Bethany Marzewski, marketing coordinator at the company.

"Comparatively, of the 1,589 job listings on our job board, a search for 'Big Data' returns 776 open roles--nearly 50 percent," she said. "A query for jobs seeking programmers with Hadoop experience yields 90 open jobs (nearly 6 percent), and a search for 'machine learning' yields 115 open roles." You can view the site's job board to run your own queries.

For the developer interest angle, Stack Overflow developer Kevin Montrose ran a query to chart questions on the site tagged with "Hadoop" for each month since 2008 (see Figure 1).

Figure 1. Hadoop Questions Over Time
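
Montrose's actual query wasn't published, but anyone can approximate it against the public Stack Exchange Data Explorer, where questions live in a Posts table with their tags flattened into a Tags column. A rough sketch, not his exact query:

    -- Count questions tagged 'hadoop' per month (PostTypeId = 1 means question).
    SELECT DATEADD(month, DATEDIFF(month, 0, CreationDate), 0) AS Month,
           COUNT(*) AS Questions
    FROM Posts
    WHERE PostTypeId = 1
      AND Tags LIKE '%<hadoop>%'
    GROUP BY DATEADD(month, DATEDIFF(month, 0, CreationDate), 0)
    ORDER BY Month;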

The bottom line: The trend of companies adopting Big Data technologies--and resulting skills shortage--is huge and shows no signs of slowing down, so if you're a data developer looking for a pay hike, you may want to jump on board.

Have you been looking for a Big Data job? We'd love to hear about your experience, so please comment here or drop me a line.

Posted by David Ramel on 05/10/2013 at 1:15 PM


How to Grab a Slice of the Big Data Pie

Cloudera Inc.'s recent announcement of its SQL-on-Hadoop tool is one of the latest examples of vendors trying to make Big Data analytics more accessible. But "more accessible" is a long way from "easy," and it will be a while before your average Excel jockey can take over the reins of a typical company's Big Data initiatives.

So data developers are still key, and those with Hadoop and related Big Data skills are commanding top dollar to meet insatiable demand for their services. But the very top dollar goes to the very top developers, and those folks might have to grow beyond the traditional programmer role.

While doing research for an upcoming article, I asked some experts in the field what developers can do to make themselves more marketable in this growing field.

"A general background on Hadoop is certainly a must," said Joe Nicholson, vice president of marketing at Datameer Inc., which makes prebuilt analytics applications--yet another path to that aforementioned accessibility. "But probably more important is understanding Big Data in terms of what the correlation of various data sources, new and old, can uncover to drive new business use cases.

"This is especially true of 'new' data sources like social media, machine and Web logs and text data sources like e-mail," Nicholson continued. "There is a wealth of new insights that are possible with the analysis of the new data sources combined with traditional, structured data, and these new use cases are becoming mission critical as businesses seek new competitive advantages. This is especially true when looking for insights, patterns and relationships across all types of data."

It also helps to show your work, as noted by Jon Rooney, director of developer marketing at Splunk Inc., another Big Data vendor. "There's no substitute for hands-on experience," Rooney said. "Developers who show experience by writing code and posting their work on places like GitHub are always marketable."

That sentiment is echoed by Will Cole, a product manager at Stack Overflow. Besides taking courses and attending meetups, he said, "the more concrete way to market yourself is to build side projects or contribute to open source projects where you can take what you've learned and show some working production results you've achieved."

In fact, some companies are hunting for the best coding talent with services such as Gild Inc.'s, which measures the quality of code posted on GitHub and participation in developer forums and question-and-answer sites such as Stack Overflow--using, ironically enough, Big Data analysis--as I reported in an article on the Application Development Trends site.

Beyond showing your work, posting good code on developer-related social sites and answering questions in forums, a new way of thinking is required for developers looking to become top-notch Big Data rock stars, according to Bill Yetman. He is senior director of engineering at Ancestry.com, where he has held various software engineering/development roles. "Developers need to approach new technologies and their careers with a 'learning mindset,' " Yetman said. "Always be willing to pick up something new, embrace it and master it. Developers who love to learn will always stay up to date and be marketable."

But it might not be that easy for some positions. "A software engineer can't simply become a data scientist in the same way a Java developer can become a Ruby developer," noted Mark A. Herschberg, CTO at Madison Logic in New York. He's in the process of starting a data science team at the B2B lead generation company, and he points out the distinction between a software engineer and a data scientist.

"A good data scientist has a combination of three different skills: data modeling, programming and business analysis," Herschberg said. "The data modeling is the hardest. Most candidates have a masters degree or PhD in math or science and have worked with various statistical models. They have programming skills--not so much the type to let you build a scalable enterprise system, but in that they can access the database and move data around. They are probably better at R and sci py (a type of Python) than at building a Web application. They also are familiar with tools like Hadoop and NoSQL databases. Finally, they have some basic business sense, so [they] will know how to ask meaningful business questions of the data.

"If a software engineer is serious about moving into data science, he or she should probably begin by taking some classes in advanced statistics and data modeling," Herschberg said.

Several sources noted that with the extreme skills shortage, companies are resorting to all kinds of ways to find talent, including contracting, outsourcing and retraining existing staff.

For those taking the latter route, some advice was offered by Yetman, who writes a Tech Roots Blog at Ancestry.com, including a recent post titled "Adventures in Big Data: How do you start?"

"If you are looking for developers within your organization for a Big Data project, find the guys who are always playing with new technologies just for the fun of it," Yetman told me. "Recruit them to work on your project."

Hmm, maybe that's the best advice of all: have fun with it.

Are you having fun yet in your Big Data adventures? Share your thoughts here or drop me a line.

Posted by David Ramel on 05/03/2013 at 1:15 PM


Microsoft Ramps Up Database Lifecycle Management Guidance

Need help troubleshooting a Windows Azure SQL Database connection, or getting started with SQL Server Data Tools? Well, there's now a one-stop shop where you can identify aspects of your problem as it relates to the big picture of database development and immediately go to the appropriate resources to solve it.

Yes, data developers now have their own lifecycle management acronym and accompanying guidance.

That's thanks to Microsoft's Louis Berner and his Database Lifecycle Management topic page in the MSDN Library.

"DLM is not a product but a comprehensive approach to managing the database schema, data, and metadata for a database application," the page states. It includes a complicated diagram (see Figure 1) that developers can use to identify apps and actions specific to their scenarios, along with links for guidance in the categories SQL Server Data Tools, SQL Server Management Studio and Windows Azure SQL Database.

Figure 1. The DLM Diagram

I asked Berner about his DLM topic page, which has been getting some buzz in data development circles since it went live around the end of January. He explained that he worked in SQL Server Education, collaborating with Microsoft Customer Support Services to spot trends in customer support calls and resolve problems.

Last summer, he identified several issues in his bailiwick: SQL Server and SQL Database (Azure) manageability. He listed them as:

  1. Customers were frustrated in their efforts to get started with the Windows Azure platform.
  2. Data portability was a specific area that stood out as problematic, especially as it relates to service-level agreements, for example, backup/restore, business continuity and disaster recovery.
  3. Connectivity and troubleshooting connection issues were topics of concern.
  4. Customers lacked an understanding of basic concepts--what Microsoft offers for database development, portability and monitoring across on-premises, hybrid and cloud architectures; I described it in general terms as "Database Lifecycle Management."

"Customer frustration was understandable because we had developed features and tools over the course of many releases and a time span of five years or more," Berner explained. "Many resources within feature teams didn't understand the holistic view because they worked on individual pieces of the puzzle, and many were recent arrivals to the product unit. Because of their focus on individual features, they weren't expected to understand the big picture."

One particularly important issue he wanted to emphasize was data-tier applications. "Developers can benefit from use of data-tier applications features to create a package for deployment to production, to create a snapshot of a schema for version control, or to publish a schema update in a controlled manner," he said. "This provides developers the ability to cleanly hand off to DBA or Ops resources. The data-tier application is an under-used and under-appreciated feature, in my opinion, maybe because customers don't know about it."
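
The tooling behind that handoff is the DAC framework's SqlPackage.exe utility. A minimal sketch of the extract-then-publish flow Berner describes (the server, database and file names are made up for illustration):

    :: Extract a .dacpac snapshot of the development schema ...
    SqlPackage.exe /Action:Extract /SourceServerName:DevServer /SourceDatabaseName:AppDb /TargetFile:AppDb.dacpac

    :: ... then hand it off for a controlled publish against production.
    SqlPackage.exe /Action:Publish /SourceFile:AppDb.dacpac /TargetServerName:ProdServer /TargetDatabaseName:AppDb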

Well, if you didn't know about it, you do now--and you know where to go to learn more. Berner says he will continue to improve on the DLM page. "As I continue to monitor [Customer Support Services] data and other sources of customer experience, I have developed a backlog of additional topics to include in a topic refresh. I will also iterate on the artwork to improve it. Eventually, I would like customers to be able to drill down through the diagram to get to the content they want."

Berner has received good comments on his project and would like to get more feedback from data developers to help him in his improvement process. So check out the page and let him know what you think in the comments section or by sending him an e-mail with the subject: "DLM topic on MSDN."

We'd like to hear from you, too. Please comment here or drop me a line.

Posted by David Ramel on 04/17/2013 at 1:15 PM


3D Data Visualization Takes Another Step

Tables, graphs and 3D bar charts just don't cut it anymore. To really glean insights from all that data you're collecting, you need pretty pictures, maps and interactive "cinematic guided tours" that users can play with.

That's the vibe at the PASS Business Analytics Conference underway in Chicago, as witnessed by today's announcement of "project codename 'GeoFlow' Preview for Excel 2013."

"Now you can apply geographic and temporal data visually, analyze that data in 3D and create visual tours to share your insights with others," says this video about the new product that started out in Microsoft Research:

And there's plenty more where that came from. The conference session schedule lists 11 presentations on data visualization, another tool to further Microsoft's cause of self-service business intelligence (BI).

These things are a step up from the Pivot tool that I played around with a while back, and even from more recent cool toys such as the "Data Explorer" preview and Power View. GeoFlow lets you work with more than 1 million rows of data in an Excel workbook and visualize them in 3D on Bing Maps, according to a Microsoft announcement.

It's available now for download, provided you have Microsoft Office Professional Plus 2013 or Office 365 ProPlus, along with Windows 7, Windows 8 or Windows Server 2008 R2.

Will you data devs be using the latest in 3D data visualization? Comment here or drop me a line.

Posted by David Ramel on 04/11/2013 at 1:15 PM


Silverlight Gets Some VS12 Love, Users Want More

It was just an inconspicuous little reference seemingly buried in the verbiage announcing all the new goodies in the Visual Studio 2012 Update 2, looking almost like an afterthought jammed in at the last minute:

"It includes support in Blend for SketchFlow, WPF 4.5, and Silverlight 5."

But it was like getting a note from an old, long-lost friend. "Oh yeah, Silverlight is still around."

Some 16 months after Silverlight 5 became available for download, you can now use it with the latest Blend hooked up to the latest Visual Studio, just like you used to. It made me want to dive right in and play around with those 3D-like animations built with the storyboard. I had used it for weeks (months?) to build a cool blackjack game with all the bells and whistles, including playing cards flipping through the air onto the card table. I loved it.

But, for a while now, having sensed the winds of change, I've been plugging away at getting up to speed on apps built with HTML5, JavaScript and CSS. I hate it.

I know that's just me the hobbyist, and it means nothing. Maybe Silverlight is irrelevant to serious LOB devs. And even if that's not the case, after all, Microsoft has said it's not killing off Silverlight tomorrow or anything drastic like that. And it has been available as a preview download since last August. But still, it was nice to catch up with that old friend and see that others still cared about him, too, fitting him into their plans. I wondered what the user reaction would be, so I checked. It looks like Silverlight still has a loyal following.

There it was, No. 2 on the "Hot ideas" page of the Visual Studio UserVoice site--where you can present and vote on ideas for VS--with the simple title of "Silverlight 6." The OP had written, "Please do work on Silverlight next version. I feel Silverlight is great tool ... but as you guys stopped working on it; I feel that I wasted my time in learning Silverlight." The Jan. 13 post had 874 votes and 54 comments, mostly along the same lines.

But I don't want to give the impression that this is a landslide movement or anything. The most popular idea was to bring color back to VS, and it had more than 12,000 votes, which astounds me, right up there with the ALL CAPS menu fiasco. And Silverlight wasn't even mentioned in the comments section of the S. Somasegar blog post announcing the new update.

So maybe Silverlight is like Douglas MacArthur, an old soldier who doesn't die, but just fades away. An old friend with whom I'll stay in touch but won't meet at the pub for a couple of beers anymore. And that's cool. I know, it's only Silverlight and Blend ... but I like it.

Do you like Silverlight? Please comment here or drop me a line.

Posted by David Ramel on 04/05/2013 at 1:15 PM


Retro Database dBASE Making a Comeback?

OK, that report is due soon, so I'm going to fire up dBASE to run some reports, export the data into Lotus 1-2-3 and summarize everything with WordPerfect--while listening to Wham! and Foreigner, of course.

Oops, my mind was momentarily transported back into the mid '80s.

Amazingly, though, one of those pioneering software products was just updated as of yesterday. Yup, dust off those old .dbf files: dBASE PLUS 8 has been released.

And, while the original Ashton-Tate version was developed for the CP/M operating system (remember those dual 5-1/4 in. floppies--one for the program, one for the data?), this new one runs on Windows 8 (yes, even the 64-bit version). My, how times have changed.

WordPerfect, of course, is still around under the stewardship of Corel Corp., but I hadn't heard anything about dBASE for quite a while. The new dBASE guardian, dBase LLC, claims it's still in use by "millions of software developers." The company was formed last year with the help of some people who formerly worked at dataBased Intelligence Inc., "the legal heir" to dBASE.

I'm not sure exactly what happened to dBASE after the astounding success of dBASE III, but, according to Wikipedia, the decline started with "the disastrous introduction of dBase IV, whose stability was so poor many users were forced to try other solutions. This was coincident with an industry-wide switch to SQL in the client-server market, and the rapid introduction of Microsoft Windows in the business market."

Anyway, the new version includes ADO support, a new UI and "enhanced developer features with support for callbacks and the ability to perform high precision math."

Pricing is $399 for the regular edition, $299 for an upgrade and $199 for a personal edition without ADO support. I wonder what those prices equate to in 1985 dollars?

UPDATE: Here's a pretty good history of dBASE by Jean-Pierre Martel, editor of The dBASE Developers Bulletin.

Any old-timers out there with a good memory? What did dBASE III sell for? And why did some of these pioneering products die or fade into obscurity, while others continue to thrive? Comment here or drop me a line.

Posted by David Ramel on 03/20/2013 at 1:15 PM


BIDS Templates Come to Visual Studio 2012 in SSDT Update

"Does SSDT for Visual Studio 2012 support BI project templates?" asked James V. Serra in a TechNet forum last September.

Some six months later, the answer was yes: "Hi James, the download to add the BI Project Templates to the VS2012 shell is now available."

Microsoft last week announced the online release of "SQL Server Data Tools – Business Intelligence for Visual Studio 2012" (SSDT BI), available for download here.

The release includes templates for Visual Studio 2012 BI projects, including Analysis Services, Integration Services and Reporting Services. These templates were part of the old Business Intelligence Development Studio (BIDS).

SQL Server Data Tools (SSDT) encompasses a set of integrated services and enhancements designed to let developers work entirely from within the Visual Studio IDE, incorporating functionality found in BIDS and SQL Server Management Studio (SSMS), among a host of other features.

Prior to this, the BI templates were available only in Visual Studio 2010, SSDT 2010 or SQL Server 2012. The new release will be installed through the SQL Server 2012 setup tool as a shared service and will install a Visual Studio 2012 integrated shell if you don't already have VS 2012.

This will hopefully relieve much of the frustration of data developers confused by the different versions of SSDT--which was introduced with SQL Server 2012 but hosted in the VS 2010 shell--and by inconsistencies in functionality as the data development tools have evolved.

Apparently, though, there are still some frustrated users and more integration to be done. SQL DBA John Pertell welcomed the announcement. "That’s great news as a lot of developers, myself included, have been waiting for this functionality," he said. However, he added, "the bad news is that it doesn’t include the Database Projects templates released last year. You’ll still need to install them separately. But they will work together."

He explained further:

So if you want just the BI templates for Visual Studio 2012 you only have to install the BI version of SSDT. If you also want the database projects you will need to install both the BI templates and the database templates. And if you want to use the test plans for your new database projects and create SSRS reports or SSIS packages you’ll need a full edition of VS 2012, either Premium or Ultimate, plus the database templates plus the BI templates.

There were also some users frustrated by the install experience, especially on 64-bit machines running SQL Server 2012 (see comments on this blog post). Visual Studio and the SSDT integrated shell are 32-bit apps, and users reported errors, some of which were apparently caused by the installation tool trying to install the 32-bit version of SQL Server 2012 Service Pack 1. The solution seems to be to choose the "perform new install" option during installation rather than the "add features to existing" option.

Still, many data devs are happy with the new capabilities. Those include Serra, who said on his blog, "It took 8 months, but at least it was quicker than being able to use BI in VS 2010, which took about two years."

Other enhancements to Visual Studio 2012 added last week include Office Developer Tools and a SQL Server Data-Tier Application Framework update.

What do you think of the new BI functionality in SSDT? Are we headed toward one big, comprehensive IDE that will include everything you need for SQL Server development in one place? Comment here or by e-mail.

Posted by David Ramel on 03/14/2013 at 1:15 PM0 comments


SQL Encroaches on Big Data Turf

Remember when SQL developers felt threatened by Big Data? Relational database management systems were old-school relics that couldn't cope with the vast amounts of unstructured, disparate data. NoSQL was the future. You needed to get onboard with Hadoop and MapReduce, running on Linux.

Well, not anymore.

Maybe not ever, really. There is just too big of an installed base of SQL developers and systems for the two camps, Big Data and SQL, to have remained apart. Even four or five years ago the convergence was underway with Hive, a data warehouse system for Hadoop that uses "a SQL-like language called HiveQL."
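
Hive's pitch was that anyone who could write SQL could query Hadoop. A HiveQL aggregation reads almost exactly like its relational cousin (the table and columns here are invented for illustration):

    -- HiveQL: daily page-view counts from a log table stored in Hadoop.
    SELECT to_date(view_time) AS view_date,
           COUNT(*) AS views
    FROM page_views
    GROUP BY to_date(view_time);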

That convergence seems to be rapidly accelerating. Microsoft has been helping out, of course, with PolyBase in its SQL Server 2012 Parallel Data Warehouse to enable SQL queries of Big Data and initiatives such as HDInsight and the Hortonworks Data Platform to get Big Data into the Windows ecosystem.
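
PolyBase's trick is letting T-SQL treat files sitting in Hadoop as ordinary tables. A sketch loosely modeled on Microsoft's published PDW examples (the names, location and delimiter are hypothetical, and details may differ in your environment):

    -- Expose a delimited file in HDFS as a queryable external table.
    CREATE EXTERNAL TABLE dbo.ClickStream (
        url VARCHAR(50),
        event_date DATE,
        user_ip VARCHAR(50)
    )
    WITH (
        LOCATION = 'hdfs://10.0.0.1:8020/logs/clickstream.tbl',
        FORMAT_OPTIONS (FIELD_TERMINATOR = '|')
    );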

But Redmond has plenty of company. Just this week I had the opportunity to interview Web coding pioneer Lloyd Tabb about the subject when his new company, Looker Data Sciences Inc., announced a query-based business intelligence (BI) platform called Looker. "SQL and relational querying is the best way to ask questions of large related data sets," Tabb told me.

He should know what he's talking about. He was a database and languages architect at Borland in the earlier days of RDBMS and went on to build LiveWire, the first application server for the World Wide Web. He was later a principal engineer at Netscape where he was architect of Netscape Navigator Gold (later named Composer), the first WYSIWYG HTML editor, and the engineering lead for Netscape Communicator. He helped found Mozilla.org and later became a pioneer in crowdsourcing, just to name a few of his accomplishments.

Looker, according to the company, "uses a new modeling language, LookML, which enhances SQL for analytics so end-users can perform powerful analytics without needing to know how a query is written."

I asked Tabb about the use of SQL instead of NoSQL, Hadoop or other Big Data technologies associated with BI analytics, and he gave me a little history lesson.

"Back in the day conventional wisdom was that if you were going to create an application for a PC you had to write it in Assembly language," Tabb said. "Higher-level languages generated code that was too big and too slow. Later, conventional wisdom was that you couldn't build a 'real-applicaiton' in an agile language--it was too big and too slow.

"Hadoop was designed because at the time there were no SQL engines that could deal with data sets that large. Developers regressed to hand coding queries in MapReduce. Both SQL and C are still in use today because they are the best abstractions for the kinds of problems they solve."

Looking around, I see lots of other evidence pointing to the Borg-like assimilation of Big Data by SQL. A few weeks ago GigaOM explored the subject with an article titled "SQL is what's next for Hadoop: Here's who's doing it," and just yesterday a PluralSight course on the topic was announced, described as "An investigation into the convergence of relational SQL database technologies from several vendors and Big Data technologies like Apache Hadoop."

And there are plenty more similar things going on out there. So rest easy, SQL data developers, your future is still bright.

What do you think about the convergence of Big Data and SQL? Share your thoughts by commenting here or by e-mail.

Posted by David Ramel on 03/08/2013 at 1:15 PM


Red Hat Goes All In On Big Data (Whatever That Is)

I tuned in to a Webcast earlier this week where Red Hat announced it was contributing its Hadoop plug-in to the open source Apache Hadoop community and totally embracing Big Data with an "open hybrid cloud" strategy. More on that later.

What I found really interesting was the response to an audience member who asked, "How do you define Big Data?"

Hmmm. Good question. It's one of the most over-hyped terms in the tech world today, but exactly what is it? Red Hat executive Ranga Rangachari provided the following:

So ... what we think of ... analysts have different ways to talk about this. You've heard some analysts talk about the four Vs, which is the volume, the velocity and a few other attributes to it. And, yes, that is one way to look at it, but I think our view of Big Data is, fundamentally I think, the underlying type of data, either semi-structured or unstructured. That's one way, at least, from a technology standpoint, which contrasts very much from your typical structured databases that people are used to over the last 20 years or so.

Huh?

Obviously, it's not that easy to define Big Data.

John K. Waters addressed the question a year ago:

While there's lots of talk about big data these days (a lot of talk), there currently is no good, authoritative definition of big data, according to Microsoft Regional Director and Visual Studio Magazine columnist Andrew Brust.

"It's still working itself out," Brust says. "Like any product in a good hype cycle, the malleability of the term is being used by people to suit their agendas. And that's okay; there's a definition evolving."

Wikipedia defines it as a "collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications."

In other words, no one knows.

Anyway, Red Hat will open source its Hadoop plug-in and jump on the Big Data bandwagon with its vision of an open hybrid cloud application platform and infrastructure. Rangachari said it was designed to give companies the ability to create Big Data workloads on a public cloud and move them back and forth between their own private clouds, "without having to reprogram those applications." Red Hat said in a news release that many companies use public clouds such as Amazon Web Services for developing software, proving concepts and pre-production phases of projects that use Big Data. "Workloads are then moved to their private clouds to scale up the analytics with the larger data set," the company said.

The Red Hat Hadoop plug-in is part of Red Hat Storage, running on Linux, which is based on the GlusterFS distributed file system. It's provided as an alternative to the Hadoop Distributed File System, known for some technical limitations that Apache and other organizations have also addressed.

Rangachari said the path to the open hybrid cloud Big Data application platform will eventually incorporate an Apache Hive connector (now in preview), NoSQL/MongoDB Java interoperability and RESTful OData Web protocol access, in addition to its existing JBoss middleware.

He emphasized that the new cloud strategy will be woven throughout every Red Hat project, noting that "Big Data could be one of the killer apps for the open hybrid cloud."

When asked why Red Hat was contributing its Hadoop plug-in to Apache, Rangachari said the Apache Hadoop community was the "center of gravity" in the Hadoop world and that the move will provide developers with easier access to the plug-in from the same ecosystem. He also said the company expects that, rather than stopping innovation of the technology, the move to open source will actually contribute to more innovation.

So what exactly is Big Data? Please explain here in a comment or via e-mail. We'll all appreciate it.

Posted by David Ramel on 02/22/2013 at 1:15 PM


Bill Gates Says Biggest Product Regret Was WinFS Data Storage

Data developers were interested to learn this week that the Microsoft product Bill Gates most regrets never bringing to market was a futuristic data storage system called WinFS.

In a live question-and-answer event on Reddit.com called Ask Me Anything, the legendary Microsoft co-founder answered dozens of questions from readers. While he was most concerned with the charitable work of the Bill & Melinda Gates Foundation, many questions inevitably focused on his Microsoft and programming days.

Here's the exchange about the database product:

Q: What one Microsoft program or product that was never fully developed or released do you wish had made it to market?

A: We had a rich database as the client/cloud store that was part of a Windows release that was before its time. This is an idea that will remerge since your cloud store will be rich with schema rather than just a bunch of files and the client will be a partial replica of it with rich schema understanding.

When another reader guessed that it might be WinFS, Gates answered in the affirmative. Another reader wondered if the OS mentioned was Vista, and Gates replied: "Vista was what eventually shipped but Winfs had been dropped by then."

According to Wikipedia, WinFS is short for Windows Future Storage, described as:

the code name for a cancelled data storage and management system project based on relational databases, developed by Microsoft and first demonstrated in 2003 as an advanced storage subsystem for the Microsoft Windows operating system, designed for persistence and management of structured, semi-structured as well as unstructured data.

I found it interesting to learn that even way back then, Microsoft was thinking ahead to the cloud, and then, as now, it's all about the data.

What did you think about Gates' AMA session? Please comment here or send me an e-mail.

Posted by David Ramel on 02/15/2013 at 1:15 PM

