More details are emerging about in-memory capabilities in the new SQL Server 2014, announced at the recent TechEd 2013 conference.
The first Community Technology Preview is expected to be released soon, possibly this month, and you can register with Microsoft to be notified of its availability.
Highlights of the new release are data warehousing and business intelligence (BI) enhancements made possible through new in-memory capabilities built into the core Relational Database Management System (RDBMS). As memory prices have fallen dramatically, 64-bit architectures have become more common, and use of multicore servers has increased, Microsoft has sought to tailor SQL Server to take advantage of these trends.
The in-memory Online Transaction Processing (OLTP) capability--formerly known by the codename Hekaton--lets developers boost performance and reduce processing time by declaring tables as "memory optimized," according to a whitepaper (PDF download) titled "SQL Server In-Memory OLTP Internals Overview for CTP1."
"Memory-optimized tables are stored completely differently than disk-based tables and these new data structures allow the data to be accessed and processed much more efficiently," Kalen Delaney wrote in the whitepaper. "It is not unreasonable to think that most, if not all, OLTP databases or the entire performance-sensitive working dataset could reside entirely in memory," she said. "Many of the largest financial, online retail and airline reservation systems fall between 500GB to 5TB with working sets that are significantly smaller."
"It’s entirely possible that within a few years you’ll be able to build distributed DRAM-based systems with capacities of 1-10 Petabytes at a cost less than $5/GB," Delaney continued. "It is also only a question of time before non-volatile RAM becomes viable."
Another new in-memory benefit is "new buffer pool extension support to non-volatile memory such as solid state drives (SSDs)," according to a SQL Server Blog post. This will "increase performance by extending SQL Server in-memory buffer pool to SSDs for faster paging."
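To make the idea of extending a buffer pool onto SSDs a bit more concrete, here's a toy sketch in Python. To be clear, this is not how SQL Server implements the feature: the class name, the tier sizes and the least-recently-used eviction policy are all made up for illustration. The one assumption baked in is that the SSD tier only ever holds copies of pages that also live on durable storage, so throwing the cache away loses nothing.

```python
from collections import OrderedDict


class TieredReadCache:
    """Toy two-tier read cache: a small RAM tier backed by a larger SSD tier.

    Hypothetical illustration only; not SQL Server's actual design.
    """

    def __init__(self, ram_pages, ssd_pages, disk):
        self.ram = OrderedDict()   # stand-in for the in-memory buffer pool
        self.ssd = OrderedDict()   # stand-in for the SSD extension
        self.ram_pages = ram_pages
        self.ssd_pages = ssd_pages
        self.disk = disk           # durable store; always holds every page
        self.hits = {"ram": 0, "ssd": 0, "disk": 0}

    def read(self, page_id):
        """Return a page, looking in RAM first, then SSD, then disk."""
        if page_id in self.ram:
            self.ram.move_to_end(page_id)      # mark as most recently used
            self.hits["ram"] += 1
            return self.ram[page_id]
        if page_id in self.ssd:
            data = self.ssd.pop(page_id)       # promote back into RAM
            self.hits["ssd"] += 1
        else:
            data = self.disk[page_id]          # slowest path
            self.hits["disk"] += 1
        self._admit(page_id, data)
        return data

    def _admit(self, page_id, data):
        """Place a page in RAM, spilling the oldest RAM page to the SSD tier."""
        self.ram[page_id] = data
        if len(self.ram) > self.ram_pages:
            old_id, old_data = self.ram.popitem(last=False)
            self.ssd[old_id] = old_data
            if len(self.ssd) > self.ssd_pages:
                self.ssd.popitem(last=False)   # safe: disk still has the page


disk = {i: f"page-{i}" for i in range(10)}
cache = TieredReadCache(ram_pages=2, ssd_pages=4, disk=disk)
for page in (0, 1, 2, 0, 2):
    cache.read(page)
print(cache.hits)
```

In this run, the second read of page 0 is served from the SSD tier rather than from disk, which is the kind of saving a read-heavy workload would be hoping for.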
Independent database expert Brent Ozar expounded on this subject, writing "SQL Server 2014 will automatically cache data [on SSDs] with zero risk of data loss."
"The best use case is for read-heavy OLTP workloads," Ozar continued. "This works with local SSDs in clusters, too--each node can have its own local SSDs (just like you would with TempDB) and preserve the SAN throughput for the data and log files. SSDs are cheap, and they’re only getting cheaper and faster."
Other in-memory features mentioned by the Microsoft SQL Server team include "enhanced in-memory ColumnStore for data warehousing," which supports real-time analytics and "new enhanced query processing" that speeds up database queries "regardless of workload."
Some readers expressed enthusiasm for the new features, but, of course, wanted more. "Ok the in-memory stuff (specifically OLTP and SSD support) is valuable but the rest is so so," read one comment from a reader named John on the Microsoft blog post. "Really I wish that we would see continued improvements in reporting and analysis services and in general less dependence on SharePoint which is a painful platform to manage. QlikView and Tableau are a real threat here."
Besides the in-memory capabilities, Microsoft also emphasized increased support for hybrid solutions where, for example, a company might have part of its system on-premises because of complex hardware configurations that don't lend themselves to hosting in the cloud. These companies can then use the cloud--Windows Azure--for backup, disaster recovery and many more applications. You can read more about that in this whitepaper (also a PDF download).
What do you think of the new in-memory capabilities of SQL Server 2014? Comment here or drop me a line.
Posted by David Ramel on 06/13/2013 at 1:15 PM
Microsoft today announced SQL Server 2014, designed with "cloud-first principles" and featuring built-in, in-memory OLTP and a focus on real-time, Big Data-style analytics. No specific release date was provided in the announcement.
"Our Big Data strategy to unlock real-time insights continues with SQL Server 2014," said Quentin Clark, corporate vice president with the Data Platform Group, in a blog post. "We are embracing the role of data--it dramatically changes how business happens. Real-time data integration, new and large data sets, data signals from outside LOB systems, evolving analytics techniques and more fluid visualization and collaboration experiences are significant components of that change."
The news came with a slew of other big product announcements at the TechEd North America conference in New Orleans, such as Windows Server 2012 R2 and System Center 2012 R2. All will be available in preview later this month.
A key feature of SQL Server 2014 is the incorporation of in-memory, online transaction processing (OLTP) technology stemming from a project that has been in the works for several years, codenamed "Hekaton," Clark said. Developed in conjunction with Microsoft Research, Hekaton greatly improves transaction processing speeds and reduces latency by virtue of working with in-memory data, as opposed to disk-based data.
Microsoft touted the benefits of the "conscious design choice" to build the Hekaton technology into SQL Server 2014, with no need for a separate data engine. "Other vendors are either introducing separate in-memory optimized caches or building a unification layer over a set of technologies and introducing it as a completely new product," said Dave Campbell, Microsoft technical fellow, when Hekaton was announced as a coming component of SQL Server 2014 last November. "This adds complexity forcing customers to deploy and manage a completely new product or, worse yet, manage both a 'memory-optimized' product for the hot data and a 'storage-optimized' product for the application data that is not cost-effective to reside primarily in memory," Campbell said.
Clark picked up on that theme in today's announcement. "For our customers, 'in the box' means they don’t need to buy specialized hardware or software and can migrate existing applications to benefit from performance gains," he said.
Clark also emphasized the embrace of cloud computing, noting how SQL Server 2014 will work seamlessly with the cloud-based Windows Azure to reduce operating expenditures for mission-critical applications. "Simplified cloud backup, cloud disaster recovery and easy migration to Windows Azure Virtual Machines are empowering new, easy to use, out-of-the-box hybrid capabilities," he said.
The Microsoft exec also noted SQL Server 2014 will include improvements to the AlwaysOn feature, supporting "new scenarios, scale of deployment and ease of adoption."
As mentioned, Microsoft provided no release date, but that detail was bound to be foremost in the minds of many users, such as one named Patrick who posted the very first reader comment on Clark's blog post: "Are there some dates (other than 2014)?"
What do you think of the big news about SQL Server 2014? Comment here or drop me a line.
Posted by David Ramel on 06/04/2013 at 9:03 AM
A podcast posted yesterday on the IEEE Spectrum site asked "Is Data Science Your Next Career?" That's a question I've been exploring recently in research for an article on the Big Data skills shortage.
"Opportunities abound, and universities are meeting them with new programs," the podcast states. But I was wondering--in view of the transient nature of IT industry fads or hype cycles or whatever you want to call them--would a data developer "going back to school" or getting training and experience to capitalize on the Big Data craze run out of time? That is: What's the likelihood of a developer getting Big Data training, certification and so on only to find out the need for these skills has greatly diminished? That's a question I put to several experts in the Big Data field.
"Very low likelihood," said Jon Rooney, director of developer marketing at Big Data vendor Splunk Inc. "There appears to be ongoing demand in the space as companies scratch the surface with Big Data. As Big Data technologies evolve to incorporate more established standards, developers skilled in these languages and frameworks can leverage those skills broadly, thus keeping them in demand."
"I see this as exceedingly unlikely," said Will Cole, product manager at the developer resource site, Stack Overflow. "Possibly if someone decides to go back to school. However, the Web and mobile are growing and APIs are getting more open. As long as the flow of data and the increase of scale continues, we're all going to need [machine learning] specialists and data scientists."
"No, we don't think so," said Joe Nicholson, vice president of marketing at Big Data vendor Datameer Inc. "But it's a matter of focusing on skills that will add value as the technology and market matures. Again, it's really about better understanding the use cases in marketing, customer service, security and risk, operations, etc., and how best to apply the technology and functionality to those use cases that will add value over time. Big Data analytics is in its early stages, but the problems it is addressing are problems that have been around a long time. How do we get a true, 360-degree view of customers and prospects, how do we identify and prevent fraud, how do we protect our IT infrastructure from intrusion or how do we correlate patient data to better understand clinical trial data."
Bill Yetman, senior director of engineering at Ancestry.com, was much more succinct and definitive in his answer: "No".
So let's go with that. There's still time, so get on board!
Here are some resources to get you started:
What are you doing to capitalize on the Big Data trend? Share your experiences here or drop me a line.
Posted by David Ramel on 05/29/2013 at 1:15 PM
You know Stack Overflow, of course (a recent Slashdot.org posting was titled "Developers May Be Getting 50 Percent of Their Documentation From Stack Overflow").
So, while doing research for an upcoming article, I learned that StackOverflow.com (which says it gets more than 20 million visitors per month) could provide an interesting take on trends such as the move to Big Data, both from a job-seeking/recruiting point of view and by measuring the number of questions about the technology.
From the jobs/career aspect: "I ran a quick query through our database of 106,000 developer profiles (worldwide) and found that of these, less than 1 percent (only 951) have listed Hadoop as one of their technologies," reported Bethany Marzewski, marketing coordinator at the company.
"Comparatively, of the 1,589 job listings on our job board, a search for 'Big Data' returns 776 open roles--nearly 50 percent," she said. "A query for jobs seeking programmers with Hadoop experience yields 90 open jobs (nearly 6 percent), and a search for 'machine learning' yields 115 open roles." You can view the site's job board to run your own queries.
For the developer interest angle, Stack Overflow developer Kevin Montrose ran a query to chart questions on the site tagged with "Hadoop" for each month since 2008 (see Figure 1).
The bottom line: The trend of companies adopting Big Data technologies--and resulting skills shortage--is huge and shows no signs of slowing down, so if you're a data developer looking for a pay hike, you may want to jump on board.
Have you been looking for a Big Data job? We'd love to hear about your experience, so please comment here or drop me a line.
Posted by David Ramel on 05/10/2013 at 1:15 PM
Cloudera Inc.'s recent announcement of its SQL-on-Hadoop tool is one of the latest examples of vendors trying to make Big Data analytics more accessible. But "more accessible" is a long way from "easy," and it will be a while before your average Excel jockey can take over the reins of a typical company's Big Data initiatives.
So data developers are still key, and those with Hadoop and related Big Data skills are commanding top dollars to meet an insatiable demand for their services. But the very top dollars go to the very top developers, and those folks might have to grow beyond the traditional programmer role.
While doing research for an upcoming article, I asked some experts in the field what developers can do to make themselves more marketable in this growing field.
"A general background on Hadoop is certainly a must," said Joe Nicholson, vice president of marketing at Datameer Inc., which makes prebuilt analytics applications--yet another path to that aforementioned accessibility. "But probably more important is understanding Big Data in terms of what the correlation of various data sources, new and old, can uncover to drive new business use cases.
"This is especially true of 'new' data sources like social media, machine and Web logs and text data sources like e-mail," Nicholson continued. "There is a wealth of new insights that are possible with the analysis of the new data sources combined with traditional, structured data, and these new use cases are becoming mission critical as businesses seek new competitive advantages. This is especially true when looking for insights, patterns and relationships across all types of data."
It also helps to show your work, as noted by Jon Rooney, director of developer marketing at Splunk Inc., another Big Data vendor. "There's no substitute for hands-on experience," Rooney said. "Developers who show experience by writing code and posting their work on places like GitHub are always marketable."
That sentiment is echoed by Will Cole, a product manager at Stack Overflow. Besides taking courses and attending meetups, he said, "the more concrete way to market yourself is to build side projects or contribute to open source projects where you can take what you've learned and show some working production results you've achieved."
In fact, some companies are looking for the best coding talent by using services such as that provided by Gild Inc. to measure the quality of code posted on GitHub and participation in developer forums and question-and-answer sites such as Stack Overflow, using--ironically enough--Big Data analysis, as I reported in an article on the Application Development Trends site.
Beyond showing your work, posting good code on developer-related social sites and answering questions in forums, a new way of thinking is required for developers looking to become top-notch Big Data rock stars, according to Bill Yetman. He is senior director of engineering at Ancestry.com, where he has held various software engineering/development roles. "Developers need to approach new technologies and their careers with a 'learning mindset,' " Yetman said. "Always be willing to pick up something new, embrace it and master it. Developers who love to learn will always stay up to date and be marketable."
But it might not be that easy for some positions. "A software engineer can't simply become a data scientist in the same way a Java developer can become a Ruby developer," noted Mark A. Herschberg, CTO at Madison Logic in New York. He's in the process of starting a data science team at the B2B lead generation company, and he points out the distinction between a software engineer and a data scientist.
"A good data scientist has a combination of three different skills: data modeling, programming and business analysis," Herschberg said. "The data modeling is the hardest. Most candidates have a master's degree or PhD in math or science and have worked with various statistical models. They have programming skills--not so much the type to let you build a scalable enterprise system, but in that they can access the database and move data around. They are probably better at R and SciPy (a Python library) than at building a Web application. They also are familiar with tools like Hadoop and NoSQL databases. Finally, they have some basic business sense, so [they] will know how to ask meaningful business questions of the data.
"If a software engineer is serious about moving into data science, he or she should probably begin by taking some classes in advanced statistics and data modeling," Herschberg said.
Several sources noted that with the extreme skills shortage, companies are resorting to all kinds of ways to find talent, including contracting, outsourcing and retraining existing staff.
For those taking the latter route, some advice was offered by Yetman, who writes a Tech Roots Blog at Ancestry.com, including a recent post with the title, "Adventures in Big Data: How do you start?".
"If you are looking for developers within your organization for a Big Data project, find the guys who are always playing with new technologies just for the fun of it," Yetman told me. "Recruit them to work on your project."
Hmm, maybe that's the best advice of all: have fun with it.
Are you having fun yet in your Big Data adventures? Share your thoughts here or drop me a line.
Posted by David Ramel on 05/03/2013 at 1:15 PM
Need help troubleshooting a Windows Azure SQL Database connection, or getting started with SQL Server Data Tools? Well, there's now a one-stop shop where you can identify aspects of your problem as it relates to the big picture of database development and immediately go to the appropriate resources to solve it.
Yes, data developers now have their own lifecycle management acronym and accompanying guidance.
That's thanks to Microsoft's Louis Berner and his Database Lifecycle Management topic page in the MSDN Library.
"DLM is not a product but a comprehensive approach to managing the database schema, data, and metadata for a database application," the page states. It includes a complicated diagram (see Figure 1) that developers can use to identify apps and actions specific to their scenarios, along with links for guidance in the categories SQL Server Data Tools, SQL Server Management Studio and Windows Azure SQL Database.
Figure 1. The DLM Diagram
I asked Berner about his DLM topic page, which has been getting some buzz in data development circles since it went live around the end of January. He explained that he worked in SQL Server Education, collaborating with Microsoft Customer Support Services to spot trends in customer support calls and resolve problems.
Last summer, he identified several issues in his bailiwick: SQL Server and SQL Database (Azure) manageability. He listed them as:
- Customers were frustrated in their efforts to get started with the Windows Azure platform.
- Data portability was a specific area that stood out as problematic, especially as it relates to service-level agreements, for example, backup/restore, business continuity and disaster recovery.
- Connectivity and troubleshooting connection issues were topics of concern.
- Understanding basic concepts was lacking in terms of what Microsoft offers for database development, portability and monitoring across scenarios that include on-premise, hybrid and cloud architectures; I described it in general terms as "Database Lifecycle Management."
"Customer frustration was understandable because we had developed features and tools over the course of many releases and a time span of five years or more," Berner explained. "Many resources within feature teams didn't understand the holistic view because they worked on individual pieces of the puzzle, and many were recent arrivals to the product unit. Because of their focus on individual features, they weren't expected to understand the big picture."
One particularly important issue he wanted to emphasize was data-tier applications. "Developers can benefit from use of data-tier applications features to create a package for deployment to production, to create a snapshot of a schema for version control, or to publish a schema update in a controlled manner," he said. "This provides developers the ability to cleanly hand off to DBA or Ops resources. The data-tier application is an under-used and under-appreciated feature, in my opinion, maybe because customers don't know about it."
Well, if you didn't know about it, you do now--and you know where to go to learn more. Berner says he will continue to improve on the DLM page. "As I continue to monitor [Customer Support Services] data and other sources of customer experience, I have developed a backlog of additional topics to include in a topic refresh. I will also iterate on the artwork to improve it. Eventually, I would like customers to be able to drill down through the diagram to get to the content they want."
Berner has received good comments on his project and would like to get more feedback from data developers to help him in his improvement process. So check out the page and let him know what you think in the comments section or by sending him an e-mail with the subject: "DLM topic on MSDN."
We'd like to hear from you, too. Please comment here or drop me a line.
Posted by David Ramel on 04/17/2013 at 1:15 PM
Tables, graphs and 3D bar charts just don't cut it anymore. To really glean insights from all that data you're collecting, you need pretty pictures, maps and interactive "cinematic guided tours" that users can play with.
That's the vibe at the PASS Business Analytics Conference underway in Chicago, as witnessed by today's announcement of "project codename 'GeoFlow' Preview for Excel 2013."
"Now you can apply geographic and temporal data visually, analyze that data in 3D and create visual tours to share your insights with others," says this video about the new product that started out in Microsoft Research:
And there's plenty more where that came from. The conference session schedule lists 11 presentations on data visualization, another tool to further Microsoft's cause of self-service business intelligence (BI).
These things are a step up from the Pivot tool that I played around with a while back, and even more recent cool toys such as the "Data Explorer" preview and Power View. GeoFlow lets you work with more than 1 million data rows in an Excel workbook and combine it with a 3D package on Bing Maps, according to a Microsoft announcement.
It's available now for download, provided you have: Microsoft Office Professional Plus 2013 or Office 365 ProPlus; and Windows 8 or Windows 7 or Windows Server 2008 R2.
Will you data devs be using the latest in 3D data visualization? Comment here or drop me a line.
Posted by David Ramel on 04/11/2013 at 1:15 PM
It was just an inconspicuous little reference seemingly buried in the verbiage announcing all the new goodies in the Visual Studio 2012 Update 2, looking almost like an afterthought jammed in at the last minute:
"It includes support in Blend for SketchFlow, WPF 4.5, and Silverlight 5."
But it was like getting a note from an old, long-lost friend. "Oh yeah, Silverlight is still around."
Some 16 months since Silverlight 5 has been available for download, you can now use it with the latest Blend hooked up to the latest Visual Studio, just like you used to. It made me want to dive right in and play around with those 3D-like animations built with the storyboard. I had used it for weeks (months?) to build a cool blackjack game with all the bells and whistles, including playing cards flipping through the air onto the card table. I loved it.
I know that's just me the hobbyist, and it means nothing. Maybe Silverlight is irrelevant to serious LOB devs. And even if that's not the case, after all, Microsoft has said it's not killing off Silverlight tomorrow or anything drastic like that. And it has been available as a preview download since last August. But still, it was nice to catch up with that old friend and see that others still cared about him, too, fitting him into their plans. I wondered what the user reaction would be, so I checked. It looks like Silverlight still has a loyal following.
There it was, No. 2 on the "Hot ideas" page of the Visual Studio UserVoice site--where you can present and vote on ideas for VS--with the simple title of "Silverlight 6." The OP had written, "Please do work on Silverlight next version. I feel Silverlight is great tool ... but as you guys stopped working on it; I feel that I wasted my time in learning Silverlight." The Jan. 13 post had 874 votes and 54 comments, mostly along the same lines.
But I don't want to give the impression that this is a landslide movement or anything. The most popular idea was to bring color back to VS, and it had more than 12,000 votes, which astounds me, right up there with the ALL CAPS menu fiasco. And Silverlight wasn't even mentioned in the comments section of the S. Somasegar blog post announcing the new update.
So maybe Silverlight is like Douglas MacArthur, an old soldier who doesn't die, but just fades away. An old friend with whom I'll stay in touch but won't meet at the pub for a couple of beers anymore. And that's cool. I know, it's only Silverlight and Blend ... but I like it.
Do you like Silverlight? Please comment here or drop me a line.
Posted by David Ramel on 04/05/2013 at 1:15 PM
Ok, that report is due soon, so I'm going to fire up dBASE to run some reports, export the data into Lotus 1-2-3 and summarize everything with WordPerfect--while listening to Wham! and Foreigner, of course.
Oops, my mind was momentarily transported back into the mid '80s.
Amazingly, though, one of those pioneering software products was just updated as of yesterday. Yup, dust off those old .dbf files, dBASE PLUS 8 has been released.
And, while the original Ashton-Tate version was developed for the CP/M operating system (remember those dual 5-1/4 in. floppies--one for the program, one for the data?), this new one runs on Windows 8 (yes, even the 64-bit version). My, how times have changed.
WordPerfect, of course, is still around under the stewardship of Corel Corp., but I hadn't heard anything about dBASE for quite a while. The new dBASE guardian, dBase LLC, claims it's still in use by "millions of software developers." The company was formed last year with the help of some people who formerly worked at dataBased Intelligence Inc., "the legal heir" to dBASE.
I'm not sure exactly what happened to dBASE after the astounding success of dBASE III, but, according to Wikipedia, the decline started with "the disastrous introduction of dBase IV, whose stability was so poor many users were forced to try other solutions. This was coincident with an industry-wide switch to SQL in the client-server market, and the rapid introduction of Microsoft Windows in the business market."
Anyway, the new version includes ADO support, a new UI and "enhanced developer features with support for callbacks and the ability to perform high precision math."
Pricing is $399 for the regular edition, $299 for an upgrade and $199 for a personal edition without ADO support. I wonder what those prices equate to in 1985 dollars?
UPDATE: Here's a pretty good history of dBASE by Jean-Pierre Martel, editor of The dBASE Developers Bulletin.
Any old-timers out there with a good memory? What did dBASE III sell for? And why did some of these pioneering products die or fade into obscurity, while others continue to thrive? Comment here or drop me a line.
Posted by David Ramel on 03/20/2013 at 1:15 PM
"Does SSDT for Visual Studio 2012 support BI project templates?" asked James V. Serra in a TechNet forum last September.
Some six months later, the answer was yes: "Hi James, the download to add the BI Project Templates to the VS2012 shell is now available."
Microsoft last week announced the online release of "SQL Server Data Tools – Business Intelligence for Visual Studio 2012" (SSDT BI), available for download here.
The release includes templates for Visual Studio 2012 BI projects, including Analysis Services, Integration Services and Reporting Services. These templates were part of the old Business Intelligence Development Studio (BIDS).
SQL Server Data Tools (SSDT) encompasses a bunch of integrated services and enhancements to improve database development entirely from within the Visual Studio IDE, such as incorporating functionality found in BIDS and SQL Server Management Studio (SSMS), among a host of other features.
Prior to this, the BI templates were available only in Visual Studio 2010, SSDT 2010 or SQL Server 2012. The new release will be installed through the SQL Server 2012 setup tool as a shared service and will install a Visual Studio 2012 integrated shell if you don't already have VS 2012.
This will hopefully relieve a lot of the frustration of data developers confused by different versions of SSDT, which was introduced with SQL Server 2012 but hosted in the VS 2010 shell, and inconsistencies in functionality as data development tools have evolved.
Apparently, though, there are still some frustrated users and more integration to be done. SQL DBA John Pertell welcomed the announcement. "That’s great news as a lot of developers, myself included, have been waiting for this functionality," he said. However, he added, "the bad news is that it doesn’t include the Database Projects templates released last year. You’ll still need to install them separately. But they will work together."
He explained further:
So if you want just the BI templates for Visual Studio 2012 you only have to install the BI version of SSDT. If you also want the database projects you will need to install both the BI templates and the database templates. And if you want to use the test plans for your new database projects and create SSRS reports or SSIS packages you’ll need a full edition of VS 2012, either Premium or Ultimate, plus the database templates plus the BI templates.
There were also some users frustrated by the install experience, especially on 64-bit machines running SQL Server 2012 (see comments on this blog post). Visual Studio and the SSDT integrated shell are 32-bit apps, and users reported errors, some of which were apparently caused by the installation tool trying to install the 32-bit version of SQL Server 2012, Service Pack 1. The solution seems to be to choose the "perform new install" option during installation and not the "add features to existing" option.
Still, many data devs are happy with the new capabilities. Those include Serra, who said on his blog, "It took 8 months, but at least it was quicker than being able to use BI in VS 2010, which took about two years."
Other enhancements to Visual Studio 2012 added last week include Office Developer Tools and a SQL Server Data-Tier Application Framework update.
What do you think of the new BI functionality in SSDT? Are we headed toward one big, comprehensive IDE that will include everything you need for SQL Server development in one place? Comment here or by e-mail.
Posted by David Ramel on 03/14/2013 at 1:15 PM
Remember when SQL developers felt threatened by Big Data? Relational database management systems were old-school relics that couldn't cope with the vast amounts of unstructured, disparate data. NoSQL was the future. You needed to get onboard with Hadoop and MapReduce, running on Linux.
Well, not anymore.
Maybe not ever, really. There is just too big of an installed base of SQL developers and systems for the two camps, Big Data and SQL, to have remained apart. Even four or five years ago the convergence was underway with Hive, a data warehouse system for Hadoop that uses "a SQL-like language called HiveQL."
That convergence seems to be rapidly accelerating. Microsoft has been helping out, of course, with PolyBase in its SQL Server 2012 Parallel Data Warehouse to enable SQL queries of Big Data and initiatives such as HDInsight and the Hortonworks Data Platform to get Big Data into the Windows ecosystem.
But Redmond has plenty of company. Just this week I had the opportunity to interview Web coding pioneer Lloyd Tabb about the subject when his new company, Looker Data Sciences Inc., announced a query-based business intelligence (BI) platform called Looker. "SQL and relational querying is the best way to ask questions of large related data sets," Tabb told me.
He should know what he's talking about. He was a database and languages architect at Borland in the earlier days of RDBMS and went on to build LiveWire, the first application server for the World Wide Web. He was later a principal engineer at Netscape where he was architect of Netscape Navigator Gold (later named Composer), the first WYSIWYG HTML editor, and the engineering lead for Netscape Communicator. He helped found Mozilla.org and later became a pioneer in crowdsourcing, just to name a few of his accomplishments.
Looker, according to the company, "uses a new modeling language, LookML, which enhances SQL for analytics so end-users can perform powerful analytics without needing to know how a query is written."
I asked Tabb about the use of SQL instead of NoSQL, Hadoop or other Big Data technologies associated with BI analytics, and he gave me a little history lesson.
"Back in the day conventional wisdom was that if you were going to create an application for a PC you had to write it in Assembly language," Tabb said. "Higher-level languages generated code that was too big and too slow. Later, conventional wisdom was that you couldn't build a 'real application' in an agile language--it was too big and too slow.
"Hadoop was designed because at the time there were no SQL engines that could deal with data sets that large. Developers regressed to hand coding queries in MapReduce. Both SQL and C are still in use today because they are the best abstractions for the kinds of problems they solve."
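To make Tabb's point about abstraction concrete, here is a minimal sketch that computes the same per-customer totals two ways: once with hand-coded map, shuffle and reduce steps, and once with a single SQL statement. Everything in it is an assumption made for illustration: the sample data, the function names and the use of Python's built-in sqlite3 module as a stand-in for a SQL engine. It is plain Python, not Hadoop's actual MapReduce API, and it says nothing about performance at scale.

```python
import sqlite3
from collections import defaultdict

# Hypothetical sample data: (customer, amount) pairs.
ORDERS = [("alice", 30), ("bob", 20), ("alice", 15), ("carol", 5), ("bob", 10)]


def map_phase(records):
    """Emit one (key, value) pair per input record."""
    for customer, amount in records:
        yield customer, amount


def shuffle(pairs):
    """Group emitted values by key, as a MapReduce framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Collapse each key's list of values into a single total."""
    return {key: sum(values) for key, values in groups.items()}


def totals_mapreduce(records):
    """Hand-coded pipeline: map, then shuffle, then reduce."""
    return reduce_phase(shuffle(map_phase(records)))


def totals_sql(records):
    """The same question asked declaratively with GROUP BY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE orders (customer TEXT, amount INTEGER)")
        conn.executemany("INSERT INTO orders VALUES (?, ?)", records)
        rows = conn.execute(
            "SELECT customer, SUM(amount) FROM orders GROUP BY customer"
        ).fetchall()
    finally:
        conn.close()
    return dict(rows)


if __name__ == "__main__":
    print(totals_mapreduce(ORDERS))
    print(totals_sql(ORDERS))
```

Both functions return the same totals. The difference is that the SQL version states what is wanted and leaves the grouping and summing to the engine, while the hand-coded version spells out each step.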
Looking around, I see lots of other evidence pointing to the Borg-like assimilation of Big Data by SQL. A few weeks ago GigaOM explored the subject with an article titled "SQL is what's next for Hadoop: Here's who's doing it," and just yesterday a Pluralsight course on the topic was announced, described as "An investigation into the convergence of relational SQL database technologies from several vendors and Big Data technologies like Apache Hadoop."
And there are plenty more similar things going on out there. So rest easy, SQL data developers, your future is still bright.
What do you think about the convergence of Big Data and SQL? Share your thoughts by commenting here or by e-mail.
Posted by David Ramel on 03/08/2013 at 1:15 PM
I tuned in to a Webcast earlier this week where Red Hat announced it was contributing its Hadoop plug-in to the open source Apache Hadoop community and totally embracing Big Data with an "open hybrid cloud" strategy. More on that later.
What I found really interesting was the response to an audience member who asked, "How do you define Big Data?"
Hmmm. Good question. It's one of the most over-hyped terms in the tech world today, but exactly what is it? Red Hat executive Ranga Rangachari provided the following:
So ... what we think of ... analysts have different ways to talk about this. You've heard some analysts talk about the four Vs, which is the volume, the velocity and a few other attributes to it. And, yes, that is one way to look at it, but I think our view of Big Data is, fundamentally I think, the underlying type of data, either semi-structured or unstructured. That's one way, at least, from a technology standpoint, which contrasts very much from your typical structured databases that people are used to over the last 20 years or so.
Obviously, it's not that easy to define Big Data.
John K. Waters addressed the question a year ago:
While there's lots of talk about big data these days (a lot of talk), there currently is no good, authoritative definition of big data, according to Microsoft Regional Director and Visual Studio Magazine columnist Andrew Brust.
"It's still working itself out," Brust says. "Like any product in a good hype cycle, the malleability of the term is being used by people to suit their agendas. And that's okay; there's a definition evolving."
Wikipedia defines it as a "collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications."
In other words, no one knows.
Anyway, Red Hat will open source its Hadoop plug-in and jump on the Big Data bandwagon with its vision of an open hybrid cloud application platform and infrastructure. Rangachari said it was designed to give companies the ability to create Big Data workloads on a public cloud and move them back and forth between their own private clouds, "without having to reprogram those applications." Red Hat said in a news release that many companies use public clouds such as Amazon Web Services for developing software, proving concepts and pre-production phases of projects that use Big Data. "Workloads are then moved to their private clouds to scale up the analytics with the larger data set," the company said.
The Red Hat Hadoop plug-in is part of Red Hat Storage, which runs on Linux and is based on the GlusterFS distributed file system. It's provided as an alternative to the Hadoop Distributed File System, known for some technical limitations that Apache and other organizations have also addressed.
Rangachari said the path to the open hybrid cloud Big Data application platform will eventually incorporate an Apache Hive connector (now in preview), NoSQL/MongoDB Java interoperability and RESTful OData Web protocol access, in addition to its existing JBoss middleware.
He emphasized that the new cloud strategy will be woven throughout every Red Hat project, noting that "Big Data could be one of the killer apps for the open hybrid cloud."
When asked why Red Hat was contributing its Hadoop plug-in to Apache, Rangachari said the Apache Hadoop community was the "center of gravity" in the Hadoop world and that the move will provide developers with easier access to the plug-in from the same ecosystem. He also said the company expects that, rather than stopping innovation of the technology, the move to open source will actually contribute to more innovation.
So what exactly is Big Data? Please explain here in a comment or via e-mail. We'll all appreciate it.
Posted by David Ramel on 02/22/2013 at 1:15 PM