The Query Optimizer: Q&A with Microsoft's David DeWitt
Database administrators and developers converged on Seattle for this week's annual Professional Association for SQL Server (PASS) conference, where Microsoft is talking up its recently released SQL Server 2008 and the forthcoming upgrade, code-named "Kilimanjaro." You can read all about that here.
One of the key advances that will enable Kilimanjaro is "Madison," the code name for the technology that will allow SQL Server to handle massively parallel processing. Microsoft's acquisition of DATAllegro back in September is providing the key assets in developing Madison.
It turns out that much of that work is happening in Madison, Wis., where Microsoft back in March announced its database research lab, called the Jim Gray Systems Lab, located at a facility not far from the university. To run the lab, Microsoft brought on board as a technical fellow David DeWitt, who spent 32 years in academic research at the University of Wisconsin-Madison. DeWitt will be making his first public appearance as a Microsoft employee in front of his largest audience ever at PASS on Friday in a keynote address.
I had the opportunity, joined by my colleague Kathleen Richards, to talk with DeWitt this week. Here's an excerpt:
Was this lab built from the ground up?
I am still building it. It's a lot of work. I currently just have three staff members; we'll be finding up to six graduate students next semester. I have some open staff positions but I am very fussy on who I hire. I'm never going to have 50, and the goal is to have 10 to 15 full-time staff, mostly Ph.D.s and some master's students, but people that like to build systems. I am a real hands-on systems builder.
What are you working on?
We are working with the DATAllegro team to look at parallel query optimization techniques. Optimizing queries is hard, optimizing for a scalable database system is even harder, and query optimization is something I've been interested in for a long time. We have one project that involves looking at some optimization techniques that will come out in a future release of the DATAllegro product.
What role did you have in proposing, suggesting the DATAllegro acquisition?
Zero. I had absolutely no role in the acquisition process. I knew about it soon after I joined, but Microsoft is very careful about how it does acquisitions these days. I was not involved in any way in the technical decision on whether to buy it or not. But I think it's a great acquisition. They've got a great product and I think Microsoft's expertise will be able to take it to an entirely new level. It's a great team. We were there last week. We are excited about working with them. It was like a big Christmas present as far as I am concerned because now, all of a sudden, I am working at a company that has a really seriously scalable parallel database system. Having built three in my life, getting a chance to work on a fourth [is] just like Christmas.
How do you see taking it to the next level?
First of all, replacing Ingres with SQL Server will certainly drastically improve the kinds of performance we should be able to get. SQL Server is a modern database system and Ingres is an old system. The DATAllegro system avoided using indices because the indexing in Ingres was not very effective. I think we'll get all of the benefits SQL Server has as the underlying engine. We're going to get this huge boost, and I think that the DATAllegro is a startup and they have a great system but it's a startup, and there are a lot of things that were done in the area of query optimization [that] I think we can improve on. Having built a number of parallel database systems in the past, I think we can offer something when it comes to optimization of queries that will allow us to scale even higher.
How else do you see SQL Server advancing as a platform?
SQL Server will advance as a platform by using DATAllegro as the base. Will DATAllegro make SQL Server more scalable? Absolutely. I think query optimization is the main unsolved problem in data warehousing today. I think we know how to build parallel database systems that scale to hundreds of thousands of nodes. DATAllegro already has one customer that's 400 terabytes. eBay has a competitor's system that has 5 petabytes. But there are really serious challenges of optimizing queries for hundreds of nodes and thousands of spindles. I think those are the opportunities that a team like mine can get its teeth into and make some forward progress. Query optimization is something that will come for a very long time, and we have some ideas for some new models for optimizing and executing queries that we will be exploring as part of the DATAllegro process.
You mentioned it can take 10 years for research to make it into a commercial product. Is that timeframe changing?
That's one of the goals of the lab. One of our ideas in setting up this lab was to have a much shorter path from the innovation by the graduate students and by my staff, into the product line. That's one of the reasons I am not part of Microsoft Research, even though I'm an academic researcher. I am part of the SQL Server organization and we intentionally put this lab as part of the SQL Server organization so that we had a direct path from the university into the SQL Server organization. It would not have made much sense to try to do this lab as part of Microsoft Research because we don't have a direct path.
What will you be talking about in your keynote later this week?
The other keynotes, they get to introduce new products and do fancy demos. I am just the academic guy. The talk is really going to be about the key components of a parallel or scalable database system, how partitioning works, here's the relationship between partitioning [and] indices, here's what happens to a SQL query when it gets compiled on scalable parallel database systems. It will really be a lecture on the fundamental technologies behind today's scalable database products.
If you had to sum up your key message, what is your vision for where you'd like to see your efforts at Microsoft take the SQL Server platform moving forward?
I'd like to have us become the world leader in data warehousing. I think that we have a great SMP product; it's easy to use, it's got great performance. We can take on Teradata. I don't see any reason why we should not become the premier solution for very large-scale data warehousing.
Posted by Jeffrey Schwartz on 11/19/2008