Data Driver in the Cloud
Microsoft Technical Fellow Dave Campbell on Windows Azure and SDS.
Windows Azure may be the overall platform for Microsoft's cloud services, but SQL Data Services (SDS) is the new name for the repository it will be based upon. Dave Campbell, a Microsoft technical fellow, is leading the SDS effort.
Campbell, who first joined Microsoft's SQL Server team in 1994, now oversees the technical strategy and architecture of the company's data repository and other storage products. His roots in database technology date back to his days at Digital Equipment Corp., where he worked on the DEC Rdb and DEC DBMS products.
At Microsoft's Professional Developers Conference (PDC) in Los Angeles last month, where the company outlined its Azure and SDS strategy, Campbell sat down with the editors of Redmond Developer News and discussed his vision for data management in the cloud.
What's your opinion of the trend right now where people are having to do more with less? As a result, they're having their app people deal with database design when, in some cases, the developers don't have the fundamentals to be doing that type of work.
I think it's a double-edged sword. It cuts both ways in that what we're trying to do is raise the level of abstraction. That means letting people focus more on the business problem and less on the glue to connect the pieces together. This is something we've pushed in SQL Server for some time. If I look at it from the administrative standpoint, when we released SQL Server 7.0, we took 90 percent of the knobs out of the engine and made them self-tuning, and a lot of DBAs thought we were going to put them out of a job. As it turns out, what happened was, rather than waking up at 2:00 in the morning to turn some knob, they focused more on helping developers write better queries, actually thinking more about information architecture.
As we move to the services space, there's another quantum shift or transition. We want the administrator and information architect and developer to be more productive on the logical data administration. One of the things we're trying to do is raise the level of abstraction, so rather than worrying about normalization and index tuning and lots of these things, for a large class of applications we're pushing the boundaries on how to automate much of that. Some of the work we've done in SQL Server is in physical design automation (physical database tuning and index tuning). There has been very interesting work done with Microsoft Research, and some of that work will find its way into SQL Services that scale as we evolve the service.
Do you have a problem with the same person performing the DBA role and the development role?
Part of our plan is to develop the platform that enables that. Businesses can't afford the two years of analysis to develop a system, and the platforms have evolved. We see this more often even when we go into large corporations, with very small teams taking on larger systems so that front-end design, back-end design and BI are all being done by a small team, sometimes on behalf of a business unit. I think that providing that agility is part of what makes our platform attractive to a lot of these people.
That applies to SQL Server 2008 and SQL Services?
It's something we've been working on in the product for some time, and I think the big separation between physical and logical administration is something that will really enable SQL Services.
What kind of feedback are you getting on SQL Data Services?
One of the things we've strived for was to make the barrier to entry, the friction, be very low. Today, if you want a SQL Server database, you have to ask questions such as 'what server are we going to put it on,' 'do I have enough disk space,' etc. We want to take that friction out, so that, [for example], if I just want a database of this size, boom! I have an end point that I can bind to and start working. The beauty of this is it's very approachable, very friendly for people designing outside-in and really offers schema flexibility in the system.
What's the estimated time frame for a public beta?
In a small matter of weeks or months we'll open it up. We've been focused on the back-end infrastructure for several years, and we're also building this back-end infrastructure for some internal properties at Microsoft. Some of those are actually going to go live and be available early next year, so the back-end we feel very, very good about. The interesting thing about the front-end is this model; we're learning so much, such as what the use cases are and what the optimal design points are. So we're still trying to sort that out. Now, once we make the call there, we'll be able to settle that down quite quickly because most of the investment in settling things has been in the back-end.
What about the front-end piece?
With respect to this balanced data model, approachability versus flexibility, how do we separate logical and physical data administration? How much power do we put in and still be able to run it in an automated fashion, since we don't have runaway queries and that sort of stuff? Where do we draw the line? That line will move over time as we become more mature with the service. We've got a tremendous amount of power on the back-end. We're just shining through a very small amount of it right now, so that when we know that we can do it in a lights-out fashion and scale, we'll open up and expose more of that latent power over time.
Is the front-end going to be a Web interface?
It's a database service. It's REST-based protocols and Internet-friendly protocols, and we have a SOAP interface to it as well. We return results in a form that's Internet-friendly to consume: an extension to the Atom Publishing Protocol that Live Mesh and other Live Services use, and that we've taken back to the standards committee. So it's meant to be very approachable, and in fact approachable in Internet-friendly terms. We announced support for a Ruby toolkit. We'll have support for PHP, so you can go off today and write something in PHP or Perl or your other favorite language quite easily.
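To give a rough sense of what consuming Atom-formatted results looks like from a scripting language, here is a minimal Python sketch. It is not the SDS API: the feed below is a hypothetical payload invented for illustration, and the `parse_entries` helper is likewise an assumption. Only the Atom namespace URI and the standard-library XML calls are real.

```python
import xml.etree.ElementTree as ET

# The Atom syndication namespace (this URI is standard).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical payload, invented for illustration.
# It is NOT an actual SQL Data Services response.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>books</title>
  <entry>
    <id>urn:example:book-1</id>
    <title>First book</title>
    <updated>2008-11-01T00:00:00Z</updated>
  </entry>
  <entry>
    <id>urn:example:book-2</id>
    <title>Second book</title>
    <updated>2008-11-02T00:00:00Z</updated>
  </entry>
</feed>"""


def parse_entries(feed_xml):
    """Return a list of (id, title) pairs, one per Atom entry in the feed."""
    root = ET.fromstring(feed_xml)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entry_id = entry.findtext("atom:id", namespaces=ATOM_NS)
        title = entry.findtext("atom:title", namespaces=ATOM_NS)
        entries.append((entry_id, title))
    return entries


if __name__ == "__main__":
    for entry_id, title in parse_entries(SAMPLE_FEED):
        print(entry_id, title)
```

In a real client the XML would arrive in the body of an HTTP response rather than from a string constant; the parsing step would look much the same in any language with an XML library.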
Do you see this targeted at specific applications or will it be for traditional database applications?
It's a good question, and frankly my thinking right now is much different than it was nine to 12 months ago. Since we've announced this we've learned a tremendous amount.
What were you initially thinking?
I was initially thinking it was more a case that I'm going to go build an app and then I'm going to decide, am I going to deploy it to the cloud, or am I going to deploy it on premises? That's where a lot of people still are. The thing I see as different is that business can solely focus on the business logic and the application. Today, if you were going to go sell a multitenant sort of ASP.NET app, you'd have to take on some of the burden of, how am I going to get this to scale, and if it does scale am I going to be in trouble? If I've overprovisioned how much is it going to affect me? You take that completely out of the equation. With respect to existing on-premises solutions, what we are seeing is many people extending them.
With the cloud-based services, what will the DBA's role be, or will the DBA have a role?
Absolutely they have a role. I'd ask the DBA to go back and say, 'what are the things I'm doing over and over and over that are a pain, that don't really add any business value?'
But is it the data architects that are going to have to address that type of decision?
Yes, but I think there are some places that have true data architects (someone who has the role and title), and there are other cases where the developer and DBA are playing the role of data architect for that particular solution. I think they'll naturally up-level themselves to provide more value to the business.
Why has the name SQL Data Services changed to SQL Services?
We initially called it SQL Server Data Services and we were focused on this first piece. We've since come up with this umbrella name called SQL Services because what we see happening is, we call it our team's mission internally to extend the data platform to the cloud. So we're going to be moving reporting to the cloud: we'll do analysis, we may have reference-data services available, and what we announced was this SQL Services Labs. So if you go to Azure.com and then SQL, you'll see a list of incubations, and we're showing our reporting services connector that will roll against SQL Data Services. Over time a lot of the capabilities that would make sense to project into the cloud will fall into the SQL Services umbrella. We'll have some new things there that we don't have an equivalent for in SQL Server today.
For the corporate dev team managers who are looking at the different options, what are some of the questions they should be asking themselves?
I'd encourage them to think about new scenarios. Where are there cases where it helps to have a database endpoint in the cloud that anyone can get at, and that they can control access to? What sorts of new scenarios can they build out?
In order to consider cloud services, are we talking about a huge amount of data?
Not necessarily. It can be a small amount of data, but it's important for me to have it in a place where anyone who has a connection to the Internet can access it. The other piece we're really focusing on is synchronization to connect all this. If going up to the cloud to synchronize is cheaper for both people to rendezvous than it is to put up the infrastructure point-to-point, then it's a win. I see this scaling: Business-to-business is painful to set up in VPNs, and it's much easier to rendezvous up at a point they can both get to. The same goes for device-to-device, where the devices may never be connected: it's much easier for them to rendezvous.
So it doesn't have to be an application where I think it's going to scale beyond my infrastructure?
No. We think about it as in cases where there are millions and millions of little things we want to move as well as some things in and of themselves that are very large.
How do you see SQL Services interfacing with Windows Azure and Project Velocity?
Velocity is actually built by the same team, and a lot of the underlying infrastructure is the same, so we see them fitting together quite naturally. In fact, when we started Velocity some time ago, we knew that caching would be important.
Will SQL Services be the repository for a lot of the Azure-based services, regardless of whether they're intended to be your traditional database-type applications?
Many of the Azure and .NET services will host their storage in SQL Services. Some will use the underlying [storage] that's part of the base [OS]. Just like today: People build solutions, they put some stuff in the file system where it makes sense and in the database where it makes sense.
Thanks for all this great information.
If I could move the dial on one thing, I'd say look, this is something that's very large and I think it will be every bit as big as the shift to client/server computing. I'd encourage people to step back and say, 'it's not so much I'm going to build the same thing and make a decision, is it here or there; it's about enabling new scenarios and it's going to be a very interesting five or 10 years as this plays out.'