VSLive! Keynote: Abel Wang on Microsoft's DevOps Journey
By John K. Waters
The changes Microsoft had to make over the past few years to transform itself from a traditional, waterfall-oriented software provider into an agile, cloud-based organization -- from a company that shipped new versions of its products once every three years to one that delivers new features every three weeks, with multiple bug fixes and patches deployed daily -- were part of a "very painful journey," recalled Abel Wang, but one the company simply had to make.
"We had to adopt this DevOps mindset," Wang told attendees during his keynote presentation on day two of this year's Austin edition of the Visual Studio Live! conference, which runs through Friday. "It was 'adapt or die.' And nobody was interested in dying."
Wang is a senior cloud developer advocate at Microsoft specializing in DevOps and Azure. He's currently a member of a five-person team of DevOps practitioners with the unlikely name "The League of Extraordinary Cloud DevOps Advocates" (#LoECDA).
"Our sole charter is to help developers everywhere get into Azure using DevOps best practices."
Abel Wang, Senior Cloud Developer Advocate, Microsoft
"It's the most ridiculous name on the planet," Wang said. "It was a joke that stuck. But the team is amazing. Our sole charter is to help developers everywhere get into Azure using DevOps best practices. Notice I did not say 'using VSTS [Visual Studio Team Services] to get into Azure.' I want everyone to use VSTS, of course, but I really just want everyone to get into Azure."
Microsoft defines DevOps as "the union of people, process, and products to enable the continuous delivery of value to our end users," Wang explained.
"Notice I didn't say 'continuously delivery of code,'" he said, "because what does that give us? Piles and piles of code does us no good whatsoever. And notice that I didn't say 'continuous delivery of features,' because we could be delivering feature after feature, sprint after sprint, but if it's not what the users need or want, we're just wasting our time. We have to make sure that we are continually delivering value. That's what it's all about."
That and speed, apparently. Wang showed attendees a short film clip that compared the time it took a Formula One pit crew to service a race car at the 1950 Indianapolis 500 -- change the tires, gas up, and clean the windshield -- with a similar pit stop during a 2013 race in Melbourne, Australia. The 50s crew took 67 seconds; the modern crew about 5 seconds. (The sound didn't work, but the audience got the point.)
"Someone pointed out that, back in the day, they had to put gas in the car, and in the modern example, they didn't," Wang said. "How do you explain that in the DevOps world? I said, it works perfectly in the DevOps world, because putting fuel in the car was a giant bottle neck. Guess what they did? They shifted left. Technology advanced to the point where they could build engines that are not only much more powerful, but also require much less fuel…."
"We have bottlenecks in the world of software, too," he added, "and one of the biggest is testing. Sure, we can build out these fancy CI/CD pipelines that deploy our code superfast into production. But how do we maintain quality, because we no longer have the luxury of spending weeks, if not months -- if not years -- doing end-to-end functional testing. So, we shifted left, where a lot of the responsibility for quality now with the developer. Instead of trying to tack on quality at the end, we build quality in from the start ... by shifting left, just like the race car."
Wang offered one Microsoft team's DevOps evolution as emblematic of the company's overall transformation.
"We had a boxed product call TFS [Team Foundation Server]," Wang explained, "and every three to four years we would come out with a new version, and we thought that was great. And back in 2005, it may have been good enough. But about seven years ago we started realizing that we were getting out-innovated by our competitors. They were moving at a much faster pace, and we quickly saw that, if we did not change the way we worked, we would become obsolete."
The application would have to be re-architected from a boxed product -- a CD that had to be installed on physical iron -- to an app that would run in the cloud, Wang said. But even more challenging, the group itself would have to be re-organized, and familiar roles would have to be redefined or eliminated.
The new team structure recognizes only two roles: program manager and engineer. The program manager is roughly the equivalent of a product owner in the Scrum process. Everyone else is an engineer, with no distinction between developers and testers. The teams themselves were also restructured. They had operated in segregated environments: UI developers worked on the UI layer, for example, while database people worked on the database layer. The restructured teams now own the entire feature set from beginning to end, including the UI layer, the data layer, and the database itself, as well as installation, deployment, and quality. Even the workspace was reconfigured: individual offices were replaced by team rooms, where everyone works together, including the program managers.
"Developers traditionally make incredibly bad testers," Wang pointed out. "And testers traditionally make very bad coders. So how did we do this? We trained our people and we required them to adapt...."
All these changes took a toll early in the process. The attrition rate hit around 20 percent, Wang said, but the remaining 80 percent adapted successfully to their new roles.
"It was incredibly painful," he said. "We suffered a lot of attrition from all sides -- management, developers, testers -- because the new way of looking at things and doing things was very different from the way we did things before. And we all know no one really likes change. But if there's one constant in our industry it's change."
Another element of Microsoft's new DevOps orientation is something called "aligned autonomy."
"Each team is autonomous in the sense that we get to decide what's in the backlog," Wang explained. "We get to decide what the priority is -- our program manager does. We decide which process we want to use, too, so we have some teams that are very strict Scrum users; other teams that are small-A agile; and other teams that use Kanban."
That's the "autonomous" part; the "alignment" part takes the form of practices required of every team, such as three-week sprints.
The teams move in lockstep only up to a point; some overlap does occur, Wang said.
"VSTS services the entire globe, so deployments take a while," he said. "Because of that, we deploy through these different rings before it ends up all the way into production. It takes about a week, so we have an overlap between when we're deploying and when the next sprint starts."
To manage these overlaps, the group sets aside a couple of days for sprint planning, and then sends an email to every member of every team explaining the plan for that sprint. At the end of the sprint, a second email that includes a video of what was accomplished goes out to the teams.
"It's not slick or produced," Wang said. "It's simple, just the raw video of what we're doing."
After every third sprint, all the related feature teams get together for what Wang calls a "scrum of scrums" to talk about what they did and what they're going to do. Every six months, the teams get together to check their progress against the long-term plan (18 months) and make sure they're on the right track. At that time, they re-evaluate and re-prioritize and make a new long-term plan.
"Each individual feature team is, of course, responsible for the work they need to do within a sprint," Wang said. "They are also responsible for the three-week plans. And by virtue of that, they kind of have to know and figure out what they're going to be doing every six months. Upper management, on the other hand, is looking at the big picture. They're looking at our 18 month scenarios. They're also looking at the six-month picture. But they're not looking into our plans or our backlog sprints. No micro-managing, which would cause developers to lose their minds."
These and other changes helped the VSTS group to shorten its product release cadence from once every three or four years to updates every three or four months, and eventually, every three weeks.
"Someone once asked me, Why three weeks?" Wang said. "Is that some magical number that Microsoft came up with? The answer is, no. We tried four weeks, and that was a really long time and didn't seem very agile to us. We tried one week, and that was a disaster. We tried two, and I think if we were a slightly smaller organization, it would have been perfect, but there's just too much overhead when you have 500 developers working together, trying to have this aligned autonomy. Three weeks just felt great."
"It's just something that worked for us," Wang added. "Everything I'm saying right now is what works for us, but that doesn't mean it'll work for everybody. Every team and every company is different."
The next Visual Studio Live! events are Visual Studio Live! Boston from June 10-14 and Visual Studio Live! Redmond at Microsoft Headquarters from August 13-17.
John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.