VSLive! Keynote: Abel Wang Details Microsoft's Painful DevOps Journey

It sometimes seems as though Abel Wang is vying to replace The Most Interesting Man in the World. The senior cloud developer advocate at Microsoft who specializes in DevOps is a hard-core coder, which is not a surprise.

But he's also a hard-core runner and rocker who plans to take on the Great Wall Marathon while playing his guitar.

"I'm training hard so I don't end up throwing up along the way," Wang told attendees during his keynote at this year's Redmond, WA, edition of the Visual Studio Live! conference.

Wang is also a member of a five-person team of DevOps practitioners led by Microsoft's Principal DevOps Manager, Donovan Brown.

The group earned the unlikely name "The League of Extraordinary Cloud DevOps Advocates" (#LoECDA). Wang admitted that it's "a stupid name," but the team's mission is anything but.

"We are five DevOps practitioners and our sole charter is to help our customers use DevOps best practices to get into Azure," he said.

"Notice I didn't say 'using Visual Studio Team Services.' I love VSTS, but our charter is to help anybody using any tools whatsoever to get into Azure using DevOps best practices."

Wang's team has been at the forefront of Microsoft's journey from plodding, waterfall-oriented software provider to agile, cloud-based organization, and his keynote focused on the lessons learned during that transformation.

It was, he said, "incredibly painful."

"I can't begin to explain to you guys how miserably painful it was to make this change," he said.

"We had to re-architect our apps, change the way we collect metrics, and restructure our teams so that we could continuously deliver value. It was so painful, in fact, that we suffered major amounts of attrition. People left our groups because of these changes."

Why did Microsoft decide to make these painful changes? About seven years ago, the company noticed that it was being "out-innovated" by its competitors, Wang said. "We were starting to become obsolete, and we quickly realized that we needed to iterate faster and adopt the DevOps mantra of continuously delivering value."

One of Microsoft's fundamental challenges during this transition, Wang said, was one the entire software industry faces: how to define DevOps.

"I've said 'DevOps' a whole bunch, but what exactly is it?" he said. "If I asked 10 people in this room what DevOps is, I bet I'm going to get 20 different responses. And I'm not even saying that any of those responses would be wrong. At Microsoft, DevOps is the union of people, process, and products to enable the continuous delivery of value to our end users. Notice I didn't say, 'continuously deliver code,' because what do piles and piles of code give us? Nothing. And notice that I didn't even say 'continuously deliver features,' because we could be pumping out feature after feature, but if it's not what our end users want or need, why are we doing it?"

Microsoft today is a highly agile organization that has adopted the DevOps mindset, he said. "In everything that we do, we are continuously thinking, how can we continuously deliver value to our end users, better, faster, with higher quality?"

Another key realization: Microsoft would definitely have to move into the cloud. "We knew we would have to be cloud-driven, and that meant that we would have to have TFS hosted in the cloud," he said. "But in order to get that to happen, we would have to rewrite TFS. It was never designed to run in the cloud and have multi-tenants. We spent many, many sprints just trying to figure out how to re-architect it and do that behind the scenes. We lost a lot of time in terms of innovation, just to re-architecting. And on top of that, we had to change how we structured our engineering teams as well."

"Microsoft used to iterate every three years," he added. "Today, like clockwork, the company puts out new bits into VSTS. Every three weeks, magically, new features just light up. What we don't tell people -- but I'm going to tell you, and we'll see if I have a job afterward -- every single day we push out two flights of bug fixes.

"That's moving at DevOps speed!"

[Figure: Fixing Bugs (source: Abel Wang).]

Microsoft now recognizes only two roles: program manager and engineer. The program manager is roughly the equivalent of a product owner in the Scrum process. Everyone else is an engineer, with no distinctions between developers and testers. The teams themselves were also restructured. They had operated in segregated environments: UI developers worked on the UI layer, for example, while database people worked on the database layer. The restructured teams now own the entire feature set from beginning to end, including the UI layer, the data layer, and the database itself, as well as installation, deployment, and quality. Even the workspace was reconfigured: individual offices were replaced by team rooms, where everyone works together, including the program managers.

"We realized that we no longer had the time to do full, end-to-end functional testing," Wang said. "That takes, what, months to do? Maybe even years? We had to find a way to maintain quality at the faster pace."

To solve that problem, the company decided to shift left and embrace unit testing, Wang said, which brought on the most painful cultural change. "We realized that we needed engineers who could do everything," he said. "No more of this, 'Quality is not my problem, I just write code,' from developers, and we couldn't just have QA people who only knew how to run QA scripts."

The company structured its new teams with one program manager for every 10 to 12 engineers. About 50 teams would span the entire VSTS product line, with people located in Redmond, North Carolina, and India. Each team would be completely cross-discipline and responsible for a specific feature, which it owns from beginning to end -- basically everything that pertains to that feature, including the front end, the back end, the database, the install and deployment scripts, and the quality.

Upper management looks at the big picture: the six-month picture and the 18-month scenarios. But there's no micromanaging, Wang said.

And each team enjoys what Microsoft calls "aligned autonomy." Teams decide which process they are going to use -- some teams are strict Scrum users, other teams are small-A agile, and other teams use Kanban, Wang explained. "They even decide what's in the backlog and what's a priority." The "alignment" comes in the form of practices required of every team, such as three-week sprints. And the members of each team are required to work in the same room together.

[Figure: Abel Wang at Visual Studio Live!]

"It's a physical team room in which we all sit together," Wang said. "And I know some people think team rooms are horrible, and I used to be one of those people. But there is something so incredibly productive about being in a room where all the people who are working on the feature set for your stuff are in one place. If I have a question, I just turn around and ask it, and somebody answers it."

Each team room has its own culture at Microsoft, Wang added. His own team, for example, loves to shoot Nerf guns at each other to release stress. "If you walk into that room unarmed, you are going to die!" he said.

To keep things fresh, team members are allowed to move to other teams every 12-18 months. "Every 12-18 months we do this crazy, yellow sticky note wall exercise thing, where managers put up signs saying, 'come work on my feature, it's the coolest ever,' and we put our names under the features we want to work on, and it's absolutely crazy and awesome," he said. About 90 percent of the time, people stick with their teams, he said, but giving developers the ability to pick means they no longer feel trapped. They have the power.

Another "alignment" rule: everyone works in three-week sprints, a number Microsoft arrived at through trial and error, Wang said. "I was a huge fan of two-week sprints, but with so many teams, it became too cumbersome," he said.

VSTS serves the entire globe, so deployments take a while -- about a week, Wang said. The teams don't push everything to everyone at once; instead, they deploy through a set of "rings." The code is tested in each ring, so that if any bad code has gotten through, it doesn't impact the whole world.
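The ring idea Wang described can be sketched in a few lines of Python. This is only an illustration of the general technique, not Microsoft's actual deployment tooling: the ring names, the `Ring` class, the `deploy_through_rings` function, and the stand-in health check are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Ring:
    """A deployment ring: a named slice of users or regions (hypothetical model)."""
    name: str


def deploy_through_rings(
    build: str,
    rings: List[Ring],
    is_healthy: Callable[[str, Ring], bool],
) -> List[str]:
    """Roll `build` out ring by ring and stop at the first unhealthy ring.

    Returns the names of the rings where the build passed its health check.
    """
    completed: List[str] = []
    for ring in rings:
        # Test the build in this ring before widening the audience.
        if not is_healthy(build, ring):
            # Bad code is contained: later rings never receive the build.
            break
        completed.append(ring.name)
    return completed


# Hypothetical ring names, ordered from smallest to largest audience.
RINGS = [Ring("internal"), Ring("early-adopters"), Ring("region-1"), Ring("worldwide")]


def example_check(build: str, ring: Ring) -> bool:
    """Stand-in health check: pretend the build misbehaves in 'region-1'."""
    return ring.name != "region-1"


if __name__ == "__main__":
    print(deploy_through_rings("sprint-build", RINGS, example_check))
```

In this sketch, a failure detected in one ring halts the rollout, so the rings after it, including the largest one, never see the faulty build.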

This approach creates some overlap between when the code is deployed and the next sprint. To manage these overlaps, the teams set aside a couple of days for sprint planning, and then send emails to each other explaining the plan for that sprint. At the end of the sprint, a second email that includes a video of what was accomplished goes out to the teams.

After every third sprint, all the related feature teams get together for what Wang called a "scrum of scrums" to talk about what they did and what they're going to do. Every six months, the teams get together to check their progress against the long-term plan (18 months) and make sure they're on the right track. At that time, they re-evaluate and re-prioritize and make a new long-term plan.

These and other changes helped the VSTS group to shorten its product release cadence from once every three or four years to updates every three or four months, and eventually, every three weeks.

"Some of you might think that this is what you should do, but every organization is different, but you should really do what works for you," Wang said.

Wang invited attendees to post their DevOps questions on Twitter with the #LoECDA hashtag and promised that someone from his team would respond or find someone at Microsoft who can.

About the Author

John K. Waters is the editor in chief of a number of Converge360.com sites, with a focus on high-end development, AI and future tech. He's been writing about cutting-edge technologies and the culture of Silicon Valley for more than two decades, and he's written more than a dozen books. He also co-scripted the documentary film Silicon Valley: A 100 Year Renaissance, which aired on PBS.
