VS Offline? Mid-Month Outage, Explained

The short explanation for the Visual Studio Online outage of July 18 is Azure SQL Database slowness, but there was a bit more to it than that.

Visual Studio Online was offline for about 90 minutes on July 18, and the outage was significant enough that Microsoft Technical Fellow Brian Harry blogged an explanation of it. In short, the cause was slowness of an Azure SQL Database.

Harry explained that because the outage occurred during non-peak hours, it affected very few customers. Even so, there were some lessons to be taken away from the cause of the outage.

A simple explanation goes like this: the Visual Studio IDE called Shared Platform Services (SPS) "to establish a connection to get notified about updates to roaming settings"; SPS called Azure Service Bus (ASB); and ASB called the Azure SQL Database. The cause wasn't the services themselves but, as Harry explained, the failure to "handle a transient failure in a secondary service properly and allowed it to cascade into a total service outage." In other words, a temporary slowdown in one dependency backed up through every service that relied on it, and the problems piled up rather than resolving of their own accord.
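One common defense against this kind of cascade is a circuit breaker around the secondary dependency. The following is only a minimal sketch of that general technique, not the fix Microsoft applied; the class name, thresholds and fallback behavior are all hypothetical.

```python
import time


class CircuitBreaker:
    """Hypothetical sketch: stop calling a failing dependency for a cool-down period."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds (assumed value)
        self.clock = clock                # injectable so the sketch can be tested
        self.failures = 0
        self.opened_at = None             # None means the breaker is closed

    def call(self, func, fallback):
        # While open, skip the dependency entirely and degrade gracefully.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Cool-down elapsed: allow one trial call; a failure re-opens at once.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = func()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

With something like this in place, a caller asking a non-critical service for roaming-settings notifications would get a default answer quickly instead of holding a connection open while the slow dependency recovers.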

Harry said that several takeaway lessons could be learned from the incident:

  • Smaller services are better: Don't pair critical and non-critical services on a shared resource; factor services into as small an atomic unit of work as possible to minimize failure points.
  • With services reliant on each other, retries will pile up exponentially. Assume the worst even if the retries seem normal.
  • Prioritize traffic and set up thresholds along the Service Bus.
  • Whether calls are synchronous or asynchronous, threading matters.
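The point about retries can be made concrete with a little arithmetic. The sketch below assumes, purely for illustration, that every layer in a call chain makes the same fixed number of attempts and that the backend always fails; the article does not say which layers retried or how often, and the function names are hypothetical.

```python
def worst_case_backend_calls(attempts_per_layer: int, layers: int) -> int:
    """Backend calls from one user request if every layer uses all its attempts."""
    return attempts_per_layer ** layers


def simulate(attempts_per_layer: int, layers: int) -> int:
    """Count backend calls when the backend always fails and every layer retries."""
    backend_calls = 0

    def call(layer: int):
        nonlocal backend_calls
        if layer == 0:
            # Layer 0 stands in for the slow backend (assumed to always time out).
            backend_calls += 1
            raise TimeoutError("backend is slow")
        last_error = None
        for _ in range(attempts_per_layer):
            try:
                return call(layer - 1)
            except TimeoutError as err:
                last_error = err
        raise last_error

    try:
        call(layers)
    except TimeoutError:
        pass
    return backend_calls


if __name__ == "__main__":
    # Three retrying layers above the backend, three attempts each.
    print(simulate(3, 3), worst_case_backend_calls(3, 3))
```

Under these assumptions, three layers with three attempts each turn a single request into 27 backend calls, which is why a per-layer retry policy that looks harmless can still overwhelm a struggling dependency.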

In short: "The key thing is to examine each and every failure, trace the failure all the way to the root cause, generalize the lessons and build defenses for the future."

How do outages like the one that took down Visual Studio Online affect you or your confidence in cloud-based services? Chime in below or write me at [email protected]

About the Author

You Tell 'Em, Readers: If you've read this far, know that Michael Domingo, Visual Studio Magazine Editor in Chief, is here to serve you, dear readers, and wants to get you the information you so richly deserve. What news, content, topics, issues do you want to see covered in Visual Studio Magazine? He's listening at [email protected].
