When an Outage Keeps You from Reporting Said Outage: GitHub Addresses Vexing Problem

GitHub has addressed vexing outages, including a recent days-long email delivery problem that had developers raging on social media, blasting the popular Microsoft-owned code repository because they couldn't log in to their accounts -- or report the problem.

"[W]aiting for my email verification for 2 days now," read one of some 63 comments on a seemingly innocuous GitHub social media post last month titled "What could you achieve with GitHub Copilot?"

The answer to that question, for many, was along the lines of "not much." Many users couldn't even report the problem because contacting support relied on emails that weren't being sent, with one developer saying, "It's ridiculous that I have to login to be able to report a login issue :-?"

[Image: 'Also, 1 more funny thing, when you try to send an e-mail' (source: Facebook).]

A sampling of other comments includes:

  • GitHub is broken. Rejecting mail in and not sending mail out.
  • I'm experiencing this as well. Verification and password reset emails are not being sent. Edit: The emails finally came through after about 2 hours
  • Unable to log in and retrieve password, the emails are full and the app cannot change password, the experience is very bad
  • I don't receive any e-mail for verification at all. The funny thing is that to write to the support you need to write the verification code
  • No email with password reset, cannot contact support because verification email doesn't work, and when I try to send you an email I get the below answer [same as screenshot above]. Mates, you need to fix it ASAP, people are locked out of their accounts

In its recently published "GitHub Availability Report: April 2024," the company explained the problem and how it was being addressed. The report noted, "In April, we experienced four incidents that resulted in degraded performance across GitHub services." The email issue was the longest-lasting and apparently most concerning to users.

The Problem

April 11 08:18 UTC (lasting 3 days, 4 hours, 23 minutes)

Between April 11 and April 14, GitHub.com experienced significant delays (up to two hours) in delivering emails, particularly for time-sensitive emails like password reset and unrecognized device verification. Users without 2FA attempting to sign in on an unrecognized device were unable to complete device verification, and users attempting to reset their password were unable to complete the reset. The delays were caused by increased usage of a shared resource pool, and a separate internal job queue that became unhealthy and prevented the mailer queue from processing.

The Solution

Immediate improvements have been made to better detect and react to similar situations in the future, including a queue-bypass ability for time-sensitive emails and updated methods of detection for anomalous email delivery. The unhealthy job queue has been paused to prevent impact to other queues using shared resources.
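To illustrate the idea of a queue bypass for time-sensitive emails, here is a minimal Python sketch. It is not GitHub's implementation, which the report does not describe in detail. The class name, the email categories and the `shared_healthy` flag are all hypothetical, chosen only to show how a dedicated lane could keep password resets flowing while a shared queue is stalled.

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical category names. The report only says that password reset and
# device verification emails are time-sensitive.
TIME_SENSITIVE = {"password_reset", "device_verification"}


@dataclass
class Email:
    kind: str
    recipient: str


class Mailer:
    """Toy mailer with a shared queue and a bypass lane (illustrative only)."""

    def __init__(self):
        self.shared_queue = deque()  # stands in for the shared job queue
        self.bypass_queue = deque()  # dedicated lane for time-sensitive mail
        self.shared_healthy = True   # assumed health signal for the shared queue
        self.sent = []

    def enqueue(self, email):
        # Time-sensitive mail skips the shared queue entirely.
        if email.kind in TIME_SENSITIVE:
            self.bypass_queue.append(email)
        else:
            self.shared_queue.append(email)

    def process(self):
        # The bypass lane always drains, whatever the shared queue's state.
        while self.bypass_queue:
            self.sent.append(self.bypass_queue.popleft())
        # Ordinary mail waits until the shared queue is healthy again.
        if self.shared_healthy:
            while self.shared_queue:
                self.sent.append(self.shared_queue.popleft())


mailer = Mailer()
mailer.shared_healthy = False  # simulate the unhealthy shared queue
mailer.enqueue(Email("newsletter", "a@example.com"))
mailer.enqueue(Email("password_reset", "b@example.com"))
mailer.process()
delivered_during_outage = [e.kind for e in mailer.sent]
```

In this sketch the password reset is delivered even though the shared queue is stalled, while the newsletter stays queued until `shared_healthy` is set back to `True` and `process()` runs again.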

The report also detailed problems of much shorter durations, ranging from 30 minutes to 2 hours, and the company's response to those issues.

[Image: All Systems Operational (source: GitHub).]

The company's status page is available to check on how GitHub is doing at any given time, with the latest report showing all systems operational. An incident history site is also available for information on past problems.

Oops

While this article was being written, these messages popped up on the status page:

[Image: Degraded Performance (source: GitHub).]

Just minutes after the above messages popped up, all systems returned to operational status. So problems do happen with complicated software systems, as any developer reading this well knows. But GitHub is good at keeping users apprised of what's going on and what's being done about it.

About the Author

David Ramel is an editor and writer at Converge 360.
