Developer's Toolkit

Blog archive

We Can’t Live with Embedded Bugs

The Tuesday, May 30, 2006 Wall Street Journal (www.wsj.com; paid subscription required) offers an unusually candid look at how flaws in avionics software can introduce new elements of risk into flying. It cites several documented examples where software bugs provided wrong or conflicting information to flight control systems, which responded by making the wrong decisions, putting crew and passengers in danger.

Avionics and other safety-critical systems are built differently from business software, and any comparison between the two is unrealistic. Aviation software is regulated by a standards body called RTCA (http://www.rtca.org); the applicable standard is DO-178B, Software Considerations in Airborne Systems and Equipment Certification. Aviation system software cost a great deal more to develop, and is commensurately more reliable.

But aviation software is also getting much more complex. The WSJ article notes that the average airliner today has about five million lines of code, as opposed to about one million on older planes. As complexity grows, so does the potential for bugs. And future airliners (the Airbus A380 and Boeing 787) are integrating avionics systems so that the same software will perform many related tasks, rather than using separate programs for specific and narrow operations. This could well further increase complexity.

How can we resolve the conflict between complexity and reliability when lives hang in the balance? Here are two broad suggestions.

1. Establish development and testing procedures that maximize the likelihood of producing high quality software. Experience tells us that perfection cannot be achieved, even at prohibitive costs. But if there were ever a need for defined and enforced processes for development and test, this is it. But rigorous adherence to software engineering principles during development, and the use of comprehensive cases and analysis during test, can bring incremental improvements to software quality.

2. Fail safe. Besides being the name of a classic movie, it also refers to the concept that a system should fail in the safest possible way. In the case of aviation systems, a system failure or data conflict should result in control being turned back over to the pilots in a way that enables them to seamlessly take over. The design of such software will be a challenge, but it is achievable.

There is no panacea to producing high-quality software for safety-critical systems. The challenge will be greater as avionics become more complex and interrelated. But as the WSJ article points out, software-enhanced avionics systems have actually helped make flying significantly safer over the last two decades. We shouldn't give up these benefits, but building safety-critical systems is a need that is growing rapidly.

Posted by Peter Varhol on 05/30/2006


comments powered by Disqus

Featured

  • VS Code 1.123 Adds Agent Session Sync, 1M Context Windows

    Microsoft released Visual Studio Code 1.123 on June 3, adding agent-focused features, larger model context support, integrated browser updates and a new delay for some automatic extension updates.

  • Copilot Billing Shock Hits Developers

    Developer complaints about GitHub Copilot's new usage-based billing model have centered on unexpectedly rapid AI credit consumption, and neither GitHub nor Microsoft has responded directly to the backlash, though they have previously published guidance to lessen model usage costs.

  • Hands On with GitHub Copilot App Technical Preview: Turning a Blazor Issue into a PR

    GitHub's brand-new Copilot desktop app, in technical preview, handled a small Blazor issue from planning through pull request creation, but the hands-on test also showed why developers still need to verify agent work in the running app before merging.

  • At Build 2026, Microsoft Sets Up Windows as an OS for AI Agents

    Microsoft's Build 2026 Windows developer announcements point to a broader platform strategy for agentic AI, spanning terminal workflows, local models, app-building skills, Cloud PCs and operating system-level containment.

Subscribe on YouTube