Livermore Lab Pioneers Debugging Tool

How do you find a bug in a program when that program is spread across 200,000 processors?

As incredible as that scenario might sound, it is becoming a routine problem for Lawrence Livermore National Laboratory, home of the 212,992-core BlueGene/L supercomputer. To help spot bugs, laboratory researchers, along with those from the University of Wisconsin, have developed a new software program, named Stack Trace Analysis Tool (STAT).

"What we are finding is that today's architectures require novel [debugging] techniques," said Lawrence Livermore researcher Gregory Lee, who presented a paper about the new software at the recent SC08 conference in Austin.

Such debugging may become more crucial in years to come, as the largest petascale systems might soon consist of at least 1 million cores.

Lee noted that many full-featured debuggers for parallel programs are already on the market, such as TotalView Technologies' TotalView. But these tools do not scale well to programs that run across thousands of processors, because they cannot complete their analyses within a reasonable amount of time. Their thoroughness slows them down as the processor count rises, because the data structures they create grow too unwieldy.

"Even if your tool works with today's scales, if you take that same application and add one or two orders of magnitude, then some of the things you do now may not work well," Lee said.

The open-source STAT is not a full-featured debugger. Instead, it narrows down the problem area within a large parallel program, so that a more thorough commercial debugger can then be used to pin down and fix the problem.

"We wanted to develop lightweight tools that would help the heavyweight tools by identifying processes that behave in a similar fashion," Lee said.

STAT takes advantage of the fact that most parallel applications run similar processes across multiple nodes. Most debuggers can show each and every process. When analyzing thousands of processors, it would be too difficult for the developer to sort through all those processes even if the debugger could generate all that information in a reasonable amount of time.

STAT works by collapsing identical processes into a single visual representation. The software gathers information about all the running processes and then merges it into a tree graph. It also offers the option of building a 3-D tree graph, which can show the program running over a period of time. Both approaches are good at locating problems in unstable programs, such as deadlocks.
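The merging step described above can be illustrated with a minimal Python sketch. This is not STAT's actual code: it assumes each process reports its call stack as a list of function names, and it folds those lists into a prefix tree in which every node records which processes passed through it. The names `Node`, `merge_traces`, `equivalence_classes`, and `render`, along with the sample frame `compute_flux`, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One function frame in the merged call tree (hypothetical structure)."""
    name: str
    ranks: set = field(default_factory=set)      # processes that reached this frame
    children: dict = field(default_factory=dict)  # frame name -> Node


def merge_traces(traces):
    """Fold per-process stack traces (outermost frame first) into one prefix tree."""
    root = Node("<root>")
    for rank, frames in traces.items():
        node = root
        node.ranks.add(rank)
        for frame in frames:
            if frame not in node.children:
                node.children[frame] = Node(frame)
            node = node.children[frame]
            node.ranks.add(rank)
    return root


def equivalence_classes(node, path=()):
    """Group processes by the exact call path at which their trace ends."""
    classes = {}
    below = set()
    for child in node.children.values():
        below |= child.ranks
        classes.update(equivalence_classes(child, path + (child.name,)))
    stopped_here = node.ranks - below  # ranks whose trace ends at this frame
    if stopped_here:
        classes[path] = sorted(stopped_here)
    return classes


def render(node, depth=0):
    """Return the tree as indented text lines, one per frame."""
    lines = [f"{'  ' * depth}{node.name} [{len(node.ranks)} procs]"]
    for name in sorted(node.children):
        lines.extend(render(node.children[name], depth + 1))
    return lines


# Assumed sample data: eight processes, seven waiting at a barrier, one not.
traces = {rank: ["main", "solve", "MPI_Barrier"] for rank in range(8)}
traces[3] = ["main", "solve", "compute_flux"]

root = merge_traces(traces)
print("\n".join(render(root)))
for call_path, ranks in equivalence_classes(root).items():
    print(" > ".join(call_path), "->", ranks)
```

In this toy case, eight traces collapse into two groups, and the lone process that never reached the barrier stands out as the one worth attaching a full-featured debugger to. A real tool would also have to collect the traces across the machine efficiently, which this sketch does not attempt.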

In one test using BlueGene/L, the research team was able to merge all 212,992 processes of a program into a single tree graph in about a third of a second. "If you interpolate those results to a machine with 1 million cores, you're still talking about latencies that are tolerable," Lee said.

The Lawrence Livermore BlueGene/L support team has just installed STAT for production debugging use, Lee said. Users can deploy STAT alongside the laboratory's copy of TotalView to locate and fix bugs in their code. "We ran it on a couple of real end-cases," Lee said.

About the Author

Joab Jackson is the chief technology editor of Government Computing News (
