Developer's Toolkit

Blog archive

Simulation May Be Key for Multicore Debugging

I hate race conditions. Developers of multithreaded applications who have bugs that are dependent upon the timing both within and between threads curse race conditions. They go against every instinct that a developer has about computers and software – they are, or at least appear to be, nondeterministic. It depends on when a particular thread finishes a particular task, and it can vary depending on the execution path taken, or the amount of data read and written. Or it can vary based on the time of day, or phase of the moon.

Just kidding about the last part, but to many developers it can seem like a reasonable statement.

Most of us have encountered bugs in clean builds that mysteriously disappear when debug information is added. This is often related to the race condition, in that debugging information tends to slow down execution, changing timings and getting rid of the problem. Of course, the problem is still there, but it cannot be found while running the debugger.

These were the thoughts that perked me up as I participated in a briefing from Paul McLellan of Virtutech (www.virtutech.com). Virtutech makes software, primarily for embedded systems development, that enables developers to simulate the underlying hardware and operating systems. This is especially important for embedded development, such as cell phone software, where the hardware may not even be ready when application development is in full swing.

But there are lessons here for PC developers, who are increasingly facing problems that can be addressed by debugging on simulated systems. The lessons come from, of all things, debugging software. You can run the simulator backwards, explained McLellan. Just get to the point where the bug occurs, then step backwards.

That is an extremely powerful debugging technique, because you do not have to guess at where to start stepping forward using conventional debugging techniques. But it does not end there. If you have a multicore system, McLellan continued. You can run one CPU at ten times the clock rate of the other. You can change the timing until the bug disappears.

Of course, this by itself does not find the bug for you, but it does give you one more tool in your arsenal, and it is a tool that isn't possible when you are debugging on real hardware. We would find good uses for such a tool in practical debugging situations.

I commented a few weeks ago (http://www.ftponline.com/weblogger/forum.aspx?ID=11&DATE=8/20/2006) that programming multiprocessor systems represents one of the fundamental problems of computer science for the foreseeable future, perhaps requiring new languages or new techniques that are not yet in common practice. (NB – The distinction I am making between multiprocessor and multicore systems is that in a multiprocessor system, the system has multiple complete processors on separate dies, whereas in multicore systems the processor has multiple CPUs on the same die).

One of those techniques may well be running multiprocessor and multicore application software on a simulated processor and operating system platform. Consider how this might work. You write a threaded application in which different threads are designed to run on different CPU cores. In the course of debugging, you come across an application crash that seems to occur almost randomly, even in different parts of the code.

On a simulated platform, you can do a debug build and engage the debugger. Then let the application run until the crash, and back up the execution of the simulator, watching thread activity and variable values on different cores as you normally would in the debugger, except in reverse. Once you've established that a race condition exists, you can speed up one of the processors until it disappears, confirming the condition and getting a better idea of the timing involved.

This will not make the code easier to write, of course. The developer is still responsible for locking resources and data from access and change by multiple threads, and for blocking thread execution when it can be harmful. But I cannot help but think that simulated platforms will play some role in multiprocessor and multicore application development in the future.

Posted by Peter Varhol on 09/18/2006


comments powered by Disqus

Featured

  • AI for GitHub Collaboration? Maybe Not So Much

    No doubt GitHub Copilot has been a boon for developers, but AI might not be the best tool for collaboration, according to developers weighing in on a recent social media post from the GitHub team.

  • Visual Studio 2022 Getting VS Code 'Command Palette' Equivalent

    As any Visual Studio Code user knows, the editor's command palette is a powerful tool for getting things done quickly, without having to navigate through menus and dialogs. Now, we learn how an equivalent is coming for Microsoft's flagship Visual Studio IDE, invoked by the same familiar Ctrl+Shift+P keyboard shortcut.

  • .NET 9 Preview 3: 'I've Been Waiting 9 Years for This API!'

    Microsoft's third preview of .NET 9 sees a lot of minor tweaks and fixes with no earth-shaking new functionality, but little things can be important to individual developers.

  • Data Anomaly Detection Using a Neural Autoencoder with C#

    Dr. James McCaffrey of Microsoft Research tackles the process of examining a set of source data to find data items that are different in some way from the majority of the source items.

  • What's New for Python, Java in Visual Studio Code

    Microsoft announced March 2024 updates to its Python and Java extensions for Visual Studio Code, the open source-based, cross-platform code editor that has repeatedly been named the No. 1 tool in major development surveys.

Subscribe on YouTube