Fuzzing Fundamentals -- Visual Studio Magazine

Tech Brief

Fuzzing Fundamentals

Secure your software with fuzz testing.

By Dave Weinstein
06/01/2007

One of the industry's most useful and effective security techniques owes its existence to a dark and stormy night. In 1989, a member of Professor Barton Miller's team at the University of Wisconsin was logged on to his workstation via a dial-up line that was experiencing extreme line noise. The scrambled characters in command-line arguments were sufficient to crash a significant number of basic operating system utilities. This phenomenon led Miller's team to investigate the response of software to deliberately malformed data.

The result of this research was a technique called "fuzzing" (named after one of the tools developed during the course of the initial research). At Microsoft, its use has resulted in the discovery of more than 1,000 Bulletin-class issues during software development. Here's a look at the three basic steps in the fuzzing process:

Malform the Data
The first step is to take existing valid data and change it maliciously. Malformed data can range from random noise to subtle manipulations of well-understood data structures.

On targets that have never been subject to fuzz testing, even dumb attacks (for example, generation of utterly random binary data) can result in crashes. Simple mutations can also cause crashes or reveal vulnerabilities: randomly flipping bits in the data, removing null terminators, lengthening strings or buffers, or randomly replacing a value with an "interesting" byte (for example, 0xff, 0x80, or 0x00).

Fuzzers can either create their own data ("generation fuzzing") or modify data from actual sources ("mutation fuzzing"). If you're using any kind of mutation fuzzing, you want a representative set of templates, whether those are files that the application consumes, test tools that exercise the network protocol fully, or a set of database queries.
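The simple mutations described above can be sketched in a few lines of Python. This is a minimal illustration, not a production fuzzer: the function names, the filler byte, and the 1,024-byte cap on inserted data are all assumptions made for the example. Seeding the random generator means any test case that causes a crash can be regenerated from its seed.

```python
import random

# "Interesting" byte values of the kind mentioned in the text.
INTERESTING_BYTES = (0xFF, 0x80, 0x00)


def flip_random_bit(data: bytes, rng: random.Random) -> bytes:
    """Return a copy of data with one randomly chosen bit inverted."""
    if not data:
        return data
    buf = bytearray(data)
    index = rng.randrange(len(buf))
    buf[index] ^= 1 << rng.randrange(8)
    return bytes(buf)


def replace_with_interesting_byte(data: bytes, rng: random.Random) -> bytes:
    """Return a copy of data with one byte overwritten by an interesting value."""
    if not data:
        return data
    buf = bytearray(data)
    buf[rng.randrange(len(buf))] = rng.choice(INTERESTING_BYTES)
    return bytes(buf)


def lengthen(data: bytes, rng: random.Random) -> bytes:
    """Return data with a run of filler bytes inserted at a random position."""
    position = rng.randrange(len(data) + 1)
    filler = b"A" * rng.randint(1, 1024)  # hypothetical cap on growth
    return data[:position] + filler + data[position:]


MUTATORS = (flip_random_bit, replace_with_interesting_byte, lengthen)


def mutate(template: bytes, seed: int) -> bytes:
    """Apply one randomly chosen mutator to a valid template.

    The same seed always yields the same output, so a crashing case
    can be reproduced later.
    """
    rng = random.Random(seed)
    return rng.choice(MUTATORS)(template, rng)
```

A driver would typically loop over a range of seeds for each template, for example `mutate(template, seed)` for `seed` in `range(10000)`, and record the seed alongside any failure.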

When fuzzing, it's vital to get the malformed data to the components under test. If the format is XML, confine fuzzing to the data in the XML document rather than the XML structure itself, unless you're testing an XML parser. Fuzzing the XML structure will stress the XML format verification code rather than the application that consumes the XML data. Likewise, if a format includes a checksum or cyclic redundancy check (CRC), either disable the value checking or recompute the value before passing the data to the application; otherwise the malformed data will not get further than the code that verifies the checksum or CRC.
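To illustrate the CRC point, here is a small sketch that assumes a hypothetical packet layout: a payload followed by a 4-byte big-endian CRC-32 trailer. The layout and the helper names are invented for the example; a real format will place and compute its check value differently.

```python
import struct
import zlib


def seal(payload: bytes) -> bytes:
    """Append a CRC-32 trailer (hypothetical layout: 4 bytes, big-endian)."""
    return payload + struct.pack(">I", zlib.crc32(payload) & 0xFFFFFFFF)


def is_valid(packet: bytes) -> bool:
    """Stand-in for the target's integrity check."""
    if len(packet) < 4:
        return False
    payload, trailer = packet[:-4], packet[-4:]
    return struct.unpack(">I", trailer)[0] == zlib.crc32(payload) & 0xFFFFFFFF


def fuzz_with_valid_crc(packet: bytes, mutate) -> bytes:
    """Mutate only the payload, then recompute the trailer.

    `mutate` is any callable that maps bytes to bytes.
    """
    return seal(mutate(packet[:-4]))
```

Flipping a bit in a sealed packet without resealing it makes `is_valid` return `False`, so the target would reject the data before any interesting code ran. Passing the same mutation through `fuzz_with_valid_crc` yields a packet that passes the check and carries the malformed payload into the application.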

Deliver the Data to the Target
The second step in the fuzzing process is to deliver the malformed data to the target. How the data is delivered depends heavily on the target. For example, it may be easiest to write a tool that uses the file load code to load the data structure, malform one or more fields, and then use the file save code to write a legitimately formatted file with invalid data. The same techniques can be used for XML/SOAP, or even binary formats. It's possible to develop tools that are designed to deliver data to a wide range of applications or services; however, this type of investment probably isn't necessary for smaller projects or organizations.
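The load, malform, and save approach might look like the following for an XML document, using Python's standard `xml.etree.ElementTree` module as a stand-in for an application's own load and save code. The function name and the sample element names are hypothetical. Because the serializer escapes special characters, the output stays well-formed XML even when the field value is hostile.

```python
import xml.etree.ElementTree as ET


def malform_field(xml_text: str, tag: str, bad_value: str) -> str:
    """Load a document, overwrite every <tag> element's text, and save it.

    The structure of the document is left alone; only the data changes.
    """
    root = ET.fromstring(xml_text)
    for element in root.iter(tag):
        element.text = bad_value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    template = "<order><id>17</id><quantity>3</quantity></order>"
    # An overlong value with characters that must be escaped in XML.
    print(malform_field(template, "quantity", "A" * 40 + "<&>"))
```

The result can then be written to disk or sent over the wire exactly as a valid document would be, so it passes format validation and reaches the code that consumes the field.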

Monitor the Target
The final step of the fuzzing process is to monitor the target to determine if the fuzzing had an impact. The classic three cases to look for are:

crash;
memory spike;
CPU usage spike.

Any crash is a code bug, and may well be an exploitable one. Exploitable means it's possible for an attacker to run code of his or her choosing. Because significant expertise is required to determine if a crash is exploitable, the best practice is simply to fix any crashes you find.

A memory spike or CPU usage spike indicates that malicious data could cause a denial of service, and again, the best practice is to fix the code that was using the fuzzed data.
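A basic monitor can be sketched with Python's `subprocess` module, assuming the target is a command-line program that reads its input from stdin. The function name and the outcome labels are invented for the example. A timeout is used here as a rough proxy for a CPU spike or hang, and memory monitoring is left out because it is platform-specific. The crash test relies on POSIX behavior, where a process killed by a signal reports a negative return code; on Windows a crash surfaces as a large positive exit code and would need a different check.

```python
import subprocess
import sys
from typing import Sequence


def run_case(command: Sequence[str], data: bytes, timeout_s: float = 5.0) -> str:
    """Feed one test case to the target on stdin and classify the outcome."""
    try:
        result = subprocess.run(
            list(command),
            input=data,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # The target did not finish in time: treat it as a hang.
        return "hang"
    if result.returncode < 0:
        # POSIX only: the process was terminated by a signal (e.g. SIGSEGV).
        return "crash"
    return "ok" if result.returncode == 0 else "error"


if __name__ == "__main__":
    # Hypothetical target: a Python one-liner that just consumes stdin.
    target = [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"]
    print(run_case(target, b"\xff\x80\x00"))
```

Any input classified as "crash" or "hang" should be saved to disk along with the command line, so the failure can be reproduced under a debugger.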

Fuzzing is weakest when you try to find vulnerabilities that don't result in a crash. In those cases, you need custom monitoring that's specific to your application or service and the changes being made to the data. For example, if the fuzzing tried to inject a SELECT * FROM DATABASE into a URL-based query, monitoring the network traffic could detect SQL injection. If fuzzing attempts to inject a JavaScript alert, the monitoring system would need to detect the pop-up window.
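As one example of application-specific monitoring, the sketch below checks whether an injected script payload comes back unescaped in an HTML response. This is a simpler substitute for detecting an actual pop-up window, and it will miss payloads that are transformed before being reflected. The marker string and the two rendering functions are hypothetical stand-ins for a real application's output.

```python
import html

# Hypothetical payload with a unique token so matches are unambiguous.
MARKER = "<script>alert('fuzz-7f3a')</script>"


def payload_reflected_unescaped(response_body: str, payload: str = MARKER) -> bool:
    """Return True if the payload appears verbatim in the response."""
    return payload in response_body


def render_unsafe(query: str) -> str:
    """Stand-in for a page that echoes user input without escaping."""
    return "<p>Results for " + query + "</p>"


def render_safe(query: str) -> str:
    """Stand-in for a page that escapes user input before echoing it."""
    return "<p>Results for " + html.escape(query) + "</p>"


if __name__ == "__main__":
    print(payload_reflected_unescaped(render_unsafe(MARKER)))
    print(payload_reflected_unescaped(render_safe(MARKER)))
```

In a real harness, the response body would come from the HTTP response to the fuzzed request rather than from a local function.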

Writing and maintaining secure code is fundamentally a process. Fuzzing isn't a panacea: It doesn't solve fundamental design issues, nor can it easily find vulnerabilities that don't result in a crash. However, it's such an efficient and effective technique for finding defects that its use has been mandated across Microsoft. Whether you use a commercial or freely available fuzzer, or develop your own technology, applying fuzzing to any part of your application or service that accepts data from third parties is a vital part of modern software development.
