Tech Brief

Fuzzing Fundamentals

Secure your software with fuzz testing.

One of the industry's most useful and effective security techniques owes its existence to a dark and stormy night. In 1989, a member of Professor Barton Miller's team at the University of Wisconsin was logged on to his workstation via a dial-up line that was experiencing extreme line noise. The scrambled characters in command-line arguments were sufficient to crash a significant number of basic operating system utilities. This phenomenon led Miller's team to investigate the response of software to deliberately malformed data.

The result of this research was a technique called "fuzzing" (named after one of the tools developed during the course of the initial research). At Microsoft, its use has resulted in the discovery of more than 1,000 Bulletin-class issues during software development. Here's a look at the three basic steps in the fuzzing process:

Malform the Data
The first step is to take existing valid data and change it maliciously. Malformed data can range from random noise to subtle manipulations of well-understood data structures.

On targets that have never been subject to fuzz testing, even dumb attacks (for example, generating utterly random binary data) can result in crashes. Techniques as simple as randomly flipping bits in the data, removing null terminators, lengthening strings or buffers, or replacing a value with an "interesting" byte (for example, 0xff, 0x80, or 0x00) can cause crashes or reveal vulnerabilities. Fuzzers can either create their own data ("generation fuzzing") or modify data from actual sources ("mutation fuzzing"). If you're using any kind of mutation fuzzing, you want a representative set of templates, whether that's files the application consumes, test tools that exercise the network protocol fully, or a set of database queries.
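
As a concrete illustration, here is a minimal mutation-fuzzing sketch in Python. The flip count, the list of "interesting" bytes, and the mutate helper are illustrative choices for this sketch, not features of any particular fuzzing tool.

    import random

    INTERESTING_BYTES = [0x00, 0x80, 0xFF]  # values that often expose boundary and sign bugs

    def mutate(template, flips=8):
        """Return a copy of a valid template with a handful of random corruptions."""
        data = bytearray(template)
        for _ in range(flips):
            i = random.randrange(len(data))
            if random.random() < 0.5:
                data[i] ^= 1 << random.randrange(8)        # flip a single bit
            else:
                data[i] = random.choice(INTERESTING_BYTES)  # overwrite with an "interesting" byte
        return bytes(data)

    # Usage: read a representative template and produce one malformed variant.
    template = open("sample.bin", "rb").read()
    open("fuzzed.bin", "wb").write(mutate(template))

In practice a loop like this runs thousands of times, saving each generated case to disk so that any input that triggers a failure can be reproduced later.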

When fuzzing, it's vital to get the malformed data to the components under test. If the format is XML and you're testing anything other than an XML parser, confine fuzzing to the data inside the XML document rather than the XML structure itself; fuzzing the structure will stress the XML format verification code rather than the application that consumes the XML data. If a format includes a checksum or cyclic redundancy check (CRC), either disable the value checking or recompute the value before passing the data to the application; otherwise the malformed data will never get past the code that verifies the checksum or CRC.
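
For example, if a format ends with a CRC-32 computed over the body, the harness has to patch that field after mutation. The layout assumed below (a body followed by a trailing little-endian 4-byte CRC) is a hypothetical format used only to illustrate the idea:

    import struct
    import zlib

    def fix_crc(fuzzed):
        """Recompute a trailing CRC-32 so the malformed body survives integrity checks."""
        body = fuzzed[:-4]                       # everything except the stored CRC
        crc = zlib.crc32(body) & 0xFFFFFFFF      # recompute over the mutated body
        return body + struct.pack("<I", crc)     # little-endian, per the assumed layout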

Deliver the Data to the Target
The second step in the fuzzing process is to deliver the malformed data to the target. How the data is delivered is very much dependent on individual cases. For example, it may be easiest to write a tool that uses the file load code to load the data structure, malform one or more fields, and then use the file save code to write a legitimately formatted file with invalid data. The same techniques can be used for XML/SOAP, or even binary formats. It's possible to develop tools that are designed to deliver data to a wide range of applications or services; however, this type of investment probably isn't necessary for smaller projects or organizations.
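
A minimal sketch of that load/malform/save approach, assuming a JSON-based configuration format and a hypothetical target executable that reads a file named on its command line (the "displayName" field is likewise an illustrative assumption):

    import json
    import subprocess

    def deliver(template_path, target_exe):
        """Load valid data with the real parser, malform one field, save it, and feed it to the target."""
        with open(template_path) as f:
            doc = json.load(f)                   # use the legitimate load path
        doc["displayName"] = "A" * 65536         # malform one well-understood field
        with open("fuzz_case.json", "w") as f:
            json.dump(doc, f)                    # use the legitimate save path
        # The timeout guards against hangs; it raises TimeoutExpired if the target stalls.
        subprocess.run([target_exe, "fuzz_case.json"], timeout=10)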

Monitor the Target
The final step of the fuzzing process is to monitor the target to determine if the fuzzing had an impact. The classic three cases to look for are:

  • crash;
  • memory spike;
  • CPU usage spike.

Any crash is a code bug, and may well be an exploitable one; exploitable means an attacker could run code of their choosing. Because significant expertise is required to determine whether a crash is exploitable, the best practice is simply to fix every crash you find.

A memory spike or CPU usage spike indicates that malicious data could cause a denial of service, and again, the best practice is to fix the code that was using the fuzzed data.
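
A monitoring harness might look like the sketch below. It assumes the third-party psutil package for memory and CPU sampling, the thresholds are arbitrary illustrations rather than recommended values, and a non-zero exit code is treated as a crash for simplicity.

    import subprocess
    import psutil  # third-party package: pip install psutil

    def run_and_monitor(cmd, mem_limit_mb=512, cpu_limit_pct=95.0):
        """Run one fuzz case and classify the outcome: crash, memory spike, CPU spike, or ok."""
        proc = subprocess.Popen(cmd)
        ps = psutil.Process(proc.pid)
        try:
            while proc.poll() is None:
                mem_mb = ps.memory_info().rss / (1024 * 1024)
                cpu = ps.cpu_percent(interval=0.5)   # sampled over a half-second window
                if mem_mb > mem_limit_mb:
                    proc.kill()
                    proc.wait()
                    return "memory spike"
                if cpu > cpu_limit_pct:
                    proc.kill()
                    proc.wait()
                    return "CPU usage spike"
        except psutil.NoSuchProcess:
            pass  # the process exited between the poll and the sample
        proc.wait()
        return "crash" if proc.returncode != 0 else "ok"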

Fuzzing is weakest when you try to find vulnerabilities that don't result in a crash. In those cases, you need custom monitoring that's specific to your application or service and to the changes being made to the data. For example, if the fuzzer tries to inject a SELECT * FROM DATABASE into a URL-based query, monitoring the network traffic could detect SQL injection; if it injects a JavaScript alert, the monitoring system would need to detect the resulting pop-up window.
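
As a simple illustration of that kind of custom monitoring, the sketch below sends a fuzzed query parameter to a hypothetical web endpoint and flags responses that echo the injected marker or contain a database error string. The URL parameter, marker token, and error text are all illustrative assumptions, and this heuristic only hints that the payload reached an interpreter; it doesn't prove exploitability.

    import urllib.parse
    import urllib.request

    MARKER = "FUZZ_7f3a"                      # unique token embedded in the injected payload
    PAYLOAD = "' OR '1'='1' -- " + MARKER     # classic SQL-injection probe plus the marker

    def check_endpoint(base_url):
        """Return True if the response suggests the fuzzed input reached the database layer."""
        url = base_url + "?q=" + urllib.parse.quote(PAYLOAD)
        with urllib.request.urlopen(url, timeout=10) as resp:
            body = resp.read().decode("utf-8", errors="replace")
        return MARKER in body or "SQL syntax" in body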

Writing and maintaining secure code is fundamentally a process. Fuzzing isn't a panacea: It doesn't solve fundamental design issues, nor can it easily find vulnerabilities that don't result in a crash. However, it's such an efficient and effective technique for finding defects that its use has been mandated across Microsoft. Whether you use a commercial or freely available fuzzer, or develop your own technology, applying fuzzing to any part of your application or service that accepts data from third parties is a vital part of modern software development.