Announcing Manticore 0.3.0

Earlier this week, Manticore leapt forward to version 0.3.0. Advances for our symbolic execution engine now include: “fast forwarding” through concrete execution that you don’t care about, support for Linux binaries statically compiled for AArch64, and an interface for selectively solving for interesting test cases. We’ve been working really hard on these and other features over the past quarter, and we’re excited to share what we’ve built.

Executor Refactor

Felipe Manzano completed a major refactor of Manticore’s state machine. It now uses the multiprocessing module, which could make it easier one day to implement distributed symbolic execution. You can read more details about the state machine in the pull request description. Be advised that it does introduce a few small changes to the API, the most important of which are:

  • You must now explicitly call the finalize method in order to dump test cases after a run. That means that you can inspect a state before deciding whether to invest the time to solve for a test case.
  • The will_start_run callback has been renamed to will_run.
  • The solver singleton must now be accessed explicitly as Z3Solver.instance().

Unicorn Preloading

Manticore models native instructions in Python, a language that is not known for speed. Instruction throughput is only a tiny fraction of what you’d expect on a concrete CPU, which can be really unfortunate when the code you care about is buried deep within a binary. You might spend several minutes waiting for Manticore to execute a complicated initialization routine before it ever reaches anything of interest.

To handle cases like this, we’ve added a Unicorn emulator plugin that allows Manticore to “fast forward” through concrete execution that you don’t care about. Unicorn is a fast native CPU emulator that leverages QEMU’s JIT engine for better performance. By replacing Manticore’s executor with Unicorn for unimportant initialization routines, we’ve encountered speed improvements of up to 50x. See an example of how to invoke the Unicorn emulator on the pull request.

AArch64 Support

Over the past four months, Nikita Karetnikov added support for Linux binaries statically compiled for AArch64. Since it’s a brand-new architecture, we’ve left in many of the debugging components in order to help us diagnose issues, a decision that may make it a bit slower than other architectures. With the growing popularity of ARMv8 CPUs for platforms ranging from embedded development boards to server farms, we look forward to receiving feedback on this new architecture.

System Call Audit

To provide an accurate symbolic execution environment, Manticore needs symbolic models of all the Linux system calls. Previously, we implemented only a subset of the most common system calls, and Manticore would throw an exception as soon as it encountered an unimplemented call. This is enough to execute many binaries, but there’s room for improvement.

With the 0.3.0 release, we’ve added a dozen new system calls, and added “stubs” to account for the ones we haven’t implemented. Now, instead of throwing an exception when it encounters an unimplemented call, Manticore will attempt to pretend that the call completed successfully. The program may still break afterwards, but we’ve found that this technique is often “good enough” to analyze a variety of problematic binaries. Just be sure to keep your eyes peeled for the “Unimplemented system call” warning message, since further analysis may be unsound if Manticore has ignored an important syscall!
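
To make the stubbing behavior concrete, here is a minimal sketch of the idea in Python. It is not Manticore's actual implementation: the ToyPlatform class, its dispatch method, and the sys_ naming scheme are all hypothetical names invented for illustration. The sketch shows only the control flow described above: warn, record the call, and pretend it succeeded.

```python
import logging

logger = logging.getLogger("toy_syscalls")


class ToyPlatform:
    """Hypothetical syscall dispatcher; a stand-in, not Manticore's real class."""

    def __init__(self):
        # Names of every unimplemented call we papered over, for later review.
        self.unimplemented_seen = []

    def sys_getpid(self):
        # An "implemented" call: returns a fixed, fake PID.
        return 1000

    def dispatch(self, name, *args):
        handler = getattr(self, f"sys_{name}", None)
        if handler is None:
            # Stub path: warn loudly, remember the call, and report success (0).
            logger.warning("Unimplemented system call: %s", name)
            self.unimplemented_seen.append(name)
            return 0
        return handler(*args)


platform = ToyPlatform()
pid = platform.dispatch("getpid")                  # modeled call
result = platform.dispatch("madvise", 0, 4096, 0)  # not modeled: stubbed
```

Keeping a list such as unimplemented_seen is one way to decide after a run whether any ignored call was important enough to invalidate the analysis.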

Symbolic EVM Tests

One of the important guarantees that Manticore provides is that when it executes a transaction with a symbol, the result holds for all possible values of that symbol. In order for this to be trustworthy, the symbolic implementation of each instruction needs to be correct. That’s why we’ve extended our continuous integration pipeline to automatically run Manticore against the Frontier version of the Ethereum VM tests on each new commit. This will ensure that throughout further development, you’ll always be able to rely on Manticore to correctly reason about your code.

Black

We believe in clean code, which is why we’ve run Manticore through the black autoformatter. Black deterministically formats your code according to a fairly strict reading of the pycodestyle conventions so that you can focus on the content instead of the formatting. From now on, you should run black -t py36 -l 100 . on your branch before submitting a pull request.

What’s Next?

We believe that security tools are only beneficial if people actually use them, so we want to make Manticore easier for everyone to use. Over the next few months, we have big plans for Manticore’s usability, including improvements to our documentation, updating our examples repository, and conducting a formal usability study. Don’t think we’ll let the code languish, though! Our next release should include support for crytic-compile, making it even easier to analyze smart contracts in Manticore. We’ll continue working towards improved performance and eventual support for EVM Constantinople.

You can download Manticore 0.3.0 from our GitHub, via PyPI, or as a pre-built Docker container.

Using osquery for remote forensics

System administrators use osquery for endpoint telemetry and daily monitoring. Security threat hunters use it to find indicators of compromise on their systems. Now another audience is discovering osquery: forensic analysts. While osquery core is great for querying various system-level data remotely, forensics extensions will give it the ability to inspect deeper-level data structures and metadata not even available to a user at a local system. We continued our collaboration with Crypsis, a security consulting company, to show some immediate scenarios where osquery comes in handy for forensic analysts.

Previously, we announced and briefly introduced the features of the new NTFS forensics extension that we added to our osquery-extensions repository. Today, we’ll demonstrate some familiar real-world use-cases for forensic analysts interested in leveraging osquery in their incident response efforts.

Identifying “Timestomping” Attacks

Every interaction with a filesystem leaves a trace. Attackers who want to remain undetected for as long as possible need to clean up these traces. File timestamps, if left unmodified, provide a great deal of information about the attacker’s timeline and behavior. They’re a common focus for both the attacker and the forensic analyst. “Timestomping” is the common name for the anti-forensics tactic of destroying filesystem timestamp evidence of the attacker’s file modifications.

When it comes to covering up evidence in timestamps, NTFS is a little more complicated than other filesystems. To explain, we’ll have to explore some of NTFS’s structure.

The core element of NTFS is the Master File Table (MFT), which stores an entry for every single file on the system. Every entry in the MFT contains a number of attributes that store metadata describing the file. One attribute – $STANDARD_INFORMATION ($SI) – stores a collection of timestamps. Standard files also have a $FILE_NAME ($FN) attribute that contains its own set of timestamps. The timestamps in the $SI attribute roughly correlate to interactions with the contents of the file. The timestamps in the $FN attribute roughly correlate to interactions with the location and name of the file. Finally, directory entries in the MFT have an index attribute that stores a copy of the $FN attribute (including timestamps) for all files in that directory.

Example 1: Timestamp Inconsistency

The simplest example of a timestamp attack is to change the file-creation date to a time prior to incursion. Done poorly, the $FN creation timestamp and $SI creation timestamp won’t match. The discrepancy stands out. To use osquery to find files in a directory whose timestamps don’t match, for example, I’d run the following:

SELECT path,fn_btime,btime from ntfs_file_data where device="\\.\PhysicalDrive0" and partition=3 and directory="/Users/mmyers/Desktop/test_dir" and fn_btime != btime;

We can also look for other forms of timestamp inconsistency. Perhaps the file-creation times are left alone, and thus match, but the last modified time was set to some earlier time to avoid detection. Would you trust a file whose MFT entry’s modified time predates its creation time? Me neither:

SELECT filename, path from ntfs_file_data where device="\\.\PhysicalDrive0" and partition=2 and path="/Users/Garret/Downloads" and (fn_btime > ctime OR btime > ctime);

Example 2: Timestamp Missing Full Precision

Attackers can be lazy sometimes and timestomp a file with a built-in system utility. These utilities have a lower precision for time values than the operating system would naturally use. An analyst can spot this kind of forgery by checking the sub-second portion of the timestamp: it’s unlikely to be all zeros unless it has been tampered with.

NTFS timestamps are 64-bit values that count 100-nanosecond intervals since January 1, 1601 (UTC). For example, consider the NTFS timestamp 131683876627452045. If you have a Windows command prompt handy, that’s Monday, April 16, 2018 9:27:43 PM. To be specific, it’s 9:27:42 PM and 0.7452045 seconds, but it was rounded up. Pretty specific! This is what a natural file timestamp looks like.

However, a file timestamp that has been set by a system utility will only have seconds-level precision, and that’s as much detail as most user interfaces show. 131683876620000000 is also Monday, April 16, 2018 9:27:42 PM, but it sticks out like a sore thumb in integer representation. This timestamp was forged.

At first use, it might seem odd for osquery to output the NTFS timestamps in integer representation, but it serves to make this type of forgery easy to spot for an experienced forensic analyst.
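
If you would rather not eyeball the integers, the check is easy to script. The following is a minimal sketch using only the Python standard library; the function names are our own for illustration and are not part of osquery or the NTFS extension. It relies only on the documented NTFS time format: a count of 100-nanosecond ticks since January 1, 1601 (UTC).

```python
from datetime import datetime, timedelta, timezone

NTFS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)
TICKS_PER_SECOND = 10_000_000  # one tick is 100 nanoseconds


def ntfs_to_datetime(ticks):
    """Convert an NTFS timestamp to a UTC datetime.

    Python datetimes only hold microseconds, so the final 100 ns digit
    is dropped here; use subsecond_ticks() when full precision matters.
    """
    return NTFS_EPOCH + timedelta(microseconds=ticks // 10)


def subsecond_ticks(ticks):
    """Return the fractional-second part of the timestamp, in 100 ns ticks."""
    return ticks % TICKS_PER_SECOND


def looks_truncated(ticks):
    """True when the timestamp has no sub-second detail at all."""
    return subsecond_ticks(ticks) == 0


natural = 131683876627452045
forged = 131683876620000000

for stamp in (natural, forged):
    print(stamp, ntfs_to_datetime(stamp).isoformat(), looks_truncated(stamp))
```

A timestamp with an all-zero fractional part is a lead rather than proof of tampering: roughly one natural timestamp in ten million lands exactly on a second boundary, and some legitimate tools also write whole-second times.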

Locating Evidence of Deleted Files

A user clicks a bad link or opens a bad email attachment. The malware goes to work. It downloads a couple of payloads, deploys them, collects some data on the system into a file, sends that data upstream, then deletes itself and all downloaded files from the filesystem. All neat and tidy, right?
Well, maybe not. The contents of those files might not be available any longer, but NTFS is lazy about cleaning up metadata for files, especially in the context of directory indices. A complete explanation of NTFS and directory index management is beyond the scope of this post, but we can provide a high-level overview (readers who are inclined to learn more might wish to read NTFS.com or the documentation by Russon and Fledel of the Linux-NTFS project).

Like any file on NTFS, every directory has an entry in the MFT. These entries have various attributes. The relevant attribute here is the index attribute, which in turn contains copies of the $FN attributes of the directory’s child files, arranged in a tree structure. As files are added and removed from the directory, the contents of the index attribute are updated. Entries in the index are not deleted, though—they’re simply marked as inactive, and may be overwritten later as new entries are added. Even though a file was deleted, a copy of its $FN attribute may still remain in its parent directory’s index for some time afterwards.

The NTFS forensic extension makes finding these entries relatively simple.

Example 3: A Directory’s Unused Filename Entries

Let’s delete all of the files from the last example, and empty the Recycle Bin. Then, let’s look at the unused entries in that folder’s directory index by running the following query:

SELECT parent_path,filename,slack from ntfs_indx_data WHERE parent_path="/Users/mmyers/Desktop/test_dir" and slack!=0;

There’s more information available than just filenames. Since the entire $FN attribute is stored, there are timestamps available as well. We can reconstruct a partial timeline of file activity in a directory just from the index entries. Some extra work is required, though: since directory indices are filename-based, renaming a file will in effect cause the old entry to be marked as inactive, and create a new entry in the index. Differentiating a renamed file from a deleted one will require additional analysis.
Also note that there were three files deleted, but only two files left artifacts in slack. When looking at unused data structures, we are often only seeing a partial record of what used to be there.

Getting Started

This extension offers a fast and convenient way to perform filesystem forensics on Windows endpoints as a part of an incident response. Go get it – and our other osquery extensions – from our repository. We’re committed to maintaining and extending our collection of extensions. Take a look, and see what else we have available. Visit the osquery community on Slack if you need help.

Helping incident responders with remote forensics is an area of increasing capability for osquery. Besides our NTFS forensics extension, osquery already supports file carving, system activity queries, and audit-based monitoring. There is undoubtedly still more that could be added to osquery: remote memory carving, USB device history retrieval, or filesystem forensic metadata for other filesystems.

Join us on June 20th-21st for QueryCon!

Trail of Bits is hosting the QueryCon osquery conference in New York City, June 20th and 21st, 2019. As we have demonstrated in this article with the NTFS forensics extension, there are many potential use-cases for osquery extensions, and some of the talks at QueryCon 2019 will explore some of those specifically. Victor Vrantchan will give a lesson on how to use extensions and logger plugins to integrate osquery with your existing logging infrastructure; Atul Kabra will speak about enriching osquery with ‘event-driven’ extensions.

As of the time of this writing, tickets for QueryCon are still available! Purchase yours today, and meet with the others from the osquery user and developer community. Bring your ideas for extensions, and participate in the workshop. We look forward to seeing you there!

Fuzzing Unit Tests with DeepState and Eclipser

If unit tests are important to you, there’s now another reason to use DeepState, our Google-Test-like property-based testing tool for C and C++. It’s called Eclipser, a powerful new fuzzer very recently presented in an ICSE 2019 paper. We are proud to announce that Eclipser is now fully integrated into DeepState.

Eclipser provides many of the benefits of symbolic execution in fuzzing, without the high computational and memory overhead usually associated with symbolic execution. It combines “the best of both white-box and grey-box fuzzing” using only lightweight instrumentation and, most critically, never calling an expensive SMT or SAT solver. Eclipser is the first in what we hope (perhaps with your help) to make a series of push-button front-ends to promising tools that require more work to apply than AFL or libFuzzer. Eclipser allows DeepState to quickly detect more hard-to-reach bugs.

What Makes Eclipser Special?

Traditional symbolic execution, supported by DeepState through tools such as Manticore and angr, keeps track of path constraints: conditions on a program’s input such that the program will take a particular path given an input satisfying the constraint. Unfortunately, solving such conditions is difficult and expensive, especially since many constraints are infeasible: they cannot be solved.

Many workarounds for the high cost of solving path constraints have been proposed, but most symbolic-execution based tools are still limited in scalability and prone to failure when asked to produce long paths or handle complex code. Eclipser builds on ideas developed in KLEE and MAYHEM to substitute approximate path constraints for path constraints. These conditions are (as the name suggests) less precise, but much easier to solve. Critically, they don’t require a slow solver. Eclipser still has to solve these approximate, “easy” constraints, but it can assume they are either simple and linear (in which case inexpensive techniques suffice) or at least monotonic, in which case Eclipser uses a binary search instead of a solver call. If the real constraint is neither linear nor monotonic, Eclipser will not be able to generate relevant inputs, but fuzzing may let it make progress despite this failure. In practice, symbolic execution will also often fail because of such constraints, but with a solver timeout, after wasting considerable computational effort. Eclipser will produce some input much more quickly (though not necessarily one satisfying the too-hard-to-solve conditions).
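
To illustrate why a monotonic constraint doesn't need a solver, here is a minimal Python sketch of the binary-search idea. It is not Eclipser's actual algorithm or code: the function name and the way the branch expression is passed in are assumptions made for illustration. The sketch treats the branch expression as a black box that can only be executed, and assumes it is non-decreasing over the search range.

```python
def solve_monotonic(f, target, lo, hi):
    """Search [lo, hi] for an x with f(x) == target.

    Assumes f is non-decreasing on the range. Returns None if no such x
    exists. Each step only *executes* f; no SMT or SAT solver is involved.
    """
    while lo <= hi:
        mid = (lo + hi) // 2
        value = f(mid)
        if value == target:
            return mid
        if value < target:
            lo = mid + 1   # the answer, if any, lies to the right
        else:
            hi = mid - 1   # the answer, if any, lies to the left
    return None


# A linear (hence monotonic) branch condition: 2 * x + 1 == 31337.
# Python integers are unbounded, so this sketch ignores 32-bit overflow.
def branch_expr(x):
    return 2 * x + 1


x = solve_monotonic(branch_expr, 31337, -2**31, 2**31 - 1)
print(x)  # 15668
```

Searching the whole signed 32-bit range takes at most about 32 executions of the branch expression, which is why this kind of search can be far cheaper than a solver call. If the expression were not monotonic, the search could miss a solution that exists, which matches the failure mode described above.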

Why Should You Care?

Eclipser is interesting primarily because the authors report that it performed better in terms of code coverage on coreutils than KLEE, better in terms of bugs detected on LAVA-M benchmarks than AFLFast, LAF-intel, VUzzer, and Steelix and, most compellingly, better in terms of bugs detected on real Debian packages than AFLFast and LAF-intel. The Debian experiments produced eight new CVEs.

Given this promising performance, we decided to integrate Eclipser into DeepState, making it easy to apply the Eclipser approach to your unit testing. Out of the box, DeepState could already be used with Eclipser. The fuzzer works with any binary that takes a file as input. DeepState works with all file-based fuzzers we have tried. However, it is important to use the right arguments to DeepState with Eclipser, or else Eclipser’s QEMU-based instrumentation will not work. It also takes some manual effort to produce standalone test cases and crashing inputs for DeepState, since Eclipser stores tests in a custom format not usable by other tools. We therefore added a simple front-end to make your life (and our life) easier.

The Eclipser Paper Example

The DeepState examples directory has the code for a DeepState-ized version of the main example used in the Eclipser paper:

#include <deepstate/DeepState.hpp>
using namespace deepstate;
#include <assert.h>

int vulnfunc(int32_t intInput, char * strInput) {
   if (2 * intInput + 1 == 31337)
      if (strcmp(strInput, "Bad!") == 0)
         assert(0);
   return 0;
}

TEST(FromEclipser, CrashIt) {
   char *buf = (char*)DeepState_Malloc(9);
   buf[8] = 0;
   vulnfunc(*((int32_t*) &buf[0]), &buf[4]);
}
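
To see what a fuzzer is hunting for here, the crashing input can be worked out by hand. The sketch below is a Python model of the harness, not part of DeepState. It assumes that the bytes from DeepState_Malloc come from the input file in order, that the target is little-endian (as on x86-64), and it ignores signed 32-bit overflow in the multiplication. The would_crash helper is a hypothetical name.

```python
import struct

# Solve 2 * intInput + 1 == 31337 by hand.
int_input = (31337 - 1) // 2

# Bytes 0-3 hold the int32, bytes 4-7 hold the string; the harness itself
# writes the terminating zero into byte 8.
crashing = struct.pack("<i", int_input) + b"Bad!"


def would_crash(data):
    """Model of the harness: does this input reach assert(0)?"""
    buf = bytearray(data[:9].ljust(9, b"\x00"))
    buf[8] = 0                                    # mirrors buf[8] = 0
    (value,) = struct.unpack_from("<i", buf, 0)   # *((int32_t*) &buf[0])
    text = bytes(buf[4:]).split(b"\x00", 1)[0]    # C string at &buf[4]
    return 2 * value + 1 == 31337 and text == b"Bad!"


print(crashing, would_crash(crashing))
```

Under these assumptions, eight specific bytes have to line up, so a uniformly random input succeeds with probability on the order of 1 in 2^64. That gives some intuition for why tools relying mostly on mutation and coverage feedback can struggle with this example.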

The easiest way to try this example out is to build the DeepState docker image (yes, DeepState now makes it easy to create a full-featured docker image):

$ git clone https://github.com/trailofbits/deepstate
$ cd deepstate
$ docker build -t deepstate . -f docker/Dockerfile
$ docker run -it deepstate bash

Building the docker image will take a while: DeepState, AFL, and libFuzzer are quick, but building Eclipser is a fairly involved process.

Once you are inside the DeepState docker image:

$ cd deepstate/build/examples
$ deepstate-eclipser ./FromEclipser --timeout 30 --output_test_dir eclipser-FromEclipser

Eclipser doesn’t need the full 30 seconds; it produces a crashing input almost immediately, saving it in eclipser-FromEclipser/crash-0. The other fuzzers we tried, AFL and libFuzzer, fail to find a crashing input even if given four hours to generate tests. They generate and execute, respectively, tens and hundreds of millions of inputs, but none that satisfy the conditions to produce a crash. Even using libFuzzer’s value profiles does not help.

Running the experiments yourself is easy:

$ mkdir foo; echo foo > foo/foo
$ afl-fuzz -i foo -o afl-FromEclipser -- ./FromEclipser_AFL --input_test_file @@ --no_fork --abort_on_fail

and

$ mkdir libFuzzer-FromEclipser
$ export LIBFUZZER_EXIT_ON_FAIL=TRUE
$ ./FromEclipser_LF libFuzzer-FromEclipser -use_value_profile=1

You’ll want to interrupt both of these runs when you get tired of waiting.

Both angr and Manticore find this crashing input in a few seconds. The difference is that while Eclipser is able to handle this toy example as well as a binary analysis tool, the binary analysis tools fail to scale to complex problems like testing an ext3-like file system, testing Google’s leveldb, or code requiring longer tests to hit interesting behavior, like a red-black-tree implementation. Eclipser is exciting because it outperforms libFuzzer on both the file system and the red-black-tree, but can still solve “you need symbolic execution” problems like FromEclipser.cpp.

Behind the Scenes: Adding Eclipser Support to DeepState

As noted above, in principle there’s literally “nothing to” adding support for Eclipser, or most file-based fuzzers. DeepState makes it easy for a fuzzer that uses files as a way to communicate inputs to a program to generate values for parameterized unit tests. However, figuring out the right DeepState arguments to use with a given fuzzer can be difficult. At first we thought Eclipser simply wasn’t working; it turned out that it doesn’t work if DeepState forks to run tests. Once we ran DeepState with no_fork, everything went smoothly. Part of our goal in producing front-ends like deepstate-eclipser is to make sure you never have to deal with such mysterious failures. The full code for setting up Eclipser runs, parsing command line options (translating DeepState tool argument conventions into Eclipser’s arguments), and getting Eclipser to produce standalone test files from the results takes only 57 lines of code. We’d love to see users submit more simple “front-ends” to other promising fuzzers that require a little extra setup to use with DeepState!

So, Is This the Best Fuzzer?

Will some advance in test generation technology, like Eclipser, obsolete DeepState’s goal of supporting many different back-ends? The answer is “not likely.” While Eclipser is exciting, our preliminary tests indicate that it performs slightly worse than everyone’s favorite workhorse fuzzer, AFL, on both the file system and the red-black tree. In fact, even with the small set of testing problems we’ve explored in some depth using DeepState, we see instances where Eclipser performs best, instances where libFuzzer performs best, and instances where AFL performs best. Some bugs in the red-black tree required a specialized symbolic execution test harness to find (and Eclipser doesn’t help, we found out). Moreover, even when one fuzzer performs best overall for an example, it may not be best at finding some particular bug for that example.

The research literature and practical wisdom of fuzzer use repeatedly show that, even when a fuzzer is good enough to “beat” other fuzzers (and thus get a paper published at ICSE), it will always have instances where it performs worse than an “old,” “outdated” fuzzer. In fuzzing, diversity is not just helpful, it’s essential, if you really want the best chance to find every last bug. No fuzzer will be best for all programs under test, or for all bugs in a given real-world program.

The authors of the Eclipser paper recognize this, and note that their technique is complementary to that used in the Angora fuzzer. Angora shares some of Eclipser’s goals, but relies on metaheuristics about branch distances, rather than approximate path conditions, and uses fine-grained taint analysis to penetrate some branches Eclipser cannot handle. Angora also requires source code. One big advantage of Eclipser is that unlike AFL (in non-QEMU mode) or libFuzzer, it doesn’t require you to rebuild any libraries you want to test with DeepState with additional instrumentation. At the time the Eclipser paper was written, Angora was not available to compare with, but it was recently released and is another good candidate for full integration with DeepState.

Eclipser is a great horse to add to your fuzzer stable, but it won’t win every race. As new and exciting fuzzers emerge, DeepState’s ability to support many fuzzers will only become more important. Using a diverse array of fuzzers is easy if it’s a matter of changing a variable and doing FUZZER=FOO make; deepstate-foo ./myprogram, and practically impossible if it requires rewriting your tests for every tool. In the near future, we plan to make life even easier, and support an automated ensemble mode where DeepState makes use of multiple fuzzers to test your code even more aggressively, without any effort on your part other than deciding how many cores you want to use.

Announcing Automated Reverse Engineering Trainings

Trail of Bits is excited to announce new training offerings for automated reverse engineering with Binary Ninja.

We’ve been raving about Vector35’s Binary Ninja for years. We’ve used it to:

That work, and a whole lot of correspondence, has garnered high praise from an author of Binary Ninja:

Josh is without a doubt our most knowledgeable Binary Ninja user. We pay attention very closely to any of his feedback and we couldn’t think of a better third-party instructor to teach about how to use Binary Ninja to solve reverse engineering problems.

– Jordan Wiens, Co-Founder, Vector35

If you’re doing any amount of manual reverse engineering, you really should consider learning to use Binary Ninja. Its API is much clearer than those of its competitors. There’s more documentation on it as well as lots of examples. You can find what you need quickly.

Binary Ninja is a much more modern design than other binary analysis tools. Vector35 built it from the ground up with the intention to continue to innovate on top of it, and avoid handcuffing themselves with past design choices. They’re constantly adding more new features and better analysis, which is exposed to allow you to write plugins on top of it and create your own tooling.

It’s much easier to automate things as well. Because of those analyses that are baked in, you don’t have to implement them yourself. Everything is lifted to an architecture-agnostic language, so that you can perform the same analysis on any architecture that Binary Ninja can disassemble. If you write your own architecture plugin and implement the lifter using the API, you get all of that analysis for free immediately.

If that weren’t enough to get your attention, Binary Ninja is significantly less expensive than its major competitors.

Master Binary Ninja with Help from Industry Experts

You could learn Binary Ninja by yourself. Vector35 has done a great job publishing helpful materials, managing a healthy Slack community, and giving informative presentations.

However, if you can’t bill for hours spent studying, consider our modular trainings. They can be organized to suit your company’s needs. You choose the number of skills and days to spend honing them. Here’s what you can learn and accomplish:

  • Reverse Engineering with Binary Ninja (1 day)
    By the end of this one-day module, you will be able to reverse engineer software and automate simple tasks, and you’ll be ready to dive into the primary module, Automated Reverse Engineering.
  • Automated Reverse Engineering with Binary Ninja (2 days)
    Take your reverse engineering skills to the next level. This two-day training module dives deeper into the Python API. By the end of the module, you will be able to automate common analysis tasks, as well as extend Binary Ninja’s built-in functionality with plugins.
  • Automated Malware Analysis with Binary Ninja (2 days)
    Building on the Automated Reverse Engineering module, this two-day module provides a toolbox for tackling the advanced techniques that malware uses to hide or obscure its functionality. By the end of the module, you will be able to write plugins that detect and deobfuscate strings and control flow to make sense of a binary’s functionality, as well as scripting detection routines to identify malicious behavior for batch processing.
  • Automated Vulnerability Research with Binary Ninja (2 days)
    Adding to the Automated Reverse Engineering module, this two-day module gives you the tools to automate bug-hunting tasks in binary applications, then write exploit payloads in C with Binary Ninja. Exercises are provided in a friendly Capture-the-Flag format.
  • Custom Loaders and Architectures (1 day)
    This one-day module trains you to expand Binary Ninja’s support for new file types and architectures. You will also learn how to extend existing architecture plugins. At the end of the module, you will be able to reverse engineer an instruction set, and implement disassemblers, lifters, and loader plugins.
  • Extending Binary Ninja with the C++ API (1 day)
    This one-day module demonstrates the differences between the various APIs and how to write effective Binary Ninja plugins in C++. At the end of the module, you will be able to develop standalone applications that interface with Binary Ninja’s core.

Download a PDF containing all of these modules’ descriptions.

Empower Your Analysts to do More

Reverse engineering offers tremendous potential, but if you do it manually, you’re wasting a lot of time and intelligence. Automate your reverse engineering with Binary Ninja, and accelerate your capabilities with our training modules.

Contact us to schedule a training.

Slither: The Leading Static Analyzer for Smart Contracts

We have published an academic paper on Slither, our static analysis framework for smart contracts, in the International Workshop on Emerging Trends in Software Engineering for Blockchain (WETSEB), colocated with ICSE.

Our paper shows that Slither’s bug detection outperforms other static analysis tools for finding issues in smart contracts in terms of speed, robustness, and balance of detection and false positives. The paper provides more details on how the use of a sophisticated intermediate language based on Static Single Assignment (SSA) form, a key advance in the development of modern optimizing compilers, lets Slither go about its work quickly and effectively, and makes it easy to extend Slither to new tasks.

Overview and applications

First, we describe how Slither was designed and what it can do. Slither was designed to be a static analysis framework that provides fine-grained information about smart contract code and has the necessary flexibility to support many applications. The framework is currently used for the following:

  • Automated vulnerability detection. A large variety of smart contract bugs can be detected without user intervention or additional specification effort.
  • Automated optimization detection. Slither detects code optimizations that the compiler misses.
  • Code understanding. Slither summarizes and displays contracts’ information to aid your study of the codebase.
  • Assisted code review. A user can interact with Slither through its API.

Slither works as follows:

  1. It takes as initial input the Solidity Abstract Syntax Tree (AST) generated by the Solidity compiler. Slither works out of the box with the most common frameworks, including Truffle, Embark, and Dapp. You just point Slither at a contract to analyze.
  2. It then generates important information, such as the contract’s inheritance graph, the control flow graph (CFG), and the list of all expressions in the contract.
  3. Slither then translates the code of the contract into SlithIR, an internal representation language that makes precise and accurate analyses easier to write.
  4. Finally, Slither runs a set of pre-defined analyses that provide enhanced information to other modules (e.g., computing data flow, protected function calls, etc.).

Fig. 1: How Slither works

Slither vs. the World

An important part of our paper focuses on comparing Slither to other smart contract static analysis tools. We contrast Slither (release 0.5.0) with other open-source static analysis tools to detect vulnerabilities in Ethereum smart contracts: Securify (revision 37e2984), SmartCheck (revision 4d3367a) and Solhint (release 1.1.10). We decided to focus our evaluation almost exclusively on the tools’ reentrancy detectors, since reentrancy is one of the oldest, best understood, and most dangerous security issues. Figure 2 shows the classic example of a simple reentrant contract that can be exploited to drain all of its ether by calling withdrawBalance with a fallback function that calls withdrawBalance again.

Fig. 2: An exploitable reentrant contract
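The bug in Figure 2 comes down to ordering: the function makes an external call before it updates the state that guards that call. The sketch below is a simplified Python illustration of that idea, not Slither's reentrancy detector. The `Op` record, the operation names, and the `userBalance` variable are hypothetical, and a real analysis works on a CFG and an intermediate representation rather than a flat list of operations.

```python
from typing import List, NamedTuple, Set


class Op(NamedTuple):
    kind: str    # "read", "write", or "external_call" (hypothetical labels)
    target: str  # state variable name, or the callee for an external call


def reentrant_variables(ops: List[Op]) -> Set[str]:
    """Return state variables that are read before an external call and written after it."""
    read_so_far: Set[str] = set()
    read_before_call: Set[str] = set()
    flagged: Set[str] = set()
    for op in ops:
        if op.kind == "read":
            read_so_far.add(op.target)
        elif op.kind == "external_call":
            # Everything read up to this point may be stale if the callee re-enters.
            read_before_call |= read_so_far
        elif op.kind == "write" and op.target in read_before_call:
            flagged.add(op.target)
    return flagged


# The balance is checked, ether is sent, and only then is the balance cleared.
vulnerable = [
    Op("read", "userBalance"),
    Op("external_call", "msg.sender"),
    Op("write", "userBalance"),
]
# Same logic with the state update moved before the call.
fixed = [
    Op("read", "userBalance"),
    Op("write", "userBalance"),
    Op("external_call", "msg.sender"),
]
```

Here `reentrant_variables(vulnerable)` reports `userBalance`, while the reordered `fixed` sequence reports nothing.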

The reentrancy detector is one of the few that is available in all the tools we evaluated. Furthermore, we experimented with one thousand of the most used contracts (those with the largest number of transactions) for which Etherscan provides the source code, to obtain the following results:

Fig. 3: Slither outperforms the other tools in every category

Using a dataset of one thousand contracts, the tools were run on each contract with a timeout of 120 seconds, using only reentrancy detectors. We manually disabled other detection rules to avoid the introduction of bias in the measurements.

In summary, we observed the following strengths in our tool in terms of vulnerability detection:

  • Accuracy. The False positives, Flagged contracts, and Detections per contract rows summarize accuracy results. Our experiments reveal that Slither is the most accurate tool, with the lowest false positive rate (10.9%), followed by Securify with 25%. In contrast, SmartCheck and Solhint have extremely high false positive rates: 73.6% and 91.3% (!), respectively.
    Additionally, we include the number of contracts for which at least one reentrancy is detected (flagged contracts) and the average number of findings per flagged contract. On one hand, SmartCheck flags a larger number of contracts, confirming its high false positive rate (it flags about seven times as many contracts as Slither, and has a false positive rate roughly seven times higher). On the other hand, Securify flags a very small number of contracts, which indicates that the tool fails to detect a number of true positives found by other tools; note that Securify flags far fewer contracts than Slither, but still flags more that are false positives.
  • Performance. The Average execution time and Timed-out analyses rows summarize performance results, confirming that Slither is the fastest tool, followed by Solhint, SmartCheck, and, finally, Securify. In our experiments, Slither was typically as fast as a simple linter. The other tools either parse Solidity source code (Solhint and SmartCheck) or analyze precompiled contracts (Securify).
  • Robustness. The Failed analyses row summarizes robustness results, showing that Slither is the most robust tool, followed by Solhint, SmartCheck, and Securify. Slither failed for only 0.1% of the contracts, while Solhint failed for around 1.2%. SmartCheck and Securify are less robust, failing 10.22% and 11.20% of the time, respectively.

We also compared Slither to Surya, the most similar tool for smart contract code understanding. We found that Slither includes all the important information provided by Surya, but is able to integrate more advanced information due to the static analyses it performs. Code understanding tools that do not incorporate deeper analyses are limited to superficial information, while Slither is easily extensible to more sophisticated code summarization tasks.

The Talk

This paper will be presented by our security engineers, Josselin Feist and Gustavo Grieco, at WETSEB 2019 on May 27th at 11am.

Beyond the Paper

Slither is in constant evolution. We recently released version 0.6.4, and we have added several improvements and features since we wrote the paper, including automated checks for upgradeable contracts and Visual Studio integration. We are proud to have more than 30 open-source detectors, and Slither has about the same number of private detectors for race conditions, weak cryptography, and other critical flaws.

Slither is the core of crytic.io, our continuous assurance system (think “Travis-CI but for Ethereum”), which unleashes all the Slither analyses to protect smart contracts.

Contact us, or join the Empire Hacking Slack, if you need help integrating Slither into your development process, or if you want to learn more about Slither’s capabilities.

Announcing the community-oriented osquery fork, osql

For months, Facebook has been heavily refactoring the entire osquery codebase, migrating osquery away from standard development tools like CMake and integrating it with Facebook’s internal tooling. Their intention was to improve code quality, implement additional tests, and move the project to a more modular architecture. In practice, the changes sacrificed support for a number of architectures, operating systems, and a variety of useful developer tools that integrate well only with the standard build system preferred by the open-source C++ community.

Worse still, the project’s new inward focus has greatly delayed the review of community contributions — effectively stalling development of features or fixes for the needs of the community — without a clear end in sight. Lacking a roadmap or predictable release cycle, user confidence in the project has fallen. Enterprises are postponing their planned osquery deployments and searching for alternative solutions.

Many of the most secure organizations in the world have already invested in making osquery the absolute best endpoint management solution for their needs. Being forced to look elsewhere would be a waste of their investment, and leave them relying on less effective alternatives. That is why we are announcing the community-oriented osquery fork: osql.

What are the goals of osql?

With osql, we are committed to restoring the community’s confidence in the osquery project, to making the development process more open and predictable, and to reviewing and accepting community contributions more quickly. Our goal is to restore direct community participation.

An open and transparent development process

In the immediate term, osql will be maintained as a “soft-fork.” We will closely track Facebook’s upstream updates without diverging from the codebase. Plenty of completed work is simply waiting upstream, in Pull Requests. We prepared a workflow through which the osql project can accept Pull Requests that the community deems stable enough to be shipped, but which have been ignored by the upstream maintainers. The community can pick and choose its priorities from those contributions, and incorporate them into the next release of osql.


The osql organization on GitHub will be a hub for community projects

Continuous Integration, Continuous Delivery

We’ve also integrated a much-needed public CI using Azure Pipelines, which will build and run tests at each commit. Find the results here. The CI will help us build, test, and release faster and more frequently. We are committing to release a new osql binary (package installer) on a regular monthly cadence. We will communicate the changes that users can expect in the next release. They will know when to expect it, and that the version they download has passed all tests.


Determine if the latest code is building for all platforms, at a glance

Restoring standard tool support for developers

We rewrote the build system from scratch to return it to CMake, the C++ community’s de-facto standard for building projects. This effort was non-trivial, but we believe it was central to preserving the project’s compatibility with open-source toolchains. The libraries and tools that represent the foundation of modern C++ development, such as Boost or the LLVM/Clang compiler toolchain, all support CMake natively. The most-used third party libraries use CMake as well, making it quite easy to include them in a CMake-based project.

Developers benefit from built-in CMake support in their IDEs. Visual Studio, VS Code, CLion and QtCreator can all easily open a project from its CMakeLists file, enabling a precise view of the project’s structure and the outputs of its build process. They’ll also regain the convenience of CMake-supporting static analyzer frameworks, like Clang’s scan-build, which helps discover critical bugs across an entire project.

By re-centering everything around a CMake build process, we made osql a more developer-friendly project than upstream osquery. If you would like to see for yourself and begin contributing to osql, check out the build guide.


Work conveniently in the Visual Studio Code IDE, with CMake integration

What’s next for osql

Our work is just beginning! We plan to continue improving the automation of osql releases. Initially, osql releases will be unsigned binaries/packages. The next priority for the project is to implement a secure code-signing step into the CI procedure, so that every release is a binary signed by the “osql” organization.

The osquery project’s build process used to allow you to choose whether to download or to build third-party dependencies, thanks to easily modifiable Homebrew formulas. You could also choose where these dependencies were downloaded from. That is no longer true of osquery, but we will restore that ability in osql (a task made easier thanks to CMake).

We also plan to extend the public CI for osql to enable it to test PRs opened against upstream osquery. This will help the community review those PRs, and provide a kind of quality assurance for their inclusion in a future release of osql.

In the longer term, thanks to CMake’s support for building on various platforms, it will be possible to build osql for whatever new systems the community demands.

Want More? Let’s Talk

When we originally ported osquery to Windows, we didn’t imagine it would become so big, or that it would outgrow what Facebook alone could maintain. A whole community of organizations now deploy and depend on osquery. That’s why we’ve launched osql, the community-oriented osquery fork. If you are part of this community and are interested in porting to other platforms, need special features from the project, or want some customization done to the core, join our osquery/osql support group or contact us!

Announcing QueryCon 2019

Exciting news: We’re hosting the second annual QueryCon on June 20th-21st in New York City, co-sponsored by Kolide and Carbon Black!

Register here

QueryCon has become the foremost event for the osquery and osql open-source community. QueryCon brings together core maintainers, developers, and end-users to teach, discuss, and collaborate on Facebook’s award-winning open-source endpoint detection tool.

Last year’s inaugural conference (hosted by Kolide in San Francisco) boasted 120 attendees, 16 speakers, and talk topics ranging from ‘super features’ to ‘the extensions skunkworks’ to ‘catching everything with osquery events.’ This year, we’re switching coasts and growing the event in honor of the growing community. Join us for what is sure to be a great event!

Event details

Conference room at the venue

  • When: June 20th – 21st
  • Where: Convene at 32 Old Slip in downtown Manhattan, just steps from Wall Street and the New York Stock Exchange.
  • What to expect:
    • Two days of talks by osquery and osql engineers, users, and fans — no sales talks
    • Structured time to discuss and collaborate on fixing issues and improving the project
    • Networking with users and experts
    • Sponsored afterparty in downtown Manhattan

Learn more and register

Make sure to buy your tickets ASAP — last year’s event sold out!

Call for Papers

Would you like to be a featured speaker at this year’s QueryCon? You’re in luck: Speaker slots are still open.

Apply here!

About Trail of Bits

It’s no secret that we are huge fans of osquery. From when we ported osquery to Windows in 2016 to our launch of our osquery extension repo last year, we’ve been one of the leading contributors to the tool’s development.

Trail of Bits helps secure the world’s most targeted organizations and products. We combine high-end security research with a real-world attacker mentality to reduce risk and fortify code.

We’re a security research and engineering firm headquartered in New York City. Our engineering services team works closely with business customers in tech, defense, and finance on quick-response feature development, bug fixes, and integration of the tools they depend on for endpoint detection and response, event log aggregation, secure software updates, and security testing.

We leverage the best of open-source software for our work, and regularly contribute enhancements to these projects as a result. In this way, we plan to bring projects like osquery, Santa, Omaha and StreamAlert to parity with the leading proprietary alternatives.

User-Friendly Fuzzing with Sienna Locomotive

Fuzzing is a great way to find bugs in software, but many developers don’t use it. We hope to change that today with the release of Sienna Locomotive, a new open-source fuzzer for Windows that emphasizes usability. Sienna Locomotive aims to make fuzzing accessible to developers with limited security expertise. Its user-oriented features make it easy to configure, easy to run, and easy to interpret the results.

Fuzzing is Underused

At Trail of Bits, we use state-of-the-art program analysis tools every day, but even those can’t replace the bug-finding potential of random fuzz testing.

Explicitly testing software for bugs is incredibly important. Many open-source developers once believed in the “many eyes” hypothesis: that open-source projects would be less susceptible to security issues because anyone could search the code for bugs. The past few years have shown that this only applies to high-value targets, and even then, only in a limited capacity. For example, OpenSSL was running on at least two-thirds of web servers in 2014, but it still took over two years for researchers to discover the Heartbleed flaw. Open-source software makes up so much of the internet that we can’t afford to count on that kind of luck again. As Google poignantly put it when they announced OSS-Fuzz: “It is important that the open source foundation be stable, secure, and reliable, as cracks and weaknesses impact all who build on it.”

We asked ourselves: why don’t developers fuzz their own software, and what can we do to change that?

Barrier to Entry

One likely reason for fuzzers’ disuse is that they can be relatively difficult to use, especially on Windows. WinAFL (the de-facto standard), in particular, places fairly strict requirements on the functions it can target. If your code doesn’t meet these constraints, WinAFL won’t work correctly. It doesn’t help that most fuzzers are designed for Unix platforms, despite Windows’ 75+% market share. It’s easy to understand why developers often do not bother setting up a fuzzer to test their Windows software.

What to Do?

We believe that security tools only succeed if they’re actually used. Fuzzing techniques that improve code coverage or increase executions per second are the subject of new research at almost every security conference, but these improvements are moot if the code gathers dust on a shelf. We saw this as an opportunity to improve the state of security in a different vein: instead of building a smarter or faster fuzzer, we would build a fuzzer with a lower barrier to entry.

Introducing Sienna Locomotive

To address these problems, we engineered Sienna Locomotive with three features that make fuzzing as painless as possible.

Easy Configuration

We want Sienna Locomotive to be usable for testing a wide variety of software, so we made it easy for developers to tailor the fuzzer to their specific application. New targets can be configured with just the path to the executable and a command line string. Other than setting timeouts for applications that don’t exit on their own, there are very few settings the user needs to configure.

Configuring the Fuzzgoat application shown in the demo video

Powerful Function Targeting

Developers don’t have to be picky about which functions they can target, nor do they have to modify the binary to target a function for fuzzing. Sienna Locomotive runs the target application once and scans for functions that take user input (like ReadFile or fread) and allows the developer to select individual function calls to target. This is especially useful when fuzzing a program that makes incremental reads because it allows the user to fuzz only one specific portion of the file.

A sample function targeting window for a built-in Windows utility
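To make the idea of targeting a single read concrete, here is a minimal Python sketch. It is not how Sienna Locomotive is implemented (the tool instruments native Windows binaries); the `TargetedReader` class, the `parse` function, and the single-byte mutation are all hypothetical stand-ins that show why selecting one call lets you fuzz one portion of a file while leaving the rest intact.

```python
import io
import random


class TargetedReader:
    """Wrap a binary stream and mutate the bytes returned by one chosen read call."""

    def __init__(self, stream, target_call, seed=0):
        self.stream = stream
        self.target_call = target_call  # zero-based index of the read to fuzz
        self.calls = 0
        self.rng = random.Random(seed)

    def read(self, size):
        data = self.stream.read(size)
        index = self.calls
        self.calls += 1
        if index != self.target_call or not data:
            return data  # every other read passes through untouched
        mutated = bytearray(data)
        position = self.rng.randrange(len(mutated))
        mutated[position] ^= 0xFF  # flip every bit of one byte
        return bytes(mutated)


def parse(reader):
    """Hypothetical target that reads a file incrementally: header first, then body."""
    header = reader.read(4)
    body = reader.read(8)
    return header, body


original = b"HEADbodybody"
# Fuzz only the second read, so the header still parses correctly.
header, body = parse(TargetedReader(io.BytesIO(original), target_call=1))
```

The header comes back unchanged, and only the eight body bytes carry a mutation.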

Helpful Crash Triage

Fuzzers produce a myriad of crashes with varying severity. You want to debug the most critical crashes first, but how can you tell where to start? Tools like Breakpad and !exploitable analyze a crashing program to estimate the severity of the crash, which helps developers decide which crashes to debug first. In September, we open-sourced Winchecksec, a component of Sienna Locomotive that helps power our triaging system. Sienna Locomotive combines a reimplementation of the heuristics used by Breakpad and !exploitable, augmented with information from Winchecksec and a custom taint tracer, to estimate the severity of each crash. This helps developers to prioritize which crashes to debug and fix first.

An export directory containing triaged crash information
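The triage step can be pictured as a classify-then-sort pass over the crashes. The sketch below is a toy Python version under loud assumptions: the category names follow those used by !exploitable, but the `Crash` fields, the three rules in `classify`, and the ordering in `SEVERITY_ORDER` are simplified, hypothetical stand-ins rather than the heuristics Sienna Locomotive actually applies.

```python
from dataclasses import dataclass

# Assumed priority: crashes of unknown severity are reviewed before those judged
# probably harmless. This ordering is a choice made for this sketch.
SEVERITY_ORDER = [
    "Exploitable",
    "Probably Exploitable",
    "Unknown",
    "Probably Not Exploitable",
]


@dataclass
class Crash:
    name: str
    access: str    # "read", "write", or "execute"
    address: int   # faulting address
    tainted: bool  # True if the faulting address derives from fuzzed input


def classify(crash):
    """Assign a severity bucket using three simplified, hypothetical rules."""
    if crash.access in ("write", "execute") and crash.tainted:
        return "Exploitable"
    if crash.access == "write":
        return "Probably Exploitable"
    if crash.access == "read" and crash.address < 0x10000:
        return "Probably Not Exploitable"  # read near the null page
    return "Unknown"


def triage(crashes):
    """Return crashes ordered from most to least urgent."""
    return sorted(crashes, key=lambda crash: SEVERITY_ORDER.index(classify(crash)))


crashes = [
    Crash("null-read", "read", 0x8, False),
    Crash("wild-write", "write", 0x41414141, True),
    Crash("far-read", "read", 0x7FFF0000, False),
    Crash("plain-write", "write", 0x1000, False),
]
ordered = [crash.name for crash in triage(crashes)]
```

The taint flag is what pushes the attacker-influenced write to the top of the list.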

Will Sienna Locomotive Work for You?

If you describe yourself as more of a Windows developer than a security expert, Sienna Locomotive may be the right tool for you. With relatively minimal effort, it can help you test your code against a much larger space of mutated inputs than you could ever write unit tests for. Depending on how you’ve structured your program, you may also be able to make testing during development more efficient by only fuzzing newly implemented features.

Sienna Locomotive makes some performance tradeoffs for the sake of usability. If you’re more interested in test case throughput than usability, or you’re looking for bugs in Chrome and need to perform thousands of iterations per second, Sienna Locomotive isn’t for you.

Try it Out!

We think that Sienna Locomotive will improve the state of Windows software security by making it easier for developers to test their code via fuzzing. To try out Sienna Locomotive for yourself, download a prebuilt binary from the releases page on our GitHub Repo, or follow the instructions in the readme to build it yourself. If you’d like to help make Sienna Locomotive better, visit our issues page.

Performing Concolic Execution on Cryptographic Primitives

Alan Cao

For my winternship and springternship at Trail of Bits, I researched novel techniques for symbolic execution on cryptographic protocols. I analyzed various implementation-level bugs in cryptographic libraries, and built a prototype Manticore-based concolic unit testing tool, Sandshrew, that analyzed C cryptographic primitives under a symbolic and concrete environment.

Sandshrew is a first step for crypto developers to easily create powerful unit test cases for their implementations, backed by advancements in symbolic execution. While it can be used as a security tool to discover bugs, it also can be used as a framework for cryptographic verification.

Playing with Cryptographic Verification

When choosing and implementing crypto, our trust should lie in whether or not the implementation is formally verified. This is crucial, since crypto implementations often introduce new classes of bugs like bignum vulnerabilities, which can appear probabilistically. Therefore, by ensuring verification, we are also ensuring functional correctness of our implementation.

There are a few ways we could check our crypto for verification:

  • Traditional fuzzing. We can use fuzz testing tools like AFL and libFuzzer. This is not optimal for coverage, as finding deeper classes of bugs requires time. In addition, since they are random tools, they aren’t exactly “formal verification,” so much as a stochastic approximation thereof.
  • Extracting model abstractions. We can lift source code into cryptographic models that can be verified with proof languages. This requires learning purely academic tools and languages, and having a sound translation.
  • Just use a verified implementation! Instead of trying to prove our code, let’s just use something that is already formally verified, like Project Everest’s HACL* library. This strips away configurability when designing protocols and applications, as we are limited to what the library offers (e.g., HACL* doesn’t implement Bitcoin’s secp256k1 curve).

What about symbolic execution?

Due to its ability to exhaustively explore all paths in a program, using symbolic execution to analyze cryptographic libraries can be very beneficial. It can efficiently discover bugs, guarantee coverage, and ensure verification. However, this is still an immense area of research that has yielded only a sparse number of working implementations.

Why? Because cryptographic primitives often rely on properties that a symbolic execution engine may not be able to emulate. This can include the use of pseudorandom sources and platform-specific optimized assembly instructions. These contribute to complex SMT queries passed to the engine, resulting in path explosion and a significant slowdown during runtime.

One way to address this is by using concolic execution. Concolic execution mixes symbolic and concrete execution, where portions of code execution can be “concretized,” or run without the presence of a symbolic executor. We harness this ability of concretization in order to maximize coverage on code paths without SMT timeouts, making this a viable strategy for approaching crypto verification.

Introducing sandshrew

After realizing the shortcomings in cryptographic symbolic execution, I decided to write a prototype concolic unit testing tool, sandshrew. sandshrew verifies crypto by checking equivalence between a target unverified implementation and a benchmark verified implementation through small C test cases. These are then analyzed with concolic execution, using Manticore and Unicorn to execute instructions both symbolically and concretely.

Fig 1. Sample OpenSSL test case with a SANDSHREW_* wrapper over the MD5() function.

Writing Test Cases

We first write and compile a test case that tests an individual cryptographic primitive or function for equivalence against another implementation. The example shown in Figure 1 tests for a hash collision for a plaintext input, by implementing a libFuzzer-style wrapper over the MD5() function from OpenSSL. Wrappers signify to sandshrew that the primitive they wrap should be concretized during analysis.
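The equivalence-checking idea can be shown without a symbolic executor. The sketch below is a Python analogue, not a sandshrew test case: sandshrew's tests are small C programs and its inputs are symbolic, whereas this version feeds random concrete inputs to two implementations and reports the first disagreement. HMAC-SHA-256 is used instead of MD5 only because the standard library makes it easy to obtain two independent implementations; all function names are hypothetical.

```python
import hashlib
import hmac
import os


def reference_hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Benchmark implementation: the standard library's hmac module."""
    return hmac.new(key, message, hashlib.sha256).digest()


def candidate_hmac_sha256(key: bytes, message: bytes) -> bytes:
    """Implementation under test: HMAC written out by hand (RFC 2104 construction)."""
    block_size = 64  # SHA-256 block size in bytes
    if len(key) > block_size:
        key = hashlib.sha256(key).digest()
    key = key.ljust(block_size, b"\x00")
    inner_pad = bytes(byte ^ 0x36 for byte in key)
    outer_pad = bytes(byte ^ 0x5C for byte in key)
    inner_digest = hashlib.sha256(inner_pad + message).digest()
    return hashlib.sha256(outer_pad + inner_digest).digest()


def find_mismatch(trials=200, key_lengths=(0, 1, 63, 64, 65, 200)):
    """Return the first (key, message) on which the two implementations differ, or None."""
    for key_length in key_lengths:
        for _ in range(trials):
            key = os.urandom(key_length)
            message = os.urandom(32)
            if reference_hmac_sha256(key, message) != candidate_hmac_sha256(key, message):
                return key, message
    return None


mismatch = find_mismatch()
```

The key lengths straddle the 64-byte block size on purpose, since that boundary is where a hand-written HMAC is most likely to diverge. Random inputs only sample the space; the appeal of the symbolic approach is that the solver searches for a disagreeing input instead of waiting to stumble on one.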

Performing Concretization

Sandshrew leverages a symbolic environment through the robust Manticore binary API. I implemented the manticore.resolve() feature for ELF symbol resolution and used it to determine memory locations for user-written SANDSHREW_* functions from the GOT/PLT of the test case binary.

Fig 2. Using Manticore’s UnicornEmulator feature in order to concretize a call instruction to the target crypto primitive.

Once Manticore resolves the wrapper functions, hooks are attached to the target crypto primitives in the binary for concretization. As seen in Figure 2, we then harness Manticore’s Unicorn fallback instruction emulator, UnicornEmulator, to emulate the call instruction made to the crypto primitive. UnicornEmulator concretizes symbolic inputs in the current state, executes the instruction under Unicorn, and stores modified registers back to the Manticore state.

All seems well, except this: if all the symbolic inputs are concretized, what will be solved after the concretization of the call instruction?

Restoring Symbolic State

Before our program tests implementations for equivalence, we introduce an unconstrained symbolic variable as the returned output from our concretized function. This variable guarantees a new symbolic input that continues to drive execution, but does not contain previously collected constraints.

Mathy Vanhoef (2018) takes this approach to analyze cryptographic protocols over the WPA2 protocol. We do this in order to avoid the problem of timeouts due to complex SMT queries.

Fig 3. Writing a new unconstrained symbolic value into memory after concretization.

As seen in Figure 3, this is implemented through the concrete_checker hook at the SANDSHREW_* symbol, which performs the unconstrained re-symbolication if the hook detects the presence of symbolic input being passed to the wrapper.

Once symbolic state is restored, sandshrew is then able to continue to execute symbolically with Manticore, forking once it has reached the equivalence checking portion of the program, and generating solver solutions.
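The concretize-then-resymbolize step can be modeled in a few lines of Python. This is a toy model rather than Manticore code: the `Symbol` class, its `example` field (standing in for a solver-chosen concrete value), and `concrete_call` are hypothetical, and SHA-256 from the standard library stands in for the wrapped primitive.

```python
import hashlib
import itertools
from dataclasses import dataclass, field
from typing import Callable, List, Union

_fresh_ids = itertools.count()


@dataclass
class Symbol:
    name: str
    example: bytes  # one concrete value that satisfies the constraints
    constraints: List[str] = field(default_factory=list)


Value = Union[bytes, Symbol]


def concretize(value: Value) -> bytes:
    """Pick a concrete value for a symbol; pass concrete bytes through."""
    return value.example if isinstance(value, Symbol) else value


def concrete_call(primitive: Callable[[bytes], bytes], argument: Value) -> Value:
    """Run the primitive concretely, then re-symbolize the result if the input was symbolic."""
    result = primitive(concretize(argument))
    if not isinstance(argument, Symbol):
        return result  # nothing symbolic went in, so nothing symbolic comes out
    # Fresh symbol with no constraints: the history of the input is deliberately dropped.
    return Symbol(name=f"ret_{next(_fresh_ids)}", example=result)


def digest(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


symbolic_input = Symbol("plaintext", b"abc", ["len(plaintext) == 3"])
symbolic_output = concrete_call(digest, symbolic_input)
concrete_output = concrete_call(digest, b"abc")
```

The returned symbol carries an empty constraint list, which is the point: later equivalence checks fork on a fresh variable instead of asking the solver to reason through the hash function.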

Results

Here is Sandshrew performing analysis on the example MD5 hash collision program from earlier:

The prototype implementation of Sandshrew currently exists here. With it comes a suite of test cases that check equivalence between a few real-world implementation libraries and the primitives that they implement.

Limitations

Sandshrew has a sizable test suite for critical cryptographic primitives. However, analysis still becomes stuck for many of the test cases. This may be due to the large statespace needing to be explored for symbolic inputs. Arriving at a solution is probabilistic, as the Manticore z3 interface often times out.

With this, we can identify several areas of improvement for the future:

  • Add support for allowing users to supply concrete input sets to check before symbolic execution. With a proper input generator (e.g., radamsa), this potentially hybridizes Sandshrew into a fuzzer as well.
  • Implement Manticore function models for common cryptographic operations. This can increase performance during analysis and allows us to properly simulate execution under the Dolev-Yao verification model.
  • Reduce unnecessary code branching using opportunistic state merging.

Conclusion

Sandshrew is an interesting approach to attacking the problem of cryptographic verification, and demonstrates the awesome features of the Manticore API for efficiently creating security testing tools. While it is still a prototype implementation and experimental, we invite you to contribute to its development, whether through optimizations or new example test cases.

Thank you

Working at Trail of Bits was an awesome experience, and offered me a lot of incentive to explore and learn new and exciting areas of security research. Working in an industry environment pushed me to understand difficult concepts and ideas, which I will take to my first year of college.

Fuzzing In The Year 2000

It is time for the second installment of our efforts to reproduce original fuzzing research on modern systems. If you haven’t yet, please read the first part. This time we tackle fuzzing on Windows by reproducing the results of “An Empirical Study of the Robustness of Windows NT Applications Using Random Testing” (aka ‘the NT Fuzz Report’) by Justin E. Forrester and Barton P. Miller, published in 2000.

The NT Fuzz Report tested 33 applications on Windows NT and an early release copy of Windows 2000 for susceptibility to malformed window messages and randomly generated mouse and keyboard events. Since Dr. Miller published the fuzzer code, we used the exact same tools as the original authors to find bugs in modern Windows applications.

The results were nearly identical: 19 years ago, 100% of tested applications crashed or froze when fuzzed with malformed window messages. Today, 93% of tested applications crash or freeze when confronted with the same fuzzer. Among the applications that did not crash was our old friend, Calculator (Figure 1). We also found a bug (but not a security issue) in Windows.

Figure 1: Bruised but not beaten. The recently open-sourced Windows Calculator was one of two tested applications that didn’t freeze or crash after facing off against the window message fuzzer from the year 2000. Calculator was resized after fuzzing to showcase artifacts of the fuzzing session.

A Quick Introduction to Windows

So what are window messages and why do they crash programs?

Windows applications that display a GUI are driven by events: a mouse move, a button click, a key press, etc. An event-driven application doesn’t do anything until it is notified of an event. Once an event is received, the application takes action based on the event, and then waits for more events. If this sounds familiar, it’s because the architecture is making a comeback in platforms like node.js.

Window messages are the event notification method in Windows. Each window message has a numeric code associated with a particular event. Each message has one or more parameters, by convention called lParam and wParam, that specify more details about the event. Examples of such details include the coordinates of mouse movement, what key was pressed, or what text to draw in a window. These messages can be sent by the program itself, by the operating system, or by other programs. They can arrive at any time and in any order, and must be handled by the receiving application.
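A toy model shows how a handler that trusts its parameters ends up crashing. The Python sketch below is not Win32 code: the three message codes are the real Windows values, but `ToyWindow`, its `strings` table (a stand-in for memory that a pointer-valued lParam refers to), and the handler bodies are hypothetical.

```python
WM_SETTEXT = 0x000C    # lParam points to the new window text
WM_KEYDOWN = 0x0100    # wParam carries the virtual-key code
WM_MOUSEMOVE = 0x0200  # lParam packs x in the low word and y in the high word


def low_word(value):
    return value & 0xFFFF


def high_word(value):
    return (value >> 16) & 0xFFFF


class ToyWindow:
    def __init__(self, strings):
        self.strings = strings  # stand-in for addressable memory
        self.title = ""
        self.cursor = (0, 0)
        self.last_key = None

    def window_proc(self, message, w_param, l_param):
        """Dispatch one message; unknown codes are ignored in this toy."""
        if message == WM_MOUSEMOVE:
            self.cursor = (low_word(l_param), high_word(l_param))
        elif message == WM_KEYDOWN:
            self.last_key = w_param
        elif message == WM_SETTEXT:
            # Trusts l_param to be a valid reference without checking it.
            self.title = self.strings[l_param]
        return 0


window = ToyWindow(["Untitled"])
window.window_proc(WM_MOUSEMOVE, 0, (20 << 16) | 10)  # move to x=10, y=20
window.window_proc(WM_SETTEXT, 0, 0)                  # valid reference
```

Sending `WM_SETTEXT` with a random lParam such as 31337 raises `IndexError` here, the toy equivalent of dereferencing a garbage pointer in a native window procedure.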

Security Implications

Prior to Windows Vista it was possible for a low-privilege process to send messages to a high-privilege process. Using the right combination of messages, it was possible to gain code execution in the high-privilege process. These “shatter attacks” have been largely mitigated since Vista with UIPI and by isolating system services in a separate session.

Mishandling of window messages is unlikely to have a security impact on modern Windows systems for two reasons. First, window messages can’t be sent over the network. Second, crashing or gaining code execution at the same privilege level as you already have is not useful. This was likely apparent to the authors of the NT Fuzz Report. They do not make security claims, but correctly point out that crashes during window message handling imply a lack of rigorous testing.

There are some domains where same-privilege code execution may violate a real security boundary. Some applications combine various security primitives to create an artificial privilege level not natively present in the operating system. The prime example is a browser’s renderer sandbox. Browser vendors are well aware of these issues and take steps to mitigate them. Another example is antivirus products. Their control panel runs with normal user privileges but is protected against inspection and tampering by other parts of the product.

Testing Methodology

We used the same core fuzzing code and methodology described in the original NT Fuzz Report to fuzz all applications in our test set. Specifically, in both SendMessage and PostMessage modes, the fuzzer used three iterations of 500,000 messages with the seed 42 and three iterations of 500,000 messages using the seed 1,337. We saw results after executing just one iteration of each method.
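For a sense of scale, the campaign described above can be enumerated in Python. This is bookkeeping only, not the fuzzer: the constants come from the text, while `message_stream` and its value ranges are hypothetical and exist just to show why a fixed seed makes a run repeatable. How the original fuzzer carries random state from one iteration to the next is not described here, so the sketch does not model it.

```python
import random
from itertools import product

MODES = ("SendMessage", "PostMessage")
SEEDS = (42, 1337)
ITERATIONS = 3
MESSAGES_PER_ITERATION = 500_000


def campaign():
    """Yield one (mode, seed, iteration) tuple per fuzzing run against an application."""
    yield from product(MODES, SEEDS, range(ITERATIONS))


def message_stream(seed, count):
    """Hypothetical generator of (message, wParam, lParam) triples from a fixed seed."""
    rng = random.Random(seed)
    for _ in range(count):
        yield rng.randrange(0x10000), rng.randrange(0x10000), rng.randrange(0x10000)


runs = list(campaign())
total_messages = len(runs) * MESSAGES_PER_ITERATION
print(len(runs), total_messages)  # 12 6000000
```

That is up to six million messages per application, although, as noted above, a single iteration of each mode was enough to see results.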

Fuzzing using the “random mouse and keyboard input” method was omitted due to time constraints and the desire to focus purely on window messages. We encourage you to replicate those results as well.

Caveats

Two minor changes were necessary to use the fuzzer on Windows 10. First was a tiny change to build the fuzzer on 64-bit Windows. The second change was enabling the fuzzer to target a specific window handle via a command line argument. Fuzzing a specific handle was a quick solution to the problem of fuzzing Universal Windows Platform (UWP) applications. The window message fuzzer is oriented to fuzzing windows belonging to a specific process, but UWP applications all display their UI via the same process (Figure 2). This meant that the fuzzer could not target the main window of UWP applications.

Figure 2: UWP application windows all belong to the same process (ApplicationFrameHost.exe). To fuzz these applications, the original NT fuzzer was modified to allow fuzzing of a user-specified window handle.

While modifying the fuzzer, a serious flaw was identified: the values selected for the two primary sources of randomized input, the lParam and wParam arguments to SendMessage and PostMessage, are limited to 16-bit integers. Both of the arguments are 32-bit on 32-bit Windows, and 64-bit on 64-bit Windows. The problem occurs in Fuzz.cpp, where the lParam and wParam values are set:

     wParam = (UINT) rand();
     lParam = (LONG) rand();

The rand() function returns a number in the range [0, RAND_MAX], and RAND_MAX is only 32,767 (2^15 - 1) in Microsoft's C runtime, greatly limiting the set of tested values. This bug was purposely preserved during evaluation, to ensure results were accurately comparable against the original work.
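We did not apply a fix, but for anyone extending the fuzzer, one way to cover the full parameter width is to stitch several rand() results together. The sketch below is an illustration only: `wide_rand` is a hypothetical helper that is not part of Fuzz.cpp, and it assumes nothing about rand() beyond the 15 bits per call that the C standard guarantees.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper (not in the original Fuzz.cpp): return a value with
// `bits` random bits, built by concatenating 15-bit chunks from rand().
// The C standard only guarantees RAND_MAX >= 32767, i.e. 15 usable bits.
std::uint64_t wide_rand(unsigned bits) {
    assert(bits >= 1 && bits <= 64);
    std::uint64_t value = 0;
    for (unsigned filled = 0; filled < bits; filled += 15) {
        // Shift earlier chunks up and append 15 fresh bits at the bottom.
        value = (value << 15) |
                (static_cast<std::uint64_t>(std::rand()) & 0x7FFF);
    }
    // Drop any surplus bits gathered by the final chunk.
    if (bits == 64) {
        return value;
    }
    return value & ((std::uint64_t{1} << bits) - 1);
}
```

In the fuzzer, the two assignments could then become something like `wParam = (WPARAM) wide_rand(sizeof(WPARAM) * 8);` and `lParam = (LPARAM) wide_rand(sizeof(LPARAM) * 8);`, so the same source covers 32 bits on 32-bit builds and 64 bits on 64-bit builds. Results produced this way would no longer be directly comparable with the original report.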

Tested Applications

The NT Fuzz Report tested 33 programs. This reproduction tests just 28 because only one version of each program is used for testing. The Windows software ecosystem has changed substantially since 2000, but there is also a surprising amount of conservation. The Microsoft Office suite features the same programs as the original tests. Netscape Communicator evolved into what is now Firefox. Adobe Acrobat was renamed to Adobe Reader, but is still going strong. Even Winamp made a new release in 2018, allowing for a fair comparison with the original NT Fuzz Report. However, some legacy software has gone the way of the last millennium. Below is the list of changes, and the reasons for them:

  • CD Player ⇨ Windows Media Player: The Windows Media Player has subsumed CD Player functionality.
  • Eudora ⇨ Windows Mail: Qualcomm now makes basebands, not email clients. Because Eudora is no longer around, the default Windows email client was tested instead.
  • Command AntiVirus ⇨ Avast Free Edition: The Command product is no longer available. It was replaced with Avast, the most popular third-party antivirus vendor.
  • GSView ⇨ Photos: The GSView application is no longer maintained. It was replaced with Photos, the default Windows photo viewer.
  • JavaWorkshop ⇨ NetBeans IDE: The JavaWorkshop IDE is no longer maintained. NetBeans seemed like a good free alternative that fits the spirit of what should be tested.
  • Secure CRT ⇨ BitVise SSH: Secure CRT is still around, but required a very long web form to download a trial version. BitVise SSH offered a quick download.
  • Telnet ⇨ Putty: The telnet application still exists on Windows, but now it is a console application. To fuzz a GUI application, we replaced telnet with Putty, a popular open-source terminal emulator for Windows.
  • Freecell & Solitaire were run from the Microsoft Solitaire Collection application in the Windows App Store.

The specific application version appears in the results table. All fuzzing was done on a 64-bit installation of Windows 10 Pro, version 1809 (OS Build 17763.253).

Results

As mentioned in the NT Fuzz Report, the results should not be treated as security vulnerabilities, but instead as a measure of software robustness and quality.

“Finally, our results form a quantitative starting point from which to judge the relative improvement in software robustness.”

From “An Empirical Study of the Robustness of Windows NT Applications Using Random Testing” by Justin E. Forrester and Barton P. Miller

The numbers are not particularly encouraging, although the situation is improving. In the original NT Fuzz Report, every application either crashed or froze when fuzzed. Now, two programs, Calculator and Avast Antivirus, survive the window message fuzzer with no ill effects. Our praise goes to the Avast and Windows Calculator teams for thinking about erroneous window messages. The Calculator team gets additional kudos for open sourcing Calculator and showing everyone how a high-quality UWP application is built. See Table 1 for all of our fuzzing results, along with the specific version of the software used.

Program               Version              SendMessage  PostMessage
--------------------  -------------------  -----------  -----------
Microsoft Access      1901                 crash        crash
Adobe Reader DC       2019.010.20098       crash        ok
Calculator            10.1812.10048.0      ok           ok
Windows Media Player  12.0.17763.292       crash        crash
Visual Studio Code    1.30.2               crash        ok
Avast Free            19.2.2364            ok           ok
Windows Mail          16005.11231.20182.0  crash        crash
Excel                 1901                 crash        ok
Adobe FrameMaker      15.0.2.503           crash        crash
Freecell              4.3.2112.0           crash        crash
GhostScript           9.26                 crash        ok
Photos                2019.18114.17710.0   crash        crash
GNU Emacs             26.1                 crash        crash
IE Edge               44.17763.1.0         crash        crash
NetBeans              10                   crash        crash
Firefox               64.0.2               crash        crash
Notepad               1809                 crash        ok
Paint                 1809                 crash        crash
Paint Shop Pro 2019   21.1                 crash        crash
Powerpoint            1901                 crash        ok
Bitvise SSH           8.23                 crash        crash
Solitaire             4.3.2112.0           crash        crash
Putty                 0.70                 freeze       freeze
VS Community 2017     15.9.5               crash        crash
WinAmp 5.8            5.8 Build 3660       crash        ok
Word                  1901                 crash        ok
Wordpad               1809                 crash        crash
WS_FTP                12.7.0.1903          crash        crash

Table 1: The results of replicating the original NT Fuzz Report on Windows 10. After 19 years, very few applications properly handle malformed window messages.

A Bug in Windows?

Unfortunately, our curiosity got the better of us and we had to make one exception. One common problem seemed to plague multiple unrelated applications. Some debugging showed the responsible message was WM_DEVICECHANGE. When the fuzzer sent that message, it would crash even the simplest application possible: the official Windows API HelloWorld Sample (Figure 3).

Figure 3: A 32-bit HelloWorld.exe crashes when faced with the window message fuzzer. This shouldn’t happen since the program is so simple. The implication is that the issue is somewhere in Windows.

Using the HelloWorld sample, we quickly realized that the problem only affects 32-bit applications, not 64-bit applications. Some rapid debugging revealed that the crash is in wow64win.dll, the 32-to-64-bit compatibility layer. My quick (and possibly wrong) analysis of the problem shows that the wow64win.dll!whcbfnINDEVICECHANGE function will treat wParam as a pointer to a DEV_BROADCAST_HANDLE64 structure in the target program. The function converts that structure to a DEV_BROADCAST_HANDLE32 structure for compatibility with 32-bit applications. The crash happens because the wParam value generated by the fuzzer points to invalid memory.

Treating wParam as a local pointer is a bad idea, although it was probably an intentional design decision to make sure removable device notifications work with legacy 32-bit Windows applications. Regardless, it certainly feels wrong that it is possible to crash another application without explicitly debugging it. We reported the issue to MSRC, even though no security boundary was being crossed. They confirmed the bug is not a security issue. We hope to see a fix for this admittedly obscure problem in a future version of Windows.

Conclusion

Window messages are an under-appreciated and often ignored source of untrusted input to Windows programs. Even 19 years after the first open-source window message fuzzer was deployed, 93% of tested applications still freeze or crash when run against the very same fuzzer. The fact that some applications gracefully handle these malformed inputs is an encouraging sign: it means frameworks and institutional knowledge to avoid these errors exist in some organizations.

There is also much room for improvement in window message fuzzing — the simplest method possible crashes 93% of applications. There may even be examples where window messages travel across a real security boundary. If you explore this area further, we hope you’ll share what you find.