Harnessing the eBPF Verifier

By Laura Bauman

During my internship at Trail of Bits, I prototyped a harness that improves the testability of the eBPF verifier, simplifying the testing of eBPF programs. My eBPF harness runs in user space, independently of any locally running kernel, and thus opens the door to testing of eBPF programs across different kernel versions.

eBPF enables users to instrument a running system by loading small programs into the operating system kernel. As a safety measure, the kernel “verifies” eBPF programs at load time and rejects any that it deems unsafe. However, using eBPF is a CI / CD nightmare, because there’s no way to know whether a given eBPF program will successfully load and pass verification without testing it on a running kernel.

My harness aims to eliminate that nightmare by executing the eBPF verifier outside of the running kernel. To use the harness, a developer tweaks my libbpf-based sample programs (hello.bpf.c and hello_loader.c) to tailor them to the eBPF program being tested. The version of libbpf provided by my harness links against a “kernel library” that implements the actual bpf syscall, which provides isolation from the running kernel. The harness works well with kernel version 5.18, but it is still a proof of concept; enabling support for other kernel versions and additional eBPF program features will require a significant amount of work.

With great power comes great responsibility

eBPF is an increasingly powerful technology that is used to increase system observability, implement security policies, and perform advanced networking operations. For example, the osquery open-source endpoint agent uses eBPF for security monitoring, to enable organizations to watch process and file events happening across their fleets.

The ability to inject eBPF code into the running kernel seems like either a revelation or a huge risk to the kernel’s security, integrity, and dependability. But how on earth is it safe to load user-provided code into the kernel and execute it there? The answer to this question is twofold. First, eBPF isn’t “normal” code, and it doesn’t execute in the same way as normal code. Second, eBPF code is algorithmically “verified” to be safe to execute.

eBPF isn’t normal code

eBPF (extended Berkeley Packet Filter) is an overloaded term that refers to both a specialized bytecode representation of programs and the in-kernel VM that runs those bytecode programs. eBPF is an extension of classic BPF, which has fewer features than eBPF (e.g., two registers instead of ten), uses an in-kernel interpreter instead of a just-in-time compiler, and focuses only on network packet filtering.

User applications can load eBPF code into kernel space and run it there without modifying the kernel’s source code or loading kernel modules. Loaded eBPF code is checked by the kernel’s eBPF verifier, which tries to prove that the code will terminate without crashing.

The picture above shows the general interaction between user space and kernel space, which occurs through the bpf syscall. The eBPF program is represented in eBPF bytecode, which can be obtained through the Clang back end. The interaction begins when a user space process executes the first in the series of bpf syscalls used to load an eBPF program into the kernel. The kernel then runs the verifier, which enforces constraints that ensure the eBPF program is valid (more on that later). If the verifier approves the program, the verifier will finalize the process of loading it into the kernel, and it will run when it is triggered. The program will then serve as a socket filter, listening on a socket and forwarding only information that passes the filter to user space.

Verifying eBPF

The key to eBPF safety is the eBPF verifier, which limits the set of valid eBPF programs to those that it can guarantee will not harm the kernel or cause other issues. This means that eBPF is, by design, not Turing-complete.

Over time, the set of eBPF programs accepted by the verifier has expanded, though the testability of that set of programs has not. The following quote from the “BPF Design Q&A” section of the Linux kernel documentation is telling:

The [eBPF] verifier is steadily getting ‘smarter.’ The limits are being removed. The only way to know that the program is going to be accepted by the verifier is to try to load it. The BPF development process guarantees that the future kernel versions will accept all BPF programs that were accepted by the earlier versions.

This “development process” relies on a limited set of regression tests that can be run through the kselftest system. These tests require that the version of the source match that of the running kernel and are aimed at kernel developers; the barrier to entry for others seeking to run or modify such tests is high. As eBPF is increasingly relied upon for critical observability and security infrastructure, it is concerning that the Linux kernel eBPF verifier is a single point of failure that is fundamentally difficult to test.

Trust but verify

The main problem facing eBPF is portability—that is, it is notoriously difficult to write an eBPF program that will pass the verifier and work correctly on all kernel versions (or, heck, on even one). The introduction of BPF Compile Once-Run Everywhere (CO-RE) has significantly improved eBPF program portability, though issues still remain. BPF CO-RE relies on the eBPF loader library (libbpf), the Clang compiler, and the eBPF Type Format (BTF) information in the kernel. In short, BPF CO-RE means that an eBPF program can be compiled on one Linux kernel version (e.g., by Clang), modified to match the configuration of another kernel version, and loaded into a kernel of that version (through libbpf) as though the eBPF bytecode had been compiled for it.

However, different kernel versions have different verifier limits and support different eBPF opcodes. This makes it difficult (from an engineering perspective) to tell whether a particular eBPF program will run on a kernel version other than the one it has been tested on. Moreover, different configurations of the same kernel version will also have different verifier behavior, so determining a program’s portability requires testing the program on all desired configurations. This is not practical when building CI infrastructure or trying to ship a production piece of software.

Projects that use eBPF take a variety of approaches to overcoming its portability challenges. For projects that primarily focus on tracing syscalls (like osquery and opensnoop), BPF CO-RE is less necessary, since syscall arguments are stable between kernel versions. In those cases, the limiting factor is the variations in verifier behavior. Osquery chooses to place strict constraints on its eBPF programs; it does not take advantage of modern eBPF verifier support for structures such as bounded loops and instead continues to write eBPF programs that would be accepted by the earliest verifiers. Other projects, such as SysmonForLinux, maintain multiple versions of eBPF programs for different kernel versions and choose a program version dynamically, during compilation.

What is the eBPF verifier?

One of the key benefits of eBPF is the guarantee it provides: that the loaded code will not crash the kernel, will terminate within a time limit, and will not leak information to unprivileged user processes. To ensure that code can be injected into the kernel safely and effectively, the Linux kernel’s eBPF verifier places restrictions on the abilities of eBPF programs. The name of the verifier is slightly misleading, because although it aims to enforce restrictions, it does not perform formal verification.

The verifier performs two main passes over the code. The first pass is handled by the check_cfg() function, which ensures that the program is guaranteed to terminate by performing an iterative depth-first search of all possible execution paths. The second pass (done in the do_check() function) involves static analysis of the bytecode; this pass ensures that all memory accesses are valid, that types are used consistently (e.g., scalar values are never used as pointers), and that the number of branches and total instructions is within certain complexity limits.

As mentioned earlier in the post, the constraints that the verifier enforces have changed over time. For example, eBPF programs were limited to a maximum of 4,096 instructions until kernel version 5.2, which increased that number to 1 million. Kernel version 5.3 introduced the ability for eBPF programs to use bounded loops. Note, though, that the verifier will always be backward compatible in that all future versions of the verifier will accept any eBPF program accepted by older versions of the verifier.

Alarmingly, the ability to load eBPF programs into the kernel is not always restricted to root users or processes with the CAP_SYS_ADMIN capability. In fact, the initial plan for eBPF included support for unprivileged users, requiring the verifier to disallow the sharing of kernel pointers with user programs and to perform constant blinding. In the wake of several privilege escalation vulnerabilities affecting eBPF, most Linux distributions have disabled support for unprivileged users by default. However, overriding the default still creates a risk of crippling privilege escalation attacks.

Regardless of whether eBPF is restricted to privileged users, flaws in the verifier cannot be tolerated if eBPF is to be relied upon for security-critical functionality. As explained in an LWN.net article, at the end of the day, “[the verifier] is 2000 lines or so of moderately complex code that has been reviewed by a relatively small number of (highly capable) people. It is, in a real sense, an implementation of a blacklist of prohibited behaviors; for it to work as advertised, all possible attacks must have been thought of and effectively blocked. That is a relatively high bar.” While the code may have been reviewed by highly capable people, the verifier is still a complex bit of code embedded in the Linux kernel that lacks a cohesive testing framework. Without thorough testing, there is a risk that the backward compatibility principle could be violated or that entire classes of potentially insecure programs could be allowed through the verifier.

Enabling rigorous testing of the eBPF verifier

Given that the eBPF verifier is the foundation of critical infrastructure, it should be analyzed through a rigorous testing process that can be easily integrated into CI workflows. Kernel selftests and example eBPF programs that require a running Linux kernel for every kernel version are inadequate.

The eBPF verifier harness aims to allow testing on various kernel versions without any dependence on the locally running kernel version or configuration. In other words, the harness allows the verifier (the verifier.c file) to run in user space.

Compiling only a portion of the kernel source code for execution in user space is difficult because of the monolithic nature of the kernel and the kernel-specific idioms and functionality. Luckily, the task of eBPF verification is limited in scope, and many of the involved functions and files are consistent across kernel versions. Thus, stubbing out kernel-specific functions for user space alternatives makes it possible to run the verifier in isolation. For instance, because the verifier expects to be called from within a running kernel, it calls kernel-specific memory allocation functions when it is allocating memory. When it is run within the harness, it calls user space memory allocation functions instead.

The harness is not the first tool that aims to improve the verifier’s testability. The IO Visor Project’s BPF fuzzer has a very similar goal of running the verifier in user space and enabling efficient fuzzing—and the tool has found at least one bug. But there is one main difference between the eBPF harness and similar existing solutions: the harness is intended to support all kernel versions, making it easy to compare the same eBPF program across kernel versions. The harness leaves the true kernel functionality as intact as possible to maintain an execution environment that closely approximates a true kernel context.

System design

The harness consists of the following main components:

  • Linux source code (in the form of a Git submodule)
  • A LibBPF mirror (also a Git submodule)
  • header_stubs.h (which enables certain kernel functions and macros to be overridden or excluded altogether)
  • Harness source code (i.e., implementations of stubbed-out kernel functions)

The architecture of the eBPF verifier harness.

At a high level, the harness runs a sample eBPF program through the verifier by using standard libbpf conventions in sample.bpf.c and calling bpf_object__load() in sample_loader.c. The libbpf code runs as normal (e.g., probing the “kernel” to see what operations are supported, autocreating maps if configured to do so, etc.), but instead of invoking the actual bpf() syscall and trapping to the running kernel, it executes a harness “syscall” and continues running within the harnessed kernel.

Compiling a portion of the Linux kernel involves making a lot of decisions on which source files should be included and which should be stubbed out. For example, the kernel frequently calls the kmalloc() and kfree() functions for dynamic memory allocation. Because the verifier is running in user space, these functions can be replaced with user space versions like malloc() and free(). Kernel code also includes a lot of synchronization primitives that are not necessary in the harness, since the harness is a single-threaded application; those primitives can also safely be stubbed out.

Other kernel functionality is more difficult to efficiently replace. For example, getting the harness to work required finding a way to simulate the Linux kernel Virtual File System. This was necessary because the verifier is responsible for ensuring the safe use of eBPF maps, which are identified by file descriptors. To simulate operations on file descriptors, the harness must also be able to simulate the creation of files associated with the descriptors.

A demonstration

So how does the harness actually work? What do the sample programs look like? Below is a simple eBPF program that contains a bounded loop; verifier support for bounded loops was introduced in kernel version 5.3, so all kernel versions older than 5.3 should reject the program, and all versions newer than 5.3 should accept it. Let’s run it through the harness and see what happens!

bounded_loop.bpf.c:

#include "vmlinux.h"
#include <BPF/BPF_helpers.h>


SEC("tracepoint/syscalls/sys_enter_execve")
int handle_tp(void *ctx)
{
    for (int i = 0; i < 3; i++) {
        BPF_printk("Hello World.\n");
    }
    return 0;
}

char LICENSE[] SEC("license") = "Dual BSD/GPL";

Using the harness requires compiling each eBPF program into eBPF bytecode; once that’s done, a “loader” program calls the libbpf functions that handle the setup of the bpf syscalls. The loader program looks something like the program shown below, but it can be tweaked to allow for different configuration and setup options (e.g., to disable the autocreation of maps).

bounded_loop_loader.c:

#include 
#include 
#include "bounded_loop.skel.h"

static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args) {
    return vfprintf(stderr, format, args);
}

int load() {
  struct bounded_loop_bpf *obj;
  const struct bpf_insn *insns;
  int err = 0;

  libbpf_set_print(libbpf_print_fn);

  obj = bounded_loop_bpf__open();
  if (!obj) {
    fprintf(stderr, "failed to open BPF object. \n");
    return 1;
  }

  // this function invokes the verifier
  err = bpf_object__load(*obj->skeleton->obj);

  // free memory allocated by libbpf functions
  bounded_loop_bpf__destroy(obj);
  return err;
}

Compiling the sample program with the necessary portions of Linux source code, libbpf, and the harness runtime produces an executable that will run the verifier and report whether the program passes verification.

The output of bounded_loop.bpf.c when run through version 5.18 of the verifier.

Looking forward

The harness is still a proof of concept, and several aspects of it will need to be improved before it can be used in production. For instance, to fully support all eBPF map types, the harness will need the ability to fully stub out additional kernel-level memory allocation primitives. The harness will also need to reliably support all versions of the verifier between 3.15 and the latest version. Implementing that support will involve manually accounting for differences in the internal kernel application programming interfaces (APIs) between these versions and adjusting stubbed-out subsystems as necessary. Lastly, more cohesive organization of the stubbed-out functions, as well as thorough documentation on their organization, would make it much easier to distinguish between unmodified kernel code and functions that have been stubbed out with user space alternatives.

Because these issues will take a nontrivial amount of work, we invite the larger community to build upon the work we have released. While we have many ideas for improvements that will move the eBPF verifier closer to adoption, we believe there are others out there that could enhance this work with their own expertise. Although that initial work will enable rapid testing of all kernel versions once it’s complete, the harness will still need to be updated each time a kernel version is released to account for any internal changes.

However, the eBPF verifier is critical and complex infrastructure, and complexity is the enemy of security; when it is difficult to test complex code, it is difficult to feel confident in the security of that code. Thus, extracting the verifier into a testing harness is well worth the effort—though the amount of effort it requires should serve as a general reminder of the importance of testability.

Introducing RPC Investigator

A new tool for Windows RPC research

By Aaron LeMasters

Trail of Bits is releasing a new tool for exploring RPC clients and servers on Windows. RPC Investigator is a .NET application that builds on the NtApiDotNet platform for enumerating, decompiling/parsing and communicating with arbitrary RPC servers. We’ve added visualization and additional features that offer a new way to explore RPC.

RPC is an important communication mechanism in Windows, not only because of the flexibility and convenience it provides software developers but also because of the renowned attack surface its implementers afford to exploit developers. While there has been extensive research published related to RPC servers, interfaces, and protocols, we feel there’s always room for additional tooling to make it easier for security practitioners to explore and understand this prolific communication technology.

Below, we’ll cover some of the background research in this space, describe the features of RPC Investigator in more detail, and discuss future tool development.

If you prefer to go straight to the code, check out RPC Investigator on Github.

Background

Microsoft Remote Procedure Call (MSRPC) is a prevalent communication mechanism that provides an extensible framework for defining server/client interfaces. MSRPC is involved on some level in nearly every activity that you can take on a Windows system, from logging in to your laptop to opening a file. For this reason alone, it has been a popular research target in both the defensive and offensive infosec communities for decades.

A few years ago, the developer of the open source .NET library NtApiDotNet, James Foreshaw, updated his library with functionality for decompiling, constructing clients for, and interacting with arbitrary RPC servers. In an excellent blog post—focusing on using the new NtApiDotNet functionality via powershell scripts and cmdlets in his NtObjectManager package—he included a small section on how to use the powershell scripts to generate C# code for an RPC client that would work with a given RPC server and then compile that code into a C# application.

We built on this concept in developing RPC Investigator (RPCI), a .NET/C# Windows Forms UI application that provides a visual interface into the existing core RPC capabilities of the NtApiDotNet platform:

  • Enumerating all active ALPC RPC servers
  • Parsing RPC servers from any PE file
  • Parsing RPC servers from processes and their loaded modules, including services
  • Integration of symbol servers
  • Exporting server definitions as serialized .NET objects for your own scripting

Beyond visualizing these core features, RPCI provides additional capabilities:

  • The Client Workbench allows you to create and execute an RPC client binary on the fly by right-clicking on an RPC server of interest. The workbench has a C# code editor pane that allows you to edit the client in real time and observe results from RPC procedures executed in your code.
  • Discovered RPC servers are organized into a library with a customizable search interface, allowing you to pivot RPC server data in useful ways, such as by searching through all RPC procedures for all servers for interesting routines.
  • The RPC Sniffer tool adds visibility into RPC-related Event Tracing for Windows (ETW) data to provide a near real-time view of active RPC calls. By combining ETW data with RPC server data from NtApiDotNet, we can build a more complete picture of ongoing RPC activity.

Features

Disclaimer: Please exercise caution whenever interacting with system services. It is possible to corrupt the system state or cause a system crash if RPCI is not used correctly.

Prerequisites and System Requirements

Currently, RPCI requires the following:

By default, RPCI will automatically discover the Debugging Tools for Windows installation directory and configure itself to use the public Windows symbol server. You can modify these settings by clicking Edit -> Settings. In the Settings dialog, you can specify the path to the debugging tools DLL (dbghelp.dll) and customize the symbol server and local symbol directory if needed (for example, you can specify the path srv*c:\symbols*https://msdl.microsoft.com/download/symbols).

If you want to observe the debug output that is written to the RPCI log, set the appropriate trace level in the Settings window. The RPCI log and all other related files are written to the current user’s application data folder, which is typically C:\Users\(user)\AppData\Roaming\RpcInvestigator. To view this folder, simply navigate to View -> Logs. However, we recommend disabling tracing to improve performance.

It’s important to note that the bitness of RPCI must match that of the system: if you run 32-bit RPCI on a 64-bit system, only RPC servers hosted in 32-bit processes or binaries will be accessible (which is most likely none).

Searching for RPC servers

The first thing you’ll want to do is find the RPC servers that are running on your system. The most straightforward way to do this is to query the RPC endpoint mapper, a persistent service provided by the operating system. Because most local RPC servers are actually ALPC servers, this query is exposed via the File -> All RPC ALPC Servers… menu item.

The discovered servers are listed in a table view according to the hosting process, as shown in the screenshot above. This table view is one starting point for navigating RPC servers in RPCI. Double-clicking a particular server will open another tab that lists all endpoints and their corresponding interface IDs. Double-clicking an endpoint will open another tab that lists all procedures that can be invoked on that endpoint’s interface. Right-clicking on an endpoint will open a context menu that presents other useful shortcuts, one of which is to create a new client to connect to this endpoint’s interface. We’ll describe that feature in a later section.

You can locate other RPC servers that are not running (or are not ALPC) by parsing the server’s image by selecting File -> Load from binary… and locating the image on disk, or by selecting File->Load from service… and selecting the service of interest (this will parse all servers in all modules loaded in the service process).

Exploring the Library

The other starting point for navigating RPC servers is to load the library view. The library is a file containing serialized .NET objects for every RPC server you have discovered while using RPCI. Simply select the menu item Library -> Servers to view all discovered RPC servers and Library -> Procedures to view all discovered procedures for all server interfaces. Both menu items will open in new tabs. To perform a quick keyword search in either tab, simply right-click on any row and type a search term into the textbox. The screenshot below shows a keyword search for “()” to quickly view procedures that have zero arguments, which are useful starting points for experimenting with an interface.

The first time you run RPCI, the library needs to be seeded. To do this, navigate to Library -> Refresh, and RPCI will attempt to parse RPC servers from all modules loaded in all processes that have a registered ALPC server. Note that this process could take quite a while and use several hundred megabytes of memory; this is because there are thousands of such modules, and during this process the binaries are re-mapped into memory and the public Microsoft symbol server is consulted. To make matters worse, the Dbghelp API is single-threaded and I suspect Microsoft’s public symbol server has rate-limiting logic.

You can periodically refresh the database to capture any new servers. The refresh operation will only add newly-discovered servers. If you need to rebuild the library from scratch (for example, because your symbols were wrong), you can either erase it using the menu item Library -> Erase or manually delete the database file (rpcserver.db) inside the current user’s roaming application data folder. Note that RPC servers that are discovered by using the File -> Load from binary… and File -> Load from service… menu items are automatically added to the library.

You can also export the entire library as text by selecting Library -> Export as Text.

Creating a New RPC Client

One of the most powerful features of RPCI is the ability to dynamically interact with an RPC server of interest that is actively running. This is accomplished by creating a new client in the Client Workbench window. To open the Client Workbench window, right-click on the server of interest from the library servers or procedures tab and select New Client.

The workbench window is organized into three panes:

  • Static RPC server information
  • A textbox containing dynamic client output
  • A tab control containing client code and procedures tabs

The client code tab contains C# source code for the RPC client that was generated by NtApiDotNet. The code has been modified to include a “Run” function, which is the “entry point” for the client. The procedures tab is a shortcut reference to the routines that are available in the selected RPC server interface, as the source code can be cumbersome to browse (something we are working to improve!).

The process for generating and running the client is simple:

  • Modify the “Run” function to call one or more of the procedures exposed on the RPC server interface; you can print the result if needed.
  • Click the “Run” button.
  • Observe any output produced by “Run”

In the screenshot above, I picked the “Host Network Service” RPC server because it exposes some procedures whose names imply interesting administrator capabilities. With a few function calls to the RPC endpoint, I was able to interact with the service to dump the name of what appears to be a default virtual network related to Azure container isolation.

Sniffing RPC Traffic with ETW Data

Another useful feature of RPCI is that it provides visibility into RPC-related ETW data. ETW is a diagnostic capability built into the operating system. Many years ago ETW was very rudimentary, but since the Endpoint Detection and Response (EDR) market exploded in the last decade, Microsoft has evolved ETW into an extremely rich source of information about what’s going on in the system. The gist of how ETW works is that an ETW provider (typically a service or an operating system component) emits well-structured data in “event” packets and an application can consume those events to diagnose performance issues.

RPCI registers as a consumer of such events from the Microsoft-RPC (MSRPC) ETW provider and displays those events in real time in either table or graph format. To start the RPC Sniffer tool, navigate to Tools -> RPC Sniffer… and click the “play” button in the toolbar. Both the table and graph will be updated every few seconds as events begin to arrive.

The events emitted by the MSRPC provider are fairly simple. The events record the results of RPC calls between a client and server in RpcClientCall and RpcServerCall start and stop task pairs. The start events contain detailed information about the RPC server interface, such as the protocol, procedure number, options, and authentication used in the call. The stop events are typically less interesting but do include a status code. By correlating the call start/stop events between a particular RPC server and the requesting process, we can begin to make sense of the operations that are in progress on the system. In the table view, it’s easier to see these event pairs when the ETW data is grouped by ActivityId (click the “Group” button in the toolbar), as shown below.

The data can be overwhelming, because ETW is fairly noisy by design, but the graph view can help you wade through the noise. To use the graph view, simply click the “Node” button in the toolbar at any time during the trace. To switch back to the table view, click the “Node” button again.

A long-running trace will produce a busy graph like the one above. You can pan, zoom, and change the graph layout type to help drill into interesting server activity. We are exploring additional ways to improve this visualization!

In the zoomed-in screenshot above, we can see individual service processes that are interacting with system services such as Base Filtering Engine (BFE, the Windows Defender firewall service), NSI, and LSASS.

Here are some other helpful tips to keep in mind when using the RPC Sniffer tool:

  • Keep RPCI diagnostic tracing disabled in Settings.
  • Do not enable ETW debug events; these produce a lot of noise and can exhaust process memory after a few minutes.
  • For optimum performance, use a release build of RPCI.
  • Consider docking the main window adjacent to the sniffer window so that you can navigate between ETW data and library data (right-click on a table row and select Open in library or click on any RPC node while in the graph view).
  • Remember that the graph view will refresh every few seconds, which might cause you to lose your place if you are zooming and panning. The best use of the graph view is to take a capture for a fixed time window and explore the graph after the capture has been stopped.

What’s Next?

We plan to accomplish the following as we continue developing RPCI:

  • Improve the code editor in the Client Workbench
  • Improve the autogeneration of names so that they are more intuitive
  • Introduce more developer-friendly coding features
  • Improve the coverage of RPC/ALPC servers that are not registered with the endpoint mapper
  • Introduce an automated ALPC port connector/scanner
  • Improve the search experience
  • Extend the graph view to be more interactive

Related Research and Further Reading

Because MSRPC has been a popular research topic for well over a decade, there are too many related resources and research efforts to name here. We’ve listed a few below that we encountered while building this tool:

If you would like to see the source code for other related RPC tools, we’ve listed a few below:

If you’re unfamiliar with RPC internals or need a technical refresher, we recommend checking out one of the authoritative sources on the topic, Alex Ionescu’s 2014 SyScan talk in Singapore, “All about the RPC, LRPC, ALPC, and LPC in your PC.”

Announcing a stable release of sigstore-python

By William Woodruff

Read the official announcement on the Sigstore blog as well!

Trail of Bits is thrilled to announce the first stable release of sigstore-python, a client implementation of Sigstore that we’ve been developing for nearly a year! This work has been graciously funded by Google’s Open Source Security Team (GOSST), who we’ve also worked with to develop pip-audit and its associated GitHub Actions workflow.

If you aren’t already familiar with Sigstore, we’ve written an explainer, including an explanation of what Sigstore is, how you can use it on your own projects, and how tools like sigstore-python fit into the overall codesigning ecosystem.

If you want to get started, it’s a single pip install away:

$ echo 'hello sigstore' > hello.txt
$ python -m pip install sigstore
$ sigstore sign hello.txt
$ sigstore verify identity hello.txt \
    --cert-identity 'foo@example.com' \
    --cert-oidc-issuer 'https://example.com'

A usable, reference-quality Sigstore client implementation

Our goals with sigstore-python are two-fold:

  • Usability: sigstore-python should provide an extremely intuitive CLI and API, with 100 percent documentation coverage and practical examples for both.
  • Reference-quality: sigstore-python is just one of many Sigstore clients being developed, including for ecosystems like Go, Ruby, Java, Rust, and JavaScript. We’re not the oldest implementation, but we’re aiming to be one of the most authoritative in terms of succinctly and correctly implementing the intricacies of Sigstore’s security model.

We believe we’ve achieved both of these goals with this release. The rest of this post will show off demonstrate how we did so!

Usability: sigstore-python is for everybody

The sigstore CLI

One of the Sigstore project’s mottos is “Software Signing for Everybody,” and we want to stay true to that with sigstore-python. To that end, we’ve designed a public Python API and sigstore CLI that abstract the murkier cryptographic bits away, leaving the two primitives that nearly every developer is already familiar with: signing and verifying.

To get started, we can install sigstore-python from PyPI, where it’s available as sigstore:

$ python -m pip install sigstore
$ sigstore --version
sigstore 1.0.0

From there, we can create an input to sign, and use sigstore sign to perform the actual signing operation:

$ echo "hello, i'm signing this!" > hello.txt
$ sigstore sign hello.txt

Waiting for browser interaction...
Using ephemeral certificate:
-----BEGIN CERTIFICATE-----
MIICwDCCAkagAwIBAgIUOZ3vPindiCHATxvCRQk/TC5WAd0wCgYIKoZIzj0EAwMw
NzEVMBMGA1UEChMMc2lnc3RvcmUuZGV2MR4wHAYDVQQDExVzaWdzdG9yZS1pbnRl
cm1lZGlhdGUwHhcNMjMwMTEwMTkzNDI5WhcNMjMwMTEwMTk0NDI5WjAAMHYwEAYH
KoZIzj0CAQYFK4EEACIDYgAETb8dcUgXs31y6tjgsVy8KwfMEzVvhUVs7jlzcwkN
MLICjVvblYtWfFReYMEN8rM8mfglyAwcW+qY/I3klMnMcf/bna/yazzP7Mnnh1g1
dzlOXh14C9iZMDPIV0KHH5u2o4IBSDCCAUQwDgYDVR0PAQH/BAQDAgeAMBMGA1Ud
JQQMMAoGCCsGAQUFBwMDMB0GA1UdDgQWBBQdX9zi1TPEHw2uAkqaCE2ecWMLTDAf
BgNVHSMEGDAWgBTf0+nPViQRlvmo2OkoVaLGLhhkPzAjBgNVHREBAf8EGTAXgRV3
aWxsaWFtQHlvc3Nhcmlhbi5uZXQwLAYKKwYBBAGDvzABAQQeaHR0cHM6Ly9naXRo
dWIuY29tL2xvZ2luL29hdXRoMIGJBgorBgEEAdZ5AgQCBHsEeQB3AHUA3T0wasbH
ETJjGR4cmWc3AqJKXrjePK3/h4pygC8p7o4AAAGFnS1KGwAABAMARjBEAiAns85i
YPmlq9RWfJOUwCRN4y5Lwvk3/Y1cWB9wNW4XMwIgBRfib3YbotTgGpB16F/5uf7r
mO2Jc7e0yElimghFFmkwCgYIKoZIzj0EAwMDaAAwZQIxAOh0Ob8Mi2lENgRNjMRe
L8r8rBoVRSi8BzJHcKAe+eTwLsjvsdryJ0yKg5HVHc2erQIwNpdUXD71OPqs3QQ4
Ka+Q2Pjcs+GV5TvaecGzJuQGbm2J5ZW5raPJrXngEGUldt0U
-----END CERTIFICATE-----

Transparency log entry created at index: 10892071
Signature written to hello.txt.sig
Certificate written to hello.txt.crt
Rekor bundle written to hello.txt.rekor

On your desktop this will produce an OAuth2 flow that prompts you for authentication, while on supported CI providers it’ll intelligently select an ambient OpenID Connect identity!

This will produce three outputs:

  • hello.txt.sig: the signature for hello.txt itself
  • hello.txt.crt: a certificate for the signature, containing the public key needed to verify the signature
  • hello.txt.rekor: an optional “offline Rekor bundle” that can be used during verification instead of accessing an online transparency log

Verification looks almost identical to signing, since the sigstore CLI intelligently locates the signature, certificate, and optional Rekor bundle based on the input’s filename. To actually perform the verification, we use the sigstore verify identity subcommand:

$ # finds hello.txt.sig, hello.txt.crt, hello.txt.rekor
$ sigstore verify identity hello.txt \
    --cert-identity foo@example.com \
    --cert-oidc-issuer https://github.com/login/oauth
OK: hello.txt

(What’s with the extra flags? Without them, we’d just be verifying the signature and certificate, and anybody can get a valid signature for any public input in Sigstore. To make sure that we’re actually verifying something meaningful, the sigstore CLI forces you to assert which identity the signature is expected to be bound to, which is then checked during certificate verification!)

However, that’s not all! Sigstore is not just for email identities; it also supports URI identities, which can correspond to a particular GitHub Actions workflow run, or some other machine identity. We can do more in-depth verifications of these signatures using the sigstore verify github subcommand, which allows us to check specific attestations made by the GitHub Actions runner environment:

$ # change this to any version!
$ v=0.10.0
$ repo=https://github.com/sigstore/sigstore-python
$ release="${repo}/release/download"
$ sha=66581529803929c3ccc45334632ccd90f06e0de4


$ # download a distribution + certificate and signature
$ wget ${release}/v${v}/sigstore-${v}.tar.gz{,.crt,.sig}

$ # verify extended claims
$ sigstore verify github sigstore-${v}.tar.gz \
    --cert-identity \
      "${repo}/.github/workflows/release.yml@refs/tags/v${v}" \
    --sha ${sha} \
    --trigger release

This goes well beyond what we can prove with just a bare sigstore verify identity command: we’re now asserting that the signature was created by a release-triggered workflow run against commit 66581529803929c3ccc45334632ccd90f06e0de4, meaning that even if an attacker somehow managed to compromise our repository’s actions and sign for new inputs, they still couldn’t fool us into accepting the wrong signature for this release!

(--sha and --trigger are just a small sample of the claims that can be verified via sigstore verify github: check the README for even more!)

The brand-new sigstore Python APIs

In addition to the CLIs above, we’ve stabilized a public Python API! You can use this API to do everything that the sigstore CLI is capable of, as well as more advanced verification techniques (such as complex logical chains of “policies”).

Using the same signing example above, but with the Python APIs instead:

import io

from sigstore.sign import Signer
from sigstore.oidc import Issuer

contents = io.BytesIO(b"hello, i'm signing this!")

# NOTE: identity_token() performs an interactive OAuth2 flow;
# see other members of `sigstore.oidc` for other credential
# mechanisms.
issuer = Issuer.production()
token = issuer.identity_token()


signer = Signer.production()
result = signer.sign(input_=contents, identity_token=token)
print(result)

And the same identity-based verification:

import base64
from pathlib import Path

from sigstore.verify import Verifier, VerificationMaterials
from sigstore.verify.policy import Identity

artifact = Path("hello.txt").open()
cert = Path("hello.txt.crt").read()
signature = Path("hello.txt.sig").read_bytes()
materials = VerificationMaterials(
    input_=artifact,
    cert_pem=cert,
    signature=base64.b64decode(signature),
    offline_rekor_entry=None,
)

verifier = Verifier.production()

result = verifier.verify(
    materials,
    Identity(
        identity="foo@example.com",
        issuer="https://github.com/login/oauth",
    ),
)
print(result)

The Identity policy corresponds to the sigstore verify identity subcommand, and hints at the Python API’s ability to express more complex relationships between claims. For example, here is how we could write the sigstore verify github verification from above:

from sigstore.verify import Verifier
from sigstore.verify.policy import (
    AllOf,
    GitHubWorkflowSHA,
    GitHubWorkflowTrigger,
    Identity
)

materials = ...

verifier = Verifier.production()

result = verifier.verify(
    materials,
    AllOf(
        [
            Identity(identity="...", issuer="..."),
            GitHubWorkflowSHA(
                "66581529803929c3ccc45334632ccd90f06e0de4"
            ),
            GitHubWorkflowTrigger("release"),
        ]
    )
)

…representing a logical AND between all sub-policies.

What’s next?

We’re making a commitment to semantic versioning for sigstore-python’s API and CLI: if you depend on sigstore~=1.0 in your Python project, you can safely assume that we will not make changes that break either without a major version bump.

With that in mind, a stable API enables many of our near-future goals for Sigstore in the Python packaging ecosystem: further integration into PyPI and the client-side packaging toolchain, as well as stabilization of our associated GitHub Action.

Work with us!

Trail of Bits is committed to the long term stability and expansion of the Sigstore ecosystem. If you’re looking to get involved in Sigstore or are working with your company to integrate it into your own systems, get in touch!

Keeping the wolves out of wolfSSL

By Max Ammann

Trail of Bits is publicly disclosing four vulnerabilities that affect wolfSSL: CVE-2022-38152, CVE-2022-38153, CVE-2022-39173, and CVE-2022-42905. The four issues, which have CVSS scores ranging from medium to critical, can all result in a denial of service (DoS). These vulnerabilities have been discovered automatically using the novel protocol fuzzer tlspuffin. This blog post will explore these vulnerabilities, then provide an in-depth overview of the fuzzer.

tlspuffin is a fuzzer inspired by formal protocol verification. Initially developed as part of my internship at LORIA, INRIA, France, it is especially targeted against cryptographic protocols like TLS or SSH.

During my internship at Trail of Bits, we pushed protocol fuzzing even further by supporting a new protocol (SSH), adding more fuzzing targets, and (re)discovering vulnerabilities. This work represents a milestone in the development of the first Dolev-Yao model-guided fuzzer. By supporting an additional protocol, we proved that our fuzzing approach is agnostic with respect to the protocol. Going forward, we aim to support other protocols such as QUIC, OpenVPN, and WireGuard.

Targeting wolfSSL

During my internship at Trail of Bits, we added several versions of wolfSSL as fuzzing targets. The wolfSSL library was an ideal choice because it was affected by two authentication vulnerabilities that were discovered in early 2022 (CVE-2022-25640 and CVE-2022-25638). That meant we could verify that tlspuffin works by using it to rediscover the known vulnerabilities.

As tlspuffin is written in Rust, we first had to write bindings to wolfSSL. While the bindings were being implemented, several bugs were discovered in the OpenSSL compatibility layer that have also been reported to the wolfSSL team. With the bindings ready, we were ready to let the fuzzer do its job: discovering weird states within wolfSSL.

Discovered Vulnerabilities

During my internship, I discovered several vulnerabilities in wolfSSL, which can result in a denial of service (DoS).

  • DOSC: CVE-2022-38153 allows MitM actors or malicious servers to perform a DoS attack against TLS 1.2 clients by intercepting and modifying a TLS packet. This vulnerability affects wolfSSL 5.3.0.
  • DOSS: CVE-2022-38152 is a DoS vulnerability against wolfSSL servers that use the wolfSSL_clear function instead of the sequence wolfSSL_free; wolfSSL_new. Resuming a session causes the server to crash with a NULL-pointer dereference. This vulnerability affects wolfSSL 5.3.0 to 5.4.0.
  • BUF: CVE-2022-39173 is caused by a buffer overflow and causes a DoS of wolfSSL servers. It is caused by pretending to resume a session, and sending duplicate cipher suites in the Client Hello. It might allow an attacker to gain RCE on certain architectures or targets; however, this has not yet been confirmed. Versions of wolfSSL before 5.5.1 are affected.
  • HEAP: CVE-2022-42905 is caused by a buffer overread while parsing TLS record headers. Versions of wolfSSL before 5.5.2 are affected.

“A few CVEs for wolfSSL, one giant leap for tlspuffin.”

The vulnerabilities mark a milestone for the fuzzer: They are the first vulnerabilities found using this tool that have a far-reaching impact. We can also confidently say that this vulnerability would not have been easy to find with classical bit-level fuzzers. It’s especially intriguing that on average, the fuzzer took less than one hour to discover a vulnerability and crash.

While preparing the fuzzing setup for wolfSSL, we also discovered a severe memory leak that was caused by misuse of the wolfSSL API. This issue was reported to the wolfSSL team, changed their documentation to help users avoid the leak. Additionally, several other code-quality issues have been reported to wolfSSL, and their team fixed all of our findings within one week of disclosure. If a “best coordinated disclosure” award existed, the wolfSSL team would definitely win it.

The following sections will focus on two of the vulnerabilities because of their higher impact and expressive attack traces.

DOSC: Denial of service against clients

In wolfSSL 5.3.0, MiTM attackers or malicious servers can crash TLS clients. The bug lives in the AddSessionToCache function, which is called when the client receives a new session ticket from the server.

Let’s assume that each bucket of the session cache of wolfSSL contains at least one entry. As soon as a new session ticket arrives, the client will reuse a previously stored cache entry to try to cache it in the session cache. Additionally, because the new session ticket is quite large at 700 bytes, it will be allocated on the heap using XMALLOC.

In the following example, SESSION_TICKET_LEN is 256:

if (ticLen > SESSION_TICKET_LEN) {
    ticBuff = (byte*)XMALLOC(ticLen, NULL,
            DYNAMIC_TYPE_SESSION_TICK);
    …
}

ssl.c:13442

This allocation leads to the initialization of cacheTicBuff, as ticBuff is already initialized, cacheSession->ticketLenAlloc is 0, and ticLen is 700:

if (ticBuff != NULL && cacheSession->ticketLenAlloc < ticLen) { 
    cacheTicBuff = cacheSession->ticket;
    …
}

ssl.c:13500

The cacheTicBuff is set to the ticket of a previous session, cacheSession->ticket. The memory to which cacheTicBuff points is not allocated on the heap; in fact, cacheTicBuff points to cacheSession->_staticTicket. This is problematic because the cacheTicBuff is later freed if it is not null.

if (cacheTicBuff != NULL)
     XFREE(cacheTicBuff, NULL, DYNAMIC_TYPE_SESSION_TICK);

ssl.c:13557

The process terminates by executing the XFREE function, as the passed pointer is not allocated on the heap.

Note that the ticket length in itself is not the cause of the crash. This vulnerability is quite different to Heartbleed, the buffer over-read vulnerability discovered in OpenSSL. With wolfSSL, a crash is caused not by overflowing buffers but by a logical bug.

Finding weird states

The fuzzer discovered the vulnerability in about one hour. The fuzzer modified the NewSessionTicket (new_message_ticket) message by replacing an actual ticket with a large array of 700 bytes (large_bytes_vec). This mutation of an otherwise-sane trace leads to a call of XFREE on a non-allocated value. This eventually leads to a crash of the client that receives such a large ticket.

Visualized exploit for DOSC (CVE-2022-38153). Each box represents a TLS message. Each message is composed of different fields like a protocol version or a vector of cipher suites. The visualization was generated using the tlspuffin fuzzer and mirrors the structure of the DY attacker traces which will be introduced in the next section.

A single execution of the above trace is not enough to reach the vulnerable code. As the bug resides in the session cache of wolfSSL, we need to let the client cache fill up in order to trigger the crash. Empirically, we discovered that about 30 prior connections are needed to reliably crash them. The reason for the random behavior is that the cache consists of multiple rows or buckets; the default compilation configuration of wolfSSL contains 11 buckets. Based on the hash of the TLS session ID, sessions are stored in one of these buckets. The DoS is triggered only if the current bucket already contains a previous session.

Reproducing this vulnerability is difficult, as a prepared state is required to reach the behavior. In general, a global state such as the wolfSSL cache makes fuzzing more difficult to apply. Ideally, one might assume that each execution of a program yields the same outputs given the identical inputs. Reproduction and debugging become more challenging if this assumption is violated because the program uses a global state; this represents a general challenge when fuzzing unknown targets.

Fortunately, tlspuffin allows researchers to recreate a program state that is similar to the one that was present when the fuzzer observed a crash. We were able to re-execute all the traces that the fuzzer rated as interesting, which allowed us to observe the crash of wolfSSL in a more controlled environment and to debug wolfSSL using GDB. After analyzing the call stack that led to the invalid free, it was clear that the bug was related to the session cache.

The root cause for DOSC lies in the usage of a shared global state. It was very surprising to find that wolfSSL shares the state between multiple invocations of the library. Conceptually, the lifetime of the session cache should be bound to the TLS context, which already serves as a container for TLS session. Each SSL session shares the state with the TLS context. The addition of maintaining a global mutable state increases complexity throughout a codebase. Therefore, it should be used only when absolutely necessary.

BUF: Buffer overflow on servers

In versions of wolfSSL before 5.5.1, malicious clients can cause a buffer overflow during a resumed TLS 1.3 handshake. If an attacker resumes or pretends to resume a previous TLS session by sending a maliciously crafted Client Hello followed by another maliciously crafted Client Hello, then a buffer overflow is possible. A minimum of two Client Hellos must be sent: one that pretends to resume a previous session, and a second as a response to a Hello Retry Request message.

The malicious Client Hellos contain a list of supported cipher suites, which contain at least ⌊sqrt(150)⌋ + 1 = 13 duplicates and fewer than 150 ciphers in total. The buffer overflow occurs in the second invocation RefineSuites function during a handshake.

/* Refine list of supported cipher suites to those common to server and 
client.
*
* ssl         SSL/TLS object.
* peerSuites  The peer's advertised list of supported cipher suites.
*/
static void RefineSuites(WOLFSSL* ssl, Suites* peerSuites)
{
    byte   suites[WOLFSSL_MAX_SUITE_SZ];
    word16 suiteSz = 0;
    word16 i, j;

  XMEMSET(suites, 0, WOLFSSL_MAX_SUITE_SZ);

  for (i = 0; i < ssl->suites->suiteSz; i += 2) {
      for (j = 0; j < peerSuites->suiteSz; j += 2) {
          if (ssl->suites->suites[i+0] == peerSuites->suites[j+0] &&
              ssl->suites->suites[i+1] == peerSuites->suites[j+1]) {
              suites[suiteSz++] = peerSuites->suites[j+0];
              suites[suiteSz++] = peerSuites->suites[j+1];
          }
      }
  }

  ssl->suites->suiteSz = suiteSz;
  XMEMCPY(ssl->suites->suites, &suites, sizeof(suites));
#ifdef WOLFSSL_DEBUG_TLS
[...]
#endif
} 

tls13.c:4355

The RefineSuites function expects a struct WOLFSSL that contains a list of acceptable ciphers suites at ssl->suites, as well as an array of peer cipher suites. Both inputs are bounded by WOLFSSL_MAX_SUITE_SZ, which is equal to 150 cipher suites or 300 bytes.

Let us assume that ssl->suites consists of a single cipher suite like TLS_AES_256_GCM_SHA384 and that the user-controllable peerSuites list contains the same cipher repeated 13 times. The RefineSuites function will iterate for each suite in ssl->suites over peerSuites and append the suite to the suites array if it is a match. The suites array has a maximum length of WOLFSSL_MAX_SUITE_SZ suites.

With the just-mentioned input, the length of suites equals now 13. The suites array is now copied to the struct WOLFSSL in the last line of the listing above. Therefore, the ssl->suites array now contains 13 TLS_AES_256_GCM_SHA384 cipher suites.

During a presumably resumed TLS handshake, the RefineSuites function is called again if a Hello Retry Request is triggered by the client. The struct WOLFSSL is not reset in between and keeps the previous suites of 13 cipher suites. Because the TLS peer controls the peerSuites array, we assume that it again contains 13 duplicate cipher suites.

The RefineSuites function will iterate for each element in ssl->suites over peerSuites and append the suite to suites if it is a match. Because the ssl->suites
array contains already 13 TLS_AES_256_GCM_SHA384 cipher suites, in total 13 x 13 = 169 cipher suites are written to suites. The 169 cipher suites exceed the allocated maximum allowed WOLFSSL_MAX_SUITE_SZ cipher suites. The suites buffer overflows on the stack.

So far, we have been unable to exploit this bug and, for example, gain remote code execution because the set of bytes that can overflow the suites buffer is small. Only valid cipher suite values can overflow the buffer.

Because of space constraints, we are not providing a detailed review of the mutations that are required in order to mutate a sane trace to an attack trace, as we did with DOSC.

To understand how we found these vulnerabilities, it is worth examining how tlspuffin was developed.

Next Generation Protocol Fuzzing

History has proven that the implementation of cryptographic protocols is prone to errors. It’s easy to introduce logical flaws when translating specifications like RFC or scientific articles to actual program code. In 2017, researchers discovered that the well-known WPA2 protocol suffered severe flaws (KRACK). Vulnerabilities like FREAK, or authentication vulnerabilities like the wolfSSL bugs found in early 2022 (CVE-2022-25640 and CVE-2022-25638), support this idea.

It is challenging to fuzz implementations of cryptographic protocols. Unlike traditional fuzzing of file formats, cryptographic protocols require a specific flow of cryptographic and mutually dependent messages to reach deep protocol states.

Additionally, detecting logical bugs is a challenge on its own. The AddressSanitizer enables security researchers to reliably find memory-related issues. For logical bugs like authentication bypasses or loss of confidentiality no automated detectors exist.

These challenges are why I and Inria set out to design tlspuffin. The fuzzer is guided by the so-called Dolev-Yao model, which has been used in formal protocol verification since the 1980s.

The Dolev-Yao Model

Formal methods have become an essential tool in the security analysis of cryptographic protocols. Modern tools like ProVerif or Tamarin feature a fully automated framework to model and verify security protocols. The ProVerif manual and DEEPSEC paper provide a good introduction to protocol verification. The underlying theory of these tools uses a symbolic model—the Dolev-Yao model—that originates from the work of Dolev and Yao.

With Dolev-Yao models, attackers have full control over the messages being sent within the communication network. Messages are modeled symbolically using a term algebra, which consists of a set of function symbols and variables. This means that messages can be represented by applying functions over variables and other functions.

An adversary can eavesdrop on, inject, or manipulate messages; the Dolev-Yao model is meant to simulate real-world attacks on these protocols, such as Man-in-the-Middle (MitM)-style attacks. The cryptographic primitives are modeled through abstracted semantics because the Dolev-Yao model focuses on finding logical protocol flaws and is not concerned with correctness of cryptographic primitives. Because the primitives are described through an abstract semantic, there is no real implementation of, for example, RSA or AES defined in the Dolev-Yao model.

It was already possible to find attacks in the cryptographic protocols using this model. The TLS specification has already undergone various analyses by these tools in 2006 and 2017, which led to fixes in RFC drafts. But in order to fuzz implementations of protocols, instead of verifying their specification, we need to do things slightly differently. We chose to replace the abstract semantics with a more concrete one which includes implementations of primitives.

The tlspuffin fuzzer was designed based on the Dolev-Yao model and guided by the symbolic formal model, which means that it can execute any protocol flow that is representable in the Dolev-Yao model. It can also generate previously unseen protocol executions. The following section explains the notion of Dolev-Yao traces, which are loosely based on the Dolev-Yao model.

Dolev-Yao Traces

Dolev-Yao traces build on top of the Dolev-Yao model and also use a term algebra to represent messages symbolically. Just like in the Dolev-Yao model, the cryptographic primitives are treated as black boxes. This allows the fuzzer to focus on logical bugs, instead of testing cryptographic primitives for their correctness.

Let’s start with an example of the infamous Needham-Schröder protocol. If you aren’t familiar, Needham-Schröder is an authentication protocol that allows two parties to establish a shared secret through a trusted server; however, its asymmetric version is infamous for being susceptible to an MitM attack.

The protocol allows Alice and Bob to create a shared secret through a trusted third-party server. The protocol works by requesting a shared secret from the server that is encrypted once for Bob and once for Alice. Alice can request a fresh secret from the server and will receive an encrypted message that contains the shared secret and a further encrypted message addressed to Bob. Alice will forward the message to Bob. Bob can now decrypt the message and also has access to the shared secret.

The flaw in the protocol allows an imposter to impersonate Alice by first initiating a connection with Alice and then forwarding the received data to Bob. (For a deeper understanding of the protocol, we suggest reading its Wikipedia article.)

In the below Dolev-Yao trace T, we model one specific execution of the Needham-Schröder protocol between the two agents with the names a and b. Each agent has an underlying implementation. The trace consists of a concatenation of steps that are delimited by a dot. There are two kinds of steps: input and output. Output steps are denoted by a bar above the agent name.

Dolev-Yao attack trace for the Needham-Schröder protocol

Let’s now describe the semantics of trace T. (A deep understanding of the steps of this protocol is not needed. This example should just give you a feeling about the expressiveness of the Dolev-Yao model and what a Dolev-Yao trace is.)

In the first step, we send the term pk(sk_E) to agent a. Agent a will serialize the term and provide it to its underlying implementation of Needham-Schröder.

Next, we let the agent a output a bitstring and bind it to h_1. By following the steps in the Dolev-Yao trace, we can observe that we now send the term aenc(adec(h_1, sk_E), pk(sk_B)) to agent b.

Next, we let agent b’s underlying implementation output a bitstring and bind it to h_2. The next two steps forward the message h_2 to agent a and bind its new output to h_3. Finally, we repeat the third and fourth step for a different input, namely h_3, and send the term h_3 to agent a.

Such traces allow us to model arbitrary execution flows of cryptographic protocols. The trace above models an MitM attack, originally discovered by Gavin Lowe. A fixed version of the protocol is known as the Needham-Schroeder-Lowe protocol.

TLS 1.3 Handshake Protocol

Before providing an example for a modern cryptographic protocol, I quickly want to explain the different phases of a TLS handshake.

Overview of the phases of a TLS handshake

  1. Key exchange: Establish shared keys and select the cryptographic methods and parameters. Both messages in this phase are not encrypted.
  2. Server parameters: Exchange further parameters that are no longer sent in plaintext.
  3. Server authentication: Authenticate the server by confirming keys and handshake integrity.
  4. Client authentication: Optionally, authenticate the client by confirming keys and handshake integrity.

Just like in the Needham-Schröder example, each message of the TLS handshake can be represented by a symbolic term. For example, the first Client Hello message can be represented as the term fn_client_hello(fn_key_share, fn_signature_algorithm, psk). In this example, fn_key_share, fn_signature_algorithm, and psk are constants.

For a more in-depth review of the handshake message, Section 2 of RFC 8446 explains each message in more detail.

Fuzzing Dolev-Yao Traces

The tlspuffin fuzzer implements Dolev-Yao traces and allows their execution in concrete fuzzing targets like OpenSSL, wolfSSL, and libssh.

Structure of tlspuffin. It follows the best-practices defined by LibAFL.

The design of tlspuffin is based on the evolutionary fuzzer LibAFL. The fuzzer uses several concepts, which are illustrated in the following sections. We will follow traces on their journey from being picked from a seed corpus until they are mutated, executed, observed, and eventually become an attack trace.

Seed Corpus

Initially, the seed corpus contains some handcrafted traces that represent some common attack scenarios (e.g., client/server is the attacker or the MitM is the attacker).

Scheduler and Mutational Stage

The scheduler picks seeds based on a heuristic; for example, the scheduler might prefer shorter and more minimal traces. After that, the picked traces are mutated. This means that messages are skipped or repeated or their contents are changed. Because we are using a Dolev-Yao model to represent messages, we can change fields of messages by swapping sub terms or changing function symbols.

Executor, Feedback, and Objectives

After the traces have been mutated, they are sent to the executor. The executor is responsible for executing the traces in actual implementations such as OpenSSL or wolfSSL, where they are executed in either the same process or a fork for each input. The executor is also responsible for collecting observations about the execution. An observation is classified as feedback if it contains information about newly discovered code edges in terms of coverage. For example, if the trace made the fuzzing target crash or an authentication bypass was witnessed, the trace is classified as an objective. The observation is then either added to the seed corpus or the objective corpus based on how it was classified.

Finally, we can repeat the process and start picking new traces from the seed corpus. This algorithm is quite common in fuzzing and is closely related to the approach of the classical AFL fuzzer. (For a more in-depth explanation of this particular algorithm, refer to the preprint LibAFL: A Framework to Build Modular and Reusable Fuzzers.)

Internship Highlights

During my internship, we added several new features to tlspuffin that extended the tool in several dimensions, which are:

  • Protocol implementations,
  • Cryptographic protocols,
  • Detection of security violations, and
  • Reproducibility of vulnerabilities.

Toward more Fuzzing Targets

Before my internship at Trail of Bits, tlspuffin already supported fuzzing several versions of OpenSSL (including the version 1.0.1, which is vulnerable to Heartbleed) and LibreSSL. We designed an interface that added the capability to fuzz arbitrary protocol libraries. By implementing the interface for wolfSSL, we were able to add support for fuzzing wolfSSL 4.3.0 to 5.4.0, even though wolfSSL is not ABI compatible with OpenSSL or LibreSSL. Because the interface is written in Rust, implementing it for wolfSSL required us to create Rust bindings. The great thing about this is that the wolfSSL bindings could be reused outside of tlspuffin for embedded software projects. We released open-source wolfSSL bindings on GitHub.

This represents a milestone in library support. Previously, the tlspuffin was bound to the OpenSSL API, which is supported only by LibreSSL and OpenSSL. With this interface, it will be possible to support arbitrary future fuzzing targets.

Toward more Protocols

Although tlspuffin was specifically designed for the TLS protocol, it has the capability to support other formats. In fact, any protocol that is formalized in the Dolev-Yao model should also be fuzzable with tlspuffin. We added support for SSH, which required us to abstract over certain protocol primitives such as messages, message parsing, the term algebra, and knowledge queries. The same abstraction we choose for TLS also, for the most part, works for SSH. However, the SSH protocol required a few adjustments because of a stateful serialization of protocol packets.

In order to test the SSH abstractions, we added support for fuzzing libssh (not to be confused with libssh2). As with wolfSSL, one of our first tasks was to create Rust bindings, which we plan to release separately as open-source software in the future.

Toward a better Security Violation Oracle

Detecting security violations other than segmentation faults, buffer overflows, or use-after-free is essential for protocol fuzzers. In the world of fuzzers, an oracle decides whether a specific execution of the program under test reached some objective.

When using sanitizers like AddressSanitizer, buffer overflows or over-reads can make the program crash. In traditional fuzzing, the oracle decides whether the classical objective “program crashed” is fulfilled. This allows oracles to detect not only program crashes caused by segmentation faults, but also memory-related issues.

Many security issues like authentication bypasses or protocol downgrades in TLS libraries do not make themselves obvious by crashing. To address this, tlspuffin features a more sophisticated oracle that can detect protocol-specific problems. This allowed tlspuffin to rediscover not just vulnerabilities like Heartbleed or CVE-2021-3449, but also logical vulnerabilities like FREAK. During my internship, we extended the capabilities of the security violation oracle to include authentication checks, which led us to rediscover two authentication bugs in wolfSSL (CVE-2022-25640 and CVE-2022-25638). This indicates that tlspuffin automatically discovered these vulnerabilities without human interaction.

Toward better Reproducibility

If the fuzzer discovers an alleged attack trace, then we as security researchers have to validate the finding. A good way to verify results is to execute them against an actual target like a TLS server or client over TCP. By using default settings, we can ensure that the setup of the fuzzing target is not causing false positives.

During the internship, we worked on a feature that allows users to execute a Dolev-Yao trace against clients or servers over TCP, which allows us to test attack traces against targets in isolation. One of these targets could be an OpenSSL server that is reachable over TCP. Every OpenSSL installation comes with such a server, which can be started using openssl s_server -key key.pem -cert cert.pem. A similar test server exists for wolfSSL. We can now execute traces through tlspuffin and see if the server crashes, misbehaves, or simply errors.

As described above, this enabled us to verify CVE-2022-38153 and to determine that a crash happens only when using a specific setup of the wolfSSL library.

Conclusion

Considerations for implementation

Despite this work, Dolev-Yao model-guided fuzzing also has drawbacks. Significant effort is required to integrate new fuzzing targets or protocols. Adding support for SSH took roughly five to six weeks, and adding a new fuzzing target took between one and two weeks. Finally, the fuzzer needed to be tested, bugs in the test harness needed to be resolved, and the fuzzer needed to be run for a reasonable length of time; in our case, finding bugs took another week. Note that letting a single instance of the fuzzer run for a long time might not be the best approach. Restarting the fuzzer every few days is a good approach to avoid that the fuzzer gets stuck in a “local minima” with respect to coverage.

Therefore, the overall process of applying Dolev-Yao model-guided fuzzing to an arbitrary cryptographic protocol and arbitrary implementation takes a few months. Based on these estimates, the fuzzing technique is best suited for ubiquitous protocols with multiple implementations like TLS or SSH, where the benefits outweigh the effort.

We noticed that protocol-specific features can increase the complexity of integration. For example, TLS uses transcripts, which can significantly increase the size of protocol messages. We applied a workaround for large transcripts in tlspuffin. In the case of SSH, we observed that message encoding and decoding is stateful, which means that messages are encoded differently based on the protocol state (a different MAC algorithm is used based on negotiated parameters).

On the contrary, testing existing or future TLS or SSH implementations through Dolev-Yao model-guided fuzzing is very promising. Investing a couple of weeks seems reasonable given that once a library is integrated into tlspuffin, it can be fuzzed continuously over many versions.

Usage in test-suites

Developers can also use tlspuffin for writing test suites. It is possible to run traces against libraries, which test for the absence of specific authentication bugs. This allows for the implementation of regression tests to ensure that previous bugs do not occur again. In other words, tlspuffin can be used for the same tasks for which TLS-Attacker is currently used.

Summary

To summarize, Dolev-Yao model-guided fuzzing is a novel and promising technique to fuzz test cryptographic protocols. It has proved its feasibility by rediscovering already-known authentication vulnerabilities and finding new DoS attacks in wolfSSL.

tlspuffin is a good fit for high-impact and widely used protocols like TLS or SSH. Integrating a new protocol into tlspuffin takes significant effort and requires an in-depth understanding of the protocol. In traditional fuzzing, domain-specific knowledge is sometimes relatively unimportant because simple fuzzers in a standard configuration can yield strong results. This advantage is lost if tlspuffin is used for protocols that are not yet supported.

Despite this, tlspuffin shines when it is used on an already-supported protocol. The internet heavily depends on the TLS and SSH protocols, and security issues affecting them have far-reaching implications. If TLS or SSH breaks, then the internet breaks. Luckily, this has not happened yet due to the great work of security researchers around the world. Let’s keep it that way by verifying, testing, and fuzzing cryptographic protocols!

I would like to wholeheartedly thank my mentor, Opal Wright. She supported me throughout my internship and motivated me by giving me plenty of praise for my work. I’d also like to give a great thanks to the entire cryptography team, who provided me with valuable feedback. Last but not least, I would like to thank my friends at INRIA for hosting me last year for my master thesis, which led to the development of tlspuffin. Without their mentorship and fundamental research, this work would not have been possible.

Coordinated disclosure timeline

As part of the disclosure process, we reported four vulnerabilities in total to WolfSSL. The timeline of disclosure and remediation is provided below:

  • August 12, 2022: Contacted wolfSSL support to set up a secure channel.
  • August 12, 2022: Reported CVE-2022-38152 and CVE-2022-38153 to wolfSSL.

For CVE-2022-38152:

  • August 12, 2022: wolfSSL maintainers confirmed and fixed the vulnerability.

For CVE-2022-38153:

  • August 16, 2022: wolfSSL maintainers confirmed the vulnerability.
  • August 17, 2022: wolfSSL maintainers fixed the vulnerability.
  • August 30, 2022: wolfSSL released a fixed version, 5.5.0.
  • September 12, 2022: Reported CVE-2022-39173 to wolfSSL.

For CVE-2022-39173:

  • September 12, 2022: wolfSSL maintainers confirmed and fixed the vulnerability.
  • September 28, 2022: wolfSSL released a fixed version, 5.5.1.
  • October 09, 2022: Reported CVE-2022-42905 to wolfSSL.

For CVE-2022-42905:

  • October 10, 2022: wolfSSL maintainers confirmed and fixed the vulnerability.
  • October 28, 2022: wolfSSL released a fixed version, 5.5.2.

We would like to thank the team at wolfSSL for working swiftly with us to address these issues; they fixed one of the vulnerabilities on the same day it was submitted to them. The people involved at INRIA and Trail of Bits even got some swag delivered in appreciation of the disclosure.

Another prolific year of open-source contributions

By Samuel Moelius

This time last year, we wrote about the more than 190 Trail of Bits-authored pull requests that were merged into non-Trail of Bits repositories in 2021. In 2022, we continued that trend by having more than 400 pull requests merged into non-Trail of Bits repositories!

Why is this significant? While we take great pride in the tools that we develop, we recognize that we benefit from tools maintained outside of Trail of Bits. When one of those tools doesn’t work as we expect, we try to fix it. When a tool doesn’t fill the need we think it was meant to, we try to improve it. In short, we try to give back to the community that gives so much to us.

Here are a few highlights from the list of PRs at the end of this blog post:

The projects named below represent software of the highest quality. Software of this caliber doesn’t come from just merging PRs and publishing new releases; it comes from careful planning, prioritizing features, familiarity with related projects, and an understanding of the role that a project plays within the larger software ecosystem. We thank these projects’ maintainers both for the work the public sees and for innumerable hours spent on work the public doesn’t see.

We wish you a happy, safe, and similarly productive 2023!

Some of Trail of Bits’s 2022 Open-Source Contributions

Cryptography

Tech Infrastructure 

Software testing tools

Blockchain software

How to share what you’ve learned from our audits

By Nick Selby

Trail of Bits recently completed a security review of cURL, which is an amazing and ubiquitous tool for transferring data. We were really thrilled to see cURL founder and lead developer Daniel Stenberg write a blog post about the engagement and the report, and wanted to highlight some important things he pointed out.

In this post, Daniel dives into cURL’s growth since its last audit in 2016: the project; the codebase; and then into the work with Trail of Bits. He touched on both the engagement experience and the final report.

His blog post provides terrific and meaningful context. He gives us high praise, as well as actionable and meaningful critiques that our teams are considering for the future. He also highlights an area in which he disagrees with a finding, providing context on why, and provides links to the responses cURL made to each of the audit points.

We believe software providers should follow Daniel’s lead if they choose to publish their security reviews. This supplementary reading is deeply needed so software developers can provide greater context and clarity around their security decisions. This is a great example of how engineering teams can work with us, and we are very proud of the compliments and cognizant of our responsibility to diligently consider his critiques.

There is one vulnerability highlighted in Daniel’s post that is not included in the final report, because the bug was found after the review ended (our engineers kept a fuzzer rolling after the conclusion of the review). That bug, a use-after-free, is now known as CVE-2022-43552. The details are available on cURL’s website and were released in sync with the patch. Trail of Bits will have a blog post about the bug in the future.

While the bug itself isn’t a critical one, the process Daniel and other cURL maintainers took to fix it is a great example of a commitment to excellence. While some software developers think of discovering and patching vulnerabilities as something akin to failure, we believe it is a hallmark of how developers should handle security issues.

We highly recommend giving the audit report, the threat model, and Daniel’s post a read!

Fast and accurate syntax searching for C and C++

By Mate Kukri

The naive approach to searching for patterns in source code is to use regular expressions; a better way is to parse the code with a custom parser, but both of these approaches have limitations. During my internship, I prototyped an internal tool called Syntex that does searching on Clang ASTs to avoid these limitations. In this post, I’ll discuss Syntex’s unique approach to syntactic pattern matching.

Searching, but with context

Syntex addresses two overarching problems with traditional pattern-searching tools.

First, existing tools are prone to producing false negatives. These tools usually contain their own parsers that they use depending on the language of the codebase they are searching in. For C and C++ codebases, these tools usually parse source code without performing macro expansion, searching through non-macro-expanded code instead of the macro-expanded code that a real compiler like Clang would produce. This means these tools cannot ensure accurate results. A client of such a tool won’t be able to confidently say, “here are all the occurrences of this pattern” or “this pattern never occurs.” In theory, these tools could match uses of macros in non-macro-expanded code, but in practice, they would be able to match only top-level uses of macros, meaning that false negatives are likely.

Another problem with these tools is that their internal parsers do not use the same representation of the code as a real compiler would, and they do not have an understanding of the semantics of the source code. That is, these tools produce plaintext output highlighting their results, so they cannot provide semantic information about the code in which their results appear. Without such information, it is difficult to further analyze the output, especially using other analysis tools. It is not strictly speaking impossible to access the source code internally parsed by these tools, but it would not be particularly useful in a multi-stage analysis pipeline requiring access to semantic information available only to a compiler. All this severely limits these tools’ usefulness as a foundation on which to use other analysis tools to more deeply analyze the given code.

For instance, let’s say we are trying to find code in which the length of a string in the argument list of a call to some risky function is implicitly truncated. We might have our tool search for the pattern $func($... $str->len $...). The tool will likely find a superset of code snippets that we actually care about. We ought to be able to semantically filter these search results to check that len is the structure field of interest and that its use induces an implicit downcast. However, whatever tool we choose to use would not be able to do this filtering because it would understand only the structure of the code, not the semantics. And because of its lack of semantic understanding, it’s more difficult to introduce other analysis tools to help further analyze the results.

Syntex solves both of these problems by operating on actual Clang ASTs. Because Syntex uses the same ASTs that the compiler uses, it eliminates the inaccuracies of typical pattern-searching tools and provides semantic information for further analysis. Syntex produces results with references to AST nodes, allowing developers to conduct follow-up semantic analysis. For instance, a client enumerating the downcast pattern above will be able to make decisions based on the type and nature of the submatches corresponding to both func and str.

Syntex matches syntax patterns against indexed code

Parsing C and C++ code is a notoriously difficult task, in that it requires implementing unbounded lookaheads and executing Turing-complete templates just to obtain a parse tree. Syntex solves the problem of parsing source code by relying on Clang, but how does it parse Syntex queries themselves?

Aside from queries containing $-prefixed meta variables, Syntex queries are syntactically valid C and C++ code. Ideally, we would parse Syntex queries with Clang, then unify the parsed queries and parsed source code to identify matches. Unfortunately, life is not so easy: Syntex queries lack the necessary syntactic/semantic context that would allow them to be parsed. For example, the pattern foo(0) yields different parse trees depending on the type of foo.

Syntex doesn’t directly resolve the edge cases of C and C++ syntax; instead, it considers all possible ambiguous interpretations while parsing queries. However, instead of defining the ambiguous language patterns itself, Syntex derives its pattern grammar from the Clang compiler’s AST. Using this approach, we can guarantee that patterns will be accepted for every construct appearing in the indexed source code.

Synthesizing the grammar

Parse tree of a simple declaration

At code building and indexing time, Syntex creates a context-free grammar by recursively walking through the Clang AST and recording the relationships between AST nodes. Nodes with children correspond to non-terminals; each appearance of such a node adds a production rule of the form parent -> child_0 ... child_n. Nodes with no children become terminal symbols in the generated grammar. For instance, the grammar (production rules and terminals) corresponding to the AST in figure 1 is as follows:

DECL_REF_EXPR           -> IDENTIFIER
INTEGER_LITERAL         -> NUMERIC_CONSTANT
IMPLICIT_CAST_EXPR      -> DECL_REF_EXPR
BINARY_OPERATOR         -> IMPLICIT_CAST_EXPR PLUS IMPLICIT_CAST_EXPR
PARENT_EXPR             -> L_PARENTHESIS BINARY_OPERATOR R_PARENTHESIS
BINARY_OPERATOR         -> PAREN_EXPR SLASH INTEGER_LITERAL
VAR                     -> KEYWORD_INT IDENTIFIER EQUAL BINARY_OPERATOR 
DECL_STMT               -> VAR SEMI

IDENTIFIER, NUMERIC_CONSTANT, PLUS, L_PARENTHESIS, R_PARENTHESIS, SLASH,
KEYWORD_INT, EQUAL, SEMI 

If we interpret DECL_STMT as the “start symbol” of this grammar, then the grammar is deterministic, and a parser that accepts strings could be generated with the commonly used LR algorithm. However, when parsing search queries, Syntex doesn’t actually know the start symbol that the query should reduce to. For example, if the query consists of an IDENTIFIER token, then Syntex can parse that token as an IDENTIFIER, a DECL_REF_EXPR containing an identifier, or an IMPLICIT_CAST_EXPR containing a DECL_REF_EXPR containing an identifier. This means that, in practice, Syntex assumes that every symbol could be a start symbol and retroactively deduces which rules are start rules based on whether they cover the entire input query.

Parsing Syntex queries

Conceptually, the first step in parsing a query is to perform tokenization (or lexical analysis). Syntex performs tokenization using a hand-coded state machine. One difference between Syntax’s tokenizer and those used in typical compilers is that Syntex’s tokenizer returns all possible interpretations of the input characters instead of just the greediest interpretation. For example, Syntex would tokenize the string ”<<“ as both << and two < symbols following each other. That way, the tokenizer doesn’t have to be aware of which interpretation is necessary in which context.

Syntex parses queries against synthetic pattern grammars using a memoizing chart parser. Memoization prevents the parsing process from resulting in (potentially) exponential runtime complexity, and the resulting memo table serves as the in-memory representation of a query parse forest. The matcher (described in the next section) uses this table to figure out which indexed ASTs match the query. This approach means that Syntex doesn’t have to materialize explicit trees for each possible interpretation of a query. Figure 2 presents a memoization table for the query string “++i”.

A memo table for query “++i.”

This table shows that the string at index 0 can be interpreted as the tokens + or ++ and that the production rule UNARY_OPERATOR -> PLUS_PLUS DECL_REF_EXPR is matched at this index. To obtain that match, the parser, after seeing that the left corner of the production above can match PLUS_PLUS, recursively obtains the parses at index 2. With this knowledge, it can enumerate the production forward and conclude that it matches in its entirety up until index 3.

Finding matches

After the parse table of a query is filled, Syntex needs to locate appearances of all interpretations of the query in the indexed source code. This process starts with all entries in the table at index 0 whose next index is the length of the input; these entries correspond to parses covering the entire input string.

Syntex’s matching algorithm operates on a proprietary Clang AST serialization format. Serialized ASTs are deserialized into in-memory trees. The deserialization process builds an index of tree node types (corresponding to grammar symbols) to deserialized nodes, which enables Syntex to quickly discover candidates that could match against a query’s root grammar symbol. A recursive unification algorithm is applied pairwise to each match candidate and each possible parse of the query. The algorithm descends the trees, checking node types and the order in which they appear, and bottoms out at the actual tokens themselves.

For the query “++i” in figure 2, Syntex starts matching at an AST node with the symbol UNARY_OPERATOR. In this case, we know that the only way to produce such a node is to use the rule body PLUS_PLUS DECL_REF_EXPR. First, the matcher makes sure the aforementioned AST node has the right structure: there are two child nodes, a PLUS_PLUS and a DECL_REF_EXPR. Then, Syntex recursively repeats the same process for those nodes. For example, for the PLUS_PLUS child, Syntex ensures that it’s a token node with the spelling “++”.

Additional features and example uses

An important feature of Syntex, briefly shown in the length truncation example above, is that it supports “meta variables” in queries. For instance, when a query such as “++i” is specified, the matcher will find only expressions incrementing variables called i. However, if we were to specify “++$” as the query, then Syntex will find expressions that increment anything. Meta variables can also be named, such as in the query “++$x”, allowing the client to retrieve the matching sub-expression by name (x) for further processing. Furthermore, Syntex allows constraints to be applied to meta variables, such as in the query “++$x:DECL_REF_EXPR”; with these constraints, Syntex would match only increments to expressions x referencing a declaration. In-query constraints are limited in expressivity, which is why the C++ API allows arbitrary C++ functions to be attached as predicates that decide to accept or reject potentially matching candidates.

Another important feature, also shown in the length truncation example above, is the globbing operator “$...”. It allows users to specify queries such as “printf($...)”. The glob operator is useful when one wants to match an unknown number of nodes. Glob operator semantics are neither greedy nor lazy. This is in part because of the non-traditional nature of Syntex-generated grammars: where a hand-coded grammar might condense list-like repetition via recursive rules, Syntex grammars explicitly represent each observed list length via a different rule. Thus, a call to printf with one argument is matched by a different rule than a call to printf with five arguments. Because Syntex can “see” all of these different rules of different lengths, it’s able to express interesting patterns with globbing, such as “printf($... i $...)”, which will find all calls to printf with i appearing somewhere in the argument list.

Parting thoughts

Syntex’s approach is unique among traditional syntactic pattern searching tools: the search engine contains very little language-specific code and easily generalizes to other programming languages. The only requirement for using Syntex is that it needs to have access to the tokens that produced each AST node. In my prototype, the C and C++ ASTs are derived from PASTA.

Syntex has already exceeded the capabilities of open-source and/or publicly available alternatives, such as Semgrep and weggli. Syntex isn’t “done,” though. The next step is to develop Syntex so that it searches through source code text that doesn’t quite exist. One of the most powerful features of the C++ language is its templates: they allow programmers to describe the shape of generic data structures and the computations involving them. These templates are configured with parameters that are substituted with concrete types or values at compile time. This substitution, called “template instantiation,” creates variants of code that were never written. In future versions of Syntex, we plan to make C++ template instantiations syntax-searchable. Our vision for this feature relies on Clang’s ability to pretty print AST nodes back to source code, which will provide us with the code needed for our grammar-building process.

Last but not least, I would like to thank Trail of Bits for providing the opportunity to tackle such an interesting research project during my internship. And I would like to thank Peter Goodman for the project idea and for mentoring me throughout the process.

What child is this?

A Primer on Process Reparenting in Windows

By Yarden Shafir

Process reparenting is a technique used in Microsoft Windows to create a child process under a different parent process than the one making the call to CreateProcess. Malicious actors can use this technique to evade security products or break process ancestry ties, making detection more challenging. However, process reparenting is also used legitimately across the operating system, for example during execution of packaged or store applications. Like many features, process reparenting can confuse both security products and security teams, leading to either missed detections or false positives on otherwise-innocent applications. This blog post will look at how to investigate this interesting behavior.

Process Monitor and the Incorrect Stack Trace

Lately I was playing around with the Windows Terminal and the way it runs and operates (something I might write about more in a future blog post). I ran the Windows Terminal through the Windows start menu and recorded its execution with Process Monitor (ProcMon), a SysInternals tool that records the execution, file system, registry, and network operations of a process. When I looked at the recording, I noticed something strange:

According to ProcMon, explorer.exe is starting the terminal process. This makes sense, as explorer.exe is generally the parent process of many user applications. But a close look at the call stack reveals some gaps: Frames 8 and 9 have no symbols and don’t even show a module name. Many would assume this is a shellcode: dynamic memory running from the heap, outside of a regular module. We can investigate this possibility using a debugger or a tool like Process Hacker (now known as System Informer). The output of Process Hacker is shown below.

The memory range to which these stack frames point isn’t mapped at all. So either this is an especially sneaky shellcode and I should recheck my system for yet another nation-state attack, or there is a different explanation.

To get to the root cause, I turn to the (almost) always-reliable debugger: WinDbg. We’ll use a kernel debugger to track the process creation of the Windows Terminal and observe the data on which ProcMon operates, which should give some indication about what’s really going on.

First, let’s start a recording session with ProcMon, which makes it load its kernel driver and register a process notify routine. Many Endpoint Detection and Response (EDR) systems and system monitoring tools use this callback to get notified about process creation and termination. To follow ProcMon’s steps, we’ll set a breakpoint on this callback and see what happens.

The list of process creation callbacks is saved in an unexported kernel symbol called PspCreateProcessNotifyRoutine. Unfortunately, the callbacks themselves are saved in a data structure that isn’t available in the public symbols, so parsing them can be a bit of a pain. But the structure itself is sufficiently well known and stable that we can use hard-coded offsets to parse it. I wrote a simple one-line script to print all the registered callbacks (many other examples are available). If you’re using the newest version of WinDbg, you can even use the new symbol builder to push the structure and use it as if it were available in the symbols!

Running this script, we can easily find ProcMon’s process callback:

dx ((__int64(*)[64])&nt!PspCreateProcessNotifyRoutine)->Where(p => p)->Select(p => 
(void(*)())(*(((__int64*)(p & ~0xf)) + 1)))
((__int64(*)[64])&nt!PspCreateProcessNotifyRoutine)->Where(p => p)->Select(p => 
(void(*)())(*(((__int64*)(p & ~0xf)) + 1)))                
    [0]              : 0xfffff80673f78900 : cng!CngCreateProcessNotifyRoutine+0x0 
[Type: void (*)()]
    [1]              : 0xfffff80674b29f50 : WdFilter+0x49f50 [Type: void (*)()]
    [2]              : 0xfffff80673dbb4b0 : ksecdd!KsecCreateProcessNotifyRoutine+0x0 
[Type: void (*)()]
    [3]              : 0xfffff8067510db70 : tcpip!CreateProcessNotifyRoutineEx+0x0 
[Type: void (*)()]
    [4]              : 0xfffff8067561d990 : iorate!IoRateProcessCreateNotify+0x0 
[Type: void (*)()]
    [5]              : 0xfffff80673eea160 : CI!I_PEProcessNotify+0x0 [Type: void 
(*)()]
    [6]              : 0xfffff80678d6a590 : dxgkrnl!DxgkProcessNotify+0x0 [Type: void 
(*)()]
    [7]              : 0xfffff8068184acf0 : peauth+0x3acf0 [Type: void (*)()]
    [8]              : 0xfffff80681b36400 : PROCMON24+0x6400 [Type: void (*)()]

The next step is setting a breakpoint on this callback, resuming the machine’s execution, and running the Windows Terminal from the start menu:

bp 0xfffff80681b36400; g

And our breakpoint gets hit!

1: kd> g
Breakpoint 0 hit
PROCMON24+0x6400:
fffff806`81b36400 4d8bc8          mov     r9,r8

To get more insight into what ProcMon sees, we should parse the function arguments. I’ll skip a couple of reverse engineering steps (if I don’t, this post will just keep on going forever) and simply let you know that on modern systems, ProcMon registers its process notify routine using PsSetCreateProcessNotifyRoutineEx2. This matters because different versions of the process-notify routine receive slightly different arguments. In this case, the routine has the type PCREATE_PROCESS_NOTIFY_ROUTINE_EX:

 
void PcreateProcessNotifyRoutineEx (
  [_Inout_]           PEPROCESS Process,
  [in]                HANDLE ProcessId,
  [in, out, optional] PPS_CREATE_NOTIFY_INFO CreateInfo
)

With this knowledge, we can use the debugger data model to present the arguments with the correct types, just as the driver sees them. There’s only one issue: PS_CREATE_NOTIFY_INFO isn’t included in the public symbols, so we don’t have easy access to it. It is, however, included in the public ntddk.h header, so we can simply copy the structure definition (with minimal adjustments) into a separate header and use it in the debugger through Synthetic Types. To that end, let’s create the header file under c:\temp\ntddk_structs.h:

typedef struct _PS_CREATE_NOTIFY_INFO {
    ULONG64 Size;
    union {
        _In_ ULONG Flags;
        struct {
            _In_ ULONG FileOpenNameAvailable : 1;
            _In_ ULONG IsSubsystemProcess : 1;
            _In_ ULONG Reserved : 30;
        };
    };
    HANDLE ParentProcessId;
    _CLIENT_ID CreatingThreadId;
    _FILE_OBJECT *FileObject;
    _UNICODE_STRING *ImageFileName;
    _UNICODE_STRING *CommandLine;
    ULONG CreationStatus;
} PS_CREATE_NOTIFY_INFO, *PPS_CREATE_NOTIFY_INFO;

Next, let’s load it into the debugger through synthetic types:

dx Debugger.Utility.Analysis.SyntheticTypes.ReadHeader("c:\\temp\\ntddk_structs.h", 
"nt")
Debugger.Utility.Analysis.SyntheticTypes.ReadHeader("c:\\temp\\ntddk_structs.h", 
"nt")                 : ntkrnlmp.exe(ntddk_structs.h)
    ReturnEnumsAsObjects : false
    RegisterSyntheticTypeModels : false
    Module           : ntkrnlmp.exe
    Header           : ntddk_structs.h
    Types           

(Side note: Try not to make any mistakes in your header files or you’ll have to restart the debugger session to reload the fixed version of the structure. It’s not currently possible to unload or reload header files, so the only options are to reload a separate header file with a differently named structure, or to restart the debugger session and try again.)

Once the header is loaded, we have everything we need to format the input arguments with the correct types:

dx @$procNotifyInput = new { Process = (nt!_EPROCESS*)@rcx, ProcessId = @rdx, 
CreateInfo = 
Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_PS_CREATE_NOTIFY_INFO",
 @r8) }
dx @$procNotifyInput = new { Process = (nt!_EPROCESS*)@rcx, ProcessId = @rdx, 
CreateInfo = 
Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_PS_CREATE_NOTIFY_INFO", 
@r8) }
@$procNotifyInput = new { Process = (nt!_EPROCESS*)@rcx, ProcessId = @rdx, CreateInfo
 = Debugger.Utility.Analysis.SyntheticTypes.CreateInstance("_PS_CREATE_NOTIFY_INFO", 
@r8) }                
    Process          : 0xffffae0f92e0f0c0 [Type: _EPROCESS *]
    ProcessId        : 0x197c [Type: unsigned __int64]
    CreateInfo 

With this, we can look further into CreateInfo to gain more information about this new process—and more importantly, who created it:

dx @$procNotifyInput.CreateInfo
@$procNotifyInput.CreateInfo                
    Size             : 0x48
    Flags            : 0x1
    FileOpenNameAvailable : 0x1
    IsSubsystemProcess : 0x0
    Reserved         : 0x0
    ParentProcessId  : 0x1738 [Type: void *]
    CreatingThreadId [Type: _CLIENT_ID]
    FileObject       : 0xffffae0f90ac7d70 : "\Program Files\WindowsApps\Microsoft.WindowsTerminal_1.15.2713.0_x64__8wekyb3d8bbwe\WindowsTerminal.exe"
 - Device for "\FileSystem\Ntfs" [Type: _FILE_OBJECT *]
    ImageFileName    : 0xffffd28d2a447578 : "\??\C:\Program Files\WindowsApps\Microsoft.WindowsTerminal_1.15.2713.0_x64__8wekyb3d8bbwe\WindowsTerminal.exe" [Type: _UNICODE_STRING *]
    CommandLine      : 0xffffae0f92c5b070 : ""C:\Program Files\WindowsApps\Microsoft.WindowsTerminal_1.15.2713.0_x64__8wekyb3d8bbwe\WindowsTerminal.exe" " [Type: _UNICODE_STRING *]
    CreationStatus   : 0x0

dx @$procNotifyInput.CreateInfo.CreatingThreadId
@$procNotifyInput.CreateInfo.CreatingThreadId                 [Type: _CLIENT_ID]
    [+0x000] UniqueProcess    : 0x5ac [Type: void *]
    [+0x008] UniqueThread     : 0x69c [Type: void *]

First, we can now be sure that the newly created process is the Windows Terminal. And second, we can spot some interesting details about who created it. Two fields are of interest here: ParentProcessId and CreatingThreadId, the latter of which also contains a UniqueProcess field (this is the process ID of the process that owns this thread). Before we try to understand why these are different, let’s take a small step back and examine the context of the process we’re currently in. Since process-notify routines are called in the context of the process that is creating the new child process, this might explain the strange call stack we saw earlier and clarify the creation of this Terminal process.

You might be surprised by what we discover: Unlike what the ProcMon GUI showed, in the driver it seems that we are running in the context of an svchost.exe process and not explorer.exe. So it is actually svchost.exe that is creating the new Terminal process!

dx @$curprocess
@$curprocess                 : svchost.exe [Switch To]
    KernelObject     [Type: _EPROCESS]
    Name             : svchost.exe
    Id               : 0x5ac
    Handle           : 0xf0f0f0f0
    Threads         
    Modules         
    Environment     
    Devices         
    Io     

Unfortunately, this doesn’t give us the full picture. If svchost.exe is creating the new process, why does the GUI claim it is explorer.exe? What is this service, and why is it creating the Terminal process at all?

To get some more information, let’s examine the call stack:

 #   Child-SP            RetAddr               Call Site
00 ffffd28d`2a446bd8 fffff806`731bacc2     PROCMON24+0x6400
01 ffffd28d`2a446be0 fffff806`730993a5     nt!PspCallProcessNotifyRoutines+0x206
02 ffffd28d`2a446cb0 fffff806`7308cec0     nt!PspInsertThread+0x639
03 ffffd28d`2a446d80 fffff806`72e39375     nt!NtCreateUserProcess+0xe10
04 ffffd28d`2a447a30 00007ff8`29185514     nt!KiSystemServiceCopyEnd+0x25
05 0000005a`52c7c308 00007ff8`268c8648     ntdll!NtCreateUserProcess+0x14
06 0000005a`52c7c310 00007ff8`268eea13     KERNELBASE!CreateProcessInternalW+0x2228
07 0000005a`52c7dc50 00007ff8`277bba80     KERNELBASE!CreateProcessAsUserW+0x63
08 0000005a`52c7dcc0 00007ff8`0cd1239e     KERNEL32!CreateProcessAsUserWStub+0x60
09 0000005a`52c7dd30 00007ff8`0cd131f1     appinfo!AiLaunchProcess+0x69e
0a 0000005a`52c7e5b0 00007ff8`27633803     appinfo!RAiLaunchProcessWithIdentity+0x901
0b 0000005a`52c7ec00 00007ff8`275c280a     RPCRT4!Invoke+0x73
0c 0000005a`52c7ece0 00007ff8`276169f2     RPCRT4!NdrAsyncServerCall+0x2ba
0d 0000005a`52c7edf0 00007ff8`275d324f     RPCRT4!DispatchToStubInCNoAvrf+0x22
0e 0000005a`52c7ee40 00007ff8`275d2e58     RPCRT4!RPC_INTERFACE::DispatchToStubWorker+0x1af
0f 0000005a`52c7ef20 00007ff8`275e2995     RPCRT4!RPC_INTERFACE::DispatchToStubWithObject+0x188
10 0000005a`52c7efc0 00007ff8`275e1fe7     RPCRT4!LRPC_SCALL::DispatchRequest+0x175
11 0000005a`52c7f090 00007ff8`275e166b     RPCRT4!LRPC_SCALL::HandleRequest+0x837
12 0000005a`52c7f190 00007ff8`275e1341     RPCRT4!LRPC_SASSOCIATION::HandleRequest+0x24b
13 0000005a`52c7f210 00007ff8`275e0f77     RPCRT4!LRPC_ADDRESS::HandleRequest+0x181
14 0000005a`52c7f2b0 00007ff8`275e7559     RPCRT4!LRPC_ADDRESS::ProcessIO+0x897
15 0000005a`52c7f3f0 00007ff8`29102160     RPCRT4!LrpcIoComplete+0xc9
16 0000005a`52c7f480 00007ff8`290f6e48     ntdll!TppAlpcpExecuteCallback+0x280
17 0000005a`52c7f500 00007ff8`277b54e0     ntdll!TppWorkerThread+0x448
18 0000005a`52c7f7f0 00007ff8`290e485b     KERNEL32!BaseThreadInitThunk+0x10
19 0000005a`52c7f820 00000000`00000000     ntdll!RtlUserThreadStart+0x2b

Now this is getting interesting. Look at the user-mode stack (starting from frame 5) and compare it to the user-mode stack seen in ProcMon–they look nearly identical. And the two missing frames seem to belong inside appinfo.dll. So what is happening?

To answer that, we’ll go back to our CreateInfo data and the Parent vs. Creator process issue. We’ll use the process list to find which process each of these IDs represent:

dx @$parentProcessId = @$procNotifyInput.CreateInfo.ParentProcessId
@$parentProcessId = @$procNotifyInput.CreateInfo.ParentProcessId : 0x1738 [Type: void *]

dx @$creatorProcessId = @$procNotifyInput.CreateInfo.CreatingThreadId.UniqueProcess
@$creatorProcessId = @$procNotifyInput.CreateInfo.CreatingThreadId.UniqueProcess : 0x5ac [Type: void *]

dx @$cursession.Processes[@$parentProcessId]
@$cursession.Processes[@$parentProcessId]                 : explorer.exe [Switch To]
    KernelObject     [Type: _EPROCESS]
    Name             : explorer.exe
    Id               : 0x1738
    Handle           : 0xf0f0f0f0
    Threads         
    Modules         
    Environment     
    Devices         
    Io  

dx @$cursession.Processes[@$creatorProcessId]
@$cursession.Processes[@$creatorProcessId]                 : svchost.exe [Switch To]
    KernelObject     [Type: _EPROCESS]
    Name             : svchost.exe
    Id               : 0x5ac
    Handle           : 0xf0f0f0f0
    Threads         
    Modules         
    Environment     
    Devices         
    Io   

The Creator process ID seems to belong to the same svchost.exe whose context we’re currently in, so this is the process creating the Terminal process (and the one whose call stack is shown in ProcMon). But the parent process is explorer.exe, which is the reason ProcMon is displaying it as the creator process—although it doesn’t consider the case where the creator process and the parent process are different, causing this call stack to be incorrectly linked to explorer.exe.

Process Reparenting, Explained

The mechanism we’re seeing here is called process reparenting. When creating a process, the creator can set a PROC_THREAD_ATTRIBUTE_PARENT_PROCESS attribute and include the handle to a different process, which will be used as the parent process. This mechanism has various uses across the system, such as creating a process in a different session than the creator process. To have a logical process tree, as well as for technical reasons, svchost.exe must reparent the child process to a different parent in session 1 (such as explorer.exe) in order to allow the child process to use the console and the UI. This mechanism can also be used to hide the actual origin of processes and confuse EDRs.

ProcMon misinterprets the data it receives by not checking to see if the process requesting the process creation is the same one as the requested parent, causing the incorrect stack we observed. However, by using a kernel driver and process creation notifications, we can have all the data necessary to tell if a process is being reparented. In fact, we can also do this from user mode, through the Microsoft-Windows-Kernel-Process ETW channel. This channel is not enabled by default, but you can register as a consumer and receive events, or use logman.exe to generate a trace and view it in Event Viewer. Note that these traces were run on a different system, so the PIDs are unrelated to the ones seen earlier in the post:

Event ID 1, ProcessStart, is the one we care about. The parsed data shown to us by the “general” description isn’t too helpful, as it will still point to the reparented process as the “parent.” However, the raw data in the event includes a third field that tells us more:

Here we see, side by side, two process creation events. In the raw data are three helpful fields:

  • System.Execution.ProcessID: The ID of the process (and thread) that requested the creation of the new process
  • EventData.ProcessID: The ID of the newly created child process
  • EventData.ParentProcessID: The ID of the process that was chosen as the parent

If the creating process ID is identical to the parent process ID (on the left side), this process wasn’t reparented. But if the two PIDs are not identical (on the right side), then this process was reparented and we get the IDs of both the creator process and the chosen parent!

We’re still processing

At this point, we’ve investigated process reparenting and the strange behavior we saw in ProcMon. Of course, this still doesn’t fully explain the mechanism behind the creation of the Terminal process, the service creating it, and the appinfo DLL. That all relates to the behavior and implementation of packaged applications, which is a whole other topic. For those who might be curious about the creation mechanism, you can find more information about that here, and I might add some more details (and debugging tips) in a future blog post.

How I gave ManticoreUI a makeover

By Calvin Fong

During my internship at Trail of Bits, I explored the effectiveness of symbolic execution for finding vulnerabilities in native applications ranging from CTF challenges to popular open source libraries like image parsers, focusing on finding ways to enhance ManticoreUI. It is a powerful tool that improves accessibility to symbolic execution and vulnerability discovery, but its usability and efficiency leave much room for improvement. By the end, I implemented new ManticoreUI features that reduce analysis time through emulation, improved shared library support, and enabled symbolic state bootstrapping from GDB to side-step complex program initialization. With these new features, I found and reported a vulnerability in the DICOM Toolkit (DCTMK), which is a widely deployed set of libraries used in medical imaging!

The current state of ManticoreUI

Manticore is a symbolic execution engine that emulates applications with symbolic data, as opposed to concrete data. This allows Manticore to test all possible execution paths of its targets. ManticoreUI (MUI) is a graphical user interface plug-in for Binary Ninja that exposes the features of Manticore to users in a simpler way with helpful graphical elements. Its design allows users to reap the benefits of symbolic execution without having to worry about the nitty-gritty of the Manticore API.

An example of the GUI.

One of my goals was to improve MUI’s user experience for finding vulnerabilities. I spent some time using MUI in CTF challenges, on artificially created vulnerable code samples, and on some small real-world targets. From this, I determined three general directions for improvement:

  • I realized that many non-default features were not obvious to new or inexperienced Manticore users. These features were sometimes implemented in code but not covered in the documentation.
  • I also noticed that real-world software targets were significantly more challenging to approach than smaller samples like CTF challenges. CTF challenges tend to be small command-line applications that typically receive input from standard input. However, there are many application types in the real world, including network services, daemons, and libraries. And the MUI user experience was very different for each type.
  • Lastly, when testing software that processes large inputs, like format parsers with big iterating loops or complex C++ binaries, MUI’s emulation was obviously much slower than the execution speed of a real CPU.

Exposing useful features through ManticoreUI

To address the first improvement area, I made two of MUI’s useful features—function models and global hooks—more obvious to users.

Function models

Function models are Python re-implementations of common library functions with awareness of Manticore’s symbolic execution engine. These models override actual library functions during symbolic execution. This improves performance because Manticore does not have to emulate each native instruction individually.

ManticoreUI now prompts when there are library functions that could be substituted with an existing function model implementation, as shown below:

Functions with function model implementations shown during startup

The Add Function Model command allows users to add a custom hook at the function address to use the function model instead of native code.

Function model selection pop-up

Global Hooks

Global hooks are another less obvious functionality. These are custom hooks that are triggered for every instruction that gets executed. They can be useful for implementing user-defined tracing functionality, like tracing every syscall that occurs (similar to strace). Alternatively, they can help with performing checks not bound to specific instructions (e.g., a global hook that ends the Manticore run when the RAX register has the value 0xdeadbeef). They can be added using the Add/Edit Global Hook command.

Global hook management pop-up

Improving the workflow for bug discovery

To address the second and third improvement areas, I implemented new MUI features that facilitate the bug discovery process. The emulate_until feature increases the performance of MUI, while shared library support and gdb state dumping improved MUI’s usability in complex targets. These features are described in greater depth below.

emulate_until

The emulate_until feature is an additional MUI solve option. Setting this value to an address will make Manticore use the Unicorn emulator to concretely emulate your target binary until it reaches the address specified. The Unicorn emulator is far faster than Manticore’s own emulated CPU, which greatly improves execution speed.

Emulate_until field in the Manticore run options

I noticed this feature was very useful for C++ binaries, which execute more instructions during initialization. When we symbolically executed a simple benchmark with a hello world C++ binary on an Ubuntu 20.04 machine, we observed the following run times:

Default emulate_until to main
Total Duration /s 311 seconds 12 seconds

Evidently, using Unicorn emulation with the emulate_until option causes significant performance benefits for even the simplest C++ binaries.

Shared library support

In vulnerability discovery, we commonly test underlying libraries of an application rather than a full application itself. Such workflows usually involve a simple harness binary that loads the library and calls library functions to be tested. Since MUI supported loading and setting hooks in only a single binary, the use of a harness binary with the shared library was a troublesome workflow for MUI.

With this new feature, users can separately load the shared library in MUI and set up all necessary hooks. Then, they can load the harness binary in MUI and link the Binary Ninja project file of the shared library. During execution, all hooks set in the shared library’s project will be resolved and added to the runtime accordingly.

While not yet implemented, this feature would be well suited for the Ghidra MUI Plugin. Binary Ninja projects contain only single binaries, while Ghidra projects can contain multiple binaries. This feature would enable a more convenient workflow for vulnerability discovery in Ghidra.

GDB state dumping

I ran into various issues while testing MUI with different targets, including unsupported system calls, unimplemented instructions, and applications that were too complicated to interact with through MUI/Manticore. I also frequently encountered situations where testing the entire application symbolically would lead to state explosion (i.e., too many forked states).

This led me to begin exploring the idea of limiting the use of Manticore’s execution engine. For example, rather than trying to symbolically execute from the start of the application, what about starting execution from a function of interest? This would still be very helpful when looking for vulnerabilities within a small subset of functions, and it reduces emulation issues by limiting the amount of code that Manticore has to symbolically execute.

I developed a GDB plugin for GEF that allows the user to dump the debugger’s state as a Manticore state object. This is stored in a file on the system that can later be loaded into MUI/Manticore to be used as the initial state of execution. This plugin dramatically increases the possibilities for MUI!

For example, network services that were usually hard to fully emulate in Manticore can now be run normally and attached to a debugger. Users can then dump the state from a breakpoint of choice and load that state into MUI to begin symbolic execution. This process allows MUI to be used with all sorts of complex targets.

This method has similarities with two other techniques: under-constrained symbolic execution and concolic execution. However, it is certainly the most “constrained” of the three methods. This is not necessarily a bad thing, but users must exercise judgment to determine which technique best suits their use case. One key weakness in using states from GDB is that injecting symbolic values requires a greater understanding of the current program state. For instance, if you replaced a variable value with a totally unconstrained symbolic value because you did not analyze certain if-else checks earlier in the program, Manticore may give inaccurate results.

Finding a vulnerability with the help of MUI

With MUI in my arsenal, I was determined to find a vulnerability while using the power of symbolic execution. My goal was not to find a vulnerability entirely through symbolic execution. Instead, I hoped to use MUI/Manticore as an oracle that could inform me about reachability and execution constraints, complementing traditional bug hunting methodologies like source auditing.

The codebase I targeted was the DICOM ToolKit (DCTMK). DCMTK is a set of libraries and utilities for working with the DICOM standard. Because DICOM files are usually used for medical imaging, DCMTK is used in software that handles medical products or data.

Rapidly assessing reachability with Manticore

I began by examining the source code with a focus on certain vulnerability sinks like memory accesses or memory allocations. When I discovered sinks that could lead to a vulnerability, I then relied on MUI to determine if the target code was reachable and if conditions for memory corruption could be created.

While reading the code for parsing BMP images, I noticed the following vulnerability sink:

// dcmdata\libi2d\i2dbmps.cc:330 
OFCondition I2DBmpSource::readBitmapData( 
  const Uint16 width, 
  const Uint16 height, 
  const Uint16 bpp, 
  const OFBool is TopDown, 
  const OFBool isMonochrome, 
  const Uint16 colors, 
  const Uint32* palette,
  char * & pixData, 
  Uint32& length) 
{
  …
  Uint8 *row_data; 
  …
  Uint16 samplesPerPixel = isMonochrome ? 1 : 3; 
  …
  length = width * height * samplesPerPixel;     // [1]
  …
  pixData = new char [length];                   // [2]
}

At [1] we see that the length variable is set to the product of width, height, and samplesPerPixel. This length is then used as the size to allocate a char buffer at [2]. This is a common sink for integer overflow vulnerabilities. If the product of width, height, and samplesPerPixel is sufficiently unbounded to overflow the capacity of length, it could make the allocation at [2] too small.

Since width and height are user-controlled, I wanted to determine if the user could provide any combination of values that would lead to an integer overflow. This is where MUI came into play! I used MUI as an oracle to determine if there could be an integer overflow of length, given the bounds of width, height, and samplesPerPixel.

A quick look through the code revealed that samplesPerPixel was maximally set to 3 for colored images. Additionally, width and height were limited to the range of unsigned 16-bit integers:

 if (tmp_height <= 0 || tmp_height > OFstatic_cast (Sint 32, UINT16_MAX)) 
   return makeOFCondition (OFM_dcmdata, 18, OF_error, "Unsupported BMP file - height too
large or zero"); 
 ...
 if (tmp_width <= 0 || tmp_width > OFstatic_cast (Sint 32, UINT16_MAX)) 
   return makeOFCondition (OFM_dcmdata, 18, OF_error, "Unsupported BMP file - width too
large or zero"); 

Using the GDB state dumping plugin, I set a breakpoint on the I2DBitmapSource::readBitmapData function and reached the breakpoint with a simple BMP image of a snail. By dumping the debuggee environment into a Manticore state, I could then load the state into MUI. The following video demonstrates this process:

With the state loaded in MUI, I could set the width and height to symbolic values. Using custom hooks, I forced Manticore to solve for a state where the integer overflow occurs. Manticore would use a sat-solver to determine if such a state was possible, allowing us to thoroughly verify the validity of this bug.

After running, I got the following values:

Results screen displaying the solved width and heights

This meant that a crafted BMP image with the width and heights specified in the above image could create a situation where length was too small, causing an undersized allocation. Running the binary in GDB with this exploit image immediately led to a crash and a successful bug discovery!

Patch

Within a few hours of informing the vendors, they introduced a patch to fix the vulnerability. This was a very pleasant security response!

A successful revamp

I’m very happy with the progress I’ve made over the course of this internship, and I think the improved analysis performance, support for shared libraries, and side-step complex application initialization with GDB have molded ManticoreUI into a better tool for aiding vulnerability discovery that nicely complements traditional bug hunting methodologies.

Through this internship, I’ve learned a lot about the applications of symbolic execution in the security field, and I’m excited to see how it will continue to develop. Beyond symbolic execution, I had the opportunity to improve my skills in software development through working on the different components of ManticoreUI.

I’m very grateful for the help my mentor, Eric Kilmer, provided throughout the internship. He gave me guidance for the direction of the project and invaluable feedback to improve on the code and ideas I contributed. This internship was surely a memorable and fruitful experience for me.

Manticore GUIs made easy

By Wong Kok Rui, National University of Singapore

Trail of Bits maintains Manticore, a symbolic execution engine that can analyze smart contracts and native binaries. While symbolic execution is a powerful technique that can augment the vulnerability discovery process, it requires some base domain knowledge and thus has its own learning curve. Given the plethora of ways in which a user can interact with such an engine and the need for frequent context switching between a disassembler and one’s terminal or script editor, integrating symbolic execution into one’s workflow can be daunting for a beginner.

One of the ways Trail of Bits has sought to ease this process is by making graphical user interfaces (GUIs) for Manticore that are embedded in popular interactive disassemblers. Last summer, former intern Alan Chang worked on the first such interface, the Manticore User Interface (MUI) plugin for Binary Ninja. We found that pairing Manticore directly with an interactive disassembler provides vulnerability researchers with a more convenient way to actually use (and benefit from) symbolic execution. Therefore, during my winter and summer internships at Trail of Bits, my goal was to help grow the MUI ecosystem by making a MUI plugin for Ghidra and building infrastructure to make these plugins easier to use, maintain, and develop.

The gRPC server–based architecture that MUI plugins use.

The Ghidra Plugin

We figured that the most direct way to encourage more people to use MUI plugins would be to simply develop MUI plugins for a greater variety of disassemblers! Thus, I spent my winternship developing a Ghidra version of the MUI plugin; I chose Ghidra chiefly because it is popular and, unlike the commercial tool Binary Ninja, free and open source. Additionally, a few internal projects at Trail of Bits were already using Ghidra, so I would have ample opportunity to explore Ghidra plugin development. Finally, by developing a Ghidra plugin (one written in Java instead of Python), we could develop a solution that wouldn’t be tied to a single programming language, gaining insight that could guide the development of future plugins.

This initial Ghidra plugin mimicked the existing Binary Ninja plugin as closely as possible. While it took a bit of time to become familiar with Java Swing and Ghidra’s widgets, simply mimicking the existing visual components and user interface was a fairly trivial task once I got going.

A side-by-side comparison of the run dialogs of Binary Ninja and Ghidra.

However, because the Ghidra plugin would be written in Java, it could not depend on the Manticore Python package or directly call Manticore’s Python API. Our solution to that challenge was to use a tool called shiv to seamlessly bundle the Manticore library and all of its dependencies into a Python zipapp. That way, we could create a “batteries-included” Manticore binary and then translate the Binary Ninja plugin’s interactions with the Manticore API into the appropriate command-line arguments. We then placed this binary in the relevant platform-specific subdirectories of Ghidra’s os directory, which facilitates cross-platform support.

By the end of the winternship, I was able to add extra features to the Ghidra plugin, such as the ability to specify arbitrary Manticore arguments in addition to those with dedicated input fields and support for multiple Manticore instances in the same Ghidra session. This, however, brought to light an additional problem.

Feature parity and cross-disassembler development

It quickly became apparent that our approach to plugin development would not be sustainable if we aspired to expand the MUI project to support even more disassemblers. For each new MUI feature, we would first have to determine how to implement the feature, accounting for the way that the plugin interacts with Manticore (e.g., through direct calls to the Manticore API or through Manticore’s command-line interface options). Furthermore, certain front-end information shared across plugins (e.g., fixed description strings or sensible default options) would have to be repeated and standardized in each implementation.

To address this problem, over the summer I developed a centralized remote procedure call (RPC) server binary for MUI. This server handles all interactions with the full-featured Manticore Python API and handles MUI functionality through individual RPCs defined in a protocol buffer. We chose to use gRPC as our RPC framework because of its performance, wide adoption, and strong support for code generation across many programming languages. As a result, MUI plugins of the future can easily contain and depend on their own gRPC-generated code.

The server is written in Python, providing it access to the full functionality of Manticore’s Python API, but is bundled into a shiv binary that can be called in any language. This facilitates a new client-server architecture that allows developers to implement any back-end Manticore functionality and tests just once. Developers of the front-end disassembler plugins can make RPC requests to the server with just a few lines of trivial code, which means that their work on individual plugins can focus almost entirely on front-end/UI changes.

To alleviate the “chore” of handling fixed strings and other front-end information that will be identical across plugins, we can store such data in JSON files that are packaged with MUI plugin releases and loaded on startup. In this way, we can standardize data such as the fields, field descriptors, and default values of the run dialogs used to start Manticore’s execution.

Fixed data can be stored in plugin-agnostic JSON files.

Demo: Developing a Feature for MUI

Let’s take a look at the process of developing a feature for MUI. Suppose that we want to enable manual state management during the runtime of a Manticore instance. Specifically, we want the ability to do the following:

  • Pause a state and resume it at a later time, which will be useful if we implement capabilities like execution tracing in the future.
  • Kill a state at our discretion. That way, if a state bypasses an avoid hook or becomes stuck in an infinite loop, we will be able to abandon it.First, we will define a new RPC and the RPC’s message formats in our protocol buffer file. The server will receive the state’s numeric ID, the Manticore instance that we’re working with, and a StateAction enum indicating whether it should resume, pause, or kill the state.

Details of the new RPC and its message formats are defined in the protocol buffer.

We’ll also need to update an existing message—ManticoreStateList. MUI plugins have state lists that display all states and the status of each state; these lists are updated via a GetStateList RPC. Because it’d be beneficial for users to see “paused” states as distinct from preexisting state statuses, we’ll add a new paused_states field to the RPC’s response message, which will contain a list of the states paused by the user.

The existing ManticoreStateList message is updated with a new paused_states attribute.

With that, we can proceed to generate the Python service interface code and mypy type stubs! During my summer internship, I used the command runner just to abstract away this work for developers, so we can run the just generate command to…just generate the required code!

Now we can move on to implementing the back-end functionality. This code is contained in a single Servicer class in which each method represents an individual RPC.

We’ll begin by validating the RPC request data. While gRPC-generated code can check that the fields provided to it are correctly typed, it cannot enforce the use of any fields or verify that fields are well formed. Thus, we’ll write our own checks to assert the validity of the request; we’ll also set an error code and error details to be returned to the requester if the request is invalid.

Code added to the back-end server validates the incoming RPC request.

Finally, we can implement the state-pausing and state-killing functionality by directly accessing Manticore’s Python API. This direct access provides us far more control than we’d have if we were interacting with Manticore through its command-line options, enabling us to create a dummy busy state or abandon a specific state, for example. If all goes well, the front-end plugin will receive an empty response, which will be populated with the default OK status code.

Once the state has been successfully processed, an empty response object is returned.

One additional benefit of our approach is that we can write tests for our RPCs without having to deal with spawning up a server and connecting to it. Instead, we can simply call the Servicer’s methods directly and pass in a new Context object, which we can later inspect for error codes.

Writing tests is as simple as directly calling the Servicer’s methods.

Once we’re done, we can use our trusty command runner to “just build” the shiv binary, which will give us a fresh server binary that we can use in our plugins.

First, we need to generate the Java service interface code, which will be based on the updated protocol buffer. gRPC maintains a Java library, grpc-java, that will handle this for us.

Then, we have to write the function that actually executes the RPC. In our plugins, we encapsulate all such “connector” functions in a single file. In the Ghidra plugin, writing a connector function involves only three steps. First, we create a StreamObserver object to asynchronously handle RPC responses. In this case, we need only implement behavior for handling error cases, since the “results” of successful ControlState RPCs will be available to the user via the GetStateList RPC. Then we build the ControlStateRequest object, populating the fields as required. Lastly, we actually execute the RPC through a method exposed by the gRPC-generated ManticoreServerStub, which conveniently handles all communication with the server for us.

The new “connector” method, written in Java, handles communication with the server.

The only thing left to do is to make the appropriate UI changes! For this feature, we can simply open a context menu by right-clicking on a state in the state list and then populate the state with the actions that call the controlState method!

The added code creates a new option in the context menu and calls the “connector” method when that new option is selected.

With the Manticore functionality handled by the gRPC server, the UI changes that we must implement on the “front ends” of the Ghidra and Binary Ninja plugins are fairly straightforward.

The “hypothetical” state management feature discussed in the demo has actually been implemented and is now part of MUI! If you’re interested in seeing all of the changes and commits, check out the pull requests for the server and the Ghidra plugin. ( These pull requests were made before a refactor of where this code lives. The code now lives in the main Manticore repository under the server directory.)

A Quick Step-by-Step Guide

To summarize, the steps for adding a new feature to MUI are as follows:

Start by modifying the MUI Server binary.

  1. Edit the protocol buffer (the ManticoreServer.proto file) if new RPCs or modifications to existing messages are required.
  2. Generate service interface code for the server script by running the just generate command.
  3. Add the functionality and the request validation code to the server script (manticore_server.py), through which one can interact with the Manticore API directly.
  4. Where applicable, write tests for the new functionality.
  5. Build the shiv server executable by running the just build command.

Then, for each front-end plugin:

  1. Use / copy over the new server binary;
  2. Generate service interface code in the programming language used by the plugin; and
  3. Make any relevant front-end changes (and, where applicable, share standardized data as part of MUI’s common resources).

Conclusion

My internships at Trail of Bits have been really fulfilling, and I’m proud of how the MUI project has progressed. Having struggled with the seemingly black-box nature of symbolic execution the first few times I tried to apply it in capture-the-flag challenges, I’m confident that the MUI project will make symbolic execution more accessible to beginners. Additionally, integration with well-established interactive disassemblers will make symbolic execution a more natural part of the vulnerability discovery process.

I’m also pleased with how the MUI development experience has progressed. Plugin development isn’t the smoothest experience, in part because the user-facing plugin installation processes aren’t designed for rapid prototyping or incremental changes. Thus, spending my winternship building the Ghidra plugin from scratch was a harrowing experience. On top of familiarizing myself with Java and picking up a new plugin development framework (both of which have their own learning curves), I spent a lot of time thinking about whether adding a certain feature would even be possible! With the new MUI server architecture, I’m now able to spend that time more productively, thinking only about how a new feature could aid in the vulnerability discovery process.

In addition to making development a less time-consuming process, the new MUI server architecture provides dev-centric features that make it a far smoother one too. These include just scripts, Gradle methods, and unit tests. In previous projects, I treated the implementation of such features as a chore; even during my internships, I began adding them to the new server only after some prodding and guidance from my mentor Eric Kilmer. That work, though, made a world of difference in my development speed, the quality of my code, and my level of frustration while debugging!