Let’s build a high-performance fuzzer with GPUs!

by Ryan Eberhardt, Stanford University

TL;DR: Can we use GPUs to get 10x performance/dollar when fuzzing embedded software in the cloud? Based on our preliminary work, we think the answer is yes!

Fuzzing is a software testing technique that supplies programs with many randomized inputs in an attempt to cause unexpected behavior. It’s an important, industry-standard technique responsible for the discovery of many security vulnerabilities and the prevention of many more. However, fuzzing well takes time, and fuzzing embedded software presents additional challenges.

Embedded platforms are not designed for the instrumentation and high-throughput computation required to find bugs via fuzzing. Without access to source code, practical fuzzing of such platforms requires an emulator, which is slow, or many physical devices, which are often impractical.

Most fuzzing approaches have used conventional CPU architectures or emulators, but we decided to use other commodity hardware to tackle this problem—in particular, GPUs. The recent boom in machine learning has driven down the off-peak price of GPUs and made massive GPU computing capacity readily available from all major cloud providers. GPUs are very good at executing tasks in parallel, and fuzzing is an easily parallelizable problem.

In this blog post, I’ll walk you through the design and implementation of this massively parallel GPU-based fuzzer. So far, we’ve implemented an execution engine that can achieve 5x more executions/second/dollar than libFuzzer—and there’s much more room for optimization.

The Pitch: Fuzzing with GPUs

Fuzzing aims to generate unexpected inputs to a program and cause undesired behaviors (e.g., crashes or memory errors). The most commonly used fuzzers are coverage-guided fuzzers, which focus on finding inputs that cause new code coverage (such as executing a function that wasn’t executed before) to explore edge cases that may crash the program.


To do this, fuzzers run many different randomized inputs through the target program. This task is easily parallelizable, as each input can be executed independently of the others.

GPUs are fairly cheap; it costs $0.11/hour for a pre-emptible Tesla T4 on Google Cloud. Also, GPUs are really good at executing many things in parallel—a Tesla T4 can context-switch between over 40,000 threads and can simultaneously execute 2,560 of them in parallel—and, as mentioned, fuzzing is a naturally parallelizable problem. Using thousands of threads, we should theoretically be able to test thousands of different inputs at the same time.

Why Hasn’t Anyone Done This Before?

In short, running code on GPUs is very different from running code on CPUs in a few critical ways.

First, a GPU cannot directly execute x86/aarch64/etc. instructions, as GPUs have their own instruction set. Our goal is to fuzz embedded software for which no source code is available. With only a binary in hand, we have no easy way of generating GPU assembly to run.

Second, a GPU has no operating system. Traditional parallel fuzzers launch multiple processes that can execute inputs separately without interfering with other processes. If an input causes one process to crash, other processes will be unaffected. GPUs have no notion of processes or address space isolation, and any memory violation will cause the entire fuzzer to crash, so we need to find some way to isolate the parallel instances of the program being fuzzed.

Additionally, without an operating system, there’s no one home to answer system calls, which enable a program to open files, use the network, and so on. System calls must be emulated or relayed back to the CPU to be executed by the host operating system.

Finally, GPU memory is hard to manage well. GPUs have complicated memory hierarchies with several different types of memory, and each has different ease-of-use and performance characteristics. Performance is highly dependent on memory access patterns, and controlling when and how threads access memory can make or break an application. Additionally, there isn’t much memory to go around, making it even more difficult to properly manage memory layout and access patterns. Having 16GB of device memory might sound impressive, but splitting it between 40,000 threads of execution leaves each thread with a paltry 419 KiB.

Can We Build It? Yes We Can!

There are many obstacles to building a working GPU fuzzer, but none of them are insurmountable.

Executing code with binary translation

First, let’s see if we can get aarch64 binaries running on the GPU.

As mentioned, we want to fuzz embedded binaries (e.g., ARMv7, aarch64, etc.) on the GPU. NVIDIA GPUs use a different instruction set architecture called PTX (“Parallel Thread eXecution”), so we cannot directly execute the binaries we want to fuzz. A common solution to this problem is to emulate an embedded CPU, but developing a CPU emulator for GPUs would likely be an expensive investment that would perform poorly. Another alternative is to translate binaries to PTX so they can execute directly on the GPU without emulation.

Trail of Bits has developed a binary translation tool called Remill that we can use to do this. Remill “lifts” binaries to LLVM IR (intermediate representation), which can then be retargeted and compiled to any architecture supported by the LLVM project. It just so happens that LLVM supports emitting LLVM IR as PTX code, which is perfect for our purposes.

Say we have this simple example function, which sets w19 to 0, adds 5, and returns the result:

    mov w19, #0       // Store the number 0 in register w19
    add w19, w19, #5  // Add 5
    mov w0, w19       // Return the result

We can pass the bytes for these instructions to Remill, which produces LLVM IR that models the original program executing on an ARM processor:

// Simplified for brevity :)
define dso_local %struct.Memory* @_Z5sliceP6MemorymPm(%struct.Memory* readnone returned %0, i64 %1, i64* nocapture %2) local_unnamed_addr #0 {
  %4 = add i64 %1, 1
  store i64 %4, i64* %2, align 8, !tbaa !2
  ret %struct.Memory* %0
}

Then, with some optimizations, we can have LLVM compile the above LLVM IR to PTX assembly:

ld.param.u64  %rd1, [sub_0_param_0];
ld.param.u64  %rd2, [sub_0_param_1];
mov.u64       %rd4, 5;
st.u64        [%rd1+848], %rd4;
add.s64       %rd5, %rd2, 12;
st.u64        [%rd1+1056], %rd5;

Finally, we can load this PTX into a GPU and execute it as if we’d had access to the source code in the first place.

Managing Memory

As mentioned earlier, GPUs have no operating system to provide isolation between processes. We need to implement address space isolation so multiple instances of the fuzzed program can access the same set of memory addresses without interfering with each other, and we need to detect memory safety errors in the target program.

Remill replaces all memory access in the original program with calls to the special functions read_memory and write_memory. By providing these functions, we can implement a software memory management unit that fills in for the missing OS functionality and mediates memory accesses.

For example, consider this function that takes a pointer and increments the integer it points to:

ldr w8, [x0]   // Load the value pointed to by the pointer
add w8, w8, #1 // Increment the value
str w8, [x0]   // Write the new value back

Remill translates this assembly into the following IR containing a read_memory call, an add instruction, and a write_memory call:

define %struct.Memory* @slice(%struct.Memory*, i64 %X8, i64* nocapture %X0_output) local_unnamed_addr #2 {
%2 = tail call i32 @__remill_read_memory_32(%struct.Memory* %0, i64 undef) #3
%3 = add i32 %2, 1
%4 = tail call %struct.Memory* @__remill_write_memory_32(%struct.Memory* %0, i64 undef, i32 %3) #3
%5 = tail call %struct.Memory* @__remill_function_return(%struct.State* nonnull undef, i64 16, %struct.Memory* %4) #2, !noalias !0
ret %struct.Memory* %5
}

By providing __remill_read_memory_32 and __remill_write_memory_32 functions, we can give each thread its own virtual address space. In addition, we can validate memory access and intercept invalid access before it crashes the entire fuzzer.

Remember, though, that 16GB of device memory is actually not much when shared across 40,000 threads. To conserve memory, we can use copy-on-write strategies in our MMU; threads share the same memory until one of the threads writes to memory, at which point that memory is copied. Conserving memory this way has been a surprisingly effective strategy.

Initial performance

Wonderful—we have something that works! We can take a binary program, translate it to LLVM, convert it to PTX, mix in an MMU, and run the result on thousands of GPU threads in parallel.

But how well are we meeting our goal of building a fuzzer that achieves 10x performance per dollar when compared to other fuzzers?

Evaluating fuzzers is a very tricky business, and there have been many papers published about how to effectively compare them. Our fuzzer is still too young to properly evaluate, since we are still missing critical fuzzer components such as a mutator to generate new inputs to the program. To measure executor performance only, we can look at how quickly our fuzzer runs inputs through the target program (in executions/second). By normalizing the cost of the compute hardware (GPUs are more expensive than the CPUs on which other fuzzers run), we can compare executions/second/$.

What should we fuzz in our benchmarking tests? The BPF packet filtering code from libpcap seems like a good candidate for a few reasons:

  • It implements a complicated state machine that is too difficult for humans to reason about, making it a good candidate for fuzzing.
  • BPF components have had bugs in the past, so this is a realistic target that we might want to fuzz.
  • It has no system calls (our minimal fuzzer does not yet support system calls).

Let’s write a test application that takes a packet from the fuzzer and runs a complicated BPF filter program on it:

dst host or tcp or udp or ip or ip6 or arp or rarp
or atalk or aarp or decnet or iso or stp or ipx

This test program doesn’t do a whole lot, but it does exercise complicated logic and requires a good amount of memory access.

To evaluate our fuzzer, we can compare it to libFuzzer, a fast and widely used fuzzer. This isn’t an entirely fair comparison. On one hand, libFuzzer solves an easier problem: It fuzzes using the test program’s source code, whereas our fuzzer translates and instruments a binary compiled for a different architecture. Source code is often unavailable in security research. On the other hand, libFuzzer is performing mutation to generate new inputs, which we are not doing. While obviously imperfect, this comparison will work well enough to provide order-of-magnitude estimates.

I ran this comparison using Google Compute Engine 8-core N1 instances ($0.379998/hour for non-preemptible instances at time of writing) and a Tesla T4 GPU ($0.35/hour at time of writing).

Unfortunately, our fuzzer doesn’t compare so well against libFuzzer. libFuzzer achieves 5.2M execs/s/$, and our fuzzer only achieves 361K execs/s/$.

Let’s see if we can do better…

Interleaving Memory

Before we start optimizing performance, we should profile the fuzzer to get a better sense of how it’s performing. Nvidia’s Nsight Compute profiler helps explain hardware utilization and performance bottlenecks.

From the profile, we can see that the GPU is only using 3% of its compute capacity. Most of the time, the GPU compute hardware is sitting idle, not doing anything.

This generally happens because of high memory latency: The GPU is waiting for memory reads/writes to complete. However, this isn’t happening because our fuzzer needs to access too much memory; the profile shows that the GPU is only using 45% of its available memory bandwidth. Rather, we must be accessing memory very inefficiently. Each memory access is taking a long time and not providing enough data for computation.

To fix this problem, we need a better understanding of a GPU’s execution model.

GPU threads execute in groups of 32 called warps. All threads in a warp execute together on the same streaming multiprocessor, and they run in lockstep, i.e., they run the same instruction at the same time.

When threads read or write memory, the memory accesses happen in 128-byte blocks. If the 32 threads in the warp try to read memory that lies in the same 128-byte block, then the hardware will only need to request one block (one “transaction”) from the memory bus.

However, if the threads each read memory addresses from different blocks, the hardware may need to make 32 separate memory transactions, which are often serialized. This leads to the behavior we found in the profile: The compute hardware is almost always idle because it has to wait for so many memory transactions to complete. The memory bandwidth utilization doesn’t appear quite as bad because many 128-byte chunks are being read, but only four or eight bytes out of each chunk are actually used, so much of the used bandwidth is wasted.

Currently, we allocate separate memory for each thread, so when a thread accesses memory, it very rarely falls into the same 128-byte chunk as a different thread. We can change that by allocating a slab of memory for a warp (32 threads), and interleaving the threads’ memory within that warp. This way, when threads need to access a value from memory, their values are all next to each other, and the GPU can fulfill these memory reads with a single memory transaction.

Trying this out, we find that performance improves by an order of magnitude! Clearly, it’s extremely important to be aware of memory access patterns when programming for GPUs.

Reducing Data Transfers and Kernel Launches

Re-running the profiler, we can see that we are getting much better compute utilization (33%, up from 3%), but we are still nowhere near full utilization. Can we do better?

Continuing our examination of memory usage patterns, let’s look at the type of memory used. Nvidia GPUs have several kinds of memory located in different physical places, but the easiest type to use is called “unified memory,” which automatically transfers data between different physical locations on our behalf. We have been using this because it doesn’t require us to think much about where bytes are being physically stored, but it can lead to performance bottlenecks if mismanaged, since data will be transferred between physical memory locations inefficiently.

Since we are still seeing very high memory latency, let’s take a closer look at these transfers.

Our simple fuzzer is working in “rounds”: if the GPU can run 40,000 threads, we pass 40,000 inputs to the GPU, and each thread fuzzes an input before we launch the next round. In between rounds, we reset the memory used (e.g., coverage tracking data structures and memory used by the program being fuzzed). However, this results in significant data transfers between the GPU and the CPU in between each round, as memory is paged back to the CPU, reset, and then paged back to the GPU. While these transfers are happening, the GPU is doing nothing. Additional latency is incurred as the GPU waits for the CPU to launch the next round.

We can improve this setup by doing a single launch of the GPU code and avoiding synchronicity between the CPU and GPU. Much of the data doesn’t need to be in unified memory; we can allocate global memory on the GPU instead, then asynchronously transfer data to the CPU when we need to send information about fuzzing progress (e.g., which inputs are causing crashes). In this way, when a thread finishes fuzzing an input, it can reset the memory and proceed to the next input without data transfer overhead and without waiting on the CPU.

This achieves a speedup of almost another order of magnitude! Now, the fuzzer is about five times faster per dollar than libFuzzer.

It’s extremely promising—although our fuzzer lacks a mutation engine and can’t handle system calls, exceeding libFuzzer’s performance to this degree suggests that fuzzing using GPUs may be extremely useful for certain applications.

What’s Next for GPU-Based Fuzzing?

Although we are close to our performance goal for this test program, the project still has a long way to go. Hardware utilization remains low, so there’s room for more optimization.

In addition, we need to build support for handling system calls, which may have a significant performance impact when fuzzing I/O-heavy applications. We also need to build the mutation engine before this fuzzer can be useful, although this problem is much better understood than building the execution engine.

Still, we’re very excited to be getting such promising results in such early stages of development. We look forward to an order of magnitude improvement in fuzzing embedded binaries.

We would love to hear your thoughts on this work! Contact us at ryan@reberhardt.com or artem@trailofbits.com.

Finally, a big “thank you” goes to Artem Dinaburg for the initial design of this system and for mentoring me throughout this project. Also, thank you to Peter Goodman for giving design feedback and debugging suggestions.

Osquery: Using D-Bus to query systemd data

by Rachel Cipkins, Stevens Institute of Technology

During my summer internship at Trail of Bits I worked on osquery, the massively popular open-source endpoint monitoring agent used for intrusion detection, threat hunting, operational monitoring, and many other functions. Available for Windows, macOS, Linux, and FreeBSD, osquery exposes an operating system as a high-performance relational database, which allows you to write SQL-based queries to explore operating system data.

My initial task was to port osquery’s startup_items table to Linux. Since the startup_items table is only available on macOS and Windows, we wanted to port it to Linux while keeping the current schema. Porting to Linux is complicated, though; like macOS and Windows, Linux has an indefinite number of locations for startup items, so I needed to parse the data in each location and insert it into the table. This would have been fairly simple, but we couldn’t directly parse the data for the systemd location. Ultimately, we added systemd support to the table through the D-Bus API and created a brand-new table for systemd units.

A note on the startup_items table…

Startup items are applications and binaries that run when your system is booted up, but “startup items” is also an abstract concept indicating some set of locations and subsystems that you want to enumerate. There are two primary types of startup item locations on Linux: user-specific locations and system-specific locations. The user-specific locations, ~/.config/autostart and ~/.config/autostart-scripts, contain desktop entries and scripts, respectively, while the system-specific locations include XDG and SysV.

You can see an example of the startup_items table for Linux at https://asciinema.org/a/354125.

Systemd and D-Bus

Systemd is not as simple to support as the other locations; it’s an init system and service manager for Linux that uses units to represent resources. Units are resources that the system knows how to manage and operate, and they are defined in unit files. We were not able to parse these unit files directly, as we did for the files in the other locations, and could only get the information we needed by using an API.

So we turned to D-Bus, an interprocess communication system that allows us to interact directly with systemd to extract information about startup items. Now, systemd does have its own bus library, sd-bus, but we still prefer D-Bus because:

  • osquery uses CMake as its build system, and systemd does not; D-Bus does use CMake, so it was simpler to integrate with osquery.
  • D-Bus can be used to query things other than systemd.

Here are some of the API calls we used from D-Bus to extract the information we needed:

// Connection
conn = dbus_bus_get(DBUS_BUS_SYSTEM, &error);


// Message (illustrative target: ask systemd’s manager object to list its units)
message = dbus_message_new_method_call(
      "org.freedesktop.systemd1", "/org/freedesktop/systemd1",
      "org.freedesktop.systemd1.Manager", "ListUnits");

// Reply
reply =
      dbus_connection_send_with_reply_and_block(conn, message, -1, &error);

And here’s an example of the startup_items table with systemd:


The systemd units table rises again

A few years ago the osquery community determined there was a need for some sort of table related to systemd. We restarted that conversation after successfully implementing D-Bus, and it was agreed that our unit-based table was the right direction. There are many different types of systemd units—some of the most common ones are service, socket, and device—and the systemd unit table has a column that differentiates them.

This example shows the many distinct types of units currently on my computer and narrows the results by type. Here we have executed a query for all of the services: https://asciinema.org/a/354127.

There are three states associated with each unit:

  • load_state indicates whether or not the unit has been properly loaded onto the system.
  • active_state indicates whether or not the unit is activated.
  • sub_state is an additional state specific to each type of unit.

Here you can see all the active units on the system: https://asciinema.org/a/354130.

What’s next for D-Bus and osquery?

D-Bus will allow us to query a lot of other things on osquery, including desktop environment configurations for GNOME and KDE, and network devices. Be sure to check out the new startup_items and systemd_units tables once they are merged and keep an eye out for new D-Bus features on osquery.

Want to learn more about osquery or contribute to the project? Check it out here!

Detecting Iterator Invalidation with CodeQL

by Kevin Higgs, Montgomery Blair High School

Iterator invalidation is a common and subtle class of C++ bugs that often leads to exploitable vulnerabilities. During my Trail of Bits internship this summer, I developed Itergator, a set of CodeQL classes and queries for analyzing and discovering iterator invalidation.

Results are easily interpretable by an auditor, providing information such as where an iterator is acquired, where it is invalidated, and a significance level that indicates the likelihood of a false positive. Itergator has been used to find bugs in real-world code, and the queries are easily extensible for further analysis.

Iterators Defined

Iterators are the standard way to traverse the contents of a container in C++. An iterator object supports at least two operations: dereferencing, to get the underlying object in the container; and incrementation, to get an iterator for the next element.

For example, the following code will output 1 2 3 4 5:

std::vector<int> vec{1, 2, 3, 4, 5};

for (std::vector<int>::iterator it = vec.begin(), end = vec.end(); it != end; ++it) {
    std::cout << *it << " ";
}

This is such a common code pattern that C++11 introduced a simplified syntax:

for (auto i : vec) {
    std::cout << i << " ";
}

While equivalent to the previous code, all iterator operations are now generated by the compiler. The details about the iteration are hidden from the developer.

Iterator Invalidation

Iterators are invalidated after certain modifications to their container, such as adding or erasing an element. Use of invalidated iterators is, per the standard, undefined behavior. In other words, what happens is implementation-specific and probably not good. For example, in the following code (discovered by Itergator in Cataclysm: Dark Days Ahead) the call to zones.erase invalidates the iterator it:

void zone_manager::deserialize( JsonIn &jsin )
{
    jsin.read( zones );
    for( auto it = zones.begin(); it != zones.end(); ++it ) {
        const zone_type_id zone_type = it->get_type();
        if( !has_type( zone_type ) ) {
            zones.erase( it );
            debugmsg( "Invalid zone type: %s", zone_type.c_str() );
        }
    }
}
The iterators of a vector in libstdc++ are pointers into the vector’s backing buffer. The erase method shifts all elements past the erased position to the left by one, overwriting the erased object, and decrements the end of the vector.

If the vector contains only one element, zones.end() becomes the same as zones.begin() after the erase. In the example invalidation, at the end of the first loop iteration the iterator is incremented to the address after zones.begin(). This means the continuation condition it != zones.end() holds, so we enter the loop with the iterator referencing whatever memory exists after the backing buffer on the heap! Because of the complexity of Cataclysm, the heap layout and the crash are not deterministic, but a properly modified game save frequently results in a segmentation fault from dereferencing an invalid address.

While this is a relatively benign example, the threat presented by this class of issues is not theoretical; iterator invalidation bugs in high-value targets have been weaponized before.


CodeQL

CodeQL is a static analysis framework developed by GitHub that allows you to query codebases with an SQL-like syntax. It has an object-oriented class system with predicates that define logical properties and relationships. The standard library provides a comprehensive set of classes which allow querying for a wide array of code properties and patterns.

A CodeQL database can be built for almost any code that compiles. GitHub maintains an index of databases they’ve built from public repositories at lgtm.com, which can be queried on their site or locally with the CodeQL CLI. There is also a Visual Studio Code extension for inspection of query results.

Itergator consists of both queries and libraries, allowing auditors to use Itergator’s classes in their own queries.

Detecting Iterator Invalidation

Using static analysis to detect iterator invalidation presents several challenges. The example above is simple, but invalidations can be nested many function calls deep, with complex logic surrounding them. Some iterators are declared and invalidated outside of a loop, resulting in flows that are expensive to detect without an enormous number of false positives. It is also important for the query to be extensible: Codebases often have their own iterable types with invalidation constraints that need to be detected.

CodeQL’s global data flow library and support for recursion make complex control flow analyses easy to write. Itergator is able to construct a graph of all potentially invalidating function calls (those that may result in a call to an invalidating function, like std::vector::push_back) and define classes to be used in queries:

  • Iterator: a variable that stores an iterator
  • Iterated: where a collection is iterated, e.g. vec in vec.begin()
  • Invalidator: a potentially invalidating function call in the scope of an iterator
  • Invalidation: a function call that directly invalidates an iterator

The InvalidationFlows query relates these classes with data flow to locate likely invalidations. To query non-standard iterated types, you simply extend the PotentialInvalidation class which, as an abstract class, is defined as the union of its subclasses. For example, here is an invalidation definition for destructors:

class PotentialInvalidationDestructor extends PotentialInvalidation {
    PotentialInvalidationDestructor() {
        this instanceof MemberFunction
        and this.getName().matches("~%")
    }

    override predicate invalidates(Iterated i) {
        ...
    }
}
These subclasses can be defined anywhere in your query or an imported library; definitions for STL classes are already included in Itergator. A utility query in Itergator, IteratedTypes, identifies what types to specify invalidation constraints for.

A large part of Itergator’s development required finding fixed iterator invalidation bugs on GitHub and attempting to reproduce them. One especially tricky bug in a regular expression library by Google exemplifies the challenges of this project:

struct Frame {
  Frame(Regexp** sub, int nsub)
      : sub(sub),
        nsub(nsub),
        round(0) {}

  Regexp** sub;
  int nsub;
  int round;
  std::vector<Splice> splices;
  std::vector<Splice>::iterator spliceiter;
};
int Regexp::FactorAlternation(Regexp** sub, int nsub, ParseFlags flags) {
  std::vector<Frame> stk;
  stk.emplace_back(sub, nsub);

  for (;;) {
    auto& splices = stk.back().splices;
    auto& spliceiter = stk.back().spliceiter;

    if (splices.empty()) {
      ...
    } else if (spliceiter != splices.end()) {
      stk.emplace_back(spliceiter->sub, spliceiter->nsub);
      continue;
    } else { ... }

    switch (round) { ... }

    if (splices.empty() || round == 3) {
      spliceiter = splices.end();
    } else {
      spliceiter = splices.begin();
    }
  }
}
This function declares stk, a vector of frames, each of which has a splices vector and a spliceiter iterator. The iterator begins uninitialized and is only assigned a value by the final if/else at the end of the first iteration of the loop. It’s not obvious where the invalidation occurs; it’s not an operation on splices directly, but the element added to stk by the emplace_back call inside the loop. If the backing buffer of stk is at capacity, it is reallocated and the Frame objects are copied, resulting in re-allocation of each splices vector. Because of the continue statement, spliceiter is never re-initialized, and an invalidated iterator is used on the next loop iteration.

This invalidation happens over three iterations of the loop: first initialization of the iterator, then invalidation, and finally, usage. The invalidating function call is performed on a member of an object stored inside a vector; confirming that this is the same vector the iterator refers to would be extremely complicated. Tracking control flow across all three executions is possible but expensive, and the query becomes impractical to run on large codebases.

My solution to these problems was to search for conditions necessary, but not sufficient, for invalidation. For example, I verified that the same variable—not value—can flow to both locations of iteration and invalidation. While this introduces a significant number of false positives, automatic filtering based on recurring patterns and the addition of a “significance” value makes searching through the results very manageable, while still identifying complex invalidations like the one above.

CodeQL’s caching and optimization also mean Itergator can query massive codebases, like Apple’s fork of LLVM for Swift, and find deep invalidations. Itergator identified the following bug, which was unintentionally fixed upstream a couple months ago, where the invalidation is 12 function calls deep. InvalidationFlows gives us the iteration and invalidation locations; then, after further investigation, including a customized path query, we can identify the necessary control flow:

And then we can construct a reproduction:

Step 1: Run the LLVM linker with undefined symbols and lazily loaded bitcode that references a linker script.

./ld.lld --no-threads --undefined asdf --undefined fawef --start-lib ~/lib.bc --end-lib ~/a.o

Steps 2 through 13:

lld::elf::Symbol::fetch() const
doParseFile<llvm::object::ELFType<1, true> >
void lld::elf::BitcodeFile::parse<llvm::object::ELFType<1, true> >()
lld::elf::LinkerDriver::addFile(llvm::StringRef, bool)

Step 14: Profit.

==332005==ERROR: AddressSanitizer: heap-use-after-free on address 0x603000004160 at pc 0x556f81e36288 bp 0x7ffd14c663f0 sp 0x7ffd14c663e0
READ of size 16 at 0x603000004160 thread T0
    #0 0x556f81e36287 in void lld::elf::LinkerDriver::link<llvm::object::ELFType<(llvm::support::endianness)1, true> >(llvm::opt::InputArgList&) (/home/kmh/llvm-project/build/bin/lld+0x1d53287)
    #1 0x556f81ddaa10 in lld::elf::LinkerDriver::main(llvm::ArrayRef<char const*>) /home/kmh/llvm-project/lld/ELF/Driver.cpp:514
    #2 0x556f81ddc3b6 in lld::elf::link(llvm::ArrayRef<char const*>, bool, llvm::raw_ostream&, llvm::raw_ostream&) /home/kmh/llvm-project/lld/ELF/Driver.cpp:111
    #3 0x556f8186cda8 in main /home/kmh/llvm-project/lld/tools/lld/lld.cpp:154
    #4 0x7f1f70d1b151 in __libc_start_main (/usr/lib/libc.so.6+0x28151)
    #5 0x556f8186861d in _start (/home/kmh/llvm-project/build/bin/lld+0x178561d)


Itergator is a powerful tool for detecting complex iterator invalidations in codebases of any size. Working with CodeQL’s declarative query language was awesome, despite the occasional engine bug, as it incorporated concepts I was already familiar with to make static analysis easy to pick up. There will always be improvements to make and more bugs to hunt, but I’m very happy with my results.

Finally, I’d like to thank my mentor Josh and everyone else at Trail of Bits who made this summer great. I can definitively say that Trail of Bits is the best place I’ve ever worked! If you have any questions, or just want to talk, shoot me a message on Twitter @themalwareman.

PrivacyRaven Has Left the Nest

By Suha S. Hussain, Georgia Tech

If you work on deep learning systems, check out our new tool, PrivacyRaven—it’s a Python library that equips engineers and researchers with a comprehensive testing suite for simulating privacy attacks on deep learning systems.

PrivacyRaven is a comprehensive testing suite for simulating privacy attacks on deep learning systems

Because deep learning enables software to perform tasks without explicit programming, it’s become ubiquitous in sensitive use cases such as:

  • Fraud detection,
  • Medical diagnosis,
  • Autonomous vehicles,
  • Facial recognition,
  • … and more.

Unfortunately, deep learning systems are also vulnerable to privacy attacks that compromise the confidentiality of the training data set and the intellectual property of the model. And unlike other forms of software, deep learning systems lack extensive assurance testing and analysis tools such as fuzzing and static analysis.

The CATastrophic Consequences of Privacy Attacks

But wait—are such privacy attacks likely? After all, medical applications using deep learning are subject to strict patient privacy regulations.

Unfortunately, yes. Imagine you’re securing a medical diagnosis system for detecting brain bleeds using CAT scan images:

Now, suppose the deep learning model in this image predicts whether or not a patient has a brain bleed and responds with a terse “Yes” or “No” answer. This setting provides users with as little access to the model as possible, so you might think there’s not much an adversary could learn. However, even when strongly restricted, an adversary modeled by PrivacyRaven can:

Evidently, an adversary can critically compromise the confidentiality of this system, so it has to be defended against privacy attacks to be considered secure. Otherwise, any major vulnerability has the potential to undermine trust and participation in all such systems.

PrivacyRaven Design Goals

Many other deep learning security techniques are onerous to use, which discourages their adoption. PrivacyRaven is meant for a broad audience, so we designed it to be:

  • Usable: Multiple levels of abstraction allow users to either automate much of the internal mechanics or directly control them, depending on their use case and familiarity with the domain.
  • Flexible: A modular design makes the attack configurations customizable and interoperable. It also allows new privacy metrics and attacks to be incorporated straightforwardly.
  • Efficient: PrivacyRaven reduces the boilerplate, affording quick prototyping and fast experimentation. Each attack can be launched in fewer than 15 lines of code.

As a result, PrivacyRaven is appropriate for a range of users, e.g., a security engineer analyzing bot detection software, an ML researcher pioneering a novel privacy attack, an ML engineer choosing between differential privacy techniques, and a privacy researcher auditing data provenance in text-generation models.

Threat Model

Optimized for usability, efficiency, and flexibility, PrivacyRaven allows users to simulate privacy attacks. Presently, the attacks provided by PrivacyRaven operate under the most restrictive threat model, i.e., they produce worst-case scenario analyses. (This may change as PrivacyRaven develops.) The modeled adversary only receives labels from an API that queries the deep learning model, so the adversary directly interacts only with the API, not the model:

Many other known machine learning attacks exploit the auxiliary information released under weaker threat models. For instance, under the white-box threat model, many systems allow users to access model parameters or loss gradients. Some black-box attacks even assume that the adversary receives full confidence predictions or model explanations.
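The label-only setting is easy to picture in code. Here is a toy sketch (Python, not PrivacyRaven's actual API; the model and function names are hypothetical) of the restrictive interface the modeled adversary gets:

```python
# Toy sketch (not PrivacyRaven code): a label-only prediction API, the most
# restrictive setting described above. The "model" here is a stand-in.

def toy_model(x):
    """Stand-in classifier: returns a confidence vector over two classes."""
    score = min(max(sum(x) / len(x), 0.0), 1.0)
    return [1.0 - score, score]  # [P(class 0), P(class 1)]

def label_only_api(x):
    """What the modeled adversary sees: just the argmax label, no confidences."""
    confidences = toy_model(x)
    return confidences.index(max(confidences))

# A black-box adversary may call label_only_api, but never toy_model directly,
# so confidences, gradients, and parameters all stay hidden.
```

Under the weaker threat models described below, the adversary would instead receive the full confidence vector, which leaks considerably more information per query.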

Despite the possible benefits of these features, if you are deploying a deep learning system, we recommend reducing user access and adhering to PrivacyRaven’s threat model. The extra information provided under the aforementioned weaker threat models substantially increases the effectiveness and accessibility of attacks.

PrivacyRaven Features

PrivacyRaven provides three types of attacks: model extraction, membership inference, and model inversion. Most of the library is dedicated to wrappers and interfaces for launching these attacks, so users don’t need an extensive background in machine learning or security.

1. Model Extraction

Model extraction attacks directly violate the intellectual property of a system. The primary objective is to extract a substitute model, i.e., to grant the adversary a copycat version of the target.

These attacks fall into two categories: optimized for high accuracy or optimized for high fidelity. A high-accuracy substitute model attempts to perform the task to the best of its ability. If the target model incorrectly classifies a data point, the substitute model will prioritize the correct classification. In contrast, a high-fidelity substitute model will duplicate the errors of the target model.
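A tiny numeric illustration of the accuracy/fidelity distinction (all labels below are made up for illustration; this is not PrivacyRaven output):

```python
# Accuracy is agreement with the ground truth; fidelity is agreement with the
# target model, errors included.

def accuracy(predictions, truth):
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

def fidelity(substitute_preds, target_preds):
    return sum(s == t for s, t in zip(substitute_preds, target_preds)) / len(target_preds)

truth    = [0, 1, 1, 0, 1]
target   = [0, 1, 0, 0, 1]   # the target misclassifies the third point
high_acc = [0, 1, 1, 0, 1]   # a high-accuracy substitute corrects the mistake
high_fid = [0, 1, 0, 0, 1]   # a high-fidelity substitute duplicates the mistake
```

Note that the two goals conflict on exactly the points the target gets wrong: the high-accuracy substitute scores 1.0 on accuracy but only 0.8 on fidelity, and vice versa for the high-fidelity substitute.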

High-accuracy attacks are typically financially motivated. Models are often embedded in a Machine-Learning-as-a-Service distribution scheme, where users are billed according to the number of queries they send. With a substitute model, an adversary can avoid paying for the target and profit from their own version.

High-fidelity attacks are used for reconnaissance to learn more about the target. The substitute model extracted using this attack allows the adversary to launch other classes of attacks, including membership inference and model inversion.

Because the existing methods of model extraction often adopt disparate approaches, most security tools and implementations treat each extraction attack distinctly. PrivacyRaven instead partitions model extraction into multiple phases that encompass most attacks found in the literature (notably excluding cryptanalytic extraction):

  1. Synthesis: First, synthetic data is generated with techniques such as leveraging public data, exploiting population statistics, and collecting adversarial examples.
  2. Training: A preliminary substitute model is trained on the synthetic dataset. Depending on the attack objectives and configuration, this model doesn’t need to have the same architecture as the target model.
  3. Retraining: The substitute model is retrained using a subset sampling strategy to optimize the synthetic data quality and the overall attack performance. This phase is optional.

With this modular approach, users can quickly switch between different synthesizers, sampling strategies, and other features without being limited to configurations that have already been tested and presented. For example, a user may combine a synthesizer found in one paper on extraction attacks with a subset sampling strategy found in another one.
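As a sketch of that modular flow (toy Python, not PrivacyRaven's API), each phase can be a swappable function; here the substitute is a trivial nearest-neighbor model trained on label-only queries:

```python
# Toy sketch of the three-phase extraction flow described above.
# Every name here is illustrative, not part of PrivacyRaven.
import random

def synthesize(n, dim, seed=0):
    """Phase 1: generate synthetic query inputs (here, random vectors)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(n)]

def train_substitute(data, labels):
    """Phase 2: train a substitute; here, a trivial 1-nearest-neighbor model."""
    def substitute(x):
        dist = lambda a, b: sum((ai - bi) ** 2 for ai, bi in zip(a, b))
        return min(zip(data, labels), key=lambda dl: dist(dl[0], x))[1]
    return substitute

def extract(target_api, n=50, dim=2):
    data = synthesize(n, dim)               # phase 1: synthesis
    labels = [target_api(x) for x in data]  # label-only queries to the target
    return train_substitute(data, labels)   # phase 2 (phase 3, retraining, omitted)

# Hypothetical target: classifies by whether the mean of the input exceeds 0.5.
target = lambda x: int(sum(x) / len(x) > 0.5)
substitute = extract(target)
```

Swapping in a different `synthesize` or `train_substitute` changes the attack without touching the rest of the pipeline, which is the point of the phase-based design.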

2. Membership Inference

Membership inference attacks are, at their core, re-identification attacks that undermine trust in the systems they target. For example, patients have to trust medical diagnosis system developers with their private medical data. But if a patient’s participation, images, and diagnosis are recovered by an adversary, it will diminish the trustworthiness of the whole system.

PrivacyRaven separates membership inference into different phases:

During a membership inference attack, an attack network is trained to detect whether a data point is included in the training dataset. To train the attack network, a model extraction attack is launched. The outputs are combined with adversarial robustness calculations to generate the dataset.

Unlike similar tools, PrivacyRaven integrates the model extraction API, which makes it easier to optimize the first phase, improve attack performance, and achieve stronger privacy guarantees. Additionally, PrivacyRaven is one of the first implementations of label-only membership inference attacks.
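To make the intuition concrete, here is a toy, entirely hypothetical sketch of the idea behind such attacks: models tend to be more robust on points they were trained on, so a simple attack thresholds a per-point robustness score. The scores and threshold below are illustrative assumptions, not PrivacyRaven's implementation:

```python
# Toy membership-inference sketch (not PrivacyRaven's implementation).
# All robustness scores below are hypothetical.

def infer_membership(robustness_score, threshold=0.5):
    """Guess 'member' when the model is unusually robust on the point."""
    return robustness_score >= threshold

members     = [0.9, 0.8, 0.7]   # hypothetical robustness on training points
non_members = [0.2, 0.4, 0.3]   # hypothetical robustness on unseen points
```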

3. Model Inversion

Model inversion attacks look for data that the model has already memorized. Launching an inversion attack on the medical diagnosis system, for instance, would yield the CAT scan images in its training dataset. In PrivacyRaven, this attack will be implemented by training a neural network to act as the inverse of the target model. Currently, this feature is in incubation and will be integrated into future PrivacyRaven releases.

Upcoming Flight Plans

We are rapidly adding more methods for model extraction, membership inference, and model inversion. Likewise, we’ll improve and extend the capabilities of PrivacyRaven to address the priorities of the larger deep learning and security communities. Right now, we’re considering:

  1. An enhanced interface for metrics visualizations: We intend PrivacyRaven to generate a high-quality output that balances comprehensiveness and clarity, so it lucidly demonstrates the attack’s impact to non-experts while still providing a measure of control for more specialized use cases.
  2. Automated hyperparameter optimization: Hyperparameter choices are both difficult to reason about and critical to the success of privacy attacks. We plan to incorporate hyperparameter optimization libraries like Optuna to help users avoid major pitfalls and reach their objectives faster.
  3. Verification of differential privacy or machine unlearning: Multiple mechanisms for auditing the implementations of differential privacy and machine unlearning exist, including using minimax rates to construct property estimators or manipulating data poisoning attacks. Consolidating these techniques would bolster the evaluation of privacy-preserving machine learning techniques.
  4. Privacy thresholds and metric calculations: Coupling metrics for privacy grounded in information theory and other fields of mathematics with practical privacy attacks is a nascent endeavor that would greatly benefit the field in its current state.
  5. More classes of attacks: We would like to incorporate attacks that specifically target federated learning and generative models as well as side channel and property inference attacks.

PrivacyRaven in Practice

To attack any deep learning model, PrivacyRaven requires only a query function from a classifier, regardless of the original programming framework or current distribution method. Here’s a model extraction attack executed with PrivacyRaven.

Inside the blue box, a query function is created for a PyTorch Lightning model included with the library (executed after the requisite components are imported). To accelerate prototyping, PrivacyRaven includes a number of victim models. The target model in this example is a fully connected neural network trained on the MNIST dataset. The single line inside of the red box downloads the EMNIST dataset to seed the attack. The bulk of the attack is the attack configuration, located in the green box. Here, the copycat synthesizer helps train the ImageNetTransferLearning classifier.

The output of this example is quite detailed, incorporating statistics about the target and substitute models in addition to metrics regarding the synthetic dataset and overall attack performance. For instance, the output may include statements like:

  • The accuracy of the substitute model is 80.00%.
  • Out of 1,000 data points, the target model and substitute model agreed on 900 data points.

This example demonstrates the core attack interface where attack parameters are defined individually. PrivacyRaven alternatively offers a run-all-attacks and a literature-based interface. The former runs a complete test on a single model, and the latter provides specific attack configurations from the literature.

The Future of Defense

Until now, in the arms race between privacy attacks and defense, engineers and researchers have not had the privacy analysis tools they need to protect deep learning systems. Differential privacy and stateful detection have emerged as two potential solutions to explore, among others. We hope PrivacyRaven will lead to the discovery and refinement of more effective defenses or mitigations. Check out this GitHub repository for a curated collection of research on privacy attacks and defenses.

Contribute to PrivacyRaven!

We’re excited to continue developing PrivacyRaven, and eagerly anticipate more applications. Try it out and contribute to PrivacyRaven now on GitHub: Incorporate a new synthesis technique, make an attack function more readable, etc.!

On a personal note, building PrivacyRaven was the primary objective of my internship this summer at Trail of Bits. It was a rewarding experience: I learned more about cutting-edge areas of security, developed my software engineering skills, and presented my PrivacyRaven work at Empire Hacking and the OpenMined Privacy Conference.

I’m continuing my internship through this winter, and look forward to applying what I’ve already learned to new problems. Feel free to contact me about PrivacyRaven or anything related to trustworthy machine learning at suha.hussain@trailofbits.com or @suhackerr.

Graphtage: A New Semantic Diffing Tool

Graphtage is a command line utility and underlying library for semantically comparing and merging tree-like structures such as JSON, JSON5, XML, HTML, YAML, and TOML files. Its name is a portmanteau of “graph” and “graftage” (i.e., the horticultural practice of joining two trees together so they grow as one). Read on for:

  • What Graphtage does differently and better
  • Why we developed it
  • How it works
  • Directions for using it as a library

All sorts of good

Graphtage lets you see what’s different between two files quickly and easily, but it isn’t a standard line-oriented comparison tool like diff. Graphtage is semantically aware, which allows it to map differences across unordered structures like JSON dicts and XML element tags. You can even compare files that are in two different formats! And when paired with our PolyFile tool, you can semantically diff arbitrary file formats.

Tree-like file formats are becoming increasingly common as a means for transmitting and storing data. If you’ve ever wrangled a gnarly REST API, disentangled the output of a template-generated webpage, or confabulated a config file (and subsequently needed to figure out which specific change was the one that made things work), you’ve probably fought with—and been disappointed by—the current state of open-source semantic diffing tools.

Graphtage solves these problems. It’s available today. To install the utility, run:

pip3 install graphtage

Grab the source code here.

How are existing diff tools insufficient?

Ordered nodes in the tree (e.g., JSON lists) and, in particular, mappings (e.g., JSON dicts) are challenging. Most extant diffing algorithms and utilities assume that the structures are ordered. Take this JSON as an example:

# original.json
{
    "foo": [1, 2, 3, 4],
    "bar": "testing"
}
# modified.json
{
    "foo": [2, 3, 4, 5],
    "zab": "testing",
    "woo": ["foobar"]
}

Existing tools effectively canonicalize the JSON (e.g., sort dictionary elements by key and format lists with one item per line), and then perform a traditional diff. We don’t need no fancy tools for that! Here’s effectively what they do:

$ cat original.json | jq -M --sort-keys > original.canonical.json
$ cat modified.json | jq -M --sort-keys > modified.canonical.json
$ diff -u original.canonical.json modified.canonical.json
     {
    -  "bar": "testing",
       "foo": [
    -    1,
         2,
         3,
    -    4
    -  ]
    +    4,
    +    5
    +  ],
    +  "woo": [
    +    "foobar"
    +  ],
    +  "zab": "testing"
     }

That result is not very useful, particularly if the input files are large. The problem is that changing dict keys breaks the diff: Since “bar” was changed to “zab,” the canonical representation changed, and the traditional diff algorithm considered them separate edits (lines 2 and 15 of the diff).
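The failure is easy to reproduce with nothing but the standard library; this sketch mirrors the jq-and-diff pipeline using `json` and `difflib`:

```python
# Reproducing the canonicalize-then-diff approach: sort the keys,
# pretty-print, and run a line-oriented diff.
import difflib
import json

original = {"foo": [1, 2, 3, 4], "bar": "testing"}
modified = {"foo": [2, 3, 4, 5], "zab": "testing", "woo": ["foobar"]}

def canonical(obj):
    """Canonicalize like `jq -M --sort-keys`: sorted keys, indented output."""
    return json.dumps(obj, sort_keys=True, indent=2).splitlines()

diff = list(difflib.unified_diff(canonical(original), canonical(modified),
                                 lineterm=""))
# The rename "bar" -> "zab" shows up as an unrelated deletion plus an
# unrelated addition: the line-oriented diff cannot pair them up.
```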

In contrast, here is Graphtage’s output for the same pair of files:
An example of Graphtage's diff output

Why hasn’t this been done before?

In general, optimally mapping one graph to another cannot be executed in polynomial time, and is therefore not tractable for graphs of any useful size (unless P=NP). This is true even for restricted classes of graphs like DAGs. However, trees and forests are special cases that can be mapped in polynomial time, with reasonable constraints on the types of edits possible. Graphtage exploits this.

How do it know?

Graphtage’s diffing algorithms operate on an intermediate representation rather than on the data structures of the original file format. This allows Graphtage to have generic comparison algorithms that can work on any input file type. Therefore, to add support for a new file type, all one needs to do is “lift” it to the intermediate representation. Likewise, one only needs to implement support for a new type of edit once, and it will immediately be available to apply against all supported filetypes. Using an intermediate representation has the added benefit of allowing cross-format comparisons and formatting translations: Graphtage will happily diff a JSON file against a YAML file, formatting the diff output in TOML syntax.

Graphtage matches ordered sequences like lists using an “online” “constructive” implementation of the Levenshtein distance metric, similar to the Wagner–Fischer algorithm. The algorithm starts with an unbounded mapping and iteratively improves it until the bounds converge, at which point the optimal edit sequence is discovered.
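For illustration, here is a minimal offline Wagner–Fischer dynamic program; this is a simplified sketch (no online bound refinement), but the underlying recurrence is the same:

```python
# Classic Wagner-Fischer dynamic program for Levenshtein distance:
# cost 1 per insert, delete, or substitute.

def edit_distance(a, b):
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i            # delete everything in a[:i]
    for j in range(cols):
        d[0][j] = j            # insert everything in b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete a[i-1]
                          d[i][j - 1] + 1,        # insert b[j-1]
                          d[i - 1][j - 1] + sub)  # match or substitute
    return d[-1][-1]
```

For the lists from the earlier JSON example, `edit_distance([1, 2, 3, 4], [2, 3, 4, 5])` is 2: delete the 1, insert the 5, and keep the 2, 3, 4 matched.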

Dicts are matched by solving the minimum weight matching problem on the complete bipartite graph from key/value pairs in the source dict to key/value pairs in the destination dict.
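A toy version of that matching, using brute force over permutations (Graphtage uses a real minimum-weight matching algorithm; the cost function below is a stand-in):

```python
# Minimum-cost pairing of source keys to destination keys, by brute force.
# Fine for illustration; real matching algorithms run in polynomial time.
from itertools import permutations

def match_keys(src, dst, cost):
    """Return the src->dst key pairing with the lowest total edit cost."""
    src, dst = list(src), list(dst)
    best = min(permutations(dst),
               key=lambda perm: sum(cost(s, d) for s, d in zip(src, perm)))
    return dict(zip(src, best))

# Stand-in cost: 0 for identical keys, 1 for a renamed key.
cost = lambda s, d: 0 if s == d else 1
pairing = match_keys(["foo", "bar"], ["foo", "zab"], cost)
# "foo" pairs with "foo"; "bar" pairs with "zab", recovering the rename that
# defeated the line-oriented diff.
```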

Graphtage is a command line utility, but it can just as easily be used as a library. One can interact with Graphtage directly from Python, and extend it to support new file formats and edit types.

Next up for Graphtage

We think Graphtage is pretty nifty. You can also use Graphtage in conjunction with our PolyFile tool to semantically diff arbitrary file formats, even if they aren’t naturally tree-based. Try it, and let us know how you use it.

We also plan to extend Graphtage to work on abstract syntax trees, which will allow your source code diffs to tell you things like which variables were changed and whether code blocks were reordered. If you have a similarly nifty idea for a new feature, please share it with us!

Note: This tool was partially developed with funding from the Defense Advanced Research Projects Agency (DARPA) on the SafeDocs project. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

Using Echidna to test a smart contract library

In this post, we’ll show you how to test your smart contracts with the Echidna fuzzer. In particular, you’ll see how to:

  • Find a bug we discovered during the Set Protocol audit using a variation of differential fuzzing, and
  • Specify and check useful properties for your own smart contract libraries.

And we’ll demonstrate how to do all of this using crytic.io, which provides a GitHub integration and additional security checks.

Libraries may import risk

Finding bugs in individual smart contracts is critically important: A contract may manage significant economic resources, whether in the form of tokens or Ether, and damages from vulnerabilities may be measured in millions of dollars. Arguably, though, there is code on the Ethereum blockchain that’s even more important than any individual contract: library code.

Libraries are potentially shared by many high-value contracts, so a subtle unknown bug in, say, SafeMath, could allow an attacker to exploit not just one, but many critical contracts. The criticality of such infrastructure code is well understood outside of blockchain contexts—bugs in widely used libraries like TLS or sqlite are contagious, infecting potentially all code that relies on the vulnerable library.

Library testing often focuses on detecting memory safety vulnerabilities. On the blockchain, however, we’re not so worried about avoiding stack smashes or a memcpy from a region containing private keys; we’re worried most about the semantic correctness of the library code. Smart contracts operate in a financial world where “code is law,” and if a library computes incorrect results under some circumstances, that “legal loophole” may propagate to a calling contract, and allow an attacker to make the contract behave badly.

Such loopholes may have other consequences than making a library produce incorrect results; if an attacker can force library code to unexpectedly revert, they then have the key to a potential denial-of-service attack. And if the attacker can make a library function enter a runaway loop, they can combine denial of service with costly gas consumption.

That’s the essence of a bug Trail of Bits discovered in an old version of a library for managing arrays of addresses, as described in this audit of the Set Protocol code.

The faulty code looks like this:

/**
 * Returns whether or not there's a duplicate. Runs in O(n^2).
 * @param A Array to search
 * @return Returns true if duplicate, false otherwise
 */
function hasDuplicate(address[] memory A) returns (bool) {
    for (uint256 i = 0; i < A.length - 1; i++) {
        for (uint256 j = i + 1; j < A.length; j++) {
            if (A[i] == A[j]) {
                return true;
            }
        }
    }
    return false;
}
The problem is that if A.length is 0 (A is empty), then A.length - 1 underflows, and the outer (i) loop iterates over the entire set of uint256 values. The inner (j) loop, in this case, doesn’t execute, so we have a tight loop doing nothing for (basically) forever. Of course this process will always run out of gas, and the transaction that makes the hasDuplicate call will fail. If an attacker can produce an empty array in the right place, then a contract that (for example) enforces some invariant over an address array using hasDuplicate can be disabled—possibly permanently.
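You can see the arithmetic at the heart of the bug by simulating the wrapping uint256 subtraction (the default in Solidity versions before 0.8) in Python:

```python
# Solidity uint256 arithmetic (pre-0.8, or inside unchecked blocks) wraps
# modulo 2**256, so 0 - 1 becomes an enormous loop bound.
UINT256_MAX = 2**256 - 1

def uint256_sub(a, b):
    return (a - b) % 2**256

length = 0                          # empty array: A.length == 0
loop_bound = uint256_sub(length, 1) # what "A.length - 1" evaluates to
# loop_bound == UINT256_MAX, roughly 1.16e77: the outer loop's condition
# "i < A.length - 1" stays true for (practically) forever.
```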

The library

For specifics, see the code for our example, and check out this tutorial on using Echidna.

At a high level, the library provides convenient functions for managing an array of addresses. A typical use case involves access control using a whitelist of addresses. AddressArrayUtils.sol has 19 functions to test:

function indexOf(address[] memory A, address a)
function contains(address[] memory A, address a)
function indexOfFromEnd(address[] memory A, address a)
function extend(address[] memory A, address[] memory B)
function append(address[] memory A, address a)
function sExtend(address[] storage A, address[] storage B)
function intersect(address[] memory A, address[] memory B)
function union(address[] memory A, address[] memory B)
function unionB(address[] memory A, address[] memory B)
function difference(address[] memory A, address[] memory B)
function sReverse(address[] storage A)
function pop(address[] memory A, uint256 index)
function remove(address[] memory A, address a)
function sPop(address[] storage A, uint256 index)
function sPopCheap(address[] storage A, uint256 index)
function sRemoveCheap(address[] storage A, address a)
function hasDuplicate(address[] memory A)
function isEqual(address[] memory A, address[] memory B)
function argGet(address[] memory A, uint256[] memory indexArray)

It seems like a lot, but many of the functions are similar in effect, since AddressArrayUtils provides both functional versions (operating on memory array parameters) and mutating versions (requiring storage arrays) of extend, reverse, pop, and remove. You can see how once we’ve written a test for pop, writing a test for sPop probably won’t be too difficult.

Property-based fuzzing 101

Our job is to take the functions we’re interested in—here, all of them—and:

  • Figure out what each function does, then
  • Write a test that makes sure the function does it!

One way to do this is to write a lot of unit tests, of course, but this is problematic. If we want to thoroughly test the library, it’s going to be a lot of work, and, frankly, we’re probably going to do a bad job. Are we sure we can think of every corner case? Even if we try to cover all the source code, bugs that involve missing source code, like the hasDuplicate bug, can easily be missed.

We want to use property-based testing to specify the general behavior over all possible inputs, and then generate lots of inputs. Writing a general description of behavior is harder than writing any individual concrete “given inputs X, the function should do/return Y” test. But the work to write all the concrete tests needed would be exorbitant. Most importantly, even admirably well-done manual unit tests don’t find the kind of weird edge-case bugs attackers are looking for.

The Echidna test harness: hasDuplicate

The most obvious thing about the code to test the library is that it’s bigger than the library itself! That’s not uncommon in a case like this. Don’t let that daunt you: unlike a library, a test harness can be approached as a work in progress, slowly improved and expanded over time. Test development is inherently incremental, and even small efforts provide considerable benefit if you have a tool like Echidna to amplify your investment.

For a concrete example, let’s look at the hasDuplicate bug. We want to check that:

  • If there is a duplicate, hasDuplicate reports it, and
  • If there isn’t a duplicate, hasDuplicate reports that there isn’t one.

We could just re-implement hasDuplicate itself, but this doesn’t help much in general (here, it might let us find the bug). If we had another, independently developed, high-quality address array utility library, we could compare it, an approach called differential testing. Unfortunately, we don’t often have such a reference library.

Our approach here is to apply a weaker version of differential testing by looking for another function in the library that can detect duplicates without calling hasDuplicate. For this, we’ll use indexOf and indexOfFromEnd to check if the index of an item (starting from 0) is the same as that when a search is performed from the end of the array:

    bool hasDup = false;
    for (uint i = 0; i < addrs1.length; i++) {
      (i1, b) = AddressArrayUtils.indexOf(addrs1, addrs1[i]);
      (i2, b) = AddressArrayUtils.indexOfFromEnd(addrs1, addrs1[i]);
      if (i1 != (i2-1)) { // -1 because fromEnd return is off by one
        hasDup = true;
      }
    }
    return hasDup == AddressArrayUtils.hasDuplicate(addrs1);

See the full example code in our addressarrayutils demo

This code iterates through addrs1 and finds the index of the first appearance of each element.  If there are no duplicates, of course, this will always just be i itself. The code then finds the index of the last appearance of the element (i.e., from the end). If those two indices are different, there is a duplicate. In Echidna, properties are just Boolean Solidity functions that usually return true if the property is satisfied (we’ll see the exception below), and fail if they either revert or return false. Now our hasDuplicate test is testing both hasDuplicate and the two indexOf functions. If they don’t agree, Echidna will tell us.
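The same differential idea, transplanted to Python for clarity (list.index plays the role of indexOf, and searching the reversed list plays the role of indexOfFromEnd):

```python
# Two independent duplicate detectors that must always agree: a direct check,
# and the first-index vs last-index cross-check from the harness above.
import random

def has_duplicate_direct(xs):
    return len(set(xs)) != len(xs)

def has_duplicate_differential(xs):
    # index() finds the first occurrence; searching the reversed list finds
    # the last. They differ for some element iff a duplicate exists.
    return any(xs.index(x) != len(xs) - 1 - xs[::-1].index(x) for x in xs)

# Property check: on random inputs, the two implementations agree.
for _ in range(1000):
    xs = [random.randrange(5) for _ in range(random.randrange(8))]
    assert has_duplicate_direct(xs) == has_duplicate_differential(xs)
```

Echidna does the analogous thing in Solidity: it generates the random inputs and reports any sequence on which the two implementations disagree, or on which the property reverts, as it does here by running out of gas on the empty array.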

Now we can add a couple of functions to be fuzzed to set addrs1.

Let’s run this property on Crytic:

The property test for hasDuplicate fails in Crytic

First, crytic_hasDuplicate fails:

crytic_hasDuplicate: failed!
  Call sequence:

The triggering transaction sequence is extremely simple: Don’t add anything to addrs1, then call hasDuplicate on it. That’s it—the resulting runaway loop will exhaust your gas budget, and Crytic/Echidna will tell you the property failed. The 0x0 address results when Echidna minimizes the failure to the simplest sequence possible.

Our other properties (crytic_revert_remove and crytic_remove) pass, so that’s good. If we fix the bug in hasDuplicate then our tests will all pass:

All three property tests now pass in Crytic

The “crytic_hasDuplicate: fuzzing (2928/10000)” line tells us that, because the expensive hasDuplicate property no longer fails quickly, only about 3,000 of our maximum of 10,000 tests for that property were performed before we hit our five-minute timeout.

The Echidna test harness: The rest of the library

Now we’ve seen one example of a test, here are some basic suggestions for building the rest of the tests (as we’ve done for the addressarrayutils_demo repository):

  • Try different ways of computing the same thing. The more “differential” versions of a function you have, the more likely you are to find out if one of them is wrong. For example, look at all the ways we cross-check indexOf, contains, and indexOfFromEnd.
  • Test for revert. If you add the prefix _revert_ before your property name as we do here, the property only passes if all calls to it revert. This ensures code fails when it is supposed to fail.
  • Don’t forget to check obvious simple invariants, e.g., that the diff of an array with itself is always empty (ourEqual(AddressArrayUtils.difference(addrs1, addrs1), empty)).
  • Invariant checks and preconditions in other testing can also serve as a cross-check on tested functions. Note that hasDuplicate is called in many tests that aren’t meant to check hasDuplicate at all; it’s just that knowing an array is duplicate-free can establish additional invariants of many other behaviors, e.g., after removing address X at any position, the array will no longer contain X.

Getting up and running with Crytic

You can run Echidna tests on your own by downloading and installing the tool or using our docker build—but using the Crytic platform integrates Echidna property-based testing, Slither static analysis (including new analyzers not available in the public version of Slither), upgradability checks, and your own unit tests in a seamless environment tied to your version control. Plus the addressarrayutils_demo repository shows all you need for property-based testing: It can be as simple as creating a minimal Truffle setup, adding a crytic.sol file with the Echidna properties, and turning on property-based tests in your repository configuration in Crytic.

Sign up for Crytic today, and if you have questions, join our Slack channel (#crytic) or follow @CryticCI on Twitter.

Sinter: New user-mode security enforcement for macOS

TL;DR: Sinter is the first available open-source endpoint protection agent written entirely in Swift, built from the ground up on Apple’s new EndpointSecurity API. Sinter demonstrates how to build a successful event-authorization security agent, and incorporates solutions to many of the challenges that all endpoint protection agents will face as they migrate from kernel-mode to user-mode agents ahead of the release of macOS 11 Big Sur.

Simple, open-source, and Swift

Sinter is our new open-source endpoint security enforcement agent for macOS 10.15 and above, written in Swift. We built it from scratch as a 100% user-mode agent leveraging the new EndpointSecurity API to receive authorization callbacks from the macOS kernel for a set of security-relevant event types. Sinter is controlled with simple rules to allow or deny events—and uses none of the expensive full-system scans or signature-based detection of traditional anti-virus solutions.

Grab an installer for the beta version today and try it out!

Currently, Sinter lets you write a set of rules to block or allow process execution events, and to provide the rules to the agent with a Santa-compatible sync server or with a local configuration file (here is an example rule that demonstrates an explicit-allow). However, we’re planning to develop a more sophisticated rule syntax and add blocking capability for the many other kinds of events supported by the API, which would also mean an end to the Santa rule compatibility.

The quest for the 100% user-mode security agent

Implementing an endpoint security solution (e.g., anti-virus, anti-malware) requires interception and authorization of OS-level events in real time. Historically, that has meant the use of kernel-mode callback APIs or hooking kernel-mode operating system code when a proper API was not provided. Operating system developers have long known that third-party kernel-mode code like this was the leading source of system instability and insecurity, because any small error in kernel code tends to have large consequences.

Enter the macOS EndpointSecurity API. In late 2019, Apple announced that support for all third-party kernel extensions would be deprecated in macOS, and that they would introduce user-mode APIs and frameworks to replace the functionality needed for third-party products. All security vendors were put on notice: Deprecate your existing kernel-mode solutions within the next year and migrate to the EndpointSecurity API before the next release of macOS (macOS 11 Big Sur). It’s clearly not a fun prospect for many teams, and soon after the announcement a client tapped us to develop a user-mode solution that would make migration less painful.

What is the EndpointSecurity API?

EndpointSecurity is an API that implements a callback from the macOS kernel, in real time, as a particular event is about to happen. EndpointSecurity clients subscribe to one or more event types that are either a NOTIFY type or an AUTH (Authorization) type. Notify is just what it sounds like, and is useful for capturing a simple activity log on the host. An Authorization callback is much more powerful; it lets a client process make a decision to allow or deny the event from happening.

EndpointSecurity replaces the kernel-mode equivalents for real-time event authorization on macOS (the Kauth KPI and other unsupported kernel methods), as well as the read-only OpenBSM audit trail used for event monitoring. Any real-time monitoring or protection product for macOS must be rewritten to use EndpointSecurity for macOS 11 Big Sur.

Note that there are no network-related events in the EndpointSecurity API (except UNIX domain sockets). All of these are in the Network Extension framework. You can combine the use of both APIs from one System Extension, but here we focus on the EndpointSecurity API specifically.

Using this API, Stephen Davis at FireEye and Patrick Wardle at Objective-See quickly released event monitoring applications that could display, for example, process-related and file-related events in real time. But read-only monitoring tools following in the footsteps of Process Monitor (“ProcMon”), while useful, are only using the Notify half of the functionality of the EndpointSecurity API (the ability to monitor). Google’s Santa, an open-source macOS process allow/deny solution in Objective-C, demonstrates the ability to authorize events using EndpointSecurity: Its agent now receives and makes allow/deny decisions for process events from EndpointSecurity.

We saw that it would be critically important to master the EndpointSecurity API, as many teams would need to migrate to it in their existing macOS security applications. With the development of Sinter, we’ve delved into EndpointSecurity, learned some lessons from the experience, and incorporated solutions for various challenges we encountered—so you don’t have to. Sinter also demonstrates an implementation of an EndpointSecurity client in the Swift programming language, which promises better memory safety and performance than Objective-C, while maintaining compatibility with all of the other new macOS APIs.

Developing Sinter: Not for the faint of heart

Implementing an event-authorizing agent is an order of magnitude more difficult than implementing a read-only event-subscriber. We also learned—the hard way—certain shortcomings of the EndpointSecurity API. Here’s some of the more significant heavy lifting we did in the course of Sinter’s development.

1. Making decisions in real-time without impacting the system

Possibly the most difficult part of implementing a security event authorization agent is that authorization decisions must be made in real time. You cannot block forever to make a decision, and EndpointSecurity enforces a deadline for each authorization message: If your client blows the deadline, EndpointSecurity will terminate your client process to preserve a functioning system.

Decisions shouldn’t be made synchronously: Sinter uses es_copy_message to dequeue each message from EndpointSecurity so its dispatcher can immediately deliver the next one. Decisions should then be made on separate threads, with each one responding asynchronously as soon as possible. Some decisions take longer than others to process, and the APIs needed to perform signature checks often can’t be interrupted.

With Sinter, we ran into this problem head-on when a quick burst of execution events involving large programs caused Sinter to lock up the machine. We solved this by implementing an efficient queuing system, with one queue for small programs and another one for big programs, so events would never get stuck waiting in the queue. The big-programs queue works out-of-process so a long-running verification can be aborted whenever necessary. This new approach performs reliably on all of our tests.
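Sinter itself is written in Swift, but the two-queue idea can be sketched in a few lines of Python. Everything here is illustrative: the 10 MB threshold and the event dictionaries are invented for the example, not Sinter’s actual policy.

```python
import queue

# Hypothetical size threshold; Sinter's real routing policy differs.
SMALL_LIMIT = 10 * 1024 * 1024  # 10 MB

small_queue: queue.Queue = queue.Queue()  # fast, in-process verification
large_queue: queue.Queue = queue.Queue()  # out-of-process, abortable verification

def dispatch(event: dict) -> None:
    """Route each execution event by executable size so one huge bundle
    never blocks the stream of quick decisions queued behind it."""
    target = small_queue if event["size"] <= SMALL_LIMIT else large_queue
    target.put(event)

# A burst of events: two small binaries and one Xcode-sized bundle.
for e in [{"path": "/bin/ls", "size": 150_000},
          {"path": "/usr/bin/ssh", "size": 2_000_000},
          {"path": "/Applications/Xcode.app", "size": 30_000_000_000}]:
    dispatch(e)
```

The key property is that the large-program queue can be drained by a separate process whose long-running verifications are abortable, while small programs are answered quickly in-process.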

2. Mitigating the TOCTOU risks in real-time security decisions

The TOCTOU (time of check, time of use) race condition vulnerability pattern commonly occurs when making security decisions. Any security agent performing a check must not allow the checked resource to be modified in the time between the check and the action’s approval.

When authorizing macOS execution events, the resource being checked is the executable file, which is mapped into memory before executing. Here’s one TOCTOU attack scenario:

A malicious actor executes Bad.app. The malicious executable is mapped into memory, and EndpointSecurity emits an execution authorization event. But the attacker immediately replaces or modifies the executable file to make it look like Good.app. The EndpointSecurity client then receives the event, verifies that the bundle and its files all look good, and allows the execution.

This problem is not unique to EndpointSecurity, and was always a risk with the Kauth KPI that preceded it (e.g., an issue was raised about this TOCTOU in Santa not long ago). It’s still a challenge that must be solved by any agent that wants to authorize events. To mitigate it, Sinter monitors file events in an attempt to catch TOCTOU attacks. It would have been much easier if Apple handled this responsibility within the EndpointSecurity API itself (submitted to Apple as developer feedback suggestion FB8352031; see OpenRadar).
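A minimal sketch of the mitigation, in Python rather than Swift: snapshot the executable at event time, then re-check it immediately before responding, denying if anything changed inside the TOCTOU window. The function names and the verdict strings are hypothetical, not Sinter’s API.

```python
import hashlib
import os
import tempfile

def snapshot(path: str) -> tuple:
    """Record the executable's size and content digest at event time."""
    st = os.stat(path)
    with open(path, "rb") as f:
        digest = hashlib.sha256(f.read()).hexdigest()
    return (st.st_size, digest)

def authorize(path: str, snap: tuple, verify) -> str:
    """Re-check the file just before answering; deny if it was modified
    between the check and the decision (the TOCTOU window)."""
    if snapshot(path) != snap:
        return "DENY"
    return "ALLOW" if verify(path) else "DENY"

# Demo: approve an unmodified file, then catch a swap after the check.
with tempfile.NamedTemporaryFile("wb", delete=False) as f:
    f.write(b"original binary")
    path = f.name
snap = snapshot(path)
verdict_before = authorize(path, snap, lambda _: True)
with open(path, "wb") as f:
    f.write(b"swapped binary")  # attacker replaces the file mid-decision
verdict_after = authorize(path, snap, lambda _: True)
os.unlink(path)
```

A real agent would react to file-write events rather than re-hashing gigabytes of executable on every decision, but the invariant is the same: the bytes checked must be the bytes executed.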

3. macOS executable files live in application bundles

Execution events occur in the context of a single executable file, but most macOS executables exist within application bundles, the directory-like structure that appears as a single “.app” file in macOS Finder. A bundle itself is code-signed, and code-signing verification must be done at the bundle level. This means that a security agent that catches an execution event must discover if the executable has a containing app bundle, then verify the code signature on the entire bundle—these are tasks not performed by EndpointSecurity itself. Some bundles like Apple’s Xcode.app are upwards of a gigabyte in size, and processing a verification in real time isn’t possible. Execution events have to be denied at first, until the verification completes.

EndpointSecurity does provide a built-in caching mechanism, a single cache shared by all EndpointSecurity clients. However, as a client you cannot invalidate a single entry in this cache; you can only clear the entire cache at once. EndpointSecurity will automatically invalidate a cache item if the related file is changed/deleted/etc., but it does this on a per-file basis, not per-application-bundle. Currently, Sinter works with two caches: the one managed by EndpointSecurity, and another custom cache containing the application bundle code-signing verification results.

In theory, malware could be added into an application bundle and EndpointSecurity would not react by clearing a cached approval decision, if the previously approved executable file in the bundle had not changed. EndpointSecurity clients would have to monitor for this themselves, and invalidate the entire cache in response. This is less than ideal, and we hope Apple will make improvements to this caching mechanism. In the near term, EndpointSecurity clients may have to implement their own integrity monitoring on application bundles to avoid being circumvented this way. Sinter attempts its own bundle file integrity monitoring capability to detect when this custom cache should be cleared.
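The bundle-level cache Sinter keeps alongside EndpointSecurity’s per-file cache can be sketched like this. The class and its methods are hypothetical illustrations of the design, not Sinter’s actual code: verdicts are keyed by the enclosing .app directory, and any file event inside the bundle invalidates the whole entry.

```python
import os

class BundleVerdictCache:
    """Hypothetical bundle-level verdict cache. EndpointSecurity's
    built-in cache invalidates per file, so a file event anywhere inside
    a bundle must clear the cached verdict for the entire bundle."""

    def __init__(self):
        self._verdicts = {}

    @staticmethod
    def bundle_root(path: str) -> str:
        # Climb to the nearest enclosing .app directory, if any.
        p = path
        while p not in ("/", ""):
            if p.endswith(".app"):
                return p
            p = os.path.dirname(p)
        return path  # standalone binary: cache per file

    def get(self, path):
        return self._verdicts.get(self.bundle_root(path))

    def put(self, path, verdict):
        self._verdicts[self.bundle_root(path)] = verdict

    def on_file_event(self, path):
        # Any write/rename/delete under the bundle invalidates it.
        self._verdicts.pop(self.bundle_root(path), None)

cache = BundleVerdictCache()
cache.put("/Applications/Foo.app/Contents/MacOS/Foo", "ALLOW")
# A different file in the same bundle hits the same cached verdict...
hit = cache.get("/Applications/Foo.app/Contents/MacOS/helper")
# ...and dropping a new dylib anywhere in the bundle clears it.
cache.on_file_event("/Applications/Foo.app/Contents/Resources/new.dylib")
after = cache.get("/Applications/Foo.app/Contents/MacOS/Foo")
```

This is exactly the invalidation EndpointSecurity’s shared cache does not provide, which is why clients must implement it themselves.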

4. Advantages of installing your agent as a System Extension

“System Extensions” is Apple’s name for “user-mode components that extend the system” and is the umbrella term for what replaces the now-deprecated third-party Kernel Extensions. EndpointSecurity is one API under this umbrella; DriverKit and Network Extensions are a couple of others. A System Extension is also a new kind of managed plug-in package for macOS through which you can install your executable.

Installing an EndpointSecurity client as a System Extension is not required—you can implement all of the EndpointSecurity functionality from any kind of executable, even a basic command-line application—but is highly encouraged. There are additional benefits and system-enforced protections for your agent when it is installed as a System Extension. System Extensions can opt to be loaded before all other third-party applications at startup. Apple also announced that macOS extends SIP (System Integrity Protection) to cover System Extensions, meaning it prevents even root users from unloading your security agent. Historically this was only possible if you developed your own kernel-mode anti-tamper logic, but installing your agent as a System Extension frees you from reinventing this wheel. Sinter is currently a background daemon, but now that Apple has documented the anti-tamper protection benefit of installing your agent as a System Extension, we will be converting Sinter to this format.

5. Mastering the Entitlements, Signing, and Notarization workflow

The EndpointSecurity API is only usable within code-signed and notarized applications by Apple-approved developers like Trail of Bits. In other words, the API is gated by a special entitlement. Unlike most entitlements, this one requires a manual application and approval by Apple, after which you are granted a code-signing certificate with the EndpointSecurity entitlement. In our case, the time to be approved was six calendar weeks, but your mileage may vary. Apple is apparently being careful with this entitlement, because a misbehaving or malicious EndpointSecurity client could put a halt to everything on a host system.

Apple’s code-signing and notarization steps are difficult to troubleshoot when they fail, so it’s essential to set up and automate the process early; that way, you’ll immediately notice when they break and can easily narrow down the breaking changes. For Sinter, we created our own CMake-driven approach that automates the workflow for Apple’s notarization, packaging, package signing, and package notarization steps. All of that now integrates perfectly into our CI with minimal fuss.

One last entitlement that EndpointSecurity agents need is related to user privacy. Because most agents will be inspecting files (whether in the context of file events or the executables of process events), they need the user’s permission to access the filesystem. On or before the first run of your application, the user must manually go to Privacy settings in System Preferences, and enable “Full Disk Access.” There are MDM payloads that can automatically enable the permission and sidestep this manual user approval step.

Those were the thornier challenges we addressed when writing Sinter, and of course there were more miscellaneous gotchas and lessons learned (e.g., determining whether files are binaries, signature verification, and multiple EndpointSecurity clients). We’ll update the most compelling details as development continues—stay tuned.

The upshot

With the deprecation of kernel extensions, Apple is leveling the playing field for endpoint protection agents: Everyone must use the same user-mode APIs. This will benefit everyone with improved system stability and reduced attack surface, but existing security product developers first have to replace their kernel extensions with a user-mode approach. In user mode, they can now work in any language, not just C/C++.

So instead of starting from scratch with just the example code in C, we hope organizations will help us build and rely upon an open-source platform in Swift, a forward-looking choice for long-term investment as Apple’s successor to Objective-C.

Get involved with Sinter

The beta version of Sinter is available today. It’s a significant first step, and here’s a peek at some of the larger items we’re working on now:

We invite you to partner with us to sponsor the continued development of Sinter, or to discuss the integration of EndpointSecurity-based capability into your existing agent—just contact us to get started.

Contributors are welcome, too! Give us your feedback on GitHub, or join us in the #sinter channel on the Empire Hacking Slack.

Accidentally stepping on a DeFi lego

The initial release of yVault contained logic for computing the price of yUSDC that could be manipulated by an attacker to drain most (if not all) of the pool’s assets. Fortunately, Andre, the developer, reacted incredibly quickly and disabled the faulty code, securing the approximately 400,000 USD held at the time. However, this bug still highlights the risk stemming from increased complexity caused by composition in the DeFi space.

What is yVault?

On July 25th 2020, yEarn launched a new service called yVault: Users could deposit tokens in the vault, which would then be supplied to a DeFi protocol chosen to maximize their interest.

The initial release supported USDC and integrated with the USDC/MUSD Balancer pool. Any USDC held by the vault would be supplied to the Balancer pool as liquidity, and the vault would receive BPT tokens in return.

To use the vault, a user sends USDC and is minted yUSDC. Similarly, USDC can be withdrawn by burning yUSDC. These two operations rely on a dynamically calculated exchange rate, defined as the ratio of the value of the BPT held by the contract to the total supply of yUSDC. Since the value of BPT goes up when traders pay fees, the value of each yUSDC token slowly increases over time.
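The exchange-rate mechanics can be sketched in a few lines of Python. The figures are purely illustrative (not yVault’s actual balances), and the function names are invented for the example:

```python
def exchange_rate(bpt_value_usd: float, yusdc_supply: float) -> float:
    """Price of one yUSDC: the USD value of the vault's BPT holdings
    divided by the yUSDC tokens outstanding."""
    return bpt_value_usd / yusdc_supply

def yusdc_minted(usdc_in: float, bpt_value_usd: float, yusdc_supply: float) -> float:
    """Depositors are minted yUSDC at the current exchange rate."""
    return usdc_in / exchange_rate(bpt_value_usd, yusdc_supply)

# At launch the rate is 1.0; as trading fees accrue to the BPT,
# each yUSDC becomes redeemable for slightly more USDC.
rate_at_launch = exchange_rate(400_000, 400_000)
rate_after_fees = exchange_rate(404_000, 400_000)
```

The crucial detail is that the rate is read from the BPT’s instantaneous value, which, as the next section shows, an attacker can momentarily inflate.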

Within an hour of yVault’s release, users had already deposited around 400,000 USDC, so I knew I had to take a look at the code for myself.

What was the bug?

Since the initial release integrated with Balancer, let’s consider how Balancer works. Balancer removes the need for liquidity providers to manually rebalance their portfolio by incentivizing rational market actors to do so instead. If a token goes up in price, the pool will become unbalanced. While normally a liquidity provider may need to pay fees in order to sell a token that has increased in value, Balancer incentivizes external users to pay a fee for the privilege of purchasing the token at a profit instead. The fees paid are then distributed to the liquidity providers.

Figure 1 presents the equation used to calculate the amount of tokens received based on the state of the Balancer pool and the amount of tokens sent. For the remainder of this post, let’s refer to the MUSD/USDC 50/50 pool. The swap fee is 0.05%.

// calcOutGivenIn                                                                            //
// aO = tokenAmountOut                                                                       //
// bO = tokenBalanceOut                                                                      //
// bI = tokenBalanceIn              /      /            bI             \    (wI / wO) \      //
// aI = tokenAmountIn    aO = bO * |  1 - | --------------------------  | ^            |     //
// wI = tokenWeightIn               \      \ ( bI + ( aI * ( 1 - sF )) /              /      //
// wO = tokenWeightOut                                                                       //
// sF = swapFee                                                                              //

Figure 1: Token output given input.
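Translated into Python for readability (a sketch: Balancer’s on-chain implementation uses fixed-point arithmetic, so real results can differ by fractions of a token), Figure 1 reads:

```python
def calc_out_given_in(token_balance_out: float, token_balance_in: float,
                      token_amount_in: float, weight_in: float = 1.0,
                      weight_out: float = 1.0, swap_fee: float = 0.0005) -> float:
    """Balancer's calcOutGivenIn from Figure 1: the fee is charged on the
    input amount, and the output is the pool's out-balance scaled by how
    much the in-balance ratio shifts."""
    adjusted_in = token_amount_in * (1 - swap_fee)
    ratio = token_balance_in / (token_balance_in + adjusted_in)
    return token_balance_out * (1 - ratio ** (weight_in / weight_out))
```

With the 50/50 weights and 0.05% fee of the MUSD/USDC pool, this reproduces the trades worked through next: 90,000 MUSD paid into the unbalanced pool yields about 99,954 USDC, and 100,000 USDC paid into the balanced pool yields about 90,867 MUSD.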

First, to get a sense of how this function behaves, we’ll see what happens when a rational market actor swaps a pool back into balance and when an irrational market actor swaps a pool out of balance.

Suppose the pool is currently out of balance and contains 1,100,000 USDC and 900,000 MUSD. If a rational market actor pays 90,000 MUSD, they’ll receive 99,954 USDC in exchange and make 9,954 USDC in profit. A very good deal!

Now suppose the pool is currently balanced and contains 1,000,000 USDC and 1,000,000 MUSD. What happens if an irrational market actor pays 100,000 USDC? Well, they would receive 90,867 MUSD for a loss of 9,133 MUSD. Not such a great deal.

Although the second trade results in an immediate loss and thus seems rather useless, pairing it with the first trade results in some interesting behavior.

Consider a user who first performs The Bad Trade: The user converts 100,000 USDC to 90,867 MUSD, losing 9,133 USD in the process. Then, the user performs The Good Trade and converts 90,867 MUSD to 99,908 USDC, earning 9,041 USD in the process. This results in a net loss of only 92 USD: not ideal, but far better than the 9,133 USD lost on The Bad Trade alone.

Now consider the valuation of BPT during this process. If you held 1% of the total BPT, at the start of the transaction your tokens would have been worth 1% of 2,000,000 USD, or 20,000 USD. At the end of the transaction, your tokens would have been worth 1% of 2,000,092 USD, or 20,000.92 USD. Yet for a magical moment, right in the middle of the transaction, your tokens were worth 1% of 2,009,133 USD, or 20,091.33 USD. This is the crux of the vulnerability at hand.

Knowing this, I applied the same reasoning to yVault. Before The Bad Trade, the vault holds some BPT worth some amount of USD. After The Good Trade, the vault holds the same amount of BPT worth a slightly larger amount of USD. However, between the two trades, the vault holds the same BPT worth a significantly larger amount of USD.

Recall that the value of yUSDC is directly proportional to the value of the BPT it holds. If we bought yUSDC before The Bad Trade and sold yUSDC before The Good Trade, we would instantaneously make a profit. Repeat this enough times, and we would drain the vault.
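The whole round trip can be simulated with the formula from Figure 1 (equal weights, floating point for clarity; the exact figures quoted in the text differ by a dollar or two because the text rounds token amounts):

```python
def calc_out_given_in(b_out, b_in, a_in, swap_fee=0.0005):
    # Figure 1 with equal weights: fee on the input, output scaled
    # by the shift in the in-balance ratio.
    adjusted = a_in * (1 - swap_fee)
    return b_out * (1 - b_in / (b_in + adjusted))

usdc, musd = 1_000_000.0, 1_000_000.0
start_value = usdc + musd                 # pool worth 2,000,000 USD

# The Bad Trade: 100,000 USDC in, ~90,868 MUSD out.
musd_out = calc_out_given_in(musd, usdc, 100_000)
usdc, musd = usdc + 100_000, musd - musd_out
mid_value = usdc + musd                   # briefly ~2,009,132 USD

# The Good Trade: swap the MUSD straight back for ~99,909 USDC.
usdc_out = calc_out_given_in(usdc, musd, musd_out)
usdc, musd = usdc - usdc_out, musd + musd_out
end_value = usdc + musd                   # ~2,000,091 USD
```

An attacker who buys yUSDC at `start_value` pricing and redeems at `mid_value` pricing, all within one transaction, pockets the difference; repeated, that drains the vault.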

How was it fixed?

It turns out that accurately calculating the true value of BPT and preventing attackers from extracting profit from slippage is a difficult problem to solve. Instead, the developer, Andre, deployed and activated a new strategy that simply converts USDC to MUSD and supplies it to the mStable savings account.

Future Recommendations

DeFi composability is hard, and it’s easy to accidentally expose your new protocol to unexpected risk. If you integrate multiple tokens, any one token could compromise the security of your entire platform. On the other hand, if you integrate multiple platforms, your protocol could suffer from complex interactions.

Security tooling can be used to help prevent most simple bugs in code:

  • Crytic uses an advanced version of Slither to automatically detect up to 90 types of vulnerabilities
  • Echidna asserts specific properties through fuzz testing
  • Manticore can symbolically analyze your code

Of course, tooling isn’t a panacea for security. In our study “What are the Actual Flaws in Important Smart Contracts (and How Can We Find Them)?” we discovered that almost 50% of findings were unlikely to be detected by tooling, even if the technology significantly improves. For complex codebases and DeFi projects, reach out to us to arrange a security assessment, or sign up for our Ethereum security office hours.

Contract verification made easier

Smart contract authors can now express security properties in the same language they use to write their code (Solidity) and our new tool, manticore-verifier, will automatically verify those invariants. Even better, Echidna and Manticore share the same format for specifying property tests.

In other words, smart contract authors can now write one property test and have it tested with fuzzing and verified by symbolic execution! Ultimately, manticore-verifier reduces the initial effort and cost involved in symbolic testing of arbitrary properties.

How it works

A smart contract’s behavior—and its potential bugs—are often unique and depend heavily on unspoken contract invariants. Let’s test a simple contract:

contract Ownership{
    address owner = msg.sender;
    function Owner() public{
        owner = msg.sender;
    }
    modifier isOwner(){
        require(owner == msg.sender);
        _;
    }
}

contract Pausable is Ownership{
    bool is_paused;
    modifier ifNotPaused(){
        require(!is_paused);
        _;
    }
    function paused() isOwner public{
        is_paused = true;
    }
    function resume() isOwner public{
        is_paused = false;
    }
}

contract Token is Pausable{
    mapping(address => uint) public balances;
    function transfer(address to, uint value) ifNotPaused public{
        balances[msg.sender] -= value;
        balances[to] += value;
    }
}

This contract maintains a balance sheet and allows for simple transactions. Users can send their tokens to other users, but the total amount of tokens must remain fixed—in other words, tokens can’t be created after the contract has started. So under this invariant, a valid property could state: “If there are only 10,000 tokens, no user could own more than that.”

We can express this property as a Solidity method: “crytic_test_balance.”

import "token.sol";

contract TestToken is Token {
    constructor() public{
        balances[msg.sender] = 10000;
    }
    // the property
    function crytic_test_balance() view public returns(bool){
        return balances[msg.sender] <= 10000;
    }
}

The emulated world

ManticoreEVM compiles the contract and then creates it on a fully emulated symbolic blockchain.

Different normal accounts are also created there to replicate real-world situations. A deployer account is used to deploy the contract, others are used to explore the contract and try to break the properties, and, finally, a potentially different account is used to test the properties.

ManticoreEVM detects the property type methods present in high-level source code and checks them after every combination of symbolic transactions. A normal property is considered failed if the method returns false.

The loop (exploration)

The deployer account initially creates the target contract via a CREATE transaction. Then manticore-verifier simulates all possible interleaving transactions originating from the contract testers until (for example) no more coverage is found. After each symbolic transaction, the properties are checked in the name of the property-checker account, and if anything looks broken, a report of the reproducible exploit trace is generated. Normal properties like crytic_test_balance() are expected to return true; any other result is reported as a problem.

manticore-verifier dapp.sol --contract TestToken

It’s a command-line-based tool

Several aspects of the exploration, the stopping condition, and the user accounts employed can be modified by command-line arguments. Try manticore-verifier --help for a thorough list. Here’s an excerpt of it in action:

$manticore-verifier dapp.sol  --contract TestToken

# Owner account: 0x28e9eb58c2f5be87161a261f412a115eb85946d9
# Contract account: 0x9384027ebe35100de8ef216cb401573502017f7
# Sender_0 account: 0xad5e556d9699e9e35b3190d76f75c9bf9997533b
# PSender account: 0xad5e556d9699e9e35b3190d76f75c9bf9997533b
# Found 1 properties: crytic_test_balance
# Exploration will stop when some of the following happens:
# * 3 human transaction sent
# * Code coverage is greater than 100% measured on target contract
# * No more coverage was gained in the last transaction
# * At least 1 different properties where found to be breakable. (1 for fail fast)
# * 240 seconds pass
# Starting exploration...
Transaction 0. States: 1, RT Coverage: 0.0%, Failing properties: 0/1
Transaction 1. States: 2, RT Coverage: 60.66%, Failing properties: 0/1
Found 1/1 failing properties. Stopping exploration.
60.66% EVM code covered 
|    Property Named   |   Status   |
| crytic_test_balance | failed (0) |
Checkout testcases here:./mcore_kkgtybqb

Note that each failing property will have a test case number associated with it. More details can be found at the specified test case files: ./mcore_kkgtybqb/user_000000.tx

Bug Found!

In our example, manticore-verifier finds a way to break the specified property. When trying to transfer an incredibly large amount of tokens, an internal integer representation exceeds its limits, making it possible to boost the sender’s savings, i.e., create tokens out of thin air.

transfer(0,115792089237316195422001709574841237640532965826898585773776019699400460720238) -> STOP (*)

Conclusion: Interoperability = 101%

manticore-verifier lowers the initial cost to symbolically test arbitrary properties. It also allows our symbolic executor to work more tightly with Solidity, Echidna, and slither-prop.

The same methodology can be used with our Ethereum fuzzer, Echidna. As a result, you can write the properties once and test them with symbolic execution and fuzzing with no extra effort.

manticore-verifier can check automatically generated ERC20 properties. Moreover, slither-prop, our static analyzer, knows in detail what an ERC20 contract should do and can automatically produce ERC20 properties for manticore-verifier to check.

So get your contract, add the property methods, and test with manticore-verifier at will. If you have any questions please join the Empire Hacking Slack.

Advocating for change

As a company, we believe Black lives matter. In the face of continued police brutality, racial disparities in law enforcement, and limited accountability, we demand an end to systemic racism, endorse restrictions on police use of force, and seek greater accountability for police actions. We believe police misconduct, militarization of police, and unchecked abuse of power are issues that we as Americans should protest.

Giving time, money, and attention

In this spirit, I have reaffirmed our employees’ right to protest without reprisal or retaliation. While we keep no account of who has and hasn’t participated, I’m aware that many of our employees have recently marched to end systemic racism and police brutality.

To support understanding and discussion, we created a #solidarity channel on our company Slack. Conversations there grew rapidly as we shared research on social policy and upcoming legislation, including policies that have been analyzed by social scientists studying criminal justice:

  1. A large-scale analysis of racial disparities in police stops across the United States
  2. Collective Bargaining Rights and Police Misconduct: Evidence from Florida
  3. Evidence that curtailing proactive policing can reduce major crime
  4. Good Cop, Bad Cop: Using Civilian Allegations to Predict Police Misconduct
  5. The Wandering Officer

Many of our employees also decided to “protest with our wallets” and use our existing charitable donation matching program to support organizations we believe can effect change. In the last two weeks, employees have donated $12k and the company matched $12k ($24k total) to a number of related non-profits, including:

More we can do now: Calls to action

Advocacy is not new to us—Trail of Bits is among the largest employers of cybersecurity professionals in NYC, and has frequently advocated for policy change as part of Tech:NYC and the Coalition for Responsible Cybersecurity. As an NYC-based company, we urge the NYC Council to take action.

The June 18 legislative session of the NYC Council will be livestreamed, and we’ll be watching. We urge our representatives to:

  • Pass all five bills that were heard in the last meeting of the Public Safety Committee
  • Pass the POST Act and require reporting on NYPD use of surveillance technology
  • Commit to NYC Budget Justice and reallocate funds towards social programs

While policing is largely a state and local matter in the United States, federal action has a strong effect on state and local policies. We call on the US Congress to:

Local and state action may have the most direct impact on policing practices. If you want to lobby your representatives as an individual, use “Who are my representatives?” to find their contact information and give them a call. Personal, authentic contact with local representatives can be very effective at shaping policy decisions.

If you’re an individual motivated to support a charitable organization, consider reviewing the following resources first:

When donating, strongly consider a charitable donation matching program. If your employer does not offer one, suggest that they sign up for the Technology Partner program from RaisedBy.Us. Trail of Bits uses their service to facilitate donation matching through Brightfunds.

If you are planning to attend a protest, research what local activists in your area are recommending to protect yourself and others. There are widespread disparities in treatment of protesters across the United States: a “March for Families” in NYC may be completely unlike a similarly named event in Oregon. Consider advice from the Legal Aid Society of NYC or Vice (and their digital guide) and put on a mask before attending a protest.

We can always do more

We know our efforts are modest, and that the problems will not be fixed by a few waves of donations and legislation. Our own efforts to advocate for change started small, but they are growing.

We also recognize the diversity deficit in our own company. As part of our effort to close that gap, we are working with diversity and inclusion-focused recruiting groups and conducting implicit bias training. We’ve created the CTF Field Guide to help eliminate the knowledge gap for industry newcomers and we host yearly winternships that provide inroads for people new to computer security. We’re also increasing the matching for our existing charity matching program and making the most of our diversity-focused donation to the Summercon Foundation. Finally, to help ensure this is not a one-off effort, we are listening to our employees and community to hold us accountable.

The protests have been extraordinarily effective in moving legislation forward; so much so, it can be tough to keep up. We realize it’s only a long-overdue beginning, but the more we know about what’s gaining ground, the better we can advocate for it. To help, we’ve assembled a summary of the changes we’ve seen at the NYC, New York State, and federal levels.