Using benchmarks to speed up Echidna

By Ben Siraphob

During my time as a Trail of Bits associate last summer, I worked on optimizing the performance of Echidna, Trail of Bits’ open-source smart contract fuzzer, written in Haskell. Through extensive use of profilers and other tools, I was able to pinpoint and debug a massive space leak in one of Echidna’s dependencies, hevm. Now that this problem has been fixed, Echidna and hevm can both expect to use several gigabytes less memory on some test cases compared to before.

In this blog post, I’ll show how I used profiling to identify this deep performance issue in hevm and how we fixed it, improving Echidna’s performance.

Overview of Echidna

Suppose we are keeping track of a fixed supply pool. Users can transfer tokens among themselves or burn tokens as needed. A desirable property of this pool might be that supply never grows; it only stays the same or decreases as tokens are transferred or burned. How might we go about ensuring this property holds? We can try to write up some test scenarios or try to prove it by hand… or we can fuzz the code with Echidna!

How Echidna works

Echidna takes in smart contracts and assertions about their behavior that should always be true, both written in Solidity. Then, using information extracted from the contracts themselves, such as method names and constants, Echidna starts generating random transaction sequences and replaying them over the contracts. It keeps generating longer and new sequences from old ones, such as by splitting them up at random points or changing the parameters in the method calls.

How do we know that these generations of random sequences are covering enough of the code to eventually find a bug? Echidna uses coverage-guided fuzzing—that is, it keeps track of how much code is actually executed from the smart contract and prioritizes sequences that reach more code in order to create new ones. Once it finds a transaction sequence that violates our desired property, Echidna then proceeds to shrink it to try to minimize it. Echidna then dumps all the information into a file for further inspection.

Overview of profiling

The Glasgow Haskell Compiler (GHC) provides various tools and flags that programmers can use to understand performance at various levels of granularity. Here are two:

Compiling with profiling: This modifies the compilation process to add a profiling system that adds costs to cost centers. Costs are annotations around expressions that completely measure the computational behavior of those expressions. Usually, we are interested in top-level declarations, essentially functions and values that are exported from a module.
Collecting runtime statistics: Adding +RTS -s to a profiled Haskell program makes it show runtime statistics. It’s more coarse than profiling, showing only aggregate statistics about the program, such as total bytes allocated in the heap or bytes copied during garbage collection. After enabling profiling, one can also use the -hT option, which breaks down the heap usage by closure type.

Both of these options can produce human- and machine-readable output for further inspection. For instance, when we compile a program with profiling, we can output JSON that can be displayed in a flamegraph viewer like speedscope. This makes it easy to browse around the data and zoom in to relevant time slices. For runtime statistics, we can use eventlog2html to visualize the heap profile.

Looking at the flamegraph below and others like it led me to conclude that at least from an initial survey, Echidna wasn’t terribly bloated in terms of its memory usage. Indeed, various changes over time have targeted performance directly. (In fact, a Trail of Bits wintern from 2022 found performance issues with its coverage, which were then fixed.) However, notice the large blue regions? That’s hevm, which Echidna uses to evaluate the candidate sequences. Given that Echidna spends the vast majority of its fuzzing time on this task, it makes sense that hevm would take up a lot of computational power. That’s when I turned my attention to looking into performance issues with hevm.

The time use of functions and call stacks in Echidna

Profilers can sometimes be misleading

Profiling is useful, and it helped me find a bug in hevm whose fix led to improved performance in Echidna (which we get to in the next section), but you should also know that it can be misleading.

For example, while profiling hevm, I noticed something unusual. Various optics-related operators (getters and setters) were dominating CPU time and allocations. How could this be? The reason was that the optics library was not properly inlining some of its operators. As a result, if you run this code with profiling enabled, you would see that the % operator takes up the vast majority of allocations and time instead of the increment function, which is actually doing the computation. This isn’t observed when running an optimized binary though, since GHC must have decided to inline the operator anyway. I wrote up this issue in detail and it helped the optics library developers close an issue that was opened last year! This little aside made me realize that I should compile programs with and without profiling enabled going forward to ensure that profiling stays faithful to real-world usage.

Finding my first huge memory leak in hevm

Consider the following program. It repeatedly hashes a number, starting with 0, and writes the hashes somewhere in memory (up to address m). It does this n times.

contract A {
  mapping (uint256 => uint256) public map;
  function myFunction(uint256 n, uint256 m) public {
    uint256 h = 0;
    for (uint i = 0; i < n; i++) {
      uint256 x = h;
      h = uint256(keccak256(abi.encode(h)));
      map[x % m] = h;
    }
  }
}

What should we expect the program to do as we vary the value of n and m? If we hold m fixed and continue increasing the value of n, the memory block up to m should be completely filled. So we should expect that no more memory would be used. This is visualized below:

Holding m fixed and increasing n should eventually fill up m.

Surprisingly, this is not what I observed. The memory used by hevm went up linearly as a function of n and m. So, for some reason, hevm continued to allocate memory even though it should have been reusing it. In fact, this program used so much memory that it could use hundreds of gigabytes of RAM. I wrote up the issue here.

A graph showing allocations growing rapidly

I figured that if this memory issue affects hevm, it would surely affect Echidna as well.

Don't just measure once, measure N times!

Profiling gives you data about time and space for a single run, but that isn't enough to understand what happens as the program runs longer. For example, if you profiled Python’s insertionSort function on arrays with lengths of less than length 20, you might conclude that it's faster than quickSort when asymptotically we know that's not the case.

Similarly, I had some intuition about how "expensive" (from hevm's viewpoint) different Ethereum programs would be, but I didn’t know for sure until I measured the performance of smart contracts running on the EVM. Here's a brief overview of what smart contracts can do and how they interact with the EVM.

The EVM consists of a stack, memory, and storage. The stack is limited to 1024 items. The memory and storage are all initialized to 0 and are indexed by an unsigned 256-bit integer.
Memory is transient and its lifetime is limited to the scope of a transaction, whereas storage persists across transactions.
Contracts can allocate memory in either memory or storage. While writing to storage (persistent blockchain data) is significantly more expensive gas-wise than memory (transient memory per transaction), when we're running a local node we shouldn't expect any performance differences between the two storage types.

I wrote up eight simple smart contracts that would stress these various components. The underlying commonality between all of them is that they were parameterized with a number (n) and are expected to have a linear runtime with respect to that number. Any nonlinear runtime changes would thus indicate outliers. These are the contracts and what they do:

simple_loop: Looping and adding numbers
primes: Calculation and storage of prime numbers
hashes: Repeated hashing
hashmem: Repeated hashing and storage
balanceTransfer: Repeated transferring of 1 wei to an address
funcCall: Repeated function calls
contractCreation: Repeated contract creations
contractCreationMem: Repeated contract creations and memory

You can find their full source code in this file.

I profiled these contracts to collect information on how they perform with a wide range of n values. I increased n by powers of 2 so that the effects would be more noticeable early on. Here's what I saw:

I immediately noticed that something was definitely going on with the hashes and hashmem test cases. If the contracts’ runtimes increased linearly with increases to n, the hashes and hashmem lines wouldn't have crossed the others. How might we try to prove that? Since we know that each point should increase by roughly double (ignoring a constant term), we can simply plot the ratios of the runtimes from one point to the next and draw a line indicating what we should expect.

Bingo. hashes and hashmem were clearly off the baseline. I then directed my efforts toward profiling those specific examples and looking at any code that they depend on. After additional profiling, it seemed that repeatedly splicing and resplicing immutable bytearrays (to simulate how writes would work in a contract) caused the bytearray-related memory type to explode in size. In essence, hevm was not properly discarding the old versions of the memory.

ST to the rescue!

The fix was conceptually simple and, fortunately, had already been proposed months previously by my manager, Artur Cygan. First, we changed how hevm handles the state in EVM computations:

- type EVM a = State VM a
+ type EVM s a = StateT (VM s) (ST s) a

Then, we went through all the places where hevm deals with EVM memory and implemented a mutable vector that can be modified in place(!) How does this work? In Haskell, computations that manipulate a notion of state are encapsulated in a State monad, but there are no guarantees that only a single memory copy of that state will be there during program execution. Using the ST monad instead allowed us to ensure that the internal state used by the computation is inaccessible to the rest of the program. That way, hevm can get away with destructively updating the state while still treating the program as purely functional.

Here’s what the graphs look like after the PR. The slowdown in the last test case is now around 3 instead of 5.5, and in terms of actual runtime, the linearity is much more apparent. Nice!

Epilogue: Concrete or symbolic?

In the last few weeks of my associate program, I ran more detailed profilings with provenance information. Now we truly get x-ray vision into exactly where memory is being allocated in the program:

A detailed heap profile showing which data constructors use the most memory

What’s with all the Prop terms being generated? hevm has support for symbolic execution, which allows for various forms of static analysis. However, Echidna only ever uses the fully concrete execution. As a result, we never touch the constraints that hevm is generating. This is left for future work, which will hopefully lead to a solution in which hevm can support a more optimized concrete-only mode without compromising on its symbolic aspects.

Final thoughts

In a software project like Echidna, whose effectiveness is proportional to how quickly it can perform its fuzzing, we’re always looking for ways to make it faster without making the code needlessly complex. Doing performance engineering in a setting like Haskell reveals some interesting problems and definitely requires one to be ready to drop down and reason about the behavior of the compilation process and language semantics. It is an art as old as computer science itself.

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

— Donald Knuth

The life and times of an Abstract Syntax Tree

By Francesco Bertolaccini

You’ve reached computer programming nirvana. Your journey has led you down many paths, including believing that God wrote the universe in LISP, but now the truth is clear in your mind: every problem can be solved by writing one more compiler.

It’s true. Even our soon-to-be artificially intelligent overlords are nothing but compilers, just as the legends foretold. That smart contract you’ve been writing for your revolutionary DeFi platform? It’s going through a compiler at some point.

Now that we’ve established that every program should contain at least one compiler if it doesn’t already, let’s talk about how one should go about writing one. As it turns out, this is a pretty vast topic, and it’s unlikely I’d be able to fit a thorough disquisition on the subject in the margin of this blog post. Instead, I’m going to concentrate on the topic of Abstract Syntax Trees (ASTs).

In the past, I’ve worked on a decompiler that turns LLVM bitcode into Clang ASTs, and that has made me into someone with opinions about them. These are opinions on the things they don’t teach you in school, like: what should the API for an AST look like? And how should it be laid out in memory? When designing a component from scratch, we must consider those aspects that go beyond its mere functionality—I guess you could call these aspects “pragmatics.” Let’s go over a few of them so that if you ever find yourself working with ASTs in the future, you may skip the more head-scratching bits and go straight to solving more cogent problems!

What are ASTs?

On their own, ASTs are not a very interesting part of a compiler. They are mostly there to translate the dreadful stream of characters we receive as input into a more palatable format for further compiler shenanigans. Yet the way ASTs are designed can make a difference when working on a compiler. Let’s investigate how.

Managing the unmanageable

If you’re working in a managed language like C# or Java, one with a garbage collector and a very OOP type system, your AST nodes are most likely going to look something like this:

class Expr {}
class IntConstant : Expr {
    int value;
}
class BinExpr : Expr {
    public Expr lhs;
    public Expr rhs;
}

This is fine—it serves the purpose well, and the model is clear: since all of the memory is managed by the runtime, ownership of the nodes is not really that important. At the end of the day, those nodes are not going anywhere until everyone is done with them and the GC determines that they are no longer reachable.

(As an aside, I’ll be making these kinds of examples throughout the post; they are not meant to be compilable, only to provide the general idea of what I’m talking about.)

I typically don’t use C# or Java when working on compilers, though. I’m a C++ troglodyte, meaning I like keeping my footguns cocked and loaded at all times: since there is no garbage collector around to clean up after the mess I leave behind, I need to think deeply about who owns each and every one of those nodes.

Let’s try and mimic what was happening in the managed case.

The naive approach

struct Expr {
    virtual ~Expr();
};
struct IntConstant : Expr {
    int value;
};
struct BinExpr : Expr {
    std::shared_ptr lhs;
    std::shared_ptr rhs;
};

Shared pointers in C++ use reference counting (which one could argue is a form of automatic garbage collection), which means that the end result is similar to what we had in Java and C#: each node is guaranteed to stay valid at least until the last object holding a reference to it is alive.

That at least in the previous sentence is key: if this was an Abstract Syntax Graph instead of an Abstract Syntax Tree, we’d quickly find ourselves in a situation where nodes would get stuck in a limbo of life detached from material reality, a series of nodes pointing at each other in a circle, forever waiting for someone else to die before they can finally find their eternal rest as well.

Again, this is a purely academic possibility since a tree is by definition acyclic, but it’s still something to keep in mind.

I don’t know Rust that well, but it is my understanding that a layout roughly equivalent to the one above would be written like this:

enum Expr {
    IntConstant(i32),
    BinExpr(Arc<Expr>, Arc<Expr>)
}

When using this representation, your compiler will typically hold a reference to a root node that causes the whole pyramid of nodes to keep standing. Once that reference is gone, the rest of the nodes follow suit.

Unfortunately, each pointer introduces additional computation and memory consumption due to its usage of an atomic reference counter. Technically, one could avoid the “atomic” part in the Rust example by using Rc instead of Arc, but there’s no equivalent of that in C++ and my example would not work as well. In my experience, it’s quite easy to do away with the ceremony of making each node hold a reference count altogether, and instead decide on a more disciplined approach to ownership.

The “reverse pyramid” approach

struct Expr {
    virtual ~Expr();
};
struct IntConstant : Expr {
    int value;
};
struct BinExpr : Expr {
    std::unique_ptr lhs;
    std::unique_ptr rhs;
};

Using unique pointers frees us from the responsibility of keeping track of when to free memory without adding the overhead of reference counting. While it’s not possible for multiple nodes to have an owning reference to the same node, it’s still possible to express cyclic data structures by dereferencing the unique pointer and storing a reference instead. This is (very) roughly equivalent to using std::weak_ptr with shared pointers.

Just like in the naive approach, destroying the root node of the AST will cause all of the other nodes to be destroyed with it. The difference is that in this case we are guaranteed that this will happen, because every child node is owned by their parent and no other owning reference is possible.

I believe this representation is roughly equivalent to this Rust snippet:

enum Expr {
    IntConstant(i32),
    BinExpr(Box<Expr>, Box<Expr>)
}

Excursus: improving the API

We are getting pretty close to what I’d call the ideal representation, but one thing I like to do is to make my data structures as immutable as possible.

BinExpr would probably look like this if I were to implement it in an actual codebase:

class BinExpr : Expr {
    std::unique_ptr lhs, rhs;
public:
    BinExpr(std::unique_ptr lhs, std::unique_ptr rhs)
        : lhs(std::move(lhs))
        , rhs(std::move(rhs)) {}   
    const Expr& get_lhs() const { return *lhs; }
    const Expr& get_rhs() const { return *rhs; }
};

This to me signals a few things:

Nodes are immutable.
Nodes can’t be null.
Nodes can’t be moved; their owner is fixed.

Removing the safeguards

The next step is to see how we can improve things by removing some of the safeguards that we’ve used so far, without completely shooting ourselves in the foot. I will not provide snippets on how to implement these approaches in Rust because last time I asked how to do that in my company’s Slack channel, the responses I received were something like “don’t” and “why would you do that?” and “someone please call security.” It should not have been a surprise, as an AST is basically a linked list with extra steps, and Rust hates linked lists.

Up until now, the general idea has been that nodes own other nodes. This makes it quite easy to handle the AST safely because the nodes are self-contained.

What if we decided to transfer the ownership of the nodes to some other entity? It is, after all, quite reasonable to have some sort of ASTContext object we can assume to handle the lifetime of our nodes, similar to what happens in Clang.

Let’s start by changing the appearance of our Expr nodes:

struct BinExpr : Expr {
    const Expr& lhs;
    const Expr& rhs;
};

Now we create a storage for all of our nodes:

vector<unique_ptr> node_storage;
auto &lhs = node_storage.emplace_back(make_unique(...));
auto &rhs = node_storage.emplace_back(make_unique(...));
auto &binexp = node_storage.emplace_back(make_unique(*lhs, *rhs));

Nice! node_storage is now the owner of all the nodes, and we can iterate over them without having to do a tree visit. In fact, go watch this talk about the design of the Carbon compiler, about 37 minutes in: if you keep your pattern of creating nodes predictable, you end up with a storage container that’s already sorted in, e.g., post-visit order!

Variants on a theme

Let’s now borrow a trick from Rust’s book: the Expr class I’ve been using up until this point is an old-school case of polymorphism via inheritance. While I do believe inheritance has its place and in many cases should be the preferred solution, I do think that ASTs are one of the places where discriminated unions are the way to go.

Rust calls discriminated unions enum, whereas C++17 calls them std::variant. While the substance is the same, the ergonomics are not: Rust has first class support for them in its syntax, whereas C++ makes its users do template metaprogramming tricks in order to use them, even though they do not necessarily realize it.

The one feature I’m most interested in for going with variant instead of inheritance is that it turns our AST objects into “value types,” allowing us to store Expr objects directly instead of having to go through an indirection via a reference or pointer. This will be important in a moment.

The other feature that this model unlocks is that we get the Visitor pattern implemented for free, and we can figure out exactly what kind of node a certain value is holding without having to invent our own dynamic type casting system. Looking at you, LLVM. And Clang. And MLIR.

Going off the rails

Let’s take a look back at an example I made earlier:

vector<unique_ptr> node_storage;
auto &lhs = node_storage.emplace_back(make_unique(...));
auto &rhs = node_storage.emplace_back(make_unique(...));
auto &binexp = node_storage.emplace_back(make_unique(*lhs, *rhs));

There’s one thing that bothers me about this: double indirection, and noncontiguous memory allocation. Think of what the memory layout for this storage mechanism looks like: the vector will have a contiguous chunk of memory allocated for storing pointers to all of the nodes, then each pointer will have an associated chunk of memory the size of a node which, as mentioned earlier, varies for each kind of node.

What this means is that our nodes, even if allocated sequentially, have the potential to end up scattered all over the place. They say early optimization is the root of all evil, but for the sake of exhausting all of the tricks I have up my sleeve, I’ll go ahead and show a way to avoid this.

Let’s start by doing what I said I’d do earlier, and use variant for our nodes:

struct IntConstant;
struct BinExpr;
using Expr = std::variant<IntConstant, BinExpr>;


struct IntConstant {
    int value;
};
struct BinExpr  {
    Expr &lhs;
    Expr &rhs;
};

Now that each and every node has the same size, we can finally store them contiguously in memory:

std::vector node_storage;
node_storage.reserve(max_num_nodes);
auto &lhs = node_storage.emplace_back(IntConstant{3});
auto &rhs = node_storage.emplace_back(IntConstant{4});
auto &binexp = node_storage.emplace_back(BinExpr{lhs, rhs});

You see that node_storage.reserve call? That’s not an optimization—that is an absolutely load-bearing part of this mechanism.

I want to make it absolutely clear that what’s happening here is the kind of thing C++ gets hate for. This is a proverbial gun that, should you choose to use it, will be strapped at your hip pointed at your foot, fully loaded and ready to blow your leg off if at any point you forget it’s there.

The reason we’re using reserve in this case is that we want to make sure that all of the memory we will potentially use for storing our nodes is allocated ahead of time, so that when we use emplace_back to place a node inside of it, we are guaranteed that that chunk of memory will not get reallocated and change address. (If that were to happen, any of our nodes that contain references to other nodes would end up pointing to garbage, and demons would start flying out of your nose.)

Using vector and reserve is of course not the only way to do this: using an std::array is also valid if the maximum number of nodes you are going to use is known at compile time.

Ah yes, max_num_nodes. How do you compute what that is going to be? There’s no single good answer to this question, but you can find decent heuristics for it. For example, let’s say you are parsing C: the smallest statement I can think of would probably look something like a;, or even more extremely, just a. We can deduce that, if we want to be extremely safe, we could allocate storage for a number of nodes equal to the amount of characters in the source code we’re parsing. Considering that most programs will not be anywhere close to this level of pathological behavior, it’s reasonable to expect that most of that memory will be wasted. Unfortunately, we can’t easily reclaim that wasted memory with a simple call to shrink_to_fit, as that can cause a reallocation.

The technique you can use in that case, or in the case where you absolutely cannot avoid allocating additional memory, is to actually do a deep clone of the AST, visiting each node and painstakingly creating a new counterpart for it in the new container.

One thing to keep in mind, when storing your AST nodes like this, is that the size of each node will now be equal to the size of the largest representable node. I don’t think that this matters that much, since you should try and keep all of your nodes as small as possible anyway, but it’s still worth thinking about.

Of course, it might be the case that you don’t actually need to extract the last drop of performance and memory efficiency out of your AST, and you may be willing to trade some of those in exchange for some ease of use. I can think of three ways of achieving this:

Use std::list.
Use std::deque.
Use indices instead of raw pointers.

Let’s go through each of these options one at a time.

Use std::list instead of std::vector

Don’t. ‘Nuff said.

Alright, fine. I’ll elaborate.

Linked lists were fine in the time when the “random access” part of RAM was not a lie yet and memory access patterns didn’t matter. Using a linked list for storing your nodes is just undoing all of the effort we’ve gone through to optimize our layout.

Use std::deque instead of std::vector

This method is already better! Since we’ll mostly just append nodes to the end of our node storage container, and since a double-ended queue guarantees that doing so is possible without invalidating the addresses of any existing contents, this looks like a very good compromise.

Unfortunately the memory layout won’t be completely contiguous anymore, but you may not care about that. If you are using Microsoft’s STL, though, you have even bigger issues ahead of you.

Use indices instead of raw pointers

The idea is that instead of storing the pointer of a child node, you store the index of that node inside of the vector. This adds a layer of indirection back into the picture, and you now also have to figure out what vector does this index refer to? Do you store a reference to the vector inside each node? That’s a bit of a waste. Do you store it globally? That’s a bit icky, if you ask me.

Parting thoughts

I’ve already written a lot and I’ve barely scratched the surface of the kind of decisions a designer will have to make when writing a compiler. I’ve talked about how you could store your AST in memory, but I’ve said nothing about what you want to store in your AST.

The overarching theme in this exhilarating overview is that there’s a lot about compilers that goes beyond parsing, and all of the abstract ideas needed to build a compiler need concretizing at some point, and the details on how you go about doing that matter. I also feel obligated to mention two maxims one should keep in mind when playing this sort of game: premature optimization is the root of all evil, and always profile your code—it’s likely that your codebase contains lower-hanging fruit you can pick before deciding to fine-tune your AST storage.

It’s interesting that most of the techniques I’ve shown in this article are not easily accessible with managed languages. Does this mean that all of this doesn’t really matter, or do compilers written in those languages (I’m thinking of, e.g., Roslyn) leave performance on the table? If so, what’s the significance of that performance?

Finally, I wanted this post to start a discussion about the internals of compilers and compiler-like tools: what do these often highly complex pieces of software hide beneath their surface? It’s easy to find material about the general ideas regarding compilation—tokenization, parsing, register allocation—but less so about the clever ideas people come up with when writing programs that need to deal with huge codebases in a fast and memory-efficient manner. If anyone has war stories to share, I want to hear them!

Curvance: Invariants unleashed

By Nat Chin

Welcome to our deep dive into the world of invariant development with Curvance.

We’ve been building invariants as part of regular code review assessments for more than 6 years now, but our work with Curvance marks our very first official invariant development project, in which developing and testing invariants is all we did.

Over the nine-week engagement, we wrote and tested 216 invariants, which helped us uncover 13 critical findings. We also found opportunities to significantly enhance our tools, including advanced trace printing and corpus preservation. This project was a journey of navigating learning curves and accomplishing technological feats, and this post will highlight our collaborative efforts and the essential role of teamwork in helping us meet the challenge. And yes, we’ll also touch on the brain-cell-testing moments we experienced throughout this project!

A collective “losing it” moment, capturing the challenges of this project

Creating a quality fuzzing suite

The success of a fuzzing suite is grounded in the quality of its invariants. Throughout this project, we focused on fine-tuning each invariant for accuracy and relevance. Fuzzing, in essence, is like having smart monkeys on keyboards testing invariants, whose effectiveness relies heavily on their precision. Our journey with Curvance over nine weeks involved turning in-depth discussions on codebase properties into precise English explanations and then coding them into executable tests, as shown in the screenshots below.

Examples of what our daily discussions looked like to clarify invariants

From the get-go, Chris from Curvance was often available to help clarify the code’s expected behavior and explain Curvance’s design choices. His insights always clarified complex functions and behavior, and he always helped with hands-on debugging and checking our invariants. This engagement was as productive as it was thanks to Chris’s consistent feedback and working alongside us the entire time. Thank you, Curvance!

The tools (and support teams) behind our success

Along with Curvance’s involvement, support from our internal teams behind Echidna, Medusa, and CloudExec helped our project succeed. Their swift responses to issues, especially during extensive rebases and complex debugging, were crucial. The Curvance engagement pushed these tools to their limits, and the solutions we had to come up with for the challenges we faced led to significant enhancements in these tools.

CloudExec proved invaluable for deploying long fuzzing jobs onto DigitalOcean. We integrated it with Echidna and Medusa for prolonged runs, enabling Curvance to easily set up its own future fuzzing runs. We pinpointed areas of improvement for CloudExec, such as its preservation of output data, which you can see on its GitHub issue tracker. We’ve already addressed many of these issues.

Echidna, our property-based fuzzer for Ethereum contracts, was pivotal in falsifying assertions. We first used Echidna in exploration mode to broadly cover the Curvance codebase, and then we moved into assertion mode, using anywhere from 10 million to 100 billion iterations. This intense use of Echidna throughout our nine-week engagement helped us uncover vital areas of improvement for the tool, making it easier for it to debug and retain the state of explored code areas.

Medusa, our geth-based experimental fuzzer, complemented Echidna in its coverage efforts for falsifying invariants on Curvance. Before we could use Medusa for this engagement, we needed to fix a known out-of-memory bug. The fix for this bug—along with fixes implemented in CloudExec to help it better preserve output data—critically improved the tool and helped maximize our coverage of the Curvance code. Immediately after it started running, it found a medium-severity bug in the code that Echidna had missed. (Echidna eventually found this bug after we changed the block time delay, likely due to the fuzzer’s non-determinism.)

Our first Medusa run of 48+ hours resulted in a medium-severity bug.

The long and winding road

While we had the best support from the teams behind our tools and from our client that we could have asked for, we still faced considerable challenges throughout this project—from the need to keep pace with Curvance’s continued development, to the challenges of debugging assertion failures. But by meeting these challenges, we learned important lessons about the nature of invariant development, and we were able to implement crucial upgrades to our tools to improve our process overall.

Racing to keep up with Curvance’s code changes

Changes to the Curvance codebase—like function removals, additions of function parameters, adjustments to arguments and error messages, and renaming of source contracts—often challenged our fuzzing suite by invalidating existing invariants or causing a series of assertion failures. Ultimately, these changes rendered our existing corpus items obsolete and unusable, and we had to rebase our fuzzing suite and revise both existing and new invariants constantly to ensure their continued relevance to the evolving system. This iterative process paralleled the client’s code development, presenting a mix of true positives (actual bugs in the client’s code) and false positives (failures due to incorrect or outdated invariants). Such outcomes emphasized that fuzzing isn’t a static, one-time setup; it demands ongoing maintenance and updates, akin to development of any active codebase.

Understanding the rationale behind each invariant change post-rebase is crucial. Hasty adjustments without fully grasping their implications could inadvertently mask bugs, undermining the effectiveness of the fuzzing suite. Keeping the suite alive and relevant is as vital as the development of the codebase itself. It’s not just about letting the fuzzer run; it’s about maintaining its alignment with the system it tests. Remember, the true power of a fuzzing suite lies in the accuracy of its invariants.

Critical tool upgrades and lessons learned

We had to make a significant rebase after the Lendtroller contract’s name was changed to MarketManager in commit a96dc9a. This change drastically impacted our work, as Echidna had just finished 43 days of running in the cloud using CloudExec. This nonstop execution had allowed Echidna to develop a detailed corpus capable of autonomously tackling complex liquidations. Unfortunately, the change rendered this corpus obsolete, and each corpus item caused Echidna worker threads to crash upon transaction replay. With our setup of 15 workers, it took only 14 more transactions that could not be replayed for all the Echidna workers to crash, halting Echidna entirely:

An Echidna crash resulting from not being able to replay corpus item

Our rebase due to Curvance’s code change led to a significant problem: our fuzzers could no longer access MarketManager functions needed to explore complex state, like posting collateral and borrowing debt. This issue prompted us to make crucial updates to Echidna, specifically to enhance its ability to validate and replay corpus sequences without crashing. We also made updates to Medusa to improve its tracking of corpus health and ability to fix start-up panics. Extended discussions about maintaining a dynamic corpus ensued, with our engineering director stepping in to manually adjust the corpus, offering some relief.

We shifted our strategy to adjust to the new codebase’s lack of coverage. We developed liquidation-specific invariants for the codebase version before the contract name change, while running the updated version in different modes to boost coverage. CloudExec’s new features, like named jobs, improved checkpointing of output directories, and checkpointing for failed jobs, were key in differentiating and managing these runs. Despite all these improvements, we let the old corpus go and chose to integrate setup functions into key contracts to speed up coverage. While effective in increasing coverage, this strategy introduced biases, especially in liquidation scenarios, by relying on static values. This limitation, marked in the codebase with /// @coverage:limitation tags, underscores the importance of broadening input ranges in our stateful tests to ensure comprehensive system exploration.

Trials and tribulations: Debugging

The Curvance invariant development report mainly highlights the results of our debugging without delving into the complex journey of investigation and root cause analysis behind these findings. This part of the process, involving detailed analysis once assertion failures were identified, required significant effort.

Our primary challenge was dissecting long call sequences, often ranging from 9 to 70 transactions, which required deep scrutiny to identify where errors and unexpected values crept in. Some sequences spanned up to 29 million blocks or included time delays exceeding 6 years, adding layers of complexity to our understanding of the system’s behavior. To tackle this, we had to manually insert logs for detailed state information, turning debugging into an exhaustive and manual endeavor:

Echidna’s debugging at the beginning of the engagement

Our ability to manually shrink transaction sequences hinged on our deep understanding of Curvance’s system. This detailed knowledge was critical for us to effectively identify which transactions were essential for uncovering vulnerabilities and which could be discarded. As we gained this deeper insight into the system throughout the project, our ability to streamline transaction sequences improved markedly.

Based on our work with combing through transaction sequences, we implemented a rich reproducer trace feature in Echidna, providing us with detailed traces of the system during execution and elaborate printouts of the system state at each step of the transaction failure sequence. Meanwhile, we also added shrinking limits of transaction sequences to Medusa to fix intermittent assertion failures, and we updated Medusa’s coverage report to increase its readability. The stark difference in Echidna’s trace printing after these updates can easily be seen in the figure below:

Echidna’s call sequences with rich traces at the end of the engagement

Finally, we created corresponding unit tests based on most assertion failures during our engagement. Initially, converting failures to unit tests was manual and time-consuming, but by the end, we streamlined the process to take just half an hour. We used the insights we gained from this experience to develop fuzz-utils, an automated tool for converting assertion failures into unit tests. Although it’s yet to be extensively tested, its potential for future engagements excites us!

One lock too many: The story behind TOB-CURV-4

After a significant change to the Curvance codebase, we encountered a puzzling assertion failure. Initially, we suspected it might be a false positive, a common occurrence with major code changes. However, after checking the changes in the Curvance source code, the root cause wasn’t immediately apparent, leading us into a complex and thorough debugging process.

We analyzed the full reproducer traces in Echidna (an Echidna feature that was added during this engagement, as mentioned in the previous section), and we tested assumptions on different senders. We crafted and executed a series of unit tests, each iteration shedding more light on the underlying mechanics. It was time to zoom out to identify the commonalities in the functions involved in the new assertion failures, leading us to focus on the processExpiredLock function. By closely scrutinizing this function, we discovered an important assertion was missing: ensuring the number of user locks stays the same after a call to the function with the “relock” option.

When we reran the fuzzer, it immediately revealed the error: such a call would process the expired lock but incorrectly grant the user a new lock without removing the old one, leading to an unexpected increase in the total number of locks. This caused all forms of issues in the combineAllLocks function: the contracts always thought the user had one more lock than they actually had. Eureka!

This trace shows the increase in the number of user locks after the expired lock is processed:

The trace for the increase in user locks, provided in the full invariant development report in finding TOB-CURV-4

What made this finding particularly striking was its ability to elude detection through the various security reviews and tests. The unit tests, as it turned out, were checking an incorrect postcondition, concealing the bug in its checks, masking its error within the testing suite. The stateless fuzzing tests on this codebase (built by Curvance before this engagement) actually started to fail after this bug was fixed. This highlighted the necessity of not only complex and meticulous testing that validates every aspect of the codebase, but also of continually questioning and validating every aspect of the target code—and its tests.

What’s next?

Reflecting on our journey with Curvance, we recognize the importance of a comprehensive security toolkit for smart contracts, including unit, integration, and fuzz tests, to uncover system complexities and potential issues. Our collaboration over the past nine weeks has allowed us to meticulously examine and understand the system, enhancing our findings’ accuracy and deepening our mutual knowledge. Working closely with Curvance has proven crucial in revealing the technology’s intricacies, leading to the development of a stateful fuzzing suite that will evolve and expand with the code, ensuring continued security and insights.

Take a look at our findings in the public Curvance report! Or dive into the Curvance fuzzing suite, now open through the Cantina Competition! Simply download and unzip corpus.zip into the curvance/ directory, then run make el for Echidna or make ml for Medusa. We’ve designed it for ease of use and expansion. Encounter any issues? Let us know! For detailed instructions and suite extension tips, check the Curvance-CantinaCompetition README and keep an eye out for the /// @custom:limitation tags in the suite.

And if you’re developing a project and want to explore stateful fuzzing, we’d love to chat with you!

Announcing two new LMS libraries

By Will Song

The Trail of Bits cryptography team is pleased to announce the open-sourcing of our pure Rust and Go implementations of Leighton-Micali Hash-Based Signatures (LMS), a well-studied NIST-standardized post-quantum digital signature algorithm. If you or your organization are looking to transition to post-quantum support for digital signatures, both of these implementations have been engineered and reviewed by several of our cryptographers, so please give them a try!

For the Rust codebase, we’ve worked with the RustCrypto team to integrate our implementation into the RustCrypto/signatures repository so that it can immediately be used with their ecosystem once the crate is published.

Our Go implementation was funded by Hewlett Packard Enterprise (HPE), as part of a larger post-quantum readiness effort within the Sigstore ecosystem. We’d like to thank HPE and Tim Pletcher in particular for supporting and collaborating on this high-impact work!

LMS: A stateful post-quantum signature scheme

LMS is a stateful hash-based signature scheme that was standardized in 2019 with RFC 8554 and subsequently adopted into the federal information processing standards in 2020. These algorithms are carefully designed to resist quantum computer attacks, which could threaten conventional algebraic signature schemes like RSA and ECDSA. Unlike other post-quantum signature designs, LMS was standardized before NIST’s large post-quantum cryptography standardization program was completed. LMS has been studied for years and its security bounds are well understood, so it was not surprising that these schemes were selected and standardized in a relatively short time frame (at least compared to the other standards).

Like other post-quantum signature schemes, LMS is a hash-based scheme, relying only on the security of a collision-resistant hash function such as SHA256. Hash-based signature schemes have much longer signatures than lattice-based signature schemes, which were recently standardized by NIST, but they are simpler to implement and require fewer novel cryptographic assumptions. This is the primary reason we chose to develop hash-based signatures first.

Unlike any signature algorithm in common usage today, LMS is a stateful scheme. The signer must track how many messages have been signed with a key, incrementing the counter with each new signature. If the private key is used more than once with the same counter value, an attacker can combine the two signatures to forge signatures on new messages. This is analogous to a nonce-reuse attack in encryption schemes like AES-GCM.

If it’s not immediately obvious, requiring this state also severely limits these schemes’ usability and security. For instance, this makes storing your private key (and its state) to some sort of persisted storage (which is usually typical for secret keys) incredibly risky, as this introduces the possibility of an old state being reused, especially for multi-threaded applications. This is why NIST makes the following warning in their standard:

Stateful hash-based signature schemes are secure against the development of quantum computers, but they are not suitable for general use because their security depends on careful state management. They are most appropriate for applications in which the use of the private key may be carefully controlled and where there is a need to transition to a post-quantum secure digital signature scheme before the post-quantum cryptography standardization process has completed.

The main benefit of a stateful algorithm like LMS over a stateless hash-based signature like SLH-DSA (SPHINCS+) is significantly shorter signature sizes: a signature with LMS is around 4KB, while a signature with SLH-DSA at a similar security level is closer to 40KB. The downside is that stateful schemes like LMS cannot easily be plugged into existing applications. Managing the private state in a signature scheme makes integration into higher-level applications complex and prone to subtle and dangerous security flaws. However, a carefully managed environment for code signing is an excellent place to test stateful post quantum signatures in the real world, and we feel that Sigstore effectively meets the NIST requirement.

RustCrypto implementation

Our Rust implementation is no-std capable and does not require heap allocations, in addition to being fully compatible with the currently available digest and signature crates. In particular, we implement the SignerMut and Verifier traits on private and public keys, respectively.

The goal of our work is to provide a more strongly typed alternative to the pre-existing implementation while also not over-allocating memory. While ArrayVec is a suitable alternative to the headaches caused by generics and GenericArray, at the cost of slightly higher memory requirements in certain cases of signatures, it does introduce an additional crate dependency that did not previously exist, which we wanted to avoid. Currently, in our implementation, both signatures and keys must know their LMS parameters before being able to deserialize and verify signatures. This should be sufficient for most use cases, but if unknown parameters must be used, it is not too difficult to hack together an enum that covers all potential algorithm types and uses the correct TryFrom implementation once the algorithm type is parsed.

Go implementation

Our Go implementation, on the other hand, is less picky. We were asked to build an LMS implementation for Sigstore, which is a more controlled environment and does not have the same restrictions that the general RustCrypto implementation assumes. Because of this, our implementation uses some small heap allocations to keep track of some variable length data, such as the number of hashes in a private key. Go is a less-clever language than Rust, which means we cannot really parameterize it over the various LMS modes, so some additional work needs to be done at a few call sites to re-specify the LMS parameters.

5 reasons to strive for better disclosure processes

By Max Ammann

This blog showcases five examples of real-world vulnerabilities that we’ve disclosed in the past year (but have not publicly disclosed before). We also share the frustrations we faced in disclosing them to illustrate the need for effective disclosure processes.

Here are the five bugs:

Discovering a vulnerability in an open-source project necessitates a careful approach, as publicly reporting it (also known as full disclosure) can alert attackers before a fix is ready. Coordinated vulnerability disclosure (CVD) uses a safer, structured reporting framework to minimize risks. Our five example cases demonstrate how the lack of a CVD process unnecessarily complicated reporting these bugs and ensuring their remediation in a timely manner.

In the Takeaways section, we show you how to set up your project for success by providing a basic security policy you can use and walking you through a streamlined disclosure process called GitHub private reporting. GitHub’s feature has several benefits:

Discreet and secure alerts to developers: no need for PGP-encrypted emails
Streamlined process: no playing hide-and-seek with company email addresses
Simple CVE issuance: no need to file a CVE form at MITRE

Time for action: If you own well-known projects on GitHub, use private reporting today! Read more on Configuring private vulnerability reporting for a repository, or skip to the Takeaways section of this post.

Case 1: Undefined behavior in borsh-rs Rust library

The first case, and reason for implementing a thorough security policy, concerned a bug in a cryptographic serialization library called borsh-rs that was not fixed for two years.

During an audit, I discovered unsafe Rust code that could cause undefined behavior if used with zero-sized types that don’t implement the Copy trait. Even though somebody else reported this bug previously, it was left unfixed because it was unclear to the developers how to avoid the undefined behavior in the code and keep the same properties (e.g., resistance against a DoS attack). During that time, the library’s users were not informed about the bug.

The whole process could have been streamlined using GitHub’s private reporting feature. If project developers cannot address a vulnerability when it is reported privately, they can still notify Dependabot users about it with a single click. Releasing an actual fix is optional when reporting vulnerabilities privately on GitHub.

I reached out to the borsh-rs developers about notifying users while there was no fix available. The developers decided that it was best to notify users because only certain uses of the library caused undefined behavior. We filed the notification RUSTSEC-2023-0033, which created a GitHub advisory. A few months later, the developers fixed the bug, and the major release 1.0.0 was published. I then updated the RustSec advisory to reflect that it was fixed.

The following code contained the bug that caused undefined behavior:

impl<T> BorshDeserialize for Vec<T>
where
    T: BorshDeserialize,
{
    #[inline]
    fn deserialize<R: Read>(reader: &mut R) -> Result<Self, Error> {
        let len = u32::deserialize(reader)?;
        if size_of::<T>() == 0 {
            let mut result = Vec::new();
            result.push(T::deserialize(reader)?);

            let p = result.as_mut_ptr();
            unsafe {
                forget(result);
                let len = len as usize;
                let result = Vec::from_raw_parts(p, len, len);
                Ok(result)
            }
        } else {
            // TODO(16): return capacity allocation when we can safely do that.
            let mut result = Vec::with_capacity(hint::cautious::<T>(len));
            for _ in 0..len {
                result.push(T::deserialize(reader)?);
            }
            Ok(result)
        }
    }
}

Figure 1: Use of unsafe Rust (borsh-rs/borsh-rs/borsh/src/de/mod.rs#123–150)

The code in figure 1 deserializes bytes to a vector of some generic data type T. If the type T is a zero-sized type, then unsafe Rust code is executed. The code first reads the requested length for the vector as u32. After that, the code allocates an empty Vec type. Then it pushes a single instance of T into it. Later, it temporarily leaks the memory of the just-allocated Vec by calling the forget function and reconstructs it by setting the length and capacity of Vec to the requested length. As a result, the unsafe Rust code assumes that T is copyable.

The unsafe Rust code protects against a DoS attack where the deserialized in-memory representation is significantly larger than the serialized on-disk representation. The attack works by setting the vector length to a large number and using zero-sized types. An instance of this bug is described in our blog post Billion times emptiness.

Case 2: DoS vector in Rust libraries for parsing the Ethereum ABI

In July, I disclosed multiple DoS vulnerabilities in four Ethereum API–parsing libraries, which were difficult to report because I had to reach out to multiple parties.

The bug affected four GitHub-hosted projects. Only the Python project eth_abi had GitHub private reporting enabled. For the other three projects (ethabi, alloy-rs, and ethereumjs-abi), I had to research who was maintaining them, which can be error-prone. For instance, I had to resort to the trick of getting email addresses from maintainers by appending the suffix .patch to GitHub commit URLs. The following link shows the non-work email address I used for committing:

https://github.com/trailofbits/publications/commit/a2ab5a1cab59b52c4fa
71b40dae1f597bc063bdf.patch

In summary, as the group of affected vendors grows, the burden on the reporter grows as well. Because you typically need to synchronize between vendors, the effort does not grow linearly but exponentially. Having more projects use the GitHub private reporting feature, a security policy with contact information, or simply an email in the README file would streamline communication and reduce effort.

Read more about the technical details of this bug in the blog post Billion times emptiness.

Case 3: Missing limit on authentication tag length in Expo

In late 2022, Joop van de Pol, a security engineer at Trail of Bits, discovered a cryptographic vulnerability in expo-secure-store. In this case, the vendor, Expo, failed to follow up with us about whether they acknowledged or had fixed the bug, which left us in the dark. Even worse, trying to follow up with the vendor consumed a lot of time that could have been spent finding more bugs in open-source software.

When we initially emailed Expo about the vulnerability through the email address listed on its GitHub, secure@expo.io, an Expo employee responded within one day and confirmed that they would forward the report to their technical team. However, after that response, we never heard back from Expo despite two gentle reminders over the course of a year.

Unfortunately, Expo did not allow private reporting through GitHub, so the email was the only contact address we had.

Now to the specifics of the bug: on Android above API level 23, SecureStore uses AES-GCM keys from the KeyStore to encrypt stored values. During encryption, the tag length and initialization vector (IV) are generated by the underlying Java crypto library as part of the Cipher class and are stored with the ciphertext:

/* package */ JSONObject createEncryptedItem(Promise promise, String plaintextValue, Cipher cipher, GCMParameterSpec gcmSpec, PostEncryptionCallback postEncryptionCallback) throws GeneralSecurityException, JSONException {

  byte[] plaintextBytes = plaintextValue.getBytes(StandardCharsets.UTF_8);
  byte[] ciphertextBytes = cipher.doFinal(plaintextBytes);
  String ciphertext = Base64.encodeToString(ciphertextBytes, Base64.NO_WRAP);

  String ivString = Base64.encodeToString(gcmSpec.getIV(), Base64.NO_WRAP);
  int authenticationTagLength = gcmSpec.getTLen();

  JSONObject result = new JSONObject()
    .put(CIPHERTEXT_PROPERTY, ciphertext)
    .put(IV_PROPERTY, ivString)
    .put(GCM_AUTHENTICATION_TAG_LENGTH_PROPERTY, authenticationTagLength);

  postEncryptionCallback.run(promise, result);

  return result;
}

Figure 2: Code for encrypting an item in the store, where the tag length is stored next to the cipher text (SecureStoreModule.java)

For decryption, the ciphertext, tag length, and IV are read and then decrypted using the AES-GCM key from the KeyStore.

An attacker with access to the storage can change an existing AES-GCM ciphertext to have a shorter authentication tag. Depending on the underlying Java cryptographic service provider implementation, the minimum tag length is 32 bits in the best case (this is the minimum allowed by the NIST specification), but it could be even lower (e.g., 8 bits or even 1 bit) in the worst case. So in the best case, the attacker has a small but non-negligible probability that the same tag will be accepted for a modified ciphertext, but in the worst case, this probability can be substantial. In either case, the success probability grows depending on the number of ciphertext blocks. Also, both repeated decryption failures and successes will eventually disclose the authentication key. For details on how this attack may be performed, see Authentication weaknesses in GCM from NIST.

From a cryptographic point of view, this is an issue. However, due to the required storage access, it may be difficult to exploit this issue in practice. Based on our findings, we recommended fixing the tag length to 128 bits instead of writing it to storage and reading it from there.

The story would have ended here since we didn’t receive any responses from Expo after the initial exchange. But in our second email reminder, we mentioned that we were going to publicly disclose this issue. One week later, the bug was silently fixed by limiting the minimum tag length to 96 bits. Practically, 96 bits offers sufficient security. However, there is also no reason not to go with the higher 128 bits.

The fix was created exactly one week after our last reminder. We suspect that our previous email reminder led to the fix, but we don’t know for sure. Unfortunately, we were never credited appropriately.

Case 4: DoS vector in the num-bigint Rust library

In July 2023, Sam Moelius, a security engineer at Trail of Bits, encountered a DoS vector in the well-known num-bigint Rust library. Even though the disclosure through email worked very well, users were never informed about this bug through, for example, a GitHub advisory or CVE.

The num-bigint project is hosted on GitHub, but GitHub private reporting is not set up, so there was no quick way for the library author or us to create an advisory. Sam reported this bug to the developer of num-bigint by sending an email. But finding the developer’s email is error-prone and takes time. Instead of sending the bug report directly, you must first confirm that you’ve reached the correct person via email and only then send out the bug details. With GitHub private reporting or a security policy in the repository, the channel to send vulnerabilities through would be clear.

But now let’s discuss the vulnerability itself. The library implements very large integers that no longer fit into primitive data types like i128. On top of that, the library can also serialize and deserialize those data types. The vulnerability Sam discovered was hidden in that serialization feature. Specifically, the library can crash due to large memory consumption or if the requested memory allocation is too large and fails.

The num-bigint types implement traits from Serde. This means that any type in the crate can be serialized and deserialized using an arbitrary file format like JSON or the binary format used by the bincode crate. The following example program shows how to use this deserialization feature:

use num_bigint::BigUint;
use std::io::Read;

fn main() -> std::io::Result<()> {
    let mut buf = Vec::new();
    let _ = std::io::stdin().read_to_end(&mut buf)?;
    let _: BigUint = bincode::deserialize(&buf).unwrap_or_default();
    Ok(())
}

Figure 3: Example deserialization format

It turns out that certain inputs cause the above program to crash. This is because implementing the Visitor trait uses untrusted user input to allocate a specific vector capacity. The following figure shows the lines that can cause the program to crash with the message memory allocation of 2893606913523067072 bytes failed.

impl<'de> Visitor<'de> for U32Visitor {
    type Value = BigUint;

    {...omitted for brevity...}

    #[cfg(not(u64_digit))]
    fn visit_seq<S>(self, mut seq: S) -> Result<Self::Value, S::Error>
    where
        S: SeqAccess<'de>,
    {
        let len = seq.size_hint().unwrap_or(0);
        let mut data = Vec::with_capacity(len);

        {...omitted for brevity...}
    }

    #[cfg(u64_digit)]
    fn visit_seq<S>(self, mut seq: S) -> Result<Self::Value, S::Error>
    where
        S: SeqAccess<'de>,
    {
        use crate::big_digit::BigDigit;
        use num_integer::Integer;

        let u32_len = seq.size_hint().unwrap_or(0);
        let len = Integer::div_ceil(&u32_len, &2);
        let mut data = Vec::with_capacity(len);

        {...omitted for brevity...}
    }
}

Figure 4: Code that allocates memory based on user input (num-bigint/src/biguint/serde.rs#61–108)

We initially contacted the author on July 20, 2023, and the bug was fixed in commit 44c87c1 on August 22, 2023. The fixed version was released the next day as 0.4.4.

Case 5: Insertion of MMKV database encryption key into Android system log with react-native-mmkv

The last case concerns the disclosure of a plaintext encryption key in the react-native-mmkv library, which was fixed in September 2023. During a secure code review for a client, I discovered a commit that fixed an untracked vulnerability in a critical dependency. Because there was no security advisory or CVE ID, neither I nor the client were informed about the vulnerability. The lack of vulnerability management caused a situation where attackers knew about a vulnerability, but users were left in the dark.

During the client engagement, I wanted to validate how the encryption key was used and handled. The commit fix: Don’t leak encryption key in logs in the react-native-mmkv library caught my attention. The following code shows the problematic log statement:

MmkvHostObject::MmkvHostObject(const std::string& instanceId, std::string path,
                               std::string cryptKey) {
  __android_log_print(ANDROID_LOG_INFO, "RNMMKV",
                      "Creating MMKV instance \"%s\"... (Path: %s, Encryption-Key: %s)",
                      instanceId.c_str(), path.c_str(), cryptKey.c_str());
  std::string* pathPtr = path.size() > 0 ? &path : nullptr;
  {...omitted for brevity...}

Figure 5: Code that initializes MMKV and also logs the encryption key

Before that fix, the encryption key I was investigating was printed in plaintext to the Android system log. This breaks the threat model because this encryption key should not be extractable from the device, even with Android debugging features enabled.

With the client’s agreement, I notified the author of react-native-mmkv, and the author and I concluded that the library users should be informed about the vulnerability. So the author enabled private reporting and together we published a GitHub advisory. The ID CVE-2024-21668 was assigned to the bug. The advisory now alerts developers if they use a vulnerable version of react-native-mmkv when running npm audit or npm install.

This case highlights that there is basically no way around GitHub advisories when it comes to npm packages. The only way to feed the output of the npm audit command is to create a GitHub advisory. Using private reporting streamlines that process.

Takeaways

GitHub’s private reporting feature contributes to securing the software ecosystem. If used correctly, the feature saves time for vulnerability reporters and software maintainers. The biggest impact of private reporting is that it is linked to the GitHub advisory database—a link that is missing, for example, when using confidential issues in GitLab. With GitHub’s private reporting feature, there is now a process for security researchers to publish to that database (with the approval of the repository maintainers).

The disclosure process also becomes clearer with a private report on GitHub. When using email, it is unclear whether you should encrypt the email and who you should send it to. If you’ve ever encrypted an email, you know that there are endless pitfalls.

However, you may still want to send an email notification to developers or a security contact, as maintainers might miss GitHub notifications. A basic email with a link to the created advisory is usually enough to raise awareness.

Step 1: Add a security policy

Publishing a security policy is the first step towards owning a vulnerability reporting process. To avoid confusion, a good policy clearly defines what to do if you find a vulnerability.

GitHub has two ways to publish a security policy. Either you can create a SECURITY.md file in the repository root, or you can create a user- or organization-wide policy by creating a .github repository and putting a SECURITY.md file in its root.

We recommend starting with a policy generated using the Policymaker by disclose.io (see this example), but replace the Official Channels section with the following:

We have multiple channels for receiving reports:

* If you discover any security-related issues with a specific GitHub project, click the *Report a vulnerability* button on the *Security* tab in the relevant GitHub project: https://github.com/%5BYOUR_ORG%5D/%5BYOUR_PROJECT%5D.
* Send an email to security@example.com

Always make sure to include at least two points of contact. If one fails, the reporter still has another option before falling back to messaging developers directly.

Step 2: Enable private reporting

Now that the security policy is set up, check out the referenced GitHub private reporting feature, a tool that allows discreet communication of vulnerabilities to maintainers so they can fix the issue before it’s publicly disclosed. It also notifies the broader community, such as npm, Crates.io, or Go users, about potential security issues in their dependencies.

Enabling and using the feature is easy and requires almost no maintenance. The only key is to make sure that you set up GitHub notifications correctly. Reports get sent via email only if you configure email notifications. The reason it’s not enabled by default is that this feature requires active monitoring of your GitHub notifications, or else reports may not get the attention they require.

After configuring the notifications, go to the “Security” tab of your repository and click “Enable vulnerability reporting”:

Emails about reported vulnerabilities have the subject line “(org/repo) Summary (GHSA-0000-0000-0000).” If you use the website notifications, you will get one like this:

If you want to enable private reporting for your whole organization, then check out this documentation.

A benefit of using private reporting is that vulnerabilities are published in the GitHub advisory database (see the GitHub documentation for more information). If dependent repositories have Dependabot enabled, then dependencies to your project are updated automatically.

On top of that, GitHub can also automatically issue a CVE ID that can be used to reference the bug outside of GitHub.

This private reporting feature is still officially in beta on GitHub. We encountered minor issues like the lack of message templates and the inability of reporters to add collaborators. We reported the latter as a bug to GitHub, but they claimed that this was by design.

Step 3: Get notifications via webhooks

If you want notifications in a messaging platform of your choice, such as Slack, you can create a repository- or organization-wide webhook on GitHub. Just enable the following event type:

After creating the webhook, repository_advisory events will be sent to the set webhook URL. The event includes the summary and description of the reported vulnerability.

How to make security researchers happy

If you want to increase your chances of getting high-quality vulnerability reports from security researchers and are already using GitHub, then set up a security policy and enable private reporting. Simplifying the process of reporting security bugs is important for the security of your software. It also helps avoid researchers becoming annoyed and deciding not to report a bug or, even worse, deciding to turn the vulnerability into an exploit or release it as a 0-day.

If you use GitHub, this is your call to action to prioritize security, protect the public software ecosystem’s security, and foster a safer development environment for everyone by setting up a basic security policy and enabling private reporting.

If you’re not a GitHub user, similar features also exist on other issue-tracking systems, such as confidential issues in GitLab. However, not all systems have this option; for instance, Gitea is missing such a feature. The reason we focused on GitHub in this post is because the platform is in a unique position due to its advisory database, which feeds into, for example, the npm package repository. But regardless of which platform you use, make sure that you have a visible security policy and reliable channels set up.

Introducing Ruzzy, a coverage-guided Ruby fuzzer

By Matt Schwager

Trail of Bits is excited to introduce Ruzzy, a coverage-guided fuzzer for pure Ruby code and Ruby C extensions. Fuzzing helps find bugs in software that processes untrusted input. In pure Ruby, these bugs may result in unexpected exceptions that could lead to denial of service, and in Ruby C extensions, they may result in memory corruption. Notably, the Ruby community has been missing a tool it can use to fuzz code for such bugs. We decided to fill that gap by building Ruzzy.

Ruzzy is heavily inspired by Google’s Atheris, a Python fuzzer. Like Atheris, Ruzzy uses libFuzzer for its coverage instrumentation and fuzzing engine. Ruzzy also supports AddressSanitizer and UndefinedBehaviorSanitizer when fuzzing C extensions.

This post will go over our motivation behind building Ruzzy, provide a brief overview of installing and running the tool, and discuss some of its interesting implementation details. Ruby revelers rejoice, Ruzzy* is here to reveal a new era of resilient Ruby repositories.

* If you’re curious, Ruzzy is simply a portmanteau of Ruby and fuzz, or fuzzer.

Bringing fuzz testing to Ruby

The Trail of Bits Testing Handbook provides the following definition of fuzzing:

Fuzzing represents a dynamic testing method that inputs malformed or unpredictable data to a system to detect security issues, bugs, or system failures. We consider it an essential tool to include in your testing suite.

Fuzzing is an important testing methodology when developing high-assurance software, even in Ruby. Consider AFL’s extensive trophy case, rust-fuzz’s trophy case, and OSS-Fuzz’s claim that it’s helped find and fix over 10,000 security vulnerabilities and 36,000 bugs with fuzzing. As mentioned previously, Python has Atheris. Java has Jazzer. The Ruby community deserves a high-quality, modern fuzzing tool too.

This isn’t to say that Ruby fuzzers haven’t been built before. They have: kisaten, afl-ruby, FuzzBert, and perhaps some we’ve missed. However, all these tools appear to be either unmaintained, difficult to use, lacking features, or all of the above. To address these challenges, Ruzzy is built on three principles:

Fuzz pure Ruby code and Ruby C extensions
Make fuzzing easy by providing a RubyGems installation process and simple interface
Integrate with the extensive libFuzzer ecosystem

With that, let’s give this thing a test drive.

Installing and running Ruzzy

The Ruzzy repository is well documented, so this post will provide an abridged version of installing and running the tool. The goal here is to provide a quick overview of what using Ruzzy looks like. For more information, check out the repository.

First things first, Ruzzy requires a Linux environment and a recent version of Clang (we’ve tested back to version 14.0.0). Releases of Clang can be found on its GitHub releases page. If you’re on a Mac or Windows computer, then you can use Docker Desktop on Mac or Windows as your Linux environment. You can then use Ruzzy’s Docker development environment to run the tool. With that out of the way, let’s get started.

Run the following command to install Ruzzy from RubyGems:

MAKE="make --environment-overrides V=1" \
CC="/path/to/clang" \
CXX="/path/to/clang++" \
LDSHARED="/path/to/clang -shared" \
LDSHAREDXX="/path/to/clang++ -shared" \
    gem install ruzzy

These environment variables ensure the tool is compiled and installed correctly. They will be explored in greater detail later in this post. Make sure to update the /path/to portions to point to your clang installation.

Fuzzing Ruby C extensions

To facilitate testing the tool, Ruzzy includes a “dummy” C extension with a heap-use-after-free bug. This section will demonstrate using Ruzzy to fuzz this vulnerable C extension.

First, we need to configure Ruzzy’s required sanitizer options:

export ASAN_OPTIONS="allocator_may_return_null=1:detect_leaks=0:use_sigaltstack=0"

(See the Ruzzy README for why these options are necessary in this context.)

Next, start fuzzing:

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby -e 'require "ruzzy"; Ruzzy.dummy'

LD_PRELOAD is required for the same reason that Atheris requires it. That is, it uses a special shared object that provides access to libFuzzer’s sanitizers. Now that Ruzzy is fuzzing, it should quickly produce a crash like the following:

INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2527961537
...
==45==ERROR: AddressSanitizer: heap-use-after-free on address 0x50c0009bab80 at pc 0xffff99ea1b44 bp 0xffffce8a67d0 sp 0xffffce8a67c8
...
SUMMARY: AddressSanitizer: heap-use-after-free /var/lib/gems/3.1.0/gems/ruzzy-0.7.0/ext/dummy/dummy.c:18:24 in _c_dummy_test_one_input
...
==45==ABORTING
MS: 4 EraseBytes-CopyPart-CopyPart-ChangeBit-; base unit: 410e5346bca8ee150ffd507311dd85789f2e171e
0x48,0x49,
HI
artifact_prefix='./'; Test unit written to ./crash-253420c1158bc6382093d409ce2e9cff5806e980
Base64: SEk=

Fuzzing pure Ruby code

Fuzzing pure Ruby code requires two Ruby scripts: a tracer script and a fuzzing harness. The tracer script is required due to an implementation detail of the Ruby interpreter. Every tracer script will look nearly identical. The only difference will be the name of the Ruby script you’re tracing.

First, the tracer script. Let’s call it test_tracer.rb:

require 'ruzzy'

Ruzzy.trace('test_harness.rb')

Next, the fuzzing harness. A fuzzing harness wraps a fuzzing target and passes it to the fuzzing engine. In this case, we have a simple fuzzing target that crashes when it receives the input “FUZZ.” It’s a contrived example, but it demonstrates Ruzzy’s ability to find inputs that maximize code coverage and produce crashes. Let’s call this harness test_harness.rb:

require 'ruzzy'

def fuzzing_target(input)
  if input.length == 4
    if input[0] == 'F'
      if input[1] == 'U'
        if input[2] == 'Z'
          if input[3] == 'Z'
            raise
          end
        end
      end
    end
  end
end

test_one_input = lambda do |data|
  fuzzing_target(data) # Your fuzzing target would go here
  return 0
end

Ruzzy.fuzz(test_one_input)

You can start the fuzzing process with the following command:

LD_PRELOAD=$(ruby -e 'require "ruzzy"; print Ruzzy::ASAN_PATH') \
    ruby test_tracer.rb

This should quickly produce a crash like the following:

INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2311041000
...
/app/ruzzy/bin/test_harness.rb:12:in `block in ': unhandled exception
    from /var/lib/gems/3.1.0/gems/ruzzy-0.7.0/lib/ruzzy.rb:15:in `c_fuzz'
    from /var/lib/gems/3.1.0/gems/ruzzy-0.7.0/lib/ruzzy.rb:15:in `fuzz'
    from /app/ruzzy/bin/test_harness.rb:35:in `'
    from bin/test_tracer.rb:7:in `require_relative'
    from bin/test_tracer.rb:7:in `'
...
SUMMARY: libFuzzer: fuzz target exited
MS: 1 CopyPart-; base unit: 24b4b428cf94c21616893d6f94b30398a49d27cc
0x46,0x55,0x5a,0x5a,
FUZZ
artifact_prefix='./'; Test unit written to ./crash-aea2e3923af219a8956f626558ef32f30a914ebc
Base64: RlVaWg==

Ruzzy used libFuzzer’s coverage-guided instrumentation to discover the input (“FUZZ”) that produces a crash. This is one of Ruzzy’s key contributions: coverage-guided support for pure Ruby code. We will discuss coverage support and more in the next section.

Interesting implementation details

You don’t need to understand this section to use Ruzzy, but fuzzing can often be more art than science, so we wanted to share some details to help demystify this dark art. We certainly learned a lot from the blog posts describing Atheris and Jazzer, so we figured we’d pay it forward. Of course, there are many interesting details that go into creating a tool like this but we’ll focus on three: creating a Ruby fuzzing harness, compiling Ruby C extensions with libFuzzer, and adding coverage support for pure Ruby code.

Creating a Ruby fuzzing harness

One of the first things you need when embarking on a fuzzing campaign is a fuzzing harness. The Trail of Bits Testing Handbook defines a fuzzing harness as follows:

A harness handles the test setup for a given target. The harness wraps the software and initializes it such that it is ready for executing test cases. A harness integrates a target into a testing environment.

When fuzzing Ruby code, naturally we want to write our fuzzing harness in Ruby, too. This speaks to goal number 2 from the beginning of this post: make fuzzing Ruby simple and easy. However, a problem arises when we consider that libFuzzer is written in C/C++. When using libFuzzer as a library, we need to pass a C function pointer to LLVMFuzzerRunDriver to initiate the fuzzing process. How can we pass arbitrary Ruby code to a C/C++ library?

Using a foreign function interface (FFI) like Ruby-FFI is one possibility. However, FFIs are generally used to go the other direction: calling C/C++ code from Ruby. Ruby C extensions seem like another possibility, but we still need to figure out a way to pass arbitrary Ruby code to a C extension. After much digging around in the Ruby C extension API, we discovered the rb_proc_call function. This function allowed us to use Ruby C extensions to bridge the gap between Ruby code and the libFuzzer C/C++ implementation.

In Ruby, a Proc is “an encapsulation of a block of code, which can be stored in a local variable, passed to a method or another Proc, and can be called. Proc is an essential concept in Ruby and a core of its functional programming features.” Perfect, this is exactly what we needed. In Ruby, all lambda functions are also Procs, so we can write fuzzing harnesses like the following:

require 'json'
require 'ruzzy'

json_target = lambda do |data|
  JSON.parse(data)
  return 0
end

Ruzzy.fuzz(json_target)

In this example, the json_target lambda function is passed to Ruzzy.fuzz. Behind the scenes Ruzzy uses two language features to bridge the gap between Ruby code and a C interface: Ruby Procs and C function pointers. First, Ruzzy calls LLVMFuzzerRunDriver with a function pointer. Then, every time that function pointer is invoked, it calls rb_proc_call to execute the Ruby target. This allows the C/C++ fuzzing engine to repeatedly call the Ruby target with fuzzed data. Considering the example above, since all lambda functions are Procs, this accomplishes the goal of calling arbitrary Ruby code from a C/C++ library.

As with all good, high-level overviews, this is an oversimplification of how Ruzzy works. You can see the exact implementation in cruzzy.c.

Compiling Ruby C extensions with libFuzzer

Before we proceed, it’s important to understand that there are two Ruby C extensions we are considering: the Ruzzy C extension that hooks into the libFuzzer fuzzing engine and the Ruby C extensions that become our fuzzing targets. The previous section discussed the Ruzzy C extension implementation. This section discusses Ruby C extension targets. These are third-party libraries that use Ruby C extensions that we’d like to fuzz.

To fuzz a Ruby C extension, we need a way to compile the extension with libFuzzer and its associated sanitizers. Compiling C/C++ code for fuzzing requires special compile-time flags, so we need a way to inject these flags into the C extension compilation process. Dynamically adding these flags is important because we’d like to install and fuzz Ruby gems without having to modify the underlying code.

The mkmf, or MakeMakefile, module is the primary interface for compiling Ruby C extensions. The gem install process calls a gem-specific Ruby script, typically named extconf.rb, which calls the mkmf module. The process looks roughly like this:

gem install -> extconf.rb -> mkmf -> Makefile -> gcc/clang/CC -> extension.so

Unfortunately, by default mkmf does not respect common C/C++ compilation environment variables like CC, CXX, and CFLAGS. However, we can force this behavior by setting the following environment variable: MAKE="make --environment-overrides". This tells make that environment variables override Makefile variables. With that, we can use the following command to install Ruby gems containing C extensions with the appropriate fuzzing flags:

MAKE="make --environment-overrides V=1" \
CC="/path/to/clang" \
CXX="/path/to/clang++" \
LDSHARED="/path/to/clang -shared" \
LDSHAREDXX="/path/to/clang++ -shared" \
CFLAGS="-fsanitize=address,fuzzer-no-link -fno-omit-frame-pointer -fno-common -fPIC -g" \
CXXFLAGS="-fsanitize=address,fuzzer-no-link -fno-omit-frame-pointer -fno-common -fPIC -g" \
    gem install msgpack

The gem we’re installing is msgpack, an example of a gem containing a C extension component. Since it deserializes binary data, it makes a great fuzzing target. From here, if we wanted to fuzz msgpack, we would create an msgpack fuzzing harness and initiate the fuzzing process.

If you’d like to find more fuzzing targets, searching GitHub for extconf.rb files is one of the best ways we’ve found to identify good C extension candidates.

Adding coverage support for pure Ruby code

Instead of Ruby C extensions, what if we want to fuzz pure Ruby code? That is, Ruby projects that do not contain a C extension component. If modifying install-time functionality via lengthy, not-officially-supported environment variables is a hacky solution, then what follows is not for the faint of heart. But, hey, a working solution with a little artistic freedom is better than no solution at all.

First, we need to cover the motivation for coverage support. Fuzzers derive some of their “smarts” from analyzing coverage information. This is a lot like code coverage information provided by unit and integration tests. While fuzzing, most fuzzers prioritize inputs that unlock new code branches. This increases the likelihood that they will find crashes and bugs. When fuzzing Ruby C extensions, Ruzzy can punt coverage instrumentation for C code to Clang. With pure Ruby code, we have no such luxury.

While implementing Ruzzy, we discovered one supremely useful piece of functionality: the Ruby Coverage module. The problem is that it cannot easily be called in real time by C extensions. If you recall, Ruzzy uses its own C extension to pass fuzz harness code to LLVMFuzzerRunDriver. To implement our pure Ruby coverage “smarts,” we need to pass in Ruby coverage information to libFuzzer in real time as the fuzzing engine executes. The Coverage module is great if you have a known start and stop point of execution, but not if you need to continuously gather coverage information and pass it to libFuzzer. However, we know the Coverage module must be implemented somehow, so we dug into the Ruby interpreter’s C implementation to learn more.

Enter Ruby event hooking. The TracePoint module is the official Ruby API for listening for certain types of events like calling a function, returning from a routine, executing a line of code, and many more. When these events fire, you can execute a callback function to handle the event however you’d like. So, this sounds great, and exactly like what we need. When we’re trying to track coverage information, what we’d really like to do is listen for branching events. This is what the Coverage module is doing, so we know it must exist under the hood somewhere.

Fortunately, the public Ruby C API provides access to this event hooking functionality via the rb_add_event_hook2 function. This function takes a list of events to hook and a callback function to execute whenever one of those events fires. By digging around in the source code a bit, we find that the list of possible events looks very similar to the list in the TracePoint module:

 37    #define RUBY_EVENT_NONE      0x0000 /**< No events. */
 38    #define RUBY_EVENT_LINE      0x0001 /**< Encountered a new line. */
 39    #define RUBY_EVENT_CLASS     0x0002 /**< Encountered a new class. */
 40    #define RUBY_EVENT_END       0x0004 /**< Encountered an end of a class clause. */
       ...

Ruby event hook types

If you keep digging, you’ll notice a distinct lack of one type of event: coverage events. But why? The Coverage module appears to be handling these events. If you continue digging, you’ll find that there are in fact coverage events, and that is how the Coverage module works, but you don’t have access to them. They’re defined as part of a private, internal-only portion of the Ruby C API:

 2182    /* #define RUBY_EVENT_RESERVED_FOR_INTERNAL_USE 0x030000 */ /* from vm_core.h */
 2183    #define RUBY_EVENT_COVERAGE_LINE                0x010000
 2184    #define RUBY_EVENT_COVERAGE_BRANCH              0x020000

Private coverage event hook types

That’s the bad news. The good news is that we can define the RUBY_EVENT_COVERAGE_BRANCH event hook ourselves and set it to the correct, constant value in our code, and rb_add_event_hook2 will still respect it. So we can use Ruby’s built-in coverage tracking after all! We can feed this data into libFuzzer in real time and it will fuzz accordingly. Discussing how to feed this data into libFuzzer is beyond the scope of this post, but if you’d like to learn more, we use SanitizerCoverage’s inline 8-bit counters, PC-Table, and data flow tracing.

There’s just one more thing.

During our testing, even though we added the correct event hook, we still weren’t successfully hooking coverage events. The Coverage module must be doing something we’re not seeing. If we call Coverage.start(branches: true), per the Coverage documentation, then things work as expected. The details here involve a lot of sleuthing in the Ruby interpreter source code, so we’ll cut to the chase. As best we can tell, it appears that calling Coverage.start, which effectively calls Coverage.setup, initializes some global state in the Ruby interpreter that allows for hooking coverage events. This initialization functionality is also part of a private, internal-only API. The easiest solution we could come up with was calling Coverage.setup(branches: true) before we start fuzzing. With that, we began successfully hooking coverage events as expected.

Having coverage events included in the standard library made our lives a lot easier. Without it, we may have had to resort to much more invasive and cumbersome solutions like modifying the Ruby code the interpreter sees in real time. However, it would have made our lives even easier if hooking coverage events were part of the official, public Ruby C API. We’re currently tracking this request at trailofbits/ruzzy#9.

Again, the information presented here is a slight oversimplification of the implementation details; if you’d like to learn more, then cruzzy.c and ruzzy.rb are great places to start.

Find more Ruby bugs with Ruzzy

We faced some interesting challenges while building this tool and attempted to hide much of the complexity behind a simple, easy to use interface. When using the tool, the implementation details should not become a hindrance or an annoyance. However, discussing them here in detail may spur the next fuzzer implementation or step forward in the fuzzing community. As mentioned previously, the Atheris and Jazzer posts were a great inspiration to us, so we figured we’d pay it forward.

Building the tool is just the beginning. The real value comes when we start using the tool to find bugs. Like Atheris for Python, and Jazzer for Java before it, Ruzzy is an attempt to bring a higher level of software assurance to the Ruby community. If you find a bug using Ruzzy, feel free to open a PR against our trophy case with a link to the issue.

If you’d like to read more about our work on fuzzing, check out the following posts:

Why fuzzing over formal verification?

By Tarun Bansal, Gustavo Grieco, and Josselin Feist

We recently introduced our new offering, invariant development as a service. A recurring question that we are asked is, “Why fuzzing instead of formal verification?” And the answer is, “It’s complicated.”

We use fuzzing for most of our audits but have used formal verification methods in the past. In particular, we found symbolic execution useful in audits such as Sai, Computable, and Balancer. However, we realized through experience that fuzzing tools produce similar results but require significantly less skill and time.

In this blog post, we will examine why the two principal assertions in favor of formal verification often fall short: proving the absence of bugs is typically unattainable, and fuzzing can identify the same bugs that formal verification uncovers.

Proving the absence of bugs

One of the key selling points of formal verification over fuzzing is its ability to prove the absence of bugs. To do that, formal verification tools use mathematical representations to check whether a given invariant holds for all input values and states of the system.

While such a claim can be attainable on a simple codebase, it’s not always achievable in practice, especially with complex codebases, for the following reasons:

The code may need to be rewritten to be amenable to formal verification. This leads to the verification of a pseudo-copy of the target instead of the target itself. For example, the Runtime Verification team verified the pseudocode of the deposit contract for the ETH2.0 upgrade, as mentioned in this excerpt from their blog post:

Specifically, we first rigorously formalized the incremental Merkle tree algorithm. Then, we extracted a pseudocode implementation of the algorithm employed in the deposit contract, and formally proved the correctness of the pseudocode implementation.
Complex code may require a custom summary of some functionality to be analyzed. In these situations, the verification relies on the custom summary to be correct, which shifts the responsibility of correctness to that summary. To build such a summary, users might need to use an additional custom language, such as CVL, which increases the complexity.
Loops and recursion may require adding manual constraints (e.g., unrolling the loop for only a given amount of time) to help the prover. For example, the Certora prover might unroll some loops for a fixed number of iterations and report any additional iteration as a violation, forcing further involvement from the user.
The solver can time out. If the tool relies on a solver for equations, finding a solution in a reasonable time may not be possible. In particular, proving code with a high number of nonlinear arithmetic operations or updates to storage or memory is challenging. If the solver times out, no guarantee can be provided.

So while proving the absence of bugs is a benefit of formal verification methods in theory, it may not be the case in practice.

Finding bugs

When formally verifying the code is not possible, formal verification tools can still be used as bug finding tools. However, the question remains, “Can formal verification find real bugs that cannot be found by a fuzzer?” At this point, wouldn’t it just be easier to use a fuzzer?

To answer this question, we looked at two bugs found using formal verification in MakerDAO and Compound and then attempted to find these same bugs with only a fuzzer. Spoiler alert: we succeeded.

We selected these two bugs because they were widely advertised as having been discovered through formal verification, and they affected two popular protocols. To our surprise, it was difficult to find public issues discovered solely through formal verification, in contrast with the many bugs found by fuzzing (see our security reviews).

Our fuzzer found both bugs in a matter of minutes, running on a typical development laptop. The bugs we evaluated, as well as the formal verification and fuzz testing harnesses we used to discover them, are available on our GitHub page about fuzzing formally verified contracts to reproduce popular security issues.

Fundamental invariant of DAI

MakerDAO found a bug in its live code after four years. You can read more about the bug in When Invariants Aren’t: DAI’s Certora Surprise. Using the Certora prover, MakerDAO found that the fundamental invariant of DAI, which is that the sum of all collateral-backed debt and unbacked debt should equal the sum of all DAI balances, could be violated in a specific case. The core issue is that calling the init function when a vault’s Rate state variable is zero and its Art state variable is nonzero changes the vault’s total debt, which violates the invariant checking sum of total debt and total DAI supply. The MakerDAO team concluded that calling the init function after calling the fold function is a path to break the invariant.

function sumOfDebt() public view returns (uint256) {
    uint256 length = ilkIds.length;
    uint256 sum = 0;
    for (uint256 i=0; i < length; ++i){
        sum = sum + ilks[ilkIds[i]].Art * ilks[ilkIds[i]].rate;
    }
    return sum;
}

function echidna_fund_eq() public view returns (bool) {
    return debt == vice + sumOfDebt();
}

Figure 1: Fundamental equation of DAI invariant in Solidity

We implemented the same invariant in Solidity, as shown in figure 1, and checked it with Echidna. To our surprise, Echidna violated the invariant and found a unique path to trigger the violation. Our implementation is available in the Testvat.sol file of the repository. Implementing the invariant was easy because the source code under test was small and required only logic to compute the sum of all debts. Echidna took less than a minute on an i5 12-GB RAM Linux machine to violate the invariant.

Liquidation of collateralized account in Compound V3 Comet

The Certora team used their Certora Prover to identify an interesting issue in the Compound V3 Comet smart contracts that allowed a fully collateralized account to be liquidated. The root cause of this issue was using an 8-bit mask for a 16-bit vector. The mask remains zero for the higher bits in the vector, which skips assets while calculating total collateral and results in the liquidation of the collateralized account. More on this issue can be found in the Formal Verification Report of Compound V3 (Comet).

function echidna_used_collateral() public view returns (bool) {
    for (uint8 i = 0; i < assets.length; ++i) {
        address asset = assets[i].asset;
        uint256 userColl = sumUserCollateral(asset, true);
        uint256 totalColl = comet.getTotalCollateral(asset);
        if (userColl != totalColl) {
            return false;
        }
    }
    return true;
}

function echidna_total_collateral_per_asset() public view returns (bool) {
    for (uint8 i = 0; i < assets.length; ++i) {
        address asset = assets[i].asset;
        uint256 userColl = sumUserCollateral(asset, false);
        uint256 totalColl = comet.getTotalCollateral(asset);
        if (userColl != totalColl) {
            return false;
        }
    }
    return true;
}

Figure 2: Compound V3 Comet invariant in Solidity

Echidna discovered the issue with the implementation of the invariant in Solidity, as shown in figure 2. This implementation is available in the TestComet.sol file in the repository. Implementing the invariant was easy; it required limiting the number of users interacting with the test contract and adding a method to calculate the sum of all user collateral. Echidna broke the invariant within minutes by generating random transaction sequences to deposit collateral and checking invariants.

Is formal verification doomed?

Formal verification tools require a lot of domain-specific knowledge to be used effectively and require significant engineering efforts to apply. Grigore Rosu, Runtime Verification’s CEO, summarized it as follows:

Figure 3: A tweet from the founder of Runtime Verification Inc.

While formal verification tools are constantly improving, which reduces the engineering effort, none of the existing tools reach the ease of use of existing fuzzers. For example, the Certora Prover makes formal verification more accessible than ever, but it is still far less user-friendly than a fuzzer for complex codebases. With the rapid development of these tools, we hope for a future where formal verification tools become as accessible as other dynamic analysis tools.

So does that mean we should never use formal verification? Absolutely not. In some cases, formally verifying a contract can provide additional confidence, but these situations are rare and context-specific.

Consider formal verification for your code only if the following are true:

You are following an invariant-driven development approach.
You have already tested many invariants with fuzzing.
You have a good understanding of which remaining invariants and components would benefit from formal methods.
You have solved all the other issues that would decrease your code maturity.

Writing good invariants is the key

Over the years, we have observed that the quality of invariants is paramount. Writing good invariants is 80% of the work; the tool used to check/verify them is important but secondary. Therefore, we recommend starting with the easiest and most effective technique—fuzzing—and relying on formal verification methods only when appropriate.

If you’re eager to refine your approach to invariants and integrate them into your development process, contact us to leverage our expertise.

Streamline your static analysis triage with SARIF Explorer

By Vasco Franco

Today, we’re releasing SARIF Explorer, the VSCode extension that we developed to streamline how we triage static analysis results. We make heavy use of static analysis tools during our audits, but the process of triaging them was always a pain. We designed SARIF Explorer to provide an intuitive UI inside VSCode, with features that make this process less painful:

Open multiple SARIF files: Triage all your results at once.
Browse results: Browse results by clicking on them to open their associated location in VSCode. You can also browse a result’s dataflow steps, if present.
Classify results: Add metadata to each result by classifying it as a “bug,” “false positive,” or “TODO” and adding a custom text comment. Keyboard shortcuts are supported.
Filter results: Filter results by keyword, path (to include or exclude), level (“error,” “warning,” “note,” or “none”), and status (“bug,” “false positive,” or “TODO”).
Open GitHub issues: Copy GitHub permalinks to locations associated with results and create GitHub issues directly from SARIF Explorer.
Send bugs to weAudit: Send all bugs to weAudit once you’ve finished triaging them and continue with the weAudit workflow.
Collaborate: Share the .sarifexplorer file with your colleagues (e.g., on GitHub) to share your comments and classified results.

You can install it through the VSCode marketplace and find its code in our vscode-sarif-explorer repo.

Why we built SARIF Explorer

Have you ever had to triage hundreds of static analysis results, many of which were likely to be false positives? At Trail of Bits, we extensively use static analysis tools such as Semgrep and CodeQL, sometimes with rules that produce many false positives, so this is an experience we’re all too familiar with. As security engineers, we use these low-precision rules because if there’s a bug we can detect automatically, we want to know about it, even if it means sieving through loads of false positive results.

Long ago, you would have found me triaging these results by painstakingly going over a text file or looking into a tiny terminal window. This was grueling work that I did not enjoy at all. You read the result’s description, you copy the path to the code, you go to that file, and you analyze the code. Then, you annotate your conclusions in some other text file, and you repeat.

A few years ago, we started using SARIF Viewer at Trail of Bits. This was a tremendous improvement, as it allowed us to browse a neat list of results organized by rule and click on each one to jump to the corresponding code. Still, it lacked several features that we wanted:

The ability to classify results as bugs or false positives directly in the UI
Better result filtering
The ability to export results as GitHub issues
Better integration with weAudit—our tool for bookmarking code regions, marking files as reviewed, and more (check out our recent blog post announcing the release of this tool!)

This is why we built SARIF Explorer!

SARIF Explorer was designed with user efficiency in mind, providing an intuitive interface so that users can easily access all of the features we built into it, as well as support for keyboard shortcuts to move through and classify results.

The SARIF Explorer static analysis workflow

But why did we want all these new features, and how do we use them? At Trail of Bits, we follow this workflow when using static analysis tools:

Run all static analysis tools (configured to output SARIF files).
Open SARIF Explorer and open all of the SARIF files generated in step 1.
Filter out the noisy results.
- Are there rules that you are not interested in seeing? Hide them!
- Are there folders for which you don’t care about the results (e.g., the ./third_party folder)? Filter them out!
Classify the results.
- Determine if each result is a false positive or a bug.
- Swipe left or right accordingly (i.e., click the left or right arrow).
- Add additional context with a comment if necessary.
Working with other team members? Share your progress by committing the .sarifexplorer file to GitHub.
Send all results marked as bugs to weAudit and proceed with the weAudit workflow.

SARIF Explorer features

Now, let’s take a closer look at the SARIF Explorer features that enable this workflow:

Open multiple SARIF files: You can open and browse the results of multiple SARIF files simultaneously. Use the “Sarif files” tab to browse the list of opened SARIF files and to close or reload any of them. If you open a SARIF file in your workspace, SARIF Explorer will also automatically open it.

Browse results: You can navigate to the locations of the results by clicking on them in the “Results” tab. The detailed view of the result, among other data, includes dataflow information, which you can navigate from source to sink (if available). In the GIF below, the user follows the XSS vulnerability from the source (an event message) to the sink (a DOM parser).

GIF showing how to browse results

Classify results: You can add metadata to each result by classifying it as a “bug,” “false positive,” or “TODO” and adding a custom text comment. You can use either the mouse or keyboard to do this:
- Using the mouse: With a result selected, click one of the “bug,” “false positive,” or “TODO” buttons to classify it as such. These buttons appear next to the result and in the result’s detailed view.
- Using the keyboard: With a result selected, press the right arrow key to classify it as a bug, the left arrow key to classify it as a false positive, and the backspace key to reset the classification to a TODO. This method is more efficient.

Filter results: You can filter results by keyword, path (to include or exclude), level (“error,” “warning,” “note,” or “none”), and status (“bug,” “false positive,” or “TODO”). You can also hide all results from a specific SARIF file or from a specific rule. For example, if you want to remove all results from the test and extensions folders and to see only results classified as TODOs, you should:
- Set “Exclude Paths Containing” to “/test/, /extensions/”
- Check the “Todo” box and uncheck the “Bug” and “False Positive” boxes in the “Status” section

Copy GitHub permalinks: You can copy a GitHub permalink to the location associated with a result. This requires having weAudit installed.

Create GitHub issues: You can create formatted GitHub issues for a specific result or for all unfiltered results under a given rule. This requires having weAudit installed.

Send bugs to weAudit: You can send all results classified as bugs to weAudit (results are automatically de-duplicated if you send them twice). This requires having weAudit installed.

Collaborate: You can share the .sarifexplorer file with your colleagues (e.g., on GitHub) to share your comments and classified results. The file is a prettified JSON file, which helps resolve conflicts if more than one person writes to the file in parallel.

You can find even more details about these features in our README.

Try it!

SARIF Explorer and weAudit greatly improved our efficiency when auditing code, and we hope it improves yours too.

Go try both of these tools out and let us know what you think! We welcome any bug reports, feature requests, and contributions in our vscode-sarif-explorer and vscode-weaudit repos.

If you’re interested in VSCode extension security, check out our “Escaping misconfigured VSCode extensions” and “Escaping well-configured VSCode extensions (for profit)” blog posts.

Read code like a pro with our weAudit VSCode extension

By Filipe Casal

Today, we’re releasing weAudit, the collaborative code-reviewing tool that we use during our security audits. With weAudit, we review code more efficiently by taking notes and tracking bugs in a codebase directly inside VSCode, reducing our reliance on external tools, ensuring we never lose track of bugs we find, and enabling us to share that information with teammates.

We designed weAudit with features that are crucial to our auditing process:

Bookmarks for findings and notes: Bookmark code regions to identify findings or add audit notes.
Tracking of audited files: Mark entire files as reviewed.
Collaboration: View and share findings with multiple users.
Creation of GitHub issues: Fill in detailed information about a finding and create a preformatted GitHub issue right from weAudit.

You can install it through the VSCode marketplace and find its code in our vscode-weaudit repo.

Why we built weAudit

When we review complex codebases, we often compile detailed notes about both the high-level structure and specific low-level implementation details to share with our project team. For high-level notes, standard document sharing tools more than suffice. But those tools are not ideal for sharing low-level, code-specific notes. For those, we need a tool that allows us to share notes that are more tightly coupled with the codebase itself, almost like using post-it notes to navigate through a complex book. Specifically, we need a tool that allows us to do the following:

Quickly navigate through areas of interest in the codebase
Visually highlight significant areas of the code
Add audit notes to certain parts of the codebase

For some time, I used a very simple extension for VSCode called “Bookmarks”, which allowed me to add basic notes to lines of code. However, I was never satisfied with this extension, as it was missing crucial features:

The highlighted code did not display the notes I had written next to the code.
I had no way of sharing code coverage information with my client or fellow engineers auditing the codebase.
I had no way of sharing my notes and bookmarks. During an audit with a team of engineers, I need to be able to share these things with my team so that my knowledge is their knowledge, and vice versa.

All of us engineers at Trail of Bits agreed that we needed a better tool for this purpose. We realized that if we wanted an extension tailored to our needs, we would need to create it. That is why we built weAudit.

weAudit’s main features

The features we built into weAudit streamline our process of bookmarking, annotating, and tracking code files under audit, sharing our notes, and creating GitHub issues for findings we discover.

Bookmarks

The extension supports two types of bookmarks: findings, which represent buggy or suspicious regions of code, and notes, which represent personal annotations about the code.

You can add findings and notes to the current code snippet selection by running the corresponding VSCode commands or using the keyboard shortcuts:

“weAudit: New Finding from Selection” (shortcut: Cmd + J)
“weAudit: New Note from Selection” (shortcut: Cmd + K)

These commands will highlight the code in the editor and create a new bookmark in the “List of Findings” view in the sidebar.

By clicking on an item in the “List of Findings” view, you can navigate to the corresponding region of code.

Files with a finding will have a “!” annotation next to the file name in both the file tree of VSCode’s default “Explorer” view and in the tab above the editor, making it immediately clear which files have findings.

The highlight colors can be customized in the extension settings.

Tracking audited files

After reviewing a file, you can mark it as audited by running the “weAudit: Mark File as Reviewed” command or its keyboard shortcut, Cmd + 7. The whole file will be highlighted, and the file name in both the file tree and the tab above the editor will be annotated with a ✓.

The highlight color can be customized in the extension settings.

Daily log

Have you ever had trouble remembering which files you reviewed the previous week? Or do you just really like meaningless statistics such as the number of lines of code you read in a single day? You can see these stats by showing the daily log, accessible from the “List of Findings” panel.

You can also view the daily log by running the “weAudit: Show Daily Log” command in the command palette.

Collaboration with multiple users

You can share weAudit files (located in the .vscode folder) with your co-auditors to share findings and notes about the code. In the “weAudit Files” panel, you can toggle to show or hide the findings from each user by clicking on each entry. The colors for other users’ findings and notes and for your own findings and notes are customizable in the extension settings.

Detailed findings

You can fill in detailed information about a finding by clicking on it in the “List of Findings” view in the sidebar, where you can add all the information we include in our audit reports: title, severity, difficulty, description, exploit scenario, and recommendations for resolving the issue.

This information is then used to prefill a template, allowing you to quickly open a GitHub issue with all of the relevant details for the finding.

You can find more details and information about other features in our README.

Try it out for yourself!

If you use VSCode to navigate through large codebases, we invite you to try weAudit—even if you are not looking for bugs—and let us know what you think!

We welcome any bug reports, feature requests, and contributions in our vscode-weaudit repo.

If you’re interested in VSCode extension security, check out our “Escaping misconfigured VSCode extensions” and “Escaping well-configured VSCode extensions (for profit)” blog posts.

Releasing the Attacknet: A new tool for finding bugs in blockchain nodes using chaos testing

By Benjamin Samuels (@thebensams)

Today, Trail of Bits is publishing Attacknet, a new tool that addresses the limitations of traditional runtime verification tools, built in collaboration with the Ethereum Foundation. Attacknet is intended to augment the EF’s current test methods by subjecting their execution and consensus clients to some of the most challenging network conditions imaginable.

Blockchain nodes must be held to the highest level of security assurance possible. Historically, the primary tools used to achieve this goal have been exhaustive specification, tests, client diversity, manual audits, and testnets. While these tools have traditionally done their job well, they collectively have serious limitations that can lead to critical bugs manifesting in a production environment, such as the May 2023 finality incident that occurred on Ethereum mainnet. Attacknet addresses these limitations by subjecting devnets to a much wider range of network conditions and misconfigurations than is possible on a conventional testnet.

How Attacknet works

Attacknet uses chaos engineering, a testing methodology that proactively injects faults into a production environment to verify that the system is tolerant to certain failures. These faults reproduce real-world problem scenarios and misconfigurations, and can be used to create exaggerated scenarios to test the boundary conditions of the blockchain.

Attacknet uses Chaos Mesh to inject faults into a devnet environment generated by Kurtosis. By building on top of Kurtosis and Chaos Mesh, Attacknet can create various network topologies with ensembles of different kinds of faults to push a blockchain network to its most extreme edge cases.

Some of the faults include:

Clock skew, where a node’s clock is skewed forwards or backwards for a specific duration. Trail of Bits was able to reproduce the Ethereum finality incident using a clock skew fault, as detailed in our TrustX talk last year.
Network latency, where a node’s connection to the network (or its corresponding EL/CL client) is delayed by a certain amount of time. This fault can help reproduce global latency conditions or help detect unintentional synchronicity assumptions in the blockchain’s consensus.
Network partition, where the network is split into two or more segments that cannot communicate with each other. This fault can test the network’s fork choice rule, ability to re-org, and other edge cases.
Network packet drop/corruption, where gossip packets are dropped or have their contents corrupted by a certain amount. This fault can test a node’s gossip validation and test the robustness of the network under hostile network conditions.
Forced node crashes/offlining, where a certain client or type of client is ungracefully shut down. This fault can test the network’s resilience to validator inactivity, and test the ability of clients to re-sync to the network.
I/O disk faults/latency, where a certain amount of latency or error rate is applied to all I/O operations a node makes. This fault can help profile nodes to understand their resource requirements, as I/O is often the largest limiting factor of node performance.

Once the fault concludes, Attacknet performs a battery of health checks against each node in the network to verify that they were able to recover from the fault. If all nodes recover from the fault, Attacknet moves on to the next configured fault. If one or more nodes fail health checks, Attacknet will generate an artifact of logs and test information to allow debugging.

Future work

In this first release, Attacknet supports two run modes: one with a manually configured network topology and fault parameters, and a “planner mode” where a range of faults are run against a specific client with loosely defined topology parameters. In the future, we plan on adding an “Exploration mode” that will dynamically define fault parameters, inject them, and monitor network health repeatedly, similar to a fuzzer.

Attacknet is currently being used to test the Dencun hard fork, and is being regularly updated to improve coverage, performance, and debugging UX. However, Attacknet is not an Ethereum-specific tool, and was designed to be modular and easily extended to support other types of chains with drastically different designs and topologies. In the future, we plan on extending Attacknet to target other chains, including other types of blockchain systems such as L2s.

If you’re interested in integrating Attacknet with your chain/L2’s testing process, please contact us.

Overview of Echidna

Overview of profiling

Profilers can sometimes be misleading

Finding my first huge memory leak in hevm

Don't just measure once, measure N times!

ST to the rescue!

Epilogue: Concrete or symbolic?

Final thoughts

What are ASTs?

Managing the unmanageable

The naive approach

The “reverse pyramid” approach

Excursus: improving the API

Removing the safeguards

Variants on a theme

Going off the rails

Use std::list instead of std::vector

Use std::deque instead of std::vector

Use indices instead of raw pointers

Parting thoughts

Creating a quality fuzzing suite

The tools (and support teams) behind our success

The long and winding road

Racing to keep up with Curvance’s code changes

Critical tool upgrades and lessons learned

Trials and tribulations: Debugging

One lock too many: The story behind TOB-CURV-4

What’s next?

LMS: A stateful post-quantum signature scheme

RustCrypto implementation

Go implementation

More post-quantum engineering is coming soon!

Case 1: Undefined behavior in borsh-rs Rust library

Case 2: DoS vector in Rust libraries for parsing the Ethereum ABI

Case 3: Missing limit on authentication tag length in Expo

Case 4: DoS vector in the num-bigint Rust library

Case 5: Insertion of MMKV database encryption key into Android system log with react-native-mmkv

Takeaways

Step 1: Add a security policy

Step 2: Enable private reporting

Step 3: Get notifications via webhooks

How to make security researchers happy

Bringing fuzz testing to Ruby

Installing and running Ruzzy

Fuzzing Ruby C extensions

Fuzzing pure Ruby code

Interesting implementation details

Creating a Ruby fuzzing harness

Compiling Ruby C extensions with libFuzzer

Adding coverage support for pure Ruby code

Find more Ruby bugs with Ruzzy

Proving the absence of bugs

Finding bugs

Fundamental invariant of DAI

Liquidation of collateralized account in Compound V3 Comet

Is formal verification doomed?

Writing good invariants is the key

Why we built SARIF Explorer

The SARIF Explorer static analysis workflow

SARIF Explorer features

Try it!

Why we built weAudit

weAudit’s main features

Bookmarks

Tracking audited files

Daily log

Collaboration with multiple users

Detailed findings

Try it out for yourself!

How Attacknet works

Future work