Graphtage: A New Semantic Diffing Tool

Graphtage is a command line utility and underlying library for semantically comparing and merging tree-like structures such as JSON, JSON5, XML, HTML, YAML, and TOML files. Its name is a portmanteau of “graph” and “graftage” (i.e., the horticultural practice of joining two trees together so they grow as one). Read on for:

  • What Graphtage does differently and better
  • Why we developed it
  • How it works
  • Directions for using it as a library

All sorts of good

Graphtage lets you see what’s different between two files quickly and easily, but it isn’t a standard line-oriented comparison tool like diff. Graphtage is semantically aware, which allows it to map differences across unordered structures like JSON dicts and XML element tags. You can even compare files that are in two different formats! And when paired with our PolyFile tool, you can semantically diff arbitrary file formats.

Tree-like file formats are becoming increasingly common as a means for transmitting and storing data. If you’ve ever wrangled a gnarly REST API, disentangled the output of a template-generated webpage, or confabulated a config file (and subsequently needed to figure out which specific change was the one that made things work), you’ve probably fought with—and been disappointed by—the current state of open-source semantic diffing tools.

Graphtage solves these problems. It’s available today. To install the utility, run:

pip3 install graphtage

Grab the source code here.

How are existing diff tools insufficient?

Ordered nodes in the tree (e.g., JSON lists) and, in particular, mappings (e.g., JSON dicts) are challenging to diff. Most extant diffing algorithms and utilities assume that the structures are ordered. Take this JSON as an example:

# original.json
{
    "foo": [1, 2, 3, 4],
    "bar": "testing"
}
# modified.json
{
    "foo": [2, 3, 4, 5],
    "zab": "testing",
    "woo": ["foobar"]
}

Existing tools effectively canonicalize the JSON (e.g., sort dictionary elements by key and format lists with one item per line), and then perform a traditional diff. We don’t need no fancy tools for that! Here’s essentially what they do:

$ cat original.json | jq -M --sort-keys > original.canonical.json
$ cat modified.json | jq -M --sort-keys > modified.canonical.json
$ diff -u original.canonical.json modified.canonical.json
    {
    -  "bar": "testing",
       "foo": [
    -    1,
         2,
         3,
    -    4
    -  ]
    +    4,
    +    5
    +  ],
    +  "woo": [
    +    "foobar"
    +  ],
    +  "zab": "testing"
    }

That result is not very useful, particularly if the input files are large. The problem is that changing dict keys breaks the diff: Since “bar” was changed to “zab,” the canonical representation changed, and the traditional diff algorithm considered them separate edits (lines 2 and 15 of the diff).

In contrast, here is Graphtage’s output for the same pair of files:
An example of Graphtage's diff output

Why hasn’t this been done before?

In general, optimally mapping one graph to another cannot be executed in polynomial time, and is therefore not tractable for graphs of any useful size (unless P=NP). This is true even for restricted classes of graphs like DAGs. However, trees and forests are special cases that can be mapped in polynomial time, with reasonable constraints on the types of edits possible. Graphtage exploits this.

How do it know?

Graphtage’s diffing algorithms operate on an intermediate representation rather than on the data structures of the original file format. This allows Graphtage to have generic comparison algorithms that can work on any input file type. Therefore, to add support for a new file type, all one needs to do is “lift” it to the intermediate representation. Likewise, one only needs to implement support for a new type of edit once, and it will immediately be available to apply against all supported filetypes. Using an intermediate representation has the added benefit of allowing cross-format comparisons and formatting translations: Graphtage will happily diff a JSON file against a YAML file, formatting the diff output in TOML syntax.

Graphtage matches ordered sequences like lists using an “online” “constructive” implementation of the Levenshtein distance metric, similar to the Wagner–Fischer algorithm. The algorithm starts with an unbounded mapping and iteratively improves it until the bounds converge, at which point the optimal edit sequence is discovered.
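
Graphtage’s real implementation is online and constructive, but the classic Wagner–Fischer dynamic program captures the underlying idea. The sketch below is illustrative Python, not Graphtage’s code: it computes the minimum number of insertions, deletions, and substitutions needed to turn one sequence into another.

def edit_distance(src, dst):
    """Classic Wagner-Fischer dynamic program for Levenshtein distance."""
    m, n = len(src), len(dst)
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i              # delete everything from src
    for j in range(n + 1):
        dist[0][j] = j              # insert everything from dst
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if src[i - 1] == dst[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete src[i-1]
                dist[i][j - 1] + 1,         # insert dst[j-1]
                dist[i - 1][j - 1] + cost,  # keep or substitute
            )
    return dist[m][n]

assert edit_distance([1, 2, 3, 4], [2, 3, 4, 5]) == 2  # remove the 1, add the 5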

Dicts are matched by solving the minimum weight matching problem on the complete bipartite graph from key/value pairs in the source dict to key/value pairs in the destination dict.
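
Here is a rough sketch of that idea in Python. It is illustrative only (not Graphtage’s code or cost model): it builds a cost matrix between the two dicts’ key/value items and asks SciPy’s Hungarian-algorithm solver for the minimum-weight matching.

from difflib import SequenceMatcher
import numpy as np
from scipy.optimize import linear_sum_assignment

def item_cost(a, b):
    # Toy edit cost: 0.0 for identical items, approaching 1.0 for dissimilar ones.
    return 1.0 - SequenceMatcher(None, str(a), str(b)).ratio()

def match_dict_items(src, dst):
    """Pair each key/value item of src with one of dst so that the total
    matching cost is minimal (minimum-weight bipartite matching)."""
    src_items, dst_items = list(src.items()), list(dst.items())
    cost = np.array([[item_cost(a, b) for b in dst_items] for a in src_items])
    rows, cols = linear_sum_assignment(cost)
    return [(src_items[i], dst_items[j]) for i, j in zip(rows, cols)]

# ("bar", "testing") pairs with ("zab", "testing") rather than being reported
# as an unrelated deletion plus insertion.
print(match_dict_items({"foo": [1, 2, 3, 4], "bar": "testing"},
                       {"foo": [2, 3, 4, 5], "zab": "testing", "woo": ["foobar"]}))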

Graphtage is a command line utility, but it can just as easily be used as a library. One can interact with Graphtage directly from Python, and extend it to support new file formats and edit types.

Next up for Graphtage

We think Graphtage is pretty nifty. You can also use Graphtage in conjunction with our PolyFile tool to semantically diff arbitrary file formats, even if they aren’t naturally tree-based. Try it, and let us know how you use it.

We also plan to extend Graphtage to work on abstract syntax trees, which will allow your source code diffs to tell you things like which variables were changed and whether code blocks were reordered. If you have a similarly nifty idea for a new feature, please share it with us!

Note: This tool was partially developed with funding from the Defense Advanced Research Projects Agency (DARPA) on the SafeDocs project. The views, opinions, and/or findings expressed are those of the authors and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

Using Echidna to test a smart contract library

In this post, we’ll show you how to test your smart contracts with the Echidna fuzzer. In particular, you’ll see how to:

  • Find a bug we discovered during the Set Protocol audit using a variation of differential fuzzing, and
  • Specify and check useful properties for your own smart contract libraries.

And we’ll demonstrate how to do all of this using crytic.io, which provides a GitHub integration and additional security checks.

Libraries may import risk

Finding bugs in individual smart contracts is critically important: A contract may manage significant economic resources, whether in the form of tokens or Ether, and damages from vulnerabilities may be measured in millions of dollars. Arguably, though, there is code on the Ethereum blockchain that’s even more important than any individual contract: library code.

Libraries are potentially shared by many high-value contracts, so a subtle unknown bug in, say, SafeMath, could allow an attacker to exploit not just one, but many critical contracts. The criticality of such infrastructure code is well understood outside of blockchain contexts—bugs in widely used libraries like OpenSSL or SQLite are contagious, infecting potentially all code that relies on the vulnerable library.

Library testing often focuses on detecting memory safety vulnerabilities. On the blockchain, however, we’re not so worried about avoiding stack smashes or a memcpy from a region containing private keys; we’re worried most about the semantic correctness of the library code. Smart contracts operate in a financial world where “code is law,” and if a library computes incorrect results under some circumstances, that “legal loophole” may propagate to a calling contract, and allow an attacker to make the contract behave badly.

Such loopholes may have other consequences than making a library produce incorrect results; if an attacker can force library code to unexpectedly revert, they then have the key to a potential denial-of-service attack. And if the attacker can make a library function enter a runaway loop, they can combine denial of service with costly gas consumption.

That’s the essence of a bug Trail of Bits discovered in an old version of a library for managing arrays of addresses, as described in this audit of the Set Protocol code.

The faulty code looks like this:

/**
* Returns whether or not there's a duplicate. Runs in O(n^2).
* @param A Array to search
* @return Returns true if duplicate, false otherwise
*/
function hasDuplicate(address[] memory A) returns (bool) {
    for (uint256 i = 0; i < A.length - 1; i++) {
        for (uint256 j = i + 1; j < A.length; j++) {
            if (A[i] == A[j]) {
                return true;
            }
        }
    }
    return false;
}

The problem is that if A.length is 0 (A is empty), then A.length - 1 underflows, and the outer (i) loop iterates over the entire set of uint256 values. The inner (j) loop, in this case, doesn’t execute, so we have a tight loop doing nothing for (basically) forever. Of course this process will always run out of gas, and the transaction that makes the hasDuplicate call will fail. If an attacker can produce an empty array in the right place, then a contract that (for example) enforces some invariant over an address array using hasDuplicate can be disabled—possibly permanently.
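
To see just how big that loop bound gets, here is the wraparound arithmetic in a few lines of Python (the modulus mirrors unchecked uint256 arithmetic in the Solidity versions this library targets):

UINT256_MAX = 2**256 - 1

length = 0                           # an empty array
loop_bound = (length - 1) % 2**256   # what `A.length - 1` wraps to
assert loop_bound == UINT256_MAX     # ~1.16e77 iterations of doing nothing
print(loop_bound)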

The library

For specifics, see the code for our example, and check out this tutorial on using Echidna.

At a high level, the library provides convenient functions for managing an array of addresses. A typical use case involves access control using a whitelist of addresses. AddressArrayUtils.sol has 19 functions to test:

function indexOf(address[] memory A, address a)
function contains(address[] memory A, address a)
function indexOfFromEnd(address[] memory A, address a)
function extend(address[] memory A, address[] memory B)
function append(address[] memory A, address a)
function sExtend(address[] storage A, address[] storage B)
function intersect(address[] memory A, address[] memory B)
function union(address[] memory A, address[] memory B)
function unionB(address[] memory A, address[] memory B)
function difference(address[] memory A, address[] memory B)
function sReverse(address[] storage A)
function pop(address[] memory A, uint256 index)
function remove(address[] memory A, address a)
function sPop(address[] storage A, uint256 index)
function sPopCheap(address[] storage A, uint256 index)
function sRemoveCheap(address[] storage A, address a)
function hasDuplicate(address[] memory A)
function isEqual(address[] memory A, address[] memory B)
function argGet(address[] memory A, uint256[] memory indexArray)

It seems like a lot, but many of the functions are similar in effect, since AddressArrayUtils provides both functional versions (operating on memory array parameters) and mutating versions (requiring storage arrays) of extend, reverse, pop, and remove. You can see how once we’ve written a test for pop, writing a test for sPop probably won’t be too difficult.

Property-based fuzzing 101

Our job is to take the functions we’re interested in—here, all of them—and:

  • Figure out what each function does, then
  • Write a test that makes sure the function does it!

One way to do this is to write a lot of unit tests, of course, but this is problematic. If we want to thoroughly test the library, it’s going to be a lot of work, and, frankly, we’re probably going to do a bad job. Are we sure we can think of every corner case? Even if we try to cover all the source code, bugs that involve missing source code, like the hasDuplicate bug, can easily be missed.

We want to use property-based testing to specify the general behavior over all possible inputs, and then generate lots of inputs. Writing a general description of behavior is harder than writing any individual concrete “given inputs X, the function should do/return Y” test. But the work to write all the concrete tests needed would be exorbitant. Most importantly, even admirably well-done manual unit tests don’t find the kind of weird edge-case bugs attackers are looking for.

The Echidna test harness: hasDuplicate

The most obvious thing about the code to test the library is that it’s bigger than the library itself! That’s not uncommon in a case like this. Don’t let it daunt you: unlike a library, a test harness can be approached as a work in progress, slowly improved and expanded, and it will work just fine along the way. Test development is inherently incremental, and even small efforts provide considerable benefit if you have a tool like Echidna to amplify your investment.

For a concrete example, let’s look at the hasDuplicate bug. We want to check that:

  • If there is a duplicate, hasDuplicate reports it, and
  • If there isn’t a duplicate, hasDuplicate reports that there isn’t one.

We could just re-implement hasDuplicate itself, but this doesn’t help much in general (here, it might let us find the bug). If we had another, independently developed, high-quality address array utility library, we could compare it, an approach called differential testing. Unfortunately, we don’t often have such a reference library.

Our approach here is to apply a weaker version of differential testing by looking for another function in the library that can detect duplicates without calling hasDuplicate. For this, we’ll use indexOf and indexOfFromEnd to check if the index of an item (starting from 0) is the same as that when a search is performed from the end of the array:

  // Property: hasDuplicate must agree with a duplicate check built from
  // indexOf and indexOfFromEnd (signature and declarations reconstructed
  // here; see the full code in the addressarrayutils demo).
  function crytic_hasDuplicate() public returns (bool) {
    bool hasDup = false;
    uint256 i1;
    uint256 i2;
    bool b;
    for (uint i = 0; i < addrs1.length; i++) {
      (i1, b) = AddressArrayUtils.indexOf(addrs1, addrs1[i]);
      (i2, b) = AddressArrayUtils.indexOfFromEnd(addrs1, addrs1[i]);
      if (i1 != (i2-1)) { // -1 because fromEnd return is off by one
        hasDup = true;
      }
    }
    return hasDup == AddressArrayUtils.hasDuplicate(addrs1);
  }

See the full example code in our addressarrayutils demo.

This code iterates through addrs1 and finds the index of the first appearance of each element.  If there are no duplicates, of course, this will always just be i itself. The code then finds the index of the last appearance of the element (i.e., from the end). If those two indices are different, there is a duplicate. In Echidna, properties are just Boolean Solidity functions that usually return true if the property is satisfied (we’ll see the exception below), and fail if they either revert or return false. Now our hasDuplicate test is testing both hasDuplicate and the two indexOf functions. If they don’t agree, Echidna will tell us.

Now we can add a couple of functions to be fuzzed to set addrs1.

Let’s run this property on Crytic:

The property test for hasDuplicate fails in Crytic

First, crytic_hasDuplicate fails:

crytic_hasDuplicate: failed!
  Call sequence:
    set_addr(0x0)

The triggering transaction sequence is extremely simple: Don’t add anything to addrs1, then call hasDuplicate on it. That’s it—the resulting runaway loop will exhaust your gas budget, and Crytic/Echidna will tell you the property failed. The 0x0 address results when Echidna minimizes the failure to the simplest sequence possible.

Our other properties (crytic_revert_remove and crytic_remove) pass, so that’s good. If we fix the bug in hasDuplicate then our tests will all pass:

All three property tests now pass in Crytic

The crytic_hasDuplicate: fuzzing (2928/10000) line tells us that since the expensive hasDuplicate property doesn’t fail quickly, only about 3,000 of our maximum of 10,000 tests for each property were performed before we hit our five-minute timeout.

The Echidna test harness: The rest of the library

Now we’ve seen one example of a test, here are some basic suggestions for building the rest of the tests (as we’ve done for the addressarrayutils_demo repository):

  • Try different ways of computing the same thing. The more “differential” versions of a function you have, the more likely you are to find out if one of them is wrong. For example, look at all the ways we cross-check indexOf, contains, and indexOfFromEnd.
  • Test for revert. If you add the prefix _revert_ before your property name as we do here, the property only passes if all calls to it revert. This ensures code fails when it is supposed to fail.
  • Don’t forget to check obvious simple invariants, e.g., that the diff of an array with itself is always empty (ourEqual(AddressArrayUtils.difference(addrs1, addrs1), empty)).
  • Invariant checks and preconditions in other testing can also serve as a cross-check on tested functions. Note that hasDuplicate is called in many tests that aren’t meant to check hasDuplicate at all; it’s just that knowing an array is duplicate-free can establish additional invariants of many other behaviors, e.g., after removing address X at any position, the array will no longer contain X.

Getting up and running with Crytic

You can run Echidna tests on your own by downloading and installing the tool or using our docker build—but using the Crytic platform integrates Echidna property-based testing, Slither static analysis (including new analyzers not available in the public version of Slither), upgradability checks, and your own unit tests in a seamless environment tied to your version control. Plus the addressarrayutils_demo repository shows all you need for property-based testing: It can be as simple as creating a minimal Truffle setup, adding a crytic.sol file with the Echidna properties, and turning on property-based tests in your repository configuration in Crytic.

Sign up for Crytic today, and if you have questions, join our Slack channel (#crytic) or follow @CryticCI on Twitter.

Sinter: New user-mode security enforcement for macOS

TL;DR: Sinter is the first available open-source endpoint protection agent written entirely in Swift and built from the ground up on Apple’s new EndpointSecurity API. Sinter demonstrates how to build a successful event-authorization security agent, and incorporates solutions to many of the challenges that all endpoint protection agents will face as they migrate from kernel-mode to user-mode agents ahead of the release of macOS 11 Big Sur.

Simple, open-source, and Swift

Sinter is our new open-source endpoint security enforcement agent for macOS 10.15 and above, written in Swift. We built it from scratch as a 100% user-mode agent leveraging the new EndpointSecurity API to receive authorization callbacks from the macOS kernel for a set of security-relevant event types. Sinter is controlled with simple rules to allow or deny events—and uses none of the expensive full-system scans or signature-based detection of traditional anti-virus solutions.

Grab an installer for the beta version today and try it out!

Currently, Sinter lets you write a set of rules to block or allow process execution events, and provide those rules to the agent either with a Santa-compatible sync server or with a local configuration file (here is an example rule that demonstrates an explicit allow). However, we’re planning to develop a more sophisticated rule syntax and add blocking capability for the many other kinds of events supported by the API, which will eventually mean the end of Santa rule compatibility.

The quest for the 100% user-mode security agent

Implementing an endpoint security solution (e.g., anti-virus, anti-malware) requires interception and authorization of OS-level events in real time. Historically, that has meant the use of kernel-mode callback APIs or hooking kernel-mode operating system code when a proper API was not provided. Operating system developers have long known that third-party kernel-mode code like this was the leading source of system instability and insecurity, because any small error in kernel code tends to have large consequences.

Enter the macOS EndpointSecurity API. In late 2019, Apple announced that support for all third-party kernel extensions would be deprecated in macOS, and that they would introduce user-mode APIs and frameworks to replace the functionality needed for third-party products. All security vendors were put on notice: Deprecate your existing kernel-mode solutions within the next year and migrate to the EndpointSecurity API before the next release of macOS (macOS 11 Big Sur). It’s clearly not a fun prospect for many teams, and soon after the announcement a client tapped us to develop a user-mode solution that would make migration less painful.

What is the EndpointSecurity API?

EndpointSecurity is an API that implements a callback from the macOS kernel, in real time, as a particular event is about to happen. EndpointSecurity clients subscribe to one or more event types that are either a NOTIFY type or an AUTH (Authorization) type. Notify is just what it sounds like, and is useful for capturing a simple activity log on the host. An Authorization callback is much more powerful; it lets a client process make a decision to allow or deny the event from happening.

EndpointSecurity replaces the kernel-mode equivalents for real-time event authorization on macOS (the Kauth KPI and other unsupported kernel methods), as well as the read-only OpenBSM audit trail used for event monitoring. Any real-time monitoring or protection product for macOS must be rewritten to use EndpointSecurity for macOS 11 Big Sur.

Note that there are no network-related events in the EndpointSecurity API (except UNIX domain sockets). All of these are in the Network Extension framework. You can combine the use of both APIs from one System Extension, but here we focus on the EndpointSecurity API specifically.

Using this API, Stephen Davis at FireEye and Patrick Wardle at Objective-See quickly released event monitoring applications that could display, for example, process-related and file-related events in real time. But read-only monitoring tools following in the footsteps of Process Monitor (“ProcMon”), while useful, are only using the Notify half of the functionality of the EndpointSecurity API (the ability to monitor). Google’s Santa, an open-source macOS process allow/deny solution in Objective-C, demonstrates the ability to authorize events using EndpointSecurity: Its agent now receives and makes allow/deny decisions for process events from EndpointSecurity.

We saw that it would be critically important to master the EndpointSecurity API, as many teams would need to migrate to it in their existing macOS security applications. With the development of Sinter, we’ve delved into EndpointSecurity, learned some lessons from the experience, and incorporated solutions for various challenges we encountered—so you don’t have to. Sinter also demonstrates an implementation of an EndpointSecurity client in the Swift programming language, which promises better memory safety and performance than Objective-C, while maintaining compatibility with all of the other new macOS APIs.

Developing Sinter: Not for the faint of heart

Implementing an event-authorizing agent is an order of magnitude more difficult than implementing a read-only event-subscriber. We also learned—the hard way—certain shortcomings of the EndpointSecurity API. Here’s some of the more significant heavy lifting we did in the course of Sinter’s development.

1. Making decisions in real-time without impacting the system

Possibly the most difficult part of implementing a security event authorization agent is that authorization decisions must be made in real time. You cannot block forever to make a decision, and EndpointSecurity enforces a deadline for each authorization message: If your client blows the deadline, EndpointSecurity will terminate your client process to preserve a functioning system.

Decisions shouldn’t be made synchronously: Sinter uses es_copy_message to copy each message out of EndpointSecurity so that the dispatcher can immediately deliver the next one. Decisions should then be made on separate threads, responding asynchronously as soon as possible. Some decisions take longer than others to process, and the APIs needed to perform signature checks often can’t be interrupted.

With Sinter, we ran into this problem head-on when a quick burst of execution events involving large programs caused Sinter to lock up the machine. We solved this by implementing an efficient queuing system, with one queue for small programs and another one for big programs, so events would never get stuck waiting in the queue. The big-programs queue works out-of-process so a long-running verification can be aborted whenever necessary. This new approach performs reliably on all of our tests.
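
As a rough illustration of the design (in Python rather than Sinter’s Swift, with made-up names and thresholds), the idea is simply to route verification work by executable size so a slow check never starves the fast ones:

from concurrent.futures import ThreadPoolExecutor

SMALL_BINARY_LIMIT = 10 * 1024 * 1024           # illustrative cutoff, not Sinter's

small_pool = ThreadPoolExecutor(max_workers=4)  # quick verifications
large_pool = ThreadPoolExecutor(max_workers=2)  # slow ones (Sinter runs these out of process)

def authorize(event, verify, respond):
    """Dispatch a signature check to the right queue and answer asynchronously."""
    pool = small_pool if event["size"] <= SMALL_BINARY_LIMIT else large_pool
    future = pool.submit(verify, event)
    future.add_done_callback(lambda f: respond(event, allowed=f.result()))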

2. Mitigating the TOCTOU risks in real-time security decisions

The TOCTOU (time of check, time of use) race condition vulnerability pattern commonly occurs when making security decisions. Any security agent performing a check must not allow the checked resource to be modified in the time between the check and the action’s approval.

When authorizing macOS execution events, the resource being checked is the executable file, which is mapped into memory before executing. Here’s one TOCTOU attack scenario:

A malicious actor executes Bad.app. The bad executables are mapped into memory and an execution authorization event is emitted by EndpointSecurity. But then the attacker immediately replaces or modifies the executable file to make it Good.app. The EndpointSecurity client gets the event, verifies that the bundle and its files all look good, and allows the execution.

This problem is not unique to EndpointSecurity, and was always a risk with the KAuth framework that preceded it (e.g., an issue was raised about this TOCTOU in Santa not long ago). It’s still a challenge that must be solved by any agent that wants to authorize events. As mentioned, Sinter attempts to monitor file events to catch TOCTOU attacks. It would have been much easier if Apple handled this responsibility within the EndpointSecurity API itself (submitted to Apple as a developer feedback suggestion FB8352031; see OpenRadar).

3. macOS executable files live in application bundles

Execution events occur in the context of a single executable file, but most macOS executables exist within application bundles, the directory-like structure that appears as a single “.app” file in macOS Finder. A bundle itself is code-signed, and code-signing verification must be done at the bundle level. This means that a security agent that catches an execution event must discover if the executable has a containing app bundle, then verify the code signature on the entire bundle—these are tasks not performed by EndpointSecurity itself. Some bundles like Apple’s Xcode.app are upwards of a gigabyte in size, and processing a verification in real time isn’t possible. Execution events have to be denied at first, until the verification completes.

EndpointSecurity does provide a built-in caching mechanism, a single cache shared by all EndpointSecurity clients. However, as a client you cannot invalidate a single entry in this cache; you can only clear the entire cache at once. EndpointSecurity will automatically invalidate a cache item if the related file is changed/deleted/etc., but it does this on a per-file basis, not per-application-bundle. Currently, Sinter works with two caches: the one managed by EndpointSecurity, and another custom cache containing the application bundle code-signing verification results.

In theory, malware could be added into an application bundle and EndpointSecurity would not react by clearing a cached approval decision, if the previously approved executable file in the bundle had not changed. EndpointSecurity clients would have to monitor for this themselves, and invalidate the entire cache in response. This is less than ideal, and we hope Apple will make improvements to this caching mechanism. In the near term, EndpointSecurity clients may have to implement their own integrity monitoring on application bundles to avoid being circumvented this way. Sinter attempts its own bundle file integrity monitoring capability to detect when this custom cache should be cleared.

4. Advantages of installing your agent as a System Extension

“System Extensions” is Apple’s name for “user-mode components that extend the system” and is the umbrella term for what replaces the now-deprecated third-party Kernel Extensions. EndpointSecurity is one API under this umbrella; DriverKit and Network Extensions are a couple of others. A System Extension is also a new kind of managed plug-in package for macOS through which you can install your executable.

Installing an EndpointSecurity client as a System Extension is not required—you can implement all of the EndpointSecurity functionality from any kind of executable, even a basic command-line application—but is highly encouraged. There are additional benefits and system-enforced protections for your agent when it is installed as a System Extension. System Extensions can opt to be loaded before all other third-party applications at startup. Apple also announced that macOS extends SIP (System Integrity Protection) to cover System Extensions, meaning it prevents even root users from unloading your security agent. Historically this was only possible if you developed your own kernel-mode anti-tamper logic, but installing your agent as a System Extension frees you from reinventing this wheel. Sinter is currently a background daemon, but now that Apple has documented the anti-tamper protection benefit of installing your agent as a System Extension, we will be converting Sinter to this format.

5. Mastering the Entitlements, Signing, and Notarization workflow

The EndpointSecurity API is only usable within code-signed and notarized applications by Apple-approved developers like Trail of Bits. In other words, the API is gated by a special entitlement. Unlike most entitlements, this one requires a manual application and approval by Apple, after which you are granted a code-signing certificate with the EndpointSecurity entitlement. In our case, the time to be approved was six calendar weeks, but your mileage may vary. Apple is apparently being careful with this entitlement, because a misbehaving or malicious EndpointSecurity client could put a halt to everything on a host system.

Apple’s code-signing and notarization steps are difficult to troubleshoot when they fail, so it’s essential to set up and automate the process early, so you will immediately notice when they break and easily narrow down the breaking changes. For Sinter, we created our own CMake-driven approach that automates the workflow for Apple’s notarization, packaging, package signing, and package notarization steps. All of that now integrates perfectly into our CI with minimal fuss.

One last entitlement that EndpointSecurity agents need is related to user privacy. Because most agents will be inspecting files (whether in the context of file events or the executables of process events), they need the user’s permission to access the filesystem. On or before the first run of your application, the user must manually go to Privacy settings in System Preferences, and enable “Full Disk Access.” There are MDM payloads that can automatically enable the permission and sidestep this manual user approval step.

Those were the thornier challenges we addressed when writing Sinter, and of course there were more miscellaneous gotchas and lessons learned (e.g., determining whether files are binaries, signature verification, and multiple EndpointSecurity clients). We’ll update the most compelling details as development continues—stay tuned.

The upshot

With the deprecation of kernel extensions, Apple is leveling the playing field for endpoint protection agents: Everyone must use the same user-mode APIs. This will benefit everyone with improved system stability and reduced attack surface, but existing security product developers first have to replace their kernel extensions with a user-mode approach. In user mode, they can now work in any language, not just C/C++.

So instead of starting from scratch with just the example code in C, we hope organizations will help us build and rely upon an open-source platform in Swift, a forward-looking choice for long-term investment as Apple’s successor to Objective-C.

Get involved with Sinter

The beta version of Sinter is available today. It’s a significant first step, and here’s a peek at some of the larger items we’re working on now:

We invite you to partner with us to sponsor the continued development of Sinter, or to discuss the integration of EndpointSecurity-based capability into your existing agent—just contact us to get started.

Contributors are welcome, too! Give us your feedback on GitHub, or join us in the #sinter channel on the Empire Hacking Slack.

Accidentally stepping on a DeFi lego

The initial release of yVault contained logic for computing the price of yUSDC that could be manipulated by an attacker to drain most (if not all) of the pool’s assets. Fortunately, Andre, the developer, reacted incredibly quickly and disabled the faulty code, securing the approximately 400,000 USD held at the time. However, this bug still highlights the risk stemming from increased complexity caused by composition in the DeFi space.

What is yVault?

On July 25th 2020, yEarn launched a new service called yVault: Users could deposit tokens in the vault, which would then be supplied to a DeFi protocol chosen to maximize their interest.

The initial release supported USDC and integrated with the USDC/MUSD Balancer pool. Any USDC held by the vault would be supplied to the Balancer pool as liquidity, and the vault would receive BPT tokens in return.

To use the vault, a user sends USDC and is minted yUSDC. Similarly, USDC can be withdrawn by burning yUSDC. These two operations rely on a dynamically calculated exchange rate, defined as the ratio of the value of the BPT held by the contract to the total supply of yUSDC. Since the value of BPT goes up when fees are paid by traders, the value of each yUSDC token slowly goes up over time.

Within an hour of yVault’s release, users had already deposited around 400,000 USDC, so I knew I had to take a look at the code for myself.

What was the bug?

Since the initial release integrated with Balancer, let’s consider how Balancer works. Balancer removes the need for liquidity providers to manually rebalance their portfolio by incentivizing rational market actors to do so instead. If a token goes up in price, the pool will become unbalanced. While normally a liquidity provider may need to pay fees in order to sell a token that has increased in value, Balancer incentivizes external users to pay a fee for the privilege of purchasing the token at a profit instead. The fees paid are then distributed to the liquidity providers.

Figure 1 presents the equation used to calculate the amount of tokens received based on the state of the Balancer pool and the amount of tokens sent. For the remainder of this post, let’s refer to the MUSD/USDC 50/50 pool. The swap fee is 0.05%.

/**********************************************************************************************
// calcOutGivenIn                                                                            //
// aO = tokenAmountOut                                                                       //
// bO = tokenBalanceOut                                                                      //
// bI = tokenBalanceIn              /      /            bI             \    (wI / wO) \      //
// aI = tokenAmountIn    aO = bO * |  1 - | --------------------------  | ^            |     //
// wI = tokenWeightIn               \      \ ( bI + ( aI * ( 1 - sF )) /              /      //
// wO = tokenWeightOut                                                                       //
// sF = swapFee                                                                              //
**********************************************************************************************/

Figure 1: Token output given input.

First, to get a sense of how this function behaves, we’ll see what happens when a rational market actor swaps a pool back into balance and when an irrational market actor swaps a pool out of balance.

Suppose the pool is currently out of balance and contains 1,100,000 USDC and 900,000 MUSD. If a rational market actor pays 90,000 MUSD, they’ll receive 99,954 USDC in exchange and make 9,954 USDC in profit. A very good deal!

Now suppose the pool is currently balanced and contains 1,000,000 USDC and 1,000,000 MUSD. What happens if an irrational market actor pays 100,000 USDC? Well, they would receive 90,867 MUSD for a loss of 9,133 MUSD. Not such a great deal.

Although the second trade results in an immediate loss and thus seems rather useless, pairing it with the first trade results in some interesting behavior.

Consider a user who first performs The Bad Trade: The user converts 100,000 USDC to 90,867 MUSD, losing 9,133 USD in the process. Then, the user performs The Good Trade and converts 90,867 MUSD to 99,908 USDC, earning 9,041 USD in the process. This results in a net loss of 92 USD. Not ideal, but certainly not as bad as the 9,133 USD lost on The Bad Trade alone.

Now consider the valuation of BPT during this process. If you held 1% of the total BPT, at the start of the transaction your tokens would have been worth 1% of 2,000,000 USD, or 20,000 USD. At the end of the transaction, your tokens would have been worth 1% of 2,000,092 USD, or 20,000.96 USD. Yet for a magical moment, right in the middle of the transaction, your tokens were worth 1% of 2,009,133 USD, or 20,091.33 USD. This is the crux of the vulnerability at hand.
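
To check these numbers yourself, here is a small Python sketch of calcOutGivenIn for this 50/50 pool and its 0.05% fee. It is a floating-point approximation of Balancer’s on-chain fixed-point math, truncated to whole tokens so it matches the figures quoted above.

SWAP_FEE = 0.0005   # the pool's 0.05% swap fee

def calc_out_given_in(balance_in, balance_out, amount_in,
                      weight_in=0.5, weight_out=0.5, swap_fee=SWAP_FEE):
    """Balancer's calcOutGivenIn (Figure 1), in floating point."""
    ratio = balance_in / (balance_in + amount_in * (1 - swap_fee))
    return balance_out * (1 - ratio ** (weight_in / weight_out))

usdc, musd = 1_000_000, 1_000_000                        # the balanced pool

# The Bad Trade: pay 100,000 USDC in.
musd_out = int(calc_out_given_in(usdc, musd, 100_000))   # 90,867 MUSD
usdc, musd = usdc + 100_000, musd - musd_out
print(usdc + musd)                                       # 2,009,133: the inflated mid-transaction value

# The Good Trade: pay the 90,867 MUSD back in.
usdc_out = int(calc_out_given_in(musd, usdc, musd_out))  # 99,908 USDC
print(100_000 - usdc_out)                                # net loss for the trader: 92 USD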

Knowing this, I applied the same two-trade process to yVault. Before The Bad Trade, the vault holds some BPT worth some amount of USD. After The Good Trade, the vault holds the same amount of BPT worth a slightly larger amount of USD. However, between The Bad Trade and The Good Trade, the vault holds some BPT worth a significantly larger amount of USD.

Recall that the value of yUSDC is directly proportional to the value of the BPT it holds. If we bought yUSDC before The Bad Trade and sold yUSDC before The Good Trade, we would instantaneously make a profit. Repeat this enough times, and we would drain the vault.

How was it fixed?

It turns out that accurately calculating the true value of BPT and preventing attackers from extracting profit from slippage is a difficult problem to solve. Instead, the developer, Andre, deployed and activated a new strategy that simply converts USDC to MUSD and supplies it to the mStable savings account.

Future Recommendations

DeFi composability is hard, and it’s easy to accidentally expose your new protocol to unexpected risk. If you integrate multiple tokens, any one token could compromise the security of your entire platform. On the other hand, if you integrate multiple platforms, your protocol could suffer from complex interactions.

Security tooling can be used to help prevent most simple bugs in code:

  • Crytic uses an advanced version of Slither to automatically detect up to 90 types of vulnerabilities
  • Echidna asserts specific properties through fuzz testing
  • Manticore can symbolically analyze your code

Of course, tooling isn’t a panacea for security. In our study “What are the Actual Flaws in Important Smart Contracts (and How Can We Find Them)?” we discovered that almost 50% of findings were unlikely to be detected by tooling, even if the technology significantly improves. For complex codebases and DeFi projects, reach out to us to arrange a security assessment, or sign up for our Ethereum security office hours.

Contract verification made easier

Smart contract authors can now express security properties in the same language they use to write their code (Solidity) and our new tool, manticore-verifier, will automatically verify those invariants. Even better, Echidna and Manticore share the same format for specifying property tests.

In other words, smart contract authors can now write one property test and have it tested with fuzzing and verified by symbolic execution! Ultimately, manticore-verifier reduces the initial effort and cost involved in symbolic testing of arbitrary properties.

How it works

A smart contract’s behavior—and its potential bugs—are often unique and depend heavily on unspoken contract invariants. Let’s test a simple contract:

contract Ownership{
    address owner = msg.sender;
    function Owner() public{
        owner = msg.sender;
    }
    modifier isOwner(){
        require(owner == msg.sender);
        _;
    }
}
contract Pausable is Ownership{
    bool is_paused;
    modifier ifNotPaused(){
        require(!is_paused);
        _;
    }
    function paused() isOwner public{
        is_paused = true;
    }
    function resume() isOwner public{
        is_paused = false;
    }
}
contract Token is Pausable{
    mapping(address => uint) public balances;
    function transfer(address to, uint value) ifNotPaused public{
        balances[msg.sender] -= value;
        balances[to] += value;
    }
}

This contract maintains a balance sheet and allows for simple transactions. Users can send their tokens to other users, but the total amount of tokens must remain fixed—in other words, tokens can’t be created after the contract has started. So under this invariant, a valid property could state: “If there are only 10,000 tokens, no user could own more than that.”

We can express this property as a Solidity method: “crytic_test_balance.”

import "token.sol";
contract TestToken is Token {
    constructor() public{
        balances[msg.sender] = 10000;
    }
    // the property
    function crytic_test_balance() view public returns(bool){
        return balances[msg.sender] <= 10000;
    }   
}

The emulated world

ManticoreEVM compiles and then creates the contract in a fully emulated symbolic blockchain.

Different normal accounts are also created there to replicate real-world situations. A deployer account is used to deploy the contract, others are used to explore the contract and try to break the properties, and, finally, a potentially different account is used to test the properties.

ManticoreEVM detects the property type methods present in high-level source code and checks them after every combination of symbolic transactions. A normal property is considered failed if the method returns false.

The loop (exploration)

The deployer account initially creates the target contract via a CREATE transaction. Then manticore-verifier simulates all possible interleaving transactions originating from the contract testers until (for example) no more coverage is found. After each symbolic transaction, the properties are checked in the name of the property-checker account, and if anything looks broken, a report of the reproducible exploit trace is generated. Normal properties like crytic_test_balance() are expected to return true; any other result is reported as a problem.

manticore-verifier dapp.sol --contract TestToken

It’s a command-line-based tool

Several aspects of the exploration, the stopping condition, and the user accounts employed can be modified by command line arguments. Try $manticore-verifier --help for a thorough list. Here’s an excerpt of it in action:

$manticore-verifier dapp.sol  --contract TestToken

# Owner account: 0x28e9eb58c2f5be87161a261f412a115eb85946d9
# Contract account: 0x9384027ebe35100de8ef216cb401573502017f7
# Sender_0 account: 0xad5e556d9699e9e35b3190d76f75c9bf9997533b
# PSender account: 0xad5e556d9699e9e35b3190d76f75c9bf9997533b
# Found 1 properties: crytic_test_balance
# Exploration will stop when some of the following happens:
# * 3 human transaction sent
# * Code coverage is greater than 100% measured on target contract
# * No more coverage was gained in the last transaction
# * At least 1 different properties where found to be breakable. (1 for fail fast)
# * 240 seconds pass
# Starting exploration...
Transaction 0. States: 1, RT Coverage: 0.0%, Failing properties: 0/1
Transaction 1. States: 2, RT Coverage: 60.66%, Failing properties: 0/1
Found 1/1 failing properties. Stopping exploration.
60.66% EVM code covered 
+---------------------+------------+
|    Property Named   |   Status   |
+---------------------+------------+
| crytic_test_balance | failed (0) |
+---------------------+------------+
Checkout testcases here:./mcore_kkgtybqb

Note that each failing property will have a test case number associated with it. More details can be found at the specified test case files: ./mcore_kkgtybqb/user_000000.tx

Bug Found!

In our example, manticore-verifier finds a way to break the specified property. When trying to transfer an incredibly large amount of tokens, an internal integer representation exceeds its limits and makes it possible to boost the sender’s savings, i.e., create tokens out of thin air.

transfer(0,115792089237316195422001709574841237640532965826898585773776019699400460720238) -> STOP (*)
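
The root cause is unchecked uint256 arithmetic: balances[msg.sender] -= value wraps around modulo 2**256 when value exceeds the sender’s balance, minting an astronomically large balance. A few lines of Python (not Manticore output, just the arithmetic) illustrate the wrap:

# Solidity uint256 arithmetic (pre-0.8, no SafeMath) wraps modulo 2**256.
MOD = 2**256

balance = 0                              # the sender owns no tokens
value = 115792089237316195422001709574841237640532965826898585773776019699400460720238
new_balance = (balance - value) % MOD    # what `balances[msg.sender] -= value` stores
print(new_balance > 10_000)              # True: crytic_test_balance is violated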

Conclusion: Interoperability = 101%

manticore-verifier lowers the initial cost to symbolically test arbitrary properties. It also allows our symbolic executor to work more tightly with Solidity, Echidna, and slither-prop.

The same methodology can be used with our Ethereum fuzzer, Echidna. As a result, you can write the properties once and test them with symbolic execution and fuzzing with no extra effort.

manticore-verifier can also check automatically generated ERC20 properties: slither-prop, our static analyzer, has detailed information about what an ERC20 contract should do, and can automatically produce ERC20 properties for manticore-verifier to check.

So get your contract, add the property methods, and test with manticore-verifier at will. If you have any questions please join the Empire Hacking Slack.

Advocating for change

As a company, we believe Black lives matter. In the face of continued police brutality, racial disparities in law enforcement, and limited accountability, we demand an end to systemic racism, endorse restrictions on police use of force, and seek greater accountability for police actions. We believe police misconduct, militarization of police, and unchecked abuse of power are issues that we as Americans should protest.

Giving time, money, and attention

In this spirit, I have reaffirmed our employees’ right to protest without reprisal or retaliation. While we certainly keep no account of who has and hasn’t, I’m aware that many of our employees have recently marched to end systemic racism and police brutality.

To support understanding and discussion, we created a #solidarity channel on our company Slack. Conversations there grew rapidly as we shared research on social policy and upcoming legislation, including policies that have been analyzed by social scientists studying criminal justice:

  1. A large-scale analysis of racial disparities in police stops across the United States
  2. Collective Bargaining Rights and Police Misconduct: Evidence from Florida
  3. Evidence that curtailing proactive policing can reduce major crime
  4. Good Cop, Bad Cop: Using Civilian Allegations to Predict Police Misconduct
  5. The Wandering Officer

Many of our employees also decided to “protest with our wallets” and use our existing charitable donation matching program to support organizations we believe can effect change. In the last two weeks, employees have donated $12k and the company matched $12k ($24k total) to a number of related non-profits, including:

More we can do now: Calls to action

Advocacy is not new to us—Trail of Bits is among the largest employers of cybersecurity professionals in NYC, and has frequently advocated for policy change as part of Tech:NYC and the Coalition for Responsible Cybersecurity. As an NYC-based company, we urge the NYC Council to take action.

The June 18 legislative session of the NYC Council will be livestreamed, and we’ll be watching. We urge our representatives to:

  • Pass all five bills that were heard in the last meeting of the Public Safety Committee
  • Pass the POST Act and require reporting on NYPD use of surveillance technology
  • Commit to NYC Budget Justice and reallocate funds towards social programs

While policing is largely a state and local matter in the United States, federal action has a strong effect on state and local policies. We call on the US Congress to:

Local and state action may have the most direct impact on policing practices. If you want to lobby your representatives as an individual, use “Who are my representatives?” to find their contact information and give them a call. Personal, authentic contact with local representatives can be very effective at shaping policy decisions.

If you’re an individual motivated to support a charitable organization, consider reviewing the following resources first:

When donating, strongly consider a charitable donation matching program. If your employer does not offer one, suggest that they sign up for the Technology Partner program from RaisedBy.Us. Trail of Bits uses their service to facilitate donation matching through Brightfunds.

If you are planning to attend a protest, research what local activists in your area are recommending to protect yourself and others. There are widespread disparities in treatment of protesters across the United States: a “March for Families” in NYC may be completely unlike a similarly named event in Oregon. Consider advice from the Legal Aid Society of NYC or Vice (and their digital guide) and put on a mask before attending a protest.

We can always do more

We know our efforts are modest, and that the problems will not be fixed by a few waves of donations and legislation. Our own efforts to advocate for change started small, but they are growing.

We also recognize the diversity deficit in our own company. As part of our effort to close that gap, we are working with diversity and inclusion-focused recruiting groups and conducting implicit bias training. We’ve created the CTF Field Guide to help eliminate the knowledge gap for industry newcomers and we host yearly winternships that provide inroads for people new to computer security. We’re also increasing the matching for our existing charity matching program and making the most of our diversity-focused donation to the Summercon Foundation. Finally, to help ensure this is not a one-off effort, we are listening to our employees and community to hold us accountable.

The protests have been extraordinarily effective in moving legislation forward; so much so, it can be tough to keep up. We realize it’s only a long-overdue beginning, but the more we know about what’s gaining ground, the better we can advocate for it. To help, we’ve assembled a summary of the changes we’ve seen at the NYC, New York State, and federal levels.

Upgradeable contracts made safer with Crytic

Upgradeable contracts are not as safe as you think. Architectures for upgradeability can be flawed, locking contracts, losing data, or sabotaging your ability to recover from an incident. Every contract upgrade must be carefully reviewed to avoid catastrophic mistakes. The most common delegatecall proxy comes with drawbacks that we’ve catalogued before.

Crytic now includes a comprehensive suite of 17 upgradeability checks to help you avoid these pitfalls.

The how-to

Reviewing upgradeable contracts is a complex low-level task that requires investigating the storage layout and organization of functions in memory. We created a sample token that supports upgradeability to help walk through the steps in crytic/upgradeability-demo. This simple demo repository includes:

  • MyToken, our initial implementation of a simple token
  • Proxy, our proxy

Any call to Proxy will use a delegatecall on MyToken to execute its logic, while the storage variables will be held on Proxy. This is a standard setup for most upgradeable contracts.

Suppose these two contracts are already deployed on mainnet, but the code for MyToken has become stale and you need to change its features. It’s time for MyTokenV2! The code for MyTokenV2 is similar to MyToken, except that the init() function and its associated state variable have been removed.

Let’s use Crytic to ensure that deploying MyTokenV2 does not introduce new security risks.

Configuration

First, tell Crytic about your upgradeable contracts. Go to your Crytic settings and find this panel:

Here you can configure:

  1. The contract being upgraded
  2. The proxy used
  3. The new version of the contract

Note: (2) and (3) are optional; Crytic will run as many checks as are appropriate.

For example, if you only have the upgradeable contract, and no proxy or new version, Crytic can already look for flaws in the initialization schema. If you have the upgradeable contract and the proxy, but no new version, Crytic can look for function collisions between the implementation and the proxy. If you have multiple upgradeable contracts, or multiple proxies, you can then configure any combination that fits your setup.

Back to MyToken, we have these three contracts:

Once we configure Crytic, the upgradeability checks will run on every commit and pull request, similar to security checks and unit tests:

Crytic’s Findings

Occasionally, Crytic will find serious errors in your upgradeability code (oh no!). We built one such issue into our demo. Here’s what it looks like when Crytic discovers a security issue:

The was_init storage variable was removed, so balances has a different storage offset in MyToken and MyTokenV2, breaking the storage layout of the contract.

This is a common mistake that can be particularly difficult to find by hand in complex codebases with many contracts and inheritances—but Crytic will catch the issue for you!
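
A back-of-the-envelope way to think about it (hypothetical slot numbers, not Crytic output): Solidity assigns storage slots in declaration order, so deleting a variable shifts every variable declared after it.

# MyToken (v1): slots assigned in declaration order
v1 = ["was_init", "balances"]   # was_init -> slot 0, balances -> slot 1
# MyTokenV2: was_init removed
v2 = ["balances"]               # balances -> slot 0

# The proxy's storage still follows v1's layout, so the upgraded code would
# read and write `balances` at the wrong slot.
print({name: slot for slot, name in enumerate(v1)})
print({name: slot for slot, name in enumerate(v2)})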

What else can Crytic find?

Crytic will review (depending on your configuration):

  • Storage layout consistency between the upgrades and the proxy
  • Function collisions between the proxy and the implementation
  • Correct initialization schema
  • Best practices for variable usage

Here’s the detailed list of checks:

+-----+-----------------------------------------+---------------+--------------+--------------------+
| Num | What it Detects                         | Impact        | Proxy needed | New version needed |
+-----+-----------------------------------------+---------------+--------------+--------------------+
| 1   | Variables that should not be constant   | High          |              | X                  |
| 2   | Function ID collision                   | High          | X            |                    |
| 3   | Function shadowing                      | High          | X            |                    |
| 4   | Missing call to init function           | High          |              |                    |
| 5   | initializer() is not called             | High          |              |                    |
| 6   | Init function called multiple times     | High          |              |                    |
| 7   | Incorrect vars order in v2              | High          |              | X                  |
| 8   | Incorrect vars order in the proxy       | High          | X            |                    |
| 9   | State variables with an initial value   | High          |              |                    |
| 10  | Variables that should be constant       | High          |              | X                  |
| 11  | Extra vars in the proxy                 | Medium        | X            |                    |
| 12  | Variable missing in the v2              | Medium        |              | X                  |
| 13  | Extra vars in the v2                    | Informational |              | X                  |
| 14  | Initializable is not inherited          | Informational |              |                    |
| 15  | Initializable is missing                | Informational |              |                    |
| 16  | Initialize function that must be called | Informational |              |                    |
| 17  | initializer() is missing                | Informational |              |                    |
+-----+-----------------------------------------+---------------+--------------+--------------------+

Check your contracts with Crytic

In addition to detecting 90+ types of vulnerabilities, Crytic can now find flaws in your upgradeability code. It is the only platform that checks your codebase in depth for so many issues. If you want to avoid catastrophic mistakes, use Crytic before deploying any upgradeable contract.

Got questions? Join our Slack channel (#crytic) or follow @CryticCI on Twitter.

ECDSA: Handle with Care

The elliptic curve digital signature algorithm (ECDSA) is a common digital signature scheme that we see in many of our code reviews. It has some desirable properties, but can also be very fragile. For example, LadderLeak was published just a couple of weeks ago, which demonstrated the feasibility of key recovery with a side channel attack that reveals less than one bit of the secret nonce.

ECDSA is fragile and must be handled with care

This post will walk you through:

  • the various ways in which ECDSA nonce bias can be exploited
  • how simple it is to attack in practice when things go wrong, and
  • how to protect yourself.

You’re probably familiar with attacks against ECDSA. Some attacks are trivial, and some involve advanced Fourier analysis and lattice math. Although these attacks can be complicated, I hope this post will demonstrate that they are easy to implement in practice. In fact, even if you don’t know anything about lattices, after reading this blog post you will be able to leverage a lattice attack to break ECDSA signatures produced with a very slightly faulty RNG using less than 100 lines of python code.

Math disclaimer: to read this post, you will need to be somewhat familiar with mathematical groups, recognizing that they have a binary operation and a group generator. You do not need to be an expert on elliptic curves; you just need to know that elliptic curves can be used to form a mathematical group (and, thus, have a concept of addition and scalar multiplication). Familiarity with other math concepts like lattices is helpful, but not required.

DSA primer

ECDSA is a specific form of the digital signature algorithm (DSA). DSA is a pretty common digital signature scheme, and is defined with three algorithms: key generation, signing, and verification. The key generation algorithm generates a private and public key; the private key is responsible for creating signatures; and the public key is responsible for verifying signatures. The signature algorithm takes as input a message and private key, and produces a signature. The verification algorithm takes as input a message, signature, and public key, and returns true or false, indicating whether the signature is valid.

DSA is defined over any mathematical group, and this scheme is secure as long as the discrete log problem is hard over this group. The group typically used is the integers modulo a prime, p. Along with this group, we will have a group generator, g, and some cryptographically secure hash function, H. We can assume that p, g, and H will all be publicly known.

Key generation works by first randomly selecting a value, x, from the integers mod p. Then the value y = g^x mod p is computed. The private signing key is set to x, and the public key is y. The signing key must be kept secret, as this is what allows signatures to be made.

The signing algorithm produces a signature from a message, m, and the secret key, x. First, a random element of the group, k, is generated. This is known as the nonce, which is important when talking about attacks. Then, the values r = g^k mod p and s = k^(-1)(H(m) + xr) mod p are computed. Here k^(-1) is the group inverse of k, and H(m) is the result of computing the hash of m and interpreting the result as an integer mod p. The signature is defined to be the pair (r,s). (Note: if either the r or s value equals 0, the algorithm restarts with a new k value.)

The verification algorithm receives as input the signature, (r,s), the message, m, and the public key, y. Let ŝ = s^(-1); then the algorithm outputs true if and only if r,s ≠ 0 and r = (g^H(m) y^r)^ŝ. This verification check works because g^H(m) y^r = g^(H(m)+xr) = g^(ks), and so (g^H(m) y^r)^ŝ = g^k = r.

A digital signature scheme is considered secure if it is unforgeable. Unforgeability has a formal cryptographic meaning, but on a high level it means that you cannot produce signatures without knowing the secret key (unless you have copied an already existing signature created from the secret key). DSA is proven to be unforgeable under the discrete log assumption.

ECDSA

DSA is defined over a mathematical group. When DSA is used with the elliptic curve group as this mathematical group, we call this ECDSA. The elliptic curve group consists of elliptic curve points, which are pairs (x,y) that satisfy the equation y^2 = x^3 + ax + b, for some a,b. For this blog post, all you need to know is that, using elliptic curves, you can define a finite group, which means you obtain a group generator, g (an elliptic curve point), and addition and scalar multiplication operations just like you can with integers. Since they form a finite group, the generator, g, will have a finite order, p. This blog post will not explain or require you to know how these elliptic curve operations work, but if you’re curious, I encourage you to read more about them here.
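If you want to convince yourself of that last point, here is a quick sanity check (a minimal sketch assuming the same Python ecdsa package used later in this post) that the NIST P-256 generator really does satisfy a curve equation of this form:

import ecdsa

# curve parameters and generator for NIST P-256
curve = ecdsa.NIST256p.curve
gen = ecdsa.NIST256p.generator
p, a, b = curve.p(), curve.a(), curve.b()

# the generator is a point (x, y) satisfying y^2 = x^3 + ax + b (mod p)
x, y = gen.x(), gen.y()
assert (y * y - (x * x * x + a * x + b)) % p == 0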

ECDSA works the same way as DSA, except with a different group. The secret key, x, will still be a random value from the integers mod p. Now, the public key, y, is still computed as y = g^x, except now g is an elliptic curve point. This means that y will also be an elliptic curve point (before, y was an integer mod p). Another difference occurs in how we compute the value r. We still generate a random nonce, k, as an integer mod p, just as before. We will compute g^k, but again, g is an elliptic curve point, and so g^k is as well. Therefore, we can compute the point (x_k, y_k) = g^k, and we set r = x_k. Now, the s value can be computed as before, and we obtain our signature (r,s), which will still be integers mod p as before. To verify, we need to adjust for the fact that we’ve computed r slightly differently. So, as before, we compute the value (g^H(m) y^r)^ŝ, but now this value is an elliptic curve point, so we take the x-coordinate of this point and compare it against our r value.
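To make those equations concrete, here is a small numeric check of the ECDSA signing and verification equations on NIST P-256. This is a sketch that assumes Python 3.8+ (for pow with a negative exponent and a modulus) and the ecdsa package used throughout the rest of this post; a random integer stands in for H(m).

import random
import ecdsa

gen = ecdsa.NIST256p.generator
order = gen.order()

x = random.randrange(1, order)          # secret key
y = gen * x                             # public key (an elliptic curve point)
m = random.randrange(1, order)          # stands in for H(m)

k = random.randrange(1, order)          # nonce
r = (gen * k).x() % order               # r is the x-coordinate of g^k
s = (pow(k, -1, order) * (m + x * r)) % order

# verification: the x-coordinate of (g^H(m) * y^r)^(s^-1) should equal r
s_inv = pow(s, -1, order)
assert ((gen * m + y * r) * s_inv).x() % order == r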

Recovering secret keys from reused nonces

Now that we understand what ECDSA is and how it works, let’s demonstrate its fragility. Again, since it’s a digital signature scheme, it is imperative that the secret key is never revealed to anyone other than the message signer. However, if a signer ever releases a signature and also releases the nonce they used, an attacker can immediately recover the secret key. Say I release a signature (r,s) for a message m, and I accidentally reveal that I used the nonce k. Since s = k^(-1)(H(m) + xr), we can easily compute the secret key:

s = k^(-1)(H(m) + xr)

ks = H(m) + xr

ks – H(m) = xr

x = r^(-1)(ks – H(m))

Therefore, not only does a signer need to keep their secret key secret, they must also keep every nonce they ever generate secret.
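Here is what that recovery looks like in code. This is a minimal sketch, again assuming Python 3.8+ and the ecdsa package used below; the message value stands in for H(m).

import random
import ecdsa

gen = ecdsa.NIST256p.generator
order = gen.order()

x = random.randrange(1, order)                    # the signer's secret key
pub_key = ecdsa.ecdsa.Public_key(gen, gen * x)
priv_key = ecdsa.ecdsa.Private_key(pub_key, x)

m = random.randrange(1, order)                    # stands in for H(m)
k = random.randrange(1, order)                    # the nonce that gets leaked
sig = priv_key.sign(m, k)

# x = r^(-1)(ks - H(m)), reduced mod the group order
recovered = (pow(sig.r, -1, order) * (k * sig.s - m)) % order
assert recovered == x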

Even if the signer keeps every nonce secret, if they accidentally repeat a single nonce (even for different messages), the secret key can immediately be recovered as well. Let (r,s1) and (r,s2) be two signatures produced on messages m1 and m2 (respectively) from the same nonce, k—since they have the same nonce, the r values will be the same, so this is very easily detected by an attacker:

s1 = k^(-1)(H(m1) + xr) and s2 = k^(-1)(H(m2) + xr)

s1 – s2 = k^(-1)(H(m1) – H(m2))

k(s1 – s2) = H(m1) – H(m2)

k = (s1 – s2)^(-1)(H(m1) – H(m2))

Once we have recovered the nonce, k, using the formula above, we can then recover the secret key by performing the previously described attack.
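Putting the two formulas together, the repeated-nonce attack is just a few lines of code (same assumptions as the sketch above: Python 3.8+ and the ecdsa package):

import random
import ecdsa

gen = ecdsa.NIST256p.generator
order = gen.order()

x = random.randrange(1, order)
pub_key = ecdsa.ecdsa.Public_key(gen, gen * x)
priv_key = ecdsa.ecdsa.Private_key(pub_key, x)

k = random.randrange(1, order)                    # the nonce, reused by mistake
m1, m2 = random.randrange(1, order), random.randrange(1, order)
sig1, sig2 = priv_key.sign(m1, k), priv_key.sign(m2, k)
assert sig1.r == sig2.r                           # the reuse is visible to an attacker

# k = (s1 - s2)^(-1)(H(m1) - H(m2)), then x = r^(-1)(k*s1 - H(m1))
k_rec = (pow((sig1.s - sig2.s) % order, -1, order) * (m1 - m2)) % order
x_rec = (pow(sig1.r, -1, order) * (k_rec * sig1.s - m1)) % order
assert x_rec == x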

Let’s take a moment to digest this.

If a nonce for a signature is ever revealed, the secret key can immediately be recovered, which breaks our entire signature scheme. Further, if two nonces are ever repeated, regardless of what the messages are, an attacker can easily detect this and immediately recover the secret key, again breaking our entire scheme. That is pretty fragile, and these are just the easy attacks!

Attacking ECDSA from leaked and biased nonces

It turns out that even leaking small parts of the nonce can also be very damaging to the signature scheme. In 1999, work by Howgrave-Graham and Smart demonstrated the feasibility of using lattice attacks to break DSA from partial nonce leakage. Later, Nguyen and Shparlinski improved on their work, and were able to recover secret keys on 160-bit DSA (here 160-bit refers to p), and later ECDSA, by knowing only three bits of each nonce from 100 signatures.

Later, Mulder et al were able to perform more attacks on partial nonce leakage. They used a different, Fourier transform-based attack derived from work by Bleichenbacher. Using these techniques, and knowing only five bits of each nonce from 4,000 signatures, they were able to recover secret keys from 384-bit ECDSA, and leveraged their techniques to break 384-bit ECDSA running on a smart card.

You may have heard of the Minerva attack: Several timing side channels were leveraged to recover partial nonce leakage, and these lattice attacks were performed on a wide variety of targets. With enough signatures, they were able to successfully attack targets even when only the size of the nonce was leaked!

Even worse, a few weeks back, the LadderLeak attack further improved on Fourier analysis attacks, and now ECDSA secret keys can be recovered if only 1 bit of the nonce is leaked! In fact, the single bit can be leaked with probability less than 1, so attackers technically need less than 1 bit. This was leveraged to attack a very small leakage in Montgomery ladders in several OpenSSL versions.

Again, let’s digest this. Even when only a few bits of the nonce are leaked—or even if only the size of the nonce is leaked—or even if just one bit of the nonce is leaked—then, most of the time, the entire signature scheme can be broken by observing enough signatures. This is incredibly fragile!

On top of this, even if you manage to keep all of your nonces secret and never repeat a nonce, and you never leak any bits of your nonce to an attacker, you still aren’t fully protected! Work by Breitner and Heninger showed that a slightly faulty random number generator (RNG) can also catastrophically break your scheme by leveraging lattice attacks. Specifically, when using 256-bit ECDSA, if your RNG introduces a bias of just 4 bits in your nonce, your signature scheme can be broken completely by a lattice attack, even if we don’t know what those biased values are.

These attacks involve some complicated math. Like most cryptographic attacks, they formulate a series of ECDSA signatures as another hard math problem. In this case, the problem is known as the Hidden Number Problem. The Hidden Number Problem has been fairly widely studied by other researchers, so there are a lot of techniques and algorithms for solving it. This means that once we figure out how to mold a series of ECDSA signatures into an instance of the Hidden Number Problem, we can then apply existing techniques to find an ECDSA secret key.

Breaking ECDSA from bad nonces

Now, Fourier analysis, Hidden Number Problems, and lattice attacks are more complicated than your everyday cryptography, and they seem daunting. However, the fact that these attacks involve complicated math may fool some people into thinking they’re very difficult to implement in practice. This is not the case. As I mentioned in the beginning, I will teach you how to implement these attacks using fewer than 100 lines of Python code. Moreover, to perform this attack, you actually don’t need to know anything about the Hidden Number Problem or lattices. The only lattice component we need is access to the LLL algorithm. However, we can treat this algorithm as a black box; we don’t need to understand how it works or what it is doing.

We’ll be attacking signatures produced from bad nonces (i.e., bad RNG). Specifically, these nonces will have a fixed prefix, meaning their most significant bits are always the same. (The attack still works even if the fixed bits aren’t the most significant bits, but this is the easiest to follow). When using LLL, all we have to know is that we will input a matrix of values, and the algorithm will output a matrix of new values. If we use a series of ECDSA signatures to construct a matrix in a particular way, LLL will output a matrix that will allow us to recover the ECDSA private key. More specifically, because of the way we construct this matrix, one of the rows of the output of LLL will contain all of the signatures’ nonces. (It requires more complicated math to understand why, so we won’t discuss it here, but if you’re curious, see section 4 of this paper). Once we recover the nonces, we can use the basic attack described above to recover the secret key.

To perform the attack, we’ll need access to an ECDSA library and an LLL library in Python. I chose this ECDSA library, which allows us to input our own nonces (so we can input nonces from bad RNGs to test our attack), and this LLL library. We’ll perform this attack on the NIST P-256 elliptic curve, beginning with the easiest form of the attack: we are given two signatures generated from only 128-bit nonces. First, we generate our signatures.

import ecdsa
import random

gen = ecdsa.NIST256p.generator
order = gen.order()
secret = random.randrange(1,order)

pub_key = ecdsa.ecdsa.Public_key(gen, gen * secret)
priv_key = ecdsa.ecdsa.Private_key(pub_key, secret)

nonce1 = random.randrange(1, 2**127)
nonce2 = random.randrange(1, 2**127)

msg1 = random.randrange(1, order)
msg2 = random.randrange(1, order)

sig1 = priv_key.sign(msg1, nonce1)
sig2 = priv_key.sign(msg2, nonce2)

Now that we have our signatures, we need to craft the matrix we’ll input into the LLL algorithm:

Matrix that we will input into the LLL algorithm

Here N is the order of NIST P-256 (order in the code snippet above); B is the upper bound on the size of our nonces (which will be 2^128 in this example, because both nonces are only 128 bits in size); m1 and m2 are the two random messages; and (r1, s1) and (r2, s2) are the two signature pairs. In our Python code, our matrix will look like this (here modular_inv is a function for computing the inverse mod N):
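The code below relies on a modular_inv helper that isn’t part of either library; one minimal way to define it (assuming Python 3.8+, where pow accepts a negative exponent together with a modulus) is:

def modular_inv(a, n):
    # multiplicative inverse of a modulo n
    return pow(a, -1, n)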

r1 = sig1.r
s1_inv = modular_inv(sig1.s, order)
r2 = sig2.r
s2_inv = modular_inv(sig2.s, order)

matrix = [[order, 0, 0, 0], [0, order, 0, 0],
[r1*s1_inv, r2*s2_inv, (2**128) / order, 0],
[msg1*s1_inv, msg2*s2_inv, 0, 2**128]]

Now we’ll input this matrix into the black-box LLL algorithm, which will return a new matrix to us. For reasons that don’t matter here, one of the rows of this returned matrix will contain the nonces used to generate the two signatures. If we knew more about what the algorithm is actually doing, we could probably predict where the nonce is going to be. But since we don’t care about the details, we are just going to check every row in the returned matrix to see if we can find the secret key. Remember, we already showed how to recover the private key once we have the nonce, k. Specifically, we compute r^(-1)(ks – H(m)). An attacker in the real world would have access to the public key corresponding to these signatures. Therefore, to determine if we have found the correct private key, we will compute its corresponding public key and compare it against the known public key. The attack will look like this:

import olll

new_matrix = olll.reduction(matrix, 0.75)
r1_inv = modular_inv(sig1.r, order)
s1 = sig1.s

for row in new_matrix:
    potential_nonce_1 = row[0]
    potential_priv_key = r1_inv * ((potential_nonce_1 * s1) - msg1)

    # check if we found private key by comparing its public key with actual public key
    if ecdsa.ecdsa.Public_key(gen, gen * potential_priv_key) == pub_key:
        print("found private key!")

I should mention that there is a noticeable failure rate for this basic attack. If you run the code above, you will notice this as well. But again, for the purposes of this post, don’t worry about these specifics. Also, the failure rate should decrease if you perform this same attack with more signatures.

Hopefully at this point I’ve shown why these attacks aren’t so complicated. We were able to recover the secret key from just two signatures, and we didn’t do anything overly complicated. That said, some of you would probably argue that being able to attack signatures with only 128-bit nonces isn’t that interesting. So let’s move on to more realistic attacks.

Exploiting real-world ECDSA bugs

You may have heard of a recent bug in the randomness generated in Yubikeys. Essentially, bad randomness caused as many as 80 bits of the nonce to be fixed to the same value. Attacking this real-world bug will not be much more difficult than the attack we just performed above, except we don’t know what the fixed 80-bit values are (in the previous example, we knew the fixed 128 bits were all set to 0). To overcome this, we need to add a trick to our attack.

Imagine we receive a collection of signatures whose nonces have 80 fixed bits. For ease of explanation, we will assume these 80 bits are the most significant bits (the attack is still feasible if this is not the case; you simply shift the fixed bits to the most significant bits by multiplying each signature by a power of 2). Even though we don’t know what these 80 bits are, we know that if we subtract any two nonces, the 80 most significant bits of their difference will all be zeros. Therefore, we are going to perform the same attack as above, except with our signature values subtracted. Specifically, given a set of n signatures and messages, we will build the following matrix:

Matrix that we will input into the LLL algorithm when the nonce bias is unknown

This time, we will again input this matrix into LLL and receive a new matrix back. However, since we subtracted the nth value from every entry in this matrix, instead of receiving a row full of nonces, we will actually receive a row with the difference between each nonce and the nth nonce. In other words, the matrix returned from LLL will give us the value k1 – kn, the difference between the nonces for signatures 1 and n. It takes some algebraic manipulation, but we can still recover the secret key from this value using the following formula:

s1 = k1^(-1)(m1 + xr1) and sn = kn^(-1)(mn + xrn)

s1k1 = m1 + xr1 and snkn = mn + xrn

k1 = s1^(-1)(m1 + xr1) and kn = sn^(-1)(mn + xrn)

k1 – kn = s1^(-1)(m1 + xr1) – sn^(-1)(mn + xrn)

s1sn(k1 – kn) = sn(m1 + xr1) – s1(mn + xrn)

s1sn(k1 – kn) = xsnr1 – xs1rn + snm1 – s1mn

x(s1rn – snr1) = snm1 – s1mn – s1sn(k1 – kn)

Secret key = x = (rns1 – r1sn)^(-1)(snm1 – s1mn – s1sn(k1 – kn))

With all of that context, let’s exploit the Yubikey bug. If signatures are produced from nonces with 80 fixed bits, we only need five signatures to recover the secret key. We will build the matrix above with n = 6 to reduce the error rate:

# generate 80 most significant bits, nonce must be less than order
yubikey_fixed_prefix = random.randrange(2**176, order)

msgs = [random.randrange(1, order) for i in range(6)]
nonces = [random.randrange(1, 2**176) + yubikey_fixed_prefix for i in range(6)]
sigs = [priv_key.sign(msgs[i],nonces[i]) for i in range(6)]

matrix = [[order, 0, 0, 0, 0, 0, 0],
[0, order, 0, 0, 0, 0, 0],
[0, 0, order, 0, 0, 0, 0],
[0, 0, 0, order, 0, 0, 0],
[0, 0, 0, 0, order, 0, 0]]

row, row2 = [], []
[msgn, rn, sn] = [msgs[-1], sigs[-1].r, sigs[-1].s]
rnsn_inv = rn * modular_inv(sn, order)
mnsn_inv = msgn * modular_inv(sn, order)

# 2nd to last row: [r1(s1^-1) - rn(sn^-1), ... , rn-1(sn-1^-1) - rn(sn^-1), 2^176/order, 0 ]
# last row: [m1(s1^-1) - mn(sn^-1), ... , mn-1(sn-1^-1) - mn(sn^-1), 0, 2^176]
for i in range(5):
    row.append((sigs[i].r * modular_inv(sigs[i].s, order)) - rnsn_inv)
    row2.append((msgs[i] * modular_inv(sigs[i].s, order)) - mnsn_inv)

# add last elements of last two rows, B = 2**(256-80) for yubikey
row.append((2**176) / order)
row.append(0)
row2.append(0)
row2.append(2**176)

matrix.append(row)
matrix.append(row2)

new_matrix = olll.reduction(matrix, 0.75)

for row in new_matrix:
    potential_nonce_diff = row[0]

    # Secret key = (rns1 - r1sn)^(-1)(snm1 - s1mn - s1sn(k1 - kn))
    potential_priv_key = (sn * msgs[0]) - (sigs[0].s * msgn) - (sigs[0].s * sn * potential_nonce_diff)
    potential_priv_key *= modular_inv((rn * sigs[0].s) - (sigs[0].r * sn), order)

    # check if we found private key by comparing its public key with actual public key
    if ecdsa.ecdsa.Public_key(gen, gen * potential_priv_key) == pub_key:
        print("found private key!")

That’s it! We just exploited a real-world bug in about 50 lines of Python.

Some might further argue that although this was an actual bug, systems producing 80 fixed bits are rare. However, this attack can be much more powerful than shown in this one example! For 256-bit elliptic curves, this attack will work even if only 4 bits of the nonce are fixed. Moreover, the attack does not become more complicated to implement. You simply need to increase the dimension of your lattice—i.e., in the matrix figure above, just increase the value of n and repeat the attack—nothing else! This will increase the running time of your attack, but not the complexity of implementing it. You could copy that code snippet and recover ECDSA secret keys generated from nonces with as little as 4 bits of bias. On top of that, the attack against nonce leakage is of a similar level of difficulty.
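To make the “just increase n” point concrete, here is a hedged sketch of how the construction above generalizes to n signatures and a nonce-difference bound B. The name build_lattice is my own, not something from the libraries used here; it simply parameterizes the hand-built 7×7 matrix from the Yubikey example and reuses the modular_inv helper from earlier.

def build_lattice(sigs, msgs, order, B):
    # (n+1) x (n+1) lattice basis for n signatures whose nonces share unknown fixed bits;
    # B is an upper bound on the difference between any two nonces (2**176 in the Yubikey case)
    n = len(sigs)
    basis = [[0] * (n + 1) for _ in range(n - 1)]
    for i in range(n - 1):
        basis[i][i] = order

    sn_inv = modular_inv(sigs[-1].s, order)
    rnsn_inv = sigs[-1].r * sn_inv
    mnsn_inv = msgs[-1] * sn_inv

    row = [(sigs[i].r * modular_inv(sigs[i].s, order)) - rnsn_inv for i in range(n - 1)]
    row += [B / order, 0]
    row2 = [(msgs[i] * modular_inv(sigs[i].s, order)) - mnsn_inv for i in range(n - 1)]
    row2 += [0, B]

    return basis + [row, row2]

Feeding build_lattice(sigs, msgs, order, 2**176) to olll.reduction, exactly as above, reproduces the matrix we built by hand for the six Yubikey-style signatures.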

Hopefully, I’ve now convinced you of the fragility of ECDSA and how easily it can be broken in practice when things go wrong.

By the way, some of you may be wondering how we determine the value n. Remember, n is the number of signatures we need to recover the secret key. When the nonce had the first 128 bits fixed to 0, this value was 2 (this value is 3 when 128 bits are fixed, but we don’t know to what value they are fixed). When the nonce had 80 randomly fixed bits, this value was 5. If you consult the relevant publications around these attacks, you can find the exact formula and derivation of this value for a given number of fixed bits. For simplicity, I derived these values empirically by attempting this attack with different numbers of signatures on different amounts of fixed bits. I’ve compiled the results into the figure below:

The number of signatures required to use this attack for a given number of fixed nonce bits (derived empirically)

Protecting your ECDSA signatures

If ECDSA is so fragile, how can users protect themselves? Ideally, we recommend that you use EdDSA instead of ECDSA, which handles nonce generation much more safely by eliminating the use of RNGs. Further, Ed25519, which is EdDSA over Curve25519, is designed to overcome the side-channel attacks that have targeted ECDSA, and it is currently being standardized by NIST.

If you’re required to use ECDSA, proceed with caution and handle with care! ECDSA is fragile, but it is not broken. As we saw, it is imperative that nonces used for ECDSA signatures are never repeated, never revealed (even partially), and generated safely.

To protect yourself from nonce leakage, the mitigation strategy is to write the implementation to operate in “constant time.” However, guaranteeing this can be very difficult, as we saw with OpenSSL. For instance, code can appear to be constant time, but then an optimizing compiler can introduce non-constant time behavior. Further, some assembly instructions are constant time in some architectures or processor models, but not in others. (Read more about this here).

Another technique for mitigating nonce leakage is known as blinding, where random numbers are included in your arithmetic to randomize timing information. However, evaluating the security of your blinding implementation can be tricky, and slightly weak blinding schemes can be problematic.

With both of these mitigations, keep in mind that the amount of nonce leakage is on the order of a single bit, so even the slightest changes by an optimizing compiler or the slightest leakage from your blinding technique can be catastrophic to your signature scheme.

To ensure that nonces are generated safely, most people recommend using RFC 6979, which specifies a way to securely generate nonces deterministically (i.e., without an RNG), using the message and secret key as entropy. This protocol to generate nonces eliminates the problem of bad RNGs, which can be problematic for devices such as Yubikeys where generating randomness securely is difficult. The signature scheme EdDSA actually uses a similar nonce generation method by default to avoid bad RNGs.
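For example, deterministic (RFC 6979) signing is already available in the same Python ecdsa package used earlier in this post; this is a sketch of what that looks like:

import hashlib
import ecdsa

sk = ecdsa.SigningKey.generate(curve=ecdsa.NIST256p)
vk = sk.get_verifying_key()

# sign_deterministic derives the nonce from the key and message per RFC 6979,
# so no RNG is involved at signing time
sig = sk.sign_deterministic(b"example message", hashfunc=hashlib.sha256)
assert vk.verify(sig, b"example message", hashfunc=hashlib.sha256)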

If you are using ECDSA in your system, I encourage you to consider all of those recommendations. Hopefully, with enough care, your signature scheme won’t end up like this:

This is what happens to ECDSA when you don’t generate your nonces safely

We’re always experimenting and developing tools to help you work faster and smarter. Need help with your next project? Contact us!

How to check if a mutex is locked in Go

TL;DR: Can we check if a mutex is locked in Go? Yes, but not through the mutex API. Here’s a solution for use in debug builds.

Although you can Lock() or Unlock() a mutex, you can’t check whether it’s locked. While it is a reasonable omission (e.g., due to possible race conditions; see also Why can’t I check whether a mutex is locked?), having such functionality can still be useful for testing whether the software does what it is supposed to do.

In other words, it would be nice to have an AssertMutexLocked function solely for debug builds, which could be used like this:

// this method should always be called with o.lock locked
func (o *Object) someMethodImpl() {
    AssertMutexLocked(&o.lock)
    // (...)
}

Having such a function would allow us to confirm the assumption that a given mutex is locked and find potential bugs when it’s added into an existing codebase. In fact, there was a GitHub issue about adding this exact functionality in the official Go repository (golang/go#1366), but it was closed with a WontFix status.

I also learned via the great grep.app project that many projects have similar preconditions about mutexes, such as google/gvisor, ghettovoice/gossip, vitessio/vitess, and others.

Now let’s implement the MutexLocked (and other) functions.

Checking if a mutex is locked

To check whether a mutex is locked, we have to read its state. The sync.Mutex structure contains two fields:

type Mutex struct {
	state int32
	sema  uint32
}

The state field’s bits correspond to the following flags (source):

const (
	mutexLocked = 1 << iota // mutex is locked
	mutexWoken
	mutexStarving
	mutexWaiterShift = iota
	// (...)
)

So if a mutex is locked, its state field has the mutexLocked (1) bit set. However, we can’t just access the state field directly from a Go program, because this field is not exported (its name does not start with a capital letter). Luckily, the field can still be accessed with Go reflection, which I used in the code below to implement functions that check whether a given sync.Mutex or sync.RWMutex is locked.

package main

import (
	"fmt"
	"reflect"
	"sync"
)

const mutexLocked = 1

func MutexLocked(m *sync.Mutex) bool {
	state := reflect.ValueOf(m).Elem().FieldByName("state")
	return state.Int()&mutexLocked == mutexLocked
}

func RWMutexWriteLocked(rw *sync.RWMutex) bool {
	// RWMutex has a "w" sync.Mutex field for write lock
	state := reflect.ValueOf(rw).Elem().FieldByName("w").FieldByName("state")
	return state.Int()&mutexLocked == mutexLocked
}

func RWMutexReadLocked(rw *sync.RWMutex) bool {
	return reflect.ValueOf(rw).Elem().FieldByName("readerCount").Int() > 0
}

func main() {
	m := sync.Mutex{}
	fmt.Println("m locked =", MutexLocked(&m))
	m.Lock()
	fmt.Println("m locked =", MutexLocked(&m))
	m.Unlock()
	fmt.Println("m locked =", MutexLocked(&m))

	rw := sync.RWMutex{}
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.Lock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.Unlock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RLock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RLock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RUnlock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RUnlock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
}

We can see this program’s output below:

m locked = false
m locked = true
m locked = false
rw write locked = false  read locked = false
rw write locked = true  read locked = false
rw write locked = false  read locked = false
rw write locked = false  read locked = true
rw write locked = false  read locked = true
rw write locked = false  read locked = true
rw write locked = false  read locked = false

And this can later be used to create AssertMutexLocked and other functions. To that end, I’ve created a small library with these functions at trailofbits/go-mutexasserts—which enables the assertion checks only in builds with a debug tag.

Note: Although there are other tools for detecting race conditions in Go, such as Go’s race detector or OnEdge from Trail of Bits, these tools will detect problematic situations only once they occur, and won’t allow you to assert whether the mutex precondition holds.

We’re always developing tools to help you work faster and smarter. Need help with your next project? Contact us!

Breaking the Solidity Compiler with a Fuzzer

Over the last few months, we’ve been fuzzing solc, the standard Solidity smart contract compiler, and we’ve racked up almost 20 (now mostly fixed) new bugs. A few of these are duplicates of existing bugs with slightly different symptoms or triggers, but the vast majority are previously unreported bugs in the compiler.

This has been a very successful fuzzing campaign and, to our knowledge, one of the most successful ever launched against solc. This isn’t the first time solc has been fuzzed with AFL; fuzzing solc via AFL is a long-standing practice. The compiler has even been tested on OSSFuzz since January of 2019. How did we manage to find so many previously undiscovered bugs–and bugs worth fixing fairly quickly, in most cases? Here are five important elements of our campaign.

1. Have a secret sauce

Fortunately, it’s not necessary that the novelty actually be kept secret, just that it be genuinely new and somewhat tasty! Essentially, we used AFL in this fuzzing campaign, but not just any off-the-shelf AFL. Instead, we used a new variant of AFL expressly designed to help developers fuzz language tools for C-like languages without a lot of extra effort.

The changes from standard AFL aren’t particularly large; this fuzzer just adds a number of new AFL havoc mutations that look like those used by a naive, text-based source code mutation testing tool (i.e., universalmutator). The new approach requires less than 500 lines of code to implement, most of it very simple and repetitive.
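To give a flavor of what such mutations look like, here is an illustrative Python sketch of naive, text-based source rewrites in the universalmutator style. It is not the project’s actual operator set, which is implemented inside AFL’s C havoc stage; the rules below are my own examples.

import random
import re

# a few text rewrites that tend to keep C-like source code compilable
RULES = [
    (r"\+", "-"),
    (r"==", "!="),
    (r"<=", "<"),
    (r"\bpublic\b", "internal"),
    (r"\buint\b", "int"),
]

def mutate(source: str) -> str:
    # pick one line and apply one randomly chosen rewrite to it
    lines = source.splitlines()
    i = random.randrange(len(lines))
    pattern, replacement = random.choice(RULES)
    lines[i] = re.sub(pattern, replacement, lines[i], count=1)
    return "\n".join(lines)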

This variation of AFL is part of a joint research project with Rijnard van Tonder at Sourcegraph, Claire Le Goues at CMU, and John Regehr at the University of Utah. In our preliminary experiments comparing the method to plain old AFL, the results look good for solc and the Tiny C Compiler, tcc. As science, the approach needs further development and validation; we’re working on that. In practice, however, this new approach has almost certainly helped us find many new bugs in solc.

We found a few of the early bugs we reported using plain old AFL in experimental comparisons, and some of the bugs we found easily with our new approach were eventually also duplicated by AFL without it—but the majority of the bugs have not been replicated in “normal” AFL. The graph below shows the number of issues we submitted on GitHub, and underscores the significance of the AFL changes:

The big jump in bug discovery in late February came immediately after we added a few smarter mutation operations to our version of AFL. It could be coincidence, but we doubt it; we manually inspected the files generated and saw a qualitative change in the AFL fuzzing queue contents. Additionally, the proportion of files AFL generated that were actually compilable Solidity jumped by more than 10%.

2. Build on the work of others

Fuzzing a system that has never been fuzzed can certainly be effective; the system’s “resistance” to the kinds of inputs fuzzers generate is likely to be extremely low. However, there can also be advantages to fuzzing a system that has been fuzzed before. As we noted, we aren’t the first to fuzz solc with AFL. Nor were previous efforts totally freelance ad-hoc work; the compiler team was involved in fuzzing solc, and had built tools we could use to make our job easier.

The Solidity build includes an executable called solfuzzer that takes a Solidity source file as input and compiles it using a wide variety of options (with and without optimization, etc.), looking for various invariant violations and kinds of crashes. Several of the bugs we found don’t manifest with the normal solc executable unless you use specific command-line options (especially optimization) or run solc in certain other, rather unusual, ways; solfuzzer found all of these. We also learned from the experience of others that a good starting corpus for AFL fuzzing is in the test/libsolidity/syntaxTests directory tree. This was what other people were using, and it definitely covers a lot of the “what you might see in a Solidity source file” ground.

Of course, even with such existing work, you need to know what you’re doing, or at least how to look it up on Google. Nothing out there will tell you that simply compiling solc with AFL won’t actually produce good fuzzing. First, you need to notice that the fuzzing results in a very high map density, which measures the degree to which you’ve “filled” AFL’s coverage hash. Then you either need to know the advice given in the AFL User Guide, or search for the term “afl map density” and see that you need to recompile the whole system with AFL_INST_RATIO set to 10 to make it easier for the fuzzer to identify new paths. This only happens, according to the AFL docs, when “you’re fuzzing extremely hairy software.” So if you’re used to fuzzing compilers, you probably have seen this before, but otherwise you probably haven’t run into map density problems.

3. Play with the corpus

You may notice that the last spike in submitted bugs comes long after the last commit made to our AFL-compiler-fuzzer repository. Did we make local changes that aren’t yet visible? No, we just changed the corpus we used for fuzzing. In particular, we looked beyond the syntax tests, and added all the Solidity source files we could find under test/libsolidity. The most important thing this accomplished was allowing us to find SMT checker bugs, because it brought in files that used the SMTChecker pragma. Without a corpus example using that pragma, AFL has essentially no chance of exploring SMT Checker behaviors.

The other late-bloom bugs we found (when it seemed impossible to find any new bugs) mostly came from building a “master” corpus including every interesting path produced by every fuzzer run we’d performed up to that point, and then letting the fuzzer explore it for over a month.

4. Be patient

Yes, we said over a month (on two cores). We ran over a billion compilations in order to hit some of the more obscure bugs we found. These bugs were very deep in the derivation tree from the original corpus. Bugs we found in the Vyper compiler similarly required some very long runs to discover. Of course, if your fuzzing effort involves more than just playing around with a new technique, you may want to throw machines (and thus money) at the problem. But according to an important new paper, you may need to throw exponentially more machines at the problem if that’s your only approach.

Moreover, for feedback-based fuzzers, just using more machines may not produce some of the obscure bugs that require a long time to find; there’s not always a shortcut to a bug that requires a mutation of a mutation of a mutation of a mutation…of an original corpus path. Firing off a million “clusterfuzz” instances will produce lots of breadth, but it doesn’t necessarily achieve depth, even if those instances periodically share their novel paths with each other.

5. Do the obvious, necessary things

There’s nothing secret about reducing your bug-triggering source files before submitting them, or trying to follow the actual issue submission guidelines of the project you’re reporting bugs to. And, of course, even if it’s not mentioned in those guidelines, performing a quick search to avoid submitting duplicates is standard. We did those things. They didn’t add much to our bug count, but they certainly sped up the process of recognizing the issues submitted as real bugs and fixing them.

Interestingly, not much reduction was usually required. For the most part, just removing 5-10 lines of code (less than half the file) produced a “good-enough” input. This is partly due to the corpus, and (we think) partly due to our custom mutations tending to keep inputs small, even beyond AFL’s built-in heuristics along those lines.

What did we find?

Some bugs were very simple problems. For instance, this contract used to cause the compiler to bomb out with the message “Unknown exception during compilation: std::bad_cast”:

contract C {
    function f() public returns (uint, uint) {
        try this() {
        } catch Error(string memory) {
        }
    }
}

The issue was easily fixed by changing a typeError into a fatalTypeError, which prevents the compiler from continuing in a bad state. The commit fixing that was only one line of code (though quite a few lines of new tests).

On the other hand, this issue, which prompted a bug bounty award and made it into a list of important bug fixes for the 0.6.8 compiler release, could produce incorrect code for some string literals. It also required substantially more code to handle the needed quoting.

Even the un-reduced versions of our bug-triggering Solidity files look like Solidity source code. This is probably because our mutations, which are heavily favored by AFL, tend to “preserve source-code-y-ness.” Much of what seems to be happening is a mix of small changes that don’t make files too nonsensical plus combination (AFL splicing) of corpus examples that haven’t drifted too far from normal Solidity code. AFL on its own tends to reduce source code to uncompilable garbage that, even if merged with interesting code, won’t make it past initial compiler stages. But with more focused mutations, splicing can often get the job done, as in this input that triggers a bug that’s still open (as we write):

contract C {
    function o (int256) public returns (int256) {
    assembly {
        c:=shl(1,a)
    }
    }

    int constant c=2 szabo+1 seconds+3 finney*3 hours;
}

The triggering input combines assembly and a constant, but there are no files in the corpus we used that contain both and look much like this. The closest is:

contract C {
  bool constant c = this;
  function f() public {
    assembly {
        let t := c
    }
  }
}

Meanwhile, the closest file containing both assembly and a shl is:

contract C {
    function f(uint x) public returns (uint y) {
        assembly { y := shl(2, x) }
    }
}
Combining contracts like this is not trivial; no instance much like the particular shl expression in the bug-exposing contract even appears anywhere in the corpus. Trying to modify a constant in assembly isn’t too likely to show up in legitimate code. And we imagine manually producing such strange but important inputs is extremely non-trivial. In this case, as happens so often with fuzzing, if you can think of a contract at all like the one triggering the bug, you or someone else probably could have written the right code in the first place.

Conclusion

It’s harder to find important bugs in already-fuzzed high-visibility software than in never-fuzzed software. However, with some novelty in your approach, smart bootstrapping based on previous fuzzing campaigns (especially for oracles, infrastructure, and corpus content), plus experience and expertise, it is possible to find many never-discovered bugs in complex software systems, even if they are hosted on OSSFuzz. In the end, even our most aggressive fuzzing only scratches the surface of truly complex software like a modern production compiler—so cunning, in addition to brute force, is required.

We’re always developing tools to help you work faster and smarter. Need help with your next project? Contact us!