Differential fuzz testing upgradeable smart contracts with Diffusc

William Bodell

July 07, 2023

On March 28, 2023, SafeMoon, a self-styled “community-focused DeFi token” on Binance Smart Chain, lost the equivalent of $8.9 million in Binance Coin (BNB) to an exploit in a liquidity pool. The exploit leveraged a simple error introduced in an upgrade to SafeMoon’s SFM token contract, allowing the attacker to burn tokens held in the liquidity pool and artificially inflate their price before selling enough previously acquired tokens to completely drain the pool of wrapped BNB.

Smart contract upgrades are meant to fix bugs, but examples like this highlight how upgradeability can go terribly wrong. Thankfully, such bugs can be avoided with the right testing practices. To that end, it is my pleasure to introduce a new tool to your smart contract security toolbox, Diffusc, which I have been working on since February as an associate at Trail of Bits.

Diffusc combines static analysis with differential fuzz testing to compare two upgradeable smart contract (USC) implementations, which can uncover unexpected differences in behavior before an upgrade is performed on-chain. Built on top of Slither and Echidna, Diffusc performs differential taint analysis, uses the results to generate differential fuzz testing contracts in Solidity, and then feeds them into Echidna for fuzzing. It is, to my knowledge, the first implementation of differential fuzzing for smart contracts and should be used in combination with other auditing tools before performing an upgrade.

If you want to play with the tool right now, head over to the repo, follow the setup instructions in the README, and test it on some real-world examples like Compound and SafeMoon.

Upgradeable smart contracts

While there are other ways of designing smart contracts for upgradeability, the most common USC pattern by far is the delegatecall-based proxy pattern. In this pattern, a proxy contract stores the address of an implementation contract, which can be changed by the contract owner or admin. There are many sub-patterns, but the key feature is the use of delegatecall in the proxy’s fallback function, which catches all calls to functions not defined in the proxy itself.

Crucially, delegatecall differs from the typical call opcode because it fetches the function code from the target contract but executes it in the context of the proxy, so the proxy’s storage is used for all business logic. This allows the implementation to be swapped out without the need for migrating the state to a new contract. For an in-depth survey of USC proxy patterns, see Proxy Hunting: Understanding and Characterizing Proxy-based Upgradeable Smart Contracts in Blockchains and our Trail of Bits blog posts on upgradeability.
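To make the storage semantics concrete, here is a minimal Python model of the pattern. It is not EVM code: the hypothetical Proxy class owns all state, an Implementation object supplies only logic, and fallback() mimics delegatecall by running the implementation’s code against the proxy’s storage. One simplification to keep in mind is that a real proxy keeps the implementation address in one of its own storage slots rather than in a separate attribute.

```python
class Implementation:
    """Logic-only contract: it has no storage of its own."""

    def __init__(self, increment_by):
        self.increment_by = increment_by

    def increment(self, storage):
        # Reads and writes whichever storage it is handed (the proxy's, under delegatecall).
        storage["counter"] = storage.get("counter", 0) + self.increment_by
        return storage["counter"]


class Proxy:
    """Owns all state plus a pointer to the current implementation."""

    def __init__(self, implementation):
        self.storage = {}
        self.implementation = implementation

    def upgrade_to(self, new_implementation):
        # Only the pointer changes; self.storage is left untouched.
        self.implementation = new_implementation

    def fallback(self, function_name, *args):
        # delegatecall analogue: the implementation's code runs against the proxy's storage.
        code = getattr(self.implementation, function_name)
        return code(self.storage, *args)


proxy = Proxy(Implementation(increment_by=1))
proxy.fallback("increment")                    # counter == 1
proxy.fallback("increment")                    # counter == 2
proxy.upgrade_to(Implementation(increment_by=10))
print(proxy.fallback("increment"))             # 12: V2 logic continues from V1's state
```

Because only the pointer changes, the new logic picks up exactly where the old logic left off, which is also why a faulty upgrade immediately operates on live state.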

Differential fuzzing

Fuzz testing is a security analysis technique in which randomly generated inputs are fed into the software under test while the fuzzer monitors its execution for errors. There are a variety of flavors, one of which is differential fuzzing, in which two similar implementations are fed the same inputs, with the fuzzer looking for any differences in execution between the two.

There are several fuzzers designed to test smart contracts specifically, of which Echidna is the most mature and feature rich. While fuzzers outside the realm of smart contracts often monitor the software under test for crashes, smart contract fuzzing typically looks for invariant violations. Invariants can be inserted into the contract under test itself (i.e., internal testing) or written in test functions that call into the contract under test from an external contract (i.e., external testing—for more detail see our introduction to common testing approaches).

Differential fuzzing for smart contracts uses external testing, with test functions that take some random input and feed it into matching functions in both implementations and then compare the results of the two calls, asserting that they should be equal.
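As an illustration of that loop, the sketch below models it in Python under simplifying assumptions: each implementation is a plain function of one integer, a raised exception stands in for a revert, and the names fee_v1, fee_v2, and differential_fuzz are hypothetical. The invariant mirrors the one described above: both calls must succeed or fail together, and outputs are compared only when both succeed.

```python
import random


def low_level_call(fn, *args):
    """Mimic a low-level call: report (success, output) instead of propagating a revert."""
    try:
        return True, fn(*args)
    except Exception:
        return False, None


def differential_fuzz(impl_v1, impl_v2, runs=10_000, max_input=10**6, seed=0):
    """Feed identical random inputs to both versions; return the first divergent input, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        x = rng.randint(0, max_input)
        ok1, out1 = low_level_call(impl_v1, x)
        ok2, out2 = low_level_call(impl_v2, x)
        # Invariant: same success status, and equal outputs whenever both calls succeed.
        if ok1 != ok2 or (ok1 and out1 != out2):
            return x
    return None


# Hypothetical upgrade: V2 switches a 0.3% fee from rounding down to rounding up.
def fee_v1(amount):
    return amount * 3 // 1000


def fee_v2(amount):
    return (amount * 3 + 999) // 1000


print(differential_fuzz(fee_v1, fee_v1))  # None: identical versions never diverge
counterexample = differential_fuzz(fee_v1, fee_v2)
print(counterexample, fee_v1(counterexample), fee_v2(counterexample))
```

A real fuzzer such as Echidna also generates sequences of calls that build up state, which this single-call sketch leaves out.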

Diffusc implementation

Diffusc is a human-assisted tool that aims to ease the validation of smart contract upgrades:

It leverages Slither’s static analysis to identify all functions that are impacted by the upgrade.
It generates wrappers to deploy and interact with the contracts. Wrapper contracts come in two flavors: standard mode and fork mode.
The user should review the wrappers for errors, add information that Diffusc could not infer automatically, and add additional invariants and preconditions where appropriate.
Finally, Diffusc leverages Echidna to perform differential fuzzing and to try to find issues with the upgrade. Some failing tests may require additional manual review.

Figure 1. Diffusc architecture at a high level

Using Slither to diff upgrade versions

The first component of Diffusc is a pair of utility extensions in Slither, which will be included in an upcoming release of the static analysis tool. The upgradeability utility primarily does two things:

Compares two USC implementations to generate a diff, augmented by taint analysis to identify unmodified code that can be affected by changes made elsewhere
Identifies the storage slot in which a proxy stores its implementation address

The difficult part is comparing the implementations and finding unmodified code that is affected by the changes.

Finding new and modified functions

To find new functions and variables, we compare the lists of function signatures and state variables of the two USCs; missing or modified variables can be found the same way. To find modified functions, we rely on each function’s intermediate representation (IR), produced by SlithIR, and traverse the control flow graph to see whether the two versions match. This lets us detect semantic changes while ignoring cosmetic ones, such as added inline comments or reformatted code.
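A rough Python approximation of this classification is sketched below. It is not Slither’s implementation: instead of comparing SlithIR over the control flow graph, the hypothetical normalize() helper strips comments and collapses whitespace, which is enough to show why an added comment does not count as a modification while an added call does. Function bodies are keyed by signature, as in the comparison described above.

```python
import re


def normalize(body):
    """Crude stand-in for an IR: drop comments, then collapse whitespace."""
    body = re.sub(r"/\*.*?\*/", "", body, flags=re.S)  # block comments
    body = re.sub(r"//[^\n]*", "", body)               # line comments (naive: ignores string literals)
    return " ".join(body.split())


def diff_functions(v1, v2):
    """v1, v2 map function signatures to source bodies; return (new, removed, modified)."""
    new = sorted(v2.keys() - v1.keys())
    removed = sorted(v1.keys() - v2.keys())
    modified = sorted(
        sig for sig in v1.keys() & v2.keys()
        if normalize(v1[sig]) != normalize(v2[sig])
    )
    return new, removed, modified


v1 = {
    "_become(address)": "require(msg.sender == admin);",
    "claimComp()": "distributeSupplierComp();",
}
v2 = {
    "_become(address)": "require(msg.sender == admin);\n"
                        "// TODO: Remove this post upgrade\n"
                        "_upgradeSplitCompRewards();",
    "_upgradeSplitCompRewards()": "initializeIndexes();",
    "claimComp()": "distributeSupplierComp();  // comment-only change",
}
print(diff_functions(v1, v2))
# (['_upgradeSplitCompRewards()'], [], ['_become(address)'])
```

Working on an IR rather than on text is what makes the real comparison robust to formatting differences that this whitespace-based sketch would still flag, such as `a+b` versus `a + b`.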

As an example, consider a somewhat simplified version of the Compound upgrade that introduced a token distribution bug.

Between late September and October 2021, a bug introduced in an upgrade to the Compound protocol’s Comptroller contract caused tens of millions of dollars in COMP tokens to be erroneously distributed to users. Despite begging and even threatening users to return the funds, the Compound community ultimately lost about $40 million in reward tokens, diluting the positions of existing token holders.

One of the new functions, _upgradeSplitCompRewards(), initialized any existing markets that had not accrued any rewards with a new index value in the market’s corresponding supplyState struct. This new function was called by the modified _become() function, which is called as part of the upgrade process.

function _become(SimpleUnitroller unitroller) public {
    require(msg.sender == unitroller.admin(), "only unitroller admin can change brains");
    require(unitroller._acceptImplementation() == 0, "change not authorized");

    // TODO: Remove this post upgrade
    SimpleComptrollerV2(address(unitroller))._upgradeSplitCompRewards();
}

function _upgradeSplitCompRewards() public {
    require(msg.sender == comptrollerImplementation, "only brains can become itself");
    uint32 blockNumber = safe32(getBlockNumber(), "block number exceeds 32 bits");
    for (uint i = 0; i < allMarkets.length; i++) {
        CompMarketState storage supplyState = compSupplyState[address(allMarkets[i])];
        if (supplyState.index == 0) {
            // Initialize supply state index with default value
            supplyState.index = compInitialIndex;
            supplyState.block = blockNumber;
        }
    }
}

Figure 2. The new function _upgradeSplitCompRewards(), which is called by _become()

To confirm that _become() has been modified, we compare its IR in the two versions (the comment on line 5 above does not appear in the IR):

Function Comptroller._become(Unitroller) (*)
    Expression: require(bool,string)(msg.sender == unitroller.admin(),only admin can upgrade)
    IRs:
        TMP_50(address) = HIGH_LEVEL_CALL, dest:unitroller(Unitroller), function:admin, args:[]
        TMP_51(bool) = msg.sender == TMP_50
        TMP_52(None) = SOLIDITY_CALL require(bool,string)(TMP_51,only admin can upgrade)
    Expression: require(bool,string)(unitroller._acceptImplementation() == 0,not authorized)
    IRs:
        TMP_53(uint256) = HIGH_LEVEL_CALL, dest:unitroller(Unitroller), function:_acceptImplementation, args:[]
        TMP_54(bool) = TMP_53 == 0
        TMP_55(None) = SOLIDITY_CALL require(bool,string)(TMP_54,change not authorized)
    // New IR from upgrade
    Expression: Comptroller(address(unitroller))._upgradeSplitCompRewards()
    IRs:
        TMP_56 = CONVERT unitroller to address
        TMP_57 = CONVERT TMP_56 to Comptroller
        HIGH_LEVEL_CALL, dest:TMP_57(Comptroller), function:_upgradeSplitCompRewards, args:[]

Figure 3. IR for Comptroller._become(), with the new function call marked by the “New IR from upgrade” comment

Taint analysis from the diff

Since we are interested in how these changes might affect other parts of the code, we also perform taint analysis to find other entry points that could be promising to fuzz. We consider an unmodified function to be tainted if it reads or writes to a storage variable that is also written to by a new or modified function, or if the function makes an internal call to a modified function. We consider a variable tainted if it is written to by any new, modified or tainted function.
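The two taint rules above translate almost directly into code. The Python sketch below applies them literally, in a single pass, to a hypothetical and heavily simplified slice of the Comptroller; the read/write sets are illustrative rather than extracted from the real contract, and the Function record is an assumed stand-in for what Slither actually reports.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)
    internal_calls: set = field(default_factory=set)


def taint(functions, new, modified):
    """Apply the two rules literally; return (tainted_functions, tainted_variables)."""
    changed = new | modified
    # Storage variables written by a new or modified function.
    written_by_changed = set().union(*(functions[f].writes for f in changed))
    # Unmodified functions that touch those variables or call a modified function.
    tainted_functions = {
        name
        for name, fn in functions.items()
        if name not in changed
        and (bool((fn.reads | fn.writes) & written_by_changed)
             or bool(fn.internal_calls & modified))
    }
    # A variable is tainted if any new, modified, or tainted function writes it.
    tainted_variables = set().union(
        *(functions[f].writes for f in changed | tainted_functions)
    )
    return tainted_functions, tainted_variables


functions = {
    "_upgradeSplitCompRewards": Function(reads={"allMarkets", "compSupplyState"},
                                         writes={"compSupplyState"}),
    "distributeSupplierComp": Function(reads={"compSupplyState", "compSupplierIndex", "compAccrued"},
                                       writes={"compSupplierIndex", "compAccrued"}),
    "updateCompSupplyIndex": Function(reads={"compSupplyState"}, writes={"compSupplyState"}),
    "claimComp": Function(reads={"allMarkets", "compAccrued"}, writes={"compAccrued"},
                          internal_calls={"updateCompSupplyIndex", "distributeSupplierComp"}),
    "getAllMarkets": Function(reads={"allMarkets"}),
}
tainted_fns, tainted_vars = taint(functions, new={"_upgradeSplitCompRewards"},
                                  modified={"distributeSupplierComp"})
print(sorted(tainted_fns))   # ['claimComp', 'updateCompSupplyIndex']
print(sorted(tainted_vars))  # ['compAccrued', 'compSupplierIndex', 'compSupplyState']
```

Functions that only read untouched state, like getAllMarkets here, drop out of the fuzzing scope, which is where the reduction in functions to test comes from.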

Take Compound, for example. The upgrade introduced two new functions, _initializeMarket and _upgradeSplitCompRewards, either of which can change a market’s supply and borrow states. It also replaced two other functions with versions that have modified signatures, _setCompSpeeds and setCompSpeedInternal, and it modified seven functions, most notably distributeSupplierComp and distributeBorrowerComp. Together, these new and modified functions taint 21 state variables, including the critical compSupplyState and compBorrowState mappings, and 26 functions. The claimComp function at the center of the exploit is tainted because it calls the modified function distributeSupplierComp, which also reads the tainted variable compSupplyState.

function distributeSupplierComp(address cToken, address supplier) internal {
    CompMarketState storage supplyState = compSupplyState[cToken];
    // Double memory supplyIndex = Double({mantissa: supplyState.index});
    uint supplyIndex = supplyState.index;
    // Double memory supplierIndex = Double({mantissa: compSupplierIndex[cToken][supplier]});
    uint supplierIndex = compSupplierIndex[cToken][supplier];

    // Update supplier index to current index since we are distributing accrued COMP
    // compSupplierIndex[cToken][supplier] = supplyIndex.mantissa;
    compSupplierIndex[cToken][supplier] = supplyIndex;

    // if (supplierIndex.mantissa == 0 && supplyIndex.mantissa > 0) {
    if (supplierIndex == 0 && supplyIndex > compInitialIndex) {
        // Covers case where user supplied tokens before market's supply state was set.
        // Rewards the user with COMP accrued from when supplier rewards were first
        // set for the market.
        supplierIndex = compInitialIndex;  // BUG: line not reached due to new initialization
    }

    // Calculate change in the cumulative sum of the COMP per cToken accrued
    Double memory deltaIndex = Double({mantissa: sub_(supplyIndex, supplierIndex)});
    uint supplierTokens = CToken(cToken).balanceOf(supplier);

    // Calculate COMP accrued: cTokenAmount * accruedPerCToken
    uint supplierDelta = mul_(supplierTokens, deltaIndex);
    uint supplierAccrued = add_(compAccrued[supplier], supplierDelta);
    compAccrued[supplier] = supplierAccrued;
}

function claimComp() public {
    for (uint i = 0; i < allMarkets.length; i++) {
        CToken cToken = allMarkets[i];
        require(markets[address(cToken)].isListed, "market must be listed");
        updateCompSupplyIndex(address(cToken));
        distributeSupplierComp(address(cToken), msg.sender);
    }
    compAccrued[msg.sender] = grantCompInternal(msg.sender, compAccrued[msg.sender]);
}

Figure 4. The changes made to the distributeSupplierComp() function, which is called by claimComp()

However, it is sometimes not enough to fuzz only the new, modified, and tainted functions in the contract under test itself. In the case of Compound, for instance, a user may also interact with one or more markets (i.e., cToken contracts), each of which makes calls to the Comptroller. Furthermore, some tainted functions in the contract under test (i.e., the Comptroller) may make external calls to other contracts, causing differences in behavior in those contracts as well.

Therefore, we also perform cross-contract taint analysis during the comparison by looking for external calls within new, modified, and tainted functions. If we find any, we derive a set of external contracts, each with its own list of functions and variables tainted by the external call. For example, the grantCompInternal function called by claimComp makes external calls to Comp.balanceOf and Comp.transfer:

function grantCompInternal(address user, uint amount) internal returns (uint) {
   Comp comp = Comp(getCompAddress());
   uint compRemaining = comp.balanceOf(address(this));
   if (amount > 0 && amount <= compRemaining) {
       comp.transfer(user, amount);
       return 0;
   }
   return amount;
}

Figure 5. The grantCompInternal() function, which contains a call to an external contract that transfers COMP tokens to the user

Once we find these external calls, we can flag both functions as tainted, as well as the internal Comp.balances mapping and any other functions in Comp that read or write to the balances. This cross-contract analysis completes the taint analysis. Standard taint analysis reduces the number of Comptroller functions to test from 69 to 38, a 45% reduction, while cross-contract analysis reduces the number of Comp functions from 19 to 4 and the number of CErc20/cToken functions from 78 to 16, each of which is a 79% reduction.
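Continuing in the same hypothetical vein, the cross-contract step can be sketched as follows: external calls made from new, modified, or tainted functions seed the taint in the callee, the variables those callees write become tainted, and any other callee function that reads or writes those variables is flagged too. The toy Comp contract below is an assumption for illustration (its read/write sets are simplified), and the single propagation pass is our simplification, not necessarily how Slither iterates.

```python
from dataclasses import dataclass, field


@dataclass
class Function:
    reads: set = field(default_factory=set)
    writes: set = field(default_factory=set)


def cross_contract_taint(external_calls, callee):
    """external_calls: callee function names reached from new/modified/tainted functions.
    callee: dict mapping function name -> Function for the external contract."""
    tainted_functions = set(external_calls)
    # Variables written by the directly called functions (e.g., balances via transfer).
    tainted_variables = set().union(*(callee[f].writes for f in external_calls))
    # Any other callee function touching a tainted variable is tainted as well.
    for name, fn in callee.items():
        if (fn.reads | fn.writes) & tainted_variables:
            tainted_functions.add(name)
    return tainted_functions, tainted_variables


comp = {
    "balanceOf": Function(reads={"balances"}),
    "transfer": Function(reads={"balances"}, writes={"balances"}),
    "transferFrom": Function(reads={"balances", "allowances"}, writes={"balances", "allowances"}),
    "approve": Function(writes={"allowances"}),
    "name": Function(),
}
# grantCompInternal calls Comp.balanceOf and Comp.transfer (figure 5).
fns, variables = cross_contract_taint({"balanceOf", "transfer"}, comp)
print(sorted(fns))        # ['balanceOf', 'transfer', 'transferFrom']
print(sorted(variables))  # ['balances']
```

Functions such as approve and name never touch the tainted balances mapping, so they are excluded from the fuzzing targets for the external contract.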

Generating differential fuzzing invariant tests for Echidna

Diffusc uses the new Slither utility to perform the differential static analysis on the two USC implementations provided, as well as on any additional targets specified via a command-line argument. With the information gathered during this process, the tool automatically generates differential fuzzing invariants in the form of a Solidity test contract.

Choosing the invariant

Typically, when writing invariant tests for a smart contract, we must carefully identify the key invariants, which requires a deep understanding of the contract’s business logic and the space of possible states. This is very important and still applies to testing upgradeable smart contracts, but it cannot be easily automated.

Since the goal when creating Diffusc was always to automate as much as possible, we take a different approach to choosing invariants: for each function we are interested in fuzzing, we create a wrapper method that calls the function on both implementations with the same input and asserts that the results of both calls should be the same. Each wrapper function looks like this:

function TargetContract_balanceOf(address a) public virtual {
   hevm.prank(msg.sender);
   (bool successV1, bytes memory outputV1) = address(proxyV1).call(
       abi.encodeWithSelector(
           targetContractV1.balanceOf.selector, a
       )
   );
   hevm.prank(msg.sender);
   (bool successV2, bytes memory outputV2) = address(proxyV2).call(
       abi.encodeWithSelector(
           targetContractV2.balanceOf.selector, a
       )
   );
   assert(successV1 == successV2);
   assert((!successV1 && !successV2) || keccak256(outputV1) == keccak256(outputV2));
}

Figure 6. An autogenerated wrapper for the balanceOf() function, including two low-level calls (one per implementation) and the invariant assert statements

We use low-level calls so the wrapper functions can check whether a call to either target reverts, rather than the wrapper itself reverting. This is because we want to check that both calls either succeed or fail together. It follows that we want to compare the return values only if both calls succeed. If a proxy contract was specified via the command line, we use the proxy’s address, rather than the implementation, as the call target. We use the hevm.prank(msg.sender) cheat code function to set the sender for the next call, in case the target function is sensitive to the sender address.
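Generating these wrappers is mostly string templating. The Python function below is a hypothetical sketch, not Diffusc’s actual code generator, that emits a wrapper in the style of figure 6 from a contract name, a function name, and a list of (type, name) parameters; the variable naming convention (targetContractV1, proxyV1) is assumed from that figure.

```python
def generate_wrapper(contract, function, params, use_proxy=True):
    """Emit a Solidity differential wrapper like figure 6.

    params: list of (solidity_type, name) tuples, e.g. [("address", "a")].
    """
    var = contract[0].lower() + contract[1:]          # TargetContract -> targetContract
    decl = ", ".join(f"{t} {n}" for t, n in params)  # "address a"
    args = "".join(f", {n}" for _, n in params)       # ", a"
    lines = [f"function {contract}_{function}({decl}) public virtual {{"]
    for v in ("V1", "V2"):
        # Call through the proxy when one was given, else the implementation itself.
        target = f"proxy{v}" if use_proxy else f"{var}{v}"
        lines += [
            "    hevm.prank(msg.sender);",
            f"    (bool success{v}, bytes memory output{v}) = address({target}).call(",
            f"        abi.encodeWithSelector({var}{v}.{function}.selector{args})",
            "    );",
        ]
    lines += [
        "    assert(successV1 == successV2);",
        "    assert((!successV1 && !successV2) || keccak256(outputV1) == keccak256(outputV2));",
        "}",
    ]
    return "\n".join(lines)


print(generate_wrapper("TargetContract", "balanceOf", [("address", "a")]))
```

Targeting the proxies when one is available keeps both calls in the proxies’ storage context, which is where the upgraded logic will actually run.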

Some functions will have an intended difference in behavior, which may be the reason for the upgrade in the first place. The user must therefore review the generated invariants manually and discard those that are not relevant. Because the generated invariants check only whether functions behave differently, more complex invariants still require human intervention; Diffusc is not a replacement for manually writing project-specific invariants.

Standard mode and fork mode

Diffusc can run in two modes, which affect how the test code is generated:

Standard mode: The contracts are all deployed on a local testnet without any preexisting state. This is the standard way to use Echidna.
Fork mode: The contracts are fetched from on-chain addresses, and Echidna works with two forks of the chain.

The reason for having these two modes is to simplify tool use; each mode will be easier to use in different scenarios. Fork mode often requires less manual effort because it is typically not necessary to provide custom initialization logic in the wrapper contract’s constructor—the contracts are presumably already initialized on-chain. Fork mode can also automatically discover token holders for any input ERC-20 contracts and use the holders’ addresses to send transactions.

Standard mode, on the other hand, is faster than fork mode because it doesn’t require RPC requests or require that the contracts under test be deployed on-chain. It is best used on contracts without many interactions with external contracts, unless those contracts can easily be deployed and used without too much setup.

The example wrapper method above was generated in standard mode, in which all relevant contracts are deployed to a local testnet from the given source code files. In addition to the two USC implementations, the test contract’s constructor deploys every other target contract twice, once for each implementation. This includes the optional proxy contract; if one is provided, the constructor must also store each implementation address in the correct slot of the corresponding proxy. For example, a generic constructor would look like this:

constructor() public {
   targetContractV1 = ITargetContractV1(address(new TargetContractV1()));
   targetContractV2 = ITargetContractV2(address(new TargetContractV2()));
   proxyV1 = IProxy(address(new Proxy()));
   proxyV2 = IProxy(address(new Proxy()));
   // Store the implementation addresses in the proxy slot 0.
   hevm.store(
       address(proxyV1),
       bytes32(uint(0)),
       bytes32(uint256(uint160(address(targetContractV1))))
   );
   hevm.store(
       address(proxyV2),
       bytes32(uint(0)),
       bytes32(uint256(uint160(address(targetContractV2))))
   );
}

Figure 7. An example test contract constructor generated in standard mode
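The nested conversion bytes32(uint256(uint160(address(...)))) in figure 7 simply left-pads the 20-byte address into a 32-byte storage word, and bytes32(uint(0)) is the key of slot 0. The hypothetical Python helpers below reproduce both encodings so the values written by hevm.store can be checked by hand. Note that hevm.store writes the whole 32-byte word, so any other variables a proxy packed into the same slot would be overwritten.

```python
def slot_key(slot: int) -> str:
    """Analogue of bytes32(uint(slot)): the 32-byte big-endian storage key."""
    return "0x" + slot.to_bytes(32, "big").hex()


def address_to_word(address: str) -> str:
    """Analogue of bytes32(uint256(uint160(addr))): left-pad a 20-byte address to 32 bytes."""
    value = int(address, 16)  # int() accepts the "0x" prefix when the base is 16
    if value >= 2**160:
        raise ValueError("not a 20-byte address")
    return "0x" + value.to_bytes(32, "big").hex()


print(slot_key(0))  # "0x" followed by 64 zero digits
print(address_to_word("0x75442Ac771a7243433e033F3F8EaB2631e22938f"))
# "0x" + 24 zero digits + 75442ac771a7243433e033f3f8eab2631e22938f
```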

Because fork mode works with addresses of preexisting contracts, the most significant difference between it and standard mode is that the test contract does not deploy any contracts but rather stores their addresses in its constructor. As a result, it’s not possible to have more than one deployment of any additional targets, such as the proxy. Instead, it is necessary to maintain two separate forks of the network, each using the same proxy address but with different implementation addresses stored in the proxy’s implementation storage slot. For example, the test contract’s constructor might look like this:

constructor() public {
   hevm.roll(13322796);
   fork1 = hevm.createFork();
   fork2 = hevm.createFork();
   targetContractV1 = ITargetContractV1(0x75442Ac771a7243433e033F3F8EaB2631e22938f);
   targetContractV2 = ITargetContractV2(0x374ABb8cE19A73f2c4EFAd642bda76c797f19233);
   proxy = IProxy(0x3d9819210A31b4961b30EF54bE2aeD79B9c9Cd3B);
   // Store the implementation addresses in the proxy slot 0.
   hevm.selectFork(fork1);
   hevm.store(
       address(proxy),
       bytes32(uint(0)),
       bytes32(uint256(uint160(address(targetContractV1))))
   );
   hevm.selectFork(fork2);
   hevm.store(
       address(proxy),
       bytes32(uint(0)),
       bytes32(uint256(uint160(address(targetContractV2))))
   );
}

Figure 8. An example test contract constructor generated in fork mode

The createFork() and selectFork(uint256 forkId) functions, made accessible through the IHevm contract interface, are experimental cheat codes added to HEVM to support Diffusc’s fork mode. The former saves a snapshot of the global state and returns its forkId, while the latter saves the current global state to the active fork and then restores the most recent state of the fork with the specified forkId.

It is important that the forks maintain independent global states because each wrapper method in the test contract switches forks twice, once before each of its two calls, just as the constructor does. The test contract itself has persistent storage, so its own state does not change when switching forks. This allows us to make the same call on each fork and then compare the results.
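Because these semantics are easy to get wrong, here is a toy Python model of them; it is not HEVM’s implementation. The hypothetical create_fork() snapshots a copy of the global state and returns its index, and select_fork() saves the active state back into its fork before restoring the target fork. The test_storage dictionary plays the role of the test contract’s persistent storage, which survives fork switches.

```python
import copy


class ForkingVM:
    """Toy model of the createFork/selectFork semantics described above."""

    def __init__(self):
        self.state = {}     # global chain state that fork switches swap in and out
        self.forks = []     # saved snapshots, indexed by forkId
        self.active = None  # forkId currently selected, if any

    def create_fork(self):
        # Save a snapshot of the current global state and return its forkId.
        self.forks.append(copy.deepcopy(self.state))
        return len(self.forks) - 1

    def select_fork(self, fork_id):
        # Persist the current state into the active fork, then restore the target fork.
        if self.active is not None:
            self.forks[self.active] = copy.deepcopy(self.state)
        self.state = copy.deepcopy(self.forks[fork_id])
        self.active = fork_id


vm = ForkingVM()
vm.state["proxy_impl"] = "V1"            # on-chain proxy initially points at V1
fork1, fork2 = vm.create_fork(), vm.create_fork()
vm.select_fork(fork2)
vm.state["proxy_impl"] = "V2"            # analogue of hevm.store on the second fork

test_storage = {}                        # the test contract's own storage
vm.select_fork(fork1)
test_storage["out1"] = vm.state["proxy_impl"]
vm.select_fork(fork2)
test_storage["out2"] = vm.state["proxy_impl"]
print(test_storage)  # {'out1': 'V1', 'out2': 'V2'}
```

Without the save step in select_fork(), writes made on one fork between switches would be lost, and the two deployments would silently drift back to the same snapshot.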

Using Diffusc to trigger real-world bugs: Compound

To demonstrate how Diffusc can be used in the real world, let’s consider the Compound example. We’ve already seen most of the relevant changes to Compound’s Comptroller contract in the now infamous upgrade. For a summary of how the new and modified functions caused the bug, I recommend Mudit Gupta’s excellent Twitter thread analyzing the incident.

Upgrading during the fuzzing campaign

One key detail is that triggering this bug required the user to have already interacted with at least one of the cToken markets prior to the upgrade. This means that our fuzzing contract must be able to perform the upgrade in the middle of a fuzzing transaction sequence generated by Echidna. For this reason, Diffusc provides a command-line argument to indicate that an upgrade function should be included in the test contract and that both deployments should begin each transaction sequence with the same USC implementation (i.e., setting both proxies to point to the V1 contract in the constructor).

By default, Diffusc generates a general-purpose upgrade function: it uses the Slither upgradeability utility to determine which slot the proxy stores its implementation address in and then writes the new address to that slot with the HEVM cheat code store, as follows:

// TODO: Consider replacing this with the actual upgrade method
function upgradeV2() external virtual {
   hevm.store(
       address(unitrollerV2),
       bytes32(uint(2)),  // implementation storage slot determined automatically
       bytes32(uint256(uint160(address(comptrollerV2))))
   );
}

Figure 9. The autogenerated upgrade function in the test contract, using the cheat code hevm.store to upgrade the contract in the middle of a transaction sequence

The TODO comment above the function is an autogenerated note for the user, suggesting an opportunity for manual refinement.

Manual configuration

It is often necessary for the developer to augment the autogenerated test contract, as not everything can be automated. For instance, there may be protocol setup requirements that Diffusc cannot infer, such as linking the contracts and minting tokens to appropriate addresses. Diffusc also cannot determine the appropriate preconditions for each wrapper method, nor can it infer how upgrades are performed. For these reasons, we recommend writing a new contract that inherits from the autogenerated test contract and overrides the constructor, the upgrade function, and/or any wrapper functions requiring specific preconditions.

For instance, while the generic upgrade function in figure 9 serves to update the implementation pointer, we typically recommend replacing it with the upgrade procedure specific to the contract.

In the case of Compound, it is necessary that the developer modify or override this function, as shown in figure 10, because calling _become on the new implementation triggers important upgrade logic that does more than just update the implementation.

function upgradeV2() external override {
   unitrollerV2._setPendingImplementation(address(comptrollerV2));
   comptrollerV2._become(address(unitrollerV2));
}

Figure 10. The overriding definition of upgradeV2() written by the user

To trigger the token distribution bug, it was necessary to first call the mint() function on the external token contract, then upgrade the contracts using the upgradeV2() function before calling claimComp(), as shown in figure 11.

Figure 11. Screenshot of Echidna fuzzing campaign in standard mode, showing the sequence of transactions that led to an invariant violation in Comp_balanceOf()
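To see why that particular sequence diverges, the Python sketch below models the arithmetic of figure 4 on both deployments. It is a deliberate simplification under stated assumptions: a single market that has never accrued rewards, a supply speed of zero so that updateCompSupplyIndex leaves the market index unchanged, and a hypothetical cToken balance of 5,000. The V1 branch uses the commented-out condition supplyIndex > 0, and the V2 branch uses supplyIndex > compInitialIndex.

```python
COMP_INITIAL_INDEX = 10**36  # compInitialIndex
DOUBLE_SCALE = 10**36        # mul_(uint, Double) divides by this scale


def supplier_delta(supply_index, supplier_index, supplier_tokens, v2):
    """Simplified distributeSupplierComp() from figure 4: COMP newly accrued by one supplier."""
    threshold = COMP_INITIAL_INDEX if v2 else 0  # V2: > compInitialIndex; V1: > 0
    if supplier_index == 0 and supply_index > threshold:
        supplier_index = COMP_INITIAL_INDEX
    if supply_index < supplier_index:
        raise ArithmeticError("sub_ underflow")  # sub_ reverts instead of wrapping
    return supplier_tokens * (supply_index - supplier_index) // DOUBLE_SCALE


def run_sequence(upgraded, mint_before_upgrade, tokens=5_000):
    """mint(), upgradeV2(), then claimComp() on one deployment of the model."""
    market_index = 0       # market that has never accrued rewards
    supplier_index = None  # compSupplierIndex, unset until the user mints
    if mint_before_upgrade:
        supplier_index = market_index  # mint() records the current index (0)
    if upgraded and market_index == 0:
        market_index = COMP_INITIAL_INDEX  # _upgradeSplitCompRewards() initialization
    if supplier_index is None:
        supplier_index = market_index  # mint() after the upgrade records the new index
    return supplier_delta(market_index, supplier_index, tokens, v2=upgraded)


print(run_sequence(upgraded=False, mint_before_upgrade=True))  # 0 (deployment left on V1)
print(run_sequence(upgraded=True, mint_before_upgrade=True))   # 5000
print(run_sequence(upgraded=True, mint_before_upgrade=False))  # 0
```

In the upgraded deployment, the market index equals compInitialIndex exactly, so the strict > comparison fails, the pre-upgrade supplier index of zero is never reset, and the whole index is credited as accrued COMP. Once grantCompInternal transfers that amount, the two deployments’ Comp balances differ, consistent with the Comp_balanceOf violation shown in figure 11. A user who mints only after the upgrade records the initialized index and accrues nothing, which is why the pre-upgrade interaction matters.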

Because Compound is a relatively complex protocol that represents each market with a token, it is also necessary to override the autogenerated test contract’s constructor to correctly deploy the cToken contracts (and their underlying ERC-20 tokens) and to add them via Comptroller._supportMarket. This custom initialization matters because interacting with a cToken before the upgrade is a necessary precondition for the token distribution bug; with it in place, it is possible to detect the Comp distribution bug in less than an hour of fuzzing.

Fork mode, on the other hand, does not require nearly as much custom initialization, though it comes with its own complications, such as needing to set the Comptroller’s admin to the test contract’s address because the on-chain admin is not an address the fuzzer can control. Additional steps may also be taken to identify token holders and send transactions from those addresses, in case certain functions require that the sender have some tokens.

As mentioned earlier, Diffusc can automatically discover token holders in fork mode. However, in this case, only certain holders could exploit the bug (i.e., those who interacted with specific markets prior to the upgrade). Since it is unlikely that Diffusc would discover one of these affected addresses, it was easier to trigger the bug in standard mode by allowing the fuzzer to interact with the cTokens prior to calling the upgradeV2() function from figure 10. That said, it was not hard to trigger the bug in fork mode when a known exploiter address was provided manually.

Add Diffusc to your security toolbox

As I have just demonstrated with the two examples above, given two versions of a USC implementation, Diffusc automatically generates differential fuzz testing contracts that can be used with Echidna to detect real-world bugs. It can do this in two ways, standard mode and fork mode, each of which has pros and cons, as shown in the Compound example. In either case, not everything can be automated, and some manual effort is expected from the user to ensure the correctness of the test wrapper functions. But we expect that users of Diffusc will be smart contract developers who are more than capable of this effort.

Upgradeable smart contracts are here to stay. While developers can patch their contracts when a bug is discovered, we will likely continue seeing new bugs introduced in upgrades. With billions of dollars in crypto locked in USCs, the stakes are high, making it even more crucial that developers thoroughly analyze the security of their contracts every time they make a change. Diffusc does not replace other smart contract security practices, but it is another tool in the developer’s security toolbox and should be used prior to finalizing any upgrade.

Thanks

I would like to thank Josselin Feist and Gustavo Grieco for their guidance throughout my time at Trail of Bits and Dr. Yue Duan from the Illinois Institute of Technology for first suggesting the project. Also, a special thanks to Artur Cygan (@arcz) for his work on adding fork support to HEVM.