Contract upgrade anti-patterns

A popular trend in smart contract design is to promote the development of upgradable contracts. At Trail of Bits, we have reviewed many upgradable contracts and believe that this trend is going in the wrong direction. Existing techniques to upgrade contracts have flaws, increase the complexity of the contract significantly, and ultimately introduce bugs. To highlight this point, we are releasing a previously unknown flaw in the Zeppelin contract upgrade strategy, one of the most common upgrade approaches.

In this article, we are going to detail our analysis of existing smart contract upgrade strategies, describe the weaknesses we have observed in practice, and provide recommendations for contracts that require upgrades. In a follow-up blog post, we will detail a method, contract migration, that achieves the same benefits with few of the downsides.

An overview of upgradable contracts

Two ‘families’ of patterns have emerged for upgradable smart contracts:

  • Data separation, where logic and data are kept in separate contracts. The logic contract owns and calls the data contract.
  • Delegatecall-based proxies, where logic and data are kept in separate contracts, also, but the data contract (the proxy) calls the logic contract through delegatecall.

The data separation pattern has the advantage of simplicity. It does not require the same low-level expertise as the delegatecall pattern. The delegatecall pattern has received lots of attention recently. Developers may be inclined to choose this solution because documentation and examples are easier to find.

Using either of these patterns comes at considerable risk, an aspect of this trend that has gone unacknowledged thus far.

Data separation pattern

The data separation pattern keeps logic and data in separate contracts. The logic contract, which owns the data contract, can be upgraded if required. The data contract is not meant to be upgraded. Only the owner can alter its content.

Figure 1: High-level overview of the data separation upgrade pattern

When considering this pattern, pay special attention to these two aspects: how to store data, and how to perform the upgrade.

Data storage strategy

If the variables needed across an upgrade will remain the same, you can use a simple design where the data contract holds these variables, with their getters and setters. Only the contract owner should be able to call the setters:

contract DataContract is Owner {
  uint public myVar;

  function setMyVar(uint new_value) onlyOwner public {
    myVar = new_value;
  }
}

Figure 2: Data storage example (using onlyOwner modifier)

You have to clearly identify the state variables required. This approach is suitable for ERC20 token-based contracts since they only require the storage of their balances.

If a future upgrade requires new persistent variables, they could be stored in a second data contract. You can split the data across separate contracts, but at the cost of additional logic contract calls and authorization. If you don’t intend to upgrade the contract frequently, the additional cost may be acceptable.

Nothing prevents the addition of state variables to the logic contract. These variables will not be kept during an upgrade, but can be useful for implementing the logic. If you want to keep them, you can migrate them to the new logic contract, too.

Key-value pair

A key-value pair system is an alternative to the simple data storage solution described above. It is more amenable to evolution but also more complex. For example, you can declare a mapping from a bytes32 key value to each base variable type:

contract DataContract is Owner {
  mapping(bytes32 => uint) uIntStorage;

  function getUint(bytes32 key) view public returns(uint) {
    return uintStorage[key];
}

  function setUint(bytes32 key, uint new_val) onlyOwner public {
    uintStorage[key] = new_val;
  }
}

Figure 3: Key-Value Storage Example (using onlyOwner modifier)

This solution is often called the Eternal Storage pattern.

How to perform the upgrade

This pattern offers several different strategies, depending on how the data are stored.

One of the simplest approaches is to transfer the ownership of the data contract to a new logic contract and then disable the original logic contract. To disable the previous logic contract, implement a pausable mechanism or set its pointer to 0x0 in the data contract.

Figure 4: Upgrade by deploying a new logic contract and disabling the old one

Another solution involves forwarding the calls from the original logic contract to the new version:

Figure 5: Upgrade by deploying a new logic contract and forwarding calls to it from the old one

This solution is useful if you want to allow users to call the first contract. However, it adds complexity; you have to maintain more contracts.

Finally, a more complex approach uses a third contract as an entry point, with a changeable pointer to the logic contract:

Figure 6: Upgrade by deploying a proxy contract that calls a new logic contract

A proxy contract provides the user with a constant entry point and a distinction of responsibilities that is clearer than the forwarding solution. However, it comes with additional gas costs.

Cardstack and Rocket-pool have detailed implementations of the data separation pattern.

Risks of the data separation pattern

The simplicity of the data separation pattern is more perceived than real. This pattern adds complexity to your code, and necessitates a more complex authorization schema. We have repeatedly seen clients deploy this pattern incorrectly. For example, one client’s implementation achieved the opposite effect, where a feature was impossible to upgrade because some its logic was located in the data contract.

In our experience, developers also find the EternalStorage pattern challenging to apply consistently. We have seen developers storing their values as bytes32, then applying type conversion to retrieve the original values. This increased the complexity of the data model, and the likelihood of subtle flaws. Developers unfamiliar with complex data structures will make mistakes with this pattern.

Delegatecall-based proxy pattern

Like the data separation method, the proxy pattern splits a contract in two: one contract holding the logic and a proxy contract holding the data. What’s different? In this pattern, the proxy contract calls the logic contract with delegatecall; the reverse order.

Figure 7: Visual representation of the proxy pattern

In this pattern, the user interacts with the proxy. The contract holding the logic can be updated. This solution requires mastering delegatecall to allow one contract to use code from another.

Let’s review how delegatecall works.

Background on delegatecall

delegatecall allows one contract to execute code from another contract while keeping the context of the caller, including its storage. A typical use-case of the delegatecall opcode is to implement libraries. For example:

pragma solidity ^0.4.24;

library Lib {

  struct Data { uint val; }

  function set(Data storage self, uint new_val) public {
    self.val = new_val;
  }
}

contract C {
  Lib.Data public myVal;

  function set(uint new_val) public {
    Lib.set(myVal, new_val);
  }
}

Figure 8: Library example based on delegatecall opcode

Here, two contracts will be deployed: Lib and C. A call to Lib in C will be done through delegatecall:

Figure 9: EVM opcodes of a call to Lib.set (Ethersplay output)

As a result, when Lib.set changes self.val, it changes the value stored in C’s myVal variable.

Solidity looks like Java or JavaScript, which are object-oriented languages. It’s familiar, but comes with the baggage of misconceptions and assumptions. In the following example, a programmer might assume that as long as two contract variables share the same name, then they will share the same storage, but this is not the case with Solidity.

pragma solidity ^0.4.24;

contract LogicContract {
  uint public a;

  function set(uint val) public {
    a = val;
  }
}

contract ProxyContract {
  address public contract_pointer;
  uint public a;

  constructor() public {
    contract_pointer = address(new LogicContract());
  }

  function set(uint val) public {
    // Note: the return value of delegatecall should be checked
    contract_pointer.delegatecall(bytes4(keccak256("set(uint256)")), val);
  }
}

Figure 10: Dangerous delegatecall usage

Figure 11 represents the code and the storage variables of both of the contracts at deployment:

Figure 11: Memory illustration of Figure 10

What happens when the delegatecall is executed? LogicContract.set will write in ProxyContract.contract_pointer instead of ProxyContract.a. This memory corruption happens because:

  • LogicContract.set is executed within the context of ProxyContract.
  • LogicContract knows only one state variable: a. Any store to this variable will be done on the first element in memory (see the Layout of State Variables in Storage documentation).
  • The first element for ProxyContract is contract_pointer. As a result, LogicContract.set will write theProxyContract.contract_pointer variable instead of ProxyContract.a (see Figure 12).
  • At this point, the memory in ProxyContract has been corrupted.

If a was the first variable declared in ProxyContract, delegatecall would have not corrupted the memory.

Figure 12: LogicContract.set will write the first element in storage: ProxyContract.contract_pointer

Use delegatecall with caution, especially if the called contract has state variables declared.

Let’s review the different data-storage strategies based on delegatecall.

Data storage strategies

There are three approaches to separating data and logic when using the proxy pattern:

  • Inherited storage, which uses Solidity inheritance to ensure that the caller and the callee have the same memory layout.
  • Eternal storage, which is the key-value storage version of the logic separation that we saw above.
  • Unstructured storage, which is the only strategy that does not suffer from potential memory corruption due to an incorrect memory layout. It relies on inline assembly code and custom memory management on storage variables.

See ZeppelinOS for a more thorough review of these approaches.

How to perform an upgrade

To upgrade the code, the proxy contract needs to point to a new logic contract. The previous logic contract is then discarded.

Risks of delegatecall

In our experience with clients, we have found that it is difficult to apply the delegatecall-based proxy pattern correctly. The proxy pattern requires that memory layouts stay consistent between contract and compiler upgrades. A developer unfamiliar with EVM internals can easily introduce critical bugs during an upgrade.

Only one approach, unstructured storage, overcomes the memory layout requirement but it requires low-level memory handling, which is difficult to implement and review. Due to its high complexity, unstructured storage is only meant to store state variables that are critical for the upgradability of the contract, such as the pointer to the logic contract. Further, this approach hinders static analysis of Solidity (for example, by Slither), costing the contract the guarantees provided by these tools.

Preventing memory layout corruption with automated tools is an ongoing area of research. No existing tool can verify that an upgrade is safe against a compromise. Upgrades with delegatecall will lack automated safety guarantees.

Breaking the proxy pattern

To wit, we have discovered and are now disclosing a previously unknown security issue in the Zeppelin proxy pattern, rooted in the complex semantics of delegatecall. It affects all the Zeppelin implementations that we have investigated. This issue highlights the complexity of using a low-level Solidity mechanism and illustrates the likelihood that an implementation of this pattern will have flaws.

What is the bug?

The Zeppelin Proxy contract does not check for the contract’s existence prior to returning. As a result, the proxy contract may return success to a failed call and result in incorrect behavior should the result of the call be required for application logic.

Low-level calls, including assembly, lack the protections offered by high-level Solidity calls. In particular, low-level calls will not check that the called account has code. The Solidity documentation warns:

The low-level call, delegatecall and callcode will return success if the called account is non-existent, as part of the design of EVM. Existence must be checked prior to calling if desired.

If the destination of delegatecall has no code, then the call will successfully return. If the proxy is set incorrectly, or if the destination was destroyed, any call to the proxy will succeed but will not send back data.

A contract calling the proxy may change its own state under the assumption that its interactions are successful, even though they are not.

If the caller does not check the size of the data returned, which is the case of any contract compiled with Solidity 0.4.22 or earlier, then any call will succeed. The situation is slightly better for recently compiled contracts (Solidity 0.4.24 and up) thanks to the check on returndatasize. However, that check won’t protect the calls that do not expect data in return.

ERC20 tokens are at considerable risk

Many ERC20 tokens have a known flaw that prevents the transfer functions from returning data. As a result, these contracts support a call to transfer which may return no data. In such a case, the lack of an existence check, as detailed above, may lead a third party to believe that a token transfer was successful when it was not, and may lead to the theft of money.

Exploit scenario

Bob’s ERC20 smart contract is a proxy contract based on delegatecall. The proxy is incorrectly set due to human error, a flaw in the code, or a malicious actor. Any call to the token will act as a successful call with no data returned.

Alice’s exchange handles ERC20 tokens that do not return data on transfer. Eve has no tokens. Eve calls the deposit function of Alice’s exchange for 10,000 tokens, which calls transferFrom of Bob’s token. The call is a success. Alice’s exchange credits Eve with 10,000 tokens. Eve sells the tokens and receives ethers for free.

How to avoid this flaw

During an upgrade, check that the new logic contract has code. One solution is to use the extcodesize opcode. Alternatively, you can check for the existence of the target each time delegatecall is used.

There are tools that can help. For instance, Manticore is capable of reviewing your smart contract code to check a contract’s existence before any calls are made to it. This check was designed to help mitigate risky proxy contract upgrades.

Recommendations

If you must design a smart contract upgrade solution, use the simplest solution possible for your situation.

In all cases, avoid the use of inline assembly and low-level calls. The proper use of this functionality requires extreme familiarity with the semantics of delegatecall, and the internals of Solidity and EVM. Few teams whose code we’ve reviewed get this right.

Data separation recommendations

If you need to store data, opt for the simple data storage strategy over key-pairs (aka Eternal Storage). This method requires writing less code and depends on fewer moving parts. There is simply less that can go wrong.

Use the contract-discard solution to perform upgrades. Avoid the forwarding solution, since it requires building forwarding logic that may be too complex to implement correctly. Only use the proxy solution if you need a fixed address.

Proxy pattern recommendations

Check for the destination contract’s existence prior to calling delegatecall. Solidity will not perform this check on your behalf. Neglecting the check may lead to unintended behavior and security issues. You are responsible for these checks if relying upon low-level functionality.

If you are using the proxy pattern, you must:

  • Have a detailed understanding of Ethereum internals, including the precise mechanics of delegatecall and detailed knowledge of Solidity and EVM internals.
  • Carefully consider the order of inheritance, as it impacts the memory layout.
  • Carefully consider the order in which variables are declared. For example, variable shadowing, or even type changes (as noted below) can impact the programmer’s intent when interacting with delegatecall.
  • Be aware that the compiler may use padding and/or pack variables together. For example, if two consecutive uint256 are changed to two uint8, the compiler can store the two variables in one slot instead of two.
  • Confirm that the variables’ memory layout is respected if a different version of solc is used or if different optimizations are enabled. Different versions of solc compute storage offsets in different ways. The storage order of variables may impact gas costs, memory layout, and thus the result of delegatecall.
  • Carefully consider the contract’s initialization. According to the proxy variant, state variables may not be initializable during construction. As a result, there is a potential race condition during initialization that needs to be mitigated.
  • Carefully consider names of functions in the proxy to avoid function-name collision. Proxy functions with the same Keccak hash as the intended function will be called instead, which could lead to unpredictable or malicious behavior.

Concluding remarks

We strongly advise against the use of these patterns for upgradable smart contracts. Both strategies have the potential for flaws, significantly increase complexity, and introduce bugs, and ultimately decrease trust in your smart contract. Strive for simple, immutable, and secure contracts rather than importing a significant amount of code to postpone feature and security issues.

Further, security engineers that review smart contracts should not recommend complex, poorly understood, and potentially insecure upgrade mechanisms. Ethereum security community, consider the risk prior to endorsing these techniques.

In a follow-up blog post, we will describe contract migration, our recommended approach to achieve the benefits of upgradable smart contracts without their downsides. A contract migration strategy is essential in case of private key compromise, and helpful in avoiding the need for other upgrades.

In the meantime, you should contact us if you’re concerned that your upgrade strategy may be insecure.

5 thoughts on “Contract upgrade anti-patterns

  1. Hey folks! Regarding the error reported on ZeppelinOS proxy contracts (“Zeppelin Proxy contract does not check for the contract’s existence prior to returning”), please note that the proxy *does* check the contract’s existence when upgrading (see https://github.com/zeppelinos/zos/blob/936eec451c2919299171265c7ce48ab0077035b2/packages/lib/contracts/upgradeability/UpgradeabilityProxy.sol#L61). The check is not done on every delegatecall for gas-consumption issues, and is actually the approach you recommend as well (“During an upgrade, check that the new logic contract has code. One solution is to use the extcodesize opcode.”).

  2. As for “Carefully consider names of functions in the proxy to avoid function-name collision. Proxy functions with the same Keccak hash as the intended function will be called instead, which could lead to unpredictable or malicious behavior.”, the transparent proxy variant we are using in ZeppelinOS (see https://github.com/zeppelinos/zos-lib/pull/36) solves this issue.

  3. Hey Santiago,

    Thanks for pointing that out! Zeppelin has many copies of their upgrade contracts on the internet. The code in zeppelinos/labs and the code samples on your blog lack the contract existence check. Further, we found many examples of third-party codebases that copied the insecure code into their own projects. We were not even aware of the third copy in zeppelinos/zos until recently. We expect many developers use the code in zeppelinos/labs.

    We would recommend two changes as a result of our blog post:

    * Fix the code in zeppelinos/labs since it is the likely version used by anyone looking to implement a delegatecall proxy.

    * Check for the existence at each call, rather than only at construction, to prevent errors due to self-destructed contracts. You may warn the user that they should not deploy a self-destructible contract but a more restrictive solution makes sense for a software library like your own.

    Regarding the function name collision, we linked to the audit report of that issue which points to the fix.

    Our goal was not to review ZeppelinOS, but to make the community aware of the risks of any upgradable system. It is a danger that many developers implement these solutions without proper foundational knowledge.

    – Josselin

  4. Pingback: How contract migration works | Trail of Bits Blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s