By Fredrik Dahlgren, Staff Security Engineer
In October 2019, a security researcher found a devastating vulnerability in Tornado.cash, a decentralized, non-custodial mixer on the Ethereum network. Tornado.cash uses zero-knowledge proofs (ZKPs) to allow its users to privately deposit and withdraw funds. The proofs are supposed to guarantee that each withdrawal can be matched against a corresponding deposit to the mixer. However, because of an issue in one of the ZKPs, anyone could forge a proof of deposit and withdraw funds from the system.
At the time, the Tornado.cash team saved its users’ funds by exploiting the vulnerability to drain the funds from the mixer before the issue was discovered by someone else. Then they patched the ZKPs and migrated all user funds to a new version of the contract. Considering the severity of the underlying vulnerability, it is almost ironic that the fix consisted of just two characters.
This bug would have been caught using Circomspect, a new static analyzer for ZKPs that we are open-sourcing today. Circomspect finds potential vulnerabilities and code smells in ZKPs developed using Circom, the language used for the ZKPs deployed by Tornado.cash. It can identify a wide range of issues that can occur in Circom programs. In particular, it would have found the vulnerability in Tornado.cash early in the development process, before the contract was deployed on-chain.
How Circom works
Tornado.cash was developed using Circom, a domain-specific language (DSL) and a compiler that can be used to generate and verify ZKPs. ZKPs are powerful cryptographic tools that allow you to make proofs about a statement without revealing any private information. For complex systems like a full computer program, the difficult part in using ZKPs becomes representing the statement in a format that the proof system can understand. Circom and other DSLs are used to describe a computation, together with a set of constraints on the program inputs and outputs (known as signals). The Circom compiler takes a program and generates a prover and a verifier. The prover can be used to run the computation described by the DSL on a set of public and private inputs to produce an output, together with a proof that the computation was run correctly. The verifier can then take the public inputs and the computed output and verify them against the proof generated by the prover. If the public inputs do not correspond to the provided output, this is detected by the verifier.
The following figure shows a small toy example of a Circom program allowing the user to prove that they know a private input
x such that
x5 - 2x4 + 5x - 4 = 0:
y <== x5 - 2 * x4 + 5 * x - 4 tells the compiler two things: that the prover should assign the value of the right-hand side to
y during the proof generation phase (denoted
y <-- x5 - 2 * x4 + 5 * x - 4 in Circom), and that the verifier should ensure that
y is equal to the right-hand side during the proof verification phase (which is denoted
y === x5 - 2 * x4 + 5 * x - 4 in Circom). This type of duality is often present in zero-knowledge DSLs like Circom. The prover performs a computation, and the verifier has to ensure that the computation is correct. Sometimes these two sides of the same coin can be described using the same code path, but sometimes (for example, due to restrictions on how constraints may be specified in R1CS-based systems like Circom) we need to use different code to describe computation and verification. If we forget to add instructions describing the verification steps corresponding to the computation performed by the prover, it may become possible to forge proofs.
The Tornado.cash vulnerability
In the case of Tornado.cash, it turned out that the MIMC hash function used to compute the Merkle tree root in the proof used only the assignment operator
<-- when defining the output. (Actually, it uses
=, as demonstrated in the GitHub diff above. However, in the previous version of the Circom compiler, this was interpreted in the same way as
<--. Today, this code would generate a compilation error.) As we have seen, this only assigned a value to the output during proof generation, but did not constrain the output during proof verification, leaving the verifying contract vulnerable.
Our new Circom bug finder, Circomspect
Circomspect is a static-analyzer and linter for programs written in the Circom DSL. Its main use is as a tool for reviewing the security and correctness of Circom templates and functions. The implementation is based on the Circom compiler and uses the same parser as the compiler does. This ensures that any program that the compiler can parse can also be parsed using Circomspect. The abstract syntax tree generated by the parser is converted to static single-assignment form, which allows us to perform simple data flow analyses on the input program.
The current version implements a number of analysis passes, checking Circom programs for potential issues like unconstrained signals, unused variables, and shadowing variable declarations. It warns the user about each use of the signal assignment operator
<--, and can often detect if a circuit uses
<-- to assign a quadratic expression to a signal, indicating that the signal constraint assignment operator
<== could be used instead. This analysis pass would have found the vulnerability in the Tornado.cash described above. All issues flagged by Circomspect do not represent vulnerabilities, but rather locations that should be reviewed to make sure that the code does what is expected.
As an example of the types of issues Circomspect can find, consider the following function from the circom-pairing repository:
This function may look a bit daunting at first sight. It implements inversion modulo
p using the extended Euclidean algorithm. Running Circomspect on the containing file yields a number of warnings telling us that the assignments to the arrays
newv do not contribute to the return value of the function, which means that they cannot influence either witness or constraint generation.
A closer look at the implementation reveals that the variable
y is used only to compute
newv is used only to update
v is used only to update
y. It follows that none of the variables
newv contribute to the return value of the function
find_Fp_inverse, and all can safely be removed. (As an aside, this makes complete sense since running the extended Euclidean algorithm on two coprime integers
p computes two integers
y such that
x * num + y * p = 1. This means that if we’re interested in the inverse of
p, it is given by
x, and the value of
y is not needed. Since
y are computed independently, the code used to compute
y can safely be removed.)
Improving the state of ZKP tooling
Zero-knowledge DSLs like Circom have democratized ZKPs. They allow developers without a background in mathematics or cryptography to build and deploy systems that use zero-knowledge technology to protect their users. However, since ZKPs are often used to protect user privacy or assure computational integrity, any vulnerability in a ZPK typically has serious ramifications for the security and privacy guarantees of the entire system. In addition, since these DSLs are new and emerging pieces of technology, there is very little tooling support available for developers.
At Trail of Bits, we are actively working to fill that void. Earlier this year we released Amarna, our static-analyzer for ZKPs written in the Cairo programming language, and today we are open sourcing Circomspect, our static analyzer and linter for Circom programs. Circomspect is under active development and can be installed from crates.io or downloaded from the Circomspect GitHub repository. Please try it out and let us know what you think! We welcome all comments, bug reports, and ideas for new analysis passes.