A deep dive into Axiom’s Halo2 circuits

Page content

Among the many highly complex, cutting-edge projects our cryptography team reviews, one from 2023 stands out. Over two audits, we reviewed a blockchain system developed by Axiom that allows computing over the entire history of Ethereum, all verified by zero-knowledge proofs (ZKPs) on-chain using ZK-verified elliptic curve and SNARK recursion operations. This system is built using the Halo2 framework—a complex, emerging technology that presents many challenges when building a secure application, including potential under-constrained issues resulting from its low-level API.

Since the conclusion of our audit, this library has been repurposed into Axiom’s more general OpenVM product, which is a ZK virtual machine that allows generation of ZK proofs for arbitrary Rust programs.

This post offers an insight into our review process; the discovered findings, including several soundness and under-constrained bugs that could break the system’s security; and the steps Axiom has implemented following our recommendations. Thanks to the massive scale of Axiom’s system and the novelty of the Halo2 framework, the audit significantly augmented our already-extensive knowledge of testing systems leveraging ZKPs. We applaud Axiom for being so collaborative and working diligently with us to help secure their system.

“We were impressed with the technical rigor and security mindset of the Trail of Bits team during their review of our Halo2 circuits. They were systematic and thoughtful in the review process, especially in the more intricate cryptographic areas of our system, giving us greater confidence in the final system security.” – Yi Sun, co-founder of Axiom

The Axiom system

Axiom designed a system that allows access to historical blockchain data for EVM applications. Natively, EVM contracts may access only their own current account state; they cannot view past states, transaction statuses, or the state of other accounts. To enable this access, Axiom used ZKPs to succinctly verify flexible queries over historical transactions and states. Specifically, they used the Halo2 ZKP framework to allow users to read and trust historical data, such as headers, states, and transactions.

To build such a system, Axiom had to model the Ethereum data, transactions, state, etc., using Halo2 circuits. This requires writing Halo2 circuits for low-level primitives, such as elliptic curve cryptography, as well as higher-level data structures, like Merkle trees and Ethereum blocks. Since the existing libraries of Halo2 circuits were very limited, many of these primitives had to be developed as part of Axiom’s halo2-base and halo2-ecc libraries and used in the snark-verifier library for SNARK recursion. In addition to custom low-level libraries, Axiom also used a modified version of the Halo2 proof system itself. Several Axiom circuits include proofs about variable-length arrays, which benefit greatly from multi-phase circuits, which are implemented as an experimental feature in the Privacy & Scaling Explorations’ Halo2 fork.

Axiom has noted that since the audit has concluded, they have shut down the ZK coprocessor product that the ZK circuits were originally intended for, and Axiom is now using them as a dependency in the OpenVM project, a modular and performant ZK-based virtual machine that allows arbitrary Rust crates to be used in ZK applications with some limitations.

The security assessment process

The Trail of Bits cryptography team worked to review Axiom’s Halo2 circuits over two separate assessments. Our first project, which ran from February 10 to May 17, 2023, focused on the low-level Halo2 primitives (PSE’s Halo2 fork), such as elliptic curve arithmetic and hash functions, as well as some of Axiom’s business logic. The second audit, which ran from September 11 to September 29, 2023, focused on Axiom’s changes/upgrades from the first assessment as well as a deeper review of the SNARK verification logic.

Due to the nature of these projects, much of our efforts comprised deep manual review of the Halo2 circuits. Fortunately, our cryptography team has reviewed multiple projects using the Halo2 framework, and so we maintain internal notes and documentation pertaining to the sharp edges and common pitfalls associated with Halo2 applications, which we leveraged (and updated) continuously throughout the projects with Axiom. Effectively reviewing this system required us to extensively study the Axiom system by reviewing their documentation and maintaining an open dialogue with Axiom engineers through Slack and weekly calls.

Even though these projects were mostly manual efforts, we leveraged tools to automate some of our analysis as well. We used some Rust utility tools like cargo-audit and Clippy to find outdated dependencies and common Rust issues, and we used Semgrep to look for more common Rust issues and Axiom-specific logic issues. Using Semgrep, we’ve been able to develop custom analyses that target Halo2; specifically, when we identified a Halo2 security issue that may be present in other locations, we used Semgrep to perform variant analysis and identify and remediate all similar issues.

The joys of reviewing Halo2

In order to create ZKPs that prove the execution of a particular program, one must model the program in a format (circuits) that the ZKP system can use. Halo2 (and others such as Circom, Cairo, and Noir) is a library developed by the ZCash team that allows you to do exactly that; specifically, it allows you to specify circuits made up of selectively active polynomial equations called “gates,” equality constraints that act like “wires” between those gates, and subset-inclusion constraints referred to as “lookups.” Essentially, your circuit is a massive system of equations, where each equation is one of a few specific “gate” equations, potentially with different constant values. The variables in these equations represent different values used throughout the computation, such as your inputs, intermediate values, and outputs. This style of circuit definition is referred to as “plonkish,” since it generalizes the type of gates-and-wires arithmetic circuit first popularized by the PLoNK proof system. Rather than directly computing a result, circuits typically check that all of these values match the correct execution of your program. The complete details are much more complicated, but if you are curious, you can check out the Halo2 book.

One of the most common and severe bug classes for ZKP circuits is known as “under-constrained” soundness bugs. A variable in a circuit is under-constrained when it can take on multiple possible values, some of which violate the assumptions of the rest of the circuit. This typically happens when a direct calculation gets replaced with a check. For example, if you run x = sqrt(9) in a normal program, x will always equal one value, typically 3. However, if you replace this in a circuit with a more efficient check (i.e., instead of directly computing x, you instead let the prover select some value for x and then you check that x^2 == 9), then a malicious prover can choose whether to set x to 3 or -3. Sometimes, this flexibility isn’t exploitable—but in severe instances, a malicious prover can essentially generate fake proofs where they perform wildly incorrect or invalid computations, but they still generate a ZKP that verifies as correct.

Halo2 provides an extremely low-level API for defining circuits. This allows teams to develop custom-tailored sets of constraints that allow them to optimize for their specific needs and achieve much better performance for generating ZKPs. However, this is a bit of a double-edged sword, since each type of constraint also introduces new hazards, and the interaction of different constraints can make it much easier to introduce security vulnerabilities. We have especially found “under-constrained” bugs in circuits that have different types of data. Since each variable in a Halo2 circuit is a finite field element, implementers need to juggle many additional constraints to ensure that variables have a more restricted set of values—for example, that variables representing truth values are only 0 or 1. Since each constraint is expensive, implementers try to avoid redundant constraints, but sometimes the reason that a constraint can be skipped is spread across a dozen files and thousands of lines of code, and mistakes can be extremely hard to notice.

There are some good resources for learning Halo2, such as the Halo2 book, but these have limitations. In particular, if you open the Halo2 book, you’ll notice there are quite a few TODOs present (and there were even more when we were working with Axiom). Beyond that, once you start to dig into Halo2, you’ll quickly notice that there are tons of sharp edges, which is why we felt the need to maintain internal documentation and guidance for using Halo2. To give you a sense for this, here are a couple of notes pulled from our internal documentation:

  • API notation can be inconsistent. In particular, Region::assign_advice does not create a constraint, but Region::assign_advice_from_instance and Region::assign_advice_from_constant do create constraints. If you mix these up, and think that Region::assign_advice assigns a constraint, you have probably introduced a severe under-constrained bug.

  • Proper constraining requires multiple API calls. Copy constraints, which allow you to copy values within the circuit, are introduced by a call to AssignedCell::copy_advice. However, to actually enforce the copy constraints, you need to enable permutation checks for the given columns using a call to ConstraintSystem::enable_equality.

We have an entire list of notes similar to those. Keep in mind that tooling support for Halo2 is extremely limited, and so verifying all of these subtleties is a mostly manual process.

Key findings

Our two security assessments of the Axiom Halo2 circuits identified a total of 35 security issues. Of these, four were high severity and pertained to soundness or under-constrained issues, potentially breaking the entire security of the system. Here are quick summaries of these four issues:

  • TOB-AXIOM-3: The idx_to_indicator circuit was under-constrained. This circuit is supposed to output a vector of all zeroes and ones, where every element in the vector is zero except for the location of idx, which is supposed to be one. Due to a missing constraint, the vector of all zeroes would always be treated as a correct answer, and so a malicious prover could insert this value and create incorrect computations that would verify as correct.

  • TOB-AXIOM-13: Two of the scalar multiplication circuits could return under-constrained results. If exploited, this could allow a malicious prover to generate incorrect scalar multiplication results that would be treated as valid. This could have implications for operations such as signature verification (i.e., signature forgery) that use these operations.

  • TOB-AXIOM-19: A small but catastrophic typo in the assert_equal function meant that it did not actually assert equality. Essentially, the function was supposed to compare two values against each other for equality, but it instead compared the first value with itself. This small mistake could have drastic consequences because this function was used in many critical locations, such as proof verification, and this would break the proof soundness.

  • TOB-AXIOMv2-3: The range_check method was supposed to constrain an integer value to be below a certain range, but this was enforced with a debug_assert, rather than an actual constraint. This means that large values would be caught during testing, but once deployed, a malicious prover could generate malicious values and this would not be caught by the ZKP. Fortunately, based on how this was used, it was not exploitable in the Axiom system, but it could’ve been problematic if it was used in other ways.

Other notable findings include the incorrect handling of edge cases like the point at infinity in elliptic curve operations and areas where dangerous assumptions could introduce system vulnerabilities.

In addition to these security issues, we also raised a number of what we call code quality issues. These issues did not represent immediate security issues on their own, but we still raised them because they either could lead to security issues down the line or represent opportunities for making the codebase more readable or efficient. Over both assessments, we raised a total of 25 code quality issues pertaining to concerns such as incorrect comments and redundant Rust calls.

Recommendations: the path forward

When we worked with Axiom, they were fairly early in their development lifecycle. In particular, the Halo2 circuits we reviewed in our first project had just recently been developed at the time, and after this project, Axiom continued their development of additional circuits, which we reviewed in the second assessment. This is why, when you read our security report, you’ll see that our Codebase Maturity Evaluation gave Axiom our lowest ratings of weak/missing in areas like documentation and testing.

While documentation and testing are essential for any security critical project, this is especially true when dealing with a low-level, complex framework that has many sharp edges, like Halo2. Both testing and documentation are paramount for a few reasons, such as better readability/auditability, preventing regressions, and confirming expected behaviors. This is why, after both of our assessments, we gave Axiom long-term, strategic recommendations to improve both their testing and documentation, in addition to recommendations for fixing the 35 issues we reported.

We were pleased to see that Axiom listened to these recommendations and responded with significant improvements on both fronts. They developed an end-to-end test suite between our first and second assessments, and expanded the test suite after completing both audits.

Wrapping up

Our collaboration with Axiom is a great example of the challenges of building applications using Halo2, the commitment it takes to properly secure them, and why security reviews are such an essential part of the development cycle. In particular, by engaging with us early in the development lifecycle, Axiom was able to make very impactful, long-term improvements to their security posture. This was very similar to the impact we saw with Ockam in our design review of their system.

We’d like to again thank the Axiom team for their great work on this project; they were highly collaborative, responsive, and actively incorporated our insights after our projects were completed. If you are building a complex cryptographic system like Axiom’s and are in need of security assistance, please reach out to our cryptography team!