Interactive decompilation with rellic-xref

By Francesco Bertolaccini

Rellic is a framework for analyzing and decompiling LLVM modules into C code, implementing the concepts described in the original paper presenting the Dream decompiler and its successor, Dream++. It recently made an appearance on this blog when I presented rellic-headergen, a tool for extracting debug metadata from LLVM modules and turning it into compilable C header files. In this post, I present a tool I developed for exploring the relationship between the original LLVM module and its decompiled C version: rellic-xref.

rellic-xref’s interface


I was tasked with improving the quality of the “provenance information” (i.e., information about how each LLVM instruction relates to the final decompiled code) produced by Rellic. However, the main front end for Rellic (a tool called rellic-decomp that produces C source code in a “batch processing” style) does not give much information about what is happening under the hood. Most importantly, it does not print any of the provenance info I was interested in.

To solve this problem, I developed rellic-xref as a web interface so that Rellic can be used in an interactive and graphical way. rellic-xref spins up a local web server and serves as an interface to the underlying decompiler. When a user uploads an LLVM module, a textual representation of the module is presented in the right-hand pane. The user can then run a variety of preprocessing steps on the LLVM bitcode or proceed to the decompilation step. Preprocessing is sometimes required when the module contains instructions that are not yet supported by the decompilation engine.

Once the module has been decompiled, the original module and decompiled source code are shown side by side:

Original module and decompiled source code

The main attraction of the tool is already available at this point: hovering your mouse over parts of the decompiled source code will highlight the instructions that led to its generation, and vice versa.

The refinement process

The source code produced straight out of the decompiler is not quite as pretty as it could be. To fix this, Rellic offers a series of refinement passes that make the code more readable. These passes are normally executed iteratively, in an order that is hard-coded in rellic-decomp; pressing the “Use default chain” button loads that default pass configuration into rellic-xref.

Refinement passes offered by Rellic

The default chain consists of passes that are executed only once and passes that are computed iteratively to search for a fixed point. However, rellic-xref gives users the ability to change the order and the way in which the refinement passes are run: passes can be removed (using the “x” buttons in the figure above) and inserted at will.

Although the Rellic passes operate on the Clang abstract syntax tree (AST), they are not suitable for arbitrary C programs: they rely on assumptions about how the decompilation process generates the AST. In particular, the decompiled code is initially in a form similar to static single assignment (SSA), reflecting the fact that it is generated directly from LLVM bitcode, which is itself in SSA form.

Refinement passes

For those who are unfamiliar with Rellic’s refinement passes, we provide descriptions of these passes followed by an example of how they refine the code.

Dead statement elimination
This pass removes statements from the AST that do not cause any side effects. For example, “null” statements (lone semicolons) and several types of expressions can be safely removed.

Original:

    void foo() {
        int a;
        a = 3;
        ;
    }

Refined:

    void foo() {
        int a;
        a = 3;
    }

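To make the mechanics concrete, here is a toy version of this pass in Python. This is my own sketch, not Rellic's implementation (which operates on the Clang AST); the statement kinds are invented for illustration.

```python
# Toy dead statement elimination: statements are tuples, and statements
# with no side effects are dropped from a function body.
def has_side_effects(stmt):
    kind = stmt[0]
    if kind == "null":                           # a lone ';'
        return False
    if kind == "expr":                           # a bare expression statement
        return stmt[1][0] in ("assign", "call")  # only these have effects here
    return True                                  # declarations, control flow, etc.

def eliminate_dead(body):
    return [s for s in body if has_side_effects(s)]

body = [
    ("decl", "a"),
    ("expr", ("assign", "a", 3)),
    ("null",),                       # ';'      -> removed
    ("expr", ("cmp", "a", 3)),       # 'a == 3;' -> removed
]
print(eliminate_dead(body))
```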
Z3 condition simplification
This pass uses the Z3 SMT solver to improve the quality of if and while conditions by reducing their size as much as possible. It works in a recursive manner by inspecting the syntax tree of each condition and pruning any branch that it finds to be trivially true or false.

Original:

    if(1U && x == 3) {
        ...
    }

Refined:

    if(x == 3) {
        ...
    }

Original:

    if(x != 3 || x == 3) {
        ...
    }

Refined:

    if(1U) {
        ...
    }
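As a rough illustration of the pruning strategy, here is a structural toy in Python. This is not Rellic's code: the real pass asks Z3 whether each subtree is trivially true or false, which also catches semantic tautologies like x != 3 || x == 3 that this purely syntactic sketch cannot.

```python
# Toy condition pruning: conditions are nested tuples; trivially true or
# false branches of && and || are removed recursively.
TRUE, FALSE = ("lit", True), ("lit", False)

def simplify(cond):
    op = cond[0]
    if op in ("and", "or"):
        kids = [simplify(c) for c in cond[1:]]
        # "absorb" short-circuits the whole expression; "drop" is the
        # identity element and can simply be removed.
        absorb, drop = (FALSE, TRUE) if op == "and" else (TRUE, FALSE)
        if absorb in kids:                      # x && false, x || true
            return absorb
        kids = [k for k in kids if k != drop]   # x && true,  x || false
        if not kids:
            return drop
        return kids[0] if len(kids) == 1 else (op, *kids)
    return cond

# if(1U && x == 3)  ->  if(x == 3)
print(simplify(("and", TRUE, ("eq", "x", 3))))
```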

Nested condition propagation
As the name suggests, this pass propagates conditions from parent statements to their children. In practical terms, this means that conditions in parent if and while statements are assumed to be true in nested if and while statements.

Original:

    if(x == 0) {
        if(x == 0 && y == 1) {
            ...
        }
    }

Refined:

    if(x == 0) {
        if(1U && y == 1) {
            ...
        }
    }

Nested scope combination
This one is pretty simple: any statement that appears in a compound statement or in a trivially true if statement can be extracted and put into the parent scope. This pass works on the assumption that all local variables have been declared at the beginning of the function.

Original:

    void foo() {
        int x;
        int y;
        {
            x = 1;
        }
        if(1U) {
            y = 1;
        }
    }

Refined:

    void foo() {
        int x;
        int y;
        x = 1;
        y = 1;
    }

Condition-based refinement
This pass recognizes when a scope contains two adjacent if statements that have opposite conditions and merges them into an if-else statement.

Original:

    if(a == 42) {
        ...
    }
    if(!(a == 42)) {
        ...
    }

Refined:

    if(a == 42) {
        ...
    } else {
        ...
    }

Reachability-based refinement
Similar to the condition-based refinement pass, this one recognizes when successive if statements have exclusive but not opposite reaching conditions and rearranges them into if-else-if statements.

Original:

    if(a == 42) {
        ...
    }
    if(!(a == 42) && b == 13) {
        ...
    }

Refined:

    if(a == 42) {
        ...
    } else if(b == 13) {
        ...
    }

Loop refinement
This pass recognizes if statements that should be responsible for terminating infinite while loops. It refactors the code to create loops with conditions.

Original:

    while(true) {
        if(a == 42) {
            break;
        }
        ...
    }

Refined:

    while(!(a == 42)) {
        ...
    }

Expression combination
This pass performs a number of simplifications, such as turning pointer arithmetic into array accesses and removing superfluous casts.

Original:

    !(x == 5)

Refined:

    x != 5

Condition normalization
This is the only pass that is not used in the default refinement chain. The purpose of this pass is to turn conditions into conjunctive normal form to reveal more opportunities for simplification. Unfortunately, it also tends to produce exponentially large conditions—literally!—so it is best used sparingly and only as a very last step before applying a final simplification using Z3.
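The exponential blowup is easy to demonstrate: distributing OR over AND turns a disjunction of n conjunctions, each with k literals, into k**n CNF clauses. A small Python illustration of my own, unrelated to Rellic's implementation:

```python
from itertools import product

# (a1 && a2) || (b1 && b2) || ... in DNF becomes, in CNF, one clause per
# way of choosing a literal from each conjunct: k**n clauses in total.
def dnf_to_cnf(conjuncts):
    return [set(choice) for choice in product(*conjuncts)]

dnf = [("a1", "a2"), ("b1", "b2"), ("c1", "c2")]
print(len(dnf_to_cnf(dnf)))                # 8 clauses from 3 conjuncts
print(len(dnf_to_cnf([("x", "y")] * 10)))  # 1024 clauses from 10
```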

I’m sold! How do I use it?

Just follow the instructions in the README and you’ll have an instance of rellic-xref running in no time.

Closing thoughts

rellic-xref turns the traditionally batch-oriented Rellic into a more interactive tool that provides insight into the decompilation and refinement process. It is also a glimpse into what the Rellic framework could be used for. For instance, what if the user had more control over the underlying Clang AST? Allowing custom variable renaming and retyping, for example, would go a long way in making Rellic feel like a proper component of a reverse engineering suite. Further work on rellic-xref (or development of a similar tool) could give users more control over the Clang AST in this way.

Rellic’s passes operate directly at the C AST level and heavily use Clang’s APIs. This was both a blessing and a curse as I worked with Rellic. For instance, calling the Clang AST “abstract” is a bit of a misnomer, as it has characteristics of both an abstract and a concrete syntax tree. For example, it contains information about positions of tokens and comments but also things that are not actually present in the source code text, like implicit casts. My experience with Rellic has taught me that the Clang AST interface is not really meant to be used as a mutable resource, and it has more of a write-once, read-many semantics. We have plans to migrate Rellic for use in an upcoming project featuring MLIR, which may help in this regard. However, that is beyond the scope of this blog post.

I’d like to thank my mentor, Peter Goodman, for his guidance during my internship and Marek Surovič for his precious feedback on my work with Rellic. Working at Trail of Bits continues to prove to be a great experience full of gratifying moments.

Themes from Real World Crypto 2022

By William Woodruff

Last week, over 500 cryptographers from around the world gathered in Amsterdam for Real World Crypto 2022, meeting in person for the first time in over two years.

As in previous years, we dispatched a handful of our researchers and engineers to attend the conference, listen to talks, and observe the themes currently dominating the nexus between cryptographic research and practical (real world!) engineering.

Here are the major themes we gleaned from Real World Crypto 2022:

  1. Trusted hardware isn’t so trustworthy: Implementers of trusted hardware (whether trusted execution environments (TEEs), HSMs, or secure enclaves) continue to make engineering mistakes that fundamentally violate the integrity promises made by the hardware.
  2. Security tooling is still too difficult to use: Or “you can lead a horse to water, but you can’t make it run ./configure && make && make install.”
  3. Side channels everywhere: When God closes a door, he opens a side channel.
  4. LANGSEC in cryptographic contexts: Figuring out which protocol you’re speaking is the third hard problem in computer science.

Let’s get to it!

Trusted hardware isn’t so trustworthy

Fundamental non-cryptographic vulnerabilities in trusted hardware are nothing new. Years of vulnerabilities have led to Intel’s decision to remove SGX from its next generation of consumer CPUs, and ROCA affected one in four TPMs back in 2017.

What is new is the prevalence of trusted hardware in consumer-facing roles. Ordinary users are increasingly (and unwittingly!) interacting with secure enclaves and TEEs via password managers and 2FA schemes like WebAuthn on their mobile phones and computers. This has fundamentally broadened the risk associated with vulnerabilities in trusted hardware: breaks in trusted hardware now pose a direct risk to individual users.

That’s where our first highlight from RWC 2022 comes in: “Trust Dies in Darkness: Shedding Light on Samsung’s TrustZone Cryptographic Design” (slides, video, paper). In this session, the presenters describe two critical weaknesses in TEEGRIS, Samsung’s implementation of a TrustZone OS: an IV reuse attack that allows an attacker to extract hardware-protected keys, and a downgrade attack that renders even the latest and patched flagship Samsung devices vulnerable to the first attack. We’ll take a look at both.

IV reuse in TEEGRIS

TEEGRIS is an entirely separate OS, running in isolation and in parallel with the “normal” host OS (Android). To communicate with the host, TEEGRIS provides a trusted application (TA) that runs within the TEE but exposes resources to the normal host via Keymaster, a command-and-response protocol standardized by Google.

Keymaster includes the concept of “blobs”: encryption keys that have themselves been encrypted (“wrapped”) with the TEE’s key material and stored on the host OS. Because the wrapped keys are stored on the host, their security ultimately depends on the security of the TEE’s correct application of encryption during key wrapping.

So how does the TEEGRIS Keymaster wrap keys? With AES-GCM!

As you’ll recall, there are (normally) three parameters for a block cipher (AES) combined with a mode of operation (GCM):

  • The secret key, used to initialize the block cipher
  • The initialization vector (IV), used to perturb the ciphertext and prevent our friend the ECB penguin
  • The plaintext itself, which we intend to encrypt (in this case, another encryption key)

The security of AES-GCM depends on the assumption that an IV is never reused for the same secret key. Therefore, an attacker that can force the secret key and IV to be used across multiple “sessions” (in this case, key wrappings) can violate the security of AES-GCM. The presenters discovered mistakes in Samsung’s Keymaster implementation that violate two security assumptions: that the key derivation function (KDF) can’t be manipulated to produce the same key multiple times, and that the attacker can’t control the IV. Here’s how:

  • On the Galaxy S8 and S9, the KDF used to generate the secret key used only attacker-controlled inputs. In other words, an attacker can force all encrypted blobs for a given Android application to use the exact same AES key. In this context, this is acceptable as long as an attacker cannot force IV reuse, except
  • …the Android application can set an IV when generating or importing a key! Samsung’s Keymaster implementation on the Galaxy S9 trusts the IV passed in by the host, allowing an attacker to use the same IV multiple times.

At this point, the properties of the stream cipher itself give the attacker everything they need to recover an encryption key from another blob: XORing together the malicious blob, the malicious key, and the target (victim) blob yields the plaintext of the target, which is the unwrapped encryption key!
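The arithmetic is worth spelling out. The sketch below simulates the attack with a hash-based keystream standing in for AES-GCM's CTR keystream; this is an assumption for illustration (real GCM derives its keystream from AES), but the property being abused is identical: the same key and IV produce the same keystream.

```python
import hashlib

def keystream(key, iv, n):
    # Stand-in for a CTR keystream: same key + same IV -> same bytes.
    out = b""
    ctr = 0
    while len(out) < n:
        out += hashlib.sha256(key + iv + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def wrap(key, iv, plaintext_key):
    # Key wrapping as stream encryption: ciphertext = plaintext XOR keystream.
    return xor(plaintext_key, keystream(key, iv, len(plaintext_key)))

key, iv = b"tee-derived-key!", b"attacker-chosen-iv"
victim_blob   = wrap(key, iv, b"victim wrapped key material!")
attacker_blob = wrap(key, iv, b"attacker-known plaintext!!!!")

# XORing the two blobs cancels the shared keystream; XORing in the known
# plaintext then reveals the victim's wrapped key.
recovered = xor(xor(victim_blob, attacker_blob), b"attacker-known plaintext!!!!")
print(recovered)  # b'victim wrapped key material!'
```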

Ultimately, the presenters determined that this particular attack worked only on the Galaxy S9: the S8’s Keymaster TA generates secret keys from attacker-controlled inputs but doesn’t use an attacker-provided IV, preventing IV reuse.

The talk’s presenters reported this bug in March of 2021, and it was assigned CVE-2021-25444.

Downgrade attacks

As part of their analysis of Samsung’s TrustZone implementation, the presenters discovered that the Keymaster TA on Galaxy S10, S20, and S21 devices used a newer blob format (“v20-s10”) by default. This new format changes the data used to seed the KDF: instead of being entirely attacker controlled, random bytes (derived from the TEE itself) are mixed in, preventing key reuse.

But not so fast: the TEE on the S10, S20, and S21 uses the “v20-s10” format by default but allows the application to specify a different blob version to use instead. The version without any randomized salt (“v15”) is one of the valid options, so we’re right back where we started with predictable key generation.

The talk’s presenters reported this bug in July of 2021, and it was assigned CVE-2021-25490.


TEEs are not special: they’re subject to the same cryptographic engineering requirements as everything else. Hardware guarantees are only as good as the software running on top of them, which should (1) use modern ciphers with misuse-resistant modes of operation, (2) minimize potential attacker influence over key and key derivation material, and (3) eliminate the attacker’s ability to downgrade formats and protocols that should be completely opaque to the host OS.

Security tooling is still too difficult to use

We at Trail of Bits are big fans of automated security tooling: it’s why we write and open-source tools like dylint, pip-audit, siderophile, and Echidna.

That’s why we were saddened by the survey results in “‘They’re not that hard to mitigate’: What Cryptographic Library Developers Think About Timing Attacks” (slides, video, paper): of 44 cryptographers surveyed across 27 major open-source cryptography projects, only 17 had actually used automated tools to find timing vulnerabilities, even though 100% of the participants surveyed were aware of timing vulnerabilities and their potential severity. The following are some of the reasons participants cited for choosing not to use automated tooling:

  • Skepticism about risk: Many participants expressed doubt that they needed additional tooling to help mitigate timing attacks or that there were practical real-world attacks that justified the effort required for mitigation.
  • Difficulty of installation or use: Many of the tools surveyed had convoluted installation, compilation, and usage instructions. Open-source maintainers expressed frustration when trying to make projects with outdated dependencies work on modern systems, particularly in contexts in which they’d be most useful (automated testing in CI/CD).
  • Maintenance status: Many of the tools surveyed are source artifacts from academic works and are either unmaintained or very loosely maintained. Others had no easily discoverable source artifacts, had binary releases only, or were commercially or otherwise restrictively licensed.
  • Invasiveness: Many of the tools introduce additional requirements on the programs they analyze, such as particular build structures (or program representations, such as C/C++ or certain binary formats only) and special DSLs for indicating secret and public values. This makes many tools inapplicable to newer projects written in languages like Python, Rust, and Go.
  • Overhead: Many of the tools involve significant learning curves that would take up too much of a developer’s already limited time. Many also require a significant amount of time to use, even after mastering them, in terms of manually reviewing and eliminating false positives and negatives, tuning the tools to increase the true positive rate, and so forth.

Perhaps unintuitively, awareness of tools did not correlate with their use: the majority of developers surveyed (33/44) were aware of one or more tools, but only half of that number actually chose to use them.


In the presenters’ words, there is a clear “leaky pipeline” from awareness of timing vulnerabilities (nearly universal), to tool awareness (the majority of developers), to actual tool use (a small minority of developers). Stopping those leaks will require tools to become:

  • Easier to install and use: This will reduce the cognitive overhead necessary between selecting a tool and actually being able to apply it.
  • Readily available: Tools must be discoverable without requiring intense familiarity with active cryptographic research; tools should be downloadable from well-known sources (such as public Git hosts).

Additionally, the presenters identified compilers themselves as an important new frontier: the compiler is always present, is already familiar to developers, and is the ideal place to introduce more advanced techniques like secret typing. We at Trail of Bits happen to agree!
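As a reminder of what the simplest of these timing bugs looks like, here is the classic early-exit comparison next to Python's built-in constant-time alternative (an illustrative sketch, not code from the talk):

```python
import hmac

def leaky_equals(a: bytes, b: bytes) -> bool:
    if len(a) != len(b):
        return False
    for x, y in zip(a, b):
        if x != y:        # returns at the first mismatch: running time
            return False  # leaks how many leading bytes matched
    return True

def constant_time_equals(a: bytes, b: bytes) -> bool:
    # The stdlib comparison runs in time independent of where bytes differ.
    return hmac.compare_digest(a, b)

print(constant_time_equals(b"secret-tag", b"secret-tag"))  # True
```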

Side channels everywhere

The great thing about side-channel vulnerabilities is their incredible pervasiveness: there’s a seemingly never-ending reservoir of increasingly creative techniques for extracting information from a target machine.

Side channels are typically described along two dimensions: passive-active (i.e., does the attacker need to interact with the target, and to what extent?) and local-remote (i.e., does the attacker need to be in the physical vicinity of the target?). Remote, passive side channels are thus the “best of both worlds” from an attacker’s perspective: they’re entirely covert and require no physical presence, making them (in principle) undetectable by the victim.

We loved the side channel described in “Lend Me Your Ear: Passive Remote Physical Side Channels on PCs” (video, paper). To summarize:

  • The presenters observed that, on laptops, the onboard microphone is physically wired to the audio interface (and, thus, to the CPU). Digital logic controls the intentional flow of data, but it’s all just wires and, therefore, unintentional noise underneath.
  • In effect, this means that the onboard microphone might act as an EM probe for the CPU itself!
  • We share our audio over the internet to potentially untrusted parties: company meetings, conferences, VoIP with friends and family, voice chat for video games, and so on…
  • …so can we extract anything of interest from that data?

The presenters offered three case studies:

  • Website identification: A victim is browsing a website while talking over VoIP, and the attacker (who is on the call with the victim) would like to know which site the victim is currently on.
    • Result: Using a convolutional neural network with a 14-way classifier (for 14 popular news websites), the presenters were able to achieve 96% accuracy.
  • Cryptographic key recovery: A victim is performing ECDSA signatures on her local machine while talking over VoIP, and the attacker would like to exfiltrate the secret key being used for signing.
    • Result: The presenters were able to use the same side-channel weakness as Minerva but without local instrumentation. Even with post-processing noise, they demonstrated key extraction after roughly 20,000 signing operations.
  • CS:GO wallhacks: A victim is playing an online first-person shooter while talking with other players over VoIP, and the attacker would like to know where the victim is physically located on the game map.
    • Result: Distinct “zebra patterns” were visually identifiable in the spectrogram when the victim was hidden behind an opaque in-game object, such as a car. The presenters observed that this circumvented standard “anticheat” mitigations, as no client code was manipulated to reveal the victim’s in-game location.


Side channels are the gift that keeps on giving: they’re difficult to anticipate and to mitigate, and they compromise cryptographic schemes that are completely sound in the abstract.

The presenters correctly note that this particular attack upends a traditional assumption about physical side channels: that they cannot be exploited remotely and, thus, can be excluded from threat models in which the attacker is purely remote.

LANGSEC in cryptographic contexts

LANGSEC is the “language-theoretic approach to security”: it attributes many (most?) exploitable software bugs to the ad hoc interpretation of potentially untrusted inputs and proposes that we parse untrusted inputs by comparing them against a formal language derived solely from valid or expected inputs.

This approach is extremely relevant to the kinds of bugs that regularly rear their heads in applied cryptography:

We saw not one, but two LANGSEC-adjacent talks at RWC this year!

Application layer protocol confusion

The presenters of “ALPACA: Application Layer Protocol Confusion—Analyzing and Mitigating Cracks in TLS Authentication” (slides, video, paper) started with their observation of a design decision in TLS: because TLS is fundamentally application and protocol independent, it has no direct notion of how the two endpoints should be communicating. In other words, TLS cares only about establishing an encrypted channel between two machines, not (necessarily) which machines or which services on those machines are actually communicating.

In their use on the web, TLS certificates are normally bound to domain names, preventing an attacker from redirecting traffic intended for one domain to another. But this isn’t always sufficient:

  • Wildcard certificates are common: an attacker who controls one subdomain might be able to redirect traffic intended for another subdomain if that traffic were encrypted with a certificate that permits arbitrary subdomains.
  • Certificates can claim many hosts, including hosts that have been obtained or compromised by a malicious attacker: a certificate for one host might also permit another host that an attacker controls.
  • Finally, and perhaps most interestingly, certificates do not specify which service and/or port they expect to authenticate with, creating an opportunity for an attacker to redirect traffic to a different service on the same host.
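To see why wildcard certificates widen the attack surface, consider a simplified name-matching routine. This is my own sketch of common TLS matching behavior (one label per wildcard, as in RFC 6125), using hypothetical example.com names:

```python
def cert_matches(pattern, host):
    # A '*' matches exactly one DNS label, as in common TLS implementations.
    p = pattern.lower().split(".")
    h = host.lower().split(".")
    if len(p) != len(h):
        return False
    return all(pl == "*" or pl == hl for pl, hl in zip(p, h))

print(cert_matches("*.example.com", "victim.example.com"))    # True
print(cert_matches("*.example.com", "attacker.example.com"))  # True: same cert!
print(cert_matches("*.example.com", "a.b.example.com"))       # False: one label only
```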

To make this easier for the attacker, hosts frequently run multiple services with protocols that roughly resemble HTTP. The presenters evaluated four of them (FTP, SMTP, IMAP, and POP3) against three different attack techniques:

  • Reflection: A MiTM attacker redirects a cross-origin HTTPS request to a different service on the same host, causing that service to “reflect” a trusted response back to the victim.
  • Download: An attacker stores malicious data on a service running on the same host and tricks a subsequent HTTPS request into downloading and presenting that data, similarly to a stored XSS attack.
  • Upload: An attacker compromises a service running on the same host and redirects a subsequent HTTPS request to the service, causing sensitive contents (such as cookie headers) to be uploaded to the service for later retrieval.

Next, the presenters evaluated popular web browsers and application servers for FTP, SMTP, IMAP, and POP3 and determined that:

  • All browsers were vulnerable to at least two attack techniques (FTP upload + FTP download) against one or more FTP server packages.
  • Internet Explorer and Microsoft Edge were particularly vulnerable: all exploit methods worked with one or more server packages.

This is all terrible news for defenders (and great news for attackers), but how many actual servers are vulnerable? As it turns out, quite a few: of 2 million unique hosts running a TLS-enabled application server (like FTP or SMTP), over 1.4 million (or 69%) were also running HTTPS, making them potentially vulnerable to a general cross-protocol attack. The presenters further narrowed this down to hosts with application servers that were known to be exploitable (such as old versions of ProFTPD) and identified over 114,000 HTTPS hosts that could be attacked.

So what can we do about it? The presenters have some ideas:

  • At the application server level, there are some reasonable countermeasures we could apply: protocols like FTP should be more strict about what they accept (e.g., refusing to accept requests that look like HTTP) and should be more aggressive about terminating requests that don’t resemble valid FTP sessions.
  • At the certificate level, organizations should be wary of wildcard and multi-domain certificates and should avoid shared hosts for TLS-enabled applications.
  • Finally, at the protocol level, TLS extensions like ALPN allow clients to specify the application-level protocol they expect to communicate with, potentially allowing the target application server (like SMTP) to reject the redirected connection. This requires application servers not to ignore ALPN, which they frequently do.
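The first countermeasure can be sketched in a few lines: an application server that refuses to interpret anything resembling HTTP closes off the confusion entirely. This is a hypothetical filter of my own, not taken from any real FTP server:

```python
import re

# A request line that looks like HTTP rather than an FTP command.
HTTP_REQUEST = re.compile(rb"^(GET|POST|PUT|HEAD|DELETE|OPTIONS) \S+ HTTP/\d\.\d\r?\n")
# A plausible FTP command: 3-4 uppercase letters, optional argument, CRLF.
FTP_COMMAND  = re.compile(rb"^[A-Z]{3,4}( [^\r\n]*)?\r\n")

def accept_first_line(line: bytes) -> bool:
    if HTTP_REQUEST.match(line):
        return False   # cross-protocol request: terminate the session
    return FTP_COMMAND.match(line) is not None

print(accept_first_line(b"USER alice\r\n"))                # True
print(accept_first_line(b"GET /index.html HTTP/1.1\r\n"))  # False
```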


  • Despite being known and well understood for years, cross-protocol attacks are still possible today! Even worse, trivial scans reveal hundreds of thousands of exploitable application servers running on shared HTTPS, making the bar for exploitation very low.
  • This space is not fully explored: application protocols like SMTP and FTP are obvious targets because of their similarity to HTTP, but newer protocols, such as VPN protocols and DTLS, are also showing up in internet services.

“Secure in isolation, vulnerable when composed”: ElGamal in OpenPGP

The presenters of “On the (in)security of ElGamal in OpenPGP” (slides, video, paper) covered another LANGSEC-adjacent problem: standards or protocols that are secure in isolation but insecure when interoperating.

The presenters considered ElGamal in implementations of OpenPGP (RFC 4880) due to its (ahem) unique status among asymmetric schemes required by OpenPGP:

  • Unlike RSA (PKCS#1) and ECDH (RFC 6637), ElGamal has no formal or official specification!
  • The two “official” references for ElGamal are the original paper itself and the 1997 edition of the Handbook of Applied Cryptography, which disagree on parameter selection techniques! The OpenPGP RFC cites both; the presenters concluded that the RFC intends for the original paper to be authoritative.

(By the way, did you know that this is still a common problem for cryptographic protocols, including zero-knowledge, MPC, and threshold schemes? If that sounds scary (it is) and like something you’d like to avoid (it is), you should check out ZKDocs! We’ve done the hard work of understanding best practices for protocol and scheme design in the zero-knowledge ecosystem so that you don’t have to.)

The presenters evaluated three implementations of PGP that support ElGamal key generation (GnuPG, Botan, and libcrypto++) and found that none obey RFC 4880 with regard to parameter selection: all three use different approaches to prime generation.

But that’s merely the beginning: many OpenPGP implementations are proprietary or subject to long-term changes, making it difficult to evaluate real-world deviation from the standard just from open-source codebases. To get a sense for the real world, the presenters surveyed over 800,000 real-world ElGamal keys and found that:

  • The majority of keys appear to be generated using a “safe primes” technique.
  • A large minority appear to be using Lim-Lee primes.
  • A much smaller minority appear to be using Schnorr or similar primes.
  • Just 5% appear to be using “quasi-safe” primes, likely indicating an intent to be compliant with RFC 4880’s prime generation requirements.
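For background (this is the standard definition, not something specific to the paper): a "safe prime" is p = 2q + 1 where q is also prime, which guarantees that the multiplicative group modulo p has a large prime-order subgroup. A quick, deliberately naive check:

```python
def is_prime(n):
    # Trial division: fine for the toy sizes in this illustration only.
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def is_safe_prime(p):
    # p is safe iff p and (p - 1) / 2 are both prime.
    return is_prime(p) and is_prime((p - 1) // 2)

print([p for p in range(2, 60) if is_safe_prime(p)])  # [5, 7, 11, 23, 47, 59]
```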

Each of these prime generation techniques is (probably) secure in isolation…but not when composed: each of Go, GnuPG, and libcrypto++’s implementations of encryption against an ElGamal public key were vulnerable to side-channel attacks enabling plaintext recovery because of the unexpected prime generation techniques used for ElGamal keys in the wild.

The bottom line: of the roughly 800,000 keys surveyed, approximately 2,000 were vulnerable to practical plaintext recovery because of the “short exponent” optimization used to generate them. To verify the feasibility of their attack, the presenters successfully recovered an encrypted message’s plaintext after about 2.5 hours of side-channel analysis of GPG performing encryptions.


  • ElGamal is an old, well-understood cryptosystem, one whose parameters and security properties are straightforward on paper (much like RSA) but is subject to significant ambiguity and diversity in real-world implementations. Standards matter for security, and ElGamal needs a real one!
  • Cryptosystem security is pernicious: it’s not enough to be aware of potential side channels via your own inputs; signing and encryption schemes must also be resistant to poorly (or even just unusually) generated keys, certificates, etc.

Honorable mentions

There were a lot of really great talks at this year’s RWC—too many to highlight in a single blog post. Some others that we really liked include:

  • “Zero-Knowledge Middleboxes” (slides, video, paper): Companies currently rely on TLS middleboxes (and other network management techniques, like DNS filtering) to enforce corporate data and security policies. Middleboxes are powerful tools, ones that are subject to privacy abuses (and subsequent user circumvention by savvy users, undermining their efficacy). This talk offers an interesting (albeit still experimental) solution: use a middlebox to verify a zero-knowledge proof of policy compliance, without actually decrypting (and, therefore, compromising) any TLS sessions! The key result of this solution is compliance with a DNS policy without compromising a user’s DNS-over-TLS session, with an overhead of approximately five milliseconds per verification (corresponding to one DNS lookup).
  • “Commit Acts of Steganography Before It’s Too Late” (slides, video): Steganography is the ugly duckling of the cryptographic/cryptanalytic world, with research on steganography and steganalysis having largely dried up. Kaptchuk argues that this decline in interest is unwarranted and that steganography will play an important role in deniable communication with and within repressive states. To this end, the talk proposes Meteor, a cryptographically secure steganographic scheme that uses a generative language model to hide messages within plausible-looking human-language sentences.

See you in 2023!

Improving the state of go-fuzz

By Christian Presa Schnell

During my winternship, I used the findings from recent Go audits to make several improvements to go-fuzz, a coverage-based fuzzer for projects written in Go. I focused on three enhancements to improve the effectiveness of Go fuzzing campaigns and provide a better experience for users. I contributed to fixing type alias issues, integrating dictionary support, and developing new mutation strategies.

What is go-fuzz?

go-fuzz finds software bugs by providing random input to a program and monitoring it for errors. It consists of two main components: go-fuzz and go-fuzz-build. The go-fuzz-build component is responsible for source code instrumentation: once the source code of the target program is instrumented, it is compiled, and the resulting binary is used by go-fuzz for the fuzzing campaign.

A user first instruments the source code with go-fuzz-build, allowing information such as runtime coverage to be extracted. Then, go-fuzz executes the program with a given set of inputs that are mutated on each iteration as it tries to increase coverage and trigger unexpected behaviors that lead to crashes. The harness, also provided by the user, is the fuzzing entry point and calls the function to be fuzzed. It returns a value to go-fuzz indicating whether the input should be dropped or promoted within the input corpus.
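For example, a minimal harness might look like the following sketch. It fuzzes the standard library's strconv.Atoi as a stand-in target (real harnesses live in the target's own package rather than main), and uses go-fuzz's documented return-value convention:

```go
package main

import "strconv"

// Fuzz is a go-fuzz entry point, here exercising strconv.Atoi as a
// stand-in target. Per go-fuzz's convention, return 1 to give the
// input priority in the corpus, -1 to drop it, and 0 otherwise.
func Fuzz(data []byte) int {
	if len(data) > 10 {
		return -1 // too long to be a small integer: drop it
	}
	if _, err := strconv.Atoi(string(data)); err != nil {
		return 0 // did not parse: keep with normal priority
	}
	return 1 // parsed successfully: promote
}
```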

go-fuzz has been very successful in discovering new bugs, and the tool has helped find more than 200 bugs that are highlighted on GitHub and many more during Trail of Bits audits.

Instrumentation with type aliases

My first task was to investigate the root cause of a bug that crashed go-fuzz and propose a fix for it. In particular, the crash error we obtained was undefined: fs in fs.FileMode. A more detailed description of the bug can be found in issue dvyukov/go-fuzz#325.

This bug occurs only with Go version 1.16 and higher and when interacting with the file system via the os package rather than with fs. As many projects interact with the file system, this issue is of great importance, and therefore, the improvements I proposed are essential to increasing go-fuzz’s usability.

Even though there is a workaround for this bug, it requires manually adding statements that use the fs package. This is not an ideal solution, since manually modifying the code may influence the fuzzing harness.

Another way of solving the problem is to simply use Go version 1.15. However, this is also problematic, since it is not always possible to run the project that we want to fuzz with a lower version of Go. Therefore, we wanted to find a thorough solution that would not require these constraints.

The error didn’t point to the root cause of the crash, so we needed to perform a detailed analysis.

Bug reproduction

I developed a minimal program that crashes go-fuzz-build based on the GitHub issue. The function that caused the bug, which is called HomeDir below, gets the stats for a file and checks whether the file's permission bits mark it as writable.

package homedir

import "os"

func HomeDir() {
	p := "text.txt"
	info, _ := os.Stat(p)
	if info.Mode().Perm()&(1<<uint(7)) != 0 {
		// ...
	}
}

To successfully instrument the program with go-fuzz-build, we needed to provide a fuzzing harness. Since we simply wanted to instrument the program, the harness did not require the HomeDir function to be invoked. So I implemented the harness in another file but in the same package as the HomeDir function so that the function could be instrumented without being called, allowing us to investigate the issue.

package homedir

func Fuzz(data []byte) int {
	return 1
}

After looking at this code, the cause of the go-fuzz-build crash seemed even more confusing. The crash was related to the fs package, but the fs package was not used in the program:

failed to execute go build: exit status 2
homedir.go:13: undefined: fs in fs.FileMode

Bug triage

Interestingly, this bug was not introduced by a particular commit to go-fuzz; it appeared when go-fuzz was used with Go version 1.16 or higher, which means that some change from Go version 1.15 to 1.16 had to be the reason for this issue.
Since the crash occurred when compiling the instrumented source code, it was easy to figure out where go-fuzz-build crashed: the instrumentation was somehow faulty and produced invalid code.

//line homedir.go:1
package homedir

//line homedir.go:1
import (
//line homedir.go:1
	_go_fuzz_dep_ "go-fuzz-dep"
//line homedir.go:1
)

//line homedir.go:1
import (
//line homedir.go:1
	"os"
//line homedir.go:1
)

//line homedir.go:8
func HomeDir() {
//line homedir.go:8
	p := "text.txt"
	info, _ := os.Stat(p)
//line homedir.go:13
	if func() _go_fuzz_dep_.Bool {
//line homedir.go:13
		__gofuzz_v1 := fs.FileMode(info.Mode().Perm() & (1 << 7))
//line homedir.go:13
		_go_fuzz_dep_.Sonar(__gofuzz_v1, 0, 725889)
//line homedir.go:13
		return __gofuzz_v1 != 0
//line homedir.go:14
	}() == true {
//line homedir.go:14
	} else {
//line homedir.go:14
	}
//line homedir.go:15
}

var _ = _go_fuzz_dep_.CoverTab

Root cause analysis

The expression in line 13 of the original program, (info.Mode().Perm() & (1 << 7)), is explicitly converted to the fs.FileMode type. This type conversion is one of the modifications performed by the instrumentation. The type conversion is correct by itself, since info.Mode().Perm() has the type fs.FileMode. The real problem is that the fs package is used but never imported. Therefore, the compiler cannot resolve the type conversion, and the compilation fails.

However, this does not answer the question of why go-fuzz-build crashes in Go version 1.16 and up and not in lower versions. We found the answer to this question by looking at the differences between 1.15 and 1.16: the FileMode type in the os package changed from type FileMode uint32 in Go 1.15 to type FileMode = fs.FileMode in Go 1.16.

Essentially, the FileMode type changed from a type definition with the underlying uint32 type to a type alias with a type target defined in the fs package. A type alias does not create a new type. Instead, it just defines a new name for the original type. For this reason, the typechecker used by go-fuzz-build identifies fs.FileMode as the type that should be used for the type conversion instead of the type alias defined in the os package. This should not be an issue if the type alias and the original type are in the same package, but if there are multiple packages, the corresponding import statements should be added to the instrumented code.
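The alias behavior is easy to observe directly. In this sketch (which requires Go 1.16 or later), a value declared as os.FileMode assigns to an fs.FileMode variable without any conversion, and the %T verb reports the canonical fs.FileMode name, which is exactly the name go-fuzz-build emitted into files that never imported fs:

```go
package main

import (
	"fmt"
	"io/fs"
	"os"
)

// Since Go 1.16, os.FileMode is declared as `type FileMode = fs.FileMode`.
// An alias does not create a new type, so both names denote the same type,
// and reflection reports the canonical name.
func canonicalName() string {
	var m os.FileMode = 0o644
	var n fs.FileMode = m // no conversion needed: identical types
	_ = n
	return fmt.Sprintf("%T", m) // the canonical name, not "os.FileMode"
}
```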

Proposed fix

Ideally, a fix for this issue should be future-proof. While it is possible to hard-code the case of the fs.FileMode, this is not sufficient since other type aliases might be introduced in future versions of Go or in external packages used by the fuzzed code, so more fixes would be required. My proposed fix addresses this problem.

The fix I proposed consists of the following steps. First, analyze the typechecker output for every instrumented file for types that are defined in non-imported packages. If such a type exists, an import statement is added for the corresponding package. However, there might be cases in which such a type exists but the instrumentation does not use it to perform a type conversion. That would make the added import unused, and consequently, the compiler would refuse to compile the code. Therefore, it is essential to remove the unused added imports. For that purpose, goimports, a tool that fixes up import statements, is run before compilation. The compilation then succeeds.

Because the package in which the alias is defined already imports the package that defines the aliased type, that package's initializer is guaranteed to execute only once. Therefore, we don't have to worry about the added import statement changing the semantics of the source code.

Dictionary support

The mutation engines of general-purpose fuzzers are designed to be very effective when mutating binary data formats, such as images or compressed data. However, general-purpose fuzzers don't perform very well against syntax-aware programs, which accept only inputs that are valid for the underlying grammar. Common examples of such targets are programs that parse human-readable data formats, like languages (SQL) or text-based protocols (HTTP, FTP).

Most of the time, in order to achieve good results with such human-readable targets, you have to build a custom mutation engine that adheres to the syntax, which is complex and time-consuming.

Another approach is to use fuzzing dictionaries. Fuzzing dictionaries are a collection of interesting keywords that are relevant for the required syntax. This approach allows a general-purpose mutation engine to insert keywords into the input at random positions. This gives the fuzzer more valid inputs than it would get with mutations on the bytes.

Up to this point, go-fuzz’s mutation engine generated a list of keywords using a technique called “token capture” and inserted these keywords into the mutated inputs. This technique extracts interesting strings directly from the instrumented code by looking for hard-coded values. The reasoning behind this approach is that if the input requires a certain syntax, there would be hard-coded statements within the program that check the validity of the input. While token capturing is correct, it has an important drawback: not only are the relevant keywords extracted, but so are keywords that are irrelevant to the input, such as log message strings. This is problematic, since adding noise to the string list reduces the fuzzer’s overall effectiveness.

A different approach is to let the user provide dictionaries containing interesting keywords relevant for the specific syntax used by the targeted program. I proposed a modification that allows one to pass a -dict parameter to go-fuzz and to provide a dictionary file containing the keywords (in the AFL/libFuzzer format) and a level parameter to provide more fine-grained control over the tokens in the dictionary file.

The following example illustrates the syntax of a token dictionary for SQL:

function_abs=" abs(1)"
function_avg=" avg(1)"
function_changes=" changes()"
function_char=" char(1)"
function_coalesce=" coalesce(1,1)"
function_count=" count(1)"
function_date=" date(1,1,1)"

By adopting the same syntax for the dictionaries used by AFL and libFuzzer, we can reuse preexisting dictionaries containing important keywords for specific fuzzing targets without having to redefine the keywords in a new format.
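A parser for this format can be sketched in a few lines. This is a simplification: real AFL/libFuzzer dictionaries also support \xNN escapes inside tokens and optional level annotations (name@level="token"), which are omitted here:

```go
package main

import "strings"

// parseDict extracts the quoted tokens from an AFL/libFuzzer-style
// dictionary, where each non-empty, non-comment line has the shape
// name="token". Escapes and @level annotations are not handled in
// this sketch.
func parseDict(text string) []string {
	var tokens []string
	for _, line := range strings.Split(text, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blanks and comments
		}
		first := strings.Index(line, `"`)
		last := strings.LastIndex(line, `"`)
		if first < 0 || last <= first {
			continue // not a name="token" line
		}
		tokens = append(tokens, line[first+1:last])
	}
	return tokens
}
```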

New mutation strategies

A fuzzer’s effectiveness depends on the quality of its mutation algorithms and whether they lead to more diverse inputs and increased code coverage. To this end, I developed three new mutation strategies for go-fuzz.

Insert repeated bytes
This strategy mutates the fuzzer’s previous input by inserting a random byte that is repeated a random number of times. This is a variation of the insert random bytes strategy from libFuzzer that increases the likelihood of a situation in which certain bytes are repeated.

Shuffle bytes
Another mutation strategy inspired by libFuzzer, shuffle bytes selects a random subpart of the input with a random length and shuffles it using the Fisher-Yates shuffling algorithm.

Little endian base 128
Last but not least, I improved the InsertLiteral mutation strategy by implementing the little endian base 128 (LEB128) encoding. Similar to the process of inserting string literals discussed in the section on dictionaries above, with this improved strategy, the mutation engine scans the source code for hard-coded integer values and inserts them into the input to mutate it.

Inserting strings into the input is straightforward, as strings have a direct byte representation. This is not the case for integer values, since there are multiple formats for storing them in bytes, depending on the length of the integer (8, 16, 32, or 64 bits) and on the endianness in which the integer is stored, either little endian or big endian. For this reason, the mutation engine needs to be able to insert the integer literals in different formats, since any of them might be used by the fuzzed program.

LEB128 is a variable-length encoding algorithm that is able to store integers by using very few bytes. In particular, LEB128 can store small integer values without leading zero bytes as well as arbitrarily big integers. Additionally, there are two different variants of the LEB128 encoding that have to be implemented separately: unsigned LEB128 and signed LEB128.
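The encoding itself can be sketched as follows. These are the standard ULEB128/SLEB128 definitions, not go-fuzz's internal code: seven value bits per byte, with the high bit marking that more bytes follow:

```go
package main

// uleb128 encodes an unsigned integer in little endian base 128:
// 7 value bits per byte, high bit set on all but the last byte.
func uleb128(v uint64) []byte {
	var out []byte
	for {
		b := byte(v & 0x7f)
		v >>= 7
		if v != 0 {
			b |= 0x80 // more bytes follow
		}
		out = append(out, b)
		if v == 0 {
			return out
		}
	}
}

// sleb128 encodes a signed integer; the arithmetic right shift
// propagates the sign, and encoding stops once the remaining value
// is pure sign extension of the emitted bits.
func sleb128(v int64) []byte {
	var out []byte
	for {
		b := byte(v & 0x7f)
		v >>= 7
		done := (v == 0 && b&0x40 == 0) || (v == -1 && b&0x40 != 0)
		if !done {
			b |= 0x80
		}
		out = append(out, b)
		if done {
			return out
		}
	}
}
```

For instance, uleb128(624485) yields the bytes 0xE5 0x8E 0x26, the classic example from the DWARF specification.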

Because of its efficiency, this encoding is very popular and is used in many projects, like LLVM, DWARF, and Android DEX. Therefore, go-fuzz’s support of it is very useful.

The future of go-fuzz

The recent release of Go version 1.18 introduced first-party support for fuzzing. Hence, go-fuzz has reached the end of its life and future improvements will most likely be limited to bug fixes. Nonetheless, enhancing go-fuzz is still useful, as it is a well-known solution with an ecosystem of helpful tools, like trailofbits/go-fuzz-utils, and it may still be used in older projects.

I hope that the proposed improvements will be adopted upstream into go-fuzz so that everyone can benefit from them to discover and fix new bugs. Even though Go’s new built-in fuzzer will gain popularity due to its ease of use, we hope that Go developers will continue to draw inspiration from go-fuzz, which has been a huge success so far. It will certainly be interesting to see what the future holds for the fuzzing of Go projects.

I am very grateful for the opportunity to have worked as an intern at Trail of Bits and on the go-fuzz project—it was a great learning experience. I would like to thank my mentors, Dominik Czarnota and Rory Mackie, for their guidance and support.

Amarna: Static analysis for Cairo programs

By Filipe Casal

We are open-sourcing Amarna, our new static analyzer and linter for the Cairo programming language. Cairo is a programming language powering several trading exchanges with millions of dollars in assets (such as dYdX, driven by StarkWare) and is the programming language for StarkNet contracts. But, not unlike other languages, it has its share of weird features and footguns. So we will first provide a brief overview of the language, its ecosystem, and some pitfalls in the language that developers should be aware of. We will then present Amarna, discuss how it works and what it finds, and describe what we plan to implement down the line.

Introduction to Cairo

Why do we need Cairo?

The purpose of Cairo, and similar languages such as Noir and Leo, is to write “provable programs,” where one party runs the program and creates a proof that it returns a certain output when given a certain input.

Suppose we want to outsource a program’s computation to some (potentially dishonest) server and need to guarantee that the result is correct. With Cairo, we can obtain a proof that the program output the correct result; we need only to verify the proof rather than recomputing the function ourselves (which would defeat the purpose of outsourcing the computation in the first place).

In summary, we take the following steps:

  • Write the function we want to compute.
  • Run the function on the worker machine with the concrete inputs, obtain the result, and generate a proof of validity for the computation.
  • Validate the computation by validating the proof.

The Cairo programming language

As we just explained, the Cairo programming model involves two key roles: the prover, who runs the program and creates a proof that the program returns a certain output, and the verifier, who verifies the proofs created by the prover.

However, in practice, Cairo programmers will not actually generate or verify the proofs themselves. Instead, the ecosystem includes these three pillars:

  • The SHARP (SHARed Prover) receives program execution traces from users and creates STARK proofs attesting to their validity.
  • The proof verifier contract is an on-chain contract that verifies the proofs created by the SHARP.
  • The fact registry is a database that stores program facts, or values computed from hashes of programs and of their outputs; creating a program fact is a way to bind a program to its output.

This is the basic workflow in Cairo:

  • A user writes a program and submits its trace to the SHARP (via the Cairo playground or the command cairo-sharp).
  • The SHARP creates a STARK proof for the program trace and submits it to the proof verifier contract.
  • The proof verifier contract validates the proof, and, if valid, writes the program fact to the fact registry.
  • Any other user can now query the fact registry contract to check whether that program fact is valid.

There are two other things to keep in mind:

  • Memory in Cairo is write-once: after a value is written to memory, it cannot be changed.
  • The assert statement assert a = b behaves differently depending on whether a is initialized: if a is uninitialized, the assert statement assigns b to a; if a is initialized, it asserts that a and b are equal.
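The second point can be modeled in a few lines of Go (a model of the semantics described above, not Cairo itself):

```go
package main

import "fmt"

// memory models Cairo's write-once memory and its assert statement:
// asserting a value into an unset cell assigns it; asserting into an
// already-set cell succeeds only when the values agree.
type memory map[int]int

func (m memory) assertEq(addr, value int) error {
	if old, ok := m[addr]; ok {
		if old != value {
			return fmt.Errorf("cell %d already holds %d, cannot write %d", addr, old, value)
		}
		return nil // consistent re-assertion: no-op
	}
	m[addr] = value // first write: assignment
	return nil
}
```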

Although the details of Cairo’s syntax and keywords are interesting, we will not cover these topics in this post. The official Cairo documentation and Perama’s notes on Cairo are a good starting point for more information.

Setting up and running Cairo code

Now that we’ve briefly outlined the Cairo language in general, let’s discuss how to set up and run Cairo code. Consider the following simple Cairo program. This function computes the Pedersen hash function of a pair of numbers, (input, 1), and outputs the result in the console:

# validate_hash.cairo
%builtins output pedersen

from starkware.cairo.common.cairo_builtins import HashBuiltin
from starkware.cairo.common.hash import hash2
from starkware.cairo.common.serialize import serialize_word

func main{output_ptr:felt*, pedersen_ptr : HashBuiltin*}():
    alloc_locals
    local input
    %{ ids.input = 4242 %}

    # computes the Pedersen hash of the tuple (input, 1)
    let (hash) = hash2{hash_ptr=pedersen_ptr}(input, 1)

    # prints the computed hash
    serialize_word(hash)

    return ()
end

To set up the Cairo tools, we use a Python virtual environment:

$ mkvirtualenv cairo-venv
(cairo-venv)$ pip3 install cairo-lang

Then, we compile the program:

# compile the validate_hash.cairo file,
# writing the output to compiled.json
$ cairo-compile validate_hash.cairo --output compiled.json

Finally, we run the program, which will output the following value:

# run the program
$ cairo-run --program=compiled.json --print_output --layout small
Program output:

This value is the field element corresponding to the Pedersen hash of (4242, 1).

Now, suppose that we change the input from 4242 to some hidden value and instead provide the verifier with the following output:

$ cairo-run --program=compiled.json --print_output --layout small
Program output:
1134422549749907873058035660235532262290291351787221961833544516346461369884
Why would the verifier believe us? Well, we can prove that we know the hidden value that will make the program return that output!

To generate the proof, we need to compute the hash of the program to generate the program fact. This hash does not depend on the input value, as the assignment is inside a hint (a quirk of Cairo that we’ll discuss later in this post):

# compute the program's hash
$ cairo-hash-program --program compiled.json
0x3c034247e8bf20ce12c878793cd47c5faa6f5470114a33ac62a90b43cfbb494
# compute program fact
from web3 import Web3

def compute_fact(program_hash, program_output):
    fact = Web3.solidityKeccak(['uint256', 'bytes32'],
                               [program_hash, Web3.solidityKeccak(['uint256[]'], [program_output])])

    h = hex(int.from_bytes(fact, 'big'))
    return h

# hash and output computed above
program_hash = 0x3c034247e8bf20ce12c878793cd47c5faa6f5470114a33ac62a90b43cfbb494
program_output = [1134422549749907873058035660235532262290291351787221961833544516346461369884]

print(compute_fact(program_hash, program_output))
# 0xe7551a607a2f15b078c9ae76d2641e60ed12f2943e917e0b1d2e84dc320897f3

Then, we can check the validity of the program fact by using the fact registry contract and calling the isValid function with the program fact as input:

The result of calling the isValid function to check the validity of the program fact.

To recap, we ran the program, and the SHARP created a proof that can be queried in the fact registry to check its validity, proving that we actually know the input that would cause the program to output this value.

Now, I can actually tell you that the input I used was 71938042130017, and you can go ahead and check that the result matches.

You can read more about the details of this process in Cairo’s documentation for blockchain developers and more about the fact registry in this article by StarkWare.

Cairo features and footguns

Cairo has several quirks and footguns that can trip up new Cairo programmers. We will describe three Cairo features that are easily misused, leading to security issues: Cairo hints, the interplay between recursion and underconstrained structures, and non-deterministic jumps.


Hints

Hints are special Cairo statements that basically enable the prover to write arbitrary Python code. Yes, the Python code written in a Cairo hint is literally exec’d!

Hints are written inside %{ %}. We already used them in the first example to assign a value to the input variable:

%builtins output

from starkware.cairo.common.serialize import serialize_word

func main{output_ptr:felt*}():

    # arbitrary python code
    %{ import os %}

    # prints 1
    serialize_word(1)

    return ()
end

$ cairo-compile hints.cairo --output compiled.json               
$ cairo-run --program=compiled.json --print_output --layout small
Program output:
1

Because Cairo can execute arbitrary Python code in hints, you should not run arbitrary Cairo code on your own machine—doing so can grant full control of your machine to the person who wrote the code.

Hints are commonly used to write code that is only executed by the prover. The proof verifier does not even know that a hint exists because hints do not change the program hash. The following function from the Cairo playground computes the square root of a positive integer, n:

func sqrt(n) -> (res):
    local res

    # Set the value of res using a python hint.
    %{
        import math

        # Use the ids variable to access the value
        # of a Cairo variable.
        ids.res = int(math.sqrt(ids.n))
    %}

    # The following line guarantees that
    # `res` is a square root of `n`.
    assert n = res * res
    return (res)
end

The program computes the square root of n by using the Python math library inside the hint. But at verification time, this code does not run, and the verifier needs to check that the result is actually a square root. So, the function includes a check to verify that n equals res * res before the function returns the result.

Underconstrained structures

Cairo lacks support for while and for loops, leaving programmers to use good old recursion for iteration. Let’s consider the “Dynamic allocation” challenge from the Cairo playground. The challenge asks us to write a function that, given a list of elements, will square those elements and return a new list containing those squared elements:

%builtins output

from starkware.cairo.common.alloc import alloc
from starkware.cairo.common.serialize import serialize_word

# Fills `new_array` with the squares of the
# first `length` elements in `array`.
func _inner_sqr_array(array : felt*, new_array : felt*,
                                        length : felt):
    # recursion base case
    if length == 0:
        return ()
    end

    # recursive case: the first element of the new_array will
    # be the first element of the array squared
    # recall that the assert will assign to the
    # `new_array` array at position 0
    # since it has not been initialized
    assert [new_array] = [array] * [array]

    # recursively call, advancing the arrays
    # and subtracting 1 from the array length
    _inner_sqr_array(array=array + 1,
                     new_array=new_array + 1,
                     length=length - 1)
    return ()
end

func sqr_array(array : felt*, length : felt) ->
                                          (new_array : felt*):
    alloc_locals

    # allocates an arbitrary length array
    let (local res_array) = alloc()

    # fills the newly allocated array with the squares
    # of the elements of array
    _inner_sqr_array(array, res_array, length)
    return (res_array)
end

func main{output_ptr : felt*}():
    alloc_locals

    # Allocate a new array.
    let (local array) = alloc()

    # Fill the new array with field elements.
    assert [array] = 1
    assert [array + 1] = 2
    assert [array + 2] = 3
    assert [array + 3] = 4

    let (new_array) = sqr_array(array=array, length=4)

    # prints the array elements
    serialize_word([new_array])
    serialize_word([new_array + 1])
    serialize_word([new_array + 2])
    serialize_word([new_array + 3])

    return ()
end

Running this code will output the numbers 1, 4, 9, and 16 as expected.

But what happens if an error (or an off-by-one bug) occurs and causes the sqr_array function to be called with a zero length?

func main{output_ptr : felt*}():
    alloc_locals

    # Allocate a new array.
    let (local array) = alloc()

    # Fill the new array with field elements.
    assert [array] = 1
    assert [array + 1] = 2
    assert [array + 2] = 3
    assert [array + 3] = 4

    let (new_array) = sqr_array(array=array, length=0)
    serialize_word([new_array])
    serialize_word([new_array + 1])
    serialize_word([new_array + 2])
    serialize_word([new_array + 3])

    return ()
end

Basically, the following happens:

  1. The sqr_array function will allocate res_array and call _inner_sqr_array(array, res_array, 0).
  2. _inner_sqr_array will compare the length with 0 and return immediately.
  3. sqr_array will return the allocated (but never written to) res_array.

So what happens when you call serialize_word on the first element of new_array?
Well, it depends… Running the code as-is will result in an error because the value of new_array is unknown:

The error that occurs after running the above code as-is.

However, remember that usually you won’t be running code; you’ll be verifying proofs that a program outputs some value. And I can actually provide you proof that this program can output any four values that you would like! You can compute all of this yourself to confirm that I’m not cheating:

$ cairo-compile recursion.cairo --output compiled.json
$ cairo-hash-program --program compiled.json
0x01eb05e1deb7ea9dd7bd266abf8aa8a07bf9a62146b11c0bd1da8bb844ff2479

The following fact binds this program with the output [1, 3, 3, 7]:

# hash and output computed above
program_hash = 0x01eb05e1deb7ea9dd7bd266abf8aa8a07bf9a62146b11c0bd1da8bb844ff2479
program_output = [1, 3, 3, 7]

print(compute_fact(program_hash, program_output))
# 0x4703704b8f7411d5195e907c2eba54af809cb05eebc65eb9a9423964409a8a4d

This fact is valid according to the fact registry contract:

The fact registry’s verification of the program fact.

So what is happening here?

Well, since the returned array is only allocated and never written to (because its length is 0, the recursion stops as soon as it starts), the prover can write to the array in a hint, and the hint code won’t affect the program’s hash!

The “evil” sqr_array function is actually the following:

func sqr_array(array : felt*, length : felt) ->
                                           (new_array : felt*):
    alloc_locals
    let (local res_array) = alloc()

    %{  # write on the result array if the length is 0
        if ids.length == 0:
            data = [1, 3, 3, 7]
            for idx, d in enumerate(data):
                memory[ids.res_array + idx] = d
    %}

    _inner_sqr_array(array, res_array, length)
    return (res_array)
end

In a nutshell, if some bug makes the length of the array 0, a malicious prover can produce any result he wants.

You might also ask yourself why, in general, a malicious prover can’t simply add a hint at the end of the program to change the output in any way he wishes. Well, he can, as long as that memory hasn’t been written to before; this is because memory in Cairo is write-once, so you can only write one value to each memory cell.

This pattern of creating the final result array is necessary due to the way memory works in Cairo, but it also carries the risk of a security issue: a simple off-by-one mistake in tracking the length of this array can allow a malicious prover to arbitrarily control the array memory.

Nondeterministic jumps

Nondeterministic jumps are another code pattern that can seem unnatural to a programmer reading Cairo for the first time. They combine hints and conditional jumps to redirect the program’s control flow with some value. This value can be unknown to the verifier, as the prover can set it in a hint.

For example, we can write a program that checks whether two elements, x and y, are equal in the following contrived manner:

func are_equal(x, y) -> (eq):
    # sets the ap register to True or False depending on
    # the equality of x and y
    %{ memory[ap] = ids.x == ids.y %}
    # jump to the label equal if the elements were equal
    jmp equal if [ap] != 0; ap++

    # case x != y
    return (0)

    # case x == y
    equal:
    return (1)
end

Running this program will return the expected result (0 for different values and 1 for equal values):

func main{output_ptr : felt*}():

    let (res) = are_equal(1, 2)
    serialize_word(res)  # -> 0

    let (res) = are_equal(42, 42)
    serialize_word(res)  # -> 1

    return ()
end

However, this function is actually vulnerable to a malicious prover. Notice how the jump instruction depends only on the value written in the hint:

    %{ memory[ap] = ids.x == ids.y %}
    jmp equal if [ap] != 0; ap++

And we know that hints are fully controllable by the prover! This means that the prover can write any other code in that hint. In fact, there are no guarantees that the prover actually checked whether x and y are equal, or even that x and y were used in any way. Since there are no other checks in place, the function could return whatever the prover wants it to.

As we saw previously, the program hash does not consider code in hints; therefore, a verifier can’t know whether the correct hint was executed. The malicious prover can provide proofs for any possible output values of the program ((0, 0), (1, 1), (0, 1), or (1, 0)) by changing the hint code and submitting each proof to the SHARP.

So how do we fix it?

Whenever we see nondeterministic jumps, we need to make sure that the jumps are valid by adding checks under each label that the verifier can validate:

func are_equal(x, y) -> (eq):
    %{ memory[ap] = ids.x == ids.y %}
    jmp equal if [ap] != 0; ap++

    # case x != y:
    # we are in the not_equal case,
    # so we can't have equal x and y
    if x == y:
        # add an unsatisfiable assert
        assert x = x + 1
    end
    return (0)

    # case x == y:
    # we are in the equal case,
    # so x and y must be equal
    equal:
    assert x = y
    return (1)
end

In this case, the function is simple enough that the code needs only an if statement:

func are_equal(x, y) -> (eq):
    if x == y:
        return (1)
    end
    return (0)
end

Amarna, our Cairo static analyzer

While auditing Cairo code, we noticed that there was essentially no language support of any form, except for syntax highlighting in VSCode. And as we found issues in the code, we wanted to make sure that similar patterns were not present elsewhere in the codebase.

We decided to build Amarna, a static analyzer for Cairo, to give us the ability to create our own rules and search for code patterns of interest to us—not necessarily security vulnerabilities, but any security-sensitive operations that need to be analyzed or need greater attention when reviewing code.

Amarna exports its static analysis results to the SARIF format, allowing us to easily integrate them into VSCode with VSCode’s SARIF Viewer extension and to view warnings underlined in the code:

Cairo code with an underlined dead store (left) and the SARIF Viewer extension showing the results from Amarna (right).

How does Amarna work?

The Cairo compiler is written in Python using lark, a parsing toolkit, to define a grammar and to construct its syntax tree. Using the lark library, it is straightforward to build visitors to a program’s abstract syntax tree. From here, writing a rule is a matter of encoding what you want to find in the tree.

The first rule we wrote was to highlight all uses of arithmetic operations +, -, *, and /. Of course, not all uses of division are insecure, but with these operations underlined, the developer is reminded that Cairo arithmetic works over a finite field and that division is not integer division, as it is in other programming languages. Field arithmetic underflows and overflows are other issues that developers need to be aware of. By highlighting all arithmetic expressions, Amarna helps developers and reviewers to quickly zoom in on locations in the codebase that could be problematic in this regard.

The rule to detect all divisions is very simple: it creates a result object with the file position and adds it to the analysis results. Sketched below (the create_result call is abbreviated here, and its argument names and the position helper are illustrative):

class ArithmeticOperationsRule(GenericRule):
    """
    Check arithmetic operations:
        - reports ALL multiplications and divisions
        - reports ONLY addition and subtraction that
          do not involve a register like [ap - 1]
    """

    RULE_TEXT = "Cairo arithmetic is defined over a finite field and has potential for overflows."
    RULE_PREFIX = "arithmetic-"

    def expr_div(self, tree: Tree) -> None:
        # Build a result carrying the rule name, its description,
        # and the position of the expression in the source file
        result = create_result(
            self.fname,
            self.RULE_PREFIX + "div",
            self.RULE_TEXT,
            get_position(tree),
        )
        self.results.append(result)

As we looked for more complex code patterns, we developed three classes of rules:

  • Local rules analyze each file independently. The rule described above, to find all arithmetic operations in a file, is an example of a local rule.
  • Gatherer rules analyze each file independently and gather data to be used by post-processing rules. For example, we have rules to gather all declared functions and all called functions.
  • Post-processing rules run after all files are analyzed and use the data gathered by the gatherer rules. For example, after a gatherer rule finds all declared functions and all called functions in a file, a post-processing rule can find all unused functions by identifying functions that are declared but never called.
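The three classes can be modeled in a few lines of Python (an illustrative sketch of the structure, not Amarna's actual API):

```python
# Simplified model of Amarna's three rule classes
# (illustrative structure, not the actual Amarna API).

class LocalRule:
    """Analyzes each file independently and emits results directly."""
    def run(self, filename, tree):
        return [f"{filename}: arithmetic at {pos}" for pos in tree.get("div", [])]

class FunctionGatherer:
    """Gatherer rule: collects declared/called functions across files."""
    def __init__(self):
        self.declared, self.called = set(), set()
    def run(self, filename, tree):
        self.declared |= set(tree.get("func_decls", []))
        self.called |= set(tree.get("func_calls", []))

class UnusedFunctionRule:
    """Post-processing rule: runs after all files have been gathered."""
    def run(self, gatherer):
        return sorted(gatherer.declared - gatherer.called)

# Toy "syntax trees" standing in for two parsed files
files = {
    "a.cairo": {"func_decls": ["main", "helper"], "func_calls": ["helper"],
                "div": ["line 12"]},
    "b.cairo": {"func_decls": ["unused_fn"], "func_calls": []},
}

local = LocalRule()
gatherer = FunctionGatherer()
for name, tree in files.items():
    print(local.run(name, tree))   # local results, file by file
    gatherer.run(name, tree)       # accumulate data for post-processing

print(UnusedFunctionRule().run(gatherer))  # ['main', 'unused_fn']
```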

So what does Amarna find?

So far, we have implemented 10 rules, whose impacts range from informational rules that help us audit code (marked as Info) to potentially security-sensitive code patterns (marked as Warning):


| #  | Rule                          | What it finds                                                                                                          | Impact  | Precision |
|----|-------------------------------|------------------------------------------------------------------------------------------------------------------------|---------|-----------|
| 1  | Arithmetic operations         | All uses of arithmetic operations +, -, *, and /                                                                       | Info    | High      |
| 2  | Unused arguments              | Function arguments that are not used in the function they appear in                                                    | Warning | High      |
| 3  | Unused imports                | Unused imports                                                                                                         | Info    | High      |
| 4  | Mistyped decorators           | Mistyped code decorators                                                                                               | Info    | High      |
| 5  | Unused functions              | Functions that are never called                                                                                        | Info    | Medium    |
| 6  | Error codes                   | Function calls that have a return value that must be checked                                                           | Info    | High      |
| 7  | Inconsistent assert usage     | Asserts that use the same constant in different ways (e.g., assert_le(amount, BOUND) and assert_le(amount, BOUND - 1)) | Warning | High      |
| 8  | Dead stores                   | Variables that are assigned values but are not used before a return statement                                          | Info    | Medium    |
| 9  | Potential unchecked overflows | Function calls that ignore the returned overflow flags (e.g., uint256_add)                                             | Warning | High      |
| 10 | Caller address return value   | Function calls to the get_caller_address function                                                                      | Info    | High      |

While most of these rules fall into the informational category, they can definitely have security implications: for example, failing to check the return code of a function can be quite serious (imagine if the function is a signature verification); the Error codes rule will find some of these instances.

The Unused arguments rule finds function arguments that are not used in the function they appear in, a check common in linters for general-purpose programming languages. An unused argument generally indicates an intention that was never carried out, which can have security implications. This rule would have found a bug in an OpenZeppelin contract a few months ago, caused by an unchecked nonce that was passed as an argument to the execute function.

Going forward

As Cairo is still a developing ecosystem, enumerating all vulnerable patterns can be difficult. We plan to add more rules moving forward, and in the medium/long term, we plan to add more complex analysis features, such as data-flow analysis.

In the meantime, if you have any ideas for vulnerable code patterns, we are more than happy to review feature requests, new rules, bug fixes, issues, and other contributions from the community.

The Frozen Heart vulnerability in PlonK

By Jim Miller

In part 1 of this blog post, we disclosed critical vulnerabilities that break the soundness of multiple implementations of zero-knowledge proof systems. This class of vulnerability, which we dubbed Frozen Heart, is caused by insecure implementations of the Fiat-Shamir transformation that allow malicious users to forge proofs for random statements. In part 2 and part 3 of the blog post, we demonstrated how to exploit a Frozen Heart vulnerability in two specific proof systems: Girault’s proof of knowledge and Bulletproofs. In this part, we will look at the Frozen Heart vulnerability in PlonK.

Zero-Knowledge Proofs and the Fiat-Shamir Transformation

This post assumes that you possess some familiarity with zero-knowledge proofs. If you would like to read more about them, there are several helpful blog posts and videos available to you, such as Matt Green’s primer. To learn more about the Fiat-Shamir transformation, check out the blog post I wrote explaining it in more detail. You can also check out ZKDocs for more information about both topics.


The PlonK proof system is one of the latest ZK-SNARK schemes to be developed in the last few years. SNARKs are essentially a specialized group of zero-knowledge proof systems that have very efficient proof size and verification costs. For more information on SNARKs, I highly recommend reading Zcash’s blog series.

Most SNARKs require a trusted setup ceremony. In this ceremony, a group of parties generates a set of values that are structured in a particular way. Importantly, nobody in this group can know the underlying secrets for this set of values (otherwise the proof system can be broken). This ceremony is not ideal, as users of these proof systems have to trust that some group generates these values honestly and securely, otherwise the entire proof system is insecure.

Older SNARK systems need to perform this ceremony multiple times for each program. PlonK’s universal and updateable trusted setup is much nicer; only one trusted setup ceremony is required to prove statements for multiple programs.

In the next section, I will walk through the full PlonK protocol to demonstrate how a Frozen Heart vulnerability works. However, PlonK is quite complex, so this section may be difficult to follow. If you’d like to learn more about how PlonK works, I highly recommend reading Vitalik Buterin’s blog post and watching David Wong’s YouTube series.

The Frozen Heart Vulnerability in PlonK

As in Bulletproofs, PlonK contains multiple Fiat-Shamir transformations. Each of these transformations, if implemented incorrectly, could contain a Frozen Heart vulnerability. I will focus on one particular vulnerability that appears to be the most common. This is the vulnerability that affected Dusk Network’s plonk, Iden3’s SnarkJS, and ConsenSys’ gnark.

PlonK, like most SNARKs, is used to prove that a certain computation was performed correctly. In a process called arithmetization, the computation is represented in a format that the proof system can interpret: polynomials and arithmetic circuits.

We can think of the inputs to the computation as input wires in a circuit. For SNARKs, there are public and private inputs. The private inputs are wires in the circuit that only the prover sees. The public inputs are the remaining wires, seen by both the prover and the verifier.

Other public values include the values generated by the trusted setup ceremony used to generate and verify proofs. Additionally, because the prover and verifier need to agree on the computation or program to be proven, a representation of the program’s circuit is shared publicly between both parties.

In part 1 of this series, we introduced a rule of thumb for securely implementing the Fiat-Shamir transformation: the Fiat-Shamir hash computation must include all public values from the zero-knowledge proof statement and all public values computed in the proof (i.e., all random “commitment” values). This means that the public inputs, the values from the trusted setup ceremony, the program’s circuit, and all the public values computed in the proof itself must be included in PlonK’s Fiat-Shamir transformations. The Frozen Heart vulnerability affecting the Dusk Network, Iden3, and ConsenSys codebases stems from their failure to include the public inputs in these computations. Let’s take a closer look.
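In code, following this rule of thumb amounts to hashing every public value into each challenge. A minimal sketch (hashlib-based, with an illustrative transcript encoding; not any particular library's implementation):

```python
import hashlib

def fiat_shamir_challenge(*public_values) -> int:
    """Hash ALL public values, with length prefixes so that distinct
    transcripts can never serialize to the same byte string."""
    h = hashlib.sha256()
    for v in public_values:
        data = v if isinstance(v, bytes) else str(v).encode()
        h.update(len(data).to_bytes(8, "big"))
        h.update(data)
    return int.from_bytes(h.digest(), "big")

# Everything public goes in: the setup values, the circuit, the public
# inputs, and every commitment produced so far (names are placeholders)
setup_values = b"trusted setup output"
circuit = b"program circuit"
public_inputs = [3, 7]   # omitting these is exactly the Frozen Heart bug
commitments = [b"[a]_1", b"[b]_1", b"[c]_1"]

zeta = fiat_shamir_challenge(setup_values, circuit, *public_inputs, *commitments)
```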

Overview of the PlonK Protocol

PlonK has undergone a few revisions since it was first published, so a lot of the PlonK implementations are not based on the most recent version on the Cryptology ePrint archive. In this section, I will be focusing on this version posted in March 2020, as this (or a similar version) appears to be what the SnarkJS implementation is based on. Note: this vulnerability still applies to all versions of the paper, but for different versions, the exact details of how this issue can be exploited will differ.

Before diving into the details of the protocol, I will describe it at a high level. To produce a proof, the prover first constructs a series of commitments to various polynomials that represent the computation (specifically, they are constraints corresponding to the program’s circuit). After committing to these values, the prover constructs the opening proofs for these polynomials, which the verifier can verify using elliptic curve pairings. This polynomial committing and opening is done using the Kate polynomial commitment scheme. This scheme is complex, but it’s a core part of the PlonK protocol, and understanding it is key to understanding how PlonK works. Check out this blog post for more detail.

From the prover’s perspective, the protocol has five rounds in total, each of which computes a new Fiat-Shamir challenge. The following are all of the public and private inputs for the PlonK protocol:

Public and private inputs for the PlonK proof system (source)

The common preprocessed input contains the values from the trusted setup ceremony (the [x]1, …, [x^(n+2)]1 values, where [x]1 = x * g1 and g1 is the generator of an elliptic curve group) and the circuit representation of the program; these are shared by both the prover and verifier. The remaining values are the input wires to the program’s circuit. The public inputs are the public wires, and the prover’s input is both the public input and the prover’s private wires. With that established, we can start the first round of the protocol, in which the prover computes the blinded wire values.

Round 1 for the prover in the PlonK protocol (source)

The prover first computes three polynomials (a, b, and c) and then evaluates them at point x by using the values from the trusted setup ceremony. The underlying x value is not known (assuming the trusted setup ceremony was performed correctly), so the prover is producing a commitment to these polynomials without leaking any information. The prover does the same for the permutation polynomial in round 2:

Round 2 for the prover in the PlonK protocol (source)

We will not describe this round in detail, as it’s especially complex and irrelevant to this vulnerability. In summary, this round is responsible for enforcing the “copy constraints,” which ensure that the values assigned to all wires are consistent (i.e., it prevents a prover from assigning inconsistent values if one gate’s output is the input to another gate). From there, the prover computes the quotient polynomial in round 3. This quotient polynomial is crucial for the Frozen Heart vulnerability, as we will see shortly.

Round 3 for the prover in the PlonK protocol (source)

After round 3, the prover then computes the evaluation challenge, zeta, using the Fiat-Shamir technique. zeta is then used to evaluate all of the polynomials constructed up to this point. This zeta challenge value is the value we will target in the Frozen Heart vulnerability. As you can see, the authors of the paper use the term “transcript,” which they explain earlier in the paper to mean the preprocessed input, the public input, and the proof values generated along the way.

(The paragraph above was updated on April 29, 2022.)

Round 4 for the prover in the PlonK protocol (source)

Finally, in round 5, the prover then generates the opening proof polynomials and returns the final proof.

Round 5 for the prover in the PlonK protocol (source)

To verify this proof, the verifier performs the following 12 steps:

Verifier in the PlonK protocol (source)

At a high level, the verifier first verifies that the proof is well formed (steps 1-4), computes a series of values (steps 5-11), and then verifies them by using a single elliptic curve pairing operation (step 12). This check essentially verifies the divisibility of the constructed polynomials and will pass only if the polynomials are structured as expected (unless there’s a Frozen Heart vulnerability, of course).

Exploiting a Frozen Heart Vulnerability in PlonK

Recall that the Frozen Heart vulnerability in question stems from a failure to include the public inputs for the program’s circuit in any of the Fiat-Shamir challenge calculations (importantly, zeta). At a high level, a malicious prover can exploit this vulnerability by picking malicious public input values (that will depend on challenge values like zeta) to trick the verifier into reconstructing a t_bar in step 8 that will pass the final verification check. Let’s look at the details.

Recall that in round 1, the prover generates three polynomials (a, b, and c) based on the prover’s public and private wire values. If we are a malicious prover trying to forge proofs, then we don’t actually have these wire values (because we haven’t actually done any computation). Therefore, in round 1, we’ll generate completely random polynomials a’, b’, and c’ and then output their evaluations, [a’]1, [b’]1, and [c’]1.

Our random a, b, and c polynomials will break the polynomial computed in round 2 because this polynomial enforces the “copy constraints,” which means that wire values are consistent. With completely random polynomials, it’s essentially guaranteed that these constraints won’t hold. Fortunately, we can actually skip these checks entirely by zeroing out the round 2 polynomial and outputting [0]1. Note: if the implementation checks for the point at infinity, the verifier might reject this proof. If this is the case, you can be clever with how you generate a’, b’, and c’ so that the copy constraints hold, but the details of how that would work are a bit gnarly. Since the SnarkJS implementation does not return an error on this zeroed out polynomial, I will continue with this approach.

Now, for round 3, remember that we don’t have the required wire values to correctly compute our polynomial t. Instead, we will generate a random polynomial t’ and output [t’_lo]1, [t’_mid]1, and [t’_hi]1.

In round 4, we will compute the challenge zeta and all of the evaluations as we did before, except we now use a’, b’, c’, and t’. We will then use these evaluations to construct the r polynomial in the expected way and then evaluate r at zeta and output all of the evaluations. Notice that each of the evaluations computed are consistent with the polynomials that we committed to in previous rounds.

In round 5, we will perform the calculations as expected but replace a, b, c, and t with a’, b’, c’, and t’. This is the end of the protocol, but we’re not done. The last step is to compute public input values that will trick the verifier.

Tricking the verifier really comes down to this opening proof polynomial (we can ignore the other opening proof polynomial because we zeroed out polynomial z in round 2):

Computation of opening proof polynomial (source)

Since our evaluations in round 4 corresponded to the polynomials used in round 5, the polynomial inside of the large parentheses above will evaluate to zero when evaluated at zeta; therefore, it is divisible by (X - zeta), and we can compute Wzeta. Now, we need to ensure that the values we compute in the proof are recomputed the same way by the verifier. In steps 9–11 the verifier reconstructs the values in a structure identical to how the prover calculated them. Therefore, all of these values should be valid.

This just leaves us with a final problem to solve in step 8, in which the verifier calculates t_bar. Because we output t’_bar, a random polynomial evaluated at zeta rather than the t_bar that the prover is expected to compute, the verifier’s t_bar will not match our output and will not pass the verification. However, the verifier uses public inputs to compute t_bar, which are not included in the Fiat-Shamir transformation for any of the challenges (namely, zeta). So we can retrofit our public inputs so that they force the verifier to compute t’_bar in step 8.

To do so, we first plug in the t’_bar that we computed in round 4 to the left side of the equation in step 8. Then, we solve this equation for PI(zeta), a sum over the public inputs multiplied by the Lagrangian polynomial. Since it’s a sum, there are multiple combinations that will work. The simplest way is to zero out every public input except the first and solve for the wire value that results in the PI(zeta) value we solved for. We’ve now constructed our public inputs that will trick the verifier into computing t’_bar. The reason this works is because these public input values are not used to compute zeta, so we know zeta before we have to decide on these public input values.

Now, the verifier will reconstruct the same t’_bar value that we computed in round 4, and the final pairing check will pass—we’ve successfully forged a proof.

To be clear, this vulnerability is introduced only through incorrect implementations; this is not a vulnerability in the PlonK paper itself.

(This section was updated on April 29, 2022.)

Frozen Heart’s Impact on PlonK

Let’s take a step back and assess the severity of this vulnerability. First, let’s recall what we are proving. Using PlonK, the prover is proving to the verifier that he has correctly executed a particular (agreed upon) program and that the output he has given to the verifier is correct. In the previous section, we forged a proof by using completely random wire values for the program’s circuit. The verifier accepts this proof of computation as valid, even though the prover didn’t correctly compute any of the circuit’s wire values (i.e., he didn’t actually run the program). It’s worth reiterating that this post focused on only one kind of Frozen Heart vulnerability. Several similar attacks against this proof system are possible if other Fiat-Shamir transformations are done incorrectly.

Whether this is exploitable in practice is determined by how the verifier handles the public input values. Specifically, this is exploitable only if the verifier accepts arbitrary public inputs from the prover (rather than agreeing on them beforehand, for instance). If we look at the example code in the SnarkJS repository, we can see that the public inputs (publicSignals) are output by the prover (using the fullProve function) and blindly accepted by the verifier (this example is for Groth16, but the PlonK API works in the same way). In general, the exploitability of this vulnerability is implementation dependent.

Example code snippet using SnarkJS (source)

You can imagine that in most applications, this type of proof forgery is very severe. The PlonK proof system effectively gives zero guarantees that a program was executed correctly. Keep in mind that our example used random wire values, but this is not a requirement. A more sophisticated attacker could pick more clever wire values (e.g., by choosing an output of the program that benefits the attacker) and still perform this same attack.

This is the final post in our series on the Frozen Heart vulnerability. I hope that, by now, I’ve made it clear that these issues are very severe and, unfortunately, widespread. Our hope is that this series of posts creates better awareness of these issues. If you haven’t already, check out ZKDocs for more guidance, and please reach out to us if you’d like more content to be added!

The Frozen Heart vulnerability in Bulletproofs

By Jim Miller

In part 1 of this series, we disclosed critical vulnerabilities that break the soundness of multiple implementations of zero-knowledge proof systems. This class of vulnerability, which we dubbed Frozen Heart, is caused by insecure implementations of the Fiat-Shamir transformation that allow malicious users to forge proofs for random statements. In part 2, we demonstrated how to exploit a Frozen Heart vulnerability in a specific proof system: Girault’s proof of knowledge. In this post, we will demonstrate such an exploit against another proof system: Bulletproofs.

Zero-Knowledge Proofs and the Fiat-Shamir Transformation

This post assumes that you possess some familiarity with zero-knowledge proofs. If you would like to read more about them, there are several helpful blog posts and videos available to you, such as Matt Green’s primer. To learn more about the Fiat-Shamir transformation, check out the blog post I wrote explaining it in more detail. You can also check out ZKDocs for more information about both topics.


Bulletproofs are complex and efficient zero-knowledge range proofs, in which a prover proves that a certain secret value lies within a predefined range without having to reveal the value itself.

Bulletproofs operate over a cryptographic primitive known as a Pedersen commitment, a specific type of commitment scheme. Using a commitment scheme, a party can create a commitment, which binds the party to a secret value but does not reveal any information about this value. Later, this party can decommit or reveal this commitment; if revealed, and if the scheme is secure, the other party can be sure that the revealed value is the same as the original committed value.

To create a Pedersen commitment for value x, you generate a random gamma and then compute the commitment using comm = (g^x)(h^gamma): g and h are different generators of your finite group, and the discrete log of h relative to g is unknown (i.e., it’s infeasible to find a such that g^a = h). Since Pedersen commitments are secure, the commitment does not reveal any information about x, and it’s impossible to equivocate on the commitment; this means you cannot publish different x’ and gamma’ values that produce the same commitment (assuming the discrete log between g and h is unknown).
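Here is a toy Pedersen commitment over a small prime group (illustrative only; real schemes use large groups in which the discrete log of h base g is actually unknown):

```python
import random

# Toy prime group (illustrative; real schemes use ~256-bit groups and
# generators g, h whose relative discrete log is provably unknown)
p = 1019
g, h = 2, 5   # assume nobody knows a with g^a = h (mod p)

def commit(x, gamma):
    # comm = (g^x)(h^gamma) mod p
    return pow(g, x, p) * pow(h, gamma, p) % p

x = 42
gamma = random.randrange(1, p - 1)   # random blinding factor hides x
comm = commit(x, gamma)

# Decommit by revealing (x, gamma); anyone can recompute and compare
assert comm == commit(x, gamma)
```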

The fact that the Pedersen commitment does not reveal any information can be problematic for complex protocols, such as those that need to guarantee that the secret value falls within a predefined range (e.g., to restrict these values to prevent integer overflows). However, using Bulletproofs, we can prove that our commitment corresponds to a value within a predefined range, such as [0, 2^32), without revealing the specific input value.

Unfortunately, the Bulletproofs protocol is too complex to walk through in detail in this post. To describe the Frozen Heart vulnerability, I will present the Bulletproofs protocol step-by-step, but this will be difficult to follow if you have not seen it before. If you’d like to learn more about how Bulletproofs work, you can find a number of blogs and videos online, such as this write-up.

The Frozen Heart Vulnerability in Bulletproofs

Bulletproofs have several Fiat-Shamir transformations (the exact number depends on the parameters being used), which could be abused in different ways. In this section, I will walk through how to exploit one such vulnerability. In fact, this vulnerability is the result of an error made by the authors of the original Bulletproofs paper, in which they recommend using an insecure Fiat-Shamir implementation. ING Bank’s zkrp, SECBIT Labs’ ckb-zkp, and Adjoint, Inc.’s bulletproofs were all affected by this issue.

Bulletproofs operate over Pedersen commitments, which take the form V = (g^v)(h^gamma). As a reminder, the goal of Bulletproofs is to prove that the secret value, v, lies in a predefined range. For the purposes of this post, we will use [0, 2^32) as our predefined range. Here is the formal zero-knowledge proof statement for Bulletproofs:

Formal proof statement for Bulletproofs (source)

In part 1 of this series, we introduced a rule of thumb for securely implementing the Fiat-Shamir transformation: the Fiat-Shamir hash computation must include all public values from the zero-knowledge proof statement and all public values computed in the proof (i.e., all random “commitment” values). So we want to ensure that all of the public values in this statement (g, h, V, n) are in the Fiat-Shamir calculation, along with the random commitments, which we haven’t covered yet.

As is the case in most cryptography papers, the Bulletproofs algorithm is presented in its interactive version. Here is the prover’s role, as presented in the paper:

Interactive Bulletproofs protocol from the original paper

(To be clear, this is not the full version of the protocol. There is a significant optimization to decrease the size of the proof sent in step (63) that is described later in the Bulletproofs paper, but this is irrelevant for the purposes of this exploit.)

Once the prover sends the final proof in step (63), the verifier will need to verify it by performing the following checks:

The checks performed by the verifier to verify Bulletproofs (source)

In steps (49)/(50) and (55)/(56) of the prover’s protocol, three different Fiat-Shamir challenge values need to be generated: x, y, and z. However, the authors recommend using an insecure computation for these values:

Insecure Fiat-Shamir challenge computation recommended in the original Bulletproofs paper (source)

According to the authors, we should set y = Hash(A,S), z = Hash(A,S,y), and (following their description) x = Hash(A,S,y,z,T1,T2). This violates our rule of thumb: none of the public values from the statement—most importantly, V—are included. This is a Frozen Heart vulnerability! This vulnerability is critical; as you may have guessed, it allows malicious provers to forge proofs for values that actually lie outside of the predefined range.
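Concretely, the gap between the paper's derivation and one that follows the rule of thumb looks like this (a hashlib sketch with an illustrative encoding and placeholder values):

```python
import hashlib

def H(*vals) -> int:
    # Length-prefixed hashing so distinct transcripts can't collide
    h = hashlib.sha256()
    for v in vals:
        data = v if isinstance(v, bytes) else str(v).encode()
        h.update(len(data).to_bytes(8, "big") + data)
    return int.from_bytes(h.digest(), "big")

# Commitments from the proof and public statement values (placeholders)
A, S, T1, T2 = b"A", b"S", b"T1", b"T2"
g, h_gen, V, n = b"g", b"h", b"V", 32

# Insecure, as recommended in the paper: V never enters the hash, so a
# malicious prover learns x, y, z before having to commit to V
y_bad = H(A, S)
z_bad = H(A, S, y_bad)
x_bad = H(A, S, y_bad, z_bad, T1, T2)

# Fixed: bind every challenge to the full public statement as well
y = H(g, h_gen, V, n, A, S)
z = H(g, h_gen, V, n, A, S, y)
x = H(g, h_gen, V, n, A, S, y, z, T1, T2)
```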

Now, to actually exploit this, a malicious prover can do the following:

  • Set v equal to any value in the range. So let’s just say v = 3.
  • Pick a random gamma value.
  • Generate aL, aR, A, and S as expected (according to steps (41), (42), (44), and (47), respectively) using these v and gamma values.
  • Compute t1 and t2 as described in the Bulletproofs paper (this is not in the above figure but is described in text in the paper), and then compute values t’1 and t’2 by randomly generating numbers. When computing T1 and T2, replace t1 with t’1 and t2 with t’2. So, to be clear, we set Ti = (g^t’i)(h^tau_i) (as expected, but with ti switched for t’i).
  • Compute the rest of the values (l, r, t_hat, tau_x, mu) according to the protocol.
  • Finally, compute a new V using the same gamma but with a new v’ value. Specifically, set v’ = 3 + (t1 - t’1)(x / z^2) + (t2 - t’2)(x^2 / z^2) (you’ll see why this was chosen like this in a second). Here, 3 comes from setting v = 3 in step 1.

Now, recall all of the verification checks from above. The only values we computed differently than expected were T1, T2, and V. Since we computed the other values as expected, all of the checks will pass automatically, with the exception of check (65), which depends on T1, T2, and V. But because x, y, and z are computed independently of V, we can compute a malicious V value that depends on these values and will pass check (65). Let’s see how this works:

First, let’s simplify the left-hand side of check (65):

LHS = (g^t_hat)(h^tau_x)
= g^[t0 + t1 * x + t2 * x^2] * h^[tau2 * x^2 + tau1 * x + z^2 * gamma]

Now, let’s simplify the right-hand side of check (65):

RHS = (V^(z^2))(g^delta(y,z))(T1^x)(T2^(x^2))
= (V^(z^2))(g^delta(y,z))[(g^t’1)(h^tau1)]^x * [(g^t’2)(h^tau2)]^(x^2)

Now, if you look at the v’ value that we picked for V, you’ll see that the T1 and T2 exponents in g will cancel out the exponents in g for V. So the right-hand side is simplified as follows:

= g^[3z^2 + t1 * x + t2 * x^2 + delta(y,z)] * h^[gamma * z^2 + tau1 * x + tau2 * x^2]

We can see that the exponent in h is identical on both sides. All we have to do now is check that the exponents in g match.

In the g exponent on the left-hand side, we have t0 + t1 * x + t2 * x^2.

In the g exponent on the right-hand side, we have 3z^2 + delta(y,z) + t1 * x + t2 * x^2.

And t0 is defined in the original paper to be:

t0 as defined in the Bulletproofs paper (source)

Since t0 = v * z^2 + delta(y,z) and we set v = 3, the g exponents match exactly, meaning we’ve successfully forged a proof.

To be clear, this is a forgery because we just submitted this proof for the newly computed V value, where v was set to be v’ = 3 + (t1 - t’1)(x / z^2) + (t2 - t’2)(x^2 / z^2). Since x and z are random Fiat-Shamir challenges, v’ will end up being a random value in the interval [0, group order). Since the group order is usually much larger than the interval in question (here, 2^32), v’ will be a value outside of the range with overwhelming probability, but the proof will still pass. If, for whatever reason, v’ is not outside of this range, a malicious actor could simply start the same process over with new random values (e.g., new gamma, new t’1, etc.) until the desired v’ value is obtained.

Frozen Heart’s Impact on Bulletproofs

Frozen Heart vulnerabilities are critical because they allow attackers to forge proofs, but their impact on the surrounding application depends on how the application uses the proof system. As we saw for the Schnorr and Girault proof systems in part 2 of this series, there may be contexts in which these vulnerabilities are not critical. However, this is unlikely to be the case for Bulletproofs.

In our example, we were able to produce a forgery for a random value in the group order. In most applications, the predefined range for the range proof is typically much smaller than the size of the group order. This means that, although the specific value cannot be chosen, an attacker could easily produce a proof for values outside of the desired range. In the majority of contexts we’ve seen Bulletproofs being used, this is severe.

Watch out for the final part of this series, in which we will explore the Frozen Heart vulnerability in an even more complex proof system: PlonK.

The Frozen Heart vulnerability in Girault’s proof of knowledge

By Jim Miller

In part 1 of this series, we disclosed critical vulnerabilities that break the soundness of multiple implementations of zero-knowledge proof systems. This class of vulnerability, which we dubbed Frozen Heart, is caused by insecure implementations of the Fiat-Shamir transformation that allow malicious users to forge proofs for random statements. The vulnerability is general and can be applied to any proof system that implements the Fiat-Shamir transformation insecurely. In this post, I will show how to exploit a Frozen Heart vulnerability in Girault’s proof of knowledge.

Zero-Knowledge Proofs and the Fiat-Shamir Transformation

This post assumes that you possess some familiarity with zero-knowledge proofs. If you would like to read more about them, there are several helpful blog posts and videos available to you, such as Matt Green’s primer. To learn more about the Fiat-Shamir transformation, check out the blog post I wrote explaining it in more detail. You can also check out ZKDocs for more information about both topics.

Girault’s Proof of Knowledge

In Girault’s proof of knowledge protocol, the prover proves that he knows the discrete log of a certain value over a composite modulus. In other words, a prover convinces a verifier that he knows some secret value, x, such that h = g^(-x) mod N, where g is a high order generator and N is some composite modulus (e.g., N = p * q, where p and q are two primes). To learn more about the protocol, check out our description on ZKDocs. If you are familiar with Schnorr proofs, think of Girault’s proof of knowledge as Schnorr proofs over a composite modulus rather than a prime modulus.

The protocol is interactive by default, but it can be made non-interactive using the Fiat-Shamir transformation:

The interactive and non-interactive versions of Girault’s proof of knowledge from ZKDocs (source)

At a high level, the prover first computes the commitment, u, by generating a random value, r, and computing u = g^r. The prover then obtains the random challenge value, e (either from the verifier in the interactive version or by applying the Fiat-Shamir transformation herself in the non-interactive version), and calculates the final proof value, z = r + x * e. This protocol is proven to be secure as long as the discrete log can’t be computed over the group that is used and the Fiat-Shamir challenge is computed correctly.
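As a heavily simplified sketch, the honest non-interactive flow looks like the following Python; the modulus, generator, secret, and challenge encoding below are all illustrative assumptions, not values or conventions from any real implementation:

```python
import hashlib
import random

# Toy Girault proof of knowledge -- parameters are far too small to be secure.
p, q = 1019, 1367              # hypothetical primes
N = p * q                      # composite modulus
g = 2                          # stand-in for a high-order generator

def challenge(g, N, u, h):
    # Correct Fiat-Shamir: hash the public statement values AND the commitment.
    data = f"{g},{N},{u},{h}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (1 << 16)

x = 424242                     # prover's secret
h = pow(g, -x, N)              # public value: h = g^-x mod N
r = random.randrange(1, N * N) # commitment randomness
u = pow(g, r, N)               # commitment: u = g^r mod N
e = challenge(g, N, u, h)      # non-interactive challenge
z = r + x * e                  # final proof value, computed over the integers

# The verifier recomputes e and checks u = g^z * h^e mod N.
assert pow(g, z, N) * pow(h, e, N) % N == u
```

The check works because g^z * h^e = g^(r + x*e) * g^(-x*e) = g^r = u (mod N).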

The Frozen Heart Vulnerability in Girault’s Proof of Knowledge

But what happens if we don’t compute the Fiat-Shamir challenge correctly? Let’s look at an example. In ZenGo’s implementation of Girault’s proof of knowledge (before it was patched), e is computed by hashing g, N, and u, but not h. In part 1 of this series, we introduced a rule of thumb for securely implementing the Fiat-Shamir transformation: the Fiat-Shamir hash computation must include all public values from the zero-knowledge proof statement and all public values computed in the proof (i.e., all random “commitment” values). Therefore, the failure to include h in this computation introduces a Frozen Heart vulnerability, allowing malicious provers to forge proofs for random h values even if they do not know the discrete log of h.

Let’s walk through how to exploit this Frozen Heart vulnerability. We first choose random values for both u and z. Since h is not included in the Fiat-Shamir computation, we can already compute e = Hash(g, N, u). Next, we need to find an h value that will pass the verification check: u = g^z * h^e mod N. We know all of these values except h: we generated u and z, we computed e, and g and N are both public. Therefore, we can solve for h:

u = g^z * h^e mod N
h^e = u * g^(-z) mod N
e_inv = e^(-1) mod phi(N)
h = (u * g^(-z))^e_inv mod N

If we plug in this h value, we know the second verification check will pass because we chose h specifically to pass it. The only other check performed is the one on the challenge value, e, and it will pass as well because we computed e identically. Keep in mind that we simply picked a random u. This means that we don’t actually know the discrete log of u (i.e., it’s infeasible to find a value t such that g^t = u mod N). Since we don’t know the discrete log of u, we don’t know the discrete log of h either. Yet we have convinced the verifier that the proof is valid without knowing this discrete log, which means we have successfully forged a proof.

Note: in order to compute e_inv, the malicious prover needs to be able to compute phi(N), which would require knowing the prime factors of N.
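As a minimal Python sketch of the forgery with toy parameters (the modulus, generator, and challenge encoding are illustrative assumptions; per the note above, the demo knows p and q so that it can compute phi(N)):

```python
import hashlib
import math
import random

# Toy parameters (illustrative only); the attacker here knows the factors,
# which is what lets us compute phi(N) and hence e_inv.
p, q = 1019, 1367
N = p * q
phi = (p - 1) * (q - 1)
g = 2

def flawed_challenge(g, N, u):
    # The bug: h is NOT included in the hash.
    data = f"{g},{N},{u}".encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % (1 << 16)

random.seed(0)
# Pick random u and z, derive e, retrying until e is invertible mod phi(N).
while True:
    u = random.randrange(2, N)
    z = random.randrange(2, N)
    e = flawed_challenge(g, N, u)
    if math.gcd(u, N) == 1 and math.gcd(e, phi) == 1:
        break

e_inv = pow(e, -1, phi)                    # e^-1 mod phi(N)
h = pow(u * pow(g, -z, N) % N, e_inv, N)   # h = (u * g^-z)^e_inv mod N

# Both verifier checks pass, yet we never knew the discrete log of h.
assert pow(g, z, N) * pow(h, e, N) % N == u
assert flawed_challenge(g, N, u) == e
```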

Frozen Heart’s Impact on Girault’s Proof of Knowledge

Frozen Heart vulnerabilities are critical in Girault’s proof of knowledge because they allow attackers to forge proofs. However, the impact of this vulnerability on an application using this proof system depends entirely on how the proof system is used. If Girault’s proof of knowledge is used simply for standalone public keys (i.e., keys that are not used as part of some larger protocol), then a Frozen Heart vulnerability may not be that severe.

The reason for this is that, for Girault’s scheme, a Frozen Heart vulnerability makes it possible to forge random h values. But that’s not any more powerful than generating a random x and computing h = g^(-x), which results in a random h that we can construct a proof for. However, if this proof system is used within a larger, more complex protocol—such as a threshold signature scheme, which requires that the proof be unforgeable—then a Frozen Heart vulnerability is likely very severe.

Even though Frozen Heart vulnerabilities might not be critical for Girault’s scheme (and Schnorr’s scheme, for the same reasons) in certain contexts, this is not true for more complex proof systems. To see this in more detail, check out the upcoming part 3 post of this series, in which we explore the Frozen Heart vulnerability on the Bulletproofs proof system.

Coordinated disclosure of vulnerabilities affecting Girault, Bulletproofs, and PlonK

By Jim Miller

Trail of Bits is publicly disclosing critical vulnerabilities that break the soundness of multiple implementations of zero-knowledge proof systems, including PlonK and Bulletproofs. These vulnerabilities are caused by insecure implementations of the Fiat-Shamir transformation that allow malicious users to forge proofs for random statements.

We’ve dubbed this class of vulnerabilities Frozen Heart. The word frozen is an acronym for FoRging Of ZEro kNowledge proofs, and the Fiat-Shamir transformation is at the heart of most proof systems: it’s vital for their practical use, and it’s generally located centrally in protocols. We hope that a catchy moniker will help raise awareness of these issues in the cryptography and wider technology communities.

This is a coordinated disclosure: we have notified those parties who we know were affected, and they remediated the issues prior to our publication. The following repositories were affected:

The vulnerabilities in one of these proof systems, Bulletproofs, stem from a mistake in the original academic paper, in which the authors recommend an insecure Fiat-Shamir generation. In addition to disclosing these issues to the above repositories, we’ve also reached out to the authors of Bulletproofs who have now fixed the mistake.

Vulnerabilities stemming from Fiat-Shamir implementation issues are not new (e.g., see here and here), and they certainly were not discovered by us. Unfortunately, in our experience, they are incredibly pervasive throughout the zero-knowledge ecosystem. In fact, we have reported several similar vulnerabilities to some of our previous clients. What’s surprising is that despite the pervasiveness of these issues, the cryptography and security communities at large do not appear to be aware of them.

In this blog post, I will first describe the Frozen Heart vulnerability at a high level. I will then discuss the reasons Frozen Heart vulnerabilities occur so commonly in practice (spoiler alert: bad documentation and guidance), steps that the community can take to prevent them, and Trail of Bits’ role in leading that charge. Lastly, I will provide the details of our coordinated disclosure.

This is part 1 of a series of posts on the Frozen Heart vulnerability. In parts 2, 3, and 4, I’ll describe how these vulnerabilities actually work in practice by covering their impact on Girault’s proof of knowledge, Bulletproofs, and PlonK, respectively. Make sure to watch out for these posts in the coming days!

Zero-Knowledge Proofs and the Fiat-Shamir Transformation

This post assumes that you possess some familiarity with zero-knowledge proofs. If you would like to read more about them, there are several helpful blog posts and videos available to you, such as Matt Green’s primer. To learn more about the Fiat-Shamir transformation, check out the blog post I wrote explaining it in more detail. You can also check out ZKDocs for more information about both topics.

The Frozen Heart Vulnerability

The Fiat-Shamir transformation is applied to proof systems with the following structure:

  1. The prover generates a random value: the commitment.
  2. The verifier responds with a random value: the challenge.
  3. The prover then uses the commitment, the challenge, and her secret data to generate the zero-knowledge proof.

Each proof system is accompanied by a security proof, which guarantees that it’s infeasible for an attacker to forge proofs as long as certain assumptions are met. For proof systems of this structure, one very important assumption is that the challenge value generated by the verifier is completely unpredictable and uncontrollable by the prover. At a high level, the reason is that the zero-knowledge proof generated by the prover is considered valid only if it satisfies a very specific mathematical relationship. The prover may satisfy this relationship in one of two ways:

  1. He actually has the necessary secret data to satisfy the relationship, and he generates a zero-knowledge proof the honest way.
  2. He does not have the necessary secret data, but he guesses random values and gets lucky.

For a secure proof system, it’s effectively impossible for malicious provers to achieve option 2 (i.e., they have only a one in 2^128 probability of guessing right) as long as the random challenge is completely unpredictable. But if a malicious prover can predict this value in a certain way, it’s actually easy for him to commit a proof forgery by finding random values that satisfy the necessary mathematical relationship. This is exactly how the Frozen Heart vulnerability works.

This vulnerability is generic; it can be applied to any proof system that insecurely implements the Fiat-Shamir transformation. However, the exact details of the vulnerability depend on the proof system in question, and the impact of the vulnerability depends on how the surrounding application uses the proof system. For examples on how the vulnerability impacts different systems, be sure to read my upcoming posts describing Frozen Heart vulnerabilities in Girault’s proof of knowledge, Bulletproofs, and PlonK. You can also check out my previous Fiat-Shamir blog post, in which I show how this works for the Schnorr proof system.

Preventing Frozen Heart Vulnerabilities

The Frozen Heart vulnerability can affect any proof system using the Fiat-Shamir transformation. To protect against these vulnerabilities, you need to follow this rule of thumb for computing Fiat-Shamir transformations: the Fiat-Shamir hash computation must include all public values from the zero-knowledge proof statement and all public values computed in the proof (i.e., all random “commitment” values).

Here, the zero-knowledge proof statement is a formal description of what the proof system is proving. As an example, let’s look at the Schnorr proof system. The (informal) proof statement for the Schnorr proof system is the following: the prover proves that she knows a secret value, x, such that h = g^x mod q, where q is a prime number and g is a generator of a finite group. Here, h, g, and q are all public values, and x is a private value. Therefore, we need to include h, g, and q in our Fiat-Shamir calculation.

But we’re not done. In step 1 of the protocol, the prover generates a random value, r, and computes u = g^r (here, u is “the commitment”). This u value is then sent to the verifier as part of the proof, so it is considered public as well. Therefore, u needs to be included in the Fiat-Shamir calculation. You can see that all of these values are included in the hash computation in ZKDocs.
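A minimal Schnorr sketch in Python that follows this rule of thumb might look like the following; the modulus, generator, and byte encoding are illustrative assumptions rather than any particular library’s conventions:

```python
import hashlib
import random

Q = 2**127 - 1   # hypothetical prime modulus (illustrative, not vetted for use)
G = 3            # stand-in generator

def challenge(h, g, q, u):
    # Rule of thumb: include all public statement values (h, g, q)
    # and all public commitment values (u) in the hash.
    data = b"".join(v.to_bytes(32, "big") for v in (h, g, q, u))
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % q

def prove(x, g=G, q=Q):
    h = pow(g, x, q)                  # public value from the proof statement
    r = random.randrange(1, q - 1)
    u = pow(g, r, q)                  # the commitment
    e = challenge(h, g, q, u)
    z = (r + x * e) % (q - 1)         # exponents live modulo the group order
    return h, u, z

def verify(h, u, z, g=G, q=Q):
    e = challenge(h, g, q, u)         # verifier recomputes the same challenge
    return pow(g, z, q) == u * pow(h, e, q) % q

h, u, z = prove(x=123456789)
assert verify(h, u, z)
```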

A Frozen Heart vulnerability is introduced if some of these values (specifically, h or u) are missing. If these values are missing, a malicious prover can forge proofs for random h values for which she does not know the discrete log. Again, to see how this works, read my previous Fiat-Shamir blog post.

The details are crucial here. Even for the Schnorr proof system, which is arguably the simplest zero-knowledge proof protocol used in practice, it is easy to make mistakes. Just imagine how easy it is to introduce mistakes in complex proof systems that use multiple Fiat-Shamir transformations, such as Bulletproofs and PlonK.

The Problem

Why is this type of vulnerability so widespread? It really comes down to a combination of ambiguous descriptions in academic papers and a general lack of guidance around these protocols.

Let’s look at PlonK as an example. If you inspect the details of the protocol, you will see that the authors do not explicitly explain how to compute the Fiat-Shamir transformation (i.e., hash these values). Instead, the authors instruct users to compute the value by hashing the “transcript.” They do describe what they mean by “transcript” in another location, but never explicitly. A few implementations of the proof system got this computation wrong simply because of the ambiguity around the term “transcript.”

Believe it or not, PlonK’s description is actually better than the descriptions in most papers. Sometimes the authors of a paper will actually specify an insecure implementation, as was the case for Bulletproofs (which we’ll see in an upcoming follow-up post). Furthermore, most academic papers present only the interactive version of the protocol. Then, after they’ve described the protocol, they mention in passing that it can be made non-interactive using the Fiat-Shamir transformation. They point you to the original Fiat-Shamir publication from the 1980s, but they rarely say how this technique should actually be used!

Can you imagine if all cryptographic primitives had such documentation issues? We have an entire RFC dedicated to generating nonces deterministically for ECDSA, but it is still implemented incorrectly. Imagine if there were no standards or guidance for ECDSA, and developers could implement the algorithm only by reading its original paper. Imagine if this paper didn’t explicitly explain how to generate these nonces and instead pointed the reader to a technique from a paper written in the 1980s. That’s essentially the current state of most of these zero-knowledge proof systems.

To be clear, I’m not trying to condemn the authors of these papers. These protocols are incredibly impressive and already take a lot of work to build, and it’s not the job of these authors to write comprehensive implementation details for their protocols. But the problem is that there really is no strong guidance for these protocols, and so developers have to rely largely on academic papers. So, my point is not that we should blame the authors, but rather that we shouldn’t be surprised at the consequences.

The Solutions

The best way to address these issues is to produce better implementation guidance. Academic papers are not designed to be comprehensive guides for implementation. A developer, particularly a non-cryptographer, using only guidance from these papers to implement a complex protocol is likely to make an error and introduce these critical vulnerabilities. This is the exact reason we created ZKDocs. With ZKDocs, we aim to provide clearer implementation guidance, focusing particularly on areas of protocols that are commonly messed up, such as Fiat-Shamir transformations. If you’re creating your own zero-knowledge proof implementation, check out our Fiat-Shamir section of ZKDocs!

It’s also worth mentioning that these issues could be more or less eradicated if test vectors for these protocols were widely available. Implementations with incorrect Fiat-Shamir transformations would fail tests using test vectors from correct implementations. However, given the limited guidance for most of these proof systems, producing test vectors for all of them seems unlikely.

Lastly, investigating these Frozen Heart vulnerabilities was a good reminder of the value of code audits. My team and I reviewed a lot of public repositories, and we found a good number of implementations that performed these transformations correctly. Most of these implementations were built by groups of people who performed internal and, typically, external code audits.

Coordinated Disclosure

Prior to our disclosure, my teammates and I spent the last few months researching and reviewing implementations for as many proof systems as possible. Once our research was complete, we disclosed the vulnerabilities to ZenGo, ING Bank, SECBIT Labs, Adjoint Inc., Dusk Network, Iden3, ConsenSys, and all of their active forks on March 18, 2022. We also contacted the authors of the Bulletproofs paper on this date.

As of April 12, 2022, ZenGo, Dusk Network, Iden3, and ConsenSys have patched their implementations with the required fixes. ING Bank has deleted its vulnerable repository. The authors of the Bulletproofs paper have updated their section on the Fiat-Shamir transformation. We were not able to get in contact with SECBIT Labs or Adjoint Inc.

We would like to thank ZenGo, ING Bank, Dusk Network, Iden3, ConsenSys, and the Bulletproofs authors for working swiftly with us to address these issues.


I would like to thank each of my teammates for assisting me in reviewing public implementations and David Bernhard, Olivier Pereira, and Bogdan Warinschi for their work on this vulnerability. Lastly, a huge thanks goes to Dr. Sarang Noether for helping me understand these issues more deeply.

Towards Practical Security Optimizations for Binaries

By Michael D. Brown, Senior Security Engineer

To be thus is nothing, but to be safely thus. (Macbeth: 3.1)

It’s not enough that compilers generate efficient code; they must also generate safe code. Despite the extensive testing and correctness certification that goes into developing compilers and their optimization passes, they may inadvertently introduce information leaks into programs or eliminate security-critical operations that the programmer wrote in source code. Let’s take a look at an example.

Figure 1 shows an example of CWE-733, a weakness in which a compiler optimization removes or modifies security-critical code. In this case, the source code written by the programmer sanitizes a variable that previously held a cryptographic key by setting it to zero. This is an important step! If the programmer doesn’t sanitize the variable, the key may be recoverable by an attacker later. However, when this code is compiled, the sanitization operation is likely to be removed by a compiler optimization pass called dead store elimination.

This pass optimizes programs by eliminating what it thinks are unnecessary variable assignment operations. It assumes that values assigned to a variable are unnecessary if they are not used later in the program, which unfortunately includes our sanitization code.

Figure 1: Compiler Optimization Removal or Modification of Security-critical Code (CWE-733)

This example is one of several well-documented instances of a compiler optimization inadvertently introducing a security weakness into a program. Recently, my colleagues at Georgia Tech and I published an extensive study of how compiler design choices impact another security property of binaries: malicious code reusability. We discovered that compiler code generation and optimization behaviors generally do not consider malicious reusability. As a result, they produce binaries that are generally more reusable by an attacker than is necessary.

Building powerful code reuse exploits

Attackers use code reuse exploit techniques such as return- and jump-oriented programming (ROP and JOP) to evade malicious code injection defenses. Instead of injecting malicious code, attackers using these techniques reuse snippets of a vulnerable program’s executable code, called gadgets, to write their exploit payloads.

Gadgets consist of one or more binary instructions that perform a useful computational task, followed by an indirect branch instruction (return, indirect jump, or indirect call) that terminates the gadget. The final control-flow instruction is used to chain one or more gadgets together. Gadgets can be thought of as the individual instructions used to write an exploit program.

Since the gadgets are part of the vulnerable program, defenses that prevent injected code from being executed won’t stop the exploit. The downside from the attacker’s perspective is that they are potentially limited in their exploit programming by the gadgets that are available. Ultimately, the gadgets available to the attacker (and how useful they are) are a function of the compiler’s code generation and optimization behaviors, because it is the compiler that produces the program’s binary code.

A simple example of a ROP payload is depicted in Figure 2. The payload consists of a chain of gadget addresses (i.e. the exploit) with some necessary data interspersed (i.e. the exploit’s inputs). The attacker first exploits a memory corruption vulnerability like a stack-based buffer overflow (CWE-121) to place the payload on the stack in such a way that the return address on the stack is overwritten with the address of the first gadget in the chain. This redirects program execution to the first gadget in the chain.

Figure 2: Example ROP Gadget Chain

Summary of study findings

In our study, we analyzed over 1,000 variants of 20 different programs compiled with GCC and clang to determine how optimization behaviors affect the set of gadgets available in the output binaries. We used a static analysis tool, GSA, to measure changes in gadget set size, utility, and composability before and after applying optimization options. Starting at a high level, we first discovered that optimizations increased gadget set size, utility, and/or composability in approximately 85% of cases.

Diving deeper, we performed differential binary analysis on several program variants to identify the root causes of these effects. We identified several compiler behaviors that both directly and indirectly contribute to this problem. Two behaviors stood out as the most impactful: duplicating indirect branch instructions and code layout changes.

Duplicating indirect branch instructions

Chaining gadgets together to create exploit programs relies on the control-flow instructions at the end of each gadget. Since each gadget must end with one of these instructions, the more indirect branch instructions a program has, the more likely it is to have a large number of unique and useful gadgets. Many compiler optimizations improve performance by selectively duplicating these instructions, resulting in increases to gadget set size and utility.

This behavior is most apparent with GCC’s omit frame pointer optimization, which eliminates frame pointer setup and restore instructions at the beginning and end of functions that do not need them. In many cases, such as the one shown in Figure 3, eliminating the pointer restore instruction at the end of a function creates an opportunity to optimize further by duplicating the indirect control-flow instruction (retn) at the end of the function. While this secondary optimization slightly reduces code size and execution time, it creates one or more copies of the retn instruction. In turn, this introduces several more gadgets into the program, including ones that may be useful to an attacker.

Figure 3: Duplication of retn instruction by GCC Omit Frame Pointer optimization

Binary layout changes

In general, optimization behaviors insert, remove, or alter instructions in such a way that changes the size of code blocks and functions. This causes changes in how blocks and functions are eventually laid out in binary format, which in turn requires changes to displacements used in control-flow instructions throughout the program.

In some cases, the new displacements contain the binary encoding of an indirect branch instruction, like the example shown in Figure 4. Here, a conditional jump instruction with a short 1-byte displacement in unoptimized code changes to an equivalent conditional jump instruction with a near 4-byte displacement as a byproduct of a layout change caused by optimization. This new displacement encodes the retn (i.e., 0xC3) indirect branch instruction. Even though the displacement is not intended to be an instruction, it can be decoded as one during an exploit because x86_64 uses unaligned, variable-length instructions. The bytes preceding the indirect branch encoding can also be decoded as a gadget if they happen to encode valid instructions (which is quite likely given the density of the x86_64 ISA). These gadgets are called “unintended” or “unaligned” gadgets.

Figure 4: Binary layout change resulting in introduced gadget terminating instruction encoding
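To make the unaligned-decoding point concrete, here is a toy Python scan over a hypothetical x86_64 byte sequence; the bytes are invented for illustration (a jz whose 4-byte displacement is 0xC3), and a real gadget finder such as GSA would do full unaligned disassembly rather than a byte scan:

```python
# Hypothetical x86_64 code blob: a mov followed by a jz whose 4-byte
# displacement (0x000000C3, little-endian) happens to contain 0xC3 ("ret").
code = bytes([
    0x48, 0x89, 0xD8,                    # mov rax, rbx
    0x0F, 0x84, 0xC3, 0x00, 0x00, 0x00,  # jz rel32 (displacement = 0xC3)
])

# Any 0xC3 byte, aligned or not, is a potential gadget-terminating "ret":
gadget_roots = [i for i, b in enumerate(code) if b == 0xC3]
print(gadget_roots)  # the 0xC3 at offset 5 sits inside the jz encoding
```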

What can we do to address this?

The various behaviors we discovered all have a common property: they are secondary to or independent of the desired optimization. This means we can undo gadget-introducing behaviors without completely sacrificing performance.

Ideally, compilers would patch their optimizations to remove these behaviors. Unfortunately, instruction duplication behaviors are common to many different optimization passes, and gadget encodings introduced through displacement changes can’t be detected during optimization because binary layout happens much later.

Fortunately, binary recompilers such as Egalito are well-suited to address this problem. Egalito allows us to transform program binaries in a layout-agnostic manner regardless of the compiler used to generate the binary. This provides many advantages to the problem at hand. First, we can implement recompiler passes to undo negative behaviors once for Egalito instead of for each problematic optimization in each compiler. Additionally, we can undo negative behaviors in programs without access to source code or special compilers!

Practical binary security optimizations

We built an initial set of five binary optimization passes for Egalito that eliminate gadgets carelessly introduced into binaries by compilers:

  1. Return Merging: Merge all return instructions in a function to a single instance.
  2. Indirect Jump Merging: Merge all indirect jump instructions targeting the same register in a function to a single instance.
  3. Instruction Barrier Widening: Eliminate unintended special-purpose gadgets that span consecutive intended instructions.
  4. Offset/Displacement Sledding: Eliminate gadgets rooted in jump displacements.
  5. Function Reordering: Eliminate gadgets rooted in call offsets.

Next, we evaluated the impact of these optimization passes on gadget sets and performance by applying them to several of our study binaries. We found that our passes:

  1. Eliminated 31.8% of useful gadgets on average
  2. Reduced the overall utility of gadget sets in 78% of variants
  3. Eliminated one or more special purpose gadget types (e.g., syscall gadgets) in 75% of variants
  4. Had no effect on execution speed
  5. Increased code size by only 6.1 kB on average


Compiler code generation and optimization behaviors have a massive impact on a binary’s gadget set. But due to a lack of attention to latent security properties, these behaviors generally create binaries with gadget sets that are easier for attackers to reuse in exploits. There are many root causes for this; however, it is possible to mitigate and undo negative behaviors with simple code transformations that do not sacrifice performance.

While our initial research on this problem has yielded some promising results, it is not exhaustive in dealing with problematic compiler behaviors. Over the coming year, I will be working on additional transformations to address other issues such as problematic register allocations. Additionally, I will be examining how these optimizations may have secondary benefits, such as reducing the performance costs of employing other code reuse defenses such as Control-Flow Integrity (CFI).


This research was conducted with my co-authors at Georgia Tech and Georgia Tech Research Institute: Matthew Pruett, Robert Bigelow, Girish Mururu, and Santosh Pande.

Optimizing a smart contract fuzzer

By Sam Alws

During my winternship, I applied code analysis tools, such as GHC’s Haskell profiler, to improve the efficiency of the Echidna smart contract fuzzer. As a result, Echidna is now over six times faster!

Echidna overview

To use Echidna, users provide smart contracts and a list of conditions that should be satisfied no matter what happens (e.g., “a user can never have a negative number of coins”). Then, Echidna generates a large number of random transaction sequences, calls the contracts with these transaction sequences, and checks that the conditions are still satisfied after the contracts execute.

Echidna uses coverage-guided fuzzing; this means it not only uses randomization to generate transaction sequences, but it also considers how much of the contract code was reached by previous random sequences. Coverage allows bugs to be found more quickly since it favors sequences that go deeper into the program and that touch more of its code; however, many users have noted that Echidna runs far slower when coverage is on (over six times slower on my computer). My task for the internship was to track down the sources of slow execution time and to speed up Echidna’s runtime.

Optimizing Haskell programs

Optimizing Haskell programs is very different from optimizing imperative programs since the order of execution is often very different from the order in which the code is written. One issue that often occurs in Haskell is very high memory usage due to lazy evaluation: computations represented as “thunks” are stored to be evaluated later, so the heap keeps expanding until it runs out of space. Another simpler issue is that a slow function might be called repeatedly when it needs to be called only once and have its result saved for later (this is a general problem in programming, not specific to Haskell). I had to deal with both of these issues when debugging Echidna.

Haskell profiler

One feature of Haskell that I made extensive use of was profiling. Profiling lets the programmer see which functions are taking up the most memory and CPU time and look at a flame graph showing which functions call which other functions. Using the profiler is as simple as adding a flag at compile time (-prof) and another pair of flags at runtime (+RTS -p). Then, given the plaintext profile file that is generated (which is very useful in its own right), a flame graph can be made using this tool; here’s an example:

This flame graph shows how much computing time each function took up. Each bar represents a function, and its length represents how much time it took up. A bar stacked on another bar represents one function calling another. (The colors of the bars are picked at random, for aesthetic reasons and readability.)

The profiles generated from running Echidna on sample inputs showed mostly the usual expected functions: functions that run the smart contracts, functions that generate inputs, and so on. One that caught my eye, though, is a function called getBytecodeMetadata, which scans through contract bytecodes and looks for the section containing the contract’s metadata (name, source file, license, etc.). This function needed to be called only a few times at the start of the fuzzer, but it was taking up a large portion of the CPU and memory usage.

A memoization fix

Searching through the codebase, I found a problem slowing down the runtime: the getBytecodeMetadata function is called repeatedly on the same small set of contracts in every execution cycle. By storing the return value from getBytecodeMetadata and then looking it up later instead of recalculating, we could significantly improve the runtime of the codebase. This technique is called memoization.
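The idea is language-agnostic; as a sketch in Python rather than Echidna’s actual Haskell, memoization might look like this (the function body is a hypothetical stand-in for the real metadata scan):

```python
from functools import lru_cache

# Cache results so repeated calls on the same bytecode do no re-scanning.
@lru_cache(maxsize=None)
def get_bytecode_metadata(bytecode: bytes) -> bytes:
    # Hypothetical stand-in for the real scan over the contract bytecode.
    return bytecode[-4:]

blob = b"\x60\x80\x60\x40...metadata"
get_bytecode_metadata(blob)                          # first call: computed
get_bytecode_metadata(blob)                          # second call: cached
assert get_bytecode_metadata.cache_info().hits == 1  # one cache hit
```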

Adding in the change and testing it on some example contracts, I found that the runtime went down to under 30% of its original time.

A state fix

Another issue I found was with Ethereum transactions that run for a long time (e.g., a for loop with a counter going up to one million). Echidna could not process these transactions because it ran out of memory. The cause of this problem was Haskell’s lazy evaluation filling the heap with unevaluated thunks.

Luckily, the fix for this problem had already been found by someone else and suggested on GitHub. The fix had to do with Haskell’s State data type, which is used to make it more convenient (and less verbose) to write functions that pass around state variables. The fix essentially consisted of avoiding the use of the State data type in a certain function and passing state variables around manually instead. The fix hadn’t been added to the codebase because it produced different results than the current code, even though it was supposed to be a simple performance fix that didn’t affect the behavior. After dealing with this problem and cleaning up the code, I found that it not only fixed the memory issue, but it also improved Echidna’s speed. Testing the fix on example contracts, I found that the runtime typically went down to 50% of its original time.

For an explanation of why this fix worked, let’s look at a simpler example. Say we have the following code that uses the State data type to apply a simple change to the state for every number from 50 million down to 1:

import Control.Monad.State.Strict

-- if the state is even, divide it by 2 and add num, otherwise just add num
stateChange :: Int -> Int -> Int
stateChange num state
  | even state = (state `div` 2) + num
  | otherwise  = state + num

stateExample :: Int -> State Int Int
stateExample 0 = get
stateExample n = modify (stateChange n) >> stateExample (n - 1)

main :: IO ()
main = print (execState (stateExample 50000000) 0)

This program runs fine, but it uses up a lot of memory. Let’s write the same piece of code without the State data type:

stateChange :: Int -> Int -> Int
stateChange num state
  | even state = (state `div` 2) + num
  | otherwise  = state + num

stateExample' :: Int -> Int -> Int
stateExample' state 0 = state
stateExample' state n = stateExample' (stateChange n state) (n - 1)

main :: IO ()
main = print (stateExample' 0 50000000)

This code uses far less memory than the original (46 KB versus 3 GB on my computer), and the difference comes down to Haskell compiler optimizations. I compiled both versions with the -O2 flag (ghc -O2 file.hs; ./file), and used ghc -O2 -prof file.hs; ./file +RTS -s to gather memory allocation statistics.

Unoptimized, the call chain for the second example grows like this:

stateExample' 0 50000000
= stateExample' (stateChange 50000000 0) 49999999
= stateExample' (stateChange 49999999 $ stateChange 50000000 0) 49999998
= stateExample' (stateChange 49999998 $ stateChange 49999999 $ stateChange 50000000 0) 49999997
= ...

Note the ever-growing (... $ stateChange 49999999 $ stateChange 50000000 0) term, which expands to fill up more and more memory until it is finally forced to be evaluated once n reaches 0.

However, the Haskell compiler is clever. It realizes that the final state will eventually be needed anyway and makes that term strict, so it doesn’t end up taking up tons of memory. On the other hand, when it compiles the first example, which uses the State data type, there are too many layers of abstraction in the way, and it cannot see that it is safe to make the (... $ stateChange 50000000 0) term strict. By not using the State data type, we make our code simpler for the compiler to analyze and thus easier to optimize.
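A related way to make the strictness explicit, instead of relying on -O2’s strictness analysis, is GHC’s BangPatterns extension. This variant is my own illustration, not code from the Echidna fix:

```haskell
{-# LANGUAGE BangPatterns #-}

-- Same helper as in the examples above.
stateChange :: Int -> Int -> Int
stateChange num state
  | even state = (state `div` 2) + num
  | otherwise  = state + num

-- The bang pattern forces the accumulator at every call, so the
-- recursion runs in constant space even without optimizations.
stateExampleStrict :: Int -> Int -> Int
stateExampleStrict !state 0 = state
stateExampleStrict !state n = stateExampleStrict (stateChange n state) (n - 1)

main :: IO ()
main = print (stateExampleStrict 0 50000000)
```

Because the accumulator is evaluated before each recursive call, the chain of stateChange thunks never builds up in the first place.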

The same thing was happening in the Echidna memory issue that I helped to solve: minimizing the use of the State data type helped the Haskell compiler to realize that a term could be made strict, so it resulted in a massive decrease in memory usage and an increase in performance.

An alternate fix

Another way to fix the memory issue with our above example is to replace the line defining stateExample n with the following:

stateExample n = do
  s <- get
  put $! stateChange n s
  stateExample (n - 1)

Note the use of $! in the third line. It forces the new state to be evaluated strictly, eliminating the need for the optimizer to infer the strictness for us. While this also fixes the problem in our simple example, things get more complicated with Haskell’s Lens library, so we chose not to use put $! in Echidna; instead, we eliminated the use of State.


The performance improvements we introduced are already available in the Echidna 2.0.0 release. While we achieved great success this round, our work on Echidna’s fuzzing speed is not done. Haskell compiles to native code, and with enough profiling effort it can be very fast. We’ll continue to benchmark Echidna and keep an eye out for slow examples; if you believe you’ve encountered one, please open an issue.

I thoroughly enjoyed working on Echidna’s codebase for my winternship. I learned a lot about Haskell, Solidity, and Echidna, and I gained experience in dealing with performance issues and working with relatively large codebases. I’d specifically like to thank Artur Cygan for setting aside the time to give valuable feedback and advice.