The Tao of Continuous Integration

By Paul Kehrer

It is a truism in modern software development that a robust continuous integration (CI) system is necessary. But many projects suffer from CI that feels brittle, frustrates developers, and actively impedes development velocity. Why is this? What can you do to avoid the common CI pitfalls?

Continuous Integration Needs a Purpose

CI is supposed to provide additional assurance that a project’s code is correct. However, the tests a developer writes to verify the expected functionality are at their least useful when they are initially written. This is perhaps counterintuitive because a developer’s greatest familiarity comes when they initially write the code. They’ve thought about it from numerous angles and considered many of the possible edge cases and have implemented something that works pretty well!

Unfortunately, writing code is the easiest part of programming. The real challenge is building code that others can read so your project can thrive for many years. Software entropy increases over time. Developers—especially ones not familiar with large long-term codebases—can’t anticipate how their code may be integrated, refactored, and repurposed to accommodate needs beyond those that weren’t originally considered.

When these sorts of refactors and expansions occur, tests are the only way changes can be made confidently. So why do developers end up with systems that lack quality testing?

Trivial Testing

When writing tests, especially for high code-coverage metrics, the most common complaint is that some tests are trivial and exercise nothing interesting or error-prone in the codebase. These complaints are valid when thinking about the code as it exists today, but now consider that the software could be repurposed from its original intention. What once was trivial might now be subtle. Failing to test trivial cases may lead your work into a labyrinth of hidden traps rooted in unobservable behavior.
Remember these three things:

  1. No test is trivial in the long run.
  2. Tests are documentation of expected behavior.
  3. Untested code is subject to incidental behavioral change.

Unreliable CI

Unreliable CI is poison for developers. For internal projects, it saps productivity and makes people hate working on it. And for open-source projects, it drives away contributors faster than they can arrive.

Find what’s causing your tests to be unreliable and fix it. Unreliable CI commonly manifests as flaky tests, and tools exist to mark tests as flaky until you can find the root cause. This will allow immediate improvement in your CI without crippling the team.

Slow CI

You may find yourself with an excessively long CI cycle time. This is problematic because a quality development process requires that all CI jobs pass. If the cycle time is too long and complex so that it’s impractical to run it locally, then developers will create workarounds. These workarounds may take many forms, but it’s most common to see PR sizes balloon when no one wants to put in a 2-line PR, wait an hour for it to merge, and then rebase their 300-line PR. On top of it when they can just make a few unrelated changes in a single PR. This causes problems for code reviewers and lowers the quality of the project.

Developers aren’t wrong to do this, and CI has failed them. When building CI systems, it’s important to keep a latency budget in mind that goes something like, “CI should never be slower than time, t, where t is chosen a priori.” If CI becomes slower than that, then an effort is spent to improve it, even if it encroaches on the development of new features.

Coverage is difficult

Part of responsible testing is knowing which lines of code your tests are exercising—a nice, simple number that tells you everything. So why is coverage so commonly ignored?

First, the technical challenge. Modern software runs against many disparate targets. To be useful, CI systems should run against numerous targets that submit data to a hosted system that can combine coverage. (The frustration of tools like this failing and how to maintain development velocity despite all software being hot garbage is another discussion.) Absent this, service software developers often fail to notice missed coverage as it becomes lost in the noise of “expected” missed lines.

Now let’s talk about the social challenges. Software is typically written in a way that makes it difficult to test small pieces of functionality. This issue gave rise to the test-driven development (TDD) trend, where tests are written first to help developers factor their code in a testable manner. This is generally a net win in readability and testability but requires discipline and a different approach to development that doesn’t come naturally to many people.

The perceived drudgery in making more of a codebase testable causes complaints that coverage is an imperfect metric. After all, not all code branches are created equal, and depending on your language, some code paths should never be exercised. These are not good reasons to dismiss coverage as a valuable metric, but on specific occasions, there may exist a compelling reason to not spend the effort to cover something with tests. However, be aware that by failing to cover a certain piece of code with tests, its behavior is no longer part of the contract future developers will uphold during refactoring.

What do we do?

So how do we get to CI nirvana given all these obstacles? Incrementally. An existing project is a valuable asset, and we want to preserve what we have while increasing our ability to improve it in the future. (Rewrites are almost universally a bad idea.) This necessitates a graduated approach that, while specifically customized to a given project, has a broad recipe:

    1. Make CI reliable
    2. Speed up CI
    3. Improve test quality
    4. Improve coverage

We should all spend time investing in the longevity of our projects. This sort of foundational effort pays rapid dividends and ensures that your software projects can be world-class.

Serving up zero-knowledge proofs

By Jim Miller, Senior Cryptography Analyst

Zero-knowledge (ZK) proofs are gaining popularity, and exciting new applications for this technology are emerging, particularly in the blockchain space. So we’d like to shine a spotlight on an interesting source of implementation bugs that we’ve seen—the Fiat Shamir transformation.

A ZK proof can be either interactive, where the prover and verifier communicate via challenges in a multi-step process, or non-interactive, where a prover computes a proof once and sends it to the verifier. The non-interactive ZK proof is preferred over the multi-step interactive process, but most ZK schemes are interactive by default.

Enter the Fiat-Shamir transformation. It transforms interactive ZK proofs into non-interactive ones. Easier said than done. This can be a tricky implementation and has led to several bugs, including one discovered in a Swiss voting system.

Here, we will use a tennis analogy to walk you through what this transformation is, and then we’ll show some examples of how it goes wrong in practice.

ZK Crash Course

Zero-knowledge proofs allow you (the prover) to prove to another party (the verifier) that you know some secret value without having to reveal the value itself. This concept can be applied in a variety of ways. For example, you give the verifier some value y, and you can prove that you possess an input value x such that y = sha256(x), without having to reveal x.

Or, as we’ve seen in the blockchain space, you want to spend some money without revealing how much you are spending, and ZK proofs can prove to the verifier that you possess enough money to spend without revealing how much you have or spent.

Now let’s talk about what security looks like for ZK proofs. They have a security notion known as soundness, which has a formal and well-defined technical definition. Simply put, a ZK proof is sound if a malicious prover can’t create fake proofs that appear to be valid. In other words, the verifier can’t be tricked with a bad proof.

Tennis analogy

Let’s visualize ZK proofs and the Fiat-Shamir transformation with a tennis analogy. The prover and verifier are tennis players standing opposite sides of the net. The prover is trying to prove to the verifier that they are good at tennis. (“Good” is used here in a very basic sense and specific to this example.)

(1) The prover hits the ball to the verifier, and (2) the verifier sends a random return back to the prover—random speed, location, amount of spin, etc.—and (3) the prover must return the ball to a targeted spot on the court. If the prover hits the target, the verifier classifies them as a good tennis player. If they do not hit the mark, they are classified as a bad tennis player.

In ZK terms, being classified as “good at tennis” corresponds to the verifier accepting the ZK proof as valid. Being classified as “bad at tennis” corresponds to the verifier rejecting the proof as invalid.

Now assume that this tennis game is a sound ZK scheme. Recalling our definition earlier, this means that a bad tennis player cannot convince the verifier they are good. If the prover is actually bad, they are unable to hit the tennis ball into the target. However, and this is important, this game is sound only if the verifier sends the tennis ball back in a truly random way.

Imagine the prover realizes that the verifier sends the tennis ball to the same spot and same location every time. Maybe the verifier is bad at tennis but only practiced hitting the ball from that exact location every time.

Even though they cannot hit the target from any location, they can hit the target from that exact spot and trick the verifier. Therefore, bad randomness can break the soundness of this scheme as well as real ZK schemes.

Back to ZK

This tennis analogy is a common challenge/response protocol seen in various areas of cryptography. In the ZK space, these are sigma protocols having three steps:

  1. The prover first sends an initial message, the commitment message. (The prover first hits the ball to the verifier in our tennis analogy.)
  2. The verifier replies with a truly random message, the challenge message. (The verifier returns the tennis ball to a random spot with random speed and spin.)
  3. The prover sends a message that directly depends on the challenge message, the “response” message, and the verifier then accepts this as valid or invalid. (The prover moves to hit the return of the verifier and attempts to hit the target.)

It turns out that many ZK schemes take the form of one of these sigma protocols, which unfortunately makes them interactive. Luckily, the Fiat-Shamir transformation can transform these sigma protocols into non-interactive protocols.

First, I’ll explain the transformation using our tennis example. Sometimes players practice by hitting the tennis ball off of a wall to simulate another player’s returns. This is like the Fiat-Shamir transformation. Instead of relying on the verifier sending a return, we replace the verifier with a wall that returns the ball for us. And for this scheme to be sound, this return has to be completely random. Therefore (stretch your imagination here), we need a special wall that returns the ball with a completely random ball speed, placement, and spin. Then, the prover sends the ball back as before, and if they hit the target, they are a good player.

So how do we find such a wall for ZK proofs in practice? We use a hash function. Instead of sending their initial message to the verifier and waiting for a random response, the prover inputs their message into a hash function and uses the output as its challenge message. Using this challenge, the prover executes step 3 and sends the verifier’s response message as its proof. The verifier either accepts or rejects this in a non-interactive process.

Security of Fiat-Shamir implementations

Is this secure? Yes, mostly. First of all, we have to assume that our hash function returns completely random outputs. This is known as modeling the hash function as a random oracle and is somewhat of a controversial topic, but it has been used in various applications.

The other crucial and more subtle aspect is determining what input the hash function should receive. Using our tennis example, let’s say we’re using a wall to return the tennis ball, but for some reason, the wall’s internal random generation is only dependent on the speed and spin, but not on the location. This means that the prover will get the same challenge (i.e., the ball will be returned to the same spot on the court) from different locations, as long as the speed and spin are identical. These challenges are no longer truly random, allowing the prover to break the scheme’s soundness.

In the theoretical world, this is referred to as the weak and strong Fiat-Shamir transformations. Essentially, a small difference in inputs into the hash function can have very subtle but severe consequences to the protocol’s security. The theoretical details are a bit out of scope for this blog post, but if you are curious, you can read more about them here.

Going wrong in practice

Let’s look at a very simple sigma protocol example—the Schnorr protocol. In the Schnorr protocol, the prover proves to the verifier that they know the value X such that Y = gX, for some group generator g. For those of you familiar with cryptography, you’ll know this is the common private/public key structure for things like ECC, El Gamal, etc. Therefore, this allows the prover to prove that they hold the secret key, X, corresponding to their public key, Y. The protocol works as follows:

  • The prover generates a random value A, and sends B = gA to the verifier
  • The verifier replies with a random challenge value C
  • The prover sends Z = A + CX to the verifier as its proof
  • The verifier verifies the proof by checking that gZ = BYC

After doing some arithmetic, you can see that the verification check works, and this protocol is secure because the discrete log problem is assumed to be hard.

Since this is a sigma protocol, we can make this non-interactive with our new favorite transformation, Fiat-Shamir:

  • The prover generates a random value A, computes B = gA
  • The prover generates C randomly with a hash function
  • The prover sends Z = A + CX to the verifier as its proof
  • The verifier verifies the proof by checking that gZ = BYC

But how exactly should the prover generate this random value, C? In the interactive protocol, the prover sends B to the verifier, so it might make sense to compute C = Hash(B). This is actually what theorists define to be the weak Fiat-Shamir transformation. As you might suspect, this has some subtle consequences. In particular, by using this computation, it’s actually possible for an adversary to provide a valid proof for some public key even if they don’t know the secret key:

  • Let PK’ be some public key that we don’t know the secret key for
  • Set B = PK’, C = Hash(B)
  • Pick a random Z
  • Set PK = (gZ / PK’)1/C

After some more arithmetic, you can see that Z will verify as a valid proof for PK. But since we don’t know the secret key for PK’, we don’t know the secret key for PK either! This is a forged proof. Depending on the exact application, this could be very problematic because it allows you to create fake proofs for keys related to another party’s public key, PK’.

This problem is avoided by adjusting how you compute C. By setting C = Hash(B, PK) or C = Hash(B, PK, g) will avoid these issues entirely. These are defined to be the strong Fiat-Shamir transformation.

A very similar issue was discovered in a Swiss voting system. In this system, one of the design goals is to allow for verifiable election results. To do this, a ZK scheme produces a decryption proof, which proves that an encrypted vote decrypts to the correct result. However, when implementing this scheme, only part of the prover’s input was given to the hash function (i.e., the strong Fiat-Shamir transformation was not used), and the prover was able to create false proofs. In this case, this meant that valid votes could be altered in such a way to make them not be counted, which has obvious implications for an election using such a scheme.


In summary, the Fiat-Shamir transformation converts an interactive ZK proof into a non-interactive one by computing a hash function on the prover’s inputs. In practice, this can lead to very subtle but dangerous bugs. When in doubt, use the strong Fiat-Shamir transformation and make sure every piece of the prover’s input is placed inside of the hash function!

Confessions of a smart contract paper reviewer

If you’re thinking of writing a paper describing an exciting novel approach to smart contract analysis and want to know what reviewers will be looking for, you’ve come to the right place. Deadlines for many big conferences (ISSTA tool papers, ASE, FSE, etc.) are approaching, as is our own Workshop on Smart Contract Analysis, so we’d like to share a few pro tips. Even if you’re not writing a smart contract paper, this post can also help you identify research that’s worth reading and understand its impact.

I’ve been reviewing smart contract analysis papers for a few years now—over 25 papers in seven different venues in the last year—and I’ve taken away six requirements for a good paper. I’d also like to share a little about extra points that take a paper from “good enough” to “great.”

  1. Explain what the analysis method actually does! Some authors fail to describe how a proposed analysis algorithm works, perhaps due to page limits. It’s not essential to describe every painstaking detail of an algorithm or implementation, but a good paper does more than describe a method with high-level buzzwords. “We combine symbolic execution with swarm testing” is a great sentence for a paper’s abstract, but this level of granularity cannot be sustained throughout the paper. Reviewers see this as hand-wavy. Provide details for your audience to understand what you actually are proposing and evaluate it. For a tool paper—a short paper that basically advertises a particular tool exists—a generic explanation is sometimes fine. Still, such uninformative descriptions appear surprisingly often in full conference submissions, which are supposed to allow readers to understand and even duplicate interesting new approaches.
  2. Understand the basics of blockchain and smart contracts. Too many papers are rejected for making obvious mistakes about how a contract or the blockchain works. This kind of problem is often foreshadowed by an introduction that includes boilerplate text describing blockchains and smart contracts. Smart contract and blockchain analyses are, in some ways, pure instances of core code analysis and test generation problems. However, if you’re still in the early stages of researching this topic, a minimum amount of homework is required before you can produce credible results. We recommend going through the Ethernaut CTF exercises to understand some basics of contract exploitation and then reading our Building Secure Contracts tutorials, including experiments with real tools used in paid audits. Blockchains and smart contracts are fast-moving targets, and many early papers concentrated on handling ether. However, much of the modern contracts’ financial value is in ERC-20 or other recently developed token types. If you’re only looking at ether-related exploits, you’re not addressing much of what’s going on now.
  3. Base experimental results on meaningful sets of contracts. Out of all the contracts ever created on the Ethereum blockchain, only a small fraction accounts for almost all transactions and ether/token activity. Most Etherscan contracts have little practical use and are toys deployed by programmers learning Solidity. If your experiments are based on randomly selected contracts from Etherscan, your results will not reflect contracts of interest, and many are likely to be near-duplicates. A random sampling of contracts is a red flag for reviewers because the data set is noisy and may fail to include any contracts anyone actually cares about. Instead, base your experiments on active contracts that have participated in transactions, held ether or token value, or satisfying other criteria demonstrating that they’re meaningful. It also shows good judgment to include some diversity in the contract set and demonstrate that you aren’t, say, basing your results on 30 basic ERC-20 tokens with nearly identical implementations. Moreover, the fact that state-of-the-art Ethereum moves fast applies here. These days a lot of the action is not in single contracts but in multi-contract systems, where analysis based on those contracts’ composition is necessary to explore meaningful behavior. The same guidance goes for demonstrating a method for finding vulnerabilities. Finding meaningless vulnerabilities in contracts that hold no ether or token value and never participate in transactions isn’t compelling. On the other hand, there are real vulnerabilities in real contracts that participate in numerous transactions. Find those, and you have a good demonstration of your ideas! Google BigQuery is one way to get started on this since it can be difficult to extract from the blockchain.
  4. More generally, respect the rules of (fuzzing) research. Our post on performing good fuzzing research mostly applies to fuzzing smart contracts, too. Smart contract fuzzers may not be expected to run for 24 hours, but it’s certainly essential to run tools “long enough.” You need statistics-based evidence, not a string of anecdotes. If your contract fuzzer performed better, did it do so by a statistically significant margin? What’s the estimated effect size, and how confident can we be in that estimate’s quality? Other points of good fuzzing experimental practice are just common sense but easily overlooked: for example, if you don’t use a consistent version of the Solidity compiler in your experiments or fail to report what version you used, reproducing (and understanding) your results will be complicated. Two particular aspects of these general guidelines are essential for smart contract fuzzing papers, so we’ll separate those out.
  5. Compare against a meaningful tool baseline. You need to compare your work with widely accepted concepts and tools. Pit your tool against real competition, which may require more effort in smart contract work. People are always releasing new tools and updating old ones, and some older tools no longer work. By the time your paper is reviewed, the cutting edge may have moved. Still, it must be obvious that you went through the trouble of selecting a reasonable set of comparable tools and comparing your work against state of the art when you wrote the paper.
  6. Draw clear boundaries explaining what your tool does and does not detect. Smart contract fuzzers report a variety of bugs, but there are no Solidity “crashes.” So tools have to look for something, whether it be an integer overflow, reentrancy, locked funds, or runaway gas usage indicating a possible denial-of-service vulnerability. Ultimately, this means that tools may excel in some areas and lag behind in others. One fuzzer may use an aggressive set of oracles, which can lead to false positives, while another may report a particular set of bugs lowering its false-positive error. Comparing apples to apples can be hard in this context, but you must show that your method finds meaningful bugs. One way to do this in fuzzing is to compare code coverage results between your tool and others.

We hope this advice helps strengthen your approach to publishing your smart contract research. In summary, you can almost guarantee that if I review your paper, and I can’t figure out what your method even is, I’d reject your paper. If you clearly didn’t do any homework on smart contracts beyond reading the Wikipedia page on Ethereum, I’d reject your paper. If you based your experiments on 50 random contracts on the blockchain that have received 10 transactions after deployment, hold a total of $0.05 worth of Ether, and are mostly duplicates of each other, I’d reject your paper. If you don’t understand the basic rules of fuzzing research, if you only compare to one outdated academic research tool, and ignore five popular open-source tools, if you claim your approach is better simply because you have a tendency to produce more false positives based on a very generous notion of “bug”… well, you can guess!

The good news is, doing all of the things this post suggests is not just part of satisfying reviewers. It’s part of satisfying yourself and your future readers (and potential tool users), and it’s essential to building a better world for smart contract developers.

Finally, to take a paper from “good” to “great,” tell me something about how you came up with the core idea of the paper and what the larger implications of the idea working might be. That is, there’s some reason, other than sheer luck, why this approach is better at finding bugs in smart contracts. What does the method’s success tell us about the nature of smart contracts or the larger problem of generating tests that aren’t just a sequence of bytes or input values but a structured sequence of function calls? How can I improve my understanding of smart contracts or testing, in general, based on your work? I look forward to reading about your research. We’d love to see your smart contract analysis papers at this year’s edition of WoSCA to be co-located with ISSTA in July!

PDF is Broken: a justCTF Challenge

Trail of Bits sponsored the recent justCTF competition, and our engineers helped craft several of the challenges, including D0cker, Go-fs, Pinata, Oracles, and 25519. In this post we’re going to cover another of our challenges, titled PDF is broken, and so is this file. It demonstrates some of the PDF file format’s idiosyncrasies in a bit of an unusual steganographic puzzle. CTF challenges that amount to finding a steganographic needle in a haystack are rarely enlightening, let alone enjoyable. LiveOverflow recently had an excellent video on file format tricks and concludes with a similar sentiment. Therefore, we designed this challenge to teach justCTF participants some PDF tricks and how Trail of Bits’ open source tools can make easy work of these forensic challenges.

In which a PDF is a webserver, serving copies of itself

The PDF file in the challenge is in fact broken, but most PDF viewers will usually just render it as a blank page with no complaints. The file command reports the challenge as just being “data.” Opening the file in a hex editor, we see that it looks like a Ruby script:

require 'json'
require 'cgi'
require 'socket'
% `file` sometimes lies
% and `readelf -p .note` might be useful later

The PDF header on line 5 is embedded within a Ruby multi-line comment that begins on line 4, but that’s not the part that’s broken! Almost all PDF viewers will ignore everything before the %PDF-1.5 header. Lines 7 and 8 are PDF comments affirming what we saw from the file command, as well as a readelf hint that we’ll get to later.

The remainder of the Ruby script is embedded within a PDF object stream—the “9999 0 obj” line—, which can contain arbitrary data ignored by PDF. But what of the remainder of the PDF? How does that not affect the Ruby script?

9999 0 obj
<< /Length 1680 >>^Fstream
port = 8080
if ARGV.length > 0 then
  port = ARGV[0].to_i
end'UTF-8', 'binary', :invalid => :replace, :undef => :replace).split(/<\/html>/)[0]+"\n"'',port)
print "Server running at http://localhost:#{port}/\nTo listen on a different port, re-run with the desired port as a command-line argument.\n\n"

Ruby has a feature where the lexer will halt on the __END__ keyword and effectively ignore everything thereafter. Sure enough, this curious PDF has such a symbol, followed by the end of the encapsulating PDF object stream and the remainder of the PDF.

This is a Ruby/PDF polyglot, and you can turn any PDF into such a polyglot using a similar method. If your script is short enough, you don’t even need to embed it in a PDF stream object. You can just prepend all of it before the %PDF-1.5 header. Although some PDF parsers will complain if the header is not found within the first 1024 bytes of the file.

You didn’t think it would be that easy, did you?

So let’s be brave and try running the PDF as if it were a Ruby script. Sure enough, it runs a webserver that serves a webpage with a download link for “” Wow, that was easy, right? Inspect the Ruby script further and you’ll see that the download is the PDF file itself renamed as a .zip. Yes, in addition to being a Ruby script, this PDF is also a valid ZIP file. PoC||GTFO has used this trick for years, which can also be observed by running binwalk -e on the challenge PDF.

Unzipping the PDF produces two files: a MμPDF mutool binary and, the latter suggesting the player run the broken PDF through the mutool binary.

Clearly, this version of mutool was modified to render the broken PDF properly, despite whatever is “broken” about it. Is the CTF player supposed to reverse engineer the binary to figure out what was modified? If someone tried, or if they tried the readelf clue embedded as a PDF comment above, they might notice this:

The first thing you should do is: Open the PDF in a hex editor. You’ll probably need to “fix” the PDF so it can be parsed by a vanilla PDF reader. You could reverse this binary to figure out how to do that, but it’s probably easier to use it to render the PDF, follow the clues, and compare the raw PDF objects to those of a “regular” PDF. You might just be able to repair it with `bbe`!

The Binary Block Editor (bbe) is a sed-like utility for editing binary sequences. This implies that whatever is causing the PDF to render as a blank page can easily be fixed with a binary regex.

Deeper Down the Hole

When we use the modified version of mutool to render the PDF, it results in this ostensibly meaningless memetic montage:

The rendered PDF using the modified version of mutool

The “broken” challenge PDF rendered using the modified version of mutool.

Searching Google for the LMGTFY string will take you to Didier Stevens’ excellent article describing the PDF stream format in detail, including how PDF objects are numbered and versioned. One important factor is that two PDF objects can have the same number but different versions.

The first hint on the page identifies PDF object 1337, so that is probably important. The figures in Stevens’ article alone, juxtaposed to a hexdump of the broken PDF’s stream objects, provide a clear depiction of what was changed.

Annotated diagram of the parts of a PDF stream object

Didier Stevens’ annotated diagram of a PDF stream object.

5 0 obj
<< /Length 100 >>^Fstream

A PDF stream object in the challenge PDF.

As the hints suggest, the PDF specification only allows for six whitespace characters: \0, \t, \n, \f, \r, and space. The version of mutool in the ZIP was modified to also allow ACK (0x06) to be used as a seventh whitespace character! Sure enough, on the twelfth line of the file we see:


That “^F” is an ACK character, where the PDF specification says there should be whitespace! All of the PDF object streams are similarly broken. This can be fixed with:

bbe -e "s/\x06stream\n/\nstream\n/" -o challenge_fixed.pdf challenge.pdf

Solving the Puzzle

Is fixing the file strictly necessary to solve the challenge? No, the flag may be found in PDF object 0x1337 using a hex editor

4919 0 obj
<< /Length 100 /Filter /FlateDecode >>^Fstream
x<9c>^MËA^N@0^PFá}OñëÆÊÊ        <88>X;^Ba<9a>N<8c>N£#áöº~ßs<99>s^ONÅ6^Qd<95>/°<90>^[¤(öHû       }^L^V   k×E»d<85>fcM<8d>^[køôië<97><88>^N<98> ^G~}Õ\°L3^BßÅ^Z÷^CÛ<85>!Û
4919 1 obj
<< /Length 89827 /Filter [/FlateDecode /ASCIIHexDecode /DCTDecode] >>^Fstream

and manually decoding the stream contents. Binwalk will even automatically decode the first stream because it can decode the Flate compression. That contains:

pip3 install polyfile
Also check out the `--html` option!
But you’ll need to “fix” this PDF first!

Binwalk doesn’t automatically expand the second stream because it’s also encoded with the ASCIIHex and DCT PDF filters. A casual observer who had not followed all of the clues and wasn’t yet familiar with the PDF specification might not even realize that the second version of the PDF stream object 0x1337 even existed! And that’s the one with the flag. Sure, it’s possible to have combed through the dozens of files extracted by binwalk to manually decode the flag, or even directly from the stream content in a hex editor, with a quick implementation of PDF’s decoders. But why do that when Polyfile can do it for you?

polyfile challenge_fixed.pdf -html challenge_fixed.html

PolyFile’s HTML output for the challenge PDF.

Oh, hey, that’s a hierarchical representation of the PDF objects, with an interactive hex viewer! How about we go to object 0x1337’s stream?

We immediately see PDF object 0x1337.

PolyFile can automatically decode the objects.

And finally, let’s look at the second version of object 0x1337, containing the multi-encoded flag:

PolyFile automatically decodes the layered PDF filters to produce the flag.

The flag!


PDF is a very … flexible file format. Just because a PDF looks broken, it doesn’t mean it is. And just because a PDF is broken, it doesn’t mean PDF viewers will tell you it is. PDF is at its core a container format that lets you encode arbitrary binary blobs that don’t even have to contribute to the document’s rendering. And those blobs can be stacked with an arbitrary number of encodings, some of which are bespoke features of PDF. If this is interesting to you, check out our talk on The Treachery of Files, as well as our tools for taming them, such as Polyfile and PolyTracker.

Breaking Aave Upgradeability

On December 3rd, Aave deployed version 2 of their codebase. While we were not hired to look at the code, we briefly reviewed it the following day. We quickly discovered a vulnerability that affected versions 1 and 2 of the live contracts and reported the issue. Within an hour of sending our analysis to Aave, their team mitigated the vulnerability in the deployed contracts. If exploited, the issue would have broken Aave, and impacted funds in external DeFi contracts.

Five different security firms reviewed the Aave codebase, including some that used formal verification; however, this bug went unnoticed. This post describes the issue, how the bug escaped detection, and other lessons learned. We are also open-sourcing a new Slither detector that identifies this vulnerability to increase security in the larger Ethereum community.

The vulnerability

Aave uses the delegatecall proxy pattern that we have thoroughly discussed in past writeups on this blog. At a high-level, each component is split into two contracts: 1) A logic contract that holds the implementation and 2) a proxy that contains the data and uses delegatecall to interact with the logic contract. Users interact with the proxy while the code is executed on the logic contract. Here’s a simplified representation of the delegatecall proxy pattern:

In Aave, LendingPool (LendingPool.sol) is an upgradeable component that uses a delegatecall proxy.

The vulnerability we discovered relies on two features in these contracts:

  • Functions on the logic contract can be called directly, including initialization functions
  • The lending pool has its own delegatecall capabilities

Initializing upgradeable contracts

A limitation of this upgradeability pattern is that the proxy cannot rely on the logic contract’s constructor for initialization. Therefore, state variables and initial setup must be performed in public initialization functions that do not benefit from the constructor safeguards.

In LendingPool, the initialize function sets the provider address (_addressesProvider):

function initialize(ILendingPoolAddressesProvider provider) public initializer {
_addressesProvider = provider;


The initializer modifier prevents initialize from being called multiple times. It requires that the following condition is true:

      initializing || isConstructor() || revision > lastInitializedRevision,
      'Contract instance has already been initialized'



  1. initializing allows multiple calls to the modifier within the same transaction (hence multiple initialize functions)
  2. isConstructor() is what the proxy needs to execute the code
  3. revision > lastInitializedRevision allows calling the initialization functions again when the contract is upgraded

While this works as expected through the proxy, 3 also allows anyone to call initialize directly on the logic contract itself. Once the logic contract is deployed:

The bug: Anyone can set _addressesProvider on the LendingPool logic contract.

Arbitrary delegatecall

LendingPool.liquidationCall delegatecalls directly to the addresses returned by _addressProvider:

   address collateralManager = _addressesProvider.getLendingPoolCollateralManager();

    (bool success, bytes memory result) =


This lets anyone initiate the LendingPool logic contract, set a controlled addresses provider, and execute arbitrary code, including selfdestruct.

The exploit scenario: Anyone can destruct the lending pool logic contract. Here’s a simplified visual representation:

Lack-of-existence check

By itself, this issue is already severe since anyone can destruct the logic contract and prevent the proxy from executing the lending pool code (pour one out for Parity).

However, the severity of this issue is amplified by the use of OpenZeppelin for the proxy contract. Our 2018 blogpost highlighted that a delegatecall to a contract without code would return success without executing any code. Despite our initial warning, OpenZeppelin did not fix the fallback function in their proxy contract:

   function _delegate(address implementation) internal {
        assembly {
            // Copy We take full control of memory in this inline assembly
            // block because it will not return to Solidity code. We overwrite the
            // Solidity scratch pad at memory position 0.
            calldatacopy(0, 0, calldatasize)

            // Call the implementation.
            // out and outsize are 0 because we don't know the size yet.
            let result := delegatecall(gas, implementation, 0, calldatasize, 0, 0)


If the proxy delegatecalls to a destroyed lending pool logic contract, the proxy will return success, while no code was executed.

This exploit would not be persistent since Aave can update the proxy to point to another logic contract. But in the timeframe where the issue could be exploited, any third-party contracts calling the lending pool would act as if some code was executed when it was not. This would break the underlying logic of many external contracts.

Affected contracts

  • All ATokens (Aave tokens): AToken.redeem calls pool.redeemUnderlying (AToken.sol#L255-L260). As the call does nothing, the users would burn their ATokens without receiving back their underlying tokens.
  • WETHGateway (WETHGateway.sol#L103-L111): Deposits would be stored in the gateway, allowing anyone to steal the deposited assets.
  • Any codebase based on Aave’s Credit Delegation v2 (MyV2CreditDelegation.sol)

If the issue we discovered were exploited, many contracts outside of Aave would have been affected in various ways. Determining a complete list is difficult, and we did not attempt to do so. This incident highlights the underlying risks of DeFi composability. Here are a few that we found:

Fixes and recommendations

Luckily, no one abused this issue before we reported it. Aave called the initialize functions on both versions of the lending pool and thus secured the contracts:

Long term, contract owners should:

  • Add a constructor in all logic contracts to disable the initialize function
  • Check for the existence of a contract in the delegatecall proxy fallback function
  • Carefully review delegatecall pitfalls and use slither-check-upgradeability

Formally verified contracts are not bulletproof

Aave’s codebase is “formally verified.” A trend in the blockchain space is to think that safety properties are the holy grail of security. Users might try to rank the safety of various contracts based on the presence or absence of such properties. We believe this is dangerous and can lead to a false sense of security.

The Aave formal verification report lists properties on the LendingPool view functions (e.g., they don’t have side effects) and the pool operations (e.g., the operations return true when successful and do not revert). For example, one of the verified properties is:

Yet this property can be broken if the logic contract is destroyed. So how could this have been verified? While we don’t have access to the theorem prover nor the setup used, likely, the proofs don’t account for upgradeability, or the prover does not support complex contract interactions.

This is common for code verification. You can prove behaviors in a targeted component with assumptions about the overall behavior. But proving properties in a multi-contract setup is challenging and time-consuming, so a tradeoff must be made.

Formal techniques are great, but users must be aware that they cover small areas and might miss attack vectors. On the other hand, automated tools and human review can help the developers reach higher overall confidence in the codebase with fewer resources. Understanding the benefits and limits of each solution is crucial for developers and users. The current issue is a good example. Slither can find this issue in a few seconds, an expert with training may quickly point it out, but it would take substantial effort to detect with safety properties.


Aave reacted positively and quickly fixed the bug once they were aware of the issue. Crisis averted. But other victims of recent hacks were not as fortunate. Before deploying code and exposing it to an adversarial environment, we recommend that developers:

We hope to prevent similar mistakes by sharing this post and the associated Slither detector for this issue. However, security is a never-ending process, and developers should contact us for security reviews before launching their projects.

Reverie: An optimized zero-knowledge proof system

Zero-knowledge proofs, once a theoretical curiosity, have recently seen widespread deployment in blockchain systems such as Zcash and Monero. However, most blockchain applications of ZK proofs make proof size and performance tradeoffs that are a poor fit for other use-cases. In particular, these protocols often require an elaborate trusted setup phase and optimize for proof size at the expense of prover time. Many would-be applications of ZK proofs are so complex that the ZK-SNARK protocol used by Zcash would take days (if not months) to run. With this in mind, I believe it is important to provide engineers who want to tinker with ZK proofs an alternative set of tradeoffs.

During my internship at Trail of Bits, I implemented a ZK proof system Reverie that optimizes for prover efficiency and does not require any trusted setup. These optimizations come at the expense of proof size, but Reverie allows proofs to be streamed across a network and verified incrementally so as to avoid loading large proofs into memory all at once. Reverie is also available as reverie-zk on

The rest of this blog post will explain the underlying ZK proof system behind Reverie and provide performance benchmarks. Since the proof system uses techniques from secure multiparty computation (MPC), I’ll start by going over some MPC basics and then showing how to build ZK proofs using MPC.

MPC Primer

The players have secret inputs X, Y, Z, respectively, and compute f(X, Y, Z) without revealing X, Y or Z.

The goal of a multi-party computation protocol is to allow the distributed computation of a function on inputs from different parties, without the parties revealing their inputs to one other.

For instance:

  1. Tallying a vote without revealing the values of individual votes.
  2. Computing the number of shared customers without revealing your customer databases.
  3. Training/evaluating a machine-learning model on private datasets without sharing confidential data.

Zero-knowledge proofs and multi-party computation have conceptual similarities, but there are crucial differences:

  1. MPC allows the computation upon private data, where zero-knowledge proofs only enable parties to prove properties about private data.
  2. MPC protocols are interactive (their non-interactive cousin is functional encryption), requiring the parties to be online. Zero-knowledge proofs can be either interactive or non-interactive.

In this post we see that these two primitives are indeed related by describing how to “compile” a multi-party computation protocol into a zero-knowledge proof system. This is a beautiful theoretic result: essentially, checking the output of a function inside a zero-knowledge proof will always be more efficient than computing the function using multi-party computation.

Additionally, it can also lead to very simple, concretely efficient proof systems in practice, with applications including practical post-quantum signature schemes. As we’ll see, it’s a very general and versatile framework for constructing zero-knowledge proof systems.

Cryptographic Compilers

We will now explore how to apply a sequence of “cryptographic compilers” to any MPC protocol to obtain a non-interactive zero-knowledge proof system. A cryptographic compiler uses cryptographic tools to produce a new protocol from an existing one. The advantage of “compilers” in protocol design is that they are general: any input protocol satisfying a set of properties will yield a new protocol where a new set of properties are guaranteed to hold.  In other words, you can compose cryptographic primitives in a black-box fashion to derive new protocols in a “mechanical way.”

IKOS Compiler

Given an MPC protocol capable of computing a function “g(x),” the IKOS compiler yields a zero-knowledge proof system for input/output pairs (w, o) of the function “g”: It enables a prover to convince a verifier that it knows a secret input “w” such that “g(w) = o” for some output “o,” e.g., if “g(x) = SHA-256(x)” to prove that it knows an input “w” to SHA-256 resulting in a given hash “o” without revealing the pre-image (“w”).

At a high level, this is done when the prover locally runs the MPC protocol for “g(x)” where the input “w” is shared among the players. For example, in the case of three players, this can be done by picking random X, Y, Z subject to X + Y + Z = w. In this way, knowing just two of the inputs, X, Y, Z reveals nothing about w. Then the function f(X, Y, Z) = g(X + Y + Z) is computed using the multi-party computation protocol. To prevent cheating, the verifier asks the prover to reveal the inputs and communication for a subset of the players and checks that they’re run correctly.

The IKOS cryptographic compiler

More precisely, the IKOS compiler takes:

  1. An MPC protocol.
  2. A cryptographic commitment scheme (covered below).

And yields an interactive (public coin) zero-knowledge proof protocol. A cryptographic commitment scheme consists of a function”Commit,” which takes a key and a message, then returns a “commitment” Commit(key, msg) -> commitment, with the property that:

  1. Finding different messages with the same commitment is intractable (like a collision-resistant hash function)
  2. Discerning whether a commitment is then created from a given message is intractable (with a random key, the commitment hides msg, like encryption)

Commitments can be constructed from cryptographic hash functions, e.g., by using the HMAC construction. For a physical analogy to commitments, one can think of an envelope: you can put a letter inside the envelope and seal it (compute “Commit(key, msg)”), the envelope will hide the contents of the letter (“hiding”), but will not allow you to replace the letter by another (“binding”). At a later point, one can then open the envelope to reveal the contents (by providing someone with “key” and “msg” and having them recompute the commitment).

Uses cryptographic commitments, the MPC protocol is “compiled” as follows:

  1. The prover executes the MPC protocol for f(X, Y, Z) “in the head”, by simulating the players running the protocol: running every player locally as if they were distributed. You can think of this either as the prover executing the code for each player inside a virtual machine and recording the network traffic between them, or, as the prover playing a form of MPC sock-puppet theater wherein it plays the role of every player in the protocol.
  2. The prover records everything the different players send and receive during the execution of the protocol. We call this “the view” of each player: everything the player sees during the protocol. The prover then commits to the views of every player (“putting it inside an envelope”) and sends the commitments (“envelopes”) to the verifier.
  3. The verifier asks the prover to provide it with a subset of the views (2 in our example).

The verifier then checks that the correspondence between the two parties for which the view has been provided are compatible: the players are sending and receiving the messages that the protocol specifies that they should.

This is illustrated below.

The prover executes the MPC protocol between “virtual/imagined” parties and records their views. The verifier checks the consistency between a subset of these purported views.

Note: The “public coin” part just means that the verifier simply responds to the prover by picking random numbers: in this case the indexes to open, i.e. the verifier has no “secrets” that must be hidden from the prover to prevent him from cheating. This will be relevant later.

Why does this work? We need to convince ourselves of two things:

Soundness: If the prover cheats, it is caught

Suppose the prover does not have inputs X, Y, Z such that f(X, Y, Z) = 1. If it executed the MPC protocol correctly, then the computation would not return the expected output “o” (since the MPC protocol correctly computes f). Hence the prover must cheat while executing the MPC protocol “in his head”. In order to cheat, the prover must either:

  1. Cheat in a player, by e.g. having the player do a local computation wrong.
  2. Cheat in the communication between pairs of players, by e.g. pretending like player A received a message from player B, which player B never sent.

These two ways encompass the only ways to cheat in the MPC protocol: either by having a player violate the protocol, or, by subverting the simulated communication medium.

Consider now the case of 3 players with pairwise channels (as illustrated between Alice, the Hare, and the Hatter). Then the cheating prover must cheat between at least one pair of players. Since it commits to the views of all three players, at least one pair of these commitments must be inconsistent: if the verifier opens this pair, it observes that either one of the players is cheating or the messages between the two players do not match. Since there are three pairs of commitments, the cheating prover is caught with a probability of at least ⅓.

To “improve soundness” (to catch the cheating prover with higher probability), the zero-knowledge proof is repeated in parallel: having the prover run the MPC protocol multiple times, then challenge it to open two parties from every repetition. This way, the probability of a cheating prover succeeding after N repetitions drops exponentially in N.

Zero-knowledge: The verifier learns nothing

When two views are opened, the verifier learns the messages and inputs of two of the players. Since the commitment scheme hides the view of the unopened player and the MPC protocol ensures privacy of the input for the players, the verifier does not learn the input of the unopened player (Y in the example), since the input is shared as described above providing two of the inputs (X, Z in the example) leaks no information about “w”.

Fiat-Shamir Compiler

The Fiat-Shamir transform removes the interactive verifier (professor).

The Fiat-Shamir transform/compiler takes:

  1. A public coin zero-knowledge proof system.
  2. A (particularly strong) hash function.

And yields a non-interactive zero-knowledge proof.

The observation underlying the Fiat-Shamir transform is essentially as follows: Since all the verifier does is pick random numbers, we could just have the prover roll a dice instead and open the index that the dice shows. Then everyone can verify that the opened views are consistent.

There is just one obvious problem: a cheating prover can of course just pretend like the dice throw came out however it wanted them to (e.g. not opening the cheating pair of players). The Fiat-Shamir transformation resolves this by essentially using a hash function to “roll the dice in a verifiably random way”.

This is done by having the prover hash his messages (the commitments to each player’s view), the hash digest is then used as the random challenge (e.g. interpreted as a sequence of integers denoting which players to open).

A cheating prover can of course try changing its messages and recomputing the hash until it gets a challenge that allows him to cheat, however, this merely corresponds to running the interactive protocol multiple times in a setting where the verifier allows him to attempt any number of times. Hence the probability of a cheating prover succeeding in the interactive protocol is amplified by the amount of computation the cheating prover has available, e.g. if the interactive protocol allows the prover to cheat with probability 2^{-128}, then the adversary would have to compute the hash an expected 2^{127} times to cheat; which is clearly not feasible.

Note: In reality, the Fiat-Shamir transform requires a “Random Oracle” [see: for more details], a proof artifact that cannot really be instantiated. However in practice replacing the “Random Oracle” with a “good hash function,” where the output of the hash function “looks random” is sufficient.


This finally brings us to Reverie. Reverie is an efficient Rust implementation of the MPC-in-the-head proof system described in KKW 2018 derived by applying the previous two compilers to a particularly simple MPC protocol. Since the proof system works for any ring (e.g., finite fields, the integers modulo a composite or vector spaces), Reverie is designed to be both fast and flexible enough to support computations in any ring.

Since the MPC-in-the-head paradigm covered above is quite general and the techniques in KKW generalize very easily, Reverie is also designed to be general and easily extendable. This makes it easier to add features such as moving between rings inside the proof system and adding “random gates” to the circuits.


To improve performance, Reverie uses a number of implementation tricks:

Heavy use of bitslicing. Since the parties in the underlying MPC protocol all execute the same simple operations in parallel, we can significantly improve performance by packing the players’ state continuously in memory, then operating on all players in parallel with a single operation (either in traditional 64-bit registers or with SIMD instructions). Reverie never operates on the state of individual players.

Parallelism. Since the proof system requires a number of repetitions for “soundness amplification,” each of these can be trivially  delegated to a separate thread for a linear increase in performance.

Fast cryptographic primitives. Reverie makes use of Blake3 and ChaCha12, since these were found to be the fastest cryptographic primitives (cryptographic hash function and pseudo random function) while still offering a very comfortable security margin.

Streaming prover and streaming verifier. Reverie allows the proof to be produced and verified in a steaming fashion: this enables the verifier to validate the proof while the proof is still being generated and transmitted by the prover. This means that Reverie’s proof and verification components are fully asynchronous.


To benchmark Reverie, we measured the speed of verifying the SHA-256 compression function on this circuit. Reverie averages around 20 SHA-256 compression/second on a (32-core) AMD EPYC 7601:

Num. of SHA-256 compression applications Proof generation time AND Gates XOR Gates Proof size
100 4.76 s 2,227,200 9,397,400 22 MB
1000 50.39 s 22,272,000 93,974,000 220 MB
10000 482.97s 222,720,000 939,740,000 2.2 GB


Reverie is the first step in creating high-performance zero-knowledge proofs outside of the dominant SNARK paradigm, and it handles hundreds of millions of AND/XOR gates with relative ease. The project comes with a simple “companion” program (in the “companion” subdirectory) which makes playing with Reverie relatively easy. Try it now on Github and!

High-fidelity build instrumentation with blight

TL;DR: We’re open-sourcing a new framework, blight, for painlessly wrapping and instrumenting C and C++ build tools. We’re already using it on our research projects, and have included a set of useful actions. You can use it today for your own measurement and instrumentation needs:

Why would you ever want to wrap a build tool?

As engineers, we tend to treat our build tools as monoliths: gcc, clang++, et al. are black boxes that we shove inputs into and get outputs out of.

We then encapsulate our build tools in even higher-level build systems (Make, Ninja) or build system generators (CMake) to spare ourselves the normal hassles of C and C++ compilation: linking individual outputs together, maintaining consistent flags between invocations, supplying the correct linker and including directories, etc.

This all works well enough for everyday development, but falls short for a few specific use cases. We’ll cover some of them below.


Most build systems have their own caches and intermediate management mechanisms (with varying degrees of soundness). What normal build systems can’t do is cache intermediates between separate projects: Each project’s build is hermetic, even when two projects build nearly identical source dependencies.

That’s where tools like ccache and sccache come in: both supply a global intermediate cache by wrapping individual build tools and diverting from a normal tool invocation if a particular input’s output already exists1.

Static analysis

Modern C and C++ compilers support a variety of language standards, as well as customization flags for allowing or disallowing language features. Static analysis tools like clang-tidy need to be at least partially aware of these parameters to provide accurate results; for instance, they shouldn’t be recommending snprintf for versions of C prior to C99. Consequently, these tools need an accurate record of how the program was compiled.

clang-tidy and a few other tools support a Compilation Database format, which is essentially just a JSON-formatted array of “command objects,” each of contains the command-line, working directory, and other metadata associated with a build tool invocation. CMake knows how to generate these databases when using its Make generator; there’s also a tool called Bear that does the same through some LD_PRELOAD trickery.

Similarly: LLVM is a popular static analysis target, but it can be difficult to generically instrument pre-existing build systems to emit a single LLVM IR module (or individual bitcode units) in lieu of a single executable or object intermediates. Build tool wrappers like WLLVM and GLLVM exist for precisely this purpose.

Performance profiling

C and C++ builds get complicated quickly. They also tend to accumulate compilation performance issues over time:

  • Expensive standard headers (like <regex>) get introduced and included in common headers, bloating compilation times for individual translation units.
  • Programmers get clever and write complex templates and/or recursive macros, both of which are traditional sore points in compiler performance.
  • Performance-aiding patterns (like forward declarations) erode as abstractions break.

To fix these performance problems, we’d like to time each individual tool invocation and look for sore points. Even better, we’d like to inject additional profiling flags, like -ftime-report, into each invocation without having to fuss with the build system too much. Some build systems allow the former by setting CC=time cc or similar, but this gets hairy with multiple build systems tied together. The latter is easy enough to do by modifying CFLAGS in Make or add_compile_options /target_compile_options in CMake, but becomes similarly complicated when build systems are chained together or invoke each other.

Build and release assurance

C and C++ are complex languages that are hard, if not impossible, to write safe code in.

To protect us, we have our compilers add mitigations (ASLR, W^X, Control Flow Integrity) and additional instrumentation (ASan, MSan, UBSan).

Unfortunately, build complexity gets in our way once again: when stitching multiple builds together, it’s easy to accidentally drop (or incorrectly add, for release configurations) our hardening flags2. So, we’d like a way to predicate a particular build’s success or failure on the presence (or absence) of our desired flags, no matter how many nested build systems we invoke. That means injecting and/or removing flags just like with performance profiling, so wrapping is once again an appealing solution.

Build tools are a mess

We’ve come up with some potential use cases for a build tool wrapper. We’ve also seen that a useful build tool wrapper is one that knows a decent bit about the command-line syntax of the tool that it’s wrapping: how to reliably extract its inputs and outputs, as well as correctly model a number of flags and options that change the tool’s behavior.

Unfortunately, this is easier said than done:

  • GCC, historically the dominant open-source C and C++ compiler, has thousands of command-line options, including hundreds of OS- and CPU-specific options that can affect code generation and linkage in subtle ways. Also, because nothing in life should be easy, the GCC frontends have no less than four different syntaxes for an option that takes a value:
    • -oVALUE (examples: -O{,0,1,2,3}, -ooutput.o, -Ipath)
    • -flag VALUE (examples: -o output.o, -x c++)
    • -flag=VALUE (examples: -fuse-ld=gold, -Wno-error=switch, -std=c99)
    • -Wx,VALUE (examples: -Wl,--start-group, -Wl,-ereal_main)

    Some of these overlap consistently, while others only overlap in a few select cases. It’s up to the tool wrapper to handle each, at least to the extent required by the wrapper’s expected functionality.

  • Clang, the (relative) newcomer, makes a strong effort to be compatible with the gcc and g++ compiler frontends. To that end, most of the options that need to be modeled for correctly wrapping GCC frontends are the same for the Clang frontends. That being said, clang and clang++ add their own options, some of which overlap in functionality with the common GCC ones. By way of example: The clang and clang++ frontends support -Oz for aggressive code size optimization beyond the (GCC-supported) -Os.
  • Finally, the weird ones: There’s Intel’s ICC, which apparently likewise makes an effort to be GCC-compatible. And then there’s Microsoft’s cl.exe frontend for MSVC which, to the best of my understanding, is functionally incompatible3.

Closer inspection also reveals a few falsehoods that programmers frequently believe about their C and C++ compilers:

“Compilers only take one input at a time!”

This is admittedly less believed: Most C and C++ programmers realize early on that these invocations…

cc -c -o foo.o foo.c
cc -c -o bar.o bar.c
cc -c -o baz.o baz.c
cc -o quux foo.o bar.o baz.o

…can be replaced with:

cc -o quux foo.c bar.c baz.c

This is nice for quickly building things on the command-line, but is less fruitful to cache (we no longer have individual intermediate objects) and is harder for a build tool wrapper to model (we have to suss out the inputs, even when interspersed with other compiler flags).

Compilers only produce one output at a time!

Similar to the above: C and C++ compilers will happily produce a single intermediate output for every input, as long as you don’t explicitly ask them for a single executable output via -o:

cc -c foo.c bar.c baz.c

…produces foo.o, bar.o, and baz.o. This is once again nice for caching, but with a small bit of extra work. To correctly cache each output, we need to transform each input’s filename into the appropriate implicit output name. This ought to be as simple as replacing the source extension with .o, but it isn’t guaranteed:

  • Windows hosts (among others) use .obj rather than .o. Annoying.
  • As we’ll see, not all source inputs are required to have an extension.

Yet more work for our tool wrapper.

“cc only compiles C sources and c++ only compiles C++ sources!”

This is a popular misconception: that cc and c++ (or gcc and g++, or …) are completely different programs that happen to share a great deal of command-line functionality.

In reality, even if they’re separate binaries, they’re usually just thin shims over a common compiler frontend. In particular, c++ corresponds to cc -x c++ and cc corresponds to c++ -x c:

# compile a C++ program using cc
cc -x c++ -c -o foo.o foo.cpp

The -x <language> option enables one particularly annoying useful feature: being able to compile files as a particular language even if their suffix doesn’t match. This comes in handy when doing, say, code generation:

# promise the compiler that junk.gen is actually a C source file
cc -x c junk.gen

Compilers only compile one language at a time!”

Even with the above, programmers assume that each input to the compiler frontend has to be of the same language, i.e., that you can’t mix C and C++ sources in the same invocation. But this just isn’t true:

# compile a C source and a C++ source into their respective object files
cc -c -x c foo.c -x c++ bar.cpp

You don’t even need the -x <language> modifiers when the frontend understands the file suffixes, as it does for .c and .cpp:

# identical to the above
cc -c foo.c bar.cpp

Not every build tool is a compiler frontend

We’ve omitted a critical fact above: Not every build tool shares the general syntax of the C and C++ compiler frontends. Indeed, there are five different groups of tools that we’re interested in:

  • “Compiler tools” like cc and c++, normally overriden with CC and CXX. We’re focusing our efforts on C and C++ for the time being; it’s common to see similar variables for Go (GO), Rust (RUSTC), and so forth.
  • The C preprocessor (cpp), normally overriden with CPP. Most builds invoke the C preprocessor through the compiler frontend, but some invoke it directly.
  • The system linker (ld), normally overriden with LD. Like the preprocessor, the linker is normally interacted with through the frontend, but occasionally makes an appearance of its own when dealing with custom toolchains and linker scripts.
  • The system assembler (as), normally overriden with AS. Can be used via the frontend like the preprocessor and linker, but is also seen independently.
  • The system archiver (ar), normally overriden with AR. Unlike the last three, the archiver is not integrated into the compiler frontend for, e.g., static library construction; users are expected to invoke it directly.

blight’s architecture

blight’s tool hierarchyWe’ve seen some of the complexities that arise when wrapping and accurately modeling the behavior of a build tool. So now let’s take a look at how blight mitigates those complexities.

Like most (all?) build tool wrappers, blight’s wrappers take the place of their wrapped counterparts. For example, to run a build with blight’s C compiler wrapper:

# -e tells make to always give CC from the environment precedence
CC=blight-cc make -e

Setting each CC, CXX, etc. manually is tedious and error-prone, so the blight CLI provides a bit of shell code generation magic to automate the process:

# --guess-wrapped searches the $PATH to find a suitable tool to wrap
eval "$(blight-env --guess-wrapped)"
make -e

Under the hood, each blight wrapper corresponds to a concrete subclass of Tool (e.g., blight.tool.AS for the assembler), each of which has at least the following:

  • The argument vector (args) that the wrapped tool will be run with.
  • The working directory (cwd) that the wrapped tool will be run from.
  • A list of Actions, each of which can register two separate events on each tool run:
    • before_run(tool)—Run before each tool invocation.
    • after_run(tool)—Run after each successful tool invocation.

Individual subclasses of Tool are specialized using a mixin pattern. For example, blight.tool.CC

A render of the CC class’s source

…specializes CompilerTool, which is a doozy of mixins:

A render of the CompilerTool class’s source

Each mixin, in turn, provides some modeled functionality common between one or more tools.

ResponseFileMixin, for example, specializes the behavior of Tool.args for tools that support the @file syntax for supplying additional arguments via an on-disk file (in particular, CC, CXX, and LD):

A render of the ResponseFileMixin class’s source

Other mixins make heavy use of Python 3’s Enum class to strictly model the expected behavior of common tool flags, like -std=STANDARD

A render of the StdMixin class’s source

…where Std:

A render of the Std enum

Taking action

By default, blight does absolutely nothing: it’s just a framework for wrapping build tools. The magic happens when we begin to inject actions before and after each tool invocation.

Internally, the actions API mirrors the tool API: Each tool has a corresponding class under blight.action (e.g., CCCCAction):

To add an action, you just specify its name in the BLIGHT_ACTIONS environment variable. Multiple actions can be specified with : as the delimiter, and will be executed in a left-to-right order. Only actions that “match” a particular tool are run, meaning that an action with CCAction as its parent will never (incorrectly) run on a CXX invocation.

To bring the concepts home, here’s blight running with two stock actions: IgnoreWerror and InjectFlags:

An example blight run

In this case, IgnoreWerror strips any instances of –Werror that it sees in compiler tools (i.e., CC and CXX), while InjectFlags injects a configurable set of arguments via a set of nested variables. We’ll see how that configuration works in a bit.

For example, here’s InjectFlags:

A render of the InjectFlags class

In particular, note that InjectFlags is a CompilerAction, meaning that its events (in this case, just before_run) are only executed if the underlying tool is CC or CXX.

Writing and configuring your own actions

Writing a new action is straightforward: They live in the blight.actions module, and all inherit from one of the specializations of blight.action.Action.

For example, here’s an action that prints a friendly message before and after every invocation of tool.AS (i.e., the standard assembler)…

…and blight will take care of the rest—all you need to do is specify SayHello in BLIGHT_ACTIONS!


What about actions that need to be configured, such as a configurable output file?

Remember the InjectFlags action above: Every loaded action can opt to receive configuration settings via the self._config dictionary, which the action API parses behind the scenes from BLIGHT_ACTION_ACTIONNAME where ACTIONNAME is the uppercase form of the actual action name.

How is that environment variable parsed? Very simply:

A render of the config parsing code for actions

Spelled out, the configuration string is split according to shell lexing rules, and then once again split from KEY=VALUE pairs.

This should be a suitable base for most configuration needs. However, because blight’s actions are just plain old Python classes, they can implement their own configuration approaches as desired.

A new paradigm for build instrumentation

blight is the substrate for a new generation of build wrapping and instrumentation tools. Instead of modeling the vagaries of a variety of tool CLIs themselves, new wrapping tools can rely on blight to get the modeling right and get directly to their actual tasks.

Some ideas that we’ve had during blight’s development:

  • An action that provides real-time build statistics via WebSockets, allowing developers to track the progress of arbitrary builds.
  • A rewrite of tools like WLLVM and GLLVM, enabling higher-fidelity tracking of troublesome edge cases (e.g., in-tree assembly files and builds that generate explicit assembly intermediates).
  • A feedback mechanism for trying performance options. Choosing between optimization flags can be fraught, so a blight action that parametrizes builds across a matrix of potential options could help engineers select the appropriate flags for their projects.

We’ve already written some actions for our own use cases, but we believe blight can be useful to the wider build tooling and instrumentation communities. If you’re interested in working with us on it or porting your existing tools to blight’s framework, contact us!

  1. This, it turns out, is nontrivial: Compilation flags can affect the ABI, so a sound compilation cache must correctly model the various flags passed to the compiler and only hit the intermediate cache if two (likely different) sets of flags from separate invocations are not incompatible.↩︎
  2. Another use case: Companies usually want to strip their application binaries of all symbol and debug information before release, to hamper reverse engineering. Doing so is usually part of a release checklist and should be enforced by the build system, and yet companies repeatedly manage to leak debug and symbol information. A build tool wrapper could provide another layer of assurance.↩︎
  3. In a rare divergence for Microsoft, cl.exe does allow you to use -opt instead of /opt for all command-line options. Unfortunately, most of cl.exe’s options bear no resemblance to GCC or Clang’s, so it doesn’t make much of a difference.↩︎

Smart (and simple) ways to prevent symlink attacks in Go

After writing Go for years, many of us have learned the error-checking pattern down to our bones: “Does this function return an error? Ope, better make sure it’s nil before moving on.”

And that’s great! This should be our default behavior when writing Go.

However, rote error checking can sometimes prevent critical thinking about what that error actually means to the code: When does that function return an error? And does it encompass everything you think it does?

For instance, in os.Create, the nil error value can trick you into thinking you’re safe with file creation. Reading the linked documentation, os.Create actually truncates the file when it already exists instead of throwing any indication that it’s not a new file.

This leaves us vulnerable to a symlink attack.

Does it exist?

Say my program needs to create and use a file. Almost every example of idiomatic Go code guides us through an error check, but no validation of whether the file existed before Create was called. If a symbolic link had already been set up for that file, no error would occur, but the file and its contents would not behave as intended due to the truncation behavior. The risk is that we can remove information using the program to overwrite it for us.

At Trail of Bits, this issue comes up frequently in audits. Thankfully, the fix for it is incredibly easy. We just need to check if a file exists before attempting to create it. A slight tweak in our approach to idiomatic Go can make file creation safer long term and take us one step closer to prioritizing security in Go programs.

The situation

Let’s say there’s a file, my_logs, that I need to create and write to. However, in another part of the codebase, someone previously set up a symlink with ln -s other/logs my_logs.

- logs
- notes
- things I care about
- very important information we can't lose

Contents of other/logs.

package main

import (

func main() {
	file, err := os.Create("my_logs")
	if err != nil {
		fmt.Printf("Error creating file: %s", err)

	_, err = file.Write([]byte("My logs for this process"))
	if err != nil {



$ ln -s other/logs my_logs
$ go build symlink_attack.go
$ ./symlink_attack
$ cat other/logs
- My logs for this process

As you can see, the content of other/logs is wiped even though our program only interacted with my_logs.

Even in this accidental scenario, os.Create removes important data through its truncation behavior. In malicious scenarios, an attacker could leverage the truncation behavior against the user to remove specific data—perhaps audit logs that would have revealed their presence on the box at one point.

Simple fix, two approaches

To remedy this, we have to insert an os.IsNotExist check before calling Create. If you run the edited symlink_attack.go below, the data in other/logs remains and is not overwritten.

package main

import (

func main() {

	if fileExists("my_logs") {
		log.Fatalf("symlink attack failure")

	file, err := os.Create("my_logs")
	if err != nil {
		fmt.Printf("Error creating file: %s", err)

	_, err = file.Write([]byte("My logs for this process"))
	if err != nil {
		fmt.Printf("Failure to write: %s", err)

func fileExists(filename string) bool {
	info, err := os.Stat(filename)
	if os.IsNotExist(err) {
		return false
	return !info.IsDir()

symlink_attack.go with IsNotExist check.

The stipulation here is that by checking os.IsNotExist before creating, we put ourselves in a position where we can’t verify whether a symlink was created between the existence check and the file creation (a time-of-check vs. time-of-use bug). To account for that, we can take a few different approaches.

The first approach is to recreate the implementation of os.Create with your own OpenFile command, thus eliminating the truncation.

func Create(name string) (*File, error) {
    return OpenFile(name, O_RDWR\|O_CREATE\|O_TRUNC, 0666)

os pkg definition for Create.

package main

import (

func main() {
	file, err := os.OpenFile("my_logs", os.O_RDWR|os.O_CREATE|syscall.O_NOFOLLOW, 0666)
	if err != nil {

	_, err = file.Write([]byte("Is this the only thing in the file\n"))
	if err != nil {
		fmt.Printf("Failure to write: %s", err)
	err = file.Close()
	if err != nil {
		fmt.Printf("Couldn't close file: %s", err)
	buf, err := ioutil.ReadFile("./my_logs")
	if err != nil {
		fmt.Printf("Failed to read file: %s", err)

	fmt.Printf("%s", buf)

symlink_attack.go OpenFile with O_NOFOLLOW to avoid following symlinks.

By opening the file with O_NOFOLLOW, you will not follow a symlink. So when a new file is created, this will work the same as os.Create. However, it will fail to open if a symlink is set up in that location.

The alternative is to create a TempFile and use os.Rename to move it to your preferred location.

package main

import (

func main() {
	tmpfile, err := ioutil.TempFile(".", "")
	if err != nil {

	os.Rename(tmpfile.Name(), "my_logs")
	if _, err := tmpfile.Write([]byte("Is this the only thing in the file")); err != nil {

	buf, err := ioutil.ReadFile("./my_logs")
	if err != nil {
		fmt.Printf("Failed to read file: %s", err)

	fmt.Printf("%s", buf)


symlink_attack.go with TempFile creation and os.Rename.

This pattern broke the symlink between my_logs and other/logs. other/logs still had its contents and my_logs had only the contents “Is this the only thing in the file,” as intended.

Protect future you, now

No matter how careful you are about checking errors in Go, they’re not always behaving the way you might think (tl;dr: rtfm). But updating your practices within Go file creation is really simple, and can save you from unintended consequences.

Good idea, bad design: How the Diamond standard falls short

TL;DR: We audited an implementation of the Diamond standard proposal for contract upgradeability and can’t recommend it in its current form—but see our recommendations and upgrade strategy guidance.

We recently audited an implementation of the Diamond standard code, a new upgradeability pattern. It’s a laudable undertaking, but the Diamond proposal and implementation raise many concerns. The code is over-engineered, with lots of unnecessary complexities, and we can’t recommend it at this time.

Of course, the proposal is still a draft, with room to grow and improve. A working upgradeability standard should include:

  • A clear, simple implementation. Standards should be easy to read to simplify integration with third-party applications.
  • A thorough checklist of upgrade procedures. Upgrading is a risky process that must be thoroughly explained.
  • On-chain mitigations against the most common upgradeability mistakes, including function shadowing and collisions. Several mistakes, though easy to detect, can lead to severe issues. See slither-check-upgradeability for many pitfalls that can be mitigated.
  • A list of associated risks. Upgradeability is difficult; it can conceal security considerations or imply that risks are trivial. EIPs are proposals to improve Ethereum, not commercial advertisements.
  • Tests integrated with the most common testing platforms. The tests should highlight how to deploy the system, how to upgrade a new implementation, and how an upgrade can fail.

Unfortunately, the Diamond proposal fails to address these points. It’s too bad, because we’d love to see an upgradeable standard that solves or at least mitigates the main security pitfalls of upgradeable contracts. Essentially, standard writers must assume that developers will make mistakes, and aim to build a standard that alleviates them.

Still, there’s plenty to learn from the Diamond proposal. Read on to see:

  • How the Diamond proposal works
  • What our review revealed
  • Our recommendations
  • Upgradeability standard best practices

The Diamond proposal paradigm

The Diamond proposal is a work-in-progress defined in EIP 2535. The draft claims to propose a new paradigm for contract upgradeability based on delegatecall. (FYI, here’s an overview of how upgradeability works.) EIP 2535 proposes the use of:

  1. A lookup table for the implementation
  2. An arbitrary storage pointer

Lookup table

The delegatecall-based upgradeability mainly works with two components—a proxy and an implementation:

Figure 1: delegatecall-based upgradeability with a single implementation

The user interacts with the proxy and the proxy delegatecall to the implementation. The implementation code is executed, while the storage is kept in the proxy.

Using a lookup table allows delegatecalls to multiple contract implementations, where the proper implementation is selected according to the function to be executed:

Figure 2: delegatecall-based upgradeability with multiple implementations.

This schema is not new; other projects have used such lookup tables for upgradeability in the past. See ColonyNetwork for an example.

Arbitrary storage pointer

The proposal also suggests using a feature recently introduced into Solidity: the arbitrary storage pointer, which (like the name says) allows assignment of a storage pointer to an arbitrary location.

Because the storage is kept on the proxy, the implementation’s storage layout must follow the proxy’s storage layout. It can be difficult to keep track of this layout when doing an upgrade (see examples here).

The EIP proposes that every implementation have an associated structure to hold the implementation variables, and a pointer to an arbitrary storage location where the structure will be stored. This is similar to the unstructured storage pattern, where the new Solidity feature allows use of a structure instead of a single variable.

It is assumed that two structures from two different implementations cannot collide as long as their respective base pointers are different.

   bytes32 constant POSITION = keccak256(

    struct MyStruct {
        uint var1;
        uint var2;

    function get_struct() internal pure returns(MyStruct storage ds) {
        bytes32 position = POSITION;
        assembly { ds_slot := position }

Figure 3: Storage pointer example.

Figure 4: Storage pointer representation.

BTW, what’s a “diamond”?

EIP 2535 introduces “diamond terminology,” wherein the word “diamond” means a proxy contract, “facet” means an implementation, and so on. It’s unclear why this terminology was introduced, especially since the standard terminology for upgradeability is well known and defined. Here’s a key to help you translate the proposal if you go through it:

Diamond vocabulary Common name
Diamond Proxy
Facet Implementation
Cut Upgrade
Loupe List of delegated functions
Finished diamond Non-upgradeable
Single cut diamond Remove upgradeability functions

Figure 5: The Diamond proposal uses new terms to refer to existing ideas.

Audit findings and recommendations

Our review of the diamond implementation found that:

  • The code is over-engineered and includes several misplaced optimizations
  • Using storage pointers has risks
  • The codebase had function shadowing
  • The contract lacks an existence check
  • The diamond vocabulary adds unnecessary complexity

Over-engineered code

While the pattern proposed in the EIP is straightforward, its actual implementation is difficult to read and review, increasing the likelihood of issues.

For example, a lot of the data kept on-chain is cumbersome. While the proposal only needs a lookup table, from the function signature to the implementation’s address, the EIP defines many interfaces that require storage of additional data:

interface IDiamondLoupe {
    /// These functions are expected to be called frequently
    /// by tools.

    struct Facet {
        address facetAddress;
        bytes4[] functionSelectors;

    /// @notice Gets all facet addresses and their four byte function selectors.
    /// @return facets_ Facet
    function facets() external view returns (Facet[] memory facets_);

    /// @notice Gets all the function selectors supported by a specific facet.
    /// @param _facet The facet address.
    /// @return facetFunctionSelectors_
    function facetFunctionSelectors(address _facet) external view returns (bytes4[] memory facetFunctionSelectors_);

    /// @notice Get all the facet addresses used by a diamond.
    /// @return facetAddresses_
    function facetAddresses() external view returns (address[] memory facetAddresses_);

    /// @notice Gets the facet that supports the given selector.
    /// @dev If facet is not found return address(0).
    /// @param _functionSelector The function selector.
    /// @return facetAddress_ The facet address.
    function facetAddress(bytes4 _functionSelector) external view returns (address facetAddress_);

Figure 6: Diamond interfaces.

Here, facetFunctionSelectors returns all the function selectors of an implementation. This information will only be useful for off-chain components, which can already extract the information from the contract’s events. There’s no need for such a feature on-chain, especially since it significantly increases code complexity.

Additionally, much of the code complexity is due to optimization in locations that don’t need it. For example, the function used to upgrade an implementation should be straightforward. Taking a new address and a signature, it should update the corresponding entry in the lookup table. Well, part of the function doing so is the following:

           // adding or replacing functions
            if (newFacet != 0) {
                // add and replace selectors
                for (uint selectorIndex; selectorIndex &amp;amp;amp;amp;amp;lt; numSelectors; selectorIndex++) {
                    bytes4 selector;
                    assembly {
                        selector := mload(add(facetCut,position))
                    position += 4;
                    bytes32 oldFacet = ds.facets[selector];
                    // add
                    if(oldFacet == 0) {
                        // update the last slot at then end of the function
                        slot.updateLastSlot = true;
                        ds.facets[selector] = newFacet | bytes32(selectorSlotLength) &amp;amp;amp;amp;amp;lt;&amp;amp;amp;amp;amp;lt; 64 | bytes32(selectorSlotsLength);
                        // clear selector position in slot and add selector
                        slot.selectorSlot = slot.selectorSlot &amp;amp;amp;amp;amp;amp; ~(CLEAR_SELECTOR_MASK &amp;amp;amp;amp;amp;gt;&amp;amp;amp;amp;amp;gt; selectorSlotLength * 32) | bytes32(selector) &amp;amp;amp;amp;amp;gt;&amp;amp;amp;amp;amp;gt; selectorSlotLength * 32;
                        // if slot is full then write it to storage
                        if(selectorSlotLength == 8) {
                            ds.selectorSlots[selectorSlotsLength] = slot.selectorSlot;
                            slot.selectorSlot = 0;
                            selectorSlotLength = 0;
                    // replace
                    else {
                        require(bytes20(oldFacet) != bytes20(newFacet), "Function cut to same facet.");
                        // replace old facet address
                        ds.facets[selector] = oldFacet &amp;amp;amp;amp;amp;amp; CLEAR_ADDRESS_MASK | newFacet;

Figure 7: Upgrade function.

A lot of effort was made to optimize this function’s gas efficiency. But upgrading a contract is rarely done, so it would never be an expensive operation anyway, no matter what its gas cost.

In another example of unnecessary complexity, bitwise operations are used instead of a structure:

uint selectorSlotsLength = uint128(slot.originalSelectorSlotsLength);
uint selectorSlotLength = uint128(slot.originalSelectorSlotsLength &amp;amp;amp;amp;amp;gt;&amp;amp;amp;amp;amp;gt; 128);
        // uint32 selectorSlotLength, uint32 selectorSlotsLength
        // selectorSlotsLength is the number of 32-byte slots in selectorSlots.
        // selectorSlotLength is the number of selectors in the last slot of
        // selectorSlots.
        uint selectorSlotsLength;

Figure 8: Use of bitwise operations instead of a structure.

Update November 5th:
Since our audit, the reference implementation has changed, but its underlying complexity remains. There are now three reference implementations, which makes everything even more confusing for users, and further review of the proposal is more difficult.

Our recommendations:

  • Always strive for simplicity, and keep as much code as you can off-chain.
  • When writing a new standard, keep the code readable and easy to understand.
  • Analyze the needs before implementing optimizations.

Storage pointer risks

Despite the claim that collisions are impossible if the base pointers are different, a malicious contract can collide with a variable from another implementation. Basically, it’s possible because of the way Solidity stores variables and affects mapping or arrays. For example:

contract TestCollision{
    // The contract represents two implementations, A and B
    // A has a nested structure 
    // A and B have different bases storage pointer 
    // Yet writing in B, will lead to write in A variable
    // This is because the base storage pointer of B 
    // collides with A.ds.my_items[0].elems
    bytes32 constant public A_STORAGE = keccak256(
    struct InnerStructure{
        uint[] elems;
    struct St_A {
        InnerStructure[] my_items;

    function pointer_to_A() internal pure returns(St_A storage s) {
        bytes32 position = A_STORAGE;
        assembly { s_slot := position }
    bytes32 constant public B_STORAGE = keccak256(
    struct St_B {
        uint val;

    function pointer_to_B() internal pure returns(St_B storage s) {
        bytes32 position = B_STORAGE;
        assembly { s_slot := position }
    constructor() public{
        St_A storage ds = pointer_to_A();
    function get_balance() view public returns(uint){
        St_A storage ds = pointer_to_A();
        return ds.my_items[0].elems[0];
    function exploit(uint new_val) public{
        St_B storage ds = pointer_to_B();
        ds.val = new_val;

Figure 9: Storage pointer collision.

In exploit, the write to the B_STORAGE base pointer will actually write to the my_items[0].elems[0], which is read from the A_STORAGE base pointer. A malicious owner could push an upgrade that looks benign, but contains a backdoor.

The EIP has no guidelines for preventing these malicious collisions. Additionally, if a pointer is reused after being deleted, the re-use will lead to data compromise.

Our recommendations

  • Low-level storage manipulations are risky, so be extra careful when designing a system that relies on them.
  • Using unstructured storage with structures for upgradeability is an interesting idea, but it requires thorough documentation and guidelines on what to check for in a base pointer.

Function shadowing

Upgradeable contracts often have functions in the proxy that shadow the functions that should be delegated. Calls to these functions will never be delegated, as they will be executed in the proxy. Additionally, the associated code will not be upgradeable.

contract Proxy {

    constructor(...) public{
          // add my_targeted_function() 
          // as a delegated function
    function my_targeted_function() public{
    fallback () external payable{
          // delegate to implementations

Figure 10: Simplification of a shadowing issue.

Although this issue is well known, and the code was reviewed by the EIP author, we found two instances of function-shadowing in the contracts.

Our recommendations

  • When writing an upgradeable contract, use or slither-check-upgradeability to catch instances of shadowing.
  • This issue highlights an important point: Developers make mistakes. Any new standard should include mitigations for common mistakes if it’s to work better than custom solutions.

No contract existence check

Another common mistake is the absence of an existence check for the contract’s code. If the proxy delegates to an incorrect address, or implementation that has been destructed, the call to the implementation will return success even though no code was executed (see the Solidity documentation). As a result, the caller will not notice the issue, and such behavior is likely to break third-party contract integration.

   fallback() external payable {
        DiamondStorage storage ds;
        bytes32 position = DiamondStorageContract.DIAMOND_STORAGE_POSITION;
        assembly { ds_slot := position }
        address facet = address(bytes20(ds.facets[msg.sig]));
        require(facet != address(0), "Function does not exist.");
        assembly {
            calldatacopy(0, 0, calldatasize())
            let result := delegatecall(gas(), facet, 0, calldatasize(), 0, 0)
            let size := returndatasize()
            returndatacopy(0, 0, size)
            switch result
            case 0 {revert(0, size)}
            default {return (0, size)}

Figure 11: Fallback function without contract’s existence check.

Our recommendations

  • Always check for contract existence when calling an arbitrary contract.
  • If gas cost is a concern, only perform this check if the call returns no data, since the opposite result means that some code was executed.

Unnecessary Diamond vocabulary

As noted, the Diamond proposal relies heavily on its newly created vocabulary. This is error-prone, makes review more difficult, and does not benefit developers.

  1. A diamond is a contract that uses functions from its facets to execute function calls. A diamond can have one or more facets.
  2. The word facet comes from the diamond industry. It is a side, or flat surface of a diamond. A diamond can have many facets. In this standard a facet is a contract with one or more functions that executes functionality of a diamond.
  3. A loupe is a magnifying glass that is used to look at diamonds. In this standard a loupe is a facet that provides functions to look at a diamond and its facets.

Figure 12: The EIP redefines standard terms to ones that are unrelated to software engineering.

Our recommendation

  • Use the common, well-known vocabulary, and do not invent terminology when it’s not needed.

Is the Diamond proposal a dead end?

As noted, we still believe the community would benefit from a standardized upgradeability schema. But the current Diamond proposal does not meet the expected security requirements or bring enough benefits over a custom implementation.

However, the proposal is still a draft, and could evolve into something simpler and better. And even if it doesn’t, some of the existing techniques used, such as the lookup table and arbitrary storage pointers, are worth continuing to explore.

So…is upgradeability feasible or not?

Over the years, we’ve reviewed many upgradeable contracts and published several analyses on this topic. Upgradeability is difficult, error-prone, and increases risk, and we still generally don’t recommend it as a solution. But developers who need upgradeability in their contracts should:

And please contact us if you have any questions about your upgrade strategy. We’re ready to help!

Efficient audits with machine learning and Slither-simil

by Sina Pilehchiha, Concordia University

Trail of Bits has manually curated a wealth of data—years of security assessment reports—and now we’re exploring how to use this data to make the smart contract auditing process more efficient with Slither-simil.

Based on accumulated knowledge embedded in previous audits, we set out to detect similar vulnerable code snippets in new clients’ codebases. Specifically, we explored machine learning (ML) approaches to automatically improve on the performance of Slither, our static analyzer for Solidity, and make life a bit easier for both auditors and clients.

Currently, human auditors with expert knowledge of Solidity and its security nuances scan and assess Solidity source code to discover vulnerabilities and potential threats at different granularity levels. In our experiment, we explored how much we could automate security assessments to:

  1. Minimize the risk of recurring human error, i.e., the chance of overlooking known, recorded vulnerabilities.
  2. Help auditors sift through potential vulnerabilities faster and more easily while decreasing the rate of false positives.


Slither-simil, the statistical addition to Slither, is a code similarity measurement tool that uses state-of-the-art machine learning to detect similar Solidity functions. When it began as an experiment last year under the codename crytic-pred, it was used to vectorize Solidity source code snippets and measure the similarity between them. This year, we’re taking it to the next level and applying it directly to vulnerable code.

Slither-simil currently uses its own representation of Solidity code, SlithIR (Slither Intermediate Representation), to encode Solidity snippets at the granularity level of functions. We thought function-level analysis was a good place to start our research since it’s not too coarse (like the file level) and not too detailed (like the statement or line level.)

Figure 1: A high-level view of the process workflow of Slither-simil.

In the process workflow of Slither-simil, we first manually collected vulnerabilities from the previous archived security assessments and transferred them to a vulnerability database. Note that these are the vulnerabilities auditors had to find with no automation.

After that, we compiled previous clients’ codebases and matched the functions they contained with our vulnerability database via an automated function extraction and normalization script. By the end of this process, our vulnerabilities were normalized SlithIR tokens as input to our ML system.

Here’s how we used Slither to transform a Solidity function to the intermediate representation SlithIR, then further tokenized and normalized it to be an input to Slither-simil:

function transferFrom(address _from, address _to, uint256 _value) public returns (bool success) {
        require(_value <= allowance[_from][msg.sender]);     // Check allowance
        allowance[_from][msg.sender] -= _value;
        _transfer(_from, _to, _value);
        return true;

Figure 2: A complete Solidity function from the contract TurtleToken.sol.

Function TurtleToken.transferFrom(address,address,uint256) (*)

Solidity Expression: require(bool)(_value <= allowance[_from][msg.sender])
         REF_10(mapping(address => uint256)) ->    allowance[_from]
         REF_11(uint256) -> REF_10[msg.sender]
         TMP_16(bool) = _value <= REF_11
         TMP_17 = SOLIDITY_CALL require(bool)(TMP_16)

Solidity Expression: allowance[_from][msg.sender] -= _value
         REF_12(mapping(address => uint256)) -> allowance[_from]
         REF_13(uint256) -> REF_12[msg.sender]
         REF_13(-> allowance) = REF_13 - _value

Solidity Expression: _transfer(_from,_to,_value)
         INTERNAL_CALL,      TurtleToken._transfer(address,address,uint256)(_from,_to,_value)

Solidity Expression: true
         RETURN True

Figure 3: The same function with its SlithIR expressions printed out.

First, we converted every statement or expression into its SlithIR correspondent, then tokenized the SlithIR sub-expressions and further normalized them so more similar matches would occur despite superficial differences between the tokens of this function and the vulnerability database.







(state_solc_variable(string)):=(local_solc_variable(memory, string))

(state_solc_variable(string)):=(local_solc_variable(memory, string))


Figure 4: Normalized SlithIR tokens of the previous expressions.

After obtaining the final form of token representations for this function, we compared its structure to that of the vulnerable functions in our vulnerability database. Due to the modularity of Slither-simil, we used various ML architectures to measure the similarity between any number of functions.

$ slither-simil test etherscan_verified_contracts.bin --filename TurtleToken.sol --fname TurtleToken.transferFrom --input cache.npz --ntop 5

Reviewed 825062 functions, listing the 5 most similar ones:

filename            contract      function     score
TokenERC20.sol      TokenERC20    freeze       0.991
ETQuality.sol       StandardToken transferFrom 0.936
NHST.sol            NHST          approve      0.889

Figure 5: Using Slither-simil to test a function from a smart contract with an array of other Solidity contracts.

Let’s take a look at the function transferFrom from the ETQuality.sol smart contract to see how its structure resembled our query function:

function transferFrom(address _from, address _to, uint256 _value) returns (bool success) {
    if (balances[_from] >= _value && allowed[_from][msg.sender] >= _value && _value > 0) {
        balances[_to] += _value;
        balances[_from] -= _value;
        allowed[_from][msg.sender] -= _value;
        Transfer(_from, _to, _value);
        return true;
    } else { return false; }

Figure 6: Function transferFrom from the ETQuality.sol smart contract.

Comparing the statements in the two functions, we can easily see that they both contain, in the same order, a binary comparison operation (>= and <=), the same type of operand comparison, and another similar assignment operation with an internal call statement and an instance of returning a “true” value.

As the similarity score goes lower towards 0, these sorts of structural similarities are observed less often and in the other direction; the two functions become more identical, so the two functions with a similarity score of 1.0 are identical to each other.

Related Research

Research on automatic vulnerability discovery in Solidity has taken off in the past two years, and tools like Vulcan and SmartEmbed, which use ML approaches to discovering vulnerabilities in smart contracts, are showing promising results.

However, all the current related approaches focus on vulnerabilities already detectable by static analyzers like Slither and Mythril, while our experiment focused on the vulnerabilities these tools were not able to identify—specifically, those undetected by Slither.

Much of the academic research of the past five years has focused on taking ML concepts (usually from the field of natural language processing) and using them in a development or code analysis context, typically referred to as code intelligence. Based on previous, related work in this research area, we aim to bridge the semantic gap between the performance of a human auditor and an ML detection system to discover vulnerabilities, thus complementing the work of Trail of Bits human auditors with automated approaches (i.e., Machine Programming, or MP).


We still face the challenge of data scarcity concerning the scale of smart contracts available for analysis and the frequency of interesting vulnerabilities appearing in them. We can focus on the ML model because it’s sexy but it doesn’t do much good for us in the case of Solidity where even the language itself is very young and we need to tread carefully in how we treat the amount of data we have at our disposal.

Archiving previous client data was a job in itself since we had to deal with the different solc versions to compile each project separately. For someone with limited experience in that area this was a challenge, and I learned a lot along the way. (The most important takeaway of my summer internship is that if you’re doing machine learning, you will not realize how major a bottleneck the data collection and cleaning phases are unless you have to do them.)

Figure 7: Distribution of 89 vulnerabilities found among 10 security assessments.

The pie chart shows how 89 vulnerabilities were distributed among the 10 client security assessments we surveyed. We documented both the notable vulnerabilities and those that were not discoverable by Slither.

The Road Ahead for Slither-simil

This past summer we resumed the development of Slither-simil and SlithIR with two goals in mind:

  • Research purposes, i.e., the development of end-to-end similarity systems lacking feature engineering.
  • Practical purposes, i.e., adding specificity to increase precision and recall.

We implemented the baseline text-based model with FastText to be compared with an improved model with a tangibly significant difference in results; e.g., one not working on software complexity metrics, but focusing solely on graph-based models, as they are the most promising ones right now.

For this, we have proposed a slew of techniques to try out with the Solidity language at the highest abstraction level, namely, source code.

To develop ML models, we considered both supervised and unsupervised learning methods. First, we developed a baseline unsupervised model based on tokenizing source code functions and embedding them in a Euclidean space (Figure 8) to measure and quantify the distance (i.e., dissimilarity) between different tokens. Since functions are constituted from tokens, we just added up the differences to get the (dis)similarity between any two different snippets of any size.

The diagram below shows the SlithIR tokens from a set of training Solidity data spherized in a three-dimensional Euclidean space, with similar tokens closer to each other in vector distance. Each purple dot shows one token.

Figure 8: Embedding space containing SlithIR tokens from a set of training Solidity data

We are currently developing a proprietary database consisting of our previous clients and their publicly available vulnerable smart contracts, and references in papers and other audits. Together they’ll form one unified comprehensive database of Solidity vulnerabilities for queries, later training, and testing newer models.

We’re also working on other unsupervised and supervised models, using data labeled by static analyzers like Slither and Mythril. We’re examining deep learning models that have much more expressivity we can model source code with—specifically, graph-based models, utilizing abstract syntax trees and control flow graphs.

And we’re looking forward to checking out Slither-simil’s performance on new audit tasks to see how it improves our assurance team’s productivity (e.g., in triaging and finding the low-hanging fruit more quickly). We’re also going to test it on Mainnet when it gets a bit more mature and automatically scalable.

You can try Slither-simil now on this Github PR. For end users, it’s the simplest CLI tool available:

  1. Input one or multiple smart contract files (either directory, .zip file, or a single .sol).
  2. Identify a pre-trained model, or separately train a model on a reasonable amount of smart contracts.
  3. Let the magic happen, and check out the similarity results.
$ slither-simil test etherscan_verified_contracts.bin --filename MetaCoin.sol --fname MetaCoin.sendCoin --input cache.npz


Slither-simil is a powerful tool with potential to measure the similarity between function snippets of any size written in Solidity. We are continuing to develop it, and based on current results and recent related research, we hope to see impactful real-world results before the end of the year.

Finally, I’d like to thank my supervisors Gustavo, Michael, Josselin, Stefan, Dan, and everyone else at Trail of Bits, who made this the most extraordinary internship experience I’ve ever had.