Contract verification made easier

Smart contract authors can now express security properties in the same language they use to write their code (Solidity) and our new tool, manticore-verifier, will automatically verify those invariants. Even better, Echidna and Manticore share the same format for specifying property tests.

In other words, smart contract authors can now write one property test and have it tested with fuzzing and verified by symbolic execution! Ultimately, manticore-verifier reduces the initial effort and cost involved in symbolic testing of arbitrary properties.

How it works

A smart contract’s behavior—and its potential bugs—are often unique and depend heavily on unspoken contract invariants. Let’s test a simple contract:

contract Ownership{
    address owner = msg.sender;
    function Owner() public{
        owner = msg.sender;
    }
    modifier isOwner(){
        require(owner == msg.sender);
        _;
    }
}
contract Pausable is Ownership{
    bool is_paused;
    modifier ifNotPaused(){
        require(!is_paused);
        _;
    }
    function paused() isOwner public{
        is_paused = true;
    }
    function resume() isOwner public{
        is_paused = false;
    }
}
contract Token is Pausable{
    mapping(address => uint) public balances;
    function transfer(address to, uint value) ifNotPaused public{
        balances[msg.sender] -= value;
        balances[to] += value;
    }
}

This contract maintains a balance sheet and allows for simple transactions. Users can send their tokens to other users, but the total amount of tokens must remain fixed—in other words, tokens can’t be created after the contract has started. So under this invariant, a valid property could state: “If there are only 10,000 tokens, no user could own more than that.”

We can express this property as a Solidity method: “crytic_test_balance.”

import "token.sol";
contract TestToken is Token {
    constructor() public{
        balances[msg.sender] = 10000;
    }
    // the property
    function crytic_test_balance() view public returns(bool){
        return balances[msg.sender] <= 10000;
    }   
}

The emulated world

ManticoreEVM compiles and then creates the contract in a fully emulated symbolic blockchain.

Different normal accounts are also created there to replicate real-world situations. A deployer account is used to deploy the contract, others are used to explore the contract and try to break the properties, and, finally, a potentially different account is used to test the properties.

ManticoreEVM detects the property type methods present in high-level source code and checks them after every combination of symbolic transactions. A normal property is considered failed if the method returns false.
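
Under the hood, all of this runs on Manticore's Python API. The sketch below is a rough approximation of the kind of setup manticore-verifier automates for you; the method names come from the ManticoreEVM API of the Manticore 0.3.x era, so treat it as illustrative rather than the verifier's actual code:

from manticore.ethereum import ManticoreEVM

m = ManticoreEVM()

# Accounts in the emulated world: a deployer and a contract tester.
deployer = m.create_account(balance=10**18)
attacker = m.create_account(balance=10**18)

# Compile and deploy the property-test contract on the symbolic blockchain.
with open("dapp.sol") as f:
    source = f.read()
contract = m.solidity_create_contract(source, owner=deployer, contract_name="TestToken")

# One fully symbolic transaction from the tester account; manticore-verifier
# repeats this in a loop and then evaluates each crytic_ property in every state.
symbolic_data = m.make_symbolic_buffer(320)
m.transaction(caller=attacker, address=contract, data=symbolic_data, value=0)

m.finalize()  # dump a test case for every explored state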

The loop (exploration)

The deployer account initially creates the target contract via a CREATE transaction. Then manticore-verifier simulates all possible interleaving transactions originating from the contract testers until (for example) no more coverage is found. After each symbolic transaction, the properties are checked in the name of the property-checker account, and if anything looks broken, a report of the reproducible exploit trace is generated. Normal properties like crytic_test_balance() are expected to return true; any other result is reported as a problem.

manticore-verifier dapp.sol --contract TestToken

It’s a command-line-based tool

Several aspects of the exploration, the stopping condition, and the user accounts employed can be modified by command-line arguments. Try $manticore-verifier --help for a thorough list. Here’s an excerpt of it in action:

$manticore-verifier dapp.sol  --contract TestToken

# Owner account: 0x28e9eb58c2f5be87161a261f412a115eb85946d9
# Contract account: 0x9384027ebe35100de8ef216cb401573502017f7
# Sender_0 account: 0xad5e556d9699e9e35b3190d76f75c9bf9997533b
# PSender account: 0xad5e556d9699e9e35b3190d76f75c9bf9997533b
# Found 1 properties: crytic_test_balance
# Exploration will stop when some of the following happens:
# * 3 human transaction sent
# * Code coverage is greater than 100% measured on target contract
# * No more coverage was gained in the last transaction
# * At least 1 different properties where found to be breakable. (1 for fail fast)
# * 240 seconds pass
# Starting exploration...
Transaction 0. States: 1, RT Coverage: 0.0%, Failing properties: 0/1
Transaction 1. States: 2, RT Coverage: 60.66%, Failing properties: 0/1
Found 1/1 failing properties. Stopping exploration.
60.66% EVM code covered 
+---------------------+------------+
|    Property Named   |   Status   |
+---------------------+------------+
| crytic_test_balance | failed (0) |
+---------------------+------------+
Checkout testcases here:./mcore_kkgtybqb

Note that each failing property will have a test case number associated with it. More details can be found at the specified test case files: ./mcore_kkgtybqb/user_000000.tx

Bug Found!

In our example, manticore-verifier finds a way to break the specified property. When trying to transfer an incredibly large amount of tokens, the balance arithmetic wraps around its integer limits, making it possible to boost the sender’s balance, i.e., to create tokens out of thin air.

transfer(0,115792089237316195422001709574841237640532965826898585773776019699400460720238) -> STOP (*)
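
You can sanity-check that trace with a couple of lines of Python. Under pre-0.8 Solidity semantics, the subtraction in transfer() wraps modulo 2^256, so the sender ends up with far more than 10,000 tokens (the transfer value below is copied from the trace above):

value = 115792089237316195422001709574841237640532965826898585773776019699400460720238
balance_before = 10000

# Solidity (pre-0.8) uint256 arithmetic wraps modulo 2**256
balance_after = (balance_before - value) % 2**256

print(balance_after)           # astronomically larger than 10000
print(balance_after <= 10000)  # False, so crytic_test_balance() would return false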

Conclusion: Interoperability = 101%

manticore-verifier lowers the initial cost to symbolically test arbitrary properties. It also allows our symbolic executor to work more tightly with Solidity, Echidna, and slither-prop.

The same methodology can be used with our Ethereum fuzzer, Echidna. As a result, you can write the properties once and test them with symbolic execution and fuzzing with no extra effort.

manticore-verifier can check automatically generated ERC20 properties. Moreover, slither-prop, our static analyzer, has detailed knowledge of what an ERC20 contract should do, and can automatically generate ERC20 properties for manticore-verifier to check.

So get your contract, add the property methods, and test with manticore-verifier at will. If you have any questions please join the Empire Hacking Slack.

Advocating for change

As a company, we believe Black lives matter. In the face of continued police brutality, racial disparities in law enforcement, and limited accountability, we demand an end to systemic racism, endorse restrictions on police use of force, and seek greater accountability for police actions. We believe police misconduct, militarization of police, and unchecked abuse of power are issues that we as Americans should protest.

Giving time, money, and attention

In this spirit, I have reaffirmed our employees’ right to protest without reprisal or retaliation. While there’s certainly no account of who has and hasn’t, I’m aware that many of our employees have recently marched to end systemic racism and police brutality.

To support understanding and discussion, we created a #solidarity channel on our company Slack. Conversations there grew rapidly as we shared research on social policy and upcoming legislation, including policies that have been analyzed by social scientists studying criminal justice:

  1. A large-scale analysis of racial disparities in police stops across the United States
  2. Collective Bargaining Rights and Police Misconduct: Evidence from Florida
  3. Evidence that curtailing proactive policing can reduce major crime
  4. Good Cop, Bad Cop: Using Civilian Allegations to Predict Police Misconduct
  5. The Wandering Officer

Many of our employees also decided to “protest with our wallets” and use our existing charitable donation matching program to support organizations we believe can effect change. In the last two weeks, employees have donated $12k and the company matched $12k ($24k total) to a number of related non-profits, including:

More we can do now: Calls to action

Advocacy is not new to us—Trail of Bits is among the largest employers of cybersecurity professionals in NYC, and has frequently advocated for policy change as part of Tech:NYC and the Coalition for Responsible Cybersecurity. As an NYC-based company, we urge the NYC Council to take action.

The June 18 legislative session of the NYC Council will be livestreamed, and we’ll be watching. We urge our representatives to:

  • Pass all five bills that were heard in the last meeting of the Public Safety Committee
  • Pass the POST Act and require reporting on NYPD use of surveillance technology
  • Commit to NYC Budget Justice and reallocate funds towards social programs

While policing is largely a state and local matter in the United States, federal action has a strong effect on state and local policies. We call on the US Congress to:

Local and state action may have the most direct impact on policing practices. If you want to lobby your representatives as an individual, use “Who are my representatives?” to find their contact information and give them a call. Personal, authentic contact with local representatives can be very effective at shaping policy decisions.

If you’re an individual motivated to support a charitable organization, consider reviewing the following resources first:

When donating, strongly consider a charitable donation matching program. If your employer does not offer one, suggest that they sign up for the Technology Partner program from RaisedBy.Us. Trail of Bits uses their service to facilitate donation matching through Brightfunds.

If you are planning to attend a protest, research what local activists in your area are recommending to protect yourself and others. There are widespread disparities in treatment of protesters across the United States: a “March for Families” in NYC may be completely unlike a similarly named event in Oregon. Consider advice from the Legal Aid Society of NYC or Vice (and their digital guide) and put on a mask before attending a protest.

We can always do more

We know our efforts are modest, and that the problems will not be fixed by a few waves of donations and legislation. Our own efforts to advocate for change started small, but they are growing.

We also recognize the diversity deficit in our own company. As part of our effort to close that gap, we are working with diversity and inclusion-focused recruiting groups and conducting implicit bias training. We’ve created the CTF Field Guide to help eliminate the knowledge gap for industry newcomers and we host yearly winternships that provide inroads for people new to computer security. We’re also increasing the matching for our existing charity matching program and making the most of our diversity-focused donation to the Summercon Foundation. Finally, to help ensure this is not a one-off effort, we are listening to our employees and community to hold us accountable.

The protests have been extraordinarily effective in moving legislation forward; so much so, it can be tough to keep up. We realize it’s only a long-overdue beginning, but the more we know about what’s gaining ground, the better we can advocate for it. To help, we’ve assembled a summary of the changes we’ve seen at the NYC, New York State, and federal levels.

Upgradeable contracts made safer with Crytic

Upgradeable contracts are not as safe as you think. Architectures for upgradeability can be flawed, locking contracts, losing data, or sabotaging your ability to recover from an incident. Every contract upgrade must be carefully reviewed to avoid catastrophic mistakes. The most common delegatecall proxy comes with drawbacks that we’ve catalogued before.

Crytic now includes a comprehensive suite of 17 upgradeability checks to help you avoid these pitfalls.

The how-to

Reviewing upgradeable contracts is a complex low-level task that requires investigating the storage layout and organization of functions in memory. We created a sample token that supports upgradeability to help walk through the steps in crytic/upgradeability-demo. This simple demo repository includes:

  • MyToken, our initial implementation of a simple token
  • Proxy, our proxy

Any call to Proxy will use a delegatecall on MyToken to execute its logic, while the storage variables will be held on Proxy. This is a standard setup for most upgradeable contracts.

Consider that these two contracts are already deployed on mainnet. However, the code for MyToken has become stale and you need to change its features. It’s time for MyTokenV2! The code for MyTokenV2 is similar to MyToken, except that the init() function and its associated state variable have been removed.

Let’s use Crytic to ensure that deploying MyTokenV2 does not introduce new security risks.

Configuration

First, tell Crytic about your upgradeable contracts. Go to your Crytic settings and find this panel:

Here you can configure:

  1. The contract being upgraded
  2. The proxy used
  3. The new version of the contract

Note: (2) and (3) are optional; Crytic will run as many checks as are appropriate for what you configure.

For example, if you only have the upgradeable contract, and no proxy or new version, Crytic can already look for flaws in the initialization schema. If you have the upgradeable contract and the proxy, but no new version, Crytic can look for function collisions between the implementation and the proxy. If you have multiple upgradeable contracts, or multiple proxies, you can then configure any combination that fits your setup.

Back to MyToken, we have these three contracts:

Once we configure Crytic, the upgradeability checks will run on every commit and pull request, similar to security checks and unit tests:

Crytic’s Findings

Occasionally, Crytic will find serious errors in your upgradeability code (oh no!). We built one such issue into our demo. Here’s what it looks like when Crytic discovers a security issue:

The was_init storage variable was removed, so balances has a different storage offset in MyToken and MyTokenV2, breaking the storage layout of the contract.

This is a common mistake that can be particularly difficult to find by hand in complex codebases with many contracts and deep inheritance—but Crytic will catch the issue for you!

What else can Crytic find?

Crytic will review (depending on your configuration):

  • Storage layout consistency between the upgrades and the proxy
  • Function collisions between the proxy and the implementation
  • Correct initialization schema
  • Best practices for variable usage

Here’s the detailed list of checks:

Num   What it Detects                            Impact          Proxy needed   New version needed
1     Variables that should not be constant      High                           X
2     Function ID collision                      High            X
3     Function shadowing                         High            X
4     Missing call to init function              High
5     initializer() is not called                High
6     Init function called multiple times        High
7     Incorrect vars order in v2                 High                           X
8     Incorrect vars order in the proxy          High            X
9     State variables with an initial value      High
10    Variables that should be constant          High                           X
11    Extra vars in the proxy                    Medium          X
12    Variable missing in the v2                 Medium                         X
13    Extra vars in the v2                       Informational                  X
14    Initializable is not inherited             Informational
15    Initializable is missing                   Informational
16    Initialize function that must be called    Informational
17    initializer() is missing                   Informational

Check your contracts with Crytic

In addition to finding 90+ vulnerabilities, Crytic can now detect flaws in your upgradeability code. It is the only platform that checks your codebase in depth for so many classes of issues. If you want to avoid catastrophic mistakes, use Crytic before deploying any upgradeable contract.

Got questions? Join our Slack channel (#crytic) or follow @CryticCI on Twitter.

ECDSA: Handle with Care

The elliptic curve digital signature algorithm (ECDSA) is a common digital signature scheme that we see in many of our code reviews. It has some desirable properties, but can also be very fragile. For example, LadderLeak was published just a couple of weeks ago, which demonstrated the feasibility of key recovery with a side channel attack that reveals less than one bit of the secret nonce.

ECDSA is fragile and must be handled with care

This post will walk you through:

  • the various ways in which ECDSA nonce bias can be exploited
  • how simple it is to attack in practice when things go wrong, and
  • how to protect yourself.

You’re probably familiar with attacks against ECDSA. Some attacks are trivial, and some involve advanced Fourier analysis and lattice math. Although these attacks can be complicated, I hope this post will demonstrate that they are easy to implement in practice. In fact, even if you don’t know anything about lattices, after reading this blog post you will be able to leverage a lattice attack to break ECDSA signatures produced with a very slightly faulty RNG using less than 100 lines of python code.

Math disclaimer: to read this post, you will need to be somewhat familiar with mathematical groups, recognizing that they have a binary operation and a group generator. You do not need to be an expert on elliptic curves; you just need to know that elliptic curves can be used to form a mathematical group (and, thus, have a concept of addition and scalar multiplication). Familiarity with other math concepts like lattices is helpful, but not required.

DSA primer

ECDSA is a specific form of the digital signature algorithm (DSA). DSA is a pretty common digital signature scheme, and is defined with three algorithms: key generation, signing, and verification. The key generation algorithm generates a private and public key; the private key is responsible for creating signatures; and the public key is responsible for verifying signatures. The signature algorithm takes as input a message and private key, and produces a signature. The verification algorithm takes as input a message, signature, and public key, and returns true or false, indicating whether the signature is valid.

DSA is defined over any mathematical group, and this scheme is secure as long as the discrete log problem is hard over this group. The group typically used is the integers modulo a prime, p. Along with this group, we will have a group generator, g, and some cryptographically secure hash function, H. We can assume that p, g, and H will all be publicly known.

Key generation works by first randomly selecting a value, x, from the integers mod p. Then the value y = g^x mod p is computed. The private signing key is set to x, and the public key is y. The signing key must be kept secret, as this is what allows signatures to be made.

The signing algorithm produces a signature from a message, m, and the secret key, x. First, a random element of the group, k, is generated. This is known as the nonce, which is important when talking about attacks. Then, the values r = g^k mod p and s = k^-1(H(m) + xr) mod p are computed. Here k^-1 is the group inverse, and H(m) is the result of computing the hash of m and interpreting the result as an integer mod p. The signature is defined to be the pair (r,s). (Note: if either of the r or s values equals 0, the algorithm restarts with a new k value).

The verification algorithm receives as input the signature, (r,s), the message, m, and the public key, y. Let ŝ = s^-1; then the algorithm outputs true if and only if r,s ≠ 0 and r = (g^H(m) y^r)^ŝ. This verification check works because g^H(m) y^r = g^(H(m)+xr) = g^(ks), and so (g^H(m) y^r)^ŝ = g^k = r.

A digital signature scheme is considered secure if it is unforgeable. Unforgeability has a formal cryptographic meaning, but on a high level it means that you cannot produce signatures without knowing the secret key (unless you have copied an already existing signature created from the secret key). DSA is proven to be unforgeable under the discrete log assumption.

ECDSA

DSA is defined over a mathematical group. When DSA is used with the elliptic curve group as this mathematical group, we call this ECDSA. The elliptic curve group consists of elliptic curve points, which are pairs (x,y) that satisfy the equation y^2 = x^3 + ax + b, for some a,b. For this blog post, all you need to know is that, using elliptic curves, you can define a finite group, which means you obtain a group generator, g (an elliptic curve point), and addition and scalar multiplication operations just like you can with integers. Since they form a finite group, the generator, g, will have a finite order, p. This blog post will not explain or require you to know how these elliptic curve operations work, but if you’re curious, I encourage you to read more about them here.

ECDSA works the same way as DSA, except with a different group. The secret key, x, will still be a random value from the integers mod p. Now, the public key, y, is still computed as y = g^x, except now g is an elliptic curve point. This means that y will also be an elliptic curve point (before, y was an integer mod p). Another difference occurs in how we compute the value r. We still generate a random nonce, k, as an integer mod p, just as before. We will compute g^k, but again, g is an elliptic curve point, and so g^k is as well. Therefore, we can compute (x_k, y_k) = g^k, and we set r = x_k. Now, the s value can be computed as before, and we obtain our signature (r,s), which will still be integers mod p as before. To verify, we need to adjust for the fact that we’ve computed r slightly differently. So, as before, we compute the value (g^H(m) y^r)^ŝ, but now this value is an elliptic curve point, so we take the x-coordinate of this point and compare it against our r value.
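
To make the notation concrete, here is a small sketch of ECDSA key and r-value generation using the python ecdsa library that the attack code later in this post relies on (a sketch only; the library provides proper signing functions, and the method names assume a reasonably recent version):

import random
import ecdsa

gen = ecdsa.NIST256p.generator   # the elliptic curve group generator, g
p = gen.order()                  # the (prime) order of g

x = random.randrange(1, p)       # secret key
y = gen * x                      # public key: an elliptic curve point

k = random.randrange(1, p)       # nonce
point = gen * k                  # g^k is also a curve point
r = point.x() % p                # r is the x-coordinate of that point
# s = k^-1 (H(m) + x*r) mod p, exactly as in DSA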

Recovering secret keys from reused nonces

Now that we understand what ECDSA is and how it works, let’s demonstrate its fragility. Again, since it’s a digital signature scheme, it is imperative that the secret key is never revealed to anyone other than the message signer. However, if a signer ever releases a signature and also releases the nonce they used, an attacker can immediately recover the secret key. Say I release a signature (r,s) for a message m, and I accidentally reveal that I used the nonce k. Since s = k^-1(H(m) + xr), we can easily compute the secret key:

s = k^-1(H(m) + xr)

ks = H(m) + xr

ks - H(m) = xr

x = r^-1(ks - H(m))

Therefore, not only does a signer need to keep their secret key secret, but they also must keep all of their nonces they ever generate secret.

Even if the signer keeps every nonce secret, if they accidentally repeat a single nonce (even for different messages), the secret key can immediately be recovered as well. Let (r,s1) and (r,s2) be two signatures produced on messages m1 and m2 (respectively) from the same nonce, k—since they have the same nonce, the r values will be the same, so this is very easily detected by an attacker:

s1 = k^-1(H(m1) + xr) and s2 = k^-1(H(m2) + xr)

s1 - s2 = k^-1(H(m1) - H(m2))

k(s1 - s2) = H(m1) - H(m2)

k = (s1 - s2)^-1(H(m1) - H(m2))

Once we have recovered the nonce, k, using the formula above, we can then recover the secret key by performing the previously described attack.
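
To see how little work this is, here is a self-contained sketch of the nonce-reuse attack, written against the same python ecdsa library used for the lattice attack below (it assumes Python 3.8+ for the modular inverse via pow; note that the library's low-level sign() takes an already-hashed message, so H(m) is simply the integer we pass in):

import random
import ecdsa

gen = ecdsa.NIST256p.generator
order = gen.order()

secret = random.randrange(1, order)
pub_key = ecdsa.ecdsa.Public_key(gen, gen * secret)
priv_key = ecdsa.ecdsa.Private_key(pub_key, secret)

k = random.randrange(1, order)    # the nonce, accidentally reused
m1 = random.randrange(1, order)   # "hashes" of two different messages
m2 = random.randrange(1, order)
sig1 = priv_key.sign(m1, k)
sig2 = priv_key.sign(m2, k)       # same r value as sig1

# k = (s1 - s2)^-1 (H(m1) - H(m2))
k_recovered = ((m1 - m2) * pow(sig1.s - sig2.s, -1, order)) % order
# x = r^-1 (k*s1 - H(m1))
x_recovered = (pow(sig1.r, -1, order) * (k_recovered * sig1.s - m1)) % order

assert x_recovered == secret      # full key recovery from just two signatures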

Let’s take a moment to digest this.

If a nonce for a signature is ever revealed, the secret key can immediately be recovered, which breaks our entire signature scheme. Further, if two nonces are ever repeated, regardless of what the messages are, an attacker can easily detect this and immediately recover the secret key, again breaking our entire scheme. That is pretty fragile, and these are just the easy attacks!

Attacking ECDSA from leaked and biased nonces

It turns out that even leaking small parts of the nonce can also be very damaging to the signature scheme. In 1999, work by Howgrave-Graham and Smart demonstrated the feasibility of using lattice attacks to break DSA from partial nonce leakage. Later, Nguyen and Shparlinski improved on their work, and were able to recover secret keys on 160-bit DSA (here 160-bit refers to p), and later ECDSA, by knowing only three bits of each nonce from 100 signatures.

Later, Mulder et al were able to perform more attacks on partial nonce leakage. They used a different, Fourier transform-based attack derived from work by Bleichenbacher. Using these techniques, and knowing only five bits of each nonce from 4,000 signatures, they were able to recover secret keys from 384-bit ECDSA, and leveraged their techniques to break 384-bit ECDSA running on a smart card.

You may have heard of the Minerva attack: Several timing side channels were leveraged to recover partial nonce leakage, and these lattice attacks were performed on a wide variety of targets. With enough signatures, they were able to successfully attack targets even when only the size of the nonce was leaked!

Even worse, a few weeks back, the LadderLeak attack further improved on Fourier analysis attacks, and now ECDSA secret keys can be recovered if only 1 bit of the nonce is leaked! In fact, the single bit can be leaked with probability less than 1, so attackers technically need less than 1 bit. This was leveraged to attack a very small leakage in Montgomery ladders in several OpenSSL versions.

Again, let’s digest this. Even when only a few bits of the nonce are leaked—or further, even if only the size of the nonce is leaked—or further, if one bit of nonce is leaked—then, most of the time, the entire signature scheme can be broken by observing enough signatures. This is incredibly fragile!

On top of this, even if you manage to keep all of your nonces secret and never repeat a nonce, and you never leak any bits of your nonce to an attacker, you still aren’t fully protected! Work by Breitner and Heninger showed that a slightly faulty random number generator (RNG) can also catastrophically break your scheme by leveraging lattice attacks. Specifically, when using 256-bit ECDSA, if your RNG introduces a bias of just 4 bits in your nonce, your signature scheme can be broken completely by a lattice attack, even if we don’t know what those biased values are.

These attacks involve some complicated math. Like most cryptographic attacks, they formulate a series of ECDSA signatures as another hard math problem. In this case, the problem is known as the Hidden Number Problem. The Hidden Number Problem has been fairly widely studied by other researchers, so there are a lot of techniques and algorithms for solving it. This means that once we figure out how to mold a series of ECDSA signatures into an instance of the Hidden Number Problem, we can then apply existing techniques to find an ECDSA secret key.

Breaking ECDSA from bad nonces

Now, Fourier analysis, Hidden Number Problems, and lattice attacks are more complicated than your everyday cryptography, and they seem daunting. However, the fact that these attacks involve complicated math may fool some people into thinking they’re very difficult to implement in practice. This is not the case. As I mentioned in the beginning, I will teach you how to implement these attacks using fewer than 100 lines of Python code. Moreover, to perform this attack, you actually don’t need to know anything about the Hidden Number Problem or lattices. The only lattice component we need is access to the LLL algorithm. However, we can treat this algorithm as a black box; we don’t need to understand how it works or what it is doing.

We’ll be attacking signatures produced from bad nonces (i.e., bad RNG). Specifically, these nonces will have a fixed prefix, meaning their most significant bits are always the same. (The attack still works even if the fixed bits aren’t the most significant bits, but this is the easiest to follow). When using LLL, all we have to know is that we will input a matrix of values, and the algorithm will output a matrix of new values. If we use a series of ECDSA signatures to construct a matrix in a particular way, LLL will output a matrix that will allow us to recover the ECDSA private key. More specifically, because of the way we construct this matrix, one of the rows of the output of LLL will contain all of the signatures’ nonces. (It requires more complicated math to understand why, so we won’t discuss it here, but if you’re curious, see section 4 of this paper). Once we recover the nonces, we can use the basic attack described above to recover the secret key.

To perform the attack we’ll need access to an ECDSA and an LLL library in python. I chose this ECDSA library, which allows us to input our own nonces (so we can input nonces from bad RNGs to test our attack), and this LLL library. We’ll perform this attack on the NIST P-256 elliptic curve, beginning with the easiest form of the attack: We are given two signatures generated from only 128-bit nonces. First, we generate our signatures.

import ecdsa
import random

gen = ecdsa.NIST256p.generator
order = gen.order()
secret = random.randrange(1,order)

pub_key = ecdsa.ecdsa.Public_key(gen, gen * secret)
priv_key = ecdsa.ecdsa.Private_key(pub_key, secret)

nonce1 = random.randrange(1, 2**127)
nonce2 = random.randrange(1, 2**127)

msg1 = random.randrange(1, order)
msg2 = random.randrange(1, order)

sig1 = priv_key.sign(msg1, nonce1)
sig2 = priv_key.sign(msg2, nonce2)

Now that we have our signatures, we need to craft the matrix we’ll input into the LLL algorithm:

Matrix that we will input into the LLL algorithm

Here N is the order of NIST P-256 (order in the code snippet above); B is the upper bound on the size of our nonces (2^128 in this example, because both nonces are only 128 bits in size); m1 and m2 are the two random messages; and (r1,s1) and (r2,s2) are the two signature pairs. In our python code, the matrix will look like this (here modular_inv is a function for computing the inverse mod N):

r1 = sig1.r
s1_inv = modular_inv(sig1.s, order)
r2 = sig2.r
s2_inv = modular_inv(sig2.s, order)

matrix = [[order, 0, 0, 0], [0, order, 0, 0],
[r1*s1_inv, r2*s2_inv, (2**128) / order, 0],
[msg1*s1_inv, msg2*s2_inv, 0, 2**128]]
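
One small detail: modular_inv isn't defined in these snippets. A minimal definition, assuming Python 3.8 or later (where pow accepts a -1 exponent together with a modulus), could be:

def modular_inv(a, n):
    # modular inverse of a modulo n (raises ValueError if none exists)
    return pow(a, -1, n)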

Now we’ll input this matrix into the black-box LLL algorithm, which will return a new matrix to us. For reasons that don’t matter here, one of the rows of this returned matrix will contain the nonces used to generate the two signatures. If we knew more about what the algorithm is actually doing, we could probably predict where the nonce is going to be. But since we don’t care about the details, we are just going to check every row in the returned matrix to see if we can find the secret key. Remember, we already showed how to recover the private key once we have the nonce, k. Specifically, we compute r^-1(ks - H(m)). An attacker in the real world would have access to the public key corresponding to these signatures. Therefore, to determine if we have found the correct private key, we will compute its corresponding public key and compare it against the known public key. The attack will look like this:

import olll

new_matrix = olll.reduction(matrix, 0.75)
r1_inv = modular_inv(sig1.r, order)
s1 = sig1.s

for row in new_matrix:
    potential_nonce_1 = row[0]
    potential_priv_key = r1_inv * ((potential_nonce_1 * s1) - msg1)

    # check if we found private key by comparing its public key with actual public key
    if ecdsa.ecdsa.Public_key(gen, gen * potential_priv_key) == pub_key:
        print("found private key!")

I should mention that there is a noticeable failure rate for this basic attack. If you run the code presented to you, you will notice this as well. But again, for the purposes of this post, don’t worry about these specifics. Also, this failure rate should decrease if you perform this same attack with more signatures.

Hopefully at this point I’ve shown why these attacks aren’t so complicated. We were able to recover the secret key from just two signatures, and we didn’t do anything overly complicated. That said, some of you would probably argue that being able to attack signatures with only 128-bit nonces isn’t that interesting. So let’s move on to more realistic attacks.

Exploiting real-world ECDSA bugs

You may have heard of a recent bug in the randomness generated in Yubikeys. Essentially, bad randomness caused as many as 80 bits of the nonce to be fixed to the same value. Attacking this real-world bug will not be much more difficult than the attack we just performed above, except we don’t know what the fixed 80-bit values are (in the previous example, we knew the fixed 128 bits were all set to 0). To overcome this, we need to add a trick to our attack.

Imagine we receive a collection of signatures whose nonces have 80 fixed bits. For ease of explanation, we will assume these 80 bits are the most significant bits (the attack is still feasible if this is not the case; you simply shift the fixed bits to the most significant bits by multiplying each signature by a power of 2). Even though we don’t know what these 80 bits are, we know that if we subtract any two nonces, the 80 most significant bits of their difference will all be zeros. Therefore, we are going to perform the same attack as above, except with our signature values subtracted. Specifically, given a set of n signatures and messages, we will build the following matrix:

Matrix that we will input into the LLL algorithm when the nonce bias is unknown

This time, we will again input this matrix into LLL and receive a new matrix back. However, since we subtracted the nth value from every entry in this matrix, instead of receiving a row full of nonces, we will actually receive a row with the difference between each nonce and the nth nonce. In other words, the matrix returned from LLL will give us the value k1 – kn, the difference between the nonces for signatures 1 and n. It takes some algebraic manipulation, but we can still recover the secret key from this value using the following formula:

s1 = k1^-1(m1 + xr1) and sn = kn^-1(mn + xrn)

s1k1 = m1 + xr1 and snkn = mn + xrn

k1 = s1^-1(m1 + xr1) and kn = sn^-1(mn + xrn)

k1 - kn = s1^-1(m1 + xr1) - sn^-1(mn + xrn)

s1sn(k1 - kn) = sn(m1 + xr1) - s1(mn + xrn)

s1sn(k1 - kn) = xsnr1 - xs1rn + snm1 - s1mn

x(s1rn - snr1) = snm1 - s1mn - s1sn(k1 - kn)

Secret key = x = (rns1 - r1sn)^-1(snm1 - s1mn - s1sn(k1 - kn))

With all of that context, let’s exploit the Yubikey bug. If signatures are produced from nonces with 80 fixed bits, we only need five signatures to recover the secret key. We will build the matrix above with n = 6 to reduce the error rate:

# generate 80 most significant bits, nonce must be less than order
yubikey_fixed_prefix = random.randrange(2**176, order)

msgs = [random.randrange(1, order) for i in range(6)]
nonces = [random.randrange(1, 2**176) + yubikey_fixed_prefix for i in range(6)]
sigs = [priv_key.sign(msgs[i],nonces[i]) for i in range(6)]

matrix = [[order, 0, 0, 0, 0, 0, 0],
[0, order, 0, 0, 0, 0, 0],
[0, 0, order, 0, 0, 0, 0],
[0, 0, 0, order, 0, 0, 0],
[0, 0, 0, 0, order, 0, 0]]

row, row2 = [], []
[msgn, rn, sn] = [msgs[-1], sigs[-1].r, sigs[-1].s]
rnsn_inv = rn * modular_inv(sn, order)
mnsn_inv = msgn * modular_inv(sn, order)

# 2nd to last row: [r1(s1^-1) - rn(sn^-1), ... , rn-1(sn-1^-1) - rn(sn^-1), 2^176/order, 0 ]
# last row: [m1(s1^-1) - mn(sn^-1), ... , mn-1(sn-1^-1) - mn(sn^-1), 0, 2^176]
for i in range(5):
    row.append((sigs[i].r * modular_inv(sigs[i].s, order)) - rnsn_inv)
    row2.append((msgs[i] * modular_inv(sigs[i].s, order)) - mnsn_inv)

# add last elements of last two rows, B = 2**(256-80) for yubikey
row.append((2**176) / order)
row.append(0)
row2.append(0)
row2.append(2**176)

matrix.append(row)
matrix.append(row2)

new_matrix = olll.reduction(matrix, 0.75)

for row in new_matrix:
    potential_nonce_diff = row[0]

    # Secret key = (rns1 - r1sn)-1 (snm1 - s1mn - s1sn(k1 - kn))
    potential_priv_key = (sn * msgs[0]) - (sigs[0].s * msgn) - (sigs[0].s * sn * potential_nonce_diff)
    potential_priv_key *= modular_inv((rn * sigs[0].s) - (sigs[0].r * sn), order)

    # check if we found private key by comparing its public key with actual public key
    if ecdsa.ecdsa.Public_key(gen, gen * potential_priv_key) == pub_key:
        print("found private key!")

That’s it! We just exploited a real-world bug in about 50 lines of python.

Some might further argue that although this was an actual bug, systems producing 80 fixed bits are rare. However, this attack can be much more powerful than shown in this one example! For 256-bit elliptic curves, this attack will work even if only 4 bits of the nonce are fixed. Moreover, the attack does not become more complicated to implement. You simply need to increase the dimension of your lattice—i.e., in the matrix figure above, just increase the value of n and repeat the attack—nothing else! This will increase the running time of your attack, but not the complexity to implement. You could copy that code snippet and recover ECDSA secret keys generated from nonces with as little as 4 bits of bias. On top of that, the attack against nonce leakage is a similar level of difficulty.

Hopefully, I’ve now convinced you of the fragility of ECDSA and how easily it can be broken in practice when things go wrong.

By the way, some of you may be wondering how we determine the value n. Remember, n is the number of signatures we need to recover the secret key. When the nonce had the first 128 bits fixed to 0, this value was 2 (this value is 3 when 128 bits are fixed, but we don’t know to what value they are fixed). When the nonce had 80 randomly fixed bits, this value was 5. If you consult the relevant publications around these attacks, you can find the exact formula and derivation of this value for a given number of fixed bits. For simplicity, I derived these values empirically by attempting this attack with different numbers of signatures on different amounts of fixed bits. I’ve compiled the results into the figure below:

The number of signatures required to use this attack for a given number of fixed nonce bits (derived empirically)

Protecting your ECDSA signatures

If ECDSA is so fragile, how can users protect themselves? Ideally, we recommend that you use EdDSA instead of ECDSA, which handles nonce generation much more safely by eliminating the use of RNGs. Further, Ed25519, which is EdDSA over Curve25519, is designed to overcome the side-channel attacks that have targeted ECDSA, and it is currently being standardized by NIST.

If you’re required to use ECDSA, proceed with caution and handle with care! ECDSA is fragile, but it is not broken. As we saw, it is imperative that nonces used for ECDSA signatures are never repeated, never revealed (even partially), and generated safely.

To protect yourself from nonce leakage, the mitigation strategy is to write the implementation to operate in “constant time.” However, guaranteeing this can be very difficult, as we saw with OpenSSL. For instance, code can appear to be constant time, but then an optimizing compiler can introduce non-constant time behavior. Further, some assembly instructions are constant time in some architectures or processor models, but not in others. (Read more about this here).

Another technique for mitigating nonce leakage is known as blinding, where random numbers are included in your arithmetic to randomize timing information. However, evaluating the security of your blinding implementation can be tricky, and slightly weak blinding schemes can be problematic.

With both of these mitigations, keep in mind that the amount of nonce leakage is on the order of a single bit, so even the slightest changes by an optimizing compiler or the slightest leakage from your blinding technique can be catastrophic to your signature scheme.

To ensure that nonces are generated safely, most people recommend using RFC 6979, which specifies a way to securely generate nonces deterministically (i.e., without an RNG), using the message and secret key as entropy. This protocol to generate nonces eliminates the problem of bad RNGs, which can be problematic for devices such as Yubikeys where generating randomness securely is difficult. The signature scheme EdDSA actually uses a similar nonce generation method by default to avoid bad RNGs.
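
For example, the python ecdsa library used throughout this post exposes deterministic signing in its higher-level API. A short sketch (the method names assume a reasonably recent version of the library; check its documentation for your version):

import hashlib
from ecdsa import SigningKey, NIST256p

sk = SigningKey.generate(curve=NIST256p)
vk = sk.get_verifying_key()

# The nonce is derived from the private key and the message hash per RFC 6979,
# so no RNG is consulted at signing time.
sig = sk.sign_deterministic(b"some message", hashfunc=hashlib.sha256)
assert vk.verify(sig, b"some message", hashfunc=hashlib.sha256)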

If you are using ECDSA in your system, I encourage you to consider all of those recommendations. Hopefully, with enough care, your signature scheme won’t end up like this:

This is what happens to ECDSA when you don’t generate your nonces safely

We’re always experimenting and developing tools to help you work faster and smarter. Need help with your next project? Contact us!

How to check if a mutex is locked in Go

TL;DR: Can we check if a mutex is locked in Go? Yes, but not with a mutex API. Here’s a solution for use in debug builds.

Although you can Lock() or Unlock() a mutex, you can’t check whether it’s locked. While it is a reasonable omission (e.g., due to possible race conditions; see also Why can’t I check whether a mutex is locked?), having such functionality can still be useful for testing whether the software does what it is supposed to do.

In other words, it would be nice to have an AssertMutexLocked function solely for debug builds, which could be used like this:

// this method should always be called with o.lock locked
func (o *Object) someMethodImpl() {
    AssertMutexLocked(&o.lock)
    // (...)
}

Having such a function would allow us to confirm the assumption that a given mutex is locked and find potential bugs when it’s added into an existing codebase. In fact, there was a GitHub issue about adding this exact functionality in the official Go repository (golang/go#1366), but it was closed with a WontFix status.

I also learned via the great grep.app project that many projects have similar preconditions about mutexes, such as google/gvisor, ghettovoice/gossip, vitessio/vitess, and others.

Now let’s implement the MutexLocked (and other) functions.

Checking if a mutex is locked

To check whether a mutex is locked, we have to read its state. The sync.Mutex structure contains two fields:

type Mutex struct {
	state int32
	sema  uint32
}

The state field’s bits correspond to the following flags (source):

const (
	mutexLocked = 1 << iota // mutex is locked
	mutexWoken
	mutexStarving
	mutexWaiterShift = iota
	// (...)
)

So if a mutex is locked, its state field has the mutexLocked (1) bit set. However, we can’t just access the state field directly from a Go program, because this field is not exported (its name does not start with a capital letter). Luckily, the field can still be accessed with Go reflection, which I used in the code below when I implemented the functions that allow us to check if a given sync.Mutex or sync.RWMutex is locked.

package main

import (
	"fmt"
	"reflect"
	"sync"
)

const mutexLocked = 1

func MutexLocked(m *sync.Mutex) bool {
	state := reflect.ValueOf(m).Elem().FieldByName("state")
	return state.Int()&mutexLocked == mutexLocked
}

func RWMutexWriteLocked(rw *sync.RWMutex) bool {
	// RWMutex has a "w" sync.Mutex field for write lock
	state := reflect.ValueOf(rw).Elem().FieldByName("w").FieldByName("state")
	return state.Int()&mutexLocked == mutexLocked
}

func RWMutexReadLocked(rw *sync.RWMutex) bool {
	return reflect.ValueOf(rw).Elem().FieldByName("readerCount").Int() > 0
}

func main() {
	m := sync.Mutex{}
	fmt.Println("m locked =", MutexLocked(&m))
	m.Lock()
	fmt.Println("m locked =", MutexLocked(&m))
	m.Unlock()
	fmt.Println("m locked =", MutexLocked(&m))

	rw := sync.RWMutex{}
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.Lock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.Unlock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RLock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RLock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RUnlock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RUnlock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
}

We can see this program’s output below:

m locked = false
m locked = true
m locked = false
rw write locked = false  read locked = false
rw write locked = true  read locked = false
rw write locked = false  read locked = false
rw write locked = false  read locked = true
rw write locked = false  read locked = true
rw write locked = false  read locked = true
rw write locked = false  read locked = false

And this can later be used to create AssertMutexLocked and other functions. To that end, I’ve created a small library with these functions at trailofbits/go-mutexasserts—which enables the assertion checks only in builds with a debug tag.

Note: Although there are other tools for detecting race conditions in Go, such as Go’s race detector or OnEdge from Trail of Bits, these tools will detect problematic situations only once they occur, and won’t allow you to assert whether the mutex precondition holds.

We’re always developing tools to help you work faster and smarter. Need help with your next project? Contact us!

Breaking the Solidity Compiler with a Fuzzer

Over the last few months, we’ve been fuzzing solc, the standard Solidity smart contract compiler, and we’ve racked up almost 20 (now mostly fixed) new bugs. A few of these are duplicates of existing bugs with slightly different symptoms or triggers, but the vast majority are previously unreported bugs in the compiler.

This has been a very successful fuzzing campaign and, to our knowledge, one of the most successful ever launched against solc. This isn’t the first time solc has been fuzzed with AFL; fuzzing solc via AFL is a long-standing practice. The compiler has even been tested on OSSFuzz since January of 2019. How did we manage to find so many previously undiscovered bugs–and bugs worth fixing fairly quickly, in most cases? Here are five important elements of our campaign.

1. Have a secret sauce

Fortunately, it’s not necessary that the novelty actually be kept secret, just that it be genuinely new and somewhat tasty! Essentially, we used AFL in this fuzzing campaign, but not just any off-the-shelf AFL. Instead, we used a new variant of AFL expressly designed to help developers fuzz language tools for C-like languages without a lot of extra effort.

The changes from standard AFL aren’t particularly large; this fuzzer just adds a number of new AFL havoc mutations that look like those used by a naive, text-based source code mutation testing tool (i.e., universalmutator). The new approach requires less than 500 lines of code to implement, most of it very simple and repetitive.
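
To give a flavor of what such mutations do, here is a hypothetical, stripped-down Python version of a few universalmutator-style text rewrites (the real operators live inside AFL's havoc stage and are written in C; this sketch is purely illustrative):

import random
import re

# A few regex-based rewrites in the spirit of universalmutator: flip comparison
# and arithmetic operators, and bump integer constants.
MUTATIONS = [
    (r"<=", ">"),
    (r"==", "!="),
    (r"\+", "-"),
    (r"\b(\d+)\b", lambda m: str(int(m.group(1)) + 1)),
]

def mutate(source):
    pattern, replacement = random.choice(MUTATIONS)
    matches = list(re.finditer(pattern, source))
    if not matches:
        return source  # nothing to mutate with this operator
    m = random.choice(matches)
    new_text = replacement(m) if callable(replacement) else replacement
    return source[:m.start()] + new_text + source[m.end():]

print(mutate("contract C { function f(uint x) public { require(x <= 10 + 2); } }"))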

This variation of AFL is part of a joint research project with Rijnard van Tonder at Sourcegraph, Claire Le Goues at CMU, and John Regehr at the University of Utah. In our preliminary experiments comparing the method to plain old AFL, the results look good for solc and the Tiny C Compiler, tcc. As science, the approach needs further development and validation; we’re working on that. In practice, however, this new approach has almost certainly helped us find many new bugs in solc.

We found a few of the early bugs reported using plain old AFL in experimental comparisons, and some of the bugs we found easily with our new approach we also eventually duplicated using AFL without the new approach—but the majority of the bugs have not been replicated in “normal” AFL. The graph below shows the number of issues we submitted on GitHub, and underscores the significance of the AFL changes:

The big jump in bug discovery in late February came immediately after we added a few smarter mutation operations to our version of AFL. It could be coincidence, but we doubt it; we manually inspected the files generated and saw a qualitative change in the AFL fuzzing queue contents. Additionally, the proportion of files AFL generated that were actually compilable Solidity jumped by more than 10%.

2. Build on the work of others

Fuzzing a system that has never been fuzzed can certainly be effective; the system’s “resistance” to the kinds of inputs fuzzers generate is likely to be extremely low. However, there can also be advantages to fuzzing a system that has been fuzzed before. As we noted, we aren’t the first to fuzz solc with AFL. Nor were previous efforts totally freelance ad-hoc work; the compiler team was involved in fuzzing solc, and had built tools we could use to make our job easier.

The Solidity build includes an executable called solfuzzer that takes a Solidity source file as input and compiles it using a wide variety of options (with and without optimization, etc.) looking for various invariant violations and kinds of crashes. Several of the bugs we found don’t exhibit with the normal solc executable unless you use specific command-line options (especially optimization) or run solc in certain other, rather unusual, ways; solfuzzer found all of these. We also learned from the experience of others that a good starting corpus for AFL fuzzing is in the test/libsolidity/syntaxTests directory tree. This was what other people were using, and it definitely covers a lot of the “what you might see in a Solidity source file” ground.

Of course, even with such existing work, you need to know what you’re doing, or at least how to look it up on Google. Nothing out there will tell you that simply compiling solc with AFL won’t actually produce good fuzzing. First, you need to notice that the fuzzing results in a very high map density, which measures the degree to which you’ve “filled” AFL’s coverage hash. Then you either need to know the advice given in the AFL User Guide, or search for the term “afl map density” and see that you need to recompile the whole system with AFL_INST_RATIO set to 10 to make it easier for the fuzzer to identify new paths. This only happens, according to the AFL docs, when “you’re fuzzing extremely hairy software.” So if you’re used to fuzzing compilers, you probably have seen this before, but otherwise you probably haven’t run into map density problems.

3. Play with the corpus

You may notice that the last spike in submitted bugs comes long after the last commit made to our AFL-compiler-fuzzer repository. Did we make local changes that aren’t yet visible? No, we just changed the corpus we used for fuzzing. In particular, we looked beyond the syntax tests, and added all the Solidity source files we could find under test/libsolidity. The most important thing this accomplished was allowing us to find SMT checker bugs, because it brought in files that used the SMTChecker pragma. Without a corpus example using that pragma, AFL has essentially no chance of exploring SMT Checker behaviors.

The other late-bloom bugs we found (when it seemed impossible to find any new bugs) mostly came from building a “master” corpus including every interesting path produced by every fuzzer run we’d performed up to that point, and then letting the fuzzer explore it for over a month.

4. Be patient

Yes, we said over a month (on two cores). We ran over a billion compilations in order to hit some of the more obscure bugs we found. These bugs were very deep in the derivation tree from the original corpus. Bugs we found in the Vyper compiler similarly required some very long runs to discover. Of course, if your fuzzing effort involves more than just playing around with a new technique, you may want to throw machines (and thus money) at the problem. But according to an important new paper, you may need to throw exponentially more machines at the problem if that’s your only approach.

Moreover, for feedback-based fuzzers, just using more machines may not produce some of the obscure bugs that require a long time to find; there’s not always a shortcut to a bug that requires a mutation of a mutation of a mutation of a mutation…of an original corpus path. Firing off a million “clusterfuzz” instances will produce lots of breadth, but it doesn’t necessarily achieve depth, even if those instances periodically share their novel paths with each other.

5. Do the obvious, necessary things

There’s nothing secret about reducing your bug-triggering source files before submitting them, or trying to follow the actual issue submission guidelines of the project you’re reporting bugs to. And, of course, even if it’s not mentioned in those guidelines, performing a quick search to avoid submitting duplicates is standard. We did those things. They didn’t add much to our bug count, but they certainly sped up the process of recognizing the issues submitted as real bugs and fixing them.

Interestingly, not much reduction was usually required. For the most part, just removing 5-10 lines of code (less than half the file) produced a “good-enough” input. This is partly due to the corpus, and (we think) partly due to our custom mutations tending to keep inputs small, even beyond AFL’s built-in heuristics along those lines.

What did we find?

Some bugs were very simple problems. For instance, this contract used to cause the compiler to bomb out with the message “Unknown exception during compilation: std::bad_cast”:

contract C {
    function f() public returns (uint, uint) {
        try this() {
        } catch Error(string memory) {
        }
    }
}

The issue was easily fixed by changing a typeError into a fatalTypeError, which prevents the compiler from continuing in a bad state. The commit fixing that was only one line of code (though quite a few lines of new tests).

On the other hand, this issue, which prompted a bug bounty award and made it into a list of important bug fixes for the 0.6.8 compiler release, could produce incorrect code for some string literals. It also required substantially more code to handle the needed quoting.

Even the un-reduced versions of our bug-triggering Solidity files look like Solidity source code. This is probably because our mutations, which are heavily favored by AFL, tend to “preserve source-code-y-ness.” Much of what seems to be happening is a mix of small changes that don’t make files too nonsensical plus combination (AFL splicing) of corpus examples that haven’t drifted too far from normal Solidity code. AFL on its own tends to reduce source code to uncompilable garbage that, even if merged with interesting code, won’t make it past initial compiler stages. But with more focused mutations, splicing can often get the job done, as in this input that triggers a bug that’s still open (as we write):

contract C {
    function o (int256) public returns (int256) {
    assembly {
        c:=shl(1,a)
    }
    }

    int constant c=2 szabo+1 seconds+3 finney*3 hours;
}

The triggering input combines assembly and a constant, but there are no files in the corpus we used that contain both and look much like this. The closest is:

contract C {
  bool constant c = this;
  function f() public {
    assembly {
        let t := c
    }
  }
}

Meanwhile, the closest file containing both assembly and a shl is:

contract C {
    function f(uint x) public returns (uint y) {
        assembly { y := shl(2, x) }
    }
}

Combining contracts like this is not trivial; no instance much like the particular shl expression in the bug-exposing contract even appears anywhere in the corpus. Trying to modify a constant in assembly isn’t too likely to show up in legitimate code. And we imagine manually producing such strange but important inputs is extremely non-trivial. In this case, as happens so often with fuzzing, if you can think of a contract at all like the one triggering the bug, you or someone else probably could have written the right code in the first place.

Conclusion

It’s harder to find important bugs in already-fuzzed high-visibility software than in never-fuzzed software. However, with some novelty in your approach, smart bootstrapping based on previous fuzzing campaigns (especially for oracles, infrastructure, and corpus content), plus experience and expertise, it is possible to find many never-discovered bugs in complex software systems, even if they are hosted on OSSFuzz. In the end, even our most aggressive fuzzing only scratches the surface of truly complex software like a modern production compiler—so cunning, in addition to brute force, is required.

We’re always developing tools to help you work faster and smarter. Need help with your next project? Contact us!

Detecting Bad OpenSSL Usage

by William Wang, UCLA

OpenSSL is one of the most popular cryptographic libraries out there; even if you aren’t using C/C++, chances are your programming language’s biggest libraries use OpenSSL bindings as well. It’s also notoriously easy to mess up due to the design of its low-level API. Yet many of these mistakes fall into easily identifiable patterns, which raises the possibility of automated detection.

As part of my internship this past winter and spring, I’ve been prototyping a tool called Anselm, which lets developers describe and search for patterns of bad behavior. Anselm is an LLVM pass, meaning that it operates on an intermediate representation of code between source and compilation. Anselm’s primary advantage over static analysis is that it can operate on any programming language that compiles to LLVM bitcode, or any closed-source machine code that can be converted backwards. Anselm can target any arbitrary sequence of function calls, but its original purpose was to inspect OpenSSL usage so let’s start there.

OpenSSL

The design of OpenSSL makes it difficult for beginners to understand and work with. It has a variety of inconsistent naming conventions across the library and offers several, arguably too many, options and modes for each primitive. For example, due to the evolution of the library, there exist both high-level (EVP) and low-level methods that can be used to accomplish the same task (e.g., DSA signatures or EC signing operations). To make matters worse, the documentation can be inconsistent and difficult to read.

In addition to being difficult to use, the library’s design makes it dangerous to use. The API is inconsistent about returning error codes versus pointers (with and without ownership), and it exhibits other surprising behavior. Without rigorously checking error codes or defending against null pointers, unexpected program behavior and process termination can occur.

So what types of errors can Anselm detect? It depends on what the developer specifies, but that could be anything from mismanaging the OpenSSL error queue to reusing initialization vectors. It’s also important to remember that these are heuristics, and misidentifying both good and bad behavior is always possible. Now, let’s get into how the tool works.

Function Calls

While the primary motivation of this project was to target OpenSSL, the library itself doesn’t actually matter. One can view OpenSSL usage as a sequence of API calls, such as EVP_EncryptUpdate and EVP_EncryptFinal_ex. But we could easily replace those names with anything else, and the idea remains the same. Hence, a bad behavior is any pattern of function calls (not just OpenSSL’s) that we would like to detect.

My main approach was to search through all possible paths of execution in a function, looking for bad sequences of API calls. Throughout this post, I’ll be using OpenSSL’s symmetric encryption functions in my examples. Let’s consider EVP_EncryptUpdate, which encrypts a block of data, and EVP_EncryptFinal_ex, which pads the plaintext before a final encryption. Naturally, they should not be called out of order:

EVP_EncryptFinal_ex(ctx, ciphertext + len, &len);
...
EVP_EncryptUpdate(ctx, ciphertext, &len, plaintext, plaintext_len);

This should also be flagged, since the bad sequence remains a possibility:

EVP_EncryptFinal_ex(ctx, ciphertext + len, &len);
...
if (condition) {
  EVP_EncryptUpdate(ctx, ciphertext, &len, plaintext, plaintext_len);
}

I worked with LLVM BasicBlocks, which represent a list of instructions that always execute together. BasicBlocks can have multiple successors, each reflecting different paths of execution. A function, then, is a directed graph of many BasicBlocks. There is a single root node, and any leaf node represents an end of execution.

Finding all possible executions amounts to performing a depth-first search (DFS) starting from the root node. However, notice that the graph can contain cycles; this is analogous to a loop in code. If we performed a blind DFS, we could get stuck in an infinite loop. On the other hand, ignoring previously visited nodes can lead to missed behavior. I settled this by limiting the length of any path, after which Anselm stops exploring further (this can be customized).
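To make the traversal concrete, here is a minimal sketch (not Anselm’s actual code) of a bounded depth-first walk over LLVM BasicBlocks that records the call instructions along each execution path. The kMaxPathLength constant is a hypothetical stand-in for Anselm’s configurable limit:

// Minimal sketch of a bounded DFS over a function's BasicBlocks,
// collecting the CallInsts seen along each execution path.
// kMaxPathLength is a hypothetical stand-in for Anselm's configurable limit.
#include "llvm/IR/BasicBlock.h"
#include "llvm/IR/CFG.h"
#include "llvm/IR/Function.h"
#include "llvm/IR/Instructions.h"
#include <vector>

using namespace llvm;

static const unsigned kMaxPathLength = 64;

static void walk(BasicBlock *BB, unsigned Depth,
                 std::vector<CallInst *> &Calls,
                 std::vector<std::vector<CallInst *>> &Paths) {
  if (Depth > kMaxPathLength)
    return; // cap path length so cycles (loops in the code) don't recurse forever

  // Record every call in this block, in order.
  size_t Mark = Calls.size();
  for (Instruction &I : *BB)
    if (auto *CI = dyn_cast<CallInst>(&I))
      Calls.push_back(CI);

  if (succ_empty(BB))
    Paths.push_back(Calls); // leaf block: one complete execution path
  else
    for (BasicBlock *Succ : successors(BB))
      walk(Succ, Depth + 1, Calls, Paths);

  Calls.resize(Mark); // backtrack before exploring sibling paths
}

std::vector<std::vector<CallInst *>> collectCallPaths(Function &F) {
  std::vector<std::vector<CallInst *>> Paths;
  std::vector<CallInst *> Calls;
  walk(&F.getEntryBlock(), 0, Calls, Paths);
  return Paths;
}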

One issue remains, which is that performing DFS over an entire codebase can be very time-consuming. Even if our exact pattern is simple, it still needs to be matched against all possible paths generated by the search. To solve this, I first prune the graph of any BasicBlock that does not contain any relevant API calls. This is done by pointing any irrelevant node’s predecessors to each of its successors, removing the middleman.

In practice, this dramatically reduces the complexity of a graph for faster path-finding: entire if statements and while loops can be eliminated without any consequences! It also makes any path limit much more reasonable.

Matching Values

Although solely examining function calls is a good start, we can do better. Consider OpenSSL contexts, which are created by EVP_CIPHER_CTX_new and must be initialized with algorithm, key, etc. before actual use. In the following situation, we want every context to be initialized by EVP_EncryptInit_ex:

EVP_CIPHER_CTX *ctx1 = EVP_CIPHER_CTX_new();
EVP_CIPHER_CTX *ctx2 = EVP_CIPHER_CTX_new();
EVP_EncryptInit_ex(ctx1, EVP_aes_256_cbc(), NULL, key, iv);

EVP_EncryptInit_ex always follows EVP_CIPHER_CTX_new here, yet ctx2 is obviously not initialized properly. A more precise pattern would be, “Every context returned from EVP_CIPHER_CTX_new should later be initialized by EVP_EncryptInit_ex.”

I addressed this by matching arguments and return values — checking whether they pointed to the same LLVM Value object in memory. Contexts are a prime situation to match values, but we can use the same technique to detect repeated IVs:

EVP_EncryptInit_ex(ctx1, EVP_aes_256_cbc(), NULL, key1, iv);
EVP_EncryptInit_ex(ctx2, EVP_aes_256_cbc(), NULL, key2, iv);

Internally, Anselm uses regex capture groups to perform this analysis; every execution path is a string of function calls and Value pointers, while a bad behavior is defined by some regex pattern.
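As an illustration only (the serialization format below is made up, not Anselm’s), a single backreference in a regex over a serialized call path is enough to catch a repeated IV:

// Illustration of matching a repeated-IV pattern with a regex
// backreference over a serialized call path. The "name(ptr,ptr,...);"
// encoding is hypothetical, not Anselm's actual format.
#include <iostream>
#include <regex>
#include <string>

int main() {
  // Two EVP_EncryptInit_ex calls whose last argument (the IV) is the
  // same Value pointer, rendered here as a hex token.
  std::string path =
      "EVP_EncryptInit_ex(0xa1,0xb2,0x0,0xc3,0xd4);"
      "EVP_EncryptInit_ex(0xa2,0xb2,0x0,0xc5,0xd4);";

  // Capture the IV of one call and require the same token to reappear
  // as the IV of a later call.
  std::regex repeated_iv(
      R"(EVP_EncryptInit_ex\([^)]*,(0x[0-9a-f]+)\);.*)"
      R"(EVP_EncryptInit_ex\([^)]*,\1\);)");

  std::cout << (std::regex_search(path, repeated_iv) ? "repeated IV found"
                                                     : "ok")
            << "\n";
  return 0;
}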

Pattern Format

Towards the end of my internship, I also defined a format for developers to specify bad behaviors, which Anselm translates into a (somewhat messy) regex pattern. Every line begins with a function call, followed by its return value and arguments. If you don’t care about a value, use an underscore. Otherwise, define a token which you can use elsewhere. Hence a rule banning repeat IVs would look like this:

EVP_EncryptInit_ex _ _ _ _ _ iv
EVP_EncryptInit_ex _ _ _ _ _ iv

Since the iv token is reused, Anselm constrains its search to only match functions which contain the same Value pointer at that argument position.

I also defined a syntax to perform negative lookaheads, which tells Anselm to look for the absence of specific function calls. For example, if I wanted to prevent any context from being used before initialized, I would prepend an exclamation mark like so:

EVP_CIPHER_CTX_new ctx
! EVP_EncryptInit_ex _ ctx _ _ _ _
EVP_EncryptUpdate _ ctx _ _ _ _

In English, this pattern identifies any pair of EVP_CIPHER_CTX_new and EVP_EncryptUpdate calls on the same context that do not have an EVP_EncryptInit_ex sandwiched in between.

Final Notes

With its current set of tools, Anselm is capable of interpreting a wide range of function call patterns and searching for them in LLVM bitcode. Of course, it’s still a prototype and there are improvements to be made, but the main ideas are there and I’m proud of how the project turned out. Thanks to Trail of Bits for supporting these types of internships — it was a lot of fun!

Verifying Windows binaries, without Windows

TL;DR: We’ve open-sourced a new library, μthenticode, for verifying Authenticode signatures on Windows PE binaries without a Windows machine. We’ve also integrated it into recent builds of Winchecksec, so that you can use it today to verify signatures on your Windows executables!

svcli

As a library, μthenticode aims to be a breeze to integrate: It’s written in cross-platform, modern C++ and avoids the complexity of the CryptoAPI interfaces it replaces (namely WinVerifyTrust and CertVerifyCertificateChainPolicy). You can use it now as a replacement for many of SignTool’s features, and more are on the way.

A quick Authenticode primer

Authenticode is Microsoft’s code signing technology, comparable in spirit (but not implementation) to Apple’s Gatekeeper.

At its core, Authenticode supplies (or can supply, as optional features) a number of properties for signed programs:

  • Authenticity: A program with a valid Authenticode signature contains a chain of certificates sufficient for validating that signature. Said chain is ultimately rooted in a certificate stored in the user’s Trusted Publishers store, preventing self-signed certificates without explicit opt-in by the user.
  • Integrity: Each Authenticode signature includes a cryptographic hash of the signed binary. This hash is compared against the binary’s in-memory representation at load time, preventing malicious modifications.
    • Authenticode can also embed cryptographic hashes for each page of memory. These are used with forced integrity signing, which is necessary for Windows kernel drivers and requires a special Microsoft cross-signed “Software Publisher Certificate” instead of a self-signed or independently trusted Certificate Authority (CA).
  • Timeliness: Authenticode supports embedding countersignatures from a Timestamping Authority (TSA), allowing the signature on a binary to potentially outlive the expiration dates of its signing certificates. Such countersignatures also prevent backdating of a valid signature, making it more difficult for an attacker to re-use an expired signing certificate.

Like all code signing technologies, there are things Authenticode can’t do or guarantee about a program:

  • That it has no bugs: Anybody can write buggy software and sign it, either with a self-signed certificate or by purchasing one from a CA that’s been cross-signed by Microsoft.
  • That it runs no code other than itself: The Windows execution contract is notoriously lax (e.g., the DLL loading rules for desktop applications), and many applications support some form of code execution as a feature (scripts, plugins, sick WinAMP skins, etc). Authenticode has no ability to validate the integrity or intent of code executed outside of the initial signed binary.

Similarly, there are some things that Authenticode, like all PKI implementations, is susceptible to:

  • Misplaced trust: CAs want to sell as many certificates as possible, and thus have limited incentives to check the legitimacy of entities that purchase from them. Anybody can create a US LLC for a few hundred bucks.
  • Stolen certificates: Code-signing and HTTPS certificates are prime targets for theft; many real-world campaigns leverage stolen certificates to fool users into trusting malicious code. Companies regularly check their secret material into source control systems, and code signing certificates are no exception.
  • Fraudulent certificates: Flame infamously leveraged a novel chosen-prefix attack on MD5 to impersonate a Microsoft certificate that had been accidentally trusted for code signing. Similar attacks on SHA-1 are now tractable at prices reasonable for nation states and organized crime.

All told, Authenticode (and all other forms of code signing) add useful authenticity and integrity checks to binaries, provided that you trust the signer and their ability to store their key material.

With that said, let’s take a look at what makes Authenticode tick.

Parsing Authenticode signatures: spicy PKCS#7

In a somewhat unusual move for 2000s-era Microsoft, most of the Authenticode format is actually documented and available for download. A few parts are conspicuously under-defined or marked as “out of scope”; we’ll cover some of them below.

At its core, Authenticode has two components:

  • The certificate table, which contains one or more entries, each of which may be a SignedData.
  • SignedData objects, which are mostly normal PKCS#7 containers (marked with a content type of SignedData per RFC 2315).

The certificate table

The certificate table is the mechanism by which Authenticode signatures are embedded into PE files.

It has a few interesting properties:

  • Accessing the certificate table involves reading the certificate table directory in the data directory table. Unlike every other entry in the data directory table, the certificate directory’s RVA field is not a virtual address—it’s a direct file offset. This is a reflection of the behavior of the Windows loader, which doesn’t actually load the certificates into the address space of the program.
  • Despite this, real-world tooling appears to be inflexible about placement and subsequent parsing of the certificate table. Microsoft’s tooling consistently places the certificate table at the end of the PE (after all sections); many third-party tools naively seek to the certificate table offset and parse until EOF, allowing an attacker to trivially append additional certificates¹.

Once located, actually parsing the certificate table is straightforward: It’s an 8 byte-aligned blob of WIN_CERTIFICATE structures:


win_certificate.png

…with some fields of interest:

  • wRevision: the “revision” of the WIN_CERTIFICATE.
    • MSDN only recently fixed the documentation for this field: WIN_CERT_REVISION_2_0=0x0200 is the current version for Authenticode signatures; WIN_CERT_REVISION_1_0=0x0100 is for “legacy” signatures. I haven’t been able to find the latter in the wild.
  • wCertificateType: the kind of encapsulated certificate data.
    • MSDN documents four possible values for wCertificateType, but we’re only interested in one: WIN_CERT_TYPE_PKCS_SIGNED_DATA.
  • bCertificate: the actual certificate data. For WIN_CERT_TYPE_PKCS_SIGNED_DATA, this is the (mostly) PKCS#7 SignedData mentioned above.
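For reference, the WIN_CERTIFICATE layout shown in the figure corresponds roughly to the following declaration (adapted from Wintrust.h and written here with portable types rather than the Windows typedefs):

// WIN_CERTIFICATE, approximately as declared in Wintrust.h.
// Entries in the certificate table are 8-byte aligned.
#include <cstdint>

struct WIN_CERTIFICATE {
  uint32_t dwLength;          // total length of this entry, header included
  uint16_t wRevision;         // WIN_CERT_REVISION_2_0 (0x0200) for Authenticode
  uint16_t wCertificateType;  // WIN_CERT_TYPE_PKCS_SIGNED_DATA (0x0002)
  uint8_t  bCertificate[1];   // variable-length certificate data (SignedData)
};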

As you might have surmised, the structure of the certificate table allows for multiple independent Authenticode signatures. This is useful for deploying a program across multiple versions of Windows, particularly versions that might have legacy certificates in the Trusted Publishers store or that don’t trust a particular CA for whatever reason.

Authenticode’s SignedData

Microsoft helpfully² supplies this visualization of their SignedData structure:

authenticode-signeddata

This is almost a normal PKCS#7 SignedData, with a few key deviations:

  • Instead of one of the RFC 2315 content types, the Authenticode SignedData’s contentInfo has a type of SPC_INDIRECT_DATA_OBJID, which Microsoft defines as 1.3.6.1.4.1.311.2.1.4³.
  • The structure corresponding to this object identifier (OID) is documented as SpcIndirectDataContent. Microsoft conveniently provides its ASN.1 definition:
    spc_indirect

    (Observe that the custom AlgorithmIdentifier is actually just X.509’s AlgorithmIdentifier; see RFC 3279 and its updates.)

    ⚠ The code below does no error handling or memory management; read the μthenticode source for the full version. ⚠

    Given the ASN.1 definitions above, we can use OpenSSL’s (hellish and completely undocumented) ASN.1 macros to parse Microsoft’s custom structures:


    spc_indirect_macros
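As a rough sketch of what those macros look like (field layout follows the Authenticode spec’s ASN.1; μthenticode’s source is the authoritative version), OpenSSL’s template macros let us declare the custom structures and get d2i_/i2d_ parsers for free:

// Sketch of declaring Microsoft's custom structures with OpenSSL's
// ASN.1 template macros. Field layout follows the Authenticode spec;
// see the μthenticode source for the definitive version.
#include <openssl/asn1t.h>
#include <openssl/x509.h>

typedef struct {
  ASN1_OBJECT *type;
  ASN1_TYPE *value;
} SpcAttributeTypeAndOptionalValue;

ASN1_SEQUENCE(SpcAttributeTypeAndOptionalValue) = {
    ASN1_SIMPLE(SpcAttributeTypeAndOptionalValue, type, ASN1_OBJECT),
    ASN1_OPT(SpcAttributeTypeAndOptionalValue, value, ASN1_ANY),
} ASN1_SEQUENCE_END(SpcAttributeTypeAndOptionalValue)
IMPLEMENT_ASN1_FUNCTIONS(SpcAttributeTypeAndOptionalValue)

typedef struct {
  X509_ALGOR *digestAlgorithm;  // X.509's AlgorithmIdentifier, per the note above
  ASN1_OCTET_STRING *digest;
} DigestInfo;

ASN1_SEQUENCE(DigestInfo) = {
    ASN1_SIMPLE(DigestInfo, digestAlgorithm, X509_ALGOR),
    ASN1_SIMPLE(DigestInfo, digest, ASN1_OCTET_STRING),
} ASN1_SEQUENCE_END(DigestInfo)
IMPLEMENT_ASN1_FUNCTIONS(DigestInfo)

typedef struct {
  SpcAttributeTypeAndOptionalValue *data;
  DigestInfo *messageDigest;
} SpcIndirectDataContent;

ASN1_SEQUENCE(SpcIndirectDataContent) = {
    ASN1_SIMPLE(SpcIndirectDataContent, data, SpcAttributeTypeAndOptionalValue),
    ASN1_SIMPLE(SpcIndirectDataContent, messageDigest, DigestInfo),
} ASN1_SEQUENCE_END(SpcIndirectDataContent)
IMPLEMENT_ASN1_FUNCTIONS(SpcIndirectDataContent)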

Actually checking the signature

With our structures in place, we can use OpenSSL’s (mostly) undocumented PKCS#7 API to parse our SignedData and indirect data contents:

verify1

…and then validate them:

verify2

Voilà: the basics of Authenticode. Observe that we pass PKCS7_NOVERIFY, as we don’t necessarily have access to the entire certificate chain—only Windows users with the relevant cert in their Trusted Publishers store will have that.
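As a hedged sketch of that flow (error handling and ownership are trimmed, and the function name here is illustrative rather than μthenticode’s actual API): parse the DER blob with d2i_PKCS7, point a BIO at the signed content, and call PKCS7_verify with PKCS7_NOVERIFY:

// Sketch of signature checking with OpenSSL's PKCS#7 API.
// `blob` is bCertificate from a WIN_CERTIFICATE; `indirect_data` is the
// DER body of the SpcIndirectDataContent (everything after its outer
// SEQUENCE header), which Authenticode treats as the signed content.
#include <openssl/bio.h>
#include <openssl/pkcs7.h>

bool verify_signed_data(const unsigned char *blob, long blob_len,
                        const unsigned char *indirect_data, int indirect_len) {
  const unsigned char *p = blob;
  PKCS7 *p7 = d2i_PKCS7(nullptr, &p, blob_len);
  if (p7 == nullptr || !PKCS7_type_is_signed(p7))
    return false;

  BIO *data = BIO_new_mem_buf(indirect_data, indirect_len);

  // PKCS7_NOVERIFY: validate the signature against the embedded
  // certificates, but skip chain building, since we don't have the
  // user's Trusted Publishers store outside of Windows.
  int ok = PKCS7_verify(p7, nullptr, nullptr, data, nullptr, PKCS7_NOVERIFY);

  BIO_free(data);
  PKCS7_free(p7);
  return ok == 1;
}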

Calculating and checking the Authenticode hash

Now that we have authenticity (modulo the root certificate), let’s do integrity.

First, let’s grab the hash embedded in the Authenticode signature, for eventual comparison:

authenticode_digest

Next, we need to compute the binary’s actual hash. This is a little involved, thanks to a few different fields:

  • Every PE has a 32-bit CheckSum field that’s used for basic integrity purposes (i.e., accidental corruption). This field needs to be skipped when calculating the hash, as it’s calculated over the entire file and would change with the addition of certificates.
  • The certificate data directory entry itself needs to be skipped, since relocating and/or modifying the size of the certificate table should not require any changes to pre-existing signatures.
  • The certificate table (and constituent signatures) itself, naturally, cannot be part of the input to the hash.
  • To ensure a consistent hash, Authenticode stipulates that sections are hashed in ascending order by the value of each section header’s PointerToRawData, not the order of the section headers themselves. This is not particularly troublesome, but requires some additional bookkeeping.

μthenticode’s implementation of the Authenticode hashing process is a little too long to duplicate below, but in pseudocode:

  1. Start with an empty buffer.
  2. Insert all PE headers (DOS, COFF, Optional, sections) into the buffer.
  3. Erase the certificate table directory entry and CheckSum field from the buffer, in that order (to avoid rescaling the former’s offset).
  4. Use pe-parse’s IterSec API to construct a list of section buffers. IterSec yields sections in file offset order as of #129.
  5. Skip past the certificate table and add trailing data to the buffer, if any exists.
  6. Create and initialize a new OpenSSL message digest context using the NID retrieved from the signature.
  7. Toss the buffer into EVP_DigestUpdate and finish with EVP_DigestFinal.
  8. Compare the result with the Authenticode-supplied hash.
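Here is a hedged sketch of steps 6 through 8, assuming the buffer from steps 1 through 5 has already been assembled (the buffer-building is the interesting part and lives in μthenticode proper):

// Hash a prepared buffer with the digest algorithm named in the
// signature (by NID) and compare against the embedded Authenticode hash.
#include <openssl/evp.h>
#include <cstring>
#include <vector>

bool check_authenticode_hash(const std::vector<unsigned char> &buf,
                             int nid, // e.g., NID_sha256, taken from the signature
                             const std::vector<unsigned char> &expected) {
  const EVP_MD *md = EVP_get_digestbynid(nid);
  if (md == nullptr)
    return false;

  unsigned char digest[EVP_MAX_MD_SIZE];
  unsigned int digest_len = 0;

  EVP_MD_CTX *ctx = EVP_MD_CTX_new();
  EVP_DigestInit_ex(ctx, md, nullptr);
  EVP_DigestUpdate(ctx, buf.data(), buf.size());
  EVP_DigestFinal_ex(ctx, digest, &digest_len);
  EVP_MD_CTX_free(ctx);

  return digest_len == expected.size() &&
         std::memcmp(digest, expected.data(), digest_len) == 0;
}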

Other bits and pieces

We haven’t discussed the two remaining major Authenticode features: page hashes and timestamp countersignatures.

Page hashes

As mentioned above, page hashes are conspicuously not documented in the Authenticode specification, and are described as stored in a “[…] binary structure [that] is outside the scope of this paper.”

Online information on said structure is limited to a few resources:

  • The VirtualBox source code references OIDs for two different versions of the page hashes structure:
    • SPC_PE_IMAGE_PAGE_HASHES_V1_OBJID: 1.3.6.1.4.1.311.2.3.1
    • SPC_PE_IMAGE_PAGE_HASHES_V2_OBJID: 1.3.6.1.4.1.311.2.3.2

    These OIDs are not listed in Microsoft’s OID reference or in the OID repository⁴, although they do appear in Wintrust.h.

  • At least one fork of osslsigncode has support for generating and validating page hashes, and grants us further insight:
    • The V1 OID represents SHA-1 page hashes; V2 represents SHA2-256.
    • The serializedData of each SpcSerializedObject is an ASN.1 SET, each member of which is an ASN.1 SEQUENCE, to the effect of:

      page_hash_asn1

      (The definitions above are my reconstruction from the body of get_page_hash_link; osslsigncode confusingly reuses the SpcAttributeTypeAndOptionalValue type for Impl_SpcPageHash and constructs the rest of the contents of SpcSerializedObject manually.)

As far as I can tell, osslsigncode only inserts one Impl_SpcPageHash for the entire PE, which it calculates in pe_calc_page_hash. The code in that function is pretty dense, but it seems to generate a table of structures as follows:

page_hash

…where IMPL_PAGE_HASH_SIZE is determined by the hash algorithm used (i.e., by Impl_SpcPageHash.type), and the very first entry in the table is a null-padded “page hash” for just the PE headers with page_offset=0. This table is not given an ASN.1 definition—it’s inserted directly into Impl_SpcPageHash.pageHashes.

Timestamp countersignatures

Unlike page hashes, Authenticode’s timestamp countersignature format is relatively well documented, both in official and third-party sources.

Just as the Authenticode SignedData is mostly a normal PKCS#7 SignedData, Authenticode’s timestamp format is mostly a normal PKCS#9 countersignature. Some noteworthy bits include:

  • When issuing a timestamp request (TSR) to a timestamp authority (TSA), the request takes the form of an HTTP 1.1 POST containing a DER-encoded, then base64-encoded ASN.1 message:

    timestamp_asn1

    …where countersignatureType is the custom Microsoft OID 1.3.6.1.4.1.311.3.2.1 (i.e., SPC_TIME_STAMP_REQUEST_OBJID) and content is the original Authenticode PKCS#7 ContentInfo.
  • The TSA response is a PKCS#7 SignedData, from which the SignerInfo is extracted and embedded into the main Authenticode SignedData. The certificates from the TSA response are similarly embedded into the certificate list as unauthenticated attributes.

Wrapup

We’ve covered all four major components of Authenticode above: verifying the signature, checking the integrity of the file against the verified hash, calculating page hashes, and verifying any timestamp countersignatures.

μthenticode itself is still a work in progress, and currently only has support for signatures and the main Authenticode hash. You can help us out by contributing support for page hash parsing and verification, as well as timestamp signature validation!

μthenticode’s APIs are fully documented and hosted, and most can be used immediately with a peparse::parsed_pe *:

uthenticode_example

Check out the svcli command-line tool for an applied example, including retrieving the embedded Authenticode hash.

Prior work and references

μthenticode was written completely from scratch and uses the official Authenticode document supplied by Microsoft as its primary reference. When that was found lacking, the following resources came in handy:

The following resources were not referenced, but were discovered while researching this post:

  • jsign: A Java implementation of Authenticode

Want the scoop on our open-source projects and other security innovations? Contact us or sign up for our newsletter!


  1. Adding additional certificates without this parsing error is still relatively trivial, but requires the attacker to modify more parts of the PE rather than just append to it.↩︎
  2. For some definitions of helpful.↩︎
  3. The OID tree for Authenticode shows lots of other interesting OIDs, most of which aren’t publicly documented.↩︎
  4. I’ve submitted them, pending approval by the repository’s administrator.↩︎

Emerging Talent: Winternship 2020 Highlights

The Trail of Bits Winternship is our winter internship program where we invite 10-15 students to join us over the winter break for a short project that has a meaningful impact on information security. They work remotely with a mentor to create or improve tools that solve a single impactful problem. These paid internships give student InfoSec engineers real industry experience, plus a publication for their resumé—and sometimes even lead to a job with us (congrats, Samuel Caccavale!).

Check out the project highlights from some of our 2020 Winternship interns below.

Aaron Yoo—Anvill Decompiler

UCLA

This winter break I added a tool to the Anvill decompiler that produces “JSON specifications” for LLVM bitcode functions. These bitcode-derived specifications tell Anvill about the physical location (register or memory) of important values such as function arguments and return values. Anvill depends on this location information to intelligently lift machine code into high-quality, easy-to-analyze LLVM bitcode.

A typical specification looks something like this:

{
    "arch": "amd64",
    "functions": [
        {
            "demangled_name": "test(long, long)",
            "name": "_Z4testll",
            "parameters": [
                {
                    "name": "param1",
                    "register": "RDI",
                    "type": "l"
                },
...

Overall, I had fun learning so much about ABIs and using the LLVM compiler infrastructure. I got to see my tool operate in some sample full “machine code to C” decompilations, and overcame tricky obstacles, such as how a single high-level parameter or return value can be split across many machine-level registers. My final assessment: Decompilers are pretty cool!

Paweł Płatek—DeepState and Python

AGH University of Science and Technology

The main goal of my winternship project was to find and fix bugs in the Python part of the DeepState source code. Most of the bugs I found were making it nearly impossible to build DeepState for fuzzing or to use the fuzzer’s executors. Now, the build process and executors work correctly, and have been tested and better documented. I’ve also identified and described places where more work is needed (in GitHub issues). Here are the details of my DeepState project:

CMake—Corrected build options; added the option to build only examples; added checks for compilers and support for Honggfuzz and Angora; improved CMake for examples (so the examples are automatically found).

Docker—Split the docker build into two parts—deepstate-base and deepstate—to make it easier to update the environment; added multi-stage docker builds (so fuzzers can be updated separately, without rebuilding everything from scratch); added --jobs support and Angora.

CI—Rewrote a job that builds and pushes docker images to use GitHub action that supports caching; added fuzzer tests.

Fuzzer executors (frontends)—Unified argument and input/output directory handling; reviewed each fuzzer’s documentation so that executors set appropriate runtime flags; reimplemented pre- and post-execution methods; added methods for discovering fuzzer executables (based on FUZZERNAME_HOME environment variables and a CLI flag); reimplemented logging; fixed the compilation functions; rewrote the methods related to fuzzer runtime statistics; reimplemented the run method (so the management routine that retrieves statistics and synchronizes seeds runs periodically and exceptions are handled properly), and added a cleanup function and the ability to restart the process; examined each fuzzer’s directory structure and seed-synchronization options and, based on that, implemented fuzzer resuming and fixed the ensembling methods.

Fuzzer testing—Created a basic test that checks whether executors can compile correctly and whether fuzzers can find a simple bug; created test for seed synchronization.

Documentation—Split documentation into several files, and added chapters on fuzzer executor usage and harness writing.

Philip Zhengyuan Wang—Manticore

University of Maryland

During my winternship, I helped improve Manticore’s versatility and functionality. Specifically, I combined it with Ansible (an automated provisioning framework) and Protobuf (a serialization library) to allow users to run Manticore in the cloud and better understand what occurs during a Manticore run.

As it stands, Manticore is very CPU-intensive; it competes for CPU time with other user processes when running locally. Running a job on a remotely provisioned VM in which more resources could be diverted to a Manticore run would make this much less of an issue.

To address this, I created “mcorepv” (short for Manticore Provisioner), a Python wrapper for Manticore’s CLI and Ansible/DigitalOcean that allows users to select a run destination (local machine/remote droplet) and supply a target Manticore Python script or executable along with all necessary runtime flags. If the user decides to run a job locally, mcorepv executes Manticore analysis in the user’s current working directory and logs the results.

Things get more interesting if the user decides to run a job remotely—in this case, mcorepv will call Ansible and execute a playbook to provision a new DigitalOcean droplet, copy the user’s current working directory to the droplet, and execute Manticore analysis on the target script/executable. While the analysis is running, mcorepv streams logs and Manticore’s stdout back in near real time via Ansible so a user may frequently check on the analysis’ progress.

Manticore should also simultaneously stream its list of internal states and their statuses (ready, busy, killed, terminated) to the user via a protobuf protocol over a socket in order to better describe the analysis’ status and resource consumption (this is currently a work in progress). To make this possible, I developed a protobuf protocol to represent Manticore’s internal state objects and allow for serialization, along with a terminal user interface (TUI). Once started on the droplet, Manticore spins up a TCP server that provides a real-time view of the internal state lists. The client can then run the TUI locally, which will connect to the Manticore server and display the state lists. Once the job has finished, the Manticore server is terminated, and the results of the Manticore run along with all logs are copied back to the user’s local machine where they may be further inspected.

There’s still some work to be done to ensure Manticore runs bug-free in the cloud. For example:

  • Port forwarding must be set up on the droplet and local machine to ensure Manticore’s server and client TUI can communicate over SSH.
  • The TUI needs additional optimization and improvement to ensure the user gets the right amount of information they need.
  • Mcorepv and its Ansible backend need to be more rigorously tested to ensure they work properly.

I’m glad that in my short time at Trail of Bits, I was able to help move Manticore one step closer to running anywhere, anytime.

Fig 1: Proof of concept—Manticore TUI displaying a list of state objects and log messages to the user.

Samuel Caccavale—Go

Northeastern University

During my winternship, I developed AST- and SSA-based scanners to find bugs in Go code that were previously overlooked by tools like GoSec and errcheck. One unsafe code pattern I targeted was use of a type assertion’s value before checking whether the assertion succeeded, as in value, ok := foo.(T). While errcheck will detect type assertions that do not bind the ok value (and therefore cause a panic if the type assertion fails), it can’t exhaustively check whether value is used only in a context where ok is true. The code below is the simplest case where errcheck and the SSA approach diverge; the SSA approach correctly accepts the use of safe and flags the use of unsafe:

package main

import ("fmt")

func main() {
    var i interface{} = "foo"

    safe, ok := i.(string)
    if ok {
        fmt.Println(safe)
    }

    unsafe, ok := i.(string)
    fmt.Println(ok)
    if true {
        fmt.Println(unsafe)
    }
}

Taylor Pothast—Mishegos

Vanderbilt University

During my winternship, I worked on improving the performance of mishegos, Trail of Bits’ differential fuzzer for x86_64 decoders, by switching the cohort collection and output components from their original proof-of-concept JSON format to a compact binary format.

To do this, I learned about mishegos’ internal workings, its in-memory result representations, binary parsing, and how to write Kaitai Struct definitions. My work was merged on the final day of my internship, and is now the default output format for mishegos runs.

To make my work compatible with mishegos’s preexisting analysis tools, I also added a helper utility, mish2jsonl, for converting the new binary output into a JSON form mostly compatible with the original output format. Finally, I updated the analysis tools to handle the changes I made in the JSON-ified output format, including new symbolic fields for each fuzzed decoder’s state.

Thomas Quig—Crytic and Slither

University of Illinois Urbana-Champaign

While at Trail of Bits, I integrated Slither’s smart contract upgradeability checks into Crytic, Trail of Bits’ CI service for Ethereum security. Because smart contracts are immutable once deployed, users need a way to upgrade their contracts when the code has to change. To make that possible, the user can have the old contract, a new contract, and a proxy contract. The proxy contract contains what are essentially function pointers that can be modified to point to the new contract. Risks arise if this process is done incorrectly (e.g., incorrect global variable alignment). With Slither, the user can check that those risks are mitigated.

The integration process was much more complicated than I thought it would be, but I was still able to successfully implement the checks. Learning the codebase and the multiple languages required to work with it simultaneously was quite difficult, but manageable. Crytic now grabs the list of all smart contracts and displays them within settings so they can be chosen. The user can pick which contract is the old contract, the new contract, and the proxy contract. Upgradeability checks are then run on those contracts, and the output is displayed to a new findings page in an easily analyzable JSON format.

I enjoyed my time at Trail of Bits. My mentors helped me learn the infrastructure while giving me opportunities to work independently. I gained significant experience in a short period of time and learned about many topics I didn’t expect to like.

William Wang—OpenSSL and Anselm

UCLA

This winter I worked on Anselm. OpenSSL is one of the most popular cryptography libraries for developers, but it’s also shockingly easy to misuse. Still, many instances of improper use fall into specific patterns, which is where Anselm comes in. My main goal was to prototype a system for detecting these behaviors.

I spent most of my time writing an LLVM pass to detect OpenSSL API calls and form a simplified graph of nodes representing possible execution paths. In larger codebases, traversing through each individual IR block can be time consuming. By “compressing” the graph to its relevant nodes (API calls) first, Anselm enables more complex analyses on it.

I also explored and began implementing heuristics for bad behaviors of varying complexity. For example, mandating that a cipher context be initialized only involves searching the children of nodes where one is created. Similarly, other ordering requirements can be enforced through the call graph alone. However, a bug like repeated IVs/nonces likely requires argument/return value analysis, which I’d like to research more in the future.

While I’m happy with what I accomplished over the winternship, there’s a lot more to be done. In addition to fleshing out the ideas mentioned above, I’d like to polish up the final interface so other developers can easily write their own heuristics. Working with LLVM IR also means that Anselm can theoretically operate on other languages that use OpenSSL bindings. I’m excited to tackle these and other challenges in the future!

See you in the fall?

We’ve selected our summer 2020 interns, but if you’d like to apply (or suggest an intern) for fall, contact us!

Reinventing Vulnerability Disclosure using Zero-knowledge Proofs

We, along with our partner Matthew Green at Johns Hopkins University, are using zero-knowledge (ZK) proofs to establish a trusted landscape in which tech companies and vulnerability researchers can communicate reasonably with one another without fear of being sabotaged or scorned. Over the next four years, we will push the state of the art in ZK proofs beyond theory, and supply vulnerability researchers with software that produces ZK proofs of exploitability. This research is part of a larger effort funded by the DARPA program Securing Information for Encrypted Verification and Evaluation (SIEVE).

Much recent ZK work has focused on blockchain applications, where participants must demonstrate to the network that their transactions follow the underlying consensus rules. Our work will substantially broaden the types of statements provable in ZK. By the end of the program, users will be able to prove statements about programs that run on an x86 processor.

Why ZK proofs of exploitability?

Software makers and vulnerability researchers have a contentious relationship when it comes to finding and reporting bugs. Disclosing too much information about a vulnerability could ruin the reward for a third-party researcher while premature disclosure of a vulnerability could permanently damage the reputation of a software company. Communication between these parties commonly breaks down, and the technology industry suffers because of it.

Furthermore, in many instances companies are unwilling to engage with security teams and shrug off potential hazards to user privacy. In these situations, vulnerability researchers are put in a difficult position: stay silent despite knowing users are at risk, or publicly disclose the vulnerability in an attempt to force the company into action. In the latter scenario, researchers may themselves put users in harm’s way by informing attackers of a potential path to exploitability.

ZK proofs of exploitability will radically shift how vulnerabilities are disclosed, allowing companies to precisely define bug bounty scope and researchers to unambiguously demonstrate they possess valid exploits, all without risking public disclosure.

Designing ZK proofs

In ZK proofs, a prover convinces a verifier that they possess a piece of information without revealing the information itself. For example, Alice might want to convince Bob that she knows the SHA256 preimage, X, of some value, Y, without actually telling Bob what X is. (Maybe X is Alice’s password). Perhaps the most well-known industrial application of ZK proofs is found in privacy-preserving blockchains like Zcash, where users want to submit transactions without revealing their identity, the recipient’s identity, or the amount sent. To do this, they submit a ZK proof showing that the transaction follows the blockchain’s rules and that the sender has sufficient funds. (Check out Matt Green’s blog post for a good introduction to ZK proofs with examples.)

Now you know why ZK proofs are useful, but how are these algorithms developed, and what tradeoffs need to be evaluated? There are three metrics to consider for developing an efficient system: prover time, verifier time, and bandwidth, which is the amount of data each party must send to the other throughout the protocol. Some ZK proofs don’t require interaction between the prover and verifier, in which case the bandwidth is just the size of the proof.

And now for the hard part, and the main barrier that has prevented ZK proofs from being used in practice. Many classical ZK protocols require that the underlying statement be first represented as a Boolean or arithmetic circuit with no loops (i.e., a combinatorial circuit). For Boolean circuits, we can use AND, NOT, and XOR gates, whereas arithmetic circuits use ADD and MULT gates. As you might imagine, it can be challenging to translate a statement you wish to prove into such a circuit, and it’s particularly challenging if that problem doesn’t have a clean mathematical formulation. For example, any program with looping behavior must be unrolled prior to circuit generation, which is often impossible when the program contains data-dependent loops.

circuits

Examples of Boolean and arithmetic circuits

Furthermore, circuit size has an impact on the efficiency of ZK protocols. Prover time usually has at least a linear relationship with circuit size (often with large constant overhead per gate). Therefore, ZK proofs in vulnerability disclosure require the underlying exploit to be captured by a reasonably sized circuit.

Proving Exploitability

Since ZK proofs generally accept statements written as Boolean circuits, the challenge is to prove the existence of an exploit by representing it as a Boolean circuit that returns “true” only if the exploit succeeds.

We want a circuit that accepts if the prover has some input to a program that leads to an exploit, like a buffer overflow that leads to the attacker gaining control of program execution. Since most binary exploits are both binary- and processor-specific, our circuit will have to accurately model whatever architecture the program was compiled to. Ultimately, we need a circuit that accepts when successful runs of the program occur and when an exploit is triggered during execution.

Taking a naive approach, we would develop a circuit that represents “one step” of whatever processor we want to model. Then, we’d initialize memory to contain the program we want to execute, set the program counter to wherever we want program execution to start, and have the prover run the program on their malicious input—i.e., just repeat the processor circuit until the program finishes, and check whether an exploit condition was met at each step. This could mean checking whether the program counter was set to a known “bad” value, or a privileged memory address was written to. An important caveat about this strategy: The entire processor circuit will need to run at each step (even though only one instruction is being actually executed), because singling out one piece of functionality would leak information about the trace.

Unfortunately, this approach will produce unreasonably large circuits, since each step of program execution will require a circuit that models both the core processor logic and the entirety of RAM. Even if we restrict ourselves to machines with 50 MB of RAM, producing a ZK statement about an exploit whose trace uses 100 instructions will cause the circuit to be at least 5 GB. This approach is simply too inefficient to work for meaningful programs. The key problem is that circuit size scales linearly with both trace length and RAM size. To get around this, we follow the approach of SNARKs for C, which divides the program execution proof into two pieces, one for core logic and the other for memory correctness.

To prove logic validity, the processor circuit must be applied to every sequential pair of instructions in the program trace. The circuit can verify whether one register state can legitimately follow the other. If every pair of states is valid, then the circuit accepts. Note, however, that memory operations are assumed to be correct. If the transition function circuit were to include a memory checker, it would incur an overhead proportional to the size of the RAM being used.

Trace Proof (1)

Checking processor execution validity without memory
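As a toy illustration of the idea (nothing below reflects the real circuit, any real ISA, or an actual ZK framework), the statement being proven reduces to a predicate over consecutive trace entries plus an exploit condition:

// Toy model: a trace is valid if every adjacent pair of states is a legal
// step of a made-up machine, and it demonstrates an exploit if control
// flow ever reaches a known-bad address. Memory is assumed correct here;
// that part is handled by the separate memory-sorted check.
#include <array>
#include <cstdint>
#include <vector>

struct State {
  uint32_t pc;
  std::array<uint32_t, 4> regs;
  uint32_t instr; // instruction word fetched at pc
};

// One application of the "processor circuit": may `next` follow `cur`?
bool valid_step(const State &cur, const State &next) {
  if (cur.instr == 0x01) { // toy encoding: ADD r0, r1
    return next.pc == cur.pc + 4 &&
           next.regs[0] == cur.regs[0] + cur.regs[1] &&
           next.regs[1] == cur.regs[1] && next.regs[2] == cur.regs[2] &&
           next.regs[3] == cur.regs[3];
  }
  return false; // instructions not modeled here are rejected
}

bool exploited(const State &s) { return s.pc == 0xdeadbeef; }

// The overall statement: the trace is a legal execution and hits the
// exploit condition at some step.
bool logic_circuit(const std::vector<State> &trace) {
  bool hit = false;
  for (size_t i = 0; i + 1 < trace.size(); ++i) {
    if (!valid_step(trace[i], trace[i + 1]))
      return false;
    hit = hit || exploited(trace[i + 1]);
  }
  return hit;
}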

Memory can be validated by having the prover also input a memory-sorted execution trace. This trace places one trace entry before the other if the first accesses a lower-numbered memory address than the second (with timestamps breaking ties). This trace lets us do a linear sweep of memory-sorted instructions and ensure consistent memory access. This approach avoids creating a circuit whose size is proportional to the amount of RAM being used. Rather, this circuit is linear in the trace size and only performs basic equality checks instead of explicitly writing down the entirety of RAM at each step.

Memory Proof (1)

Checking a memory sorted trace
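Again as a toy sketch (real constructions handle memory initialization and other packing details), the linear sweep over the memory-sorted trace reduces to simple adjacent-entry checks:

// Toy memory-consistency sweep over a trace sorted by (address, timestamp):
// every load must return the value of the most recent operation on the
// same address. A separate permutation check (discussed next) ties the
// sorted trace back to the original execution trace.
#include <cstdint>
#include <vector>

struct MemOp {
  uint64_t addr;
  uint64_t time; // position in the original execution trace
  bool is_store;
  uint64_t value;
};

bool memory_consistent(const std::vector<MemOp> &sorted) {
  for (size_t i = 1; i < sorted.size(); ++i) {
    const MemOp &prev = sorted[i - 1], &cur = sorted[i];
    // The prover must supply a correctly sorted trace.
    if (cur.addr < prev.addr ||
        (cur.addr == prev.addr && cur.time < prev.time))
      return false;
    // A load from the same address must see the previous value.
    if (!cur.is_store && cur.addr == prev.addr && cur.value != prev.value)
      return false;
    // Loads from fresh addresses need separate initialization rules,
    // omitted in this sketch.
  }
  return true;
}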

The final issue we need to address is that nothing stops the prover from using a memory-sorted trace that doesn’t correspond to the original and creating a bogus proof. To fix this, we have to add a “permutation checker” circuit that verifies we’ve actually sorted the program trace by memory location. More discussion on this can be found in the SNARKs for C paper.

Modeling x86

Now that we know how to prove exploits in ZK, we need to model relevant processor architectures as Boolean circuits. Prior work has done this for a RISC architecture called TinyRAM, which was designed to run efficiently in a ZK context. Unfortunately, TinyRAM is not used in commercial applications and is therefore not realistic for providing ZK proofs of exploitability in real-world programs, since many exploits rely on architecture-specific behavior.

We will begin our development of ZK proofs of exploitability by modeling a relatively simple microprocessor that is in widespread use: the MSP430, a simple chip found in a variety of embedded systems. As an added bonus, the MSP430 is also the system the Microcorruption CTF runs on. Our first major goal is to produce ZK proofs for each Microcorruption challenge. With this “feasibility demonstration” complete, we will set our sights on x86.

Moving from a simple RISC architecture to an infamously complex CISC machine comes with many complications. Circuit models of RISC processors clock in between 1–10k gates per cycle. On the other hand, if our x86 processor turned out to contain somewhere on the order of 100k gates, a ZK proof for an exploit that takes 10,000 instructions to complete would produce a proof of size 48 gigabytes. Since x86 is orders of magnitude more complex than MSP430, naively implementing its functionality as a Boolean circuit would be impractical, even after separating the proof of logic and memory correctness.

Our solution is to take advantage of the somewhat obvious fact that no program uses all of x86. It may be theoretically possible for a program to use all 3,000 or so instructions supported by x86, but in reality, most only use a few hundred. We will use techniques from program analysis to determine the minimal subset of x86 needed by a given binary and dynamically generate a processor model capable of verifying its proper execution.

Of course, there are some x86 instructions that we cannot support, since some x86 instructions implement data-dependent loops. For example, repz repeats the subsequent instruction until rcx is 0. The actual behavior of such an instruction cannot be determined until runtime, so we cannot support it in our processor circuit, which must be a combinatorial circuit. To handle such instructions, we will produce a static binary translator from normal x86 to our program-specific x86 subset. This way, we can handle the most complex x86 instructions without hard-coding them into our processor circuit.

A new bug disclosure paradigm

We are very excited to start this work with Johns Hopkins University and our colleagues participating in the DARPA SIEVE Program. We want to produce tools that radically shift how vulnerabilities are disclosed, so companies can precisely specify the scope of their bug bounties and vulnerability researchers can securely submit proofs that unambiguously demonstrate they possess a valid exploit. We also anticipate that the ZK disclosure of vulnerabilities can serve as a consumer protection mechanism. Researchers can warn users about the potential dangers of a given device without making that vulnerability publicly available, which puts pressure on companies to fix issues they may not have been inclined to otherwise.

More broadly, we want to transition ZK proofs from academia to industry, making them more accessible and practical for today’s software. If you have a specific use for this technology that we haven’t mentioned here, we’d like to hear from you. We’ve built up expertise navigating the complex ecosystem of ZK proof schemes and circuit compilers, and can help!