DeepState Now Supports Ensemble Fuzzing

by Alan Cao, Francis Lewis High School, Queens, NY

We are proud to announce the integration of ensemble fuzzing into DeepState, our unit-testing framework powered by fuzzing and symbolic execution. Ensemble fuzzing allows testers to execute multiple fuzzers with varying heuristics in a single campaign, while maintaining an architecture for synchronizing generated input seeds across fuzzer queues.

This ensemble fuzzer includes a new deepstate-ensembler tool and several notable fuzzer engines powered by a newly open-sourced DeepState front-end API/SDK. The SDK enables developers to integrate front-end executors for DeepState and provides seed synchronization support for our ensembler tool.

The Fallacy of Smart Fuzzers

Fuzzers are one of the most effective tools in a security researcher’s toolbox. In recent years, they have been widely studied to build better heuristics, or strategies that fuzzers use to explore programs. However, one thing remains clear: fuzzer heuristics rarely live up to their hype. When scaling up so-called smart fuzzers for real-world programs, their performance often falters, and we end up defaulting back to our “dumb” standard tooling, like AFL and LibFuzzer.

Since we have explored the topic of evaluating fuzzing research in the past, let’s take a tangent and instead explore the possibility of combining various fuzzer heuristics to maximize fuzzer performance, without giving up time in our testing workflow. This led to my summer internship project of integrating ensemble fuzzing into DeepState.

What is Ensemble Fuzzing?

The insight behind ensemble fuzzing is that, while certain heuristics work well in certain contexts, combining them should produce better results than a single fuzzer with a single strategy. This idea was first introduced in Chen et al.’s EnFuzz paper; however, there are no available open-source or commercial implementations.

Our implementation of an ensemble fuzzer follows the architecture implemented in EnFuzz, as seen below:

alan_1

Given a set of pre-determined, diverse base fuzzers with their own local seed queues, we can integrate a global-asynchronous, local-synchronous (GALS) seed synchronization mechanism that pulls interesting seeds from local queues into a shared global queue during each execution cycle. When a base fuzzer’s heuristics fail to improve coverage or discover interesting input seeds, it can pull other fuzzers’ seeds from this global queue in the next execution cycle. Furthermore, once the campaign is terminated, the ensembler can report feedback on base fuzzer performance, crash triaging/deduplication, or any other post-processing statistics.

Using ensemble fuzzing and our already powerful unit-testing framework for fuzzing/symbolic execution, DeepState, we are able to approach the following problems during testing:

  • Fuzzer performance diversity – do different fuzzer heuristics contribute varying useful seeds, maximizing the potential for improving coverage and crashes discovered?
  • Fuzzer workflow – how can we do exhaustive fuzz testing and/or symbolic execution while simplifying our workflow?
  • Invariant consistency – do different fuzzers return different results, indicating that there might be a source of nondeterminism in our test?

Spinning up Front-ends

Since DeepState already supports Eclipser as a backend, we chose to first build a front-end API, where a developer can write a front-end wrapper for a fuzzer backend. This orchestrates the running fuzzer process, and performs compile-time instrumentation, pre- and post-processing, and seed synchronization. It also simplifies the fuzzing environment setup by unifying how we can construct tools while implementing functionality.

The snippet below shows an example of a front-end wrapper for AFL. It inherits from a base DeepStateFrontend class and includes methods that define fuzzer-related functionality.

alan_2

Example front-end wrapper for AFL. One inherited method, pre_exec, allows the user to perform sanity checks before execution. Both environment properties (i.e. core dump pattern) and argument parsing are checked.
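For reference, a heavily simplified sketch of such a wrapper appears below. Only the DeepStateFrontend base class and the pre_exec hook come from this post; the import path, the FUZZER/COMPILER attributes, and the cmd() helper are illustrative assumptions rather than the SDK’s actual surface.

# Simplified, illustrative sketch of an AFL front-end wrapper. Only
# DeepStateFrontend and pre_exec() are names from this post; the import path
# and the remaining attributes/methods are assumptions, not the real SDK API.
from deepstate.frontend import DeepStateFrontend  # import path is an assumption

class AFL(DeepStateFrontend):
    FUZZER = "afl-fuzz"        # fuzzer binary to orchestrate (illustrative)
    COMPILER = "afl-clang++"   # compiler used for instrumentation (illustrative)

    def pre_exec(self):
        # Sanity checks before execution: environment properties and arguments.
        super().pre_exec()
        with open("/proc/sys/kernel/core_pattern") as f:
            if f.read().startswith("|"):
                raise RuntimeError("core dumps are piped to a program; AFL will refuse to run")
        if self.input_seeds is None:
            raise RuntimeError("AFL requires an initial seed directory (-i)")

    def cmd(self):
        # Arguments handed to afl-fuzz when the wrapper's run() spawns it.
        return {"-i": self.input_seeds, "-o": self.output_test_dir, "--": self.binary}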

In order to build a front-end wrapper, we should have the following methods in our fuzzer object:

alan_6

Each fuzzer has its own ensemble method, which provides a specialized rsync call to push and pull seeds between the global and local queue directories:

alan_3

Ensemble method for seed synchronization. Each __sync_seeds() call invokes a specialized rsync command to transfer seeds between a local and global queue directory.
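The SDK owns the real implementation, but the shape of that synchronization is roughly the following sketch; the directory layout, function names, and exclude patterns here are illustrative.

# Rough sketch of rsync-based seed synchronization between a fuzzer's local
# queue and the shared global queue. Paths, excludes, and the function names
# are illustrative; the real implementation lives in the DeepState SDK.
import subprocess

def _rsync(src_dir, dst_dir, excludes=()):
    cmd = ["rsync", "-r", "--ignore-existing"]
    for pattern in excludes:
        cmd += ["--exclude", pattern]
    cmd += [src_dir.rstrip("/") + "/", dst_dir]
    subprocess.check_call(cmd)

def ensemble(local_queue, global_queue):
    # push interesting local seeds up to the shared global queue ...
    _rsync(local_queue, global_queue, excludes=[".state", "README.txt"])
    # ... and pull other fuzzers' seeds back down into the local queue
    _rsync(global_queue, local_queue)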

Once built, we can use a front-end wrapper like so:

# compile a DeepState harness with AFL instrumentation
$ deepstate-afl --compile_test MyDeepStateHarness.cpp --compiler_args="-Isomelib/include -lsomelib -lotherlib"

# execute the AFL fuzzer through DeepState
$ deepstate-afl -i seeds -o out ./out.afl

For a more detailed look into the fuzzer front-end API and how you can implement your own frontends, see this tutorial. DeepState has existing front-end executors for AFL, libFuzzer, Angora, Eclipser, and Honggfuzz.

https://asciinema.org/a/262023

Building the Ensembler

Using the unified API, we can now build an ensemble fuzzer that provisions front-end objects and executes fuzzers concurrently while maintaining seed synchronization.

To start, the ensembler takes a DeepState test harness as input and “ensemble compiles” multiple instrumented binaries into a workspace directory, using the compile() call exposed by each front-end object.

alan_4

Provisioning a test workspace with binaries using the “ensemble compilation” strategy.

Once complete, each parallel fuzzer process is instantiated through run(). Since each front-end wrapper invokes rsync-style synchronization through ensemble(), the ensembler simply calls it from each front-end after a specified sync cycle (in seconds) to synchronize seeds.
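Stripped of process management details, the ensembler’s core loop amounts to the sketch below. It assumes only the compile(), run(), and ensemble() calls described above; everything else is illustrative.

# Illustrative sketch of the ensembler's core loop, assuming each front-end
# object exposes the compile()/run()/ensemble() calls described above.
# Process management and error handling are simplified.
import time
from multiprocessing import Process

def run_ensembler(frontends, sync_cycle=5):
    # "ensemble compile" an instrumented binary per fuzzer into the workspace
    for fuzzer in frontends:
        fuzzer.compile()

    # spawn each base fuzzer as its own process
    workers = [Process(target=fuzzer.run) for fuzzer in frontends]
    for worker in workers:
        worker.start()

    # every sync cycle, push/pull seeds through each front-end's ensemble()
    while any(worker.is_alive() for worker in workers):
        time.sleep(sync_cycle)
        for fuzzer in frontends:
            fuzzer.ensemble()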

This implementation is surprisingly simple, and was built with around 300 lines of code in Python. Here’s a quick demo running the ensembler on one of our test examples, Crash.cpp.

Fuzz ‘Em All!

Inspired by Google’s fuzzer-test-suite and the aforementioned work in fuzzer-performance testing, we decided that a DeepState-powered test suite, deepstate-test-suite, could help with fuzzer-performance A/B tests and actual bug hunting. With our easy-to-use fuzzers and ensembler, let’s evaluate how they fare against real-world test-case benchmarks!

Bignum vulnerabilities are an especially interesting class of bugs because their edge cases are much harder, and sometimes only probabilistically possible, to discover. This makes them ideal targets for DeepState property tests.

We benchmarked our base and ensembler fuzzers for their performance reproducing an existing test case and a real-world bignum vulnerability—a carry mis-propagation bug in TweetNaCl. Following the evaluation methods in this fuzzing evaluation paper, the average time was measured for each test case using 10 crash instances from well-formed initial inputs:

alan_5

These results provide some interesting insights about fuzzer diversity. Running smarter fuzzers like Angora and Eclipser on smaller, contained test cases, like the Runlen example, works well. However, their performance falters when scaled up to actual bug discovery in real-world software, like the TweetNaCl bug. The ensemble fuzzer’s performance shows it can scale well to both of these test cases.

What’s in the future for Ensemble Fuzzing?

Ensemble fuzzing is a powerful technique that scales to real-world software libraries and programs. Integrating an ensemble fuzzer into DeepState gives power to unit-testing with a simplified workflow, and it opens up the possibility for many other research and engineering efforts.

Based on our current benchmark results, we can’t definitively say that ensemble fuzzing is the best fuzzing strategy, but it’s worth noting that there are always elements of randomness and probabilistic behavior when evaluating fuzzers. Effective ensemble fuzzing may depend on base fuzzer selection—determining which fuzzers to invoke, and when, based on the type of target or bug class being analyzed.

Maybe our current ensemble of fuzzers works effectively on reproducing bignum vulnerabilities, but would they work just as well on other classes of bugs? Would it be even more effective if we invoke fuzzers in a specific order? These are the questions we can answer more accurately with more benchmark tests on diverse targets.

Thanks!

Being a security engineering intern at Trail of Bits has been a wonderful experience, as always. Working with awesome employees and interns has really deepened my understanding of security research and of how we can turn insightful academic research into working software, just like my previous work analyzing cryptographic primitives with Manticore. I’m especially excited to continue doing this at NYU, where I’ll be starting in the spring!

Rewriting Functions in Compiled Binaries

by Aditi Gupta, Carnegie Mellon University

As a summer intern at Trail of Bits, I’ve been working on building Fennec, a tool to automatically replace function calls in compiled binaries that’s built on top of McSema, a binary lifter developed by Trail of Bits.

The Problem

Let’s say you have a compiled binary, but you don’t have access to the original source code. Now, imagine you find something wrong with your program, or something you’d like to change. You could try to fix it directly in the binary—for example, by patching the file in a hex editor—but that becomes tedious very quickly. Instead, being able to write a C function and swap it in would massively speed up the process.

I spent my summer developing a tool that allows you to do so easily. Knowing the name of the function you want to replace, you can write another C function that you want to use instead, compile it, and feed it into Fennec, which will automatically create a new and improved binary.

A Cryptographic Example

To demonstrate what Fennec can do, let’s look at a fairly common cryptographic vulnerability that has shown up in the real world: the use of a static initialization vector in the CBC mode of AES encryption. In the very first step of CBC encryption, the plaintext block is XOR’ed with an initialization vector (IV). An IV is a block of 128 bits (the same as the block size) that is used a single time in any encryption to prevent repetition in ciphertexts. Once encrypted, this ciphertext plays the role of the IV for the next block of plaintext and is XOR’ed with this plaintext block.

aditi_1

This process becomes insecure when the initialization vector is constant across plaintexts. Under a fixed IV, messages that begin with the same block of plaintext will all produce the same first block of ciphertext. In other words, a static IV can allow an attacker to analyze multiple ciphertexts as a group rather than as individual messages. Below is an example of an IV being generated statically.

unsigned char *generate_iv() {
  return (unsigned char *)"0123456789012345";
}
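To make the consequence concrete, here is a small sketch that is not from the original post and assumes the third-party Python cryptography package: with a fixed key and IV, identical plaintexts encrypt to byte-identical ciphertexts.

# Minimal demonstration (assumes the third-party "cryptography" package):
# with a fixed key and a fixed IV, identical plaintexts produce identical
# ciphertexts, which is exactly what an attacker can exploit.
import os
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

key = os.urandom(16)
plaintext = b"ATTACK AT DAWN!!"  # exactly one 16-byte AES block

def encrypt_cbc(iv):
    encryptor = Cipher(algorithms.AES(key), modes.CBC(iv)).encryptor()
    return encryptor.update(plaintext) + encryptor.finalize()

static_iv = b"0123456789012345"
print(encrypt_cbc(static_iv) == encrypt_cbc(static_iv))            # True: ciphertexts repeat
print(encrypt_cbc(os.urandom(16)) == encrypt_cbc(os.urandom(16)))  # False: random IVs differ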

Sometimes, developers will use cryptography libraries like OpenSSL to do the actual encryption, but write their own functions to generate IVs. This can be dangerous, since non-random IVs can make AES insecure. I built Fennec to help fix issues like this one—it checks whether IV generation was random or static and replaces the function with a new, secure IV if necessary.

The Process

The end goal was to lift executable binaries to LLVM bitcode with McSema and combine it with some LLVM manipulation to replace any function automatically. I started by understanding my cryptographic example and exploring different ways of patching binaries as a bit of background before getting started.

My first step was to work through a couple of the Matasano Cryptopals Challenges to learn about AES and how it can be used and broken. This stage of the project gave me working encryption and decryption programs in C, both of which called OpenSSL, as well as a few Python scripts to attack my implementations. My encryption program used a static IV generation function, which I was hoping to replace automatically later.

I kept using these C binaries throughout the summer. Then, I started looking at binary patching. I spent some time looking into both LD_PRELOAD and the Witchcraft Compiler Collection, which would work if my IV generation function was dynamically linked into the program. The goal of my project, however, was to replace function calls within binaries, not just dynamically loaded functions.

I didn’t want to complicate everything with lifted bitcode yet, so I started by using clean bitcode generated directly from source code. I wanted to run an LLVM pass on this bitcode to change the functionality of part of my program—namely, the part that generated an IV.

I started by trying to change the function’s bitcode directly in my pass, but soon moved to writing a new function in C and making my original program call that function instead. Every call to the old function would be replaced with a call to my new function.

After some experiments, I created an LLVM pass that would replace all calls to my old function with calls to a new one. Before moving to lifted bitcode, I added code to make sure I would still be able to call the original function if I wanted to. In my cryptographic example, this meant being able to check whether the original function was generating a static IV and, if so, replace it with the code below, as opposed to assuming it was insecure and replacing it no matter what.

#include <stdlib.h>        // malloc
#include <openssl/rand.h>  // RAND_bytes

// a stub function that represents the function in original binary
unsigned char *generate_iv_original() {
  unsigned char *result = (unsigned char *)"";
  // the contents of this function do not matter
  return result;
}

unsigned char *random_iv() {
  unsigned char *iv = malloc(sizeof(int) * 16);
  RAND_bytes(iv, 16);  // an OpenSSL call
  return iv;
}

unsigned char *replacement() {
  unsigned char *original = generate_iv_original();
  for (int i = 0; i < 10; i++) {
    unsigned char *iv = generate_iv_original();
    if (iv == original) { // if the IV is static
      return random_iv();
    }
  }
  return original;
}

With my tool working on clean bitcode, it was time to start looking at lifted bitcode. I familiarized myself with how McSema worked by lifting and recompiling binaries and looking through the intermediate representation. Because McSema changes the way functions are called, it took some extra effort to make my tool work on lifted bitcode in the same way that it had on clean bitcode. I had to lift both the original binary and the replacement with McSema. Additional effort was required because the replacement function in a non-lifted binary doesn’t follow McSema’s calling conventions, so it couldn’t be swapped in trivially.

aditi_2

Function names and types are more complex through McSema, but I eventually made a working procedure. Like the tool for clean bitcode, the original function could be kept for use in the replacement.

The last step was to generalize my process and wrap everything into a command line tool that others could use. So I tested it on a variety of targets (including stripped binaries and dynamically-loaded functions), added tests, and tested my installation process.

The Function Replacement Pass

The complete process consists of three primary steps: 1) lifting the binaries to bitcode with McSema, 2) using an LLVM pass to carry out the function replacement within the bitcode, and 3) recompiling a new binary. The LLVM pass is the core of this tool, as it actually replaces the functions. The pass works by iterating through each instruction in the program and checking whether it is a call to the function we want to replace. In the following code, each instruction is checked for calls to the function to replace.

for (auto &B : F) {
  for (auto &I : B) {
    // check if instruction is call to function to be replaced
    if (auto *op = dyn_cast<CallInst>(&I)) {
      auto function = op->getCalledFunction();
      if (function != NULL) {
        auto name = function->getName();
        if (name == OriginalFunction) {
...

Then, we find the replacement function by looking for a new function with the specified name and same type as the original.

Type *retType = function->getReturnType();
FunctionType *newFunctionType =
  FunctionType::get(retType, function->getFunctionType()->params(), false);

// create new function
newFunction = (Function *)(F.getParent()->getOrInsertFunction(ReplacementFunction, newFunctionType));

The next step is to pass the original function’s arguments to the new call.

CallSite CS(&I);

// get args to original function to be passed to replacement
std::vector<Value *> arguments;
for (unsigned int i = 0; i < CS.arg_size(); i++) {
  arguments.push_back(CS.getArgument(i));
}

// create a call to the replacement function in place of the old call
CallInst *newCall = CallInst::Create(newFunction, arguments, "", &I);

// redirect every use of the old call's result to the new call
for (auto &U : I.uses()) {
  User *user = U.getUser();
  user->setOperand(U.getOperandNo(), newCall);
}

The Complete Tool

Although the LLVM pass does the work of replacing a given function, it is wrapped with the other steps in a bash script that implements the full process. First, we disassemble and lift both input binaries using McSema.

aditi_3

Lifts binaries with McSema

Next, we analyze and tweak the bitcode to find the names of the functions as McSema represents them. This section of code includes support for both dynamically-loaded functions and stripped binaries, which affect the names of functions. We need to know these names so that we can pass them as arguments to the LLVM pass when we actually do the replacement. If we were to look for the names from the original binary, the LLVM pass wouldn’t be able to find any matching functions, since we’re using lifted bitcode.

aditi_4

Finds the names of functions to be replaced
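As a heavily hedged illustration of the idea, and not the script Fennec actually uses, one crude way to recover a lifted function’s name is to disassemble the bitcode and look for a definition whose name still contains the original symbol. This sketch does not handle the stripped-binary or dynamically-loaded cases described above.

# Illustrative only: roughly locate a lifted function's name by disassembling
# the bitcode with llvm-dis and searching for a definition whose name still
# contains the original symbol. Fennec's real script handles stripped binaries
# and dynamically loaded functions, which this sketch does not.
import re
import subprocess

def find_lifted_names(bitcode_path, original_name):
    ll_text = subprocess.run(
        ["llvm-dis", "-o", "-", bitcode_path],
        capture_output=True, text=True, check=True,
    ).stdout
    pattern = re.compile(r"define[^@\n]*@([\w.$]*%s[\w.$]*)\(" % re.escape(original_name))
    return pattern.findall(ll_text)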

Finally, we run the pass. If we don’t need access to the original function, we only need to run the pass on the original binary. If, however, we want to call the original function from the replacement, we run the pass on both the original binary and the replacement binary. In this second case, we are replacing the original function with the replacement function, and the stub function with the original function. Lastly, we recompile everything to a new working binary.

aditi_5

Runs the pass and compiles a new binary from updated bitcode

Results

Fennec uses binary lifting and recompilation to make a difficult problem relatively manageable. It’s especially useful for fixing security bugs in legacy software, where you might not have access to source code.

Using this tool, it becomes possible to automatically fix a cryptographic IV vulnerability. As seen below, the original binary encrypts a message identically each time using a static IV. After running Fennec, however, the newly created binary uses a different IV, thereby producing a unique ciphertext each time it is run, even on the same plaintext (blue).

# Original binary
aditi@nessie:~/ToB-Summer19$ ./encrypt ""
MDEyMzQ1Njc4OTAxMjM0NQ==/reJh+5rktBatDpyuJNQEBo++0pyIRGZiNsmZkN09HTPIOBVqQ9ov6CrxPXO7dC4cUJGYzBEsejHuTQyjVQh+XsLCHyDkURmfCuJ+a97raPY+o8pKKt8yf/xTmYMtyq2zf7EQxqPxv2bXKdP+6K+h9KyuO3q4+3JbuJFTesNLy8Np1m9ShJ9UAHvAdO6LCZvQ
N91kz0ytIH+s7LgajIWyises+yz26UBQwOzZLeLcQp4=
176

aditi@nessie:~/ToB-Summer19$ ./encrypt ""
MDEyMzQ1Njc4OTAxMjM0NQ==/reJh+5rktBatDpyuJNQEBo++0pyIRGZiNsmZkN09HTPIOBVqQ9ov6CrxPXO7dC4cUJGYzBEsejHuTQyjVQh+XsLCHyDkURmfCuJ+a97raPY+o8pKKt8yf/xTmYMtyq2zf7EQxqPxv2bXKdP+6K+h9KyuO3q4+3JbuJFTesNLy8Np1m9ShJ9UAHvAdO6LCZvQ
N91kz0ytIH+s7LgajIWyises+yz26UBQwOzZLeLcQp4=
176

aditi@nessie:~/ToB-Summer19$ ./encrypt ""
MDEyMzQ1Njc4OTAxMjM0NQ==/reJh+5rktBatDpyuJNQEBo++0pyIRGZiNsmZkN09HTPIOBVqQ9ov6CrxPXO7dC4cUJGYzBEsejHuTQyjVQh+XsLCHyDkURmfCuJ+a97raPY+o8pKKt8yf/xTmYMtyq2zf7EQxqPxv2bXKdP+6K+h9KyuO3q4+3JbuJFTesNLy8Np1m9ShJ9UAHvAdO6LCZvQ
N91kz0ytIH+s7LgajIWyises+yz26UBQwOzZLeLcQp4=
176

aditi@nessie:~/ToB-Summer19$ bash run.sh 2 ../mcsema-2.0.0-ve/remill-2.0.0/remill-build-2/ /home/aditi/ToB-Summer19/ida-6.9/idal64 encrypt replaceIV generate_iv replacement generate_iv_original -lcrypto

# Fennec's modified binary
aditi@nessie:~/ToB-Summer19$ ./encrypt.new ""
L+PYRFiOKMcu18hSqdGQEw==/aK2hYm/GXHwA2tqZxPmoNccQwW+Zhj7E0PQUSRF+lOLJiEMwOc7yv+/Z2AA0pEJjP7Jq4lHMpq2eIVl73lvav0pJiVlOcmfnFwQ9cu0MW0EWqUdgl2FCsWKtO/TAfGhcQPopJyvP8KD/LHlru4QIfZiym7//tt0V9vvabFCLNiSTRG350XKO/zoydeuRFfSu
0HmNNQbAcLSQkcUETH424RyQ4SxmcreW3krOw30kfJY=
176

aditi@nessie:~/ToB-Summer19$ ./encrypt.new ""
hYnowxN2Z3QyPIzwNaFzJw==/pzCq+V1q5ipHoqJXZ9MaeDr+nMdV5E1RbeI+YrcQqXjFHcVmDSq4yZboEuIJJjkbNbdO5DG6n3CQnZ1C7CumGdaZsddaYJueORROk7X+PnQZUq5bKqvdN7ZJEhK7qaerjogOF4TAotDV3ryLC6l/EWY01DkhGrf0hlXAkjQnOz28lXF40GNMd6pIjcoIbZze
V72v5s5q67fVdKdCzVE3BH76qX8qYS9YnN5JkGLERYA=
176

aditi@nessie:~/ToB-Summer19$ ./encrypt.new ""
r3/wMu5nD3rEFn7N88fCjQ==/MisK9RcK8RLsqjV2nrAfprghBYrBmeJS3FbJ4YG6zHBk+uA0CcZ+R4CSDolAaAPlCmkupfxy6bFHNEqyMVv7moPaiJEAkHDDU/FKen8eAJjMvz9+RK+xmQja238jk7xmaS6JbJOdh8teQ2XiMzlHsBYBVpw89UBFrTqOSN8qtlgU3aR4xUVlwZAA1+Pg2GHy
2CIWQI6ioHGDhN3P3po7MaOldJAgHGZO5d2GluroI70=
176

You can download Fennec and find instructions for its use here.

If you have questions or comments about the tool, you can find Aditi on Twitter at @aditi_gupta0!

Binary symbolic execution with KLEE-Native

by Sai Vegasena, New York University, and Peter Goodman, Senior Security Engineer

KLEE is a symbolic execution tool that intelligently produces high-coverage test cases by emulating LLVM bitcode in a custom runtime environment. Yet, unlike simpler fuzzers, it’s not a go-to tool for automated bug discovery. Despite constant improvements by the academic community, KLEE remains difficult for bug hunters to adopt. We’re working to bridge this gap!

My internship project focused on KLEE-Native, a fork of KLEE that operates on binary program snapshots by lifting machine code to LLVM bitcode.

What doesn’t kill you makes you stronger

KLEE’s greatest strength is also its biggest weakness: It operates on LLVM bitcode. The most apparent strength of operating on bitcode is that KLEE can run on anything that the Clang compiler toolchain can compile: C, C++, Swift, Rust, etc. However, there is a more subtle benefit to KLEE’s approach that is often overlooked. Operating on bitcode means the “supporting code” or “runtime” can be implemented in C or C++, compiled to bitcode, linked against the system under test (SUT), and then subjected to the same symbolic path exploration as the SUT itself.

This provides flexibility and power. For example, a KLEE runtime can implement I/O-related system calls (read, write, etc.) as plain old C functions. These functions are subject to symbolic exploration, just like the SUT, and contribute to code coverage. This allows KLEE to “see into” the OS kernel, and explore paths that might lead to tricky corner cases.

Now for the downside of operating in bitcode. Typically, KLEE is used on programs where source code is available, but getting bitcode from source code is not easy because of the difficulties created by build systems, configurations, and dependencies. Even when bitcode is available, a vulnerability researcher may have to manually inject KLEE API calls into the source code, link in the KLEE runtime, and possibly stub out or manually model external library dependencies. These tasks become daunting when dealing with large code bases and complicated build systems. McSema is a black-box option when source code is not available, but the limitations of binary control-flow graph recovery and occasional inaccuracies may not produce acceptable results.

KLEE-Native runs on binary program snapshots

First, we focused on getting bitcode for any program, and the approach we took was to operate on bitcode lifted from machine code, as opposed to bitcode compiled from source. Using a dynamic, snapshot-based approach like GRR’s, we can start KLEE-Native deep into a program’s execution, which isn’t possible with mainline KLEE.

Snapshotting

By default, the KLEE-Native snapshotter captures the program’s memory and register state right before the first instruction is executed. This means KLEE-Native needs to emulate pre-main code (e.g., loading shared libraries), which isn’t ideal. To avoid emulating that type of deterministic setup, we implemented a feature that works by injecting an INT3 breakpoint instruction at a user-specified virtual address, via ptrace.

In this mode, the target process executes natively until the breakpoint instruction is hit. Once it’s hit, the snapshotter reclaims control of the target and subsequently dumps the target process memory and a register state structure compatible with Remill into a “workspace” directory. Memory mappings of the original process can then be recreated from this workspace.

$ klee-snapshot-7.0 --workspace_dir ws --breakpoint 0x555555555555 --arch amd64 -- ./a.out

For address-space layout randomization (ASLR) binaries, the --dynamic flag instructs the snapshotter to interpret the breakpoint address as an offset within the main program binary. To do this, we use a neat trick of parsing the /proc/[pid]/maps file for the target process to discover the base virtual address of the loaded program. Do some arithmetic and voila, we have our breakpoint location!
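The trick boils down to a few lines of parsing. A rough Python equivalent of the arithmetic (the snapshotter itself is C++, and this ignores segment file offsets) might look like the following:

# Rough Python equivalent of the snapshotter's arithmetic (the real code is
# C++ inside klee-snapshot): find the base load address of the main binary in
# /proc/<pid>/maps and add the user-supplied offset to get the breakpoint.
def resolve_breakpoint(pid, binary_path, offset):
    with open("/proc/%d/maps" % pid) as maps:
        for line in maps:
            fields = line.split()
            # a maps line looks like: address perms offset dev inode pathname
            if len(fields) >= 6 and fields[5] == binary_path:
                base = int(fields[0].split("-")[0], 16)  # first mapping is the lowest
                return base + offset
    raise RuntimeError("binary not found in the target's address space")

# e.g. resolve_breakpoint(pid, "/home/user/a.out", 0x1337)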

$ klee-snapshot-7.0 --workspace_dir ws --dynamic --breakpoint 0x1337 --arch amd64_avx -- ./a.out

Side note: One interesting side effect involves CPU feature testing. Libc checks for available CPU features using the CPUID instruction and uses it to determine whether to use specialized versions of some functions (e.g., a hand-coded, SSE4-optimized memset). If a snapshot is taken after CPUID is executed natively, then you must specify an AVX option for the architecture of the snapshot. Otherwise those kinds of instructions might not lift.

Dynamically lifting machine code to bitcode

Now that we have our snapshot, we can ask KLEE-Native to dynamically lift and execute the program. We can do this with the following command:

$ klee-exec-7.0 --workspace_dir ws

The following diagram shows the control flow of klee-exec.

Screenshot from 2019-08-09 15-11-26

At a high level, KLEE-Native just-in-time decodes and lifts traces, which are simply LLVM function objects that contain a logical segment of the lifted machine code. Traces contain LLVM instructions emulating machine code and calls to Remill intrinsics and other lifted traces.

Intrinsic calls may be handled by KLEE’s “special function handler” capability, which allows for runtime bitcode to have direct access to KLEE’s executor and state. For example, Remill’s memory read-and-write intrinsics use the special function handler to interact with the snapshotted address space.

When emulating a function like malloc, traces are created from libc’s implementation, and execution is able to continue smoothly. All is good in the world. We see the light and everything makes sense…

… Just kidding!

brk and mmap come along, and now we have to execute a system call. What do we do?

The runtime is the kernel and the machine

KLEE-Native lifts all machine code down to the raw system call instructions. Using Remill, system call instructions are handled by a call to the intrinsic function __remill_async_hyper_call in the runtime bitcode. Remill doesn’t specify the semantics of this function, but the intent is that it should implement any hardware- or OS-specific functionality needed to carry out a low-level action.

In our case, KLEE-Native implements the __remill_async_hyper_call function, so it passes execution from lifted bitcode back into the runtime bitcode, where each Linux system call wrapper is implemented.

System call wrappers are parameterized by an Application Binary Interface (ABI) type, which finds the system call number and arguments and stores the return value. Here is an example, the SysOpen function, which wraps the POSIX open system call implemented by KLEE’s runtime.

trent_edit

Snippet of emulated open system call

This wrapper performs some of the error checking that the actual OS kernel would do (e.g., making sure the file path name can be read from the snapshotted address space, checking the path length, etc.). Here is where we play to KLEE’s strengths: all of these error-checking paths are subject to symbolic exploration if any of the data being tested is symbolic.

We have now dynamically lifted machine code to bitcode, and we’ve emulated system calls. That means we can run the lifted machine code and have it “talk” to KLEE’s own runtime the same way that bitcode compiled from source might do.

Our call to malloc continues executing and allocating a chunk of memory. Life is good again, but things are starting to get slow. And boy, do I mean sloooooooow.

Recovering source-level abstractions with paravirtualization

Operating on machine code means that we must handle everything. This is problematic when translating seemingly benign libc functions. For example, strcpy, strcmp, and memset require significant lifting effort, as their assembly is composed of SIMD instructions. Emulating the complex instructions that form these functions ends up being more time consuming than emulating simplistic versions would be. This doesn’t even address the sheer amount of state forking that can occur if these functions operate on symbolic data.

We paravirtualized libc to resolve this issue. This means we introduced an LD_PRELOAD-based library into snapshotted programs that lets us interpose and define our own variants of common libc functions. In the interceptor library, our paravirtualized functions are thin wrappers around the POSIX originals, and executing them results in the original POSIX functions being called before the snapshot.

Their purpose is to be a single point of entry that we find and patch during the snapshot phase. In the following example, the snapshotter will patch over a JMP with a NOP instruction so that malloc ends up invoking INT 0x81 when it is emulated by KLEE-Native.

trent_edits3

LD_PRELOAD malloc interceptor

When an interrupt is hit during emulation, we check the interrupt vector number and hook to a corresponding paravirtualized libc runtime function. Sometimes our paravirtualized versions of the libc functions cannot handle all cases, so we support a fallback mechanism, where we give control to the original libc function. To do this, we increment the emulated program counter by one and jump over the RETN instruction, which leads to the original function being executed.

Here are the libc functions that our LD_PRELOAD library currently intercepts. Each of the numbers (e.g., malloc’s 0x81) is the interrupt vector number that corresponds with the paravirtualized version of that function.

inter_alias_nbc

libc​ functions with paravirtualized equivalents

Wonderful! Our call to malloc has, instead, hit an interrupt, and we are able to hook to its paravirtualized version in KLEE. What exactly are we doing with it, and why do we care?

Modeling heap memory with libc interceptors

A call to mmap or brk during an emulated malloc will lay out memory for allocations. While this is an accurate representation of the lifted machine code instructions, it is not a productive model for finding bugs. The problem: mmaps are very coarse grained.

Every access to memory can be seen, but it is unclear where a given allocation begins or ends. As a result, it is difficult to do things like bounds checks to detect overflows and underflows. Furthermore, when allocations are opaque blobs, there is no way to catch bug classes like double frees and use-after-frees.

That’s why we have interposed on malloc and other allocation routines to formulate a crystalline memory model that demarcates memory allocations. This approach makes it trivial for KLEE-Native to classify and report heap vulnerabilities. What’s unique about this approach is that we have invented a totally new address format for allocations to make our lives easier. It contains metadata about each allocation, which makes it simple to locate in our memory model.

trent_edit2

Union showing the components of our custom address encoding

Allocations backed by our paravirtualized malloc don’t truly “exist” in a traditional address space. Instead, they exist within allocation lists. These are structures that allow individual allocations to coexist so that they’re easy to access, track, and manipulate, which makes bounds checks, double-free detection, and overflow/underflow detection extremely transparent. Furthermore, allocation lists give us the flexibility to recover from issues like heap-based buffer overflows by expanding the backing “allocation” in place.

Huzzah! We’ve achieved a clear-cut representation of the target’s allocated memory using allocation lists. But wait. Isn’t this supposed to be a symbolic execution engine? All of this is really only concrete execution. What’s going on?

Eager concretization for an improved forking model

Our approach to symbolic execution is a departure from the typical scheduling and forking model used by KLEE. Where KLEE is a “dynamic symbolic executor,” KLEE-Native is closer to SAGE, a static symbolic executor.

KLEE-Native’s approach to forking favors eager concretization and depth-first exploration. The kick is that we can do this without sacrificing comprehensiveness: it is always possible to go back to a prior point in execution and request the next possible concretization, if that is our policy. This is different from something like KLEE or Manticore, which eagerly fork (i.e., generate a multitude of feasible states, then defer to a scheduler to choose among them over time).

The mechanism we created to enable eager concretization, without sacrificing forking potential, is implemented using state continuations. State continuations are like Python generators. In KLEE-Native, they package up and hold on to a copy of the execution state prior to any forking, as well as any meta-data needed in order to produce the next possible concretization (thus giving us comprehensiveness). The executor can then request the next possible forked state from a given continuation. Thus, each request gives us back a new execution state, where some condition has been concretized (hence the term “eager concretization”).

For now, we store state continuations on a stack. The result is that KLEE-Native “goes deep” before it “goes wide.” This is because at each point where the state could be forked, we create a continuation, get the first state from that continuation, and push it onto the stack. When a state is done executing (e.g. it exits, or an unrecoverable error is encountered), we look at the last continuation on the stack and ask for its next state. This process continues until a continuation is exhausted. If that happens, it is popped off and we go to the next continuation on the stack. In the future, we will explore alternative strategies.
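Because state continuations behave like Python generators, the scheduling idea itself is easy to model in a few lines of Python. The sketch below is a toy model, not KLEE-Native code; fork_with(), feasible_values(), and run_until_branch_or_exit() are hypothetical stand-ins for the engine’s real operations.

# Toy model of eager concretization with a stack of continuations. Each
# continuation is a generator that yields successive forked states for one
# branch point, so depth-first exploration falls out of using a plain stack.
# The state methods below are hypothetical stand-ins, not KLEE-Native APIs.
def make_continuation(state, possible_values):
    # yield one forked state per feasible concretization of this branch point
    for value in possible_values:
        yield state.fork_with(value)

def explore(initial_state):
    stack = []
    state = initial_state
    while True:
        branch = state.run_until_branch_or_exit()
        if branch is not None:
            # hit a branch point: remember it so we can revisit its other values
            stack.append(make_continuation(state, branch.feasible_values()))
        # take the next concretization from the newest unexhausted branch point
        while stack:
            try:
                state = next(stack[-1])
                break
            except StopIteration:
                stack.pop()             # continuation exhausted: back up one level
        else:
            return                      # every continuation exhausted: done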

trent4

How to fork when eagerly concretizing

Our approach was motivated by a need to handle symbolic memory addresses. We started adding symbols, but couldn’t always get KLEE to explore all paths. KLEE was concretizing memory addresses, but not in a comprehensive way. This was honestly expected, because symbolic memory is a hard problem.

Implementing concrete memory is easy, because there is essentially a map of addresses to byte values. However, what does it mean when an address could take on many values? We decided the best policy was to create a generic mechanism to concretize the address immediately without throwing away all the other possibilities, and then leave it up to policy handlers to make more substantive approaches. Examples of more substantive policies could be sampling, min/max, etc.

Wonderful! We can now explore the program’s state space. Let’s go hunting.

Applying KLEE-Native in the real world

Because we have control over emulated address space and memory allocations, classifying different types of memory corruption vulnerabilities becomes easy with KLEE-Native, and vulnerability triaging is a fantastic use case for this. Furthermore, our eager concretization strategy ensures we will stick to the code path of interest.

Here is CVE-2016-5180 from the Google fuzzer-test-suite. It is a one-byte-write heap buffer overflow in c-ares that was used in a ChromeOS exploit chain.

We first snapshot the program at main with a dynamic breakpoint:

$ klee-snapshot-7.0 --workspace_dir ws_CVE --dynamic --breakpoint 0xb33 --arch amd64_avx -- ./c_ares

And simply run the klee-exec command:

$ klee-exec-7.0 --workspace_dir ws_CVE

Here we get KLEE-Native detecting a one-byte heap overflow.

repro_c

So what makes KLEE-Native special compared to AddressSanitizer or Valgrind? This is where our policy handler comes in. One policy to handle memory access violations like this one is replacing overflow bytes with symbolic ones. As execution continues, we could potentially diagnose the severity of the bug by reporting the range of symbolic overflow bytes at the end. This could let a vulnerability researcher distinguish states that allow limited possibility for an overflow from ones that could potentially allow a write-what-where primitive.

In KLEE-Native, undefined behavior can become a new source of symbolic data. This enables vulnerability triaging without prior knowledge of your threat model or the need for tedious reverse engineering.

Au revoir!

My internship produced KLEE-Native: a version of KLEE that can concretely and symbolically execute binaries, model heap memory, reproduce CVEs, and accurately classify different heap bugs. The project is now positioned to explore applications made possible by KLEE-Native’s unique approaches to symbolic execution. We will also be looking into potential execution-time speed-ups from different lifting strategies. As with all articles on symbolic execution, KLEE is both the problem and the solution.

Reverse Taint Analysis Using Binary Ninja

by Henry Wildermuth, Horace Mann High School

We open-sourced a set of static analysis tools, KRFAnalysis, that analyze and triage output from our system call (syscall) fault injection tool KRF. Now you can easily figure out where and why KRF crashes your programs.

During my summer internship at Trail of Bits, I worked on KRF, a fuzzer that directly faults syscalls to cause crashes. KRF works extremely well and pumps out core dumps like nobody’s business. However, it is difficult to determine which faulted syscall caused a particular crash since there could be hundreds of faulted syscalls in a single run. Manually tracing the cause of the crash through the source or disassembled binary is tedious, tiring, and prone to errors.

I set out to solve this problem using a technique I call Reverse Taint Analysis, and implemented my solution using the Binary Ninja API. The script gives a short list of possible causes of the crash, drastically limiting the amount of manual work required. Here, I describe the process I went through to create the algorithm and script, and give a brief overview of the additional tools built to ease its use.

Human in the Loop

How can we reliably determine the source of a crash? Well, how would a human determine the cause of a crash? First, we would look at the stack trace and figure out where the crash occurred. Let’s take a look at this vulnerable example program:

#include <stdio.h>
#include <stdlib.h>

void fillBuffer(char * string, unsigned len) {
  for (unsigned i = 0; i < len; ++i) {
      string[i] = 'A'; // if string = NULL, segfaults on invalid write
  }
}

int main() {
  char *str;

  // Memory allocated
  str = (char *) malloc(16); // if malloc fails, str = NULL 
  fillBuffer(str, 16); // str not checked for malloc errors!
  free(str);
  return 0;
}

Running KRF against this program caused a fault. In this case, we can easily guess why the crash occurred—a faulted brk or mmap caused malloc to return NULL, which produced a segfault when fillBuffer tried to write to NULL. But let’s figure out the cause of the crash for certain, acting as if we didn’t have access to the source code.

First, let’s open up the core dump’s stack trace with gdb and see what caused the crash:

(gdb) bt
#0 0x00005555555546a8 in fillBuffer ()
#1 0x00005555555546e1 in main ()

Next, let’s take a look at our memory maps for the process so we can find the instruction in the binaries:

(gdb) info proc mappings
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x555555554000 0x555555555000 0x1000 0x0 /vagrant/shouldve_gone_for_the_head
[output truncated]

Now, cross-referencing the address from the stack trace, we see that the instruction in the top stack frame lies inside the mapping that starts at 0x555555554000 and is backed by the binary /vagrant/shouldve_gone_for_the_head. We can calculate the instruction pointer’s offset in the binary by subtracting the start address of the memory-mapped objfile from the instruction’s address and adding the mapping’s file offset:

0x00005555555546a8 - 0x555555554000 + 0x0 = 0x6a8.

Great! Now we can examine the binary itself in our disassembler of choice (Binary Ninja) and see what went wrong.

binja1

Here, we can see the disassembly of the fillBuffer() function, with the instruction that causes the segfault highlighted in red. This instruction sets the byte pointed to by rax to the character code for A. So, the issue must be an invalid value of rax. Looking back, we see that rax = rax + rdx, where rax and rdx were previously loaded from the local variables string and i, respectively. We can see in the instruction at 0x68e that string was originally stored in rdi, which is the first argument to the function. i is initialized to zero and is only incremented, so we can ignore it, since we know it could not have been tainted by a function call or the function’s arguments.

Knowing that the first argument to fillBuffer() is tainted, we can go to the next frame in the stack trace and see what happened. We perform the same subtraction with the memory map to the address in the stack trace, 0x00005555555546e1, and get the actual address:

0x00005555555546e1 - 0x555555554000 + 0x0 = 0x6e1.​

This address is going to be one instruction after the function call to fillBuffer() since it is the return address. So, we want to examine the instruction directly before the one at 0x6e1. Let’s open it up in Binary Ninja!

binja2

Here, we have the instruction at 0x6e1 highlighted in blue, and the previous instruction highlighted in red. We know from our manual analysis of fillBuffer that the first parameter is stored in rdi, so we should track the data being stored in rdi. In the instruction before, we see that rdi is set to rax, and above that, there is a call to malloc, which stores its return value in rax.

Great! Now we know that the output of malloc gets passed into fillBuffer, where it causes the segfault. We’ve figured it out! But that was really annoying. If only there were a better way…

Enter MLIL Static Single Assignment

Well, it turns out there is a better way! Binary Ninja can decompile code into something called Medium Level IL (MLIL), which is a more human-readable form of assembly. It can then convert that MLIL into a form called Static Single Assignment (SSA), where every variable is assigned exactly once. This becomes really useful, because we don’t need to worry about things changing a variable other than its definition. As an example of SSA, consider this pseudocode function:

def f(a):
  if a < 5:
    a = a * 2
  else:
    a = a - 5
  return a

In SSA form, it becomes:

def f(a0):
  if a0 < 5:
    a1 = a0 * 2
  else:
    a2 = a0 - 5
  a3 = Φ(a1, a2) // meaning “a3 is either a1 or a2”
  return a3

So, let’s look at our same example again through the lens of SSA MLIL. Here’s fillBuffer in SSA MLIL:

binja3

Here, we can easily trace rax_2#4 to rax_1#3 + rdx_1#2, then trace rax_1#3 to string#1, which we see is arg1. We can also easily trace back i and see that it is set to 0. We have once again discovered that the first argument to fillBuffer is the source of the crash. So now, let’s look at main.

binja4

This is where we really see the benefits of SSA MLIL over regular disassembly. It lets us see what arguments are passed into fillBuffer, and what values are returned by malloc, making the analysis much easier. By tracing the sources of rdi#1 backwards, we again see that malloc is tainting the first argument of fillBuffer and, therefore, causing the crash.

We’re in the endgame now

So now that we’ve realized (for the second time) that malloc is the cause of our issues, let’s write out the process we’ve been applying, so we can easily convert it to code:

1. Make an empty stack. 
2. Push the crashing instruction to the stack. 
3. While the stack is not empty: 
4.   Pop an instruction off the stack. 
5.   If it is an MLIL function call instruction:
6.     The return value of that function call may be the cause of the crash
7.   Otherwise: 
8.     For each SSA variable used in the MLIL instruction: 
9.        If it’s not assigned in this function: 
10.         # It’s a function argument 
11.         We will have to go another frame up our stack trace. 
12.         # The same as going to main after finding arg1 was tainted
13.       Otherwise: 
14.         Add the instruction assigning SSA variable to the stack.

This is going to be easy! We just have to write it out in Python using the Binary Ninja API. We need to write a function that takes our instruction’s address and a BinaryView (a class holding information on the binary), and prints out the taint sources of the instruction.

def checkFunction(self, inst_addr, bv): 
  # Get MLILFunction obj for the function containing the instruction 
  func = bv.get_functions_containing(inst_addr)[0].medium_level_il   
  
  # Get the MLILInstruction obj for instruction at inst_addr
  inst = func[func.get_instruction_start(inst_addr)].ssa_form
  
  # Convert MLILFunction to SSA form
  func = func.ssa_form 
  
  # Keep track of what is seen
  visited_instructions = set()
  
  # Variables we are interested in
  var_stack = []  
  
  # Add the vars used by first instruction to stack
  for v in inst.vars_read: 
    var_stack.append(v)
 
  # Continuously run analysis while elements are in the stack
  while len(var_stack) > 0:
    var = var_stack.pop()
    if var in visited_instructions:
      continue
    # Add to list of things seen
    visited_instructions.add(var)

    # Get variable declaration
    decl = func.get_ssa_var_definition(var)
    
    # Check if its an argument
    if decl is None: 
      print("Argument " + var.var.name + " tainted from function call")
      continue

    # Check if its a function call
    if decl.operation == MediumLevelILOperation.MLIL_CALL_SSA:
      # If direct call
      if decl.dest.value.is_constant:
        # Get MLILFunction object of callee from address 
        func_called = bv.get_function_at(decl.dest.value.value) 
        print("Tainted by call to", func_called.name, "(" + hex(decl.dest.value.value) + ")")
      else: 
        # Indirect calls
        print(&quot;Tainted by indirect call at instruction&quot;, hex(decl.address))
      continue

    # If not an argument or call, add variables used in instruction to the stack. Constants are filtered out
    for v in decl.vars_read:
      var_stack.append(v)

The power of SSA is used in the vars_read and get_ssa_var_definition methods. MLIL makes detecting calls easy using decl.operation == MediumLevelILOperation.MLIL_CALL_SSA.
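For completeness, invoking the analysis from a headless Binary Ninja script might look roughly like this; the class name and binary path are placeholders, and the real entry point is krf.py.

# Hypothetical headless driver (class and file names are placeholders; the
# real entry point is krf.py). Requires the Binary Ninja Python API, and
# headless use needs an appropriate license.
from binaryninja import BinaryViewType

bv = BinaryViewType.get_view_of_file("shouldve_gone_for_the_head")
bv.update_analysis_and_wait()

crash_offset = 0x6a8                 # offset computed earlier from the core dump
analysis = ReverseTaintAnalysis()    # hypothetical class holding checkFunction
analysis.checkFunction(bv.start + crash_offset, bv)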

Extending the script

We can expand on a lot here with error handling, edge cases, automatically analyzing the frame above in the stack trace, automatically extracting information from the stack trace, etc. Thankfully, I’ve already done some of that with a set of python scripts.

python3 main.py binary coredump1 [coredump2] …

Automatically extracts the needed information from the core dumps, then inserts that information and binaries into a tarball to be copied to another computer, including libraries that are called in the stack trace.

gdb.py

Uses GDB Python API to extract data from each core dump. It’s called by main.py, so they must be in the same directory.

python3 analyze.py tarball.tar.gz

Takes a tarball output by main.py and automatically runs reverse taint analysis on each core dump in it, automatically cascading tainted arguments to the next frame. It uses krf.py to run the analysis, so they must be in the same directory.

krf.py contains the analysis code, which is a more featured version of the script written in this blog post. (Requires the Binary Ninja API.)

Let’s try them on our test binary:

$ # Linux VM with KRF
$ python3 main.py shouldve_gone_for_the_head core
Produced tar archive krfanalysis-shouldve_gone_for_the_head.tar.gz in /vagrant

$ # Machine with Binary Ninja
$ python3 analyze.py krfanalysis-shouldve_gone_for_the_head.tar.gz
Analyzing binary shouldve_gone_for_the_head
Done
Analyzing crash krfanalysis-shouldve_gone_for_the_head/cores/core.json
Tainted by call to malloc (0x560)
All paths checked

Conclusion

Writing this analysis script has shown me the Binary Ninja API is amazing. The versatility and automatic analysis it allows is incredible, especially considering it acts directly on binaries, and its intermediate languages are easy to use and understand.

I’d also like to mention LLVM, another framework for static analysis, which has a very similar API to Binary Ninja. It has many benefits over Binary Ninja, including better access to debug and type information, being free, having a more mature codebase, and always-perfect analysis of calling conventions. Its downside is that it needs the source code or LLVM IR of what you are analyzing.

Three LLVM passes are available in the KRFAnalysis repository to run static analysis: one detecting race conditions caused by checking the state of a system before use (i.e. time-of-check, time-of-use or TOC/TOU), another detecting unchecked errors from standard library calls, and a third reimplementing reverse taint analysis.

My summer: A small price to pay for salvation

I am incredibly grateful to everyone at Trail of Bits for my internship. I gained some amazing technical experience and got the chance to work with the Linux Kernel, FreeBSD Kernel, and LLVM—codebases I had previously considered to be mystical.

Some of my highlights:

  • I ported KRF to FreeBSD
  • Added the ability for KRF to target processes by PID, GID, UID, or if it had a specific file open
  • Wrote LLVM passes for static analysis
  • Upstreamed LLVM changes
  • Learned how to use Binary Ninja and its API
  • Picked up good coding practices
  • Gained a sense of the security industry

I also met some incredible people. I would like to give special thanks to my mentor Will Woodruff (@8x5clPW2), who was always willing to talk over an implementation or idea, or to review my pull requests. I can’t wait to apply what I’ve learned at Trail of Bits as I move forward in my career.

Wrapper’s Delight

by Patrick Palka, University of Illinois at Chicago

During my summer at Trail of Bits, I took full advantage of the latest C++ language features to build a new SQLite wrapper from scratch that is easy to use, lightweight, high performance, and concurrency friendly—all in under 750 lines of code. The wrapper is available at https://github.com/trailofbits/sqlite_wrapper under the Apache 2.0 license. Comments and pull requests are welcome.

The motivation for this new SQLite wrapper came from working on an internal code-auditing tool built on top of Clang that uses a database to store and perform queries on semantic information about source code. Originally, RocksDB was chosen as the backing database, but I quickly found myself wrestling with the rigidity of a key-value database as queries became more complex. Wishing that we had a more expressive relational database, I began to explore switching to SQLite.

Initial experiments suggested that switching would not degrade performance if the database was properly tuned (see below), so I began looking at existing C++ SQLite wrappers to see which, if any, could suit our needs. We wanted something that would let us perform raw SQL queries while remaining lightweight and able to handle concurrency. Unfortunately, none of the existing wrappers satisfied all of these criteria, so I set out to write one from scratch.

After migrating the backend over to SQLite, we were impressed by the scalability and feature-richness of SQLite. It has a command-line interface that makes debugging and prototyping easy, handles databases on the order of 100GB without breaking a sweat, and even has a built-in full-text search (FTS) extension. The wrapper makes interfacing with SQLite in C++ about as easy as possible, too.

Details

  • An example of its usage can be found at https://gist.github.com/patrick-palka/ffd836d0294f71d183f4199d0e842186. For simplicity, we chose to model parameter and column bindings as variadic function calls, so that all binds are specified at once.
  • Some of the modern C++ language features you’ll notice are inline and template variables, constexpr if and auto template parameters, generic lambdas, fold expressions, and thread_local variables.
  • This wrapper supports user-defined serialization and deserialization hooks. (https://gist.github.com/patrick-palka/d22df0eceb9b73ed405e6dfec10068c7)
  • This wrapper also supports automatic marshaling of C++ functions into SQL user functions. (https://gist.github.com/patrick-palka/00f002c76ad35ec55957716879c87ebe)
  • There is no database or connection object to explicitly initialize. Because the wrapper utilizes thread_local objects to manage connections to the database, a connection is made implicitly before the first use of the connection and is disconnected once the thread exits.
  • The database name and query strings are passed as template arguments instead of function arguments. This creates compile-time separation of the thread_local connection objects, which are per-database, and the thread_local prepared-statement caches, which are per-database, per-query-string. This design decision discourages the use of dynamically-generated query strings, since non-type template arguments must be compile-time constant expressions. In cases where a database name or a query string must be dynamically generated, the wrapper does support passing a lambda, which builds and returns the query string at runtime.
  • Every single prepared statement made by this wrapper is cached and reused, so the bare minimum number of calls to sqlite3_prepare is made throughout the lifetime of the program.
  • Downside: This wrapper cannot be used to manually manage connections to the database. It currently handles connections using a thread_local object, so a connection is created before the first query is performed on a given thread and is destroyed during thread exit. If you find yourself needing fine-grained control of when to connect or disconnect from your SQLite database, this wrapper may not work well for you. But this limitation may be amended in the future.

Fine-tuning SQLite

Here are some SQLite tuning tips to maximize performance. Our wrapper does the first three for you automatically.

  1. Prefer “external” FTS tables when using SQLite’s FTS extension. Build the table after the data is inserted using the ‘rebuild’ command. (https://www.sqlite.org/fts5.html#the_rebuild_command)
  2. Reuse prepared statements. Create them using the sqlite3_prepare_v3() routine and pass in the SQLITE_PREPARE_PERSISTENT option. (https://www.sqlite.org/c3ref/prepare.html)
  3. Use the SQLITE_STATIC option when binding text and blob values via the sqlite3_bind_*() routines. Ensure that the underlying memory is valid until the first call to sqlite3_step(). This avoids a redundant copy of the text or blob data. (https://www.sqlite.org/c3ref/bind_blob.html)
  4. Perform your insertions and updates in bulk transactions when possible. The speedup relative to using unit-size transactions grows nearly linearly with the size of the transaction, so inserting 10,000 rows per transaction is thousands of times faster than inserting 1 row per transaction. (A quick way to measure this yourself is sketched after this list.)
  5. Create your indexes after most or all of your data has been inserted, and choose your indices wisely. Creating indices once is going to be faster overall than continuously building and rebuilding them as more data gets inserted. Use the SQLite command-line interface to double-check each query’s plan, and install a log callback to have SQLite inform you whenever it decides to create a temporary index.
  6. Don’t use the same database connection or prepared statement object concurrently. SQLite serializes access to these objects. (https://sqlite.org/threadsafe.html) Also, the isolation component of ACID is guaranteed only between separate connections to the same database. (https://www.sqlite.org/isolation.html)
  7. Consider pragma temp_store = memory so that temporary tables and data structures are kept in memory rather than on disk. (https://www.sqlite.org/pragma.html#pragma_temp_store)
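
Here's what those three tips look like against the raw SQLite C API (not our wrapper's interface); error checking is omitted, and the people table and name column are placeholders:

    // Bulk-insert rows with a persistent prepared statement and SQLITE_STATIC bindings.
    #include <sqlite3.h>
    #include <string>
    #include <vector>

    void bulk_insert(sqlite3 *db, const std::vector<std::string> &names) {
      sqlite3_stmt *stmt = nullptr;
      // Tip 2: mark the statement as persistent so SQLite optimizes it for reuse.
      sqlite3_prepare_v3(db, "INSERT INTO people (name) VALUES (?1)", -1,
                         SQLITE_PREPARE_PERSISTENT, &stmt, nullptr);

      // Tip 4: wrap all of the insertions in a single transaction.
      sqlite3_exec(db, "BEGIN TRANSACTION", nullptr, nullptr, nullptr);
      for (const auto &name : names) {
        // Tip 3: SQLITE_STATIC skips a copy; `name` stays valid while the statement runs.
        sqlite3_bind_text(stmt, 1, name.c_str(), -1, SQLITE_STATIC);
        sqlite3_step(stmt);
        sqlite3_reset(stmt);
        sqlite3_clear_bindings(stmt);
      }
      sqlite3_exec(db, "COMMIT", nullptr, nullptr, nullptr);
      sqlite3_finalize(stmt);
    }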

SQLite C API Tips

Finally, here are some miscellaneous tips to simplify working with the SQLite C API, where the first two are done for you by our wrapper. A brief sketch of tips 1, 2, and 4 follows the list.

  1. Install a concurrency-friendly sqlite3_busy_handler to avoid having to check for `SQLITE_BUSY` after every API call. (https://www.sqlite.org/c3ref/busy_handler.html)
  2. Set up a log callback to have errors and other notices, like hints on where to add indices, printed automatically. (https://www.sqlite.org/errlog.html)
  3. Bundle a copy of SQLite into your project. This is the recommended way to use SQLite in your application. (https://www.sqlite.org/custombuild.html) Doing so also lets you enable SQLite’s full-text-search extension and its other useful disabled-by-default extensions.
  4. Use C++11 raw string literals to format query strings.
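
Here's what tips 1, 2, and 4 look like in isolation; the query string is just a placeholder:

    #include <sqlite3.h>
    #include <cstdio>

    // Tip 1: retry for a while instead of surfacing SQLITE_BUSY after every call.
    static int busy_handler(void *, int n_prior_calls) {
      return n_prior_calls < 100;  // nonzero means "try again"
    }

    // Tip 2: have SQLite report errors and hints (e.g., automatic-index warnings).
    static void log_callback(void *, int err_code, const char *msg) {
      std::fprintf(stderr, "sqlite (%d): %s\n", err_code, msg);
    }

    // Call once at program startup, before any connections are opened.
    void setup_logging() {
      sqlite3_config(SQLITE_CONFIG_LOG, log_callback, nullptr);
    }

    // Call once per connection.
    void setup_connection(sqlite3 *db) {
      sqlite3_busy_handler(db, busy_handler, nullptr);
    }

    // Tip 4: C++11 raw string literals keep multi-line SQL readable.
    constexpr const char *kQuery = R"(
      SELECT name, COUNT(*)
      FROM people
      GROUP BY name
    )";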

Final Thoughts

This summer taught me a few lessons. When you store a moderate amount of structured data locally and without heavy concurrency demands, sooner or later you will want to perform complex queries on that data. Unless you have a clear vision of your data-access patterns from the outset, a key-value database will quickly back you into a corner whenever those patterns change. Relational databases, on the other hand, make it easy to adapt your database to continuously changing access patterns. And finally, modern C++ can make interfacing with SQLite and other C APIs concise and easy, and when configured properly, SQLite is quite scalable.

A Day in the Life of Alessandro Gario, Senior Security Engineer

People interested in joining Trail of Bits often ask us what it’s like to work on the Engineering Services team. We felt that the best answer would be a profile of some of the talented individuals on our team, and let them describe their experiences at Trail of Bits in their own words.

Today, we're featuring Alessandro Gario, a member of our Engineering Team who lives in Italy. Alessandro works on open-source projects, implementing new functionality, reducing technical debt, and improving overall performance.

unnamed

How did you end up at Trail of Bits?

I first learned of Trail of Bits in my Twitter feed. I was on the lookout for new opportunities, so I started sniffing around the company and learning about its many open-source projects. I began with McSema, a project for lifting compiled executables to LLVM IR. Originally I just wanted to try out the software, but I also wanted to talk to the developers, so I ended up in the Empire Hacking Slack, where Trail of Bits engineers answer questions about their open-source work. My main contributions to the project were to the CMake code, improving the build experience and implementing better dependency management.

Dan Guido (CEO, Trail of Bits) noticed my contributions to McSema, and happened to have an immediate need for someone to work on osquery issues, so he made me an offer. Dan sent me a contract to work on a single task, and I officially became a Trail of Bits contractor! I had so much fun; it was the first time I was allowed so much freedom in working on a project — both in when I could work, and how I could direct my own tasking.

My contract ended when I finished the osquery task. With more time on my hands, it was now the perfect opportunity to engage more with the community, and take on the bugfixes and feature requests submitted by the users. Eventually, Trail of Bits had received enough requests for osquery work that they sent me a full-time job offer, and the rest is history.

What projects are you working on currently?

Primarily I am involved with the osquery project, having dedicated so much of my time to it that I was accepted as a member of the five-person committee of maintainers! The project, and especially its build toolchain, is currently being renovated to operate independently of its old home at Facebook.

I also do UI work with the Qt SDK for an internal project where we are creating a powerful source code navigation and cross-referencing utility for security auditors. Beyond that, I occasionally help out our other projects with their CMake-related issues.

On the side, I’ve continued to pursue experiments with how to fit eBPF into osquery, which is part of an ongoing effort to improve osquery’s event-driven detection capability on Linux. I recently spoke on this topic at QueryCon.

How do you schedule your workday?

For any kind of collaboration, I need to align my schedule with the rest of the team. Because of the time zone differences, I have to be flexible; a strict 9am-6pm work shift wouldn't really work. I organize my workday around my preferred schedule, but also around that of the US-based Trail of Bits employees and customers, who are 6 to 9 hours behind me here in Italy. When it's late afternoon in New York, it's nighttime in Milan. Most of my meetings are around 5pm or 6pm my time, which suits me. It has never been a problem; I really like the schedule.

What are the most challenging aspects of your job?

Sometimes a task's requirements, at least as the task is initially envisioned, are hard to implement due to hard technical or design constraints. That's difficult, because you have to find a creative compromise that works for everyone.

On the rare occasions that I get stuck, I get help from the team. Our Slack channels are like a mini StackOverflow: you can just ask, and get immediate answers from experts. That is one of the great things about working here.

When contributing to any open-source project with external maintainers, you will eventually have to work with people outside the company to finish your job and get the work integrated into the next release. Sometimes, you have to work a little extra after you think the task is “finished,” because you still have to work with the upstream project to make everyone happy.

What is the most rewarding aspect of your work at Trail of Bits?

I was always interested in information security. I would look at Twitter and see all of these conferences, events, and people who were building great things. I am finally able to travel to these events and meet these people. I even gave my first conference talk last month, at QueryCon!

I am exposed to challenging issues that make me learn, especially when I get other people at the company involved. The ability to work with, and learn from, a talented group of experienced engineers is a reward in itself.

When I am given a task, I am trusted with the responsibility to see it through to the end, and work on it on my own. I do my best work and feel the most motivated when I am trusted this way.

What is some career advice for someone who wants to join us here?

Whenever I sought positions in the security engineering field, they seemed to be mostly for external pen-testing of web services, which wasn't particularly interesting to me. I've done a little bit of reverse engineering and CTFs, but vulnerability research is not really my field either. I like to apply my engineering skills to building software. I've decided that you have to actively seek something that challenges and interests you, and carve out your own opportunity.

My advice is to find a relevant project you would like to support, and look for easy issues to solve, or even just review an open Pull Request, or improve the documentation. Once you get to know the project, it becomes easier to start contributing cool changes.

This is exactly what has worked for me personally. I know it is hard, because most people don’t have the time for after-hours work. And there’s no guarantee that you will get hired. But choose projects that are intrinsically motivating to you, and keep doing cool stuff as much as possible in your spare time. Have fun, and in the end you will get noticed.

The Engineering Services Team is Hiring

We appreciate Alessandro taking the time from his projects to talk about what it’s like to work here. Our Engineering Services Team is distributed around the globe, and each of our engineers brings a unique set of skills and contributions. Our work is public-facing, open-source, and client-driven. In close partnership with our customers, we are continuously working to extend and improve endpoint security solutions like osquery, Santa, and gVisor. Our recent work includes the implementation of 2FA support within PyPI, the Python package management system. We contribute to security event alerting pipeline projects like StreamAlert or the Carbon Black API, and are always working to improve our own security analysis tools like McSema and Remill. Our customers rely on us to solve their open-source security software challenges.

We are currently hiring for another Senior Security Engineer. Please apply if you are interested and feel you are qualified!

246 Findings From our Smart Contract Audits: An Executive Summary

Until now, smart contract security researchers (and developers) have been frustrated by limited information about the actual flaws that survive serious development efforts. That limitation increases the risk of making critical smart contracts vulnerable, misallocating resources for risk reduction, and missing opportunities to employ automated analysis tools. We’re changing that. Today, Trail of Bits is disclosing the aggregate data from every full smart contract security review we’ve ever done. The most surprising and impactful results we extracted from our analysis are:

  • Smart contract vulnerabilities are more like vulnerabilities in other systems than the literature would suggest.
  • A large portion (about 78%) of the most important flaws (those with severe consequences that are also easy to exploit) could probably be detected using automated static or dynamic analysis tools.
  • On the other hand, almost 50% of findings are not likely to ever be found by any automated tools, even if the state-of-the-art advances significantly.
  • Finally, manually produced unit tests, even extensive ones, likely offer either weak or, at worst, no protection against the flaws an expert auditor can find.

Continue reading this post for a summary of our study and more details on these results.

Everyone wants to prevent disastrous vulnerabilities in Ethereum smart contracts. Academic researchers have supported that effort by describing some categories of smart contract vulnerabilities. However, the research literature and most online discourse is usually focused on understanding a relatively small number of real-world exploits, typically with a bias towards the highly visible. For example, reentrancy bugs are widely discussed because they were responsible for the infamous DAO attack and are somewhat unique to smart contracts. However, is reentrancy the most common serious problem in real smart contracts? If we don’t know, then we cannot effectively allocate resources to preventing smart contract vulnerabilities. It’s not enough to understand detection techniques. We have to know how and where to apply them. Smart contracts are new. Decision makers have a relatively shallow pool of developer/analyst experience upon which to base their actions. Having real data to draw conclusions from is essential.

So, we collected the findings from the full final reports for twenty-three paid security audits of smart contract code we performed, five of which have been kept private. The public audit reports are available online, and make informative reading. We categorized all 246 smart-contract related findings from these reports, in some cases correcting the original audit categorization for consistency, and we considered the potential for both static and dynamic analysis tools, in the long-run, to detect each finding. We also compared the frequencies of categories to those for fifteen non-smart-contract audits we performed. Using paid, expert audits of code means our statistics aren’t overwhelmed by the large number of relatively silly contracts on the blockchain. Using many audit findings instead of a handful of exploited vulnerabilities gives us a better picture of potential problems to keep watch for in the future.

Category Frequencies are Different than Other Audits… But Not as Much as You’d Think

The most common type of smart contract finding is also the most common kind of finding in the 15 non-smart-contract audits we examined for comparison: data validation flaws are extremely common in every setting, constituting 36% of smart contract findings and 53% of non-smart-contract findings. This is no surprise; accepting inputs that should be rejected, and that lead to bad behavior, will always be easy to do and will always be dangerous. Access control is another common source of problems in smart contracts (10% of findings) and in other systems we audited (18%); it's easy to accidentally be too permissive, and access control can also be disastrous if too restrictive (for example, when even the owner of a contract can't perform critical maintenance tasks in some states, due to a contract bug).

Some categories of problems are much less common in smart contracts: unsurprisingly, denial of service, configuration, and cryptography issues are less frequent in a context where the blockchain abstracts away communication issues and operating system/platform-specific behavior changes, and gas limits reduce the temptation to roll your own cryptography. Data exposure problems are also less common in smart contracts; most developers seem to understand that data on the blockchain is inherently public, so there seem to be fewer misunderstandings about the consequences of “surprise” visibility. But these cases are somewhat unusual; for a majority of types of finding, including overflow/underflow and arithmetic precision, patching, authentication, timing, error reporting, and auditing and logging, the percentages in findings are within 10% of those for non-smart-contract audits.

The Worst of the Worst

In addition to counting how many findings there were in each category, we looked at how serious those findings tended to be: their potential severity, and the difficulty for an attacker to exploit them. We refer to the worst findings as high-low: high severity, low difficulty. These issues can allow an attacker to inflict major damage with relative ease.

Many of our twenty-two categories had no high-low findings, but a small number had high-low rates greater than 10%: access controls (25% high-low), authentication (25%), timing (25%), numerics (23%), undefined behavior (23%), data validation (11%) and patching (11%). Note that much-dreaded reentrancy, while often serious (50% of reentrancy findings were high severity) had no high-low findings at all, and accounted for only 4 of the 246 total findings.

Tools and Automation: We Can Do a Lot Better

For each finding, we determined, to the best of our knowledge, if it could potentially be detected by automated static analysis (e.g., our Slither tool), using a reasonable detector without too many false positives, or by automated dynamic analysis (e.g., with property-based testing like Echidna or symbolic execution like Manticore), either with off-the-shelf properties like standard ERC20 semantics, or using custom invariants. Rather than restricting ourselves to current tools, we looked to the future, and rated a finding as detectable if a tool that could be produced with significant engineering effort, but without unprecedented advances in the state-of-the-art in software analysis, could potentially find the problem. That is, we asked “could we write a tool that would find this, given time and money?” not “can current tools definitely find this?” Obviously, this is a somewhat subjective process, and our exact numbers should not be taken as definitive; however, we believe they are reasonable approximations, based on careful consideration of each individual finding, and our knowledge of the possibilities of automated tools.

Using this standard, 26% of the full set of findings could likely be detected using feasible static approaches, and 37% using dynamic methods (though usually only with the addition of a custom property to check). However, the potential for automated tools is much better when we restrict our attention to only the worst, high-low, findings. While static tools have less potential for detecting high-low findings than dynamic ones (33% vs. 63%), four of the high-low issues could probably only be detected by static analysis tools, which are also much easier to apply, and require less user effort. Combining both approaches, in the best-case scenario, would result in automatic detection of 21 of the 27 high-low findings: almost 78% of the most important findings. Our estimates of how effective static or dynamic analysis tools might be, in the limit, also vary widely by the kind of finding:

Category Dynamic Static
Access controls 50% 4%
API inconsistency 0% 0%
Auditing/logging 0% 38%
Authentication 25% 0%
Code quality 0% 67%
Coding bug 67% 50%
Configuration 0% 0%
Cryptography 0% 100%
Data exposure 0% 0%
Data validation 57% 22%
Denial of service 40% 0%
Documentation 0% 0%
Error reporting 29% 14%
Front-running 0% 0%
Logic 0% 0%
Missing logic 67% 0%
Numerics 46% 69%
Patching 17% 33%
Race condition 6% 59%
Reentrancy 75% 100%
Timing 50% 25%
Undefined behavior 0% 31%

Of course, in some cases, these percentages are not particularly informative; for instance, there was only one cryptography finding in our audit set, so it isn’t safe to assume that all cryptography bugs are easy to catch with a static analysis tool. Similarly, the coding bug category, containing what amount to “typos” in code, is likely to have an even higher percentage of easily statically detected problems, but there were only a handful of such problems in our audits.

Our best guess is that the combination of major impact on system behavior (thus high severity) and low difficulty (thus easy to find, in some sense) is not only a boon to would-be attackers, but a big help to automated analysis tools. That’s good news. Ongoing efforts to improve smart contract analysis tools are well worth the effort. That’s part of our motivation in releasing Crytic, a kind of Travis CI for smart contracts — with built-in support for running static analysis (including some Slither detectors not yet available in the public release) and, soon, dynamic analysis, on your code, automatically.

Perhaps the most important upshot here is that using high-quality automated static analysis is a best practice with almost no downside. If you’re writing important smart contracts, looking at a relatively small number of false positives in order to detect, with almost no developer effort, some of the most critical flaws is simply the right thing to do.

Tools and Automation: No Silver Bullet

However, a lot of the findings (almost 49%) are almost impossible to imagine detecting with a tool. In most of these cases, in fact, a tool isn’t even very likely to help. Slither’s code understanding features may assist in finding some issues, but many problems, and almost a quarter of the most important problems, require deeper understanding of the larger context of blockchains and markets. For example, tools can’t inform you about most front-running. Problems requiring human attention are not limited to the obvious categories, such as front-running, configuration, and documentation, either: there are 35 data validation findings, 12 access controls findings, and 10 undefined behavior findings, for example, that are unlikely to be detectable by automated tools – and 3 of these are high-low findings. A full 35% of high severity findings are unlikely to be detected automatically.

Even in the best of possible near-future automated tool worlds, and even with full formal verification (which might take up to 9x the developer effort), a great many problems simply require human attention. Security is a Strong-AI Hard problem. Until the robots replace us, independent expert attention will remain a key component of security for the most essential contracts.

Unit Tests are Great… But Maybe Not For This

Finally, what about unit testing? We didn’t add any unit tests during our audits, but we can look at whether the presence of significant unit tests was correlated with fewer findings during audits, or at least fewer high-low findings. The number of data points is, of course, too small to draw any solid conclusions, but we didn’t find any statistically significant correlation between our estimate of unit test quantity and quality and the presence of either findings in general or high-low findings. In fact, the insignificant relationships we did detect were in the wrong direction: more unit tests means more problems (the positive relationship was at least weaker for high-low findings, which is comforting). We hope and believe that’s just noise, or the result of some confounding factor, such as a larger attack surface for more complex contracts. While our basis for this result is subject to a number of caveats, including our ability to gauge the quality of unit tests without examining each line of code in detail, we do believe that if there were a causal relationship between better unit tests and fewer audit findings, with a large and consistent effect size, we’d have seen better evidence for it than we did.

Obviously, this doesn’t mean you shouldn’t write unit tests! It means that the kinds of things attackers and auditors are looking for may not overlap significantly with the kinds of problems unit tests help you avoid. Unit testing can probably improve your development process and make your users happier, but it may not help you actually be much more secure. It’s widely known that the problems developers can imagine happening, and write unit tests to check for, do not often overlap with the problems that cause security vulnerabilities. That’s why fuzzing and property-based testing are so valuable.

One key point to take away here is that the bugs found by property-based testing can be added as new unit tests to your code, giving you the best of both worlds. The pyfakefs module for creating high-fidelity mock file systems in Python, originally developed at Google, was a widely used software system, with a fairly extensive set of well-written unit tests. However, using the TSTL property-based testing tool for Python revealed over 100 previously undetected problems in pyfakefs (all of which were fixed), and let the developers of pyfakefs add a large number of new, more powerful, unit tests to detect regressions and new bugs. The same workflow can be highly effective with a well-unit-tested smart contract and Echidna; in fact, it can be easier, because Echidna does a better job of automatically figuring out the public interface to a contract than most property-based testing tools do when interacting with a library API.

Stay Tuned for More Details

We’ll be publishing the full results after we add a few more of our own smaller-scale audits and validate our results by comparing to estimates for audits performed by other companies. In the meantime, use our preliminary results to inform your own thinking about defects in smart contracts.

From The Depths Of Counterfeit Smartphones

In an age of online second-hand retailers, marketplace exchanges, and third-party refurb shops, it’s easier than ever to save hundreds of dollars when buying a phone. These channels provide an appealing alternative for people foregoing a retail shopping experience for a hefty discount.

However, there is an additional option for those bargain hunters seeking even more savings: counterfeits of popular phone models. These knock-offs have become a burgeoning industry, transforming cheap hardware and free software into mass profits at almost no cost to the manufacturers. These clones often sell at under 1/10th of the retail price and are often very convincing replicas at first glance.

Last year, we helped Vice's Motherboard with an investigative teardown of a counterfeit iPhone X. We were haunted by the many security concerns and vulnerabilities we'd discovered, so we worked with DeviceAssure—a company dedicated to anti-counterfeit solutions for mobile platforms—to do a deeper dive into these dangerous duplicates.

This post details what we found and provides some insight into exactly what you are getting when you use one of these phones. Whether it’s intentional malice or just dangerous ineptitude, there is plenty to be concerned about!

First Impressions

We looked at two counterfeits: an iPhone 6 and a Samsung S10.

Front and back of the counterfeit iPhone

Front and back of counterfeit Samsung S10

The visual aesthetic of the devices is very convincing. The peripheral layout, dimensions, and overall finish are almost identical to those of their retail counterparts. The various switches and buttons correspond to what you would expect in the real devices to control the phone lock, adjust the volume, and turn them on and off. The counterfeit iPhone even uses a lightning cable in its charge port!

Both models are equipped with haptic feedback and fingerprint sensors that do indeed work … mostly. Facial biometrics are also included, though they had a considerably higher failure rate and would often not work at all.

From an initial glance at the underlying guts, both devices rely on controllers from Mediatek, a Taiwanese hardware company that provides an ARM chipset for embedded devices that is both incredibly cheap and reasonably capable. They also rely, as many counterfeits do, on custom, largely community-built ROMs of the Android runtime—a telltale sign that functionality will be non-standard and rife with variant-specific quirks.

The Good – They look and work somewhat like the real thing… sometimes

You get a phone that looks and works vaguely like the one it counterfeited, with a few exceptions. What’s good other than that?

Nothing.

No really, even if these devices had hypothetically sported pristine ROMs (which, hint: they didn’t), they still came with a slew of critical problems, even if they weren’t outright backdoored with preinstalled malware (which, hint: they were).

We can give a C+ for effort to some of the detail rendered into the system UI. A good majority of modal popups and panel settings are faithfully intercepted and recreated using extensions to the native resource framework.

In particular, the Samsung counterfeit uses the native launcher, UI/icon pack, and theming engine for its variant of Android; it is almost indistinguishable from the original. It even includes legitimate portals for both the Samsung and Google Play app stores.

The iPhone, however, quickly falls apart after minutes of exploration. The ROM system layer initially presents a believable iOS UI, but edge cases in event behaviors (WiFi connection errors, application exceptions, certain input text types, etc.) reveal stock Android screens. In addition, the "iOS" apps all display in noticeably low resolution and contain creatively broken English (in stark contrast to the ROM system layer, suggesting separate authors).

The Bad – They are full of unpatched vulnerabilities and insecure bloatware

Both phones report running the latest version of Android, Pie 9.0, a relatively hardened OS in most regards. However, this is not true.

In the case of the iPhone, further digging revealed that it runs a far older version of Android: KitKat 4.4.0. KitKat's last update came in 2014. As you can imagine, hundreds of CVEs have appeared since then, not to mention inherent design flaws that have since been reworked: sandbox mechanisms, file system partitions, and dangerous permission APIs to name a few.

We probed a few well-known weaknesses and confirmed they were unpatched. The device is susceptible to the notorious Stagefright bugs which exploit media processing in the SMS/MMS messages to gain remote control of the device. In addition, several vulnerabilities in old Android system daemons, including the Mediaserver and Surfaceflinger, exhibited unpatched functionality. Because these AOSP ROMs are compiled out-of-band and maintained ad-hoc in the depths of board-hacking and system-modding forums, it is unlikely that users could ever patch these for themselves. There is certainly no over-the-air upgrade capability.

The S10 runs a slightly newer Android: Lollipop 5.1. Last updated in 2015, Lollipop replaced the Dalvik VM with the modern ART VM and added Material UI theming elements, thus allowing our counterfeit to use the Samsung UI components.

However, there is an even more serious problem that plagues both phones: outdated kernels. In addition to the Android runtime updates, the Linux kernel in Android phones often requires vendor participation to downstream security fixes onto the phone. Even in legitimate Android devices, this process often lags behind security releases and requires additional engineering effort by the vendor. The volunteer community of Mediatek ROM maintainers aren’t going to keep up with daily security updates, so outdated kernels in counterfeits are inevitable. Both phones had vulnerable kernels that were successfully exploited by known bugs, like DirtyCow (a copy-on-write memory race condition) and Towelroot (Futex timing bug ported to Android). No doubt a wide host of other kernel bugs are available for a potential attacker to abuse.

The Mediatek device drivers and daemons are a source of abundant vulnerabilities as well, often leading to kernel-level execution. Again, it is highly unlikely that a user would be able to appropriately find and patch these systems.

Another pitfall of these phones is the presence of debug and testing utilities in the Mediatek baseline ROM packages that expose dangerous system-level permissions. This was observed on both of these devices, as well as on multiple other counterfeit variants we've researched. The Galaxy S10 counterfeit features a remote debugging server that allows remote control of media files, logging of SMS messages, and deletion of phone numbers.

The Mediatek Android daemon (MTKAndroidSuite package) on the Galaxy S10 counterfeit starts a local FTP server that can be used to manipulate files due to the elevated permissions of the service.

Still on the S10, incoming SMS messages are saved to the application's SQLite database, which is not protected by access controls and can be read by other applications.

An overview displaying some of the Mediatek daemon capabilities (as indicated via class filenames) shows that the daemon can also retrieve media files, dump phone contacts, delete messages from the phone, and more.

These counterfeits are undeniably insecure. Both lie about their Android versions. The ROM versions used were severely outdated and vulnerable to public exploits, as were their kernels. They include bloatware, like remote debugging services, that enable abuse. This is what you’d expect from a phone that’s built around a volunteer-maintained, outdated Android ROM.

The ability for vendors and developers to seamlessly integrate and enforce security updates across their devices has been a massive win for mobile security. This requires a larger ecosystem that extends beyond the phone itself. Not only do these clones lack the latest and greatest hardware mitigations, but their isolation from the larger ecosystem and its security safety net is an inherent risk that can never truly be mitigated.

The Ugly – They contain malware and rootkits

Both the Galaxy S10 and iPhone 6 counterfeits we assessed contained malware and rootkits.

The first issue we noticed in both devices was the presence of Umeng, an invasive analytics library, embedded into many of the applications and system libraries. Based out of China, Umeng has been caught employing malware in its operations. It collects and sends user information, including name, gender, IMEI numbers, serials, and more, back to its servers regularly without prompting any of the usual permission-consent disclaimers.

In the case of the S10, we found the SystemUI framework was modified to embed a server that can arbitrarily download, install, and run .dex files, in addition to reporting event information collected from system events such as geolocation, contact creation, and package installation and removal. For example, the library components used for facial recognition came bundled with functionality that can install arbitrary Android applications on demand.

On the S10, a hidden component in the SystemUI downloads files off the internet in the background. Note the Mandarin logs at the bottom of the screenshot!

Monitoring the S10’s network activity, we found it periodically reaching out to an unknown server. This is the origin of those requests, found embedded inside the SystemUI framework library.

One example of a component that can install additional applications on demand is in the facial recognition software. "ReadFace" is a third-party library, integrated inside the SystemUI framework, that seems to simulate biometric facial recognition. Within this code, there appears to be the ability to arbitrarily install APKs.

Finally, the S10 included a RAT masquerading as a font extension system service (“LovelyFonts”) that allows for remote native code execution, complete with a shell, arbitrary file upload/download, and logging of system events. This RAT provides unlimited access to the person who planted it there, enabling total compromise of the phone and all its data. We observed that certain events, such as installing packages or sending text messages, would trigger connections to exchange encrypted payloads remotely related to this backdoor. As a note, while this specific malware wasn’t present on the particular iPhone 6 that we studied, we have encountered variants of it on other counterfeit iPhone ROMs in the past.

This is a function inside the Lovelyfonts library that invokes a direct system call. The Lovelyfonts service comes with a library that allows a remote user to execute code directly on the machine, bypassing the Android Runtime.

Here, the malware is saying that it’s trying to instantiate a “network interceptor,” ostensibly interfering with network traffic.

The RAT malware detects whenever an app is installed or uninstalled and generates an encrypted payload to send to a remote API server.

Insecure, outdated ROMs are bad. Actual evidence of malicious intent is ugly. The phones we looked at both had Umeng, a known invasive analytics library that steals user data, embedded in multiple applications. The S10 had a server embedded in the SystemUI framework that can download, install, and run applications and collect system data, as well as malware that grants unlimited access to the device to whoever planted it there.

The moral of the story? If you're using a counterfeit phone, there's a high likelihood that it will provide bad actors access to your data by design. Embedding malware here is easy. It is trivial for a counterfeit manufacturer to implant and modify the ROM before distribution. Tracking or detecting either action is impossible for most users. While it is theoretically possible to find a 'clean' distribution, it is a gamble to make, never mind the inherent risk of using an insecure baseline system.

Conclusion – If you used one of these phones, you’d already be hacked

As the price point for handheld devices continues to climb, there will always be a temptation to seek cheaper alternatives. Counterfeit smartphones will continue to evolve in sophistication, performance, and threat to users. Using them puts your data at risk and may enable abuse of the applications and networks that you access and use.

Oftentimes, it's not obvious to buyers that they're purchasing counterfeits. Fake versions like these are often acquired through Craigslist or other third parties. Some are sold as scam upgrades or gifts. In some countries, it can be difficult to distinguish genuine sellers from counterfeit vendors because all phones are purchased independently of cellular contracts. Buying direct from Apple or Samsung is the best way to ensure nothing malicious comes preinstalled on your phone, and enables you to receive new software updates that patch security issues (well, at least theoretically). If you're a company that allows employees to access corporate data on their phones, consider verifying devices for genuine software.

We hope that this investigation helped illuminate the dangers of opting into an “off-brand” device. If this was helpful, or sounds similar to a security concern you or your organization confront, reach out! We offer a wide range of services, including iVerify – a personal security app for iOS – for further securing your phone.

We’d like to again thank DeviceAssure for reaching out and providing us with the hardware to conduct this analysis as well as the opportunity to do some digging into this matter. They will be at Blackhat this year and so will some of us, so stop by and say hi. And as always, we love to hear about weird and strange products out there in the wild, so drop a line if there is something you think we should look at!

Better Encrypted Group Chat

Broadly, an end-to-end encrypted messaging protocol is one that ensures that only the participants in a conversation, and no intermediate servers, routers, or relay systems, can read and write messages. An end-to-end encrypted group messaging protocol is one that ensures this for all participants in a conversation of three or more people.

End-to-end encrypted group messaging is a necessary problem to solve. Whether it be for limiting liability, providing verifiable client-side security, or removing a single point of failure, there are good reasons for a group messaging host to use an end-to-end encrypted protocol.

End-to-end encrypted group messaging is also a hard problem to solve. Existing solutions such as Signal, WhatsApp, and iMessage have inherent problems with scaling, which I’ll discuss in detail, that make it infeasible to conduct group chats of more than a few hundred people. The Message Layer Security (MLS) protocol aims to make end-to-end encrypted group chat more efficient while still providing security guarantees like forward secrecy and post-compromise security.1

To these ends, I’ve been working on molasses, a Rust implementation of MLS, designed with safety, ease-of-use, and difficulty-of-misuse in mind.

Molasses has helped refine the MLS spec

The primary contribution of molasses has been in detecting errors in the specification and other implementations through unit and interoperability testing. Molasses implements most of MLS draft 6. Why not all of draft 6? There was an error in the spec that made it impossible for members to be added to any group. This broke all the unit tests that create non-trivial groups. Errors like this are hard to catch just by reading the spec; they require some amount of automated digging. Once they are found, the necessary revisions tend to be pretty obvious, and they are swiftly incorporated into the subsequent draft.

Iterating this discovery/patching process using molasses has given me a chance to put the spec through its paces and help make things clearer. This winter internship (“winternship”) project has been a great experience, especially as a first-time IETF contributor.

How to build encrypted group chat

In this section we explain why MLS is constructed the way it is (hint: for efficiency reasons), and how it compares to other solutions (hint: it's better).

First off, MLS works on a lower level than most chat applications. It is a protocol upon which applications can be built. For example, MLS does not govern group permissions such as who can add people to the chat — this would be done by an application using MLS under the hood. Thus, we can leave things like formal rule systems out of the conversation entirely when analyzing the protocol. Here, we're only going to consider the sending of messages and the removal of members.

The following section makes use of cryptographic primitives such as digital signatures, Diffie-Hellman key exchange, (a)symmetric encryption, and key-derivation functions. If the reader feels underprepared in any of these areas, a quick skim of the sections in Serious Cryptography on ECIES and Authenticated Diffie-Hellman should be sufficient.

Without further ado,

A Motivating Problem

Wilna is planning a retirement party for an acquaintance, Vince. The logistics are a nightmare, so she invites her friends Xavier, Yolanda, and Zayne to help her plan. They would like to make a group chat on Slack so they can all stay on the same page, but they remember that Vince is an infrastructure manager for Slack—he can see all the messages sent over any Slack server in the world. This is a problem, since they want to give Vince a nice long vacation upstate and they want it to be a surprise. Vince’s position poses even more problems: he happens to manage every single server in town. Even if Wilna purchases her own server to mediate the group chat, Vince will be tasked with managing it, meaning that he can read everything the server stores.

What Wilna needs is a centralized end-to-end encrypted group chat, i.e., a group chat where every member can broadcast messages and read all incoming messages, but the single server that mediates these messages cannot read anything. For clarity, we’ll distinguish between application messages, which carry the textual content of what a group member wants to say to everyone else in the group, and auxiliary messages (called “Handshake messages” in MLS), which members use to manage group membership and cryptographic secrets. Since this is all mediated through one server, the members can rely on the server to broadcast their messages to the rest of the group.

With the setup out of the way, what are the options?

Solution #1: Pairwise Channels

Suppose Wilna, Xavier, Yolanda, and Zayne all know each other’s public keys for digital signatures. This means that each pair of people can do an authenticated Diffie-Hellman key exchange over some auxiliary messages and derive a shared symmetric key called the pair key. This process produces six separate pairwise channels, represented here:

If Wilna wants to send an application message m to the group, she has to encrypt it three separate times (once for each member of the group) and send all the ciphertexts:

The grey arrows represent application messages encrypted under a symmetric key.

Note that Wilna isn’t making use of the server’s ability to broadcast messages, since each member in the group can only decrypt messages encrypted under their own pair keys. Generalizing this, if there is a group of size N, sending an application message requires a member to encrypt and send N-1 times. Roughly speaking, this is how iMessage does group chat.2
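
To make the cost concrete, here's a minimal sketch of sending over pairwise channels; encrypt is a hypothetical stand-in for a real symmetric cipher, and PairKey is just a placeholder type:

    #include <map>
    #include <string>

    using PairKey = std::string;     // placeholder for a shared symmetric key
    using Ciphertext = std::string;  // placeholder for an encrypted blob

    // Stand-in for a real authenticated cipher; only here to keep the sketch self-contained.
    Ciphertext encrypt(const PairKey &key, const std::string &plaintext) {
      return "enc(" + key + ", " + plaintext + ")";
    }

    // Sending one application message costs N-1 encryptions and N-1 ciphertexts.
    std::map<std::string, Ciphertext> send_pairwise(
        const std::string &message,
        const std::map<std::string, PairKey> &pair_keys /* keyed by recipient */) {
      std::map<std::string, Ciphertext> outgoing;
      for (const auto &[recipient, key] : pair_keys)
        outgoing[recipient] = encrypt(key, message);  // one ciphertext per other member
      return outgoing;
    }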

Great, so that’s just three encryptions per person. This probably takes at most a few milliseconds on a phone. What’s the issue? The issue is what about the WhatsApp group with >10,000 members where my aunts talk about who’s getting married next? Do you want them to do 9,999 encryptions every time they send something? I do, but they probably don’t. To accommodate my aunts, we need to get cleverer.

Solution #2: Sender Keys

Instead of having a key between every user in the group, let’s give every user a sender key that they use to encrypt application messages. This is roughly what Signal2, WhatsApp2, and Keybase do. If you’re a group member, you have to go through the following setup:

  1. Randomly generate your sender key
  2. For every user in the group, encrypt your sender key with your pair key that you share with that user
  3. Send every user their encrypted copy of your sender key as an auxiliary message

After the setup, which requires N-1 encryptions for each user in a group of size N (that's Θ(N²) total auxiliary messages), we finally see some efficient behavior; a small sketch of both phases follows the steps below. To send an application message m, Wilna:

  1. Encrypts m with her sender key precisely once
  2. Broadcasts the ciphertext to the group
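
Here's a minimal sketch of the sender-key scheme; as in the previous sketch, encrypt is a hypothetical stand-in for a real cipher, and the names are placeholders rather than anything from a real implementation:

    #include <map>
    #include <string>

    using Key = std::string;
    using Ciphertext = std::string;

    // Stand-in for a real authenticated cipher.
    Ciphertext encrypt(const Key &key, const std::string &plaintext) {
      return "enc(" + key + ", " + plaintext + ")";
    }

    struct Member {
      Key sender_key;                        // my own sender key
      std::map<std::string, Key> pair_keys;  // pairwise keys, keyed by other member

      // Setup: N-1 encryptions of my sender key, one per other member
      // (so Θ(N²) auxiliary messages across the whole group).
      std::map<std::string, Ciphertext> share_sender_key() const {
        std::map<std::string, Ciphertext> out;
        for (const auto &[other, pair_key] : pair_keys)
          out[other] = encrypt(pair_key, sender_key);
        return out;
      }

      // Steady state: one encryption per application message, broadcast to everyone.
      Ciphertext send(const std::string &message) const {
        return encrypt(sender_key, message);
      }
    };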

The grey arrows represent application messages encrypted under a symmetric key.

Although there are three arrows here, they are all the same ciphertext, so the application message only needs to be encrypted and broadcast once. Thus, after the setup phase, each outgoing application message only costs a single encryption. So we’re done, right? Wrong, of course wrong. Because…

What about Removal?

The fallacy here is that the setup phase runs once. It actually runs every time the group is modified. Suppose in the process of premeditating this "retirement party," the group finds out that Zayne has been leaking details to Vince the whole time. Naturally, they kick Zayne out. Now Zayne still knows all the sender keys, so if he talks to Vince and gets an encrypted transcript of the group conversation that happened after his departure, he would still be able to decrypt it. This is a no-no, since Zayne has already defected. To prevent this from happening, each remaining user in the group has to create a new sender key and share it with everyone else through their pairwise channels. Again, this is Θ(N²) total auxiliary messages, which can be a lot. So if we want to tolerate tons of group modifications,3 we're going to have to find a way to bring down the number of auxiliary messages sent during the setup phase, while still being able to keep using sender keys for application messages. A well-known secret in computer science is that when the naïve solutions of pairs and lists don't work, there's a next logical step:

Solution #3: Trees

We would like to have sender keys (since they make application messages efficient). We also want to be able to transmit new sender keys to subsets of the group without using too many auxiliary messages. The important insight here is that, when we remove a member, we shouldn’t need to individually send new keying information to every single remaining member like we had to in the previous solution. After all, we need to send this to the whole group minus just one person. So why not have public keys that cover large subsets of the group, and use those for sending auxiliary messages? This is exactly what the MLS ratchet tree (a.k.a. TreeKEM) affords us.

The MLS ratchet tree is a binary tree4 whose leaves correspond to members of the group, and whose non-leaf nodes, called intermediate nodes, carry a Diffie-Hellman public key and private key. Intermediate nodes don’t represent people, computers, or locations on a network; they’re just pieces of data that facilitate auxiliary message sending. We also allow nodes to be blank, meaning that they do not have an associated keypair. A node that does have an associated keypair is said to be filled. Every member in the group retains a copy of the ratchet tree, minus the private keys. Knowledge of the private keys follows the ratchet tree property:
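
As a rough mental model (this is a sketch, not molasses' actual data structure, which stores the tree in an array per footnote 4), the ratchet tree looks something like this:

    #include <memory>
    #include <optional>
    #include <string>

    struct KeyPair {
      std::string public_key;                  // visible to every group member
      std::optional<std::string> private_key;  // present only on members entitled to it
    };

    struct Node {
      std::optional<KeyPair> keypair;     // std::nullopt means the node is blank
      std::unique_ptr<Node> left, right;  // leaves (group members) have no children
      std::optional<std::string> member;  // set only on leaves
    };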

Ratchet Tree Property If a member M is a descendant of intermediate node N, then M knows the private key of N.

*deep breath* Sender keys are derived via key-derivation function (KDF) from the root node's private key, and each non-leaf node's private key is derived via KDF from its most-recently updated child's private key.5 Upon the removal of a user, new private keys are distributed to the resolutions of the copath nodes, i.e., the maximal non-blank nodes of the subtrees whose root is the sibling of an updated node.

That paragraph alone took about 10 minutes to write, so let’s just see…

A Small Example

We start off with a group like so:

Zayne wants out, so Yolanda removes him.6 To remove him, Yolanda will first blank out Zayne and all his ancestors:

The boxes with red slashes through them represent blank nodes.

Yolanda needs to contribute new keying information to the new group so that the new sender keys can be derived from the new root’s private key. To do this, she generates a new personal keypair pubY’ and privY’ and derives all her ancestors’ keypairs by iteratively applying a KDF to the private key and computing its corresponding public key (this is called “ratcheting,” whence “ratchet tree”).

The green circles indicate recently updated nodes.
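
Here's a minimal sketch of that ratcheting step; kdf and public_from_private are hypothetical placeholders for a real key-derivation function and a real Diffie-Hellman public-key computation:

    #include <string>
    #include <vector>

    // Hypothetical placeholders for a real KDF and DH public-key derivation.
    std::string kdf(const std::string &secret) { return "kdf(" + secret + ")"; }
    std::string public_from_private(const std::string &priv) { return "pub(" + priv + ")"; }

    struct KeyPair {
      std::string public_key;
      std::string private_key;
    };

    // Starting from a freshly generated leaf private key, derive one keypair per
    // ancestor on the path to the root: privY' -> privY'' -> privY''' -> ...
    std::vector<KeyPair> ratchet_path(const std::string &fresh_leaf_private_key,
                                      size_t path_length) {
      std::vector<KeyPair> path;
      std::string priv = fresh_leaf_private_key;
      for (size_t i = 0; i < path_length; ++i) {
        priv = kdf(priv);  // each ancestor's private key is the KDF of its child's
        path.push_back({public_from_private(priv), priv});
      }
      return path;
    }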

But Yolanda isn’t done. Wilna and Xavier need to be told about these new keys somehow. It’s Yolanda’s job to share this info. In particular,

  1. Every member needs to get a copy of the public keys of all updated nodes (i.e., Yolanda’s own public key and all her ancestors’). This is important. The public keys are part of the shared group state, and shared group state is how a bunch of values in the MLS protocol are derived.
  2. Every member needs to get a copy of the private keys of their nearest modified ancestor. This is in order to preserve the ratchet tree property.

Remember that the end goal is still to derive the sender keys, which means that Wilna and Xavier need to be told the value of the root private key, privY'''. This will be a consequence of item two above.

Since everyone needs public keys and public keys are not secret, Yolanda can just broadcast them as unencrypted auxiliary messages. But private keys are more sensitive. She needs to encrypt them for just the members who need them. This is where we use the ratchet tree property. If she wants Wilna and Xavier to be able to read an auxiliary message containing privY''', she need only encrypt the message under pubWX, since Wilna and Xavier are descendants of the WX intermediate node, and will therefore be able to decrypt anything encrypted under pubWX.7 This describes how the auxiliary messages are sent to the rest of the group:

The solid black arrows above indicate public-key encrypted messages. The dashed arrows indicate plaintext messages. The arrows do not indicate who is doing the sending (since that’s all Yolanda). They’re just meant to illustrate where in the tree the values are coming from and whom they’re intended for.
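
A minimal sketch of that distribution step; pk_encrypt stands in for the HPKE/ECIES public-key encryption mentioned in footnote 7, and every other name here is a placeholder rather than part of any real implementation:

    #include <string>
    #include <vector>

    struct ResolutionNode {
      std::string public_key;  // e.g., pubWX in the example above
    };

    struct AuxiliaryMessage {
      std::string recipient_public_key;
      std::string ciphertext;
    };

    // Hypothetical stand-in for HPKE/ECIES public-key encryption (see footnote 7).
    std::string pk_encrypt(const std::string &pub, const std::string &plaintext) {
      return "hpke(" + pub + ", " + plaintext + ")";
    }

    // For one updated node: encrypt its new private key to every node in the
    // resolution of its copath sibling (e.g., privY''' encrypted under pubWX).
    std::vector<AuxiliaryMessage> share_path_secret(
        const std::string &new_private_key,
        const std::vector<ResolutionNode> &copath_resolution) {
      std::vector<AuxiliaryMessage> out;
      for (const auto &node : copath_resolution)
        out.push_back({node.public_key, pk_encrypt(node.public_key, new_private_key)});
      return out;
    }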

Now Wilna and Xavier will update their view of the tree by saving the public keys and decrypting the root private key. Thus, everyone is on the same page and the ratchet tree property is preserved. Finally, everyone re-derives their sender keys, and the removal is complete:

Note that Zayne’s position remains blank after the removal. This saves the members from the computational overhead of shuffling themselves around and recomputing their ancestors’ keypairs. MLS defines two ways to prevent removed members from overcrowding the tree: it allows blank nodes to be removed from the right end of the tree after removals (not applicable in the example above), and it allows new members to be added in the position of previously removed members. So if the “party-planners” above wanted to replace Zayne, they could do so without making the tree bigger.

This example illustrates the smaller details in updating keys, but it doesn’t do a particularly good job at illustrating which node secrets are sent to which other nodes in the resolutions of the copath nodes. To give an idea, here’s…

A Much Bigger Example

Suppose Zayne wants to break out and go solo, but still feels the desire to be in a boy band. After cloning himself 15 times, Zayne #1 notices that one of the clones, Zayne #11, keeps hinting at breaking off and doing a solo career of his own. Zayne #1 acquiesces and removes him from the group. He sees what he’s created. Zayne #1 looks up at the stars. War soon.

Let’s see what auxiliary messages were sent when Zayne #11 was booted. In this removal process, Zayne #1 generates new secrets, ratchets them all the way up the tree, and shares them with the appropriate subtrees:

The green circles still represent the updated nodes. Each solid arrow represents the private key at its tail being encrypted under the public key at its head.

Notice that on the right-hand side of the tree, since you can't encrypt to a blank node, the root private key needs to be encrypted under three separate public keys. The dashed arrows were omitted for clarity, but it's still true that the public keys of all the circled nodes are broadcast in this step.

With this larger example, you might start to see a pattern in how many auxiliary messages are sent per tree update. Let's play

Can You Eyeball the Asymptotic Behavior?

We got efficient application messages with sender keys, and we’d like to say that we got efficient auxiliary messages with TreeKEM so we can call it a day. Is this true? Absolutely not, at least not entirely. Let’s first talk about the example above, where we start off with a tree whose nodes are all filled.

Removal in a Filled Tree

The Zayne example is actually worst-case removal behavior in a filled tree in terms of number of auxiliary messages (you should prove this to yourself: what would happen if Zayne #1 removed Zayne #6 instead?). If there are N members in the group, there are at most log(N)-1 encrypted auxiliary messages that don't have to deal with blank nodes, and another log(N)-1 that do. Plus, there are log(N) public keys to share. So, to complete the sage wisdom from computer scientists of days past, if you use trees, you get O(log(N)) behavior. This is way better than the quadratic number of auxiliary messages we saw in solution #2. The same WhatsApp group of kibitzing aunts now only takes about 3·log₂(10,000) ≈ 40 total auxiliary messages to establish a new set of sender keys (assuming a filled tree) instead of the N(N-1) ≈ 100 million total auxiliary messages required previously.

Removal in a Tree with Blanks

This logarithmic behavior is fantastic, but we only checked for the very specific case where we start with a full group and then remove one person. How efficient is it when we remove a single person from a group that already has some blanks? The good news is that it's still better than Θ(N²). The bad news is that the worst case is… well let me just show you.

Suppose every odd-numbered Zayne was removed from the group besides Zayne #1. Finally, Zayne #2 deals the finishing blow, removing Zayne #1 and restoring peace. Here is what the update looks like:

That’s N-1 messages to remove a single person! As mentioned before, this can be a prohibitively large number of auxiliary messages for large N. Even worse, it may be possible for malicious group members to strategically remove people until the tree reaches the worst-case state, thus slowing down group operations for everyone in the group.

Dealing with this situation is an open issue, and people are actively working on resolving or at least mitigating it. As of this writing, though, there are no proposed solutions that would materially improve the worst-case behavior.

Conclusion and More Info

It’s underwhelming to end at an open issue, but this is where the protocol stands today. Efficiently updating keys is at the crux of end-to-end group messaging. The TreeKEM method, edge cases and all, is one of the most important singular contributions that MLS makes. Given that there’s still at least one open issue in the spec, you may wonder

How close is the protocol to being done?

No clue. MLS has plenty of open issues (nine as of this writing) and is being tweaked constantly. Draft 7 landed just this month, and it completely overhauled the symmetric key schedule. Inefficiencies are being shaved down as issues around authenticity, confidentiality, deniability, etc. are being patched.

What are the other implementations?

The unofficial reference implementation, mlspp, is used to create test vectors that we implementers all test against. There’s also MLS*, a project at Inria to implement and formally model the protocol in F*. And there’s even another Rust implementation, melissa, being written at Wire.

Remind me why you’re writing yet another Rust implementation?

The more implementations the better. Writing this implementation has helped find errors in mlspp and the specification itself.

Errors found in mlspp include missing important fields (missing protocol version and missing hash of WelcomeInfo, which enforces sequencing), incorrect tree addressing (using leaf indices instead of node indices and vice-versa), and incorrectly generated test vectors. Errors in the specification that we found include ambiguities (how are removed nodes pruned from the ratchet tree?), logical impossibilities (how can you add a user to the group if your WelcomeInfo doesn’t include the current decryption keys?), and deontological omissions (SHOULD8 a user verify the broadcasted pubkeys against their derived pubkeys or not?).

Ok great, but why Rust?

*cracks knuckles*

I thought it would be nice to have an MLS implementation that has a clear API (thanks to molasses’ careful design and Rust’s strong typing), memory-safe semantics (thanks to the Rust borrow checker), thorough documentation (thanks to cargo doc and molasses’ current 43% comment-code ratio), and good performance (thanks to ZERO-COST-ABSTRACTIONS). Of course, none of these features make up for the fact that molasses is not formally verified like MLS* and may never be, but hey, nobody ever complained that cotton isn’t as bulletproof as kevlar, cuz those are for different things.
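To make the “clear API” claim a bit more concrete, here is a purely hypothetical sketch of the kind of interface Rust’s ownership rules encourage; none of these types or methods are molasses’ real API:

```rust
// Hypothetical illustration of ownership-enforced protocol state; these
// names are made up and are not molasses' actual types or methods.
struct GroupState {
    epoch: u64, // plus the ratchet tree, secrets, etc. in a real implementation
}

struct ApplicationMessage(Vec<u8>);

impl GroupState {
    /// Processing a handshake message consumes the old state and returns the
    /// next one, so Rust's move semantics turn "encrypt under a stale epoch"
    /// into a compile-time error rather than a protocol bug.
    fn process_handshake(self, _msg: &[u8]) -> Result<GroupState, String> {
        Ok(GroupState { epoch: self.epoch + 1 })
    }

    fn encrypt(&self, plaintext: &[u8]) -> ApplicationMessage {
        // A real implementation would encrypt under this epoch's sender key.
        ApplicationMessage(plaintext.to_vec())
    }
}

fn main() {
    let state = GroupState { epoch: 0 };
    let state = state.process_handshake(b"remove Zayne #1").unwrap();
    let _msg = state.encrypt(b"peace at last");
    println!("now at epoch {}", state.epoch);
}
```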

How can I help?

I don’t recommend filing issues with molasses quite yet. The spec is moving too quickly and the library has to be redesigned accordingly each time. If you would like to contribute, the MLS IETF page has a mailing list where you can read and participate in discussions. The organizers are helpful and patient, and I appreciate them immensely. If you want to write your own implementation, see the implementers’ GitHub repo for organizing info and test vectors.

If you are interested in reading more about the protocol and seeing some of the other open issues, you should give the spec9 a read.

“I want your expertise”

Well that’s going to cost you. We offer consulting in end-to-end protocol design, engineering, and auditing. Drop us a line on our contact page if you’re interested.

Thanks for reading! If you have questions or corrections, please feel free to email me at michael.rosenberg@trailofbits.com.

Footnotes:

  1. Full post-compromise security, i.e., the problem of non-deterministically deriving all new shared data so as to make the excluded parties unable to participate, is actually not easily achieved in this scheme. There is ongoing research in characterizing how post-compromise secure MLS is after a certain number of group updates.
  2. Source. This is a fantastic paper which provides a lot of context for this article. Seriously, if you want to understand this topic better, you should read the MLS spec and this paper and compare the two, since they differ in pretty subtle but significant ways. E.g., the ART scheme used in the paper does not allow intermediate nodes to be blank, which affects confidentiality of messages sent to offline members.
  3. The problem of Removal in this article is a placeholder for (a weaker form of) post-compromise security. Here, “group modifications” includes updating key material without changing group membership.
  4. Specifically, it is a left-balanced binary tree. This is fancy computer talk for “every left subtree is full,” which itself is fancy computer talk for “it behaves good when stuffed into an array.”
  5. Both these statements are technically false, but it’s way easier to think of things this way, and it’s close enough to the truth imo. In reality, sender keys are derived from a long chain of secret values relating to group state and state transitions. Node private keys are simpler, but they are also derived from chains of other secrets called “node secrets” and “path secrets.” As always, see the spec for more details.
  6. MLS doesn’t allow users to remove themselves. This is a quirk of the protocol, but it doesn’t really affect anything.
  7. If you’re confused why I say all these keys are DH keys and then use public-key encryption, it’s because the public-key encryption in MLS is done with ECIES. More specifically, it’s HPKE.
  8. The all-caps “SHOULD” means something specific in IETF RFCs. Its meaning is governed by not one but two RFCs, which are referred to as Best Current Practice 14. The linguistic conventions of RFCs are super cool and alone make it worth skimming a few specs and paying attention to their “conventions and terminology” sections. TLS is as good a place to start as any.
  9. If you want a particularly nice reading experience, you should compile the spec yourself from source. It really is appreciably better.

SVGs of the images in this post are available.

Crytic: Continuous Assurance for Smart Contracts

Note: This post has been reposted from the Truffle Suite blog.

We are proud to announce our new smart contract security product: https://crytic.io/. Crytic provides continuous assurance for smart contracts. The platform reports build status on every commit and runs a suite of security analyses for immediate feedback.

The beta will open soon. Follow us on Twitter to be notified so you can benefit from the service as soon as possible! The first three months are free.

How Crytic will secure your smart contracts

Once connected to your GitHub repository, Crytic will continuously:

  • Run our static analyzer Slither, which detects the most common smart contract vulnerabilities and will save you from critical mistakes.
  • Run your Truffle tests to ensure that no functional bugs are added while developing your project.

Slither will analyze your codebase for more than 60 security flaws, including reentrancy, integer overflows, race conditions, and many others. Half of these detectors are private and have never been available to the public; they catch flaws for which public knowledge is limited and that no other tool can find. The recent GridLock bug would have been detected ahead of time using Crytic!

We built this platform for developers, so we integrated it with GitHub. It will watch every commit and branch to ensure that bugs are not added during development. In addition, Crytic will run the checks on every PR to facilitate your code review.

For every security issue found, Crytic will:

  • Show you a detailed report on the bug, including source-code highlighting.
  • Allow you to create a GitHub issue to keep track of the fixes easily.
  • Let you triage the results, so you can decide what needs to be fixed.

Quick Walkthrough

Adding Crytic to your system is straightforward: you just need to connect to your GitHub repository. We have first-class support for Truffle; it works out of the box! We also support most of the other smart contract platforms, including Embark, Dapp, and Etherlime. After adding your repository, the dashboard (Figure 1) will show you a summary of the project, like this crytic-demo:

Figure 1: The Crytic Dashboard

From now on, you will benefit from continuous security analyses.

Issue reports

Finding an issue is only the first part. Crytic will provide the detailed information you need to understand and fix the bug:

Figure 2: Reports provide detailed information needed to understand and fix issues

A careful reader will notice the vulnerability here: because of the typo, constuctor is not a constructor at all but an ordinary public function, callable by anyone at any time instead of running only at initialization. Crytic detects these types of critical mistakes instantaneously.

Triaging issues

Once a bug has been found, the user can decide to:

  • create a GitHub issue, to easily keep track of the fix, or
  • discard the issue.

Figure 3: Crytic easily creates GitHub issues for selected reports

Crytic tracks the modifications to your code and reports only newly introduced bugs. Each new PR will be analyzed automatically:

Figure 4: Crytic is fully integrated with GitHub Pull Requests

What’s next for Crytic

We are constantly improving Crytic. Expect to see new bug detectors and new features in the future. We are planning to add:

  • Echidna and Manticore integration: to ensure your code is checked for custom security properties.
  • Automatic bug repair: Crytic will propose patches to fix the issues it finds.
  • Slither printer integration: to help visualize the underlying details of your code.
  • Delegatecall proxy checker: to prevent you from making critical—and all too common—mistakes in your upgradeability process.

Questions? Bring them to TruffleCon, and pose them to us at our booth or at our Friday workshop on automated vulnerability detection tools!

Whether or not you can make it to TruffleCon, join our Slack channel (#crytic) for support, and watch @CryticCI to find out as soon as our beta is open.