Part 1: The life of an optimization barrier

By Fredrik Dahlgren

Many engineers choose Rust as their language of choice for implementing cryptographic protocols because of its robust security guarantees. Although Rust makes safe cryptographic engineering easier, there are still some challenges to be aware of. Among them is the need to preserve constant-time properties, which ensure that, regardless of the input, code will always take the same amount of time to run. These properties are important in preventing timing attacks, but they can be compromised by compiler optimizations.

Recently, a client asked us how packaging a library as an npm module using wasm-pack and then running it using node would affect constant-time cryptographic code implemented in Rust. Writing constant-time code is always a fight against the optimizer, but in this case, the code would be optimized twice before being executed, first by LLVM and then by the Turbofan JIT compiler. How would this affect the constant-time cryptographic code used by the library?

We ran a number of small test cases to explore how optimizing the codebase twice would affect the constant-time properties of the code. This post will focus on the challenges of implementing constant-time Rust code and show that LLVM may introduce a new side-channel when compiling constant-time code to WebAssembly (Wasm). In part 2, we look at whether it is possible to selectively disable optimizations for security-critical code that needs to execute in constant time.

Constant-time what?

Cryptography is difficult to implement correctly. This is true when you are implementing both high-level protocols and low-level cryptographic primitives. Apart from worrying about overall correctness and edge cases that could expose secrets in unexpected ways, potential side-channel leakages and timing attacks are also deep concerns.

A timing attack is an attempt to exploit the fact that the application’s execution time may subtly depend on the input. If the application makes control flow-related decisions based on secret data, like the seed for a random number generator or a private key, this could ever so slightly affect the execution time of the application. Likewise, if secret data is used to determine which location in memory to read from, this could cause cache misses, which in turn would affect the execution time of the application. In both cases, information about the secret data is leaked through timing differences during the execution of the program.

To prevent such timing differences, cryptography engineers typically avoid implementing decisions based on secret data. However, in situations in which code needs to make decisions based on secret data, there are clever ways to implement them in constant time, that is, in a way that always executes in the same amount of time regardless of the input. For example, consider the following function, which performs a conditional selection between variables a and b in Rust.

#[inline]
fn conditional_select(a: u32, b: u32, choice: bool) -> u32 {
    if choice { a } else { b }
}

The function returns a if choice is true, otherwise b is returned. Depending on the compiler toolchain and the targeted instruction set, the compiler could choose to implement the conditional selection using a branching instruction like jne on x86 or bne on ARM. This would introduce a timing difference in the execution of the function, which could leak information about the choice variable. The following Rust implementation uses a clever trick to perform the same conditional selection in constant time.

#[inline]
fn conditional_select(a: u32, b: u32, choice: u8) -> u32 {
    // if choice = 0, mask = (-0) = 0000...0000
    // if choice = 1, mask = (-1) = 1111...1111
    let mask = -(choice as i32) as u32;
    b ^ (mask & (a ^ b))
}

Here, we make no choices based on the choice secret value, which means that there is only one path through the function. Consequently, the execution time will always be the same.

Fighting the compiler

Ideally, this should be the end of the story, but in practice, there are risks inherent to this approach. Since the compiler has no concept of time, it doesn’t view timing differences as observable behavior. This means that it is free to rewrite and optimize constant-time code, which could introduce new timing leaks into the program. A carefully written constant-time implementation like the one above could still be optimized down to a branching instruction by the compiler, which would leak the value of choice!

This feels like an impossible situation. If this is really the case, the compiler is actually working against us to break our carefully crafted constant-time implementation of conditional_select. So what could we do to stop the compiler from optimizing the function and potentially breaking the constant-time properties of the code?

The most obvious solution is the nuclear option—to turn off all optimizations and compile the entire codebase with the -C opt-level=0 flag. This is almost always an untenable solution, however. Cryptographic code typically handles huge amounts of data, which means that it needs all the optimizations it can get from the compiler. A more attractive solution is to attempt to stop the compiler from optimizing sensitive code paths using what are known as optimization barriers. The subtle crate uses the following construction to thwart LLVM’s attempts to optimize constant-time code paths.

#[inline(never)]
fn black_box(input: u8) -> u8 {
    unsafe { core::ptr::read_volatile(&input as *const u8) }
}

Here, the call to core::ptr::read_volatile tells the compiler that the memory at &input is volatile and that it cannot make any assumptions about it. This call functions as an optimization barrier that stops LLVM from “seeing through the black box” and realizing that the input is actually a boolean. This in turn prevents the compiler from rewriting boolean operations on the output as conditional statements, which could leak timing information about the input. The Rust Core Library documentation has the following to say about core::ptr::read_volatile:

“Rust does not currently have a rigorously and formally defined memory model, so the precise semantics of what ‘volatile’ means here is subject to change over time. That being said, the semantics will almost always end up pretty similar to C11’s definition of volatile.”

This doesn’t seem very reassuring, but remember that timing differences are not viewed as observable by the compiler, so the compiler is always free to rewrite constant-time code and introduce new side-channel leaks. Any attempt to stop the compiler from doing so is bound to be on a best-effort basis until there is built-in language and compiler support for secret types. (There is a Rust RFC introducing secret types, but this has been postponed, awaiting LLVM support.)

Let’s see what happens to the conditional_select function if we compile it without an optimization barrier. To better illustrate the problem, we will target an instruction set that lacks conditional instructions such as cmov on x86_64 or csel on aarch64; with such instructions available, the compiler can optimize the function without breaking the constant-time properties of the implementation, but without them it will typically fall back to a branch. The following function simply calls the constant-time version of conditional_select to return either a or b.

pub fn test_without_barrier(a: u32, b: u32, choice: bool) -> u32 {
    let choice = choice as u8;
    conditional_select(a, b, choice)
}

Compiling the function for the ARM Cortex M0+ (which is used by the Raspberry Pi Pico) and examining the disassembly, we see that the compiler has replaced our carefully crafted conditional selection with a simple branch on the value of choice (in r2), completely destroying the constant-time properties of the function! Now, let’s see what happens if we insert an optimization barrier.

pub fn test_with_barrier(a: u32, b: u32, choice: bool) -> u32 {
    let choice = black_box(choice as u8);
    conditional_select(a, b, choice)
}

Looking at the corresponding disassembly, we see that it consists of a single basic block, resulting in a single path through the function, independent of the value of choice. This means that we can be reasonably sure that the function will always run in constant time.

So what about Wasm?

Now, let’s come back to the original problem. Our client was running code compiled from Rust down to Wasm using node. This means that the library is first compiled to Wasm using LLVM and then compiled again by node using the Turbofan JIT compiler. We expect that LLVM will respect the optimization guards inserted by libraries like the subtle crate, but what about Turbofan?

To see how the codebase would be affected, we compiled the test_with_barrier function defined above using wasm-bindgen and wasm-pack. We then dumped the code generated by the Turbofan JIT and examined the output to see whether the optimization barrier remained and whether the constant-time properties of the implementation had been preserved.

The following code is the result of compiling our example using wasm-pack and dumping the resulting Wasm in text format using wasm2wat. (We annotated some of the functions and removed some sections related to wasm-bindgen to make the code more readable.)

(module
    (type (;0;) (func (param i32) (result i32)))
    (type (;1;) (func (param i32 i32 i32) (result i32)))
 
    (func $black_box (type 0) (param $input i32) (result i32)
      (local $__frame_pointer i32)
      global.get $__stack_pointer
      i32.const 16
      i32.sub
      local.tee $__frame_pointer   ;; push __stack_pointer - 16
      local.get $input
      i32.store8 offset=15         ;; store input at __stack_pointer - 1
      local.get 1
      i32.load8_u offset=15)       ;; load output from __stack_pointer - 1
 
    (func $test_without_barrier (type 1) 
      (param $a i32) (param $b i32) (param $choice i32) (result i32)
      local.get $a
      local.get $b
      local.get $choice
      select)
 
    (func $test_with_barrier (type 1)
      (param $a i32) (param $b i32) (param $choice i32) (result i32)
      local.get $b
      local.get $a
      i32.xor                     ;; push a ^ b
      i32.const 0
      local.get $choice
      i32.const 0
      i32.ne                      ;; push input = choice != 0
      call $black_box             ;; push output = black_box(input)
      i32.const 255
      i32.and                     ;; push output = output & 0xFF
      i32.sub                     ;; push mask = 0 - output
      i32.and                     ;; push mask & (a ^ b)
      local.get $b
      i32.xor)                    ;; push b ^ (mask & (a ^ b))
 
    (table (;0;) 1 1 funcref)
    (memory (;0;) 17)
    (global $__stack_pointer (mut i32) (i32.const 1048576))
    (export "memory" (memory 0))
    (export "test_without_barrier" (func $test_without_barrier))
    (export "test_with_barrier" (func $test_with_barrier)))

We see that black_box has been compiled down to a simple i32.store8 followed by an (unsigned) i32.load8_u from the same offset. This initially looks like it could be optimized away completely since the memory is never read outside black_box.

We also see that test_with_barrier has not been optimized across the call to black_box. The function still performs a branchless conditional selection controlled by the output from the optimization barrier. This looks good and gives us some confidence that the constant-time properties provided by the subtle crate are preserved when targeting Wasm. However, as soon as the Wasm module is loaded by node, it is passed off to the Liftoff and Turbofan JIT compilers to optimize the code further.

To investigate how this affects our small example, we load the compiled Wasm module using JavaScript and dump the trace output from Turbofan using node. This can be done by passing the --trace-turbo flag to the node runtime. The trace generated by node can then be viewed in the Turbolizer web GUI (which can be found in the V8 repository).

Turbolizer can be used to analyze each step of the Turbofan compilation pipeline. Here, we are interested in displaying only what the emitted assembly code looks like for a given function. Looking at the output for test_with_barrier, we see that no optimizations are performed across the black_box function call on line 2c. The output is essentially identical to the decompiled Wasm code above.

  B0:
       0  push             rbp
       1  REX.W movq   rbp,rsp
       4  push             0x8
       6  push             rsi
       7  REX.W subq   rsp,0x18
       b  REX.W movq   rbx,[rsi+0x2f]
       f  REX.W movq   [rbp-0x18],rdx           
      13  REX.W movq   [rbp-0x20],rax
      17  REX.W cmpq   rsp,[rbx]
      1a  jna          B2 <+0x4a>
 
  B1:
      20  cmpl             rcx,0x0         ;; rax = choice? 1: 0
      23  setnzl       bl
      26  movzxbl      rbx,rbx
      29  REX.W movq   rax,rbx
      2c  call             0x7ffc2400fa31  ;; call to black_box(rax)
      31  movzxbl      rbx,rax             ;; rbx = -black_box(rax)
      34  negl             rbx
      36  REX.W movq   rdx,[rbp-0x20]      ;; rdx = a ^ b
      3a  xorl             rdx,[rbp-0x18]
      3d  andl             rbx,rdx         ;; rbx = rbx & rdx
      3f  REX.W movq   rax,[rbp-0x18]      ;; rax = b ^ (rbx & (a ^ b))
      43  xorl             rax,rbx
      45  REX.W movq   rsp,rbp
      48  pop          rbp
      49  retl                             ;; return rax
 
  B2:
      4a  REX.W movq   [rbp-0x28],rcx
      4e  call             0x7ffc2400fa7b
      53  REX.W movq   rsi,[rbp-0x10]
      57  REX.W movq   rcx,[rbp-0x28]
      5b  jmp          B1 <+0x20>
      5d  nop
      5e  nop

It also is interesting to see what the Turbolizer output for black_box looks like. Looking at the emitted assembly for black_box, we see that apart from setting up the local stack frame, the function simply stores and then immediately loads the input from memory (lines 14 and 18) before returning.

   B0:
       0  push             rbp
       1  REX.W movq   rbp,rsp
       4  push             0x8
       6  push             rsi
       7  REX.W movq   rbx,[rsi+0x17]
       b  REX.W movq   rdx,[rsi+0x67]
       f  movl             rdx,[rdx]
      11  subl             rdx,0x10
      14  movb             [rdx+rbx*1+0xf],al   ;; store input to memory
      18  movzxbl      rax,[rdx+rbx*1+0xf]   ;; load output from memory
      1d  REX.W movq   rsp,rbp
      20  pop          rbp
      21  retl

You may be surprised that this function is not inlined or optimized away by Turbofan. Since there is nothing in Wasm that corresponds to the volatile read in Rust, there is really no reason for Turbofan to keep black_box around anymore. However, since black_box writes to memory, it is not completely side-effect free, and so cannot be optimized away completely by the JIT compiler.

Introducing a new side-channel

The fact that the compiled version of black_box writes the input to memory before returning it is actually somewhat surprising. Since black_box takes a value as input and read_volatile takes a reference as input, LLVM needs to turn the input value into a reference somehow. When compiling for architectures like x86 or ARM, LLVM can simply use the address of the input on the stack, but the Wasm stack is not addressable in this way, which means that LLVM has to write the input to memory to be able to reference it. All of this means that the secret value that we wanted to protect using an optimization barrier is leaked to Wasm memory by LLVM. Moreover, looking at the compiled Wasm code above, we see that this memory is exported by the Wasm module, which means that it can be read from JavaScript. If we call the exported test_with_barrier function and examine the memory before and after the call, we can see that the secret value passed to black_box is now accessible from JavaScript.

  const path = require('path').join(__dirname, 'ct_wasm.wasm');
  const bytes = require('fs').readFileSync(path);
 
  // Load Wasm module from file.
  const wasmModule = new WebAssembly.Module(bytes);
  const wasmInstance = new WebAssembly.Instance(wasmModule);
  const wasmMemory = new Uint8Array(wasmInstance.exports.memory.buffer);
 
  const testWithBarrier = wasmInstance.exports.test_with_barrier;
 
  // __stack_pointer defined by the Wasm module.
  const stackPointer =  1048576;
 
  // Print memory[__frame_pointer + 15] before call to black_box.
  const before = wasmMemory[stackPointer - 1];     
  console.log("Before the call to black_box: " + before);
    
  // Call test_with_barrier which calls black_box with secret value 1.
  testWithBarrier(123, 456, 1);
 
  // Print memory[__frame_pointer + 15] after call to black_box.
  const after = wasmMemory[stackPointer - 1];
  console.log("After the call to black_box:  " + after);

Running this small test produces the following output, showing that the secret value passed to black_box is indeed leaked by the program.

 node js/ct_wasm.js
  Before the call to black_box: 0
  After the call to black_box:  1

Since the purpose of the black_box function is to protect the code from optimizations based on secret values, every value that goes into black_box is sensitive by definition. This is not a good situation.

Using a different optimization barrier

There have been some discussions in the Rust Cryptography Interest Group about defining a new Rust intrinsic modeled on the classic C++ optimization barrier built from an empty volatile inline assembly block. The corresponding Rust implementation would then look something like the following (here using the now deprecated llvm_asm macro).

#[inline(never)]
fn black_box(input: u8) -> u8 {
    unsafe { llvm_asm!("" : "+r"(input) : : : "volatile"); }
 
    input
}

After recompiling the codebase with wasm-pack and decompiling the resulting Wasm module, we see that black_box is now given by a single local.get $input (returning the first argument to the function), which is what we want. This function does not leak secret values to memory, but is it preserved by Turbofan?

By running the corresponding test_with_barrier function through Turbofan, we see that it results in machine code that is identical to the previous constant-time version above. Thus, with the llvm_asm-based barrier, we get a constant-time implementation that does not leak secret values to the surrounding JavaScript runtime.

However, as we have already noted, there is no reason to expect Turbofan not to inline the black_box function in future versions of the compiler. (In fact, if we look at the source code responsible for the Wasm compilation pipeline in the V8 repository, we see that the FLAG_wasm_inlining flag, which controls the WasmInliningPhase in the compiler pipeline, defaults to false in version 9.7.24 of V8; but we expect this optimization phase to be enabled at some point.)
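As an aside, newer Rust toolchains also ship core::hint::black_box (stabilized in Rust 1.66), a built-in best-effort barrier. Its documentation makes no guarantees strong enough for cryptographic use, and we have not evaluated how it behaves under the Wasm pipeline described here, but a sketch of how it could slot into the conditional selection looks like this:

// Sketch only: core::hint::black_box is documented as a best-effort hint
// with no hard guarantees, so any use in constant-time code still has to be
// verified by inspecting the generated code for every target, including Wasm.
fn conditional_select_hinted(a: u32, b: u32, choice: u8) -> u32 {
    // Hide the secret choice from the optimizer before deriving the mask.
    let choice = core::hint::black_box(choice);
    let mask = -(choice as i32) as u32;
    b ^ (mask & (a ^ b))
}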

Going forward

It is clear that fighting LLVM by inserting optimization barriers is not a great way to provide constant-time guarantees. There are ongoing efforts to address this problem at the language level. The secret types RFC and the CT-Wasm project, which introduce secret types for Rust and Wasm respectively, are two great examples of such efforts. What is missing is a way forward for getting secret types and the corresponding semantics into LLVM. This is most likely a precondition for the Rust implementation to move forward. (The Rust RFC is currently postponed, awaiting a corresponding RFC for LLVM.) Without LLVM support, it is difficult to see how higher-level languages that depend on LLVM could provide any absolute constant-time guarantees. Until then, we are all left playing hide-and-seek with the compiler back end.

In this post, we examined the use of optimization barriers to prevent the optimizer from wreaking havoc on constant-time cryptographic implementations in Rust, and the security guarantees optimization barriers provide when targeting Wasm. In the upcoming second part of this blog post, we will explore how constant-time properties of the implementation may be preserved by selectively disabling optimizations at the function level.

C your data structures with rellic-headergen

By Francesco Bertolaccini

Have you ever wondered how a compiler sees your data structures? Compiler Explorer may help you understand the relation between the source code and machine code, but it doesn’t provide as much support when it comes to the layout of your data. You might have heard about padding, alignment, and “plain old data types.” Perhaps you’ve even dabbled in emulating inheritance in C by embedding one structure in another. But could you guess the exact memory layout of all these types, without looking at the ABI reference for your platform or the source for your standard library?

struct A { int x; };
struct B { double y; };
struct C : A, B { char z; };

With the requisite ABI knowledge, reasoning about C structs is relatively simple. However, more complex C++ types are an altogether different story, especially when templates and inheritance come into play. Ideally, we’d be able to convert all these complicated types into simple C structs so that we could more easily reason about their in-memory layouts. This is the exact purpose of rellic-headergen, a tool I developed during my internship at Trail of Bits. In this blog post, I will explain why and how it works.

rellic-headergen

The purpose of rellic-headergen is to produce C type definitions that are equivalent to those contained in an LLVM bitcode file, which have not necessarily been generated from C source code. This facilitates the process of analyzing programs that contain complex data layouts. As an example of rellic-headergen’s capabilities: given a C++ source file, one command compiles the code to LLVM bitcode, and a second runs the bitcode through rellic-headergen; the output is valid C code that matches the layout of the input C++ code.
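For the structs from the introduction, the flattened output would look roughly like the following hand-written sketch for a typical x86_64 (Itanium C++ ABI) target; the tool’s actual field naming and padding conventions may differ.

/* Hand-written sketch for the introduction's struct C : A, B { char z; };
   this is not actual rellic-headergen output. */
struct A { int x; };
struct B { double y; };
struct C {
    int x;                  /* base A subobject, offset 0       */
    unsigned char pad0[4];  /* padding so the double is aligned */
    double y;               /* base B subobject, offset 8       */
    char z;                 /* derived member, offset 16        */
    unsigned char pad1[7];  /* tail padding, total size 24      */
};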

The utility works on the assumption that the program being analyzed can be compiled to LLVM bitcode with full debug information. It begins by building a list of all the types for which debug information is available, starting with function (“subprogram”) definitions.

Now, the utility needs to decide the order in which the types will be defined, but this is no simple task given the requirements of the C language: the language requires explicit forward declarations when referencing types that have not yet been defined, and structs cannot contain, for example, a field whose type has only been forward declared.

One approach to solving this problem would be to preemptively forward declare all of the types. However, this is not sufficient: a struct cannot contain a field whose type has not been fully defined, though it can contain a field whose type is a pointer to a forward declared type, as the small example below illustrates.
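A small C example makes the distinction concrete:

struct Node;                    /* forward declaration: the type is incomplete here */

struct List {
    struct Node *head;          /* fine: a pointer to an incomplete type            */
};

/* struct Broken { struct Node n; };  -- rejected at this point: an embedded
   field requires the complete definition of struct Node.                     */

struct Node {
    struct Node *next;          /* self-referential pointers are also fine          */
    int value;
};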

Thus, the utility forms a directed acyclic graph (DAG) from the type definitions, on which it can find a topological sort.

Once the utility finds a topological sort, it can inspect the types in this order with the confidence that the type of any of the fields has been fully defined.

Struct shenanigans

The DWARF metadata provides a few pieces of information we can use to recover C structure definitions for the types it describes:

  • The size of the type
  • The type of each field
  • The offset of each field
  • Whether the type was originally a struct or a union

rellic-headergen’s reconstruction algorithm starts by sorting the fields in order of increasing offset, then defines a new struct in which to add each field. The metadata provides no information on whether the original definition was declared as packed or not, so rellic-headergen first tries to generate the layout directly, as specified by the metadata. If the resulting layout doesn’t match the one given as input, the utility starts from scratch and generates a packed layout instead, inserting padding manually as needed.
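As a rough illustration of the fallback (with hypothetical field names, not actual tool output), a packed struct with explicit padding arrays can pin every field to the exact byte offset recorded in the metadata:

/* Hypothetical illustration of the packed fallback. */
struct reconstructed {
    unsigned int  field_0;       /* offset 0, size 4  */
    unsigned char padding_0[4];  /* bytes 4 through 7 */
    double        field_1;       /* offset 8, size 8  */
} __attribute__((packed));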

Now, we could use any number of sophisticated heuristics to decide the offset of each field from the start of the struct, but things can get quite hairy, especially in the case of bit fields. A better approach is to get this information from something that already has the logic worked out: a compiler.

Fortunately, rellic-headergen already uses Clang to generate the definitions. Unfortunately, querying Clang itself about the fields’ offsets is not quite so simple, as Clang allows the retrieval of layout information only for complete definitions. To get around this particular quirk of the API, the utility generates temporary struct definitions that contain all the fields up to the one it is currently processing.
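A sketch of the idea, with hypothetical names:

/* To ask Clang where the next field would land, emit a complete temporary
   struct holding only the fields processed so far, and read the offsets back
   from Clang's record layout for that temporary type. */
struct tmp_up_to_field_0 { unsigned int a; };
struct tmp_up_to_field_1 { unsigned int a; unsigned short b; };
struct tmp_up_to_field_2 { unsigned int a; unsigned short b; unsigned char c; };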

Structs and inheritance

As I was working on more involved use cases, I stumbled upon some instances in which the ABI works in ways that are not immediately obvious. For example, handling C++ inheritance takes some care, as the naive approach is not always correct. Converting

struct A { int x; };
struct B : A { int y; };

into

struct A { int x; };
struct B {
    struct A base;
    int y;
};

seems like a good idea and works in practice, but this method doesn’t scale very well. For example, the following snippet cannot be converted in this way:

struct A { int x; char y; };
struct B : A { char z; };

The reason is that on a machine in which int is 4 chars wide, struct A typically contains 3 additional chars of padding after y, so embedding struct A directly into B would force z to offset 8. To minimize the amount of padding in structs, however, compilers may instead place the fields of the derived type inside the base class’s tail padding (under the Itanium C++ ABI, this reuse applies when the base is not a POD type for layout purposes), and an embedded struct A field cannot reproduce such a layout.

Furthermore, empty structs are technically not valid in C. They can be used via GCC and Clang extensions, and they are valid in C++, but they present an issue: an empty struct’s sizeof is never 0. Instead, it is typically 1. Among other reasons, this is so that, in a code snippet like the following, distinct objects are guaranteed to have distinct addresses:

struct A {};
struct B {
    struct A a;
    int b;
};

The example above works perfectly fine, but there are places in which treating empty structs the naive way doesn’t work. Consider the following:

struct A {};
struct B : A {
    int x;
};

This example produces the following DWARF metadata:

!2 = !{}
!10 = distinct !DICompositeType(
    tag: DW_TAG_structure_type, name: "A", size: 8, elements: !2)
!11 = distinct !DICompositeType(
    tag: DW_TAG_structure_type, name: "B", size: 32, elements: !12)
!12 = !{!13, !14}
!13 = !DIDerivedType(tag: DW_TAG_inheritance, baseType: !10)
!14 = !DIDerivedType(tag: DW_TAG_member, name: "x", baseType: !15, size: 32)
!15 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)

If we followed the same logic for DW_TAG_inheritance as we did for DW_TAG_member, we’d end up with this conversion:

struct A {};
struct B {
    struct A a;
    int b;
};

This is not equivalent to the original definition! Field b would end up at an offset different from 0, as fields cannot have size 0. Getting all of these C++ details working was challenging but worthwhile. Now we can use rellic-headergen to convert arbitrary C++ types into plain old C types. Many reverse engineering tools embed some form of basic C parsing support in order for a user to provide “type libraries,” which describe the types used by machine code. These basic parsers typically don’t have any C++ support, and so rellic-headergen bridges this gap.

What’s next for rellic-headergen?

There are opportunities to further improve rellic-headergen. One of the objectives of the utility is to be able to recover field access patterns from code that has been optimized. Consider the following program:

struct A {
    char a, b, c, d;
};

char test(struct A x) { return x.c; }

This program produces the following bitcode:

define dso_local signext i8 @test(i32 %x.coerce) local_unnamed_addr #0 {
entry:
    %x.sroa.1.0.extract.shift = lshr i32 %x.coerce, 16
    %x.sroa.1.0.extract.trunc = trunc i32 %x.sroa.1.0.extract.shift to i8
    ret i8 %x.sroa.1.0.extract.trunc
}

In this bitcode, the original information about the structure of x has been lost. Essentially, if Clang/LLVM performs optimizations before emitting bitcode, or if the bitcode is lifted from already compiled machine code, the resulting bitcode may be too low level, creating a mismatch between the type information found in the debug metadata and the information in the bitcode itself. rellic-headergen cannot currently resolve this mismatch on its own. Improving the utility to resolve these issues would be beneficial; knowing the exact layout of structs can be useful when trying to match bit shifts and masks to field accesses in order to produce decompiled code that is as close to the original as possible.

Also, languages that employ different DWARF features are not handled as well by rellic-headergen. Rust, for example, uses an ad hoc representation for discriminated unions, which is difficult for the utility to handle. There is an opportunity to one day add functionality to the utility to handle DWARF features such as these.

Finally, another future rellic-headergen feature worth exploring is the possibility of changing the output language: sometimes you do want to keep that inheritance information as C++, after all!

Closing thoughts

Although rellic-headergen currently has a very narrow scope, it is already incredibly robust when working with C and C++ codebases, as it is able to extract type information for rellic itself, which includes LLVM and Clang. It already provides useful insights when navigating binaries that have been built with debugging information, but expanding its set of features to be able to extract information from more varied codebases will make it even more useful when dealing with bigger projects.

Working on rellic-headergen was very fun, interesting, and instructive. I am grateful to Trail of Bits for the opportunity to work on such an innovative project with talented people. This was a great learning experience, and I would like to thank my mentor Peter Goodman for giving me guidance while allowing me almost free rein over the project, and Marek Surovič for his patience in sharing his experience with rellic with me.

Finding unhandled errors using CodeQL

By Fredrik Dahlgren

One of your developers finds a bug in your codebase—an unhandled error code—and wonders whether there could be more. He combs through the code and finds unhandled error after unhandled error. One lone developer playing whack-a-mole. It’s not enough. And your undisciplined team of first-year Stanford grads never learned software engineering. You’re doomed.

Good developers know that unhandled errors can be exploitable and cause serious problems in a codebase. Take CVE-2018-1002105, a critical vulnerability in Kubernetes allowing attackers to leverage incorrectly handled errors to establish a back-end connection through the Kubernetes API.

At Trail of Bits, we find issues like this all the time, and we know that there are better ways to find the rest than manually searching for them one by one. A recent discovery of exactly this kind motivated us to write this post. Rather than manually sifting through the codebase like our poor developer playing whack-a-mole, we used CodeQL to conduct a variant analysis (taking an existing vulnerability and searching for similar patterns). In this post, we’ll walk you through how we used CodeQL to whack all the moles at once.

Building a CodeQL database

To be able to run CodeQL queries against the codebase, we first needed to build a CodeQL database. Typically, this is done with the CodeQL CLI using the following command:

codeql database create -l <language> -c '<build command>' <database name>

In our case, the codebase under audit was developed on Windows using Visual Studio. Since CodeQL does not integrate with Visual Studio directly, we used MSBuild.exe to build solutions from the Windows command line using the following command:

codeql database create -l cpp -c 'MSBuild.exe <solution>.sln' <solution>.codeql

Setting up a custom query pack

To be able to run queries against the database, we defined a custom query pack, or a QL pack, which contains query metadata. Query packs can also be used to define custom query suites and query test suites. (If this makes your heart beat faster, see the CodeQL documentation on query suites and query test suites.) In the same directory housing our custom queries, we created a file named qlpack.yml with the following content:

name: <some snazzy QL pack name>
version: 0.0.1
libraryPathDependencies: [codeql-cpp]

This file defines the custom query pack and its dependencies. The last line of the qlpack.yml file simply indicates that the query pack depends on the built-in CodeQL libraries for C and C++, which we need to get off the ground.

Finding unhandled errors

For this post, let’s say that the codebase used a custom error type called CustomErrorType to propagate errors. To locate all calls to functions returning CustomErrorType, we started by creating a new CodeQL type called CustomError:

class CustomError extends FunctionCall {
    CustomError() {
        this.getUnderlyingType().getName() = "CustomErrorType"
    }
}

Since the value returned by a call is represented by the FunctionCall expression itself in CodeQL, it makes sense to extend the FunctionCall type (which is, in fact, a subtype of the more general Expr type used to model arbitrary expressions). Using this.getUnderlyingType() ensures that the name of the underlying type is CustomErrorType. This means that we capture all function calls in which the return type is either CustomErrorType or any typedef that resolves to CustomErrorType.

To test that the CustomError class does what we expect, we simply ran a query over the codebase and selected all CustomErrorType return values. To do so, we added the following select clause immediately below the class definition:

 
from 
    CustomError ce
select
    ce.getLocation(), 
    "Unhandled error code in ", ce.getEnclosingFunction().getName(), 
    "Error code returned by ", ce.getTarget().getName()
 

Here, ce.getEnclosingFunction() returns the function containing the CustomErrorType instance (i.e., the calling function), and ce.getTarget() returns the target function of the underlying FunctionCall (i.e., the called function).

We saved the file under a descriptive and colorful name—for this post, let’s call it UnhandledCustomError.ql. To run the query, we wrote the following:

codeql query run -d <database name here> UnhandledCustomError.ql

This query returns all call sites of functions in the codebase that return a value of type CustomErrorType, along with the names of the calling function and the called function.

Developing new queries iteratively in this way—by first over-approximating the vulnerability class you’re trying to model and then successively refining the query to prune false positives—makes it easier to catch mistakes as they happen since actually running a query on a codebase is a bit of a black box.

So what is an unhandled error?

To be able to restrict the results to unhandled errors, we need to define what it means to handle an error using CodeQL. Intuitively, handling an error means that the return value is acted upon and affects control flow in some way. This idea can be captured using CodeQL by checking whether the return value taints the condition of a branching statement, like an if statement, a while statement, or a switch statement. As CodeQL supports both local and global taint tracking, we had a choice of how to model this.
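To make the distinction concrete, here is a hypothetical caller (CustomErrorType and NO_ERROR are the stand-in names used throughout this post, not the client's real code):

enum CustomErrorType { NO_ERROR = 0, SOME_ERROR = 1 };

CustomErrorType DoWork();
void HandleFailure();
void Proceed();

void handled() {
    CustomErrorType err = DoWork();
    if (err != NO_ERROR) {   // the return value reaches an if-condition,
        HandleFailure();     // so the query treats the error as handled
    }
}

void unhandled() {
    DoWork();                // the return value never influences control
    Proceed();               // flow, so the query should flag this call site
}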

In our case, we were initially a bit concerned about how CodeQL’s global taint tracking engine would handle itself on a larger codebase, so we decided to try modeling the problem using local taint tracking. As a first approximation, we considered cases in which the returned error code directly affected the control flow of the calling function in some way. To capture these cases, we added the following predicate to the CustomError CodeQL type (thus, this below refers to an instance of CustomError):

 
// True if the return value is checked locally.
predicate isChecked() {
    // The return value flows into the condition of an if-statement.    	    
    exists (IfStmt is |
        TaintTracking::localTaint(
            DataFlow::exprNode(this),
            DataFlow::exprNode(is.getCondition().getAChild*())
        )
    ) or
    // The return value flows into the condition of a while-statement.
    exists (WhileStmt ws |
        TaintTracking::localTaint(
            DataFlow::exprNode(this),
            DataFlow::exprNode(ws.getCondition().getAChild*())
        )
    ) or
    // The return value flows into the condition of a switch-statement.
    exists (SwitchStmt ss |
        TaintTracking::localTaint(
            DataFlow::exprNode(this), 
            DataFlow::exprNode(ss.getExpr().getAChild*())
        )
    )
}

Since TaintTracking::localTaint only models local data flow, we did not need to require that the conditional statement (the sink) is located in the same function as the returned error (the source). With local taint tracking, we got this for free.

Our intuition told us that we wanted to model taint flowing into the condition of a branching statement, but looking at how this is modeled, we actually required taint to flow into a sub-expression of the condition. For example, is.getCondition().getAChild*() returns a sub-expression of the if statement condition. (The * indicates that the operation is applied 0 or more times. You can use + for 1 or more times.)

It may not be immediately obvious why we needed to use getAChild() here. If this taints a sub-expression of the if statement condition C, then it is natural to assume that this would also taint the entire condition. However, looking at the CodeQL taint tracking documentation, it is clear that taint propagates only from a source to a sink if a “substantial part of the information from the source is preserved at the sink.” In particular, boolean expressions (which carry only a single bit of information) are not automatically considered tainted by their individual sub-expressions. Thus, we needed to use getAChild() to capture that this taints a sub-expression of the condition.

It is worth mentioning that it would have been possible to use the DataFlow module to model local data flow: both DataFlow::localFlow and TaintTracking::localTaint can be used to capture local data flow. However, since DataFlow::localFlow is used to track only value-preserving operations, it made more sense in our case to use the more general TaintTracking::localTaint predicate. This allowed us to catch expressions like the following, in which the returned error is mutated before it is checked:

    if ( ((CustomErrorType)(response.GetStatus(msg) & 0xFF)) == NO_ERROR )
    {
    	[…]
    }

To restrict the output from the CodeQL select statement, we added a where clause to the query:

 
from 
    CustomError ce
where
    not ce.isChecked()
select
    ce.getLocation(), 
    "Unhandled error code in ", ce.getEnclosingFunction().getName(), 
    "Error code returned by ", ce.getTarget().getName()
 

Refining the query

Running the query again, we noticed that it found numerous locations in the codebase in which a returned error was not handled correctly. However, it also found a lot of false positives in which the return value did affect control flow globally in some way. Reviewing some of the results manually, we noticed three overarching classes of false positives:

  1. The returned error was simply returned from the enclosing function and passed down the call chain.
  2. The returned error was passed as an argument to a function (which hopefully acted upon the error in some meaningful way).
  3. The returned error was assigned to a class member variable (which could then be checked elsewhere in the codebase).

The local behavior in all three of these cases could clearly be modeled using local taint tracking.

First, to exclude all cases in which the returned error was used to update the return value of the calling function, we added the following predicate to the CustomError class:

// The return value is returned from the enclosing function.
predicate isReturnValue() {
    exists (ReturnStmt rs |
        TaintTracking::localTaint(
            DataFlow::exprNode(this),
            DataFlow::exprNode(rs.getExpr())
        )
    )
}

Second, to filter out cases in which the return value was passed as an argument to some other function, we added the following predicate:

// The return value is passed as an argument to another function.
predicate isPassedToFunction() {
    exists (FunctionCall fc |
        TaintTracking::localTaint(
            DataFlow::exprNode(this),
            DataFlow::exprNode(fc.getAnArgument())
        )
    )
}

Again, since TaintTracking::localTaint only models local data flow, we did not need to require that the enclosing function of the FunctionCall node fc is identical to the enclosing function of this.

Finally, to model the case in which the returned error was used to update the value of a class member variable, we needed to express the fact that the calling function was a class method and that the return value was used to update the value of a member variable on the same class. We modeled this case by casting the enclosing function to a MemberFunction and then requiring that there is a member variable on the same object that is tainted by this.

 
// Test if the return value is assigned to a member variable.
predicate isAssignedToMemberVar() {
    exists (MemberVariable mv, MemberFunction mf |
        mf = this.getEnclosingFunction() and
        mf.canAccessMember(mv, mf.getDeclaringType()) and
        TaintTracking::localTaint(
            DataFlow::exprNode(this), 
            DataFlow::exprNode(mv.getAnAccess())
        )
    )
}

Note that it is not enough to require that data flows from this to an access of a member variable. If you do not restrict mv further, mv could be a member of any class defined in the codebase. Clearly, we also needed to require that the calling function is a method on some class and that the member variable is a member of the same class. We captured this requirement using the predicate canAccessMember, which is true when the enclosing method mf can access the member variable mv in the context of the mf.getDeclaringType() class.

Running the updated version of the query, we then noticed that some of the results were from unit tests. Since issues in unit tests are typically of less interest, we wanted to exclude them from the final result. This, of course, could easily be done using grep -v on the resulting output from codeql, but it could also be done by restricting the location of the call site using CodeQL itself.

To filter by file path, we defined a new class called IgnoredFile, which captured the type of files we wanted to exclude from the result. In this case, we excluded any file with an absolute path containing the word "test":

 
class IgnoredFile extends File {
    IgnoredFile() {
        this.getAbsolutePath().matches("%test%")
    }
}

We then added the following line to the where clause of the final query, which excluded all the locations that we were less interested in:

not ce.getFile() instanceof IgnoredFile

The final query resulted in slightly over 100 code locations that we were able to review and verify manually. For reference, the final query is located here.

But what about global data flow?

CodeQL supports global data flow and taint tracking through the DataFlow::Configuration and TaintTracking::Configuration classes. As explained earlier, we can use the DataFlow module to track value-preserving operations and the TaintTracking module to track more general flows in which the value may be updated along the flow path. While we were initially concerned that the codebase under review was too large for CodeQL’s global taint tracking engine to handle, we were also curious to see whether a global analysis would give us more accurate results than a local analysis could. As it turns out, the query using global data flow was easier to express, and it achieved more accurate results with the same running time as the query using local data flow!

Since we didn’t want to restrict ourselves to value-preserving operations, we needed to extend the TaintTracking::Configuration class. To do so, we defined what it means to be a source and a sink by overriding the isSource and isSink predicates as follows:

 
class GuardConfiguration extends TaintTracking::Configuration {
    GuardConfiguration() { this = "GuardConfiguration" }
  
    override predicate isSource(DataFlow::Node source) {
        source.asExpr().(FunctionCall).getUnderlyingType().getName() =
            "CustomErrorType"
    }
  
    override predicate isSink(DataFlow::Node sink) {
        exists (IfStmt is | sink.asExpr() = is.getCondition().getAChild*()) or
        exists (WhileStmt ws | sink.asExpr() = ws.getCondition().getAChild*()) or
        exists (SwitchStmt ss | sink.asExpr() = ss.getExpr().getAChild*())
    }
}

We then redefined the predicate CustomError::isChecked in terms of global taint tracking as follows:

 
class CustomError extends FunctionCall {
    CustomError() {
        this.getUnderlyingType().getName() = "CustomErrorType"
    }
  
    predicate isCheckedAt(Expr guard) {
        exists (GuardConfiguration config |
            config.hasFlow(
                DataFlow::exprNode(this), 
                DataFlow::exprNode(guard)
            )
        )
    }
  
    predicate isChecked() {
        exists (Expr guard | this.isCheckedAt(guard))
    }
}

That is, the return error is handled if it taints the condition of an if, while, or switch statement anywhere in the codebase. This actually made the entire query much simpler.

Interestingly, the run time for the global analysis turned out to be about the same as the run time for local taint tracking (about 20 seconds to compile the query and 10 seconds to run it on a 2020 Intel i5 MacBook Pro).

Running the query using global taint tracking gave us over 200 results. By manually reviewing these results, we noticed that error codes often ended up being passed to a function that created a response to the user of one of the APIs defined by the codebase. Since this was expected behavior, we excluded all such cases from the end result. To do so, we simply added a single line to the definition of GuardConfiguration::isSink as follows:

 
override predicate isSink(DataFlow::Node sink) {
    exists (ReturnStmt rs | sink.asExpr() = rs.getExpr()) or
    exists (IfStmt is | sink.asExpr() = is.getCondition().getAChild*()) or
    exists (WhileStmt ws | sink.asExpr() = ws.getCondition().getAChild*()) or
    exists (SwitchStmt ss | sink.asExpr() = ss.getExpr().getAChild*()) or
    exists (IgnoredFunctionCall fc | sink.asExpr() = fc.getExtParam())
}

Here, IgnoredFunctionCall is a custom type capturing a call to the function generating responses to the user. Running the query, we ended up with around 150 locations that we could go through manually. In the end, the majority of the locations identified using CodeQL represented real issues that needed to be addressed by the client. The updated file UnhandledCustomError.ql can be found here.

At Trail of Bits, we often say that we never want to see the same bug twice in a client’s codebase, and to make sure that doesn’t happen, we often deliver security tooling like fuzzing harnesses and static analysis tools with our audit reports. In this respect, tools like CodeQL are great, as they let us encode our knowledge about a bug class as a query that anyone can run and benefit from—in effect, ensuring that we never see that particular bug ever again.

Toward a Best-of-Both-Worlds Binary Disassembler

By Stefan Nagy

This past winter, I was fortunate to have the opportunity to work for Trail of Bits as a graduate student intern under the supervision of Peter Goodman and Artem Dinaburg. During my internship, I developed Dr. Disassembler, a Datalog-driven framework for transparent and mutable binary disassembly. Though this project is ongoing, this blog post introduces the high-level vision behind Dr. Disassembler’s design and discusses the key implementation decisions central to our current prototype.

Introduction

Binary disassembly is surprisingly difficult. Many disassembly tasks (e.g., code/data disambiguation and function boundary detection) are undecidable and require meticulous heuristics and algorithms to cover the wide range of real-world binary semantics. An ideal disassembler has two key properties: (1) transparency, meaning that its underlying logic is accessible and interpretable, and (2) mutability, meaning that it permits ad hoc interaction and refinement. Unfortunately, despite the abundance of disassembly tools available today, none have both transparency and mutability. Most off-the-shelf disassemblers (e.g., objdump, Dyninst, McSema, and Angr) perform “run-and-done” disassembly, and while their underlying heuristics and algorithms are indeed open source, even the slightest of changes (e.g., toggling on a heuristic) requires a complete rebuild of the tool and regeneration of the disassembly. In contrast, popular commercial disassemblers like IDA Pro and Binary Ninja provide rich interfaces for user-written plugins, yet these tools are almost entirely proprietary, making it impossible to fully vet where their core heuristics and algorithms fall short. Thus, reverse engineers are left to choose between two classes of disassemblers: those full of ambiguity or those with zero flexibility.

In this blog post, I introduce our vision for a best-of-both-worlds (transparent and mutable) platform for binary disassembly. Our approach was inspired by recent disassembly tools like ddisasm and d3re, which use the Soufflé Datalog engine. Dr. Disassembler uses Trail of Bits’ in-house incremental and differential Datalog engine, Dr. Lojekyll, to specify the disassembly process. Below, I describe how Dr. Disassembler’s relational view of disassembly is a step toward transparent, mutable disassembly—streamlining the integration of new heuristics, algorithms, and retroactive updates—without the need to perform de novo disassembly per every incremental update.

Background: Disassembly, Datalog, and Dr. Lojekyll

Disassembly is the process of translating a binary executable from machine code into a human-interpretable, assembly language representation of the program. In software security, disassembly forms the backbone of many critical tasks such as binary analysis, static rewriting, and reverse engineering. At Trail of Bits, disassembly is the crucial first step in our executable-to-LLVM lifting efforts, such as Remill and McSema.

At a high level, a disassembler begins by first parsing a binary’s logical sections to pinpoint those that contain executable code. From there, instruction decoding translates machine code into higher-level instruction semantics. This procedure uses one of two strategies: linear sweep or recursive descent.

Linear sweep disassemblers (e.g., objdump) perform instruction decoding on every possible byte, beginning at the very first byte index. However, on variable-length instruction set architectures like x86, a linear sweep disassembler that naively treats all bytes as instructions could perform instruction decoding on non-instruction bytes (e.g., inlined jump tables). To overcome this issue, many modern disassemblers improve their analyses by recovering metadata (e.g., debugging information) or applying data-driven heuristics (e.g., function entry patterns).

On the other hand, recursive descent disassemblers (e.g., IDA Pro) follow the observed control flow to selectively re-initiate linear sweep only on recovered branch target addresses. While recovering the target addresses of jump tables is generally sound, recovering the targets of indirect calls is a far more challenging problem for which no generally sound solution has yet emerged.

Datalog is one of the more popular members in a class of programming languages known as logical programming. Compared to imperative programming languages (e.g., Python, Java, C, and C++), which are structured around a program’s control flow and state, logical programming (e.g., Prolog and Datalog) is structured solely around logical statements. In our use case of binary disassembly, a logical statement can be useful for capturing the addresses in a binary that correspond to plausible function entry points: (1) targets of direct call instructions, (2) common function prologues, or (3) any function address contained in the symbol table. This use case is shown below in Dr. Lojekyll syntax:

Listing 1: This query retrieves the set of all plausible function entry points. Here, “free” denotes that the query must find all candidates that match the subsequent clauses. In a bounded clause (e.g., given some fixed address), the tag “bound” is used instead (see listing 5).

From a logical programming perspective, the code snippet above is interpreted as follows: there is a plausible function at address FuncEA if a direct call to FuncEA, a known function entry instruction sequence starting at FuncEA, or a function symbol at FuncEA exists.
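The same idea can be expressed as three rules in generic Datalog (illustrative relation names only; this is not Dr. Lojekyll syntax):

// Illustrative Datalog, not Dr. Lojekyll syntax; relation names are made up.
plausible_function(FuncEA) :- direct_call(_, FuncEA).
plausible_function(FuncEA) :- function_prologue_at(FuncEA).
plausible_function(FuncEA) :- function_symbol(FuncEA, _).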

At a higher level, logical and functional programming are part of a broader paradigm known as declarative programming. Unlike imperative languages (e.g., Python, Java, C, and C++), declarative languages dictate only what the output result should look like. For instance, in the previous example of retrieving function entry points, our main focus is the end result—the set of function entry points—and not the step-by-step computation needed to get there. While there is certainly more to logical and declarative programming than the condensed explanation offered here, the key advantage of logical programming is its succinct representation of data as statements.

Here’s where Datalog shines. Suppose that after populating our database of “facts”—sections, functions, and instructions—we want to make some adjustments. For example, imagine we’re analyzing a position-independent “hello world” binary with the following disassembly obtained for function <main>:

Listing 2: An example of a relocated call target

We also know that the following relocation entries exist:

Listing 3: Relocation entry information for the example in listing 2

At runtime, the dynamic linker will update the operand of the call at 0x526 to point to printf@PLT. When the call is taken, printf@PLT then transfers to printf’s Global Offset Table (GOT) entry, and the execution proceeds to the external printf.

If you’re familiar with IDA Pro or Binary Ninja, you’ll recognize that both tools adjust the relocated calls to point to the external symbols themselves. In the context of binary analysis, this is useful because it “fixes up” the otherwise opaque calls whose targets are revealed only through dynamic linking. In Datalog, we can simply accommodate this with a few lines:

Listing 4: This exported message rewrites the call to skip its intermediary Procedure Linkage Table (PLT) entry. Here, “#export” denotes that the message will alter some fact(s) in the Datalog database.

Voila! Our representation of the indirect call no longer requires the intermediary redirection through the PLT. As a bonus, we can maintain a relationship table to map every call of this type to its targets. With this example in mind, we envision many possibilities in which complex binary semantics are modelable through relationship tables (e.g., points-to analysis, branch target analysis, etc.) to make binary analysis more streamlined and human-interpretable.

Dr. Lojekyll is Trail of Bits’ new Datalog compiler and execution engine and the foundation on which Dr. Disassembler is built. It adopts a publish/subscribe model, in which Dr. Lojekyll-compiled programs “subscribe” to messages (e.g., there exists an instruction at address X). When messages are received, the program may then introduce new messages (e.g., there exists a fall-through branch between instructions A and B) or remove previous ones. Compiled programs may also publish messages to external parties (e.g., an independent server), which may then “query” data relationships from the Datalog side.

Dr. Lojekyll’s publish/subscribe model is well suited for tasks in which “undo”-like features are required. In binary disassembly, this opens up many possibilities in human-in-the-loop binary analysis and alterations (think Compiler Explorer but for binaries). At the time of writing, Dr. Lojekyll supports the compilation of Datalog into Python programs and has emerging support for C++.

Introducing Dr. Disassembler

Conventional “run-and-done” disassemblers perform their analyses on the fly, confining them to whatever results (even erroneous ones) they obtain from the outset. Datalog instead lets us move all analysis to post-disassembly, streamlining the integration of plug-and-play refinements and retroactive updates. And with its painless syntax, Datalog is one of the most powerful and expressive platforms for user-written disassembly plugins and extensions. We implement our vision of transparent and mutable disassembly as a prototype tool, Dr. Disassembler. While Dr. Disassembler can theoretically use any Datalog engine (e.g., DDLog), we currently use Trail of Bits’ own Dr. Lojekyll. The implementation of Dr. Disassembler discussed in this blog post uses Dr. Lojekyll’s Python API; at the time of writing, however, we have begun developing a C++-based implementation to address Python’s performance limitations. Here, I introduce the high-level design behind our initial (and forthcoming) implementations of Dr. Disassembler.

Figure 1: Dr. Disassembler’s high-level architecture

Disassembly Procedure

Dr. Disassembler’s disassembly workflow consists of three components: (1) parsing, (2) decoding, and (3) post-processing. In parsing, we scan the binary’s sections to pinpoint those that contain instructions, along with any recoverable metadata (e.g., entry points, symbols, and imported/exported/local functions). For every identified code section, we begin decoding its bytes as instructions. Our instruction decoding process maps each instruction to two key fields: its type (e.g., call, jump, return, and everything else) and its outgoing edges.

Recovering Control Flow

An advantage of using Datalog is the ability to express complex program semantics as a series of simple, recursive relationships. Yet, when handling control flow, a purely recursive approach often breaks analyses like function boundary detection: recursive disassembly follows the control flow to each instruction’s targets and resumes the analysis from there. Unlike calls, however, jumps do not return, so after an inter-procedural jump the original function is never re-entered, and the disassembler misses the remaining instructions in the function containing the jump.

To unify recursive and linear descent disassembly approaches, we developed the concept of non-control-flow successor instructions: for any unconditionally transferring jump or return instruction, we record an artificial fall-through edge from the instruction to the next sequential instruction. Though this edge has no bearing on the actual program, it effectively encodes the logical “next” instruction, thus unifying our linear and recursive analyses. These non-control-flow successor edges are the linchpin of our recursive analyses, like instruction-grouping and function boundary detection.
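
To make this concrete, here is a minimal Python sketch (not Dr. Disassembler’s actual Datalog or data model; the Insn fields and kind values are hypothetical simplifications) of how such artificial fall-through edges could be recorded during decoding:

from dataclasses import dataclass, field
from typing import Iterator, List, Tuple

@dataclass
class Insn:
    addr: int
    size: int
    kind: str                                         # "call", "uncond_jump", "return", or "other"
    targets: List[int] = field(default_factory=list)  # real control-flow successors

def successor_edges(insns: List[Insn]) -> Iterator[Tuple[int, int, bool]]:
    """Yield (source, destination, is_artificial) edges for decoded instructions."""
    for insn in insns:
        for target in insn.targets:
            yield insn.addr, target, False
        # Unconditional jumps and returns never fall through at runtime, but we
        # still record the next sequential address as a "non-control-flow
        # successor" so that linear and recursive analyses stay connected.
        if insn.kind in ("uncond_jump", "return"):
            yield insn.addr, insn.addr + insn.size, True

edges = list(successor_edges([
    Insn(0x1000, 5, "other", [0x1005]),
    Insn(0x1005, 2, "uncond_jump", [0x1100]),  # jmp 0x1100
    Insn(0x1007, 1, "other", [0x1008]),        # reachable via the artificial edge
]))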

Post-Processing

At each step of parsing and decoding, we publish any interesting objects that we’ve found to our Dr. Lojekyll database. These core objects—symbols, sections, functions, instructions, and transfers—form the building blocks of our heuristic and recursive analyses. Our fundamental approach behind Dr. Disassembler is to “engulf” as much disassembly information as possible, regardless of correctness, and to refine everything afterward on the Datalog side. Because we consider every piece of information to be plausibly correct, we can retroactively update our disassembly when any new information is observed; and unlike conventional run-and-done tools, this does not require a de novo re-disassembly.

Example Exports and Queries

Dr. Disassembler streamlines binary analysis by focusing on disassembly artifacts themselves rather than the myriad steps needed to obtain them. To showcase some of Dr. Disassembler’s many capabilities, this section highlights several implementation examples of rigorous binary analysis tasks facilitated by two of Dr. Disassembler’s fundamental constructs: “exports” (messages that change/remove facts) and “queries” (which retrieve information about facts).

Query: Grouping Instructions into Functions

Given an arbitrary function address FuncEA, this query returns all the addresses of the instructions contained in that function. Two messages form this query: (1) function(u64 StartEA) and (2) instruction(u64 InsnEA, type Type, bytes Bytes).

Listing 5: An example Dr. Disassembler query that returns the addresses of the instructions contained in a function

Export: Instructions Dominating Invalid Instructions

This export returns all the instructions whose control flow leads to invalid instructions (i.e., addresses at which instruction decoding fails). This heuristic is critical for Dr. Disassembler to filter out the many “junk” instruction sequences that inevitably occur when decoding every possible byte sequence.

As in the previous example, we structure this relationship around two core messages: (1) instruction and (2) raw_transfer(u64 StartEA, u64 DestEA), the latter of which contains the unaltered control flow recovered from the binary (i.e., no alterations like the one in listing 4 are made yet).

Listing 6: An example Dr. Disassembler export that updates the database with all the instructions whose control flow leads to invalid instructions

Export: Inter-Function Padding

This export returns all the instruction addresses that serve as “padding” between functions (e.g., NOPs that do not belong to any function). Here, we use the following messages: (1) function, (2) section, (3) raw_transfer, and (4) basic_block(u64 BlockEA, u64 InsnEA). Identifying inter-function padding is a crucial step to refining our function-instruction grouping.

Listing 7: An example Dr. Disassembler export that updates the database with all the instruction addresses that serve as “padding” between functions

Future Work and Extensions

Our immediate plan is to extend Dr. Disassembler to a fully C++ implementation. Along with improving the performance of the tool, we expect that this transition will open many new doors for research on binary analysis:

  • Streamlined binary analysis platforms: Contemporary binary analysis platforms have rich interfaces for developing custom analysis plugins, but the sheer complexity of their APIs frequently leaves users bottlenecked by steep learning curves. As a next step, we want to develop Dr. Disassembler into a full-fledged binary analysis platform, complete with all the features needed to facilitate the easy creation and customization of user plugins.
  • GUI interfaces for binary analysis and transformation: By using Dr. Disassembler’s mutable representation of disassembly, we can develop new interfaces that enable real-time analysis and editing of binary executables (e.g., to help developers visualize how toggling on different heuristics affects analysis results). Our end goal here is something akin to Compiler Explorer… for binaries!
  • Exposing analysis blind spots: Our prototype of Dr. Disassembler is designed to use the outputs of multiple binary parsers and instruction decoders. Going forward, we would like to use Dr. Disassembler as a platform for developing automated techniques to identify where these competing tools agree and disagree with one another (e.g., on code-data disambiguation, branch target analysis, etc.) and pinpoint their weaknesses.

If any of these ideas interest you, feel free to get in touch with either me (Stefan Nagy) or Peter Goodman at Trail of Bits.

We’ll release our prototype Python implementation of Dr. Disassembler and provide a PDF version of this post at https://github.com/lifting-bits/dds. Happy disassembling!

Celebrating our 2021 Open Source Contributions

At Trail of Bits, we pride ourselves on making our best tools open source, such as algo, manticore, and graphtage. But while this post is about open source, it’s not about our tools.

In 2021, Trail of Bits employees submitted over 190 pull requests (PRs) that were merged into non-Trail of Bits repositories. This demonstrates our commitment to securing the software ecosystem as a whole and to improving software quality for everyone. A representative list of contributions appears at the end of this post, but here are some highlights:

We would like to acknowledge that submitting a PR is only a tiny part of the open source experience. Someone has to review the PR. Someone has to maintain the code after the PR is merged. And submitters of earlier PRs have to write tests to ensure the functionality of their code is preserved.

We contribute to these projects in part because we love the craft, but also because we find these projects useful. For this, we offer the open source community our most sincere thanks, and wish everyone a happy, safe, and productive 2022!

Osquery description updated on January 3, 2022.

Some of Trail of Bits’ 2021 Open Source Contributions

Disclosing Shamir’s Secret Sharing vulnerabilities and announcing ZKDocs

By Filipe Casal and Jim Miller

Trail of Bits is publicly disclosing two bugs that affect Shamir’s Secret Sharing implementation of Binance’s threshold signature scheme library (tss-lib) and most of its active forks. Here is the full list of affected repositories:

These bugs allow a malicious user of the threshold signature scheme to steal the secret keys of other users or to crash their nodes. Exploiting these vulnerabilities is simple: an attacker just needs to configure a malicious ID at the start of either the key generation protocol or the resharing protocol.

Threshold signature schemes are powerful cryptographic objects; however, they require complex, non-standardized primitives such as zero-knowledge proofs, commitment schemes, and verifiable secret shares. Unfortunately, aside from academic publications, there is essentially no guidance or documentation on implementing these schemes or their security pitfalls, which leads to several issues in practice, such as the two bugs we are disclosing today.

Along with our disclosure of these vulnerabilities, we are releasing ZKDocs, our documentation for non-standardized cryptographic primitives. We hope that our work in this area can benefit the larger cryptography community.

What are threshold signature schemes?

A threshold signature scheme is a protocol that allows a group of users to generate and control a private signing key. The users can jointly produce digital signatures on messages, but none can sign messages individually.

Many are familiar with the concept of multisignature (multisig) protocols, mainly used in cryptocurrency wallets that execute transactions only after receiving enough signatures from different users. The main difference between these schemes is that in multisig schemes, each user has a personal private/public key pair for signing, whereas in threshold signature schemes, each user holds a share of the same key. When signing with a multisig scheme, the number of signatures is proportional to the number of users; when signing with a threshold signature scheme, a single group signature is produced.

Threshold signatures are complicated. The advanced, technical details of how these schemes work are out of scope for this blog post, but if you are curious about how these schemes work in practice or how they compare with other schemes, like multisig, you can find several blog posts describing them in more detail, such as here and here.

What is verifiable secret sharing (VSS)?

Secret sharing is a cryptographic protocol for splitting a secret key (or other secret data) into key shares. These key shares should look entirely random so that, on their own, they reveal no information about the underlying secret. Still, when enough shares are combined, the original secret can be recovered. The most common technique for secret sharing is Shamir’s Secret Sharing scheme, which leverages properties of polynomials.

The high-level idea behind Shamir’s scheme is that for n users, you want at least t of them (where t ≤ n) to recover the secret by combining their shares. To achieve this, we generate a random polynomial p of degree t-1 over a finite group with the constant term set to the secret value.

\displaystyle \begin{aligned}  p(x) = \mathsf{secret} + a_1 x + \ldots + a_{t-1} x^{t-1}  \end{aligned}

Then, we create the secret shares by evaluating the polynomial at n different points, one point for each user. One single point (or even a couple of points, depending on the value of t) reveals no information about the polynomial. However, if users combine enough points, they can recover the original polynomial using polynomial interpolation. Since the secret value is encoded in the polynomial, recovering the polynomial recovers the secret.
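
For intuition, here is a toy Python sketch of share generation and recovery by Lagrange interpolation (small, illustrative parameters; not a production implementation):

# Toy Shamir secret sharing over a prime field; illustrative only.
import random

q = 2**127 - 1                 # prime modulus (toy choice)
t, n = 3, 5                    # any t of n shares can recover the secret
secret = 123456789

# p(x) = secret + a_1*x + ... + a_{t-1}*x^{t-1}  (mod q)
coeffs = [secret] + [random.randrange(1, q) for _ in range(t - 1)]

def p(x):
    return sum(c * pow(x, i, q) for i, c in enumerate(coeffs)) % q

# One share per user, evaluated at the non-zero points 1..n
shares = [(x, p(x)) for x in range(1, n + 1)]

def recover(subset):
    """Lagrange interpolation at x = 0 recovers the constant term (the secret)."""
    total = 0
    for xi, yi in subset:
        num = den = 1
        for xj, _ in subset:
            if xj != xi:
                num = num * (-xj) % q
                den = den * (xi - xj) % q
        total = (total + yi * num * pow(den, -1, q)) % q   # pow(., -1, q) needs Python 3.8+
    return total

assert recover(shares[:t]) == secret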

Threshold signature schemes use secret sharing to generate a signing key that is shared among multiple users, but in practice, most schemes have to use a more advanced version of secret sharing known as verifiable secret sharing (VSS). Often, users cannot assume that others running these protocols are honest. VSS allows users to verify that the shares they received were generated honestly. The most common VSS scheme was developed by Feldman. His scheme uses the same technique as Shamir’s scheme for generating shares (i.e., generating a random polynomial using the secret as the constant term) and creates additional values to make these shares verifiable.

From zero to hero

We are disclosing two bugs that affect Feldman’s verifiable secret sharing within different threshold signature scheme implementations. These bugs are not a result of some novel analysis that could not have been foreseen; on the contrary, these bugs stem from one of the few known weaknesses of secret sharing. We highlight them today not only due to the number of affected vendors but also because they are representative of a whole host of critical bugs that stem from the same recurring problem in non-standard cryptography: a lack of documentation and guidance.

The first bug is related to how secret shares are generated. Since we defined the constant term of the polynomial as the secret value, it is essential that, when generating shares, the x-value of the polynomial point is non-zero. If we create shares at the 0 point, then the polynomial evaluates to the constant term, which leaks the secret value entirely:

\begin{aligned}  p(0) &= \mathsf{secret} + a_1 0 + \ldots + a_{t-1} 0^{t-1} \\  &= \mathsf{secret}  \end{aligned}

Most implementations avoid this possibility altogether by evaluating the polynomial at values (1, 2, …, n), where n is the number of shares needed. However, some implementations evaluate the polynomials at specific values; for instance, Binance’s implementation evaluates the polynomial at each user’s unique ID value. Implementations designed in this way must verify that these IDs are non-zero; most succeed in doing this. However, some implementations forget that these sharing schemes operate over a finite group, so this zero check has to be performed modulo the group’s order! If this check is not performed, the secret value is immediately leaked to malicious users who set their unique IDs equal to the group’s order. If the group order is q, then:

\begin{aligned}  p(q) &= \mathsf{secret} + a_1 q + \ldots + a_{t-1} q^{t-1} \pmod{q} \\  &= \mathsf{secret} + a_1 0 + \ldots + a_{t-1} 0^{t-1} \pmod{q} \\  &= \mathsf{secret}  \end{aligned}

The affected implementations checked that the user IDs used to generate these secret shares were non-zero but did not perform this check modulo the elliptic curve group order.
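
The fix is simply to perform the zero check after reducing modulo the group order. Here is a minimal Python illustration (toy modulus; the affected libraries are written in Go, so this only sketches the idea):

q = 2**127 - 1                          # toy stand-in for the group order

def validate_user_id(user_id):
    reduced = user_id % q
    if reduced == 0:
        # A "non-zero" integer ID that is zero modulo q would evaluate the
        # sharing polynomial at its constant term and leak the secret.
        raise ValueError("user ID is zero modulo the group order")
    return reduced

validate_user_id(42)                    # accepted
try:
    validate_user_id(q)                 # "non-zero" as an integer, but zero mod q
except ValueError as err:
    print(err)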

From zero to crash

The second bug is related to the mishandling of modular arithmetic operations. Depending on the group of users signing a message, each user calculates a Lagrange coefficient, which is a product of terms of the form ID_i / (ID_i – SelfID). Since we are working in a finite field, the division is computed using the modular inverse of (ID_i – SelfID). If another user’s ID_i is equal to the current user’s ID (SelfID) modulo the group order, the subtraction is zero modulo the group order, and zero does not have a modular inverse! The vulnerable implementations did not validate the modular inverse computation and would panic with a null dereference.
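
A minimal Python sketch of the corresponding defensive check (again, the vulnerable implementations are in Go; this only illustrates the idea). In Python, pow(x, -1, q) raises a ValueError when no modular inverse exists, which mirrors the unchecked panic described above, so the operand should be validated before inverting:

q = 2**127 - 1                               # toy stand-in for the group order

def lagrange_denominator_inverse(self_id, other_id):
    """Return the inverse of (other_id - self_id) mod q, validating it first."""
    diff = (other_id - self_id) % q
    if diff == 0:
        # other_id == self_id (mod q): no modular inverse exists.
        raise ValueError("duplicate participant ID modulo the group order")
    return pow(diff, -1, q)

print(lagrange_denominator_inverse(3, 5))    # fine
# lagrange_denominator_inverse(3, 3 + q)     # would be rejected: IDs collide mod q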

We often find these bugs in our audits; they are easy to miss without scrupulous attention to modular arithmetic details. Often, validations are in place, but they are insufficient if the arguments are not checked in the context of the finite field.

When using modular arithmetic with a generic Big Integer class, take the following steps:

  • Always modularly reduce the numbers before validations, such as comparisons.
  • Always validate operations such as modular inverses and modular square roots. Depending on the API, either check the return value or catch errors to ensure that the function does not panic.

If you are still unsure or would like a second opinion, contact us for an audit.

ZKDocs

Today, we are releasing ZKDocs, our documentation for non-standardized cryptographic primitives. We hope it will help developers avoid these bugs in the future by providing comprehensive implementation details and security considerations of these protocols.

As we discovered more instances of these bugs, we began to think about why they were occurring and how we could prevent them from occurring in the future. Unfortunately, for non-standardized cryptographic protocols, the burden is on the developers to figure out all of the low-level implementation details and security pitfalls. To understand how limited the resources are, try searching for information on Feldman’s verifiable secret sharing scheme (the most common scheme of its kind!). The only results you will likely find are a Wikipedia article and Feldman’s original paper from 1987. Aside from that, you may be able to find some Stack Overflow discussions or old lecture notes. But that’s about it.

These schemes are complicated! With such limited documentation and guidance available, we shouldn’t be surprised that these types of bugs end up occurring in practice. With ZKDocs, we aim to fill in that gap. For instance, to read more about the details of the first bug related to zero-shares that we found, check out the secret sharing section in ZKDocs!

The “Shamir’s Secret Sharing Scheme” section of ZKDocs

Coordinated disclosure

October 19, 2021: Discovered secret data leaks in tss-lib
October 21, 2021: Reported to Binance
November 1–December 3, 2021: Internal discovery of issues affecting Clover, Keep Network, Swingby, THORChain, and ZenGo X
December 6, 2021: Reported to Clover, Keep Network, Swingby, THORChain, and ZenGo X

As of December 20, 2021, Binance, Keep Network, Swingby, THORChain, and ZenGo X have patched their implementations with the required fixes. The one exception is Clover, who has not replied to our emails.

We would like to thank the Binance, Keep Network, SwingBy, THORChain, and ZenGo X teams for working swiftly with us to address these issues.

Detecting MISO and Opyn’s msg.value reuse vulnerability with Slither

By Simone Monica

On August 18, 2021, samczsun reported a critical vulnerability in SushiSwap’s MISO smart contracts, which put ~350 million USD (109 thousand ETH) at risk. This issue is similar to an attack that was conducted on the Opyn codebase in August of 2020.

At the time of the report, I was finishing my blockchain security apprenticeship at Trail of Bits, where I learned more about Slither’s capabilities. I immediately wondered whether it was possible to create Slither detectors for these vulnerabilities. The answer is yes! Today, we are releasing two new open-source detectors that can detect the Opyn and MISO vulnerabilities, respectively: msg-value-loop and delegatecall-loop.

msg.value inside a loop? PLS NO

Both the MISO and Opyn vulnerabilities boil down to the same root cause: a single msg.value amount is reused multiple times. The difference is that msg.value is used explicitly (msg.value) in Opyn and implicitly (via delegatecall) in MISO.

Let’s look at a simple example to demonstrate the vulnerability.

In Opyn’s case, msg.value is used inside a loop in a payable function. If addBalances() is called with multiple receivers, the same msg.value is credited to each receiver, even though the caller sent that amount of ETH only once.

contract C {
    mapping (address => uint256) balances;

    function addBalances(address[] memory receivers) public payable {
        for (uint256 i = 0; i < receivers.length; i++) {
            balances[receivers[i]] += msg.value;
        }
    }
}

In MISO’s case, the source of the vulnerability is a delegatecall inside a loop within a payable function, where the call target is itself a payable function. Delegatecall executes the called function in the context of the current contract, preserving msg.sender and msg.value. This is a simplified explanation of the MISO vulnerability; for more detail, I suggest that you read samczsun's blog post.


contract C {
    mapping (address => uint256) balances;

    function addBalance(address a) public payable {
        balances[a] += msg.value;
    }

    function addBalances(address[] memory receivers) public payable {
        for (uint256 i = 0; i < receivers.length; i++) {
            address(this).delegatecall(abi.encodeWithSignature("addBalance(address)", receivers[i]));
        }
    }
}

Slither's new detectors

By running Slither to detect calls to delegatecall and msg.value in a loop, we get the following results:

$ slither --detect delegatecall-loop Delegatecall.sol
C.addBalances(address[]) (delegatecall.sol#10-15) has delegatecall inside a loop in a payable function: address(this).delegatecall(abi.encodeWithSignature(addBalance(address),receivers[i])) (delegatecall.sol#12)
Reference: https://github.com/crytic/slither/wiki/Detector-Documentation/#payable-functions-using-delegatecall-inside-a-loop
Delegatecall.sol analyzed (1 contracts with 1 detectors), 1 result(s) found
 
$ slither --detect msg-value-loop Msgvalue.sol
C.addBalances(address[]) (msgvalue.sol#7-12) use msg.value in a loop:
balances[receivers[i]] += msg.value (msgvalue.sol#9)
Reference: https://github.com/crytic/slither/wiki/Detector-Documentation/#msgvalue-inside-a-loop
Msgvalue.sol analyzed (1 contracts with 1 detectors), 1 result(s) found

These two detectors are implemented with the same logic, built on the CFG representation and Slither’s intermediate language, SlithIR. Each detector iterates over the nodes of every payable function and tracks whether the current node enters or exits a loop. Here is where SlithIR comes to our aid and the two implementations diverge: for nodes inside a loop, each detector iterates through the node’s SlithIR operations; msg-value-loop checks whether the current operation reads msg.value (see the detector code here), and delegatecall-loop checks whether the current operation is a delegatecall (see the detector code here).
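
Roughly, the traversal looks like the simplified model below (plain Python with made-up Node objects standing in for Slither’s CFG; see the linked detector code for the real implementation):

# Simplified model of the detectors' traversal: walk the CFG of a payable
# function, track loop depth, and flag operations of interest inside loops.
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    kind: str                                     # "loop_start", "loop_end", or "plain"
    ops: List[str] = field(default_factory=list)  # e.g., ["reads_msg_value"]
    successors: List["Node"] = field(default_factory=list)

def find_in_loops(entry: Node, wanted_op: str) -> List[Node]:
    findings, visited = [], set()

    def visit(node, depth):
        if id(node) in visited:
            return
        visited.add(id(node))
        if node.kind == "loop_start":
            depth += 1
        elif node.kind == "loop_end":
            depth -= 1
        elif depth > 0 and wanted_op in node.ops:
            findings.append(node)                 # operation of interest inside a loop
        for succ in node.successors:
            visit(succ, depth)

    visit(entry, 0)
    return findings

# Toy CFG for addBalances: entry -> loop start -> body (reads msg.value) -> loop end
loop_end = Node("loop_end")
body = Node("plain", ops=["reads_msg_value"], successors=[loop_end])
loop_start = Node("loop_start", successors=[body])
entry = Node("plain", successors=[loop_start])

print(len(find_in_loops(entry, "reads_msg_value")))   # 1 finding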

Let’s try the detectors on the smart contracts that were found to be vulnerable.

Opyn

$ slither 0x951D51bAeFb72319d9FBE941E1615938d89ABfe2 --detect msg-value-loop
OptionsContract._exercise(uint256,address) (crytic-export/etherscan-contracts/0x951D51bAeFb72319d9FBE941E1615938d89ABfe2-oToken.sol#1816-1899) use msg.value in a loop: require(bool,string)(msg.value == amtUnderlyingToPay,Incorrect msg.value) (crytic-export/etherscan-contracts/0x951D51bAeFb72319d9FBE941E1615938d89ABfe2-oToken.sol#1875)
Reference: https://github.com/crytic/slither/wiki/Detector-Documentation/#msgvalue-inside-a-loop
0x951D51bAeFb72319d9FBE941E1615938d89ABfe2 analyzed (13 contracts with 1 detectors), 1 result(s) found

SushiSwap’s MISO

$ slither 0x4c4564a1FE775D97297F9e3Dc2e762e0Ed5Dda0e --detect delegatecall-loop
BaseBoringBatchable.batch(bytes[],bool) (contracts/Utils/BoringBatchable.sol#35-44) has delegatecall inside a loop in a payable function: (success,result) = address(this).delegatecall(calls[i]) (contracts/Utils/BoringBatchable.sol#39)
Reference: https://github.com/crytic/slither/wiki/Detector-Documentation/#payable-functions-using-delegatecall-inside-a-loop
0x4c4564a1FE775D97297F9e3Dc2e762e0Ed5Dda0e analyzed (21 contracts with 1 detectors), 1 result(s) found

Conclusion

In summary, Slither is a powerful tool for preventing security vulnerabilities in smart contracts, and it can be expanded by creating new detectors. If you want to learn more about this tool, check out our "Building Secure Smart Contracts" guide.

My apprenticeship at Trail of Bits has been fun and has helped me improve my security skills and learn how to better approach audits. If you are interested in having a similar experience, you can apply to join us as a blockchain security apprentice.

What does your code use, and is it vulnerable? It-depends!

You just cloned a fresh source code repository and want to get a quick sense of its dependencies. Our tool, it-depends, can get you there.

We are proud to announce the release of it-depends, an open-source tool for automatically enumerating dependencies. You simply point it at a source code repository, and it will build a graph of the required dependencies. it-depends currently supports codebases built with cargo, npm, pip, go, CMake, and autotools; packages in their associated package managers; and Ubuntu apt.

Modern programming languages and packaging frameworks increasingly include utilities to enumerate dependencies and even map them to known vulnerabilities (e.g., npm audit, cargo audit, and pip-audit). it-depends unifies these functionalities into one tool that supports languages and frameworks for which no similar tool exists, such as autotools and CMake. Simply run it-depends in the root of a source code repository, and the tool will produce a software bill of materials (SBOM) for all of the packages on which the repository could depend. it-depends not only detects known-vulnerable dependencies, but it also identifies duplicate functionality within a repository, which can support software debloating efforts. “Why is this repository using both libsodium and libssl?”

it-depends uses the CVEdb and Google’s Open Source Vulnerabilities (OSV) databases to determine which known vulnerabilities may be reachable from any package. After finding matching entries, the tool produces a comprehensive list of all reachable CVEs in a code repository or package.

Existing approaches typically resolve only direct dependencies or rely on a package lock file. In contrast, it-depends recursively builds a project’s dependency graph starting from either a source code repository or a package specification, enumerating the superset of all feasible dependency resolutions, not just a single resolution. This can help identify latent upstream vulnerabilities that might exist only in a subset of the universe of all feasible dependency resolutions.

Existing solutions stop within the walled garden of their package management ecosystem. it-depends, in contrast, is able to detect native library usage and interdependency. For example, it-depends correctly identifies that the Python package pytz depends on the native library libtinfo6, which itself depends on libcrypt1, which depends on libc6, … and so on.

it-depends can emit an SBOM, a dependency graph, or an interactive HTML page

Why enumerating dependencies is hard

  • Semantic versioning: We don’t care only about the specific versions installed on a system; we want to reason about all possible versions that can satisfy the dependencies.
  • Some build systems such as autotools and CMake do not have a concept of packages, and native library versioning is not well defined.
  • What if a package written in a high-level language, like Python or JavaScript, uses native libraries?
  • Mapping CVEs to source code packages is nontrivial.

Basic functionality (business as usual)

it-depends is written in Python and is easy to extend. It is built on a set of pluggable resolvers to handle different package managers and source code repositories. Each resolver acts as an oracle for its specific universe of packages, expanding the global dependency graph. Adding a resolver for a new build system or package manager requires little more than implementing a single resolve method:

from typing import Iterator

from it_depends.dependencies import DependencyResolver, Dependency, Package

class CustomResolver(DependencyResolver):
    def resolve(self, dependency: Dependency) -> Iterator[Package]:
        """Yields all packages that satisfy the given dependency"""
        raise NotImplementedError("TODO: Implement")

Native resolution (the not-so-secret sauce)

it-depends features a generic plug-in to infer the native, operating system-level dependencies of a given package. To accomplish this, each package is installed in a container, and its use of native libraries is monitored with ptrace. File accesses are translated to the Ubuntu packages that provide the library files the package loads, and the dependencies among those native packages are extracted from the Ubuntu package repository. A baseline is established to remove packages that are inherent to the type of package currently being analyzed.

Try it yourself!

it-depends is free and open source. You can install it by running pip3 install it-depends. Further installation instructions are available on the tool’s GitHub page. Please try it, and let us know how it works!

Acknowledgements

it-depends was developed by Trail of Bits based upon work supported by DARPA under Contract No. HR001120C0084 (Distribution Statement A, Approved for Public Release: Distribution Unlimited). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the United States Government or DARPA.

MUI: Visualizing symbolic execution with Manticore and Binary Ninja

By Alan Chang, University of Oxford

During my summer internship, I had the wonderful opportunity to work on the Manticore User Interface (MUI). The MUI project aims to combine the strength of both Manticore, a powerful symbolic execution library, and Binary Ninja, a popular binary analysis tool, to provide a more intuitive and visual interface for working with symbolic execution.

Exploring an ELF binary with the MUI

A gentle introduction to symbolic execution

When conducting vulnerability research and reverse engineering, researchers often wish to thoroughly test a piece of software and explore all of its possible states. Symbolic execution is one technique that can address this need. As shown in the following code snippet, the program execution can go into either the if case or the else case.

Code snippet and constraints graph

If we test this code with a concrete input for x (e.g., x = 4), only one of the two states will be reached during execution. This means multiple runs of the program will be necessary to fully explore the state space. However, if we consider x to be symbolic, similar to the notion of a variable in mathematics, both states can be explored simultaneously, and we just have to keep track of the constraints on our symbolic variables at each point of the exploration process.

Specifically in this example, the if-else statement will create two states, one with the constraint x < 5 exploring ① and another with x ≥ 5 exploring ②. This is the key concept behind symbolic execution, which can help us figure out whether a given code segment is reachable and what input would be necessary.
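
The exact snippet from the figure is not reproduced above, but a minimal stand-in (hypothetical, chosen to match the constraints described) makes the idea concrete:

def classify(x):
    if x < 5:
        return "reached ①"    # path constraint: x < 5
    return "reached ②"        # path constraint: x >= 5

print(classify(4))            # a concrete run explores only one of the two paths
# Symbolic execution instead tracks both states, each with its own constraint:
states = [("x < 5", "reach ①"), ("x >= 5", "reach ②")]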

The problem of state explosion

The symbolic execution technique, however, is not without its drawbacks. Notably, there’s the issue of state explosion. When a program contains many conditional branches and loops, the number of states to explore can grow exponentially, and it quickly becomes infeasible to explore them all. The example below illustrates this point: with just three loop iterations, the code snippet ends up with eight states to explore, and this number grows rapidly as the number of iterations increases.

Example of state explosion
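
The figure itself is not reproduced here; as a stand-in, consider a loop whose body branches on a symbolic condition each iteration. The number of feasible paths doubles every time around the loop, so three iterations already yield 2^3 = 8 states (illustrative code, not the figure’s exact snippet):

from itertools import product

def f(xs):                    # xs: three symbolic inputs
    count = 0
    for x in xs:
        if x > 0:             # each iteration forks the symbolic state
            count += 1
    return count

# Each branch can independently be taken or not taken, so after three
# iterations there are 2 ** 3 = 8 feasible paths to explore.
print(len(list(product((True, False), repeat=3))))   # 8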

This issue of state explosion is further exacerbated by the fact that most symbolic execution libraries lack a method of visualizing the state exploration process. That means that it’s often difficult to even pinpoint where state explosions occur, let alone to begin fixing them.

The MUI project aims to address these issues by providing an interactive user interface to better visualize this state exploration process in symbolic execution and to keep the human in the loop. More specifically, the MUI is a Binary Ninja plugin with custom Qt widgets that provide a visual and intuitive interface to interact with Manticore, the open-source symbolic execution library developed by Trail of Bits.

MUI features and demonstration

To illustrate some of the MUI features, let’s try and solve a simple crackme challenge inside the Manticore repository. The objective is straightforward: we need to determine the correct input for this program.

Running the challenge with an incorrect input

First attempt: Using the find and avoid commands

Opening the ELF binary in Binary Ninja, we can quickly spot the two puts calls in the main function. These two function calls are used for the success and failure cases, respectively. Now, our objective can be rephrased as finding an input so that the code execution reaches the success case.

We can convey this objective to the MUI using the find command; the instruction is highlighted in green. Similarly, we can tell the MUI to avoid the failure case using the avoid command.

Now, with an objective specified, we can run the MUI to find the solution to this crackme challenge.

As shown in the gif, we can use Manticore to explore the state space of the challenge within the Binary Ninja UI, and we obtain the solution coldlikeminisodas. Giving this answer back to the program verifies that we have indeed solved the challenge.

Running the program with a correct input

Turning our attention back to the MUI, we can use the custom State List widget to see the way the MUI got to this solution and all the states that it explored. State 34 in the list denotes the final state in which we reached the success case.

State List widget in action

To further visualize the relation between each of the states, we can use the Graph View widget. This widget shows a provenance tree containing all the states. Double-clicking on a state node will bring us to the last instruction before the state was forked or terminated. Using the tab shortcut, the tree graph can be expanded to show other terminated states.

Graph View widget in action

A second solution: Custom hooks

In addition to all the cool features that we have already demonstrated, the MUI still has more tricks up its sleeves. Let’s solve this challenge again using a different method.

Body of the main function shown again for convenience

If we spend some time understanding the decompiled code, we can see that our user input is compared against the correct answer one character at a time, and when one character of our input does not match the correct answer, we see the failure message. With this knowledge, we can prevent all state forking by explicitly telling the MUI which path we want to take. This can be achieved with a custom hook and the code snippet below:

# bv, m, and addr are provided by the surrounding MUI scripting context
global bv, m, addr

def hook(state):
    # Compute one character of the solution from the value in AL
    flag_byte = state.cpu.AL - 0xa

    with m.locked_context() as context:
        # Accumulate the recovered characters across states
        if 'solution' in context:
            context["solution"] += chr(flag_byte)
        else:
            context["solution"] = chr(flag_byte)
        print(f'flag: {context["solution"]}')

    # Steer execution down the desired path instead of letting the state fork
    state.cpu.RIP = 0x400a51

m.hook(addr)(hook)

The custom hook feature here is a kind of fallback that gives you the full power of the Manticore API without having to write a complete script in a different environment. This allows researchers to do everything inside of Binary Ninja and reduce the amount of context switching required.

How about EVM?

Manticore is well known for its support for smart contracts, and the MUI plugin also offers basic EVM support. Documentation can be found in the project README file.

To get around the lack of built-in EVM support in Binary Ninja, the MUI leverages a few other open-source tools developed by Trail of Bits. Smart contracts are compiled using crytic-compile, an abstraction layer for smart contract build systems, and the generation and visualization of disassembly code and CFGs is handled by ethersplay, an EVM disassembler.

With these tools, the EVM feature set in the MUI is now on par with the default Manticore CLI, and more features are under active development. But even with the same features, the MUI really outshines the CLI tool in its usability and discoverability. Instead of looking through documentation to find the right command-line argument, users can now see all the available Manticore run options and pick the ones they need using a dynamically generated and up-to-date UI panel. Furthermore, the tight integration with Binary Ninja also means that these options can be persistently saved inside Binary Ninja Database (BNDB) project files, offering more convenience.

EVM run dialog

Conclusions

I’m really proud of what I was able to achieve with the MUI project during my internship. The MUI already supports many use cases that help researchers improve their workflows. We have achieved what we set out to do with this project: to provide an intuitive and visual interface for working with symbolic execution.

This internship has been a great learning opportunity for me. Through this internship, I gained a deeper understanding of how symbolic execution works and learned about a lot of different topics ranging from project planning and documentation to Qt UI development, from multi-threaded applications to the Git workflow, and much more.

I would like to thank my mentors, Eric Kilmer and Sonya Schriner, for their tremendous help during my internship. Both of them provided me with guidance when I needed it but also gave me enough freedom to explore and innovate during the development of the MUI. My internship experience would not have been the same without them. I’m truly grateful for this internship opportunity at Trail of Bits, and I cannot wait to see what the future brings for the MUI project.

How to choose an interesting project

By Trent Brunson, Head of Research & Engineering
Originally published on October 15, 2021

Come join our team today! Trail of Bits is hiring full-time Senior Software Engineers and Software Security Research Engineers.

Over the last nine years, I’ve interviewed hundreds of applicants for research and engineering positions. One of my favorite icebreakers is, What kind of project would you choose to work on if you were given a $500,000 budget and one year to work on it with no oversight or consequences? (There’s no wrong answer.) Surprisingly, most people have never indulged themselves in this thought experiment. Why? For one, thinking of a good project isn’t easy!

Trail of Bits engineers are encouraged to dedicate a portion of their workweek to an internal research and development (IRAD) project of their choosing, so they face a similar challenge of having to commit to a project they think they might like. But it’s easy to become rudderless in a sea of security topics, where you may find yourself aimlessly scrolling through HackerNews links, scanning through blogs, and poring over preprints on arXiv.org. So here, I’d like to share a simple exercise that I go through when I’m looking for a new project.

This isn’t about persuading others to buy into your idea or justifying why your project is worthwhile. This is about you, a hard-working and curious soul, discovering topics that you genuinely find interesting and want to pursue—free of judgment, free of consequences.

As you make your decision, think about the factors that may influence your choice:

  • What skills you have
  • What skills you wish you had
  • How much time you have available
  • Where you are at in your career
  • What it will do for your career
  • Whether you will have a team to help
  • What impact you are looking to make

Where you are at and where you want to be

Start collecting and organizing information about yourself by making these five lists:

1) Your current skill set. Write down your strengths in areas in which you consider yourself knowledgeable or well read. List broad topic areas to open up the possibility of trying something completely new, or, if you want to steer your thinking toward a specific topic area, include only subcategories within that domain.

2) What you’re interested in. It’s really simple. What do you think is cool? Maybe you read an article or a blog you thought was clever. Maybe you admired someone’s conference presentation. For this, I prefer to categorize these interests according to how much exposure I’ve had. This way, I can see where I might need to set aside time to learn the basics before making real progress.

3) How long the project will be. This isn’t necessarily meant to be the final date on which you walk away from your work to go do something else. I see this as more of a timeline in which you can stop and ask yourself whether you’re happy continuing or whether you want to choose a different path.

4) How many hours per week you will work on it. This is meant for you to take a look at your current situation and realistically determine your level of dedication. How many hours per week do you see yourself focusing on your project, knowing your schedule, prior commitments, attention span, and ability to work without distractions?

5) Desired outcome. This is meant to tie everything together and ask yourself what it is you want to produce with your effort. The outcome may be subtle, like the satisfaction of learning something new, or ambitious, like publishing a book or writing a dissertation.

Arranging these lists side-by-side helps you see the big picture and discover the different pathways that may lead to a project. I did this for myself to demonstrate how it might look:

The topics in green are ones that I understand fairly well; I could work my way through an academic publication on these topics without much trouble. Those in yellow are ones that I’ve had some exposure to but would need to do some extra Googling and reading to understand some of the subtleties. Those in red are, for the most part, completely uncharted waters for me. They sound cool, but I would have no idea what I would be getting into.

Be resourceful—use mad libs

Reading this chart from left to right, I can begin to think about all the different possibilities.

Using my ____________ skills, I can learn more about ______________ in ___ months if I commit at least ___ hours per week to produce a ______________.

When you start to build statements from your lists, it should become clear what is and isn’t feasible and what you are and aren’t willing to commit to. Here are some examples:

Using my C++ skills, I can learn more about LLVM in 6 months if I commit at least 5 hours per week to produce a peer-reviewed publication.

Sounds nice, but a peer-reviewed publication might be a bit of a stretch. I’ve completed the LLVM Kaleidoscope Tutorial and written some analysis passes before, but I’ve never taken a compilers course, nor am I familiar with compiler and programming language research. So a blog post or pull request might be more attainable with a 6-month, 120-hour commitment. Also, an LLVM project could be good for my career.

Using my statistics and numerical analysis skills, I can learn more about open-source intelligence in 12 months if I commit at least 8 hours per week to produce a new open-source tool.

I’ve been really interested in Bellingcat’s work ever since I read about how they tracked the downing of flight MH17 over Ukraine to a Russian missile system. Really cool stuff. I think this project and commitment level are similar to a typical IRAD project. At that level of commitment, I would want it to have an impact on my career, so I would need to try and link the project to one of Trail of Bits’ core values. The next step is to narrow the search and see where today’s open-source intelligence tools fall short.

Using my natural language processing skills, I can learn more about topic modeling in 9 months if I commit at least 3 hours per week to produce a blog post.

Three hours per week sounds reasonable for something like this for a personal project. It doesn’t really align with my career goals, but it’s something I’ve read about for the past few years and want to know more about. There are elements of statistics, programming, NLP, and machine learning involved. How cool is that!

Apply today!

At Trail of Bits, I encourage my team to allocate 20% of their workweek to an IRAD project. But when discussing ideas, it’s common to hear people say that they don’t think their idea is good or novel enough, that it isn’t likely to succeed, or that it’s either too ambitious or not ambitious enough.

What makes this exercise for choosing a project so effective is that all the work is simply put into drawing a line from what you do know to what you want to know. Your commitment level is likely to be predetermined by whatever situation you’re in. And the final goal or outcome will be informed by the other four parameters. If done in earnest, this method should produce a whole line-up of possible projects you could find yourself enjoying. I hope you try it, and I hope you find it motivating.

Once again, I’d like to invite our readers to check out our Careers page for all of our current open positions. And if you’re interested in the Senior Software Engineer or Software Security Research Engineer opening, I look forward to hearing more about the IRAD projects you hope to work on at Trail of Bits!