Meet Algo, the VPN that works

I think you’ll agree when I say: there’s no VPN option on the market designed with equal emphasis on security and ease of use.

That changes now.

Today we’re introducing Algo, a self-hosted personal VPN server designed for ease of deployment and security. Algo automatically deploys an on-demand VPN service in the cloud that is not shared with other users, relies on only modern protocols and ciphers, and includes only the minimal software you need.

And it’s free.

For anyone who is privacy conscious, travels for work frequently, or can’t afford a dedicated IT department, this one’s for you.

Don’t bother with commercial VPNs

They’re crap.

Really, the paid-for services are just commercial honeypots. If an attacker can compromise a VPN provider, they can monitor a whole lot of sensitive data.

Paid-for VPNs tend to be insecure: they share keys, their weak cryptography gives a false sense of security, and they require you to trust their operators.

Even if you’re not doing anything wrong, you could be sharing the same endpoint with someone who is. In that case, your network traffic will be analyzed when law enforcement seizes that endpoint.

Streisand is no better

Good concept. Poor implementation.

It installs ~40 services, including numerous remote access services, a Tor relay node, and out-of-date software. It leaves you with dozens of keys to manage and it allows weak crypto.

That’s a hefty footprint and it’s too complicated for any reasonable person to secure. If you set up an individual server just for yourself, you’d never know if or when an attacker compromised it.

OpenVPN: Requires client software

OpenVPN’s lack of out-of-the-box client support on any major desktop or mobile operating system introduces unnecessary complexity. The user experience suffers.

Speaking of users, they’re required to update and maintain this software too. That is a recipe for disaster.

Worst of all, OpenVPN depends on the security of TLS, both the protocol and its implementations. Between that, and past security incidents, we simply trust it less.

Other VPNs’ S/WAN song

The original attempt at free VPN software -FreeS/WAN- died in the early 2000s when its dev team fractured. Three people forked it into LibreSwan, strongSwan and Openswan.

To use any of them today, you need something approaching tribal knowledge. The available documentation stymied and appalled us:

  • Little differentiation – If you search for information about strongSwan’s configuration, you could easily end up at a LibreSwan page. The terms will look familiar, but the instructions will be wrong.
  • Impenetrable language – Instead of using standard terms like ‘client, server, remote and local,’ they use ‘sun, moon, bob, carol,’ and a bunch of other arbitrary words.
  • Brittle methodology – The vast majority of documentation and guides insist on using ‘tried and true’ methods such as L2TP and IKEv1, even though IKEv2 is simpler and stronger. Since Apple added IKEv2 to iOS 8, there’s no reason not to use it.

Only the strongest S/WAN survived

After wading through the convoluted quagmire that is the S/WAN triplets, we settled on strongSwan.

Its documentation -such as it is- is the best of the bunch. It was rewritten recently from scratch to support IKEv2 (a positive step when supporting a major new protocol version). It’s the only IPSEC software that even offers the option for a trusted key store.

And the community is helpful. Special thanks to Thermi.

But it’s still super-complicated. Too many contributors made it very arcane. Again, you need that tribal knowledge to make IPSEC do what you want.

These are examples of why cryptography software has a well-earned reputation for poor usability. A tightly knit development community only communicating with itself tends to lead to a profusion of options that should be deprecated. There’s no sign that the user interface or experience has been reviewed on behalf of less-experienced users. For anyone bold enough to consider these points, here lies the path to widespread adoption.

So, we built Algo

Algo is a set of Ansible scripts that simplifies the setup of a personal IPSEC VPN. It contains the most secure defaults available, works with common cloud providers, and does not require client software on most devices.

The ‘VP of all Networks’ is strong, secure and tidy. It uses the least amount of software necessary to get the job done.

We made Algo with corporate travelers in mind. To save bandwidth and increase security, it blocks ads and compresses what’s left.

We shared an early version of Algo at Black Hat this year and people loved it.

Algo’s features:
  • Supports only IKEv2
  • Supports only a single cipher suite w/ AES-GCM, SHA2 HMAC, and P-256 DH
  • Generates mobileconfig profiles to auto-configure Apple devices
  • Provides helper scripts to add and remove users
  • Blocks ads with a local DNS resolver and HTTP proxy
  • Based on current versions of Ubuntu and strongSwan
  • Installs to DigitalOcean, Amazon, Google, Azure or your own server

Algo’s anti-features:
  • Does not support legacy cipher suites nor protocols like L2TP, IKEv1, or RSA
  • Does not install Tor, OpenVPN, or other risky servers
  • Does not depend on the security of TLS
  • Does not require client software on most platforms
  • Does not claim to provide anonymity or censorship avoidance
  • Does not claim to protect you from the FSB, MSS, DGSE, or FSM

Designed to be disposable

We wanted Algo to be easy to set up. That way, you start it when you need it, and tear it down before anyone can figure out the service you’re routing your traffic through.

Setup is automated. Just answer a few questions, and Algo will build your VPN for you.

We’ve automated the setup process for Apple devices, too. Algo just gives you a file that you AirDrop to your device. You press ‘install’ and you’ve got your VPN. Or ‘VPNs.’

You don’t have to choose just one VPN gateway. You could make yourself 20 on different services: DigitalOcean in Bangalore, EC2 in Virginia, or any other combination. You have your choice.

One last reason that Algo is such a good solution: it’s been abstracted as a set of Ansible roles that we released to the community. Ansible provides clearer documentation, ensures that we can repeat what it is that we’re doing, and allows us to monitor configuration drift.

Thanks to the roles we created in Ansible, it’s very easy for us to add and refine different features independently. Members of our team will keep up on feature requests.

We’ll make sure it’s right. You can just use it.

Try Algo today.

Want help installing Algo?

We’re planning a virtual crypto party for Friday, December 16th at 3pm EST where we’ll walk you through installing Algo on your own. Register to join us.

Shin GRR: Make Fuzzing Fast Again


We’ve mentioned GRR before – it’s our high-speed, full-system emulator used to fuzz program binaries. We developed GRR for DARPA’s Cyber Grand Challenge (CGC), and now we’re releasing it as an open-source project! Go check it out.

Fear GRR

Bugs aren’t afraid of slow fuzzers, and that’s why GRR was designed with unique and innovative features that make it scary fast.

  • GRR emulates x86 binaries within a 64-bit address space using dynamic binary translation (DBT). As a 64-bit program, GRR can use more hardware registers and memory than the original program. This enabled easy implementation of perfect isolation without complex register-rescheduling or memory remapping logic. The translated program never sees GRR coming.
  • GRR is fast. Existing DBTs re-translate the same program on every execution. They specialize in translating long-running programs, where the translation cost is amortized over time, and “hot” code is reorganized to improve performance. Fuzzing campaigns execute the same program over and over again, so all code is hot, and re-translating the same code is wasteful. GRR avoids re-translating code by caching it to disk, and it optimizes the cached code over the lifetime of the fuzzing campaign.
  • GRR eats JIT compilers and self-modifying code for breakfast. GRR translates one basic block at a time, and indexes the translated blocks in its cache using “version numbers.” A block’s version number is a Merkle hash of the contents of executable memory. Modifying the contents of an executable page in memory invalidates its hash, thereby triggering re-translation of its code when next executed. A sketch of this versioning idea appears after this list.
  • GRR is efficient. GRR uses program snapshotting to skip over irrelevant setup code that executes before the first input byte is ever read. This saves a lot of cycles in a fuzzing campaign with millions or billions of program executions. GRR also avoids kernel roundtrips by emulating system calls and performing all I/O within memory.
  • GRR is extensible. GRR supports pluggable input mutators, including Radamsa, and code coverage metrics, which allows you to tune GRR’s behavior to the program being fuzzed. In the CGC, we didn’t know ahead of time what binaries we would get. There is no one-size-fits-all way of measuring code coverage. GRR’s flexibility let us change how code coverage was measured over time. This made our fuzzer more resilient to different types of programs.
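
To make the “version number” idea from the third bullet concrete, here is a rough sketch in C++ (hypothetical names and hashing scheme; an illustration of the idea, not GRR’s actual implementation): translated blocks are keyed by both their starting address and a hash of the executable page they came from, so rewriting an executable page automatically orphans its stale translations.

#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <vector>

using PageHash = uint64_t;

// Hypothetical FNV-1a hash over one page of guest executable memory.
static PageHash HashExecutablePage(const uint8_t *page, size_t size) {
  PageHash hash = 1469598103934665603ULL;
  for (size_t i = 0; i < size; ++i) {
    hash ^= page[i];
    hash *= 1099511628211ULL;
  }
  return hash;
}

// A translated block is identified by its guest program counter *and* the
// version (hash) of the page it was lifted from.
struct BlockKey {
  uint64_t guest_pc;
  PageHash page_version;
  bool operator==(const BlockKey &other) const {
    return guest_pc == other.guest_pc && page_version == other.page_version;
  }
};

struct BlockKeyHasher {
  size_t operator()(const BlockKey &key) const {
    return std::hash<uint64_t>()(key.guest_pc) ^
           (std::hash<uint64_t>()(key.page_version) << 1);
  }
};

// If the guest modifies an executable page (e.g. a JIT emits new code), the
// page's hash changes, lookups with the new version miss, and the block is
// simply re-translated; stale entries are never executed again.
using TranslationCache =
    std::unordered_map<BlockKey, std::vector<uint8_t>, BlockKeyHasher>;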

Two fists in the air

Take a look at GRR demolishing this CGC challenge that has six binaries communicating over IPC. GRR detects the crash in the 3rd binary.

[Demo: GRR fuzzing CGC challenge NRFIN_00006]

This demo shows off two nifty features of GRR:

  1. GRR can print out the trace of system calls performed by the translated binary.
  2. GRR can print out the register state on entry to every basic block executed. Instruction-granularity register traces are available when the maximum basic block size is set to one instruction.

Dig deeper

I like to think of GRR as an excavator for bugs. It’s a lean, mean, bug-finding machine. It’s also now open-source, and permissively licensed. You should check it out and we welcome contributions.

Come Find Us at O’Reilly Security

We’re putting our money where our mouth is again. In continued support of New York’s growing infosec community, we’re excited to sponsor the upcoming O’Reilly Security Conference.

We expect to be an outlier there: we’re the only sponsor that offers consulting and custom engineering rather than just off-the-shelf products. We see this conference as an opportunity to learn more about the problems -and their root causes- that attendees face, and how we can help.

We’ve had tremendous success helping companies like Amazon and Facebook. But you don’t need to have Amazon- or Facebook-sized security problems to benefit from our tools and research. If you have difficult security challenges, we hope you’ll come speak with us.

Pick through our tools

We’re going to have a LIVE instance of our Cyber Reasoning System (CRS) at our booth. Recently, we used it to audit a much larger amount of code in less time, in greater detail, and at a lower cost than a human could. For granular details, come grab a copy of a CRS-driven security assessment we conducted on zlib for Mozilla.

Autonomous cyber defense happens to be the topic of the keynote by Michael Walker, the DARPA PM who ran the Cyber Grand Challenge. If you’re intrigued by what he says, swing by our booth to see the CRS in action.

The CRS is just one of a unique set of capabilities and proprietary tools that we’ve developed over the course of deep research engagements, some for DARPA. We’ll have other tools to share, such as:

  • Challenge Binaries, which make it possible to objectively compare different bug-finding tools, program-analysis tools, patching strategies and exploit mitigations.
  • Screen, which combines a set of LLVM passes that track branching behavior to help find side-channel vulnerabilities, and an associated web frontend that helps with identifying commits that introduce them.
  • ProtoFuzz, a generic fuzzer for Google’s Protocol Buffers format. Instead of defining a new fuzzer generator for custom binary formats, ProtoFuzz automatically creates a fuzzer based on the same format definition that programs use.
  • osquery for Windows, a port of Facebook’s open-source endpoint security tool. This allows you to treat your infrastructure as a database, turning operating system information into a format that can be queried using SQL-like statements. This functionality is invaluable for performing incident response, diagnosing systems operations problems, ensuring baseline security settings, and more.

That’s just the start. We’re prepared to discuss every tool that we’ve ever mentioned on this blog.

Bring us your problems.

If you’re coming to this conference with complex needs that don’t fit neatly into any one product category, come talk to us. Mark, Sophia, Yan, and Dan will be on hand to answer your questions. We’re especially keen to chat with you if you’re:

  • Building low-level software, say in C or C++.
  • Using crypto in new and interesting ways.
  • Affected by resourced threat actors, reverse engineers, or fraudsters.
  • Building your own hardware or firmware.
  • Stuck on an intractable security problem that has eluded resolution. The more difficult the better.

Come find us at booth #104 in the sponsor pavilion.

Break a leg, O’Reilly.

O’Reilly has put a lot of resources into their security conference. It fills a gap. We hope that it turns out well, and that they plan more events just like it in New York. See you there!

Let’s talk about CFI: clang edition

Our previous blog posts often mentioned control flow integrity, or CFI, but we have never explained what CFI is, how to use it, or why you should care. It’s time to remedy the situation! In this blog post, we’ll explain, at a high level, what CFI is, what it does, what it doesn’t do, and how to use CFI in your projects. The examples in this blog post are clang-specific, and have been tested on clang 3.9, the latest release as of October 2016.

This post is going to be long, so if you already know what CFI is and simply want to use it in your clang-compiled project, here’s the summary:

  • Ensure you are using a link-time optimization capable linker (like GNU gold or MacOS ld).
  • Add -flto to your build and linker flags
  • Add -fvisibility=hidden and -fsanitize=cfi to your build flags
  • Sleep happier knowing your binary is more protected against binary level exploitation.

For an example of using CFI in your project, please take a look at the Makefile that comes with our CFI samples.

What is CFI?

Control flow integrity (CFI) is an exploit mitigation, like stack cookies, DEP, and ASLR. Like other exploit mitigations, the goal of CFI is to prevent bugs from turning into exploits. Bugs in a program, like buffer overflows, type confusion, or integer overflows, may allow an attacker to change the code a program executes, or to execute parts of the program out of order. To convert these bugs to exploits, an attacker must force the target program to follow a code path the programmer never intended. CFI works by reducing the attacker’s ability to do that. The easiest way to understand CFI is that it aims to enforce at run-time what the programmer intended at compile time.

Another way to understand CFI is via graphs. A program’s control flow may be represented as a graph, called the control flow graph (CFG). The CFG is a directed graph where each node is a basic block of the program, and each directed edge is a possible control flow transfer. CFI ensures the CFG determined by the compiler at compile time is followed by the program at run time, even in the presence of vulnerabilities that would otherwise allow an attacker to alter control flow.

There are more technical details, such as forward-edge and backward-edge CFI, but these are best absorbed from the numerous academic papers published on control flow integrity.

History of CFI

The original paper on CFI from Microsoft Research was released in 2005, and since then there have been numerous improvements to the performance and functionality of various CFI schemes. Continued improvements mean that now CFI is mainstream: recent versions of both the clang compiler and Microsoft Visual Studio include some form of CFI.

Clang’s CFI

In this blog post, we will look at the various options provided by clang’s CFI implementation, what each does and does not protect, and how to use it in your projects. We will not cover technical implementation details or performance numbers; a thorough technical explanation is already available from the implementation team in their paper.

Control flow integrity support has been in mainline clang since version 3.7, invoked as a part of the supported sanitizers suite. To operate, CFI requires the full control flow graph of a program. Since programs are typically built from multiple compilation units, the full control flow is not available until link time. To enable CFI, clang requires a linker capable of link-time optimization. Our code examples assume a Linux environment, so we will be using the GNU gold linker. Both GNU gold and recent versions of clang are available as packages for common Linux distributions. GNU gold is already included in modern binutils packages; Clang 3.9 packages for various Linux distributions are available from the LLVM package repository.

Some of the CFI options in clang actually have nothing to do with control flow. Instead these options detect invalid casts or other similar violations before they turn into worse bugs. These options are spiritually similar to CFI, however, because they ensure “abstraction integrity” — that is, what the programmer intended to happen is what happens at runtime.

Using CFI in Clang

The clang CFI documentation leaves a lot to be desired. We are going to describe what each option does, what limitations it has, and example scenarios where using it would prevent exploitation. These directions assume clang 3.9 and an LTO capable linker are installed and working. Once installed, both the linker and clang 3.9 should “just work”; specific installation instructions are beyond the scope of this blog post.

Several new compilation and linking flags are needed for your project: -flto to enable link-time optimization, -fsanitize=cfi to enable all CFI checks, and -fvisibility=hidden to set default LTO visibility. For debug builds, you will also want to add -fno-sanitize-trap=all to see descriptive error messages when a CFI violation is detected. For release builds, omit this flag.

To review, your debug command line should now look like:

clang-3.9 -fvisibility=hidden -flto -fno-sanitize-trap=all -fsanitize=cfi -o [output] [input]

And your release command line should look like:

clang-3.9 -fvisibility=hidden -flto -fsanitize=cfi -o [output] [input]

You most likely want to enable every CFI check, but if you want to only enable select checks (each is described in the next section), specify them via -fsanitize=[option] in your flags.

CFI Examples

We have created samples with specially crafted bugs to test each CFI option. All of the samples are designed to compile cleanly with the absolute maximum warning levels* (-Weverything). The bugs that these examples have are not statically identified by the compiler, but are detected at runtime via CFI. Where possible, we simulate potential malicious behavior that occurs without CFI protections.

Each example builds two binaries, one with CFI protection (e.g. cfi_icall) and one without CFI protections (e.g. no_cfi_icall). These binaries are built from the same source, and used to illustrate the difference CFI protection makes.

We have provided the following examples:

  • cfi_icall demonstrates control flow integrity of indirect calls. The example binary accepts a single command line argument (valid values are 0-3, but try invalid values with both binaries!). The command line argument shows different aspects of indirect call CFI protection, or lack thereof.
  • cfi_vcall shows an example of CFI applied to virtual function calls. This example demonstrates how CFI would protect against a type confusion or similar attack.
  • cfi_nvcall shows clang’s protections for calling non-virtual member functions via something that is not an object that has those functions defined.
  • cfi_unrelated_cast shows how clang can prevent casts between objects of unrelated types.
  • cfi_derived_cast expands on cfi_unrelated_cast and shows how clang can prevent casts from an object of a base class to an object of a derived class, if the object is not actually of the derived class.
  • cfi_cast_strict showcases the very specific instance where the default level of base-to-derived cast protection, like in cfi_derived_cast, would not catch an illegal cast.

* Ok, we lied, we had to disable two warnings, one about C++98 compatibility, and one about virtual functions being defined inline. The point is still valid since those warnings do not relate to potential bugs.

CFI Option: -fsanitize=cfi

This option enables all CFI checks. Use this option! The various CFI protections will only be inserted where needed; you aren’t saving anything by not using this option and picking specific protections. So if you want to enable CFI, use -fsanitize=cfi.

The currently implemented CFI checks, as of clang 3.9, are described in more detail in the following sections.

CFI Option: -fsanitize=cfi-icall

The cfi-icall option is the most straightforward form of CFI. At each indirect call site, such as calls through a function pointer, an extra check verifies two conditions:

  1. The address being called is a valid destination, like the start of a function.
  2. The destination’s static function signature matches the signature determined at compile time.

When would these conditions be violated? When exploiting memory corruption bugs! Attackers want to hijack the program’s control flow to perform their bidding. These days, anti-exploitation protections are good enough to force attackers to reuse pieces of the existing program. This code re-use technique is called return-oriented programming (ROP), and the pieces are referred to as gadgets. Gadgets are almost never whole functions, but snippets of machine code adjacent to a control flow transfer instruction. The important aspect is that these gadgets are not at the start of a function; an attacker attempting to start ROP execution will fail CFI checks.

Attackers may be clever enough to point the overwritten function pointer to a valid function. For instance, think of what would happen if a call to write was changed to a call to system. The second condition mitigates these attacks by ensuring that the runtime destination’s type signature matches the signature expected at the call site. Both of these condition violations are illustrated in options 2 and 3 of the cfi_icall example.
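
As a rough illustration (hypothetical functions, not the bundled cfi_icall sample), the following shows the kind of signature-mismatched indirect call that -fsanitize=cfi-icall rejects at runtime:

#include <cstdio>

static int add_one(int x) { return x + 1; }           // matches int (int)
static int add_pair(int x, int y) { return x + y; }   // different signature

int main() {
  int (*handler)(int) = add_one;

  // Simulate a memory corruption bug (or type-punning mistake) replacing the
  // pointer with a function whose signature does not match int (int).
  // Calling through it is undefined behavior; cfi-icall aborts at the call.
  handler = reinterpret_cast<int (*)(int)>(&add_pair);

  std::printf("%d\n", handler(41));  // traps under CFI, "works" without it
  return 0;
}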

Example Output

 $ ./no_cfi_icall 2
 Calling a function:
 CFI should protect transfer to here
 In float_arg: (0.000000)
 $ ./cfi_icall 2
 Calling a function:
 cfi_icall.c:83:12: runtime error: control flow integrity check for type 'int (int)' failed during indirect function call (cfi_icall+0x424610): note: (unknown) defined here
 $ ./no_cfi_icall 3
 Calling a function:
 CFI ensures control flow only transfers to potentially valid destinations
 In not_entry_point: (2)
 $ ./cfi_icall 3
 Calling a function:
 cfi_icall.c:83:12: runtime error: control flow integrity check for type 'int (int)' failed during indirect function call
 (cfi_icall+0x424730): note: (unknown) defined here

Limitations

  • Indirect call protection doesn’t work across shared library boundaries; indirect calls into shared libraries are not protected.
  • All translation units have to be compiled with -fsanitize=cfi-icall.
  • Only works on x86 and x86_64 architectures
  • Indirect call protection does not detect calls to the same function signature. Think of changing a call from delete_user(const char *username) to make_admin(const char *username). We show this limitation in cfi_icall option 1:
     $ ./cfi_icall 1
     Calling a function:
     CFI will not protect transfer to here
     In bad_int_arg: (1)

CFI Option: -fsanitize=cfi-vcall

To explain cfi-vcall, we need a quick review of virtual functions. Recall that virtual functions are functions that can be specialized in derived classes. Virtual functions are dynamically bound: the actual function called is determined at runtime, depending on the object’s type. Due to dynamic binding, all virtual calls are indirect calls. But these indirect calls may legitimately call functions with different signatures, since the class name is a part of the function signature. The cfi-vcall protection addresses this gap by verifying that a virtual function call destination is always a function in the class hierarchy of the source object.

So when would a bug like this ever occur? The classic example is type confusion bugs in complex C++-based software like PDF readers, script interpreters, and web browsers. In type confusion, an object is re-interpreted as an object of a different type. The attacker can then use this mismatch to redirect virtual function calls to attacker controlled locations. A simulated example of such a scenario is in the cfi_vcall example.
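
A minimal sketch of the kind of type confusion that cfi-vcall stops (hypothetical classes, not the bundled cfi_vcall sample):

#include <cstdio>

struct Document {
  virtual void print() { std::puts("Document::print"); }
  virtual ~Document() = default;
};

struct Evil {
  virtual void makeAdmin() { std::puts("Evil::makeAdmin"); }
  virtual ~Evil() = default;
};

int main() {
  Evil evil;

  // A type confusion bug treats the Evil object as a Document. The virtual
  // call below dispatches through Evil's vtable; -fsanitize=cfi-vcall checks
  // that the object's vtable belongs to Document's class hierarchy and aborts.
  Document *doc = reinterpret_cast<Document *>(&evil);
  doc->print();  // without CFI, this actually runs Evil::makeAdmin
  return 0;
}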

Example Output

 $ ./no_cfi_vcall
 Derived::printMe
 CFI Prevents this control flow
 Evil::makeAdmin
 $ ./cfi_vcall
 Derived::printMe
 cfi_vcall.cpp:45:5: runtime error: control flow integrity check for type 'Derived' failed during virtual call (vtable address 0x00000042eb20)
 0x00000042eb20: note: vtable is of type 'Evil'
 00 00 00 00 c0 6f 42 00 00 00 00 00 d0 6f 42 00 00 00 00 00 00 70 42 00 00 00 00 00 00 00 00 00

Limitations

  • Only applies to C++ code that uses virtual functions.
  • All translation units have to be compiled with -fsanitize=cfi-vcall.
  • There can be a noticeable increase in the output binary size.
  • Need to specify the -fvisibility flag when building (for most purposes use -fvisibility=hidden)

CFI Option: -fsanitize=cfi-nvcall

The cfi-nvcall option is spiritually similar to the cfi-vcall option, except it works on non-virtual calls. The key difference is that non-virtual calls are direct calls known statically at compile time, so this protection is not strictly a control flow integrity issue. What the cfi-nvcall option does is identify non-virtual calls and ensure the calling object’s type at runtime can be derived from the type of the object known at compile time.

In simple terms, imagine a class hierarchy of Balls and a class hierarchy of Bricks. With cfi-nvcall, a compile-time call to Ball::Throw may execute Baseball::Throw, but will never execute Brick::Throw, even if an attacker substitutes a Brick object for a Ball object.

Situations fixed by cfi-nvcall may arise from memory corruption, type confusion, and deserialization. While these instances do not allow an attacker to redirect control flow on their own, these bugs may result in data-only attacks, or enable enough misdeeds to permit future bugs to work. This type of attack using data-only bugs is shown in the cfi_nvcall example: a low privilege user object is used in place of a high privilege administrator object, leading to in-application privilege escalation.
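
In rough terms (hypothetical classes, not the bundled cfi_nvcall sample), the protected pattern looks like this:

#include <cstdio>

struct Account {
  virtual ~Account() = default;   // polymorphic, so cfi-nvcall applies
};

struct UserAccount : Account {};

struct AdminAccount : Account {
  void doAdminWork() { std::puts("doing admin work"); }  // non-virtual
};

int main() {
  Account *account = new UserAccount();

  // A buggy (or attacker-influenced) downcast treats the user account as an
  // administrator. The call below is a statically resolved, non-virtual call,
  // so ordinary indirect-call CFI never sees it; -fsanitize=cfi-nvcall checks
  // the object's runtime type at the call and aborts.
  static_cast<AdminAccount *>(account)->doAdminWork();

  delete account;
  return 0;
}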

Example Output

 $ ./no_cfi_nvcall
 Admin check:
 Account name is: admin
 Would do admin work in context of: admin
 User check:
 Account name is: user
 Admin Work not permitted for a user account!
 Account name is: user
 CFI Should prevent the actions below:
 Would do admin work in context of: user
 $ ./cfi_nvcall
 Admin check:
 Account name is: admin
 Would do admin work in context of: admin
 User check:
 Account name is: user
 Admin Work not permitted for a user account!
 Account name is: user
 CFI Should prevent the actions below:
 cfi_nvcall.cpp:54:5: runtime error: control flow integrity check for type 'AdminAccount' failed during non-virtual call (vtable address 0x00000042f300)
 0x00000042f300: note: vtable is of type 'UserAccount'
 00 00 00 00 80 77 42 00 00 00 00 00 a0 77 42 00 00 00 00 00 90 d4 f0 00 00 00 00 00 41 f3 42 00

Limitations

  • The cfi-nvcall checks only apply to polymorphic objects.
  • All translation units have to be compiled with -fsanitize=cfi-nvcall.
  • Need to specify the -fvisibility flag when building (for most purposes use -fvisibility=hidden)

CFI Option: -fsanitize=cfi-unrelated-cast

This is the first of three cast related options that are grouped with control flow integrity protections, but have nothing to do with control flow. These cast options verify “abstraction integrity”. Using these cast checks guards against insidious C++ bugs that may eventually lead to control flow hijacking.

The cfi-unrelated-cast option performs two runtime checks. First, it verifies that casts between object types stay within the same class hierarchy. Think of this as permitting casts from a variable of type Ball* to Baseball* but not from a variable of type Ball* to Brick*. The second runtime check verifies that casts from void* to an object type refer to objects of that type. Think of this as ensuring that a variable of type void* that points to a Ball object can only be converted back to Ball, and not to a Brick.

This property is most effectively verified at runtime, because the compiler is forced to treat all casts from void* to another type as legal. The cfi-unrelated-cast option ensures that such casts make sense in the runtime context of the program.

When would this violation ever happen? A common use of void* pointers is to pass references to objects between different parts of a program. The classic example is the arg argument to pthread_create. The target function has no way to determine if the void* argument is of the correct type. Similar situations happen in complex applications, especially in those that use IPC, queues, or other cross-component messaging. The cfi_unrelated_cast example shows a sample scenario that is protected by the cfi-unrelated-cast option.
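
A minimal sketch of the void* scenario described above (hypothetical types, not the bundled cfi_unrelated_cast sample):

#include <cstdio>

struct Ball {
  virtual void bounce() { std::puts("boing"); }
  virtual ~Ball() = default;
};

struct Brick {
  virtual void stack() { std::puts("thud"); }
  virtual ~Brick() = default;
};

// A callback-style interface that erases the argument's type, much like the
// arg parameter of pthread_create.
static void handle_ball(void *arg) {
  // The callee assumes it was handed a Ball. If the caller actually passed a
  // Brick, -fsanitize=cfi-unrelated-cast aborts at this cast.
  Ball *ball = static_cast<Ball *>(arg);
  ball->bounce();
}

int main() {
  Brick brick;
  handle_ball(&brick);  // the wrong object type smuggled through void*
  return 0;
}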

Example Output

 $ ./no_cfi_unrelated_cast
 I am in fooStuff
 And I would execute: system("/bin/sh")
 $ ./cfi_unrelated_cast
 cfi_unrelated_cast.cpp:55:19: runtime error: control flow integrity check for type 'Foo' failed during cast to unrelated type (vtable address 0x00000042ec40)
 0x00000042ec40: note: vtable is of type 'Bar'
 00 00 00 00 70 71 42 00 00 00 00 00 a0 71 42 00 00 00 00 00 00 00 00 00 00 00 00 00 88 ec 42 00

Limitations

  • All translation units must be compiled with cfi-unrelated-cast
  • Need to specify the -fvisibility flag when building (for most purposes use -fvisibility=hidden)
  • Some functions (e.g. allocators) legitimately allocate memory of one type and then cast it to a different, unrelated object. These functions can be blacklisted from protection.

CFI Option: -fsanitize=cfi-derived-cast

This is the second of three cast-related “abstraction integrity” options. The cfi-derived-cast option ensures that an object of a base class cannot be cast to an object of a derived class unless the object actually is of the derived class. As an example, cfi-derived-cast will prevent a variable of type Ball* from being cast to Baseball* when the underlying object is not really a Baseball. This is a stronger guarantee than cfi-unrelated-cast, which only verifies that the destination type is in the same class hierarchy as the source.

The potential causes of this issue are the same as most other issues on this list, namely memory corruption, de-serialization issues, and type confusion. In the cfi_derived_cast example, we show how a hypothetical base-to-derived casting bug can be used to disclose memory contents.
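
A rough sketch of the bug class (hypothetical classes, not the bundled cfi_derived_cast sample):

#include <cstdio>

struct Base {
  virtual ~Base() = default;
  int base_member = 1;
};

struct Derived : Base {
  virtual void extra() {}  // Derived adds a virtual, so the default
                           // derived-cast check applies here
  int secret = 42;
};

static void show(Base *obj) {
  // Buggy code assumes every Base it receives is really a Derived. When the
  // object is only a Base, the read below runs past the end of the object;
  // -fsanitize=cfi-derived-cast aborts at the static_cast instead.
  Derived *d = static_cast<Derived *>(obj);
  std::printf("%d\n", d->secret);
}

int main() {
  Base base;
  show(&base);  // a base-to-derived cast of an object that is not a Derived
  return 0;
}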

Example Output

 $ ./no_cfi_derived_cast
 I am: derived class, my member variable is: 12345678
 I am: base class, my member variable is: 7fffb6ca1ec8
 $ ./cfi_derived_cast
 I am: derived class, my member variable is: 12345678
 cfi_derived_cast.cpp:32:21: runtime error: control flow integrity check for type 'Derived' failed during base-to-derived cast (vtable address 0x00000042ef80)
 0x00000042ef80: note: vtable is of type 'Base'
 00 00 00 00 00 73 42 00 00 00 00 00 30 73 42 00 00 00 00 00 00 00 00 00 00 00 00 00 b0 ef 42 00

Limitations

  • All translation units must be compiled with cfi-derived-cast
  • Need to specify the -fvisibility flag when building (for most purposes use -fvisibility=hidden)

CFI Option: -fsanitize=cfi-cast-strict

This third and most confusing of the cast-related “abstraction integrity” options is a stricter version of cfi-derived-cast. The cfi-derived-cast check is not applied when a derived class meets a very specific set of requirements:

  • It has only a single base class.
  • It does not introduce any virtual functions.
  • It does not override any virtual functions, other than an implicit virtual destructor.

If all of the above conditions are met, the base class and the derived class have an identical in-memory layout, and casting from the base class to the derived class should not introduce any security vulnerabilities. Performing such a cast is undefined and should never be done, but apparently enough projects utilize this undefined behavior to warrant a separate CFI option. The cfi_cast_strict example shows this behavior in action.
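
A rough sketch of the layout-identical case (hypothetical classes, not the bundled cfi_cast_strict sample):

#include <cstdio>

struct Base {
  virtual void func() { std::puts("Base::func"); }
  virtual ~Base() = default;
};

// Derived has a single base, introduces no virtual functions, and overrides
// nothing beyond the implicit destructor, so it shares Base's in-memory
// layout. The default cfi-derived-cast check skips this case;
// -fsanitize=cfi-cast-strict still flags it.
struct Derived : Base {};

int main() {
  Base base;
  Derived *d = static_cast<Derived *>(&base);  // undefined behavior
  d->func();
  return 0;
}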

Example Output

 $ ./no_cfi_cast_strict
 Base: func
 $ ./cfi_cast_strict
 cfi_cast_strict.cpp:22:18: runtime error: control flow integrity check for type 'Derived' failed during base-to-derived cast (vtable address 0x00000042e790)
 0x00000042e790: note: vtable is of type 'Base'
 00 00 00 00 10 6d 42 00 00 00 00 00 20 6d 42 00 00 00 00 00 50 6d 42 00 00 00 00 00 90 c3 f0 00

Limitations

  • All translation units must be compiled with cfi-cast-strict
  • Need to specify the -fvisibility flag when building (for most purposes use -fvisibility=hidden)
  • May break projects that rely on this undefined behavior.

Conclusion

Control flow integrity is an important exploit mitigation, and should be used whenever possible. Modern compilers such as clang already have support for control flow integrity, and you can use it today. In this blog post we described how to use CFI with clang, example scenarios where CFI prevents exploitation and otherwise detects subtle bugs, and discussed some limitations of CFI protections.

Now that you’ve read about what clang’s CFI does, try out the examples and see how CFI can benefit your software development process.

But clang isn’t the only compiler to implement CFI! Microsoft Research originated CFI, and CFI protections are available in Visual Studio 2015. In our next installment, we are going to discuss Visual Studio’s control flow integrity implementation: Control Flow Guard.

Automated Code Audit’s First Customer

Last month our Cyber Reasoning System (CRS) -developed for DARPA’s Cyber Grand Challenge- audited a much larger amount of code in less time, in greater detail, and at a lower cost than a human could.

Our CRS audited zlib for the Mozilla Secure Open Source (SOS) Fund. To our knowledge, this is the first instance of a paid, automated security audit by a CRS.

This represents a shift in the way that software security audits can be performed. It’s a tremendous step toward securing the Internet’s core infrastructure.

Choice where there once was none

Every year, public, private, and not-for-profit organizations spend tens of thousands of dollars on code audits.

Over a typical two-week engagement, security professionals charge a tidy fee to perform an audit. Their effectiveness will be limited by the sheer volume of the code, the documentation and organization of the code, and the inherent limitations of humans — getting tired, dreaming of vacations, etc.

You can only analyze complex C code effectively for so many hours a day.

Furthermore, a human assessor might have great experience in some subset of possible flaws or the C language, but complete or nearly complete knowledge is hard to come by. We’re talking about expertise acquired over 15 years or more. That level of knowledge isn’t affordable for non-profits, nor is it common in 1-2 week assessments.

It makes more sense for a piece of software to conduct the audit instead. Software doesn’t get tired. It can audit old, obfuscated code as easily as modern, well-commented code. And software can automatically re-audit code after every update to make sure fixes are correct and don’t introduce new errors.

Mozilla’s SOS

In August, as a part of their Secure Open Source (SOS) Fund, Mozilla engaged us to perform a security assessment of zlib, an open source compression library. Zlib is used in virtually every software package that requires compression or decompression. More than one piece of software you are using to read this very text bundles zlib.

It has a relatively small code base, but in that small size hides a lot of complexity. First, the code that runs on the machine may not exactly match the source, due to compiler optimizations. Some bugs may only occur occasionally due to use of undefined behavior. Others may only be triggered under extremely exceptional conditions. In a well-inspected code base such as zlib, the only bugs left might be too subtle for a human to find during a typical engagement.

To find such especially subtle bugs through a human-powered audit, Mozilla would have had to spend many thousands of dollars more. But they’re a non-profit, and they have an array of other projects to audit and improve.

Great coverage at a great price

Automation made the engagement affordable for Mozilla, and viable for us. They paid 20% of what we normally have to charge for this kind of work.

Our automated assessment paired the Trail of Bits CRS with TrustInSoft’s verification software to identify memory corruption vulnerabilities, create inputs that stress varying program paths, and to identify code that may lead to bugs in the future.

For non-profits working to secure core infrastructure of the Internet, this is a wonderful opportunity to get a detailed assessment with great coverage for a fraction of the traditional cost.

Contact us for more information.

Windows network security now easier with osquery

Today, Facebook announced the successful completion of our work: osquery for Windows.

“Today, we’re excited to announce the availability of an osquery developer kit for Windows so security teams can build customized solutions for their Windows networks… This port of osquery to Windows gives you the ability to unify endpoint defense and participate in an active open source community ready to share experiences and stories.”
Introducing osquery for Windows

osquery for Windows works with Doorman

The Windows version of osquery can talk to existing osquery fleet management tools, such as doorman. osquery for Windows has full support for TLS remote endpoints and certificate validation, just like the Unix version. In this screenshot, we are using an existing doorman instance to find all running processes on a Windows machine.

How we ported osquery to Windows

This port presented several technical challenges, which we always enjoy. Some of the problems were general POSIX to Windows porting issues, while others were unique to osquery.

Let’s start with the obvious POSIX to Windows differences:

  • Paths are different — no more ‘/’ as the path separator.
  • There are no signals.
  • Unix domain sockets are now named pipes.
  • There’s no glob() — we had to approximate the functionality.
  • Windows doesn’t fork() — the process model is fundamentally different. osquery forks worker processes. We worked around this by abstracting the worker process functionality.
  • There’s no more simple integer uid or gid values — instead you have SIDs, ACLs and DACLs.
  • And you can forget about the octal file permissions model — or use the approximation we created.

Then, the less-obvious problems: osquery is a daemon. In Windows, daemons are services, which expect a special interface and are launched by the service control manager. We added service functionality to osquery, and provided a script to register and remove the service. The parent-child process relationship is different — there is no getppid() equivalent, but osquery worker processes needed to know if their parent stopped working, or if a shutdown event was triggered in the parent process.

Deeper still, we found some unexpected challenges:

  • Some code that builds on clang/gcc just won’t build on Visual Studio.
  • Certain function attributes like __constructor__() have no supported Visual Studio equivalent. The functionality had to be re-created.
  • Certain standard library functions have implementation defined behavior — for instance, fopen will open a directory for reading on Unix-based systems, but will fail on Windows.

Along the way, we also had to ensure that every library that osquery depends on worked on Windows, too. This required fixing some bugs and making substitutions, like using linenoise-ng instead of GNU readline. There were still additional complexities: the build system had to accommodate a new OS, use Windows libraries, paths, compiler options, appropriate C runtime, etc.

This was just the effort to get the osquery core running. The osquery tables – the code that retrieves information from the local machine – present their own unique challenges. For instance, the processes table needed to be re-implemented on Windows. This table retrieves information about processes currently running on the system. It is a requirement for the osquery daemon to function. To implement this table, we created a generic abstraction to the Windows Management Instrumentation (WMI), and used existing WMI functionality to retrieve the list of running processes. We hope that this approach will support the creation of many more tables to tap into the vast wealth of system instrumentation data that WMI offers.
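
osquery’s actual table code lives in the osquery repository; as a rough illustration of the underlying Windows API (our sketch, not osquery’s implementation), here is a minimal WMI query that lists running processes. Most error checking is omitted for brevity.

#include <windows.h>
#include <comdef.h>
#include <Wbemidl.h>
#include <iostream>

#pragma comment(lib, "wbemuuid.lib")
#pragma comment(lib, "ole32.lib")
#pragma comment(lib, "oleaut32.lib")

int main() {
  // Initialize COM and default security for this process.
  CoInitializeEx(nullptr, COINIT_MULTITHREADED);
  CoInitializeSecurity(nullptr, -1, nullptr, nullptr,
                       RPC_C_AUTHN_LEVEL_DEFAULT, RPC_C_IMP_LEVEL_IMPERSONATE,
                       nullptr, EOAC_NONE, nullptr);

  // Connect to the local WMI namespace.
  IWbemLocator *locator = nullptr;
  CoCreateInstance(CLSID_WbemLocator, nullptr, CLSCTX_INPROC_SERVER,
                   IID_IWbemLocator, reinterpret_cast<void **>(&locator));
  IWbemServices *services = nullptr;
  locator->ConnectServer(_bstr_t(L"ROOT\\CIMV2"), nullptr, nullptr, nullptr,
                         0, nullptr, nullptr, &services);
  CoSetProxyBlanket(services, RPC_C_AUTHN_WINNT, RPC_C_AUTHZ_NONE, nullptr,
                    RPC_C_AUTHN_LEVEL_CALL, RPC_C_IMP_LEVEL_IMPERSONATE,
                    nullptr, EOAC_NONE);

  // Ask WMI for the running processes, roughly what a processes table needs.
  IEnumWbemClassObject *results = nullptr;
  services->ExecQuery(_bstr_t("WQL"),
                      _bstr_t("SELECT Name, ProcessId FROM Win32_Process"),
                      WBEM_FLAG_FORWARD_ONLY | WBEM_FLAG_RETURN_IMMEDIATELY,
                      nullptr, &results);

  IWbemClassObject *row = nullptr;
  ULONG returned = 0;
  while (results->Next(WBEM_INFINITE, 1, &row, &returned) == S_OK) {
    VARIANT name, pid;
    row->Get(L"Name", 0, &name, nullptr, nullptr);
    row->Get(L"ProcessId", 0, &pid, nullptr, nullptr);
    std::wcout << name.bstrVal << L" (pid " << pid.uintVal << L")\n";
    VariantClear(&name);
    VariantClear(&pid);
    row->Release();
  }

  results->Release();
  services->Release();
  locator->Release();
  CoUninitialize();
  return 0;
}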

osqueryi works on Windows too!

osqueryi, the interactive osquery shell, also works on Windows. In this screenshot we are using osquery to query the list of running processes and the cryptographic hash of a file.

The port was worth the effort

Facebook sparked a lot of excitement when it released osquery in 2014.

The open source endpoint security tool allows an organization to treat its infrastructure as a database, turning operating system information into a format that can be queried using SQL-like statements. This functionality is invaluable for performing incident response, diagnosing systems operations problems, ensuring baseline security settings, and more.

It fundamentally changed security for environments running Linux distributions such as Ubuntu or CentOS, or for deployments of Mac OS X machines.

But if you were running a Windows environment, you were out of luck.

To gather similar information, you’d have to cobble together a manual solution, or pay for a commercial product, which would be expensive, force vendor reliance, and lock your organization into using a proprietary -and potentially buggy- agent. Since most of these services are cloud-based, you’d also risk exposing potentially sensitive data.

Today, that’s no longer the case.

Disruption for the endpoint security market?

Because osquery runs on all three major desktop/server platforms, the open-source community can supplant proprietary, closed, commercial security and monitoring systems with free, community-supported alternatives. (Just one more example of how Facebook’s security team accounts for broader business challenges.)

We’re excited about the potential:

  • Since osquery is cross platform, network administrators will be able to monitor complex operating system states across their entire infrastructure. For those already running an osquery deployment, they’ll be able to seamlessly integrate their Windows machines, allowing for far greater efficiency in their work.
  • We envision startups launching without the need to develop agents that collect this rich set of data first, as Kolide.co has already done. We’re excited to see what’s built from here.
  • More vulnerable organizations -groups that can’t afford the ‘Apple premium,’ or don’t use Linux- will be able to secure their systems to a degree that wasn’t possible before.

Get started with osquery

osquery for Windows is only distributed via source code. You must build your own osquery. To do that, please see the official Building osquery for Windows guide.

Currently osquery will only build on Windows 10, the sole prerequisite. All other dependencies and build tools will be automatically installed as a part of the provisioning and building process.

There is an open issue to create an osquery chocolatey package, to allow for simple package management-style installation of osquery.

If you want our help modifying osquery’s code base for your organization, contact us.

Learn more about porting applications to Windows

We will be writing about the techniques we applied to port osquery to Windows soon. Follow us on Twitter and subscribe to our blog with your favorite RSS reader for more updates.

Plug into New York’s Infosec Community

Between the city’s size and the wide spectrum of the security industry, it’s easy to feel lost. Where are ‘your people?’ How can you find talks that interest you? You want to spend your time meeting and networking, not researching your options.

So, we put together a directory of all of the infosec gatherings, companies, and university programs in NYC that we know of at nyc-infosec.com.

Why’d we create this site?

We’re better than this. Today, when investors think ‘east coast infosec,’ they think ‘Boston.’ We believe that NYC’s infosec community deserves more recognition on the national and international stages. That will come as we engage with one another to generate more interesting work.

We need breaks from routine. It’s easy to stay uptown or downtown, or only go to forensics or software security meetings. If you don’t know what’s out there, you don’t know what you’re missing out on.

We all benefit from new ideas. That’s why we started Empire Hacking. We want to help more people learn about topics that excite and inspire action.

We want to coax academics off campus. A lot of exciting research takes place in this city. We want researchers to find the groups that will be the most interested in their work. Conversely, industry professionals have much to learn from emerging academic innovations and we hope to bring them together.

Check out a new group this month

Find infosec events, companies, and universities in the city on nyc-infosec.com. If you’re not sure where to start, we recommend:

Empire Hacking (new website!)
Information security professionals gather at this semi-monthly meetup to discuss pragmatic security research and new discoveries in attack and defense over drinks and light food.

New York Enterprise Information Security Group
Don’t be fooled by the word ‘enterprise.’ This is a great place for innovative start-ups to get their ideas in front of prospective early clients. David Raviv has created a great space to connect directly with technical people working at smart, young companies.

SummerCon
Ah, SummerCon. High-quality, entertaining talks. Inexpensive tickets. Bountiful booze. Somehow, they manage to pull together an excellent line-up of speakers each year. This attracts a great crowd, ranging from “hackers to feds to convicted felons and concerned parents.”

O’Reilly Security
Until now, New York didn’t really have one technical, pragmatic, technology-focused security conference. This newcomer has the potential to fill that gap. It looks like O’Reilly has put a lot of resources behind it. If it turns out well for them (fingers crossed), we hope that they’ll plan more events just like it.

What’d we miss?

If you know of an event that should be on the list, please let us know on the Empire Hacking Slack.

Work For Us: Fall and Winter Internship Opportunities

If you’re studying in a degree program, and you thrive at the intersection of software development and cyber security, you should apply to our fall or winter internship programs. It’s a great way to add paid experience -and a publication- to your resume, and get a taste of what it’s like to work in a commercial infosec setting.

You’d work remotely through the fall semester or over winter break on a meaningful problem to produce or improve tools that we -Trail of Bits and the InfoSec community- need to make security better. Your work won’t culminate in a flash-in-the-pan report for an isolated problem. It will contribute to a measurable impact on modern security problems.

Two Ex-Interns Share Their Experiences

  • Sophia D’Antoine -now one of our security engineers- spent her internship working on part of what would later become MAST.
  • Evan Jensen accepted a job at MIT Lincoln Labs as a security researcher before interning for us, and still credits the experience as formative.

Why did you take this internship over others?

SD: I wasn’t determined to take a winter internship until I heard about the type of work that I could do at Trail of Bits. I’d get my own project, not just a slice of someone else’s. The chance to take responsibility for something that could have a measurable impact was very appealing. It didn’t hurt that ToB’s reputation would add some weight to my resumé.

EJ: I saw this as a chance to extend the class I took from Dan: “Penetration Testing and Vulnerability Analysis.” Coincidentally, I lined up a summer internship in the office while Dan was there. As soon as he suggested I tell my interviewer what I was working on in class, the interview ended with an offer for the position.

What did you work on during your internship?

SD: MAST’s obfuscating passes that transform the code. This wasn’t anywhere near the focus of my degree; I was studying electrical engineering. But I was playing CTFs for fun, and [ToB] liked that I was willing to teach myself. I didn’t want a project that could just be researched with Google.

EJ: I actually did two winternships at ToB. During my first, I analyzed malware that the “APT1” group used in their intrusion campaigns. During my second, I worked on generating training material for a CTF-related DARPA grant that eventually became the material in the CTF Field Guide.

What was your experience like?

SD: It was great. I spent my entire break working on my project, and loved it. I like to have an end-goal and parameters, and the independence to research and execute. The only documentation I could find for my project was the LLVM compiler’s source code. There was no tutorial online to build an obfuscator like MAST. Beyond the technical stuff, I learned about myself, the conditions where I work best, and the types of projects that interest me most.

EJ: Working at ToB was definitely enlightening. It was the first time I actually got to use a licensed copy of IDA Pro. It was great working with other established hackers. They answered every question I could think of. I learned a lot about how to describe the challenges reverse engineers face and I picked up a few analysis tricks, too.

Why would you recommend this internship to students?

SD: So many reasons. It wasn’t a lot of little tasks. You own one big project. You start it. You finish it. You create something valuable. It’s cool paid work. It’s intellectually rewarding. You learn a lot. ToB is one of the best companies to have on your resume; it’s great networking.

EJ: People will never stop asking you about Trail of Bits.

Here’s What You Might Work On

We always have a variety of projects going on, and tools that could be honed. Your project will be an offshoot of our work, such as:

  • Our Cyber Reasoning System (CRS) -which we developed for the Cyber Grand Challenge and currently use for paid engagements- has potential to do a lot more. This is a really complicated distributed system with a multitude of open source components at play, including symbolic executors, x86 lifting, dynamic binary translation, and more.
  • PointsTo, an LLVM-based static analysis that discovers object life-cycle (e.g. use-after-free) vulnerabilities in large software projects such as web browsers and network servers. Learn more.
  • McSema, an open-source framework that performs static translation of x86 and x86-64 binaries to the LLVM intermediate representation. McSema enables existing LLVM-based program analysis tools to operate on binary code. See the code.
  • MAST, a collection of several whole-program transformations using the LLVM compiler infrastructure as a platform for iOS software obfuscation and protection.

In general, you’ll make a meaningful contribution to the development of reverse engineering and software analysis tools. Not many places promise that kind of work to interns, nor pay for it.

Interested?

You must have experience with software development. We want you to help us refine tools that find and fix problems for good, not just for a one-time report. Show us code that you’ve written, examples on GitHub, or CTF write-ups you’ve published.

You must be motivated. You’ll start with a clear project and an identified goal. How you get there is up to you. Apart from in-person kick-off and debrief meetings in our Manhattan offices, you will work remotely.

But you won’t be alone. We take advantage of all the latest technology to get work done, including platforms like Slack, Github, Trello and Hangouts. You’ll find considerable expertise available to you. We’ll do our best to organize everything we can up front so that you’re positioned for success. We will make good use of your time and effort.

If you’re headed into the public sector -maybe you took a Scholarship For Service- you may be wondering what it’s like to work in a commercial firm. If you want some industry experience before getting absorbed into a government agency, intern with us.

A fuzzer and a symbolic executor walk into a cloud

Finding bugs in programs is hard. Automating the process is even harder. We tackled the harder problem and produced two production-quality bug-finding systems: GRR, a high-throughput fuzzer, and PySymEmu (PSE), a binary symbolic executor with support for concrete inputs.

From afar, fuzzing is a dumb, brute-force method that works surprisingly well, and symbolic execution is a sophisticated approach, involving theorem provers that decide whether or not a program is “correct.” Through this lens, GRR is the brawn while PSE is the brains. There isn’t a dichotomy though — these tools are complementary, and we use PSE to seed GRR and vice versa.

Let’s dive in and see the challenges we faced when designing and building GRR and PSE.

GRR, the fastest fuzzer around

GRR is a high speed, full-system emulator that we use to fuzz program binaries. A fuzzing “campaign” involves executing a program thousands or millions of times, each time with a different input. The hope is that spamming a program with an overwhelming number of inputs will result in triggering a bug that crashes the program.

Note: GRR is pronounced with two fists held in the air

During DARPA’s Cyber Grand Challenge, we went web-scale and performed tens of billions of input mutations and program executions — in only 24 hours! Below are the challenges we faced when making this fuzzer, and how we solved those problems.

  1. Throughput. Typically, program fuzzing is split into discrete steps. A sample input is given to an input “mutator” which produces input variants. In turn, each variant is separately tested against the program in the hopes that the program will crash or execute new code. GRR internalizes these steps, and while doing so, completely eliminates disk I/O and program analysis ramp-up times, which represent a significant portion of where time is spent during a fuzzing campaign with other common tools.
  2. Transparency. Transparency requires that the program being fuzzed cannot observe or interfere with GRR. GRR achieves transparency via perfect isolation. GRR can “host” multiple 32-bit x86 processes in memory within its 64-bit address space. The instructions of each hosted process are dynamically rewritten as they execute, guaranteeing safety while maintaining operational and behavioral transparency.
  3. Reproducibility. GRR emulates both the CPU architecture and the operating system, thereby eliminating sources of non-determinism. GRR records program executions, enabling any execution to be faithfully replayed. GRR’s strong determinism and isolation guarantees let us combine the strengths of GRR with the sophistication of PSE. GRR can snapshot a running program, enabling PSE to jump-start symbolic execution from deep within a given program execution.

PySymEmu, the PhD of binary symbolic execution

Symbolic execution as a subject is hard to penetrate. Symbolic executors “reason about” every path through a program, there’s a theorem prover in there somewhere, and something something… bugs fall out the other end.

At a high level, PySymEmu (PSE) is a special kind of CPU emulator: it has a software implementation for almost every hardware instruction. When PSE symbolically executes a binary, what it really does is perform all the ins and outs that the hardware would perform if the CPU itself were executing the code.

PSE explores the relationship between the life and death of programs in an unorthodox scientific experiment

CPU instructions operate on registers and memory. Registers are names for super-fast but small data storage units; typically, a register holds four to eight bytes of data. Memory, on the other hand, can be huge: a 32-bit program can address up to 4 GiB of memory. PSE’s instruction simulators operate on registers and memory too, but they can do more than just store “raw” bytes — they can store expressions.
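
As a rough sketch of what “registers that store expressions” means, here is a toy ADD simulator built on the Z3 theorem prover’s Python bindings. PSE has its own machinery for this; the register names and helper below are only for illustration:

    from z3 import BitVec, BitVecVal, simplify

    # A toy register file: a value may be a concrete bit-vector or a symbolic
    # expression, and the instruction simulator doesn't care which.
    regs = {
        "eax": BitVecVal(41, 32),       # concrete value 41
        "ebx": BitVec("input_0", 32),   # symbolic: any 32-bit value
    }

    def sim_add(dst, src):
        # Simulates `add dst, src`: the result becomes an expression whenever
        # either operand is symbolic (flags are ignored in this sketch).
        regs[dst] = simplify(regs[dst] + regs[src])

    sim_add("eax", "ebx")
    print(regs["eax"])                  # prints: 41 + input_0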

A program that consumes some input will generally do the same thing every time it executes. This happens because that “concrete” input will trigger the same conditions in the code, and cause the same loops to merry-go-round. PSE operates on symbolic input bytes: free variables that can initially take on any value. A fully symbolic input can be any input and therefore represents all inputs. As PSE emulates the CPU, if-then-else conditions impose constraints on the originally unconstrained input symbols. An if-then-else condition that asks “is input byte B less than 10” will constrain the symbol for B to be in the range [0, 10) along the true path, and to be in the range [10, 256) along the false path.

If-then-elses are like forks in the road when executing a program. At each such fork, PSE will ask its theorem prover: “if I follow the path down one of the prongs of the fork, then are there still inputs that satisfy the additional constraints imposed by that path?” PSE will follow each yea path separately, and ignore the nays.
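
That fork-and-ask step can be sketched directly with the Z3 solver, using the byte-B example above (an illustration, not PSE’s internals):

    from z3 import BitVec, Solver, ULT, Not, sat

    B = BitVec("input_byte_B", 8)       # a fully symbolic input byte

    # The branch "is input byte B less than 10?" forks the state in two.
    for name, constraint in [("true path", ULT(B, 10)),
                             ("false path", Not(ULT(B, 10)))]:
        s = Solver()
        s.add(constraint)
        if s.check() == sat:            # "are there still inputs that satisfy this?"
            print(name, "is feasible, e.g. B =", s.model()[B])
        else:
            print(name, "is infeasible; ignore it")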

So, what challenges did we face when creating and extending PSE?

  1. Comprehensiveness. Arbitrary program binaries can exercise any of the thousands of instructions available on x86 CPUs. PSE implements simulation functions for hundreds of x86 instructions, and falls back onto a custom, single-instruction “micro-executor” when an instruction emulation is not (or cannot be) provided. In practice, this setup enables PSE to comprehensively emulate the entire CPU.
  2. Scale. Symbolic executors try to follow all feasible paths through a program by forking at every if-then-else condition and constraining the symbols one way or the other along each path. In practice, the number of possible paths through a program grows exponentially. PSE handles this scalability problem by selecting the best path to execute for the given execution goal, and by distributing the state-space exploration across multiple machines.
  3. Memory. Symbolic execution produces expressions representing simple operations, like adding two symbolic numbers together, or constraining the possible values of a symbol down one path of an if-then-else code block. PSE gracefully handles the case where addresses pointing into memory are themselves symbolic. Memory accessed via a symbolic address can potentially point anywhere, even to both “good” and “bad” (i.e. unmapped) memory (see the sketch after this list).
  4. Extensibility. PSE is written using the Python programming language, which makes it easy to hack on. However, modifying a symbolic executor can be challenging — it can be hard to know where to make a change, and how to get the right visibility into the data that will make the change a success. PSE includes smart extension points that we’ve successfully used for supporting concolic execution and exploit generation.
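
To make item 3 above concrete, here is a hedged sketch of how a solver can decide whether a symbolic address is able to escape its mapped region. The addresses and sizes are toy values, not PSE’s actual memory model:

    from z3 import BitVec, BitVecVal, Solver, ULT, UGE, Or, sat

    # A 16-byte array mapped at 0x1000, indexed by a symbolic value. The
    # program's own check only enforces index < 100, so the access may still
    # land outside the mapped region.
    idx = BitVec("index", 32)
    base = BitVecVal(0x1000, 32)
    addr = base + idx

    s = Solver()
    s.add(ULT(idx, 100))                        # the program's bounds check
    s.add(Or(ULT(addr, 0x1000),                 # below the mapping, or
             UGE(addr, 0x1000 + 16)))           # past its 16-byte end
    if s.check() == sat:
        print("out-of-bounds access possible, e.g. index =", s.model()[idx])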

Measuring excellence

So how do GRR and PSE compare to the best publicly available tools?

GRR

GRR is both a dynamic binary translator and fuzzer, and so it’s apt to compare it to AFLPIN, a hybrid of the AFL fuzzer and Intel’s PIN dynamic binary translator. During the Cyber Grand Challenge, DARPA helpfully provided a tutorial on how to use PIN with DECREE binaries. At the time, we benchmarked PIN and found that, before we even started optimizing GRR, it was already twice as fast as PIN!

The more important comparison metric is in terms of bug-finding. AFL’s mutation engine is smart and effective, especially in terms of how it chooses the next input to mutate. GRR internalizes Radamsa, another too-smart mutation engine, as one of its many input mutators. Eventually we may also integrate AFL’s mutators. During the qualifying event, GRR went face-to-face with AFL, which was integrated into the Driller bug-finding system. Our combination of GRR+PSE found more bugs. Beyond this one data point, a head-to-head comparison would be challenging and time-consuming.

PySymEmu

PSE can be most readily compared with KLEE, a symbolic executor of LLVM bitcode, or the angr binary analysis platform. LLVM bitcode is a far cry from x86 instructions, so it’s an apples-to-oranges comparison. Luckily we have McSema, our open-source and actively maintained x86-to-LLVM bitcode translator, which closes that gap. Our experiences with KLEE have been mostly negative; it’s hard to use, hard to hack on, and it only works well on bitcode produced by the Clang compiler.

Angr uses a customized version of the Valgrind VEX intermediate representation. Using VEX enables angr to work on many different platforms and architectures. Many of the angr examples involve reverse engineering CTF challenges rather than exploitation challenges; these RE problems often require manual intervention or state knowledge to proceed. PSE, by contrast, is designed to try to crash the program at every possible emulated instruction. For example, PSE will use its knowledge of symbolic memory to trigger any possible invalid array-like memory access, instead of just trying to solve for reaching unconstrained paths. During the qualifying event, angr went face-to-face with GRR+PSE and we found more bugs. Since then, we have improved PSE to support user interaction, concrete and concolic execution, and taint tracking.

I’ll be back!

Automating the discovery of bugs in real programs is hard. We tackled this challenge by developing two production-quality bug-finding tools: GRR and PySymEmu.

GRR and PySymEmu have been a topic of discussion in recent presentations about our CRS, and we suspect that these tools may be seen again in the near future.

Your tool works better than mine? Prove it.

No doubt, DARPA’s Cyber Grand Challenge (CGC) will go down in history for advancing the state of the art in a variety of fields: symbolic execution, binary translation, and dynamic instrumentation, to name a few. But there is one contribution that we believe has been overlooked so far, and that may prove to be the most useful of them all: the dataset of challenge binaries.

Until now, if you wanted to ‘play along at home,’ you would have had to install DECREE, a custom Linux-derived operating system that has no signals, no shared memory, no threads, and only seven system calls. Sound like a hassle? We thought so.

One metric for all tools

Competitors in the Cyber Grand Challenge identify vulnerabilities in challenge binaries (CBs) written for DECREE on the 32-bit Intel x86 architecture. Since 2014, DARPA has released the source code for over 100 of these vulnerable programs. These programs were specifically designed with vulnerabilities that represent a wide variety of software flaws. They are more than simple test cases: they approximate real software with enough complexity to stress both manual and automated vulnerability discovery.

If the CBs become widely adopted as benchmarks, they could change the way we solve security problems. This mirrors the rapid evolution of the SAT and ML communities once standardized benchmarks and regular competitions were established. The challenge binaries, valid test inputs, and sample vulnerabilities create an industry standard benchmark suite for evaluating:

  • Bug-finding tools
  • Program-analysis tools (e.g. automated test coverage generation, value range analysis)
  • Patching strategies
  • Exploit mitigations

The CBs are a more robust set of tests than previous approaches to measuring the quality of software analysis tools (e.g. SAMATE tests, NSA Juliet tests, or the STONESOUP test cases). First, the CBs are complex programs like games, content management systems, and image processors, instead of just snippets of vulnerable code. After all, to be effective, analysis tools must process real software with a fairly low bug density, not isolated vulnerable snippets. Second, unlike open source projects with added bugs, we have very high confidence that all the bugs in the CBs have been found, so analysis tools can be compared against an objective standard. Finally, the CBs also come with extensive functionality tests, triggers for the introduced bugs, patches, and performance monitoring tools, enabling benchmarking of patching tools and bug mitigation strategies.

Creating an industry standard benchmarking set will solve several problems that hamper development of future program analysis tools:

First, the absence of standardized benchmarks prevents an objective determination of which tools are “best.” Real applications don’t come with triggers for complex bugs, nor an exhaustive list of those bugs. The CBs provide metrics for comparison, such as:

  • Number of bugs found
  • Number of bugs found per unit of time or memory
  • Categories of bugs found and missed
  • Variances in performance from configuration options

Next, which mitigations are most effective? CBs come with inputs that stress original program functionality, inputs that check for the presence of known bugs, and performance measuring tools. These allow us to explore questions like:

  • What is the potential effectiveness and performance impact of various bug mitigation strategies (e.g. Control Flow Integrity, Code Pointer Integrity, Stack Cookies, etc)?
  • How much slower does the resulting program run?
  • How good is a mitigation compared to a real patch?

Play Along At Home

The teams competing in the CGC have had years to hone and adapt their bug-finding tools to the peculiarities of DECREE. But the real world doesn’t run on DECREE; it runs on Windows, Mac OS X, and Linux. We believe that research should be guided by real-world challenges and parameters. So, we decided to port* the challenge binaries to run in those environments.

It took us several attempts to find the best porting approach: one that minimizes code changes while keeping as much code as possible shared between platforms. The eventual solution was fairly straightforward: build each compilation unit without standard include files (since all CBs are statically linked), implement CGC system calls using their native equivalents, and make various minor fixes so the code is compatible with more compilers and standard libraries.

We’re excited about the potential of multi-platform CBs on several fronts:

  • Since there’s no need to set up a virtual machine just for DECREE, you can run the CBs on the machine you already have.
  • With that hurdle out of the way, we all now have an industry benchmark to evaluate program analysis tools. We can make comparisons such as:
    • How good are the CGC tools vs. existing program analysis and bug-finding tools?
    • When a new tool is released, how does it stack up against the current best?
    • Do static analysis tools that work with source code find more bugs than dynamic analysis tools that work with binaries?
    • Are tools written for Mac OS X better than tools written for Linux, and are they better than tools written for Windows?
  • When researchers open source their code, we can evaluate how well their findings work for a particular OS or compiler.

Before you watch the competitors’ CRSs duke it out, explore the challenges that the robots will attempt to solve in an environment you’re familiar with.

Get the CGC’s Challenge Binaries for the most common operating systems.

* Big thanks to our interns, Kareem El-Faramawi and Loren Maggiore, for doing the porting, and to Artem, Peter, and Ryan for their support.