Breaking Down Binary Ninja’s Low Level IL

Hi, I’m Josh. I recently joined the team at Trail of Bits, and I’ve been an evangelist and plugin writer for the Binary Ninja reversing platform for a while now. I’ve developed plugins that make reversing easier and extended Binary Ninja’s architecture support to assist in playing the microcorruption CTF. One of my favorite features of Binary Ninja is the Low Level IL (LLIL), which enables development of powerful program analysis tools. At Trail of Bits, we have used the LLIL to automate the processing of a large number of CTF binaries and to automatically identify memory corruptions.

I often get asked how the LLIL works. In this blog post, I answer common questions about the basics of LLIL and demonstrate how to use the Python API to write a simple function that operates on the LLIL. In a future post, I will demonstrate how to use the API to write plugins that use both the LLIL and Binary Ninja’s own dataflow analysis.

What is the Low Level IL?

Compilers use an intermediate representation (IR) to analyze and optimize the code being compiled. This IR is generated by translating the source language to a single standard language understood by the components of the toolchain. The toolchain components can then perform generic tasks on a variety of architectures without having to implement those tasks individually.

Similarly, Binary Ninja not only disassembles binary code, but also leverages the power of its own IR, called Low Level IL, in order to perform dataflow analysis. The dataflow analysis makes it possible for users to query register values and stack contents at arbitrary instructions. This analysis is architecture-agnostic because it is performed on the LLIL, not the assembly. In fact, I automatically got this dataflow analysis for free when I wrote the lifter for the MSP430 architecture.

Let’s jump right in and see how the Low Level IL works.

Viewing the Low Level IL

Within the UI, the Low Level IL is viewable only in Graph View. It can be accessed either through the “Options” menu in the bottom right corner, or via the i hotkey. The difference between IL View and Graph View is noticeable; the IL View looks much closer to a high level language with its use of infix notation. This, combined with the fact that the IL is a standardized set of instructions that all architectures are translated to, makes working with an unfamiliar language easy.

[Figure: Graph View versus IL View. On the left, Graph View of ARM (top) and x86-64 (bottom) assembly of the same function; on the right, the IL View of the same functions.]

If you aren’t familiar with this particular architecture, then you might not easily understand the semantics of the assembly code. However, the meaning of the LLIL is clear. You might also notice that there are often more LLIL instructions than there are assembly instructions. The translation of assembly to LLIL is actually a one-to-many rather than one-to-one translation because the LLIL is a simplified representation of an instruction set. For example, the x86 repne cmpsb instruction will even generate branches and loops in the LLIL:

[Figure: Low Level IL representation of the x86 instruction repne cmpsb]

How is analysis performed on the LLIL? To figure that out, we’ll first dive into how the LLIL is structured.

Low Level IL Structure

According to the API documentation, LLIL instructions have a tree-based structure. The root of an LLIL instruction tree is an expression consisting of an operation and zero to four operands as child nodes. The child nodes may be integers, strings, arrays of integers, or other expressions. As each child expression can have its own child expressions, an instruction tree of arbitrary depth and complexity can be built. Below are some example expressions and their operands:

Operation      Operand 1                Operand 2                   Operand 3       Operand 4
LLIL_NOP
LLIL_SET_REG   dest: string or integer  src: expression
LLIL_LOAD      src: expression
LLIL_CONST     value: integer
LLIL_IF        condition: expression    true: integer               false: integer
LLIL_JUMP_TO   dest: expression         targets: array of integers

Let’s look at a couple examples of lifted x86, to get a better understanding of how these trees are generated when lifting an instruction: first, a simple mov instruction, and then a more complex lea instruction.

Example: mov eax, 2

[Figure: LLIL tree for mov eax, 2]

This instruction has a single operation, mov, which is translated to the LLIL expression LLIL_SET_REG. The LLIL_SET_REG instruction has two child nodes: dest and src. dest is a reg node, which is just a string representing the register that will be set. src is another expression representing how the dest register will be set.

In our x86 instruction, the destination register is eax, so the dest child is just eax; easy enough. What is the source expression? Well, 2 is a constant value, so it will be translated into an LLIL_CONST expression. An LLIL_CONST expression has a single child node, value, which is an integer. No other nodes in the tree have children, so the instruction is complete. Putting it all together, we get the tree above.
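
You can poke at this tree in the Python console. Here is a sketch of what such a session might look like (the instruction index and register depend entirely on your binary; the comments show the values you would expect for this mov):

il = current_function.low_level_il
insn = il[0]                     # assume instruction 0 is our mov eax, 2
print(insn.operation.name)       # LLIL_SET_REG
print(insn.dest)                 # eax
print(insn.src.operation.name)   # LLIL_CONST
print(insn.src.value)            # 2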

Example: lea eax, [edx+ecx*4]

[Figure: LLIL tree for lea eax, [edx+ecx*4]]

The end result of this instruction is also to set the value of a register. The root of this tree will also be an LLIL_SET_REG, and its dest will be eax. The src expression is a mathematical expression consisting of an addition and multiplication…or is it?

If we add parentheses to explicitly define the order of operations, we get (edx + (ecx * 4)); thus, the root of the src sub-tree will be an LLIL_ADD expression, which has two child nodes: left and right, both of which are expressions. The left side of the addition is a register, so the left expression in our tree will be an LLIL_REG expression. This expression only has a single child. The right side of the addition is our multiplication, but the multiplier in an lea instruction has to be a power of 2, which can be translated to a left-shift operation, and that’s exactly what the lifter does: ecx * 4 becomes ecx << 2. So, the right expression in the tree is actually an LLIL_LSL expression (Logical Shift Left).

The LLIL_LSL expression also has left and right child expression nodes. For our left-shift operation, the left side is the ecx register, and the right side is the constant 2. We already know that both LLIL_REG and LLIL_CONST terminate with a string and integer, respectively. With the tree complete, we arrive at the tree presented above.
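
The same kind of console poking works here. A sketch, assuming insn is the lifted lea instruction (an LLIL_REG expression exposes its register name through its src child):

print(insn.operation.name)            # LLIL_SET_REG
print(insn.src.operation.name)        # LLIL_ADD
print(insn.src.left.src)              # edx
print(insn.src.right.operation.name)  # LLIL_LSL
print(insn.src.right.right.value)     # 2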

Now that we have an understanding of the structure of the LLIL, we are ready to dive into using the Python API. After reviewing features of the API, I will demonstrate a simple Python function to traverse an LLIL instruction and examine its tree structure.

Using the Python API

There are a few important classes related to the LLIL in the Python API: LowLevelILFunction, LowLevelILBasicBlock, and LowLevelILInstruction. There are a few others, like LowLevelILExpr and LowLevelILLabel, but those are more for writing a lifter rather than consuming IL.

Accessing Instructions

To begin playing with the IL, the first step is to get a reference to a function’s LLIL. This is accomplished through the low_level_il property of a Function object. If you’re in the GUI, you can get the LowLevelILFunction object for the currently displayed function using current_function.low_level_il.

The LowLevelILFunction class has a lot of methods, but they’re basically all for implementing a lifter, not performing analysis. In fact, this class is really only useful for retrieving or enumerating basic blocks and instructions. The __iter__ method is implemented and iterates over the basic blocks of the LLIL function, and the __getitem__ method is implemented and retrieves an LLIL instruction based on its index. The LowLevelILBasicBlock class also implements __iter__, which iterates over the individual LowLevelILInstruction objects belonging to that basic block. Therefore, it is possible to iterate over the instructions of a LowLevelILFunction two different ways, depending on your needs:

il = current_function.low_level_il

# iterate over instructions using basic blocks
for bb in il.basic_blocks:
  for instruction in bb:
    print(instruction)

# iterate over instructions directly
for index in range(len(il)):
  instruction = il[index]
  print(instruction)

Directly accessing an instruction is currently cumbersome. In Python, this is accomplished with function.low_level_il[function.get_low_level_il_at(function.arch, address)]. That’s a pretty verbose line of code. This is because the Function.get_low_level_il_at() method doesn’t actually return a LowLevelILInstruction object; it returns the integer index of the LLIL instruction. Hopefully it will be more concise in an upcoming refactor of the API.
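
Until then, it is easy to hide the verbosity behind a small convenience function (llil_at is a hypothetical helper, not part of the API):

def llil_at(function, address):
  # Map the address to an LLIL instruction index, then fetch the instruction.
  index = function.get_low_level_il_at(function.arch, address)
  return function.low_level_il[index]

instruction = llil_at(current_function, here)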

Parsing Instructions

The real meat of the LLIL is exposed in LowLevelILInstruction objects. The common members shared by all instructions allow you to determine:

  • The containing function of the LLIL instruction
  • The address of the assembly instruction lifted to LLIL
  • The operation of the LLIL instruction
  • The size of the operation (i.e. is this instruction manipulating a byte/short/long/long long)

As we saw in the table above, the operands vary by instruction. These can be accessed sequentially, via the operands member, or directly by operand name (e.g. dest, left, etc.). When accessing operands of an instruction that has a destination operand, the dest operand will always be the first element of the list.
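
For example, given an LLIL_SET_REG instruction, both access styles reach the same children. A hypothetical console session:

il = current_function.low_level_il
insn = il[current_function.get_low_level_il_at(current_function.arch, here)]
print(insn.operation.name)  # LLIL_SET_REG
print(insn.operands)        # positional access: dest first, then src
print(insn.dest)            # the same operands, accessed by name
print(insn.src)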

Example: A Simple Recursive Traversal Function

A very simple example of consuming information from the LLIL is a recursive traversal of a LowLevelILInstruction. In the example below, the operation of the expression of an LLIL instruction is output to the console, as well as its operands. If an operand is also an expression, then the function traverses that expression as well, outputting its operation and operands in turn.

# In a standalone plugin (rather than the console), import the type first:
# from binaryninja import LowLevelILInstruction
def traverse_IL(il, indent):
  if isinstance(il, LowLevelILInstruction):
    # This operand is itself an expression: print it and recurse.
    print('\t' * indent + il.operation.name)

    for o in il.operands:
      traverse_IL(o, indent + 1)

  else:
    # Leaf node: an integer, string, or array of integers.
    print('\t' * indent + str(il))

After copy-pasting this into the Binary Ninja console, select any instruction you wish to output the tree for. You can then use bv, current_function, and here to access the current BinaryView, the currently displayed function’s Function object, and the currently selected address, respectively. In the following example, I selected the ARM instruction ldr r3, [r11, #-0x8]:

[Figure: console output of traverse_IL for ldr r3, [r11, #-0x8]]

Lifted IL vs Low Level IL

While reviewing the API, you might notice that there are function calls such as Function.get_lifted_il_at versus Function.get_low_level_il_at. This might make you unsure of which you should be processing for your analysis. The answer is fairly straightforward: with almost no exceptions, you will always want to work with Low Level IL.

Lifted IL is what the lifter first generates when parsing the executable code; an optimized version is what is exposed to the user as the Low Level IL in the UI. To demonstrate this, try creating a new binary file filled with a bunch of nop instructions followed by a ret. After disassembling the function and switching to IL view (by pressing i in Graph View), you will see that there is only a single IL instruction present: jump(pop). This is because the nop instructions have been optimized away.
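
If you want to generate such a file quickly, a couple of lines of Python will do (a sketch for x86, where 0x90 is nop and 0xC3 is ret; the filename is arbitrary):

with open('nops.bin', 'wb') as f:
  f.write(b'\x90' * 10 + b'\xc3')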

It is possible to view the Lifted IL in the UI: check the box in Preferences for “Enable plugin development debugging mode.” Once checked, the “Options” tab at the bottom of the window will now present two options for viewing the IL. With the previous example, switching to Lifted IL view will now display a long list of nop instructions, in addition to the jump(pop).

In general, Lifted IL is not something you will need unless you’re developing an Architecture plugin.

Start Using the LLIL

In this blog post, I described the fundamentals of Binary Ninja’s Low Level IL, and how the Python API can be used to interact with it. Around the office, Ryan has used the LLIL and its data flow analysis to solve 2000 CTF challenge binaries by identifying a buffer to overflow and a canary value that had to remain intact in each. Sophia will present “Next-level Static Analysis for Vulnerability Research” using the Binary Ninja LLIL at INFILTRATE 2017, which everyone should definitely attend. I hope this guide makes it easier to write your own plugins with Binary Ninja!

In Part 2 of this blog post, I will demonstrate the power of the Low Level IL and its dataflow analysis with another simple example. We will develop a simple, platform-agnostic plugin to navigate to virtual functions by parsing the LLIL for an object’s virtual method table and calculating the offset of the called function pointer. This makes reversing the behavior of C++ binaries easier because instructions such as call [eax+0x10] can be resolved to a known function like object->isValid(). In the meantime, get yourself a copy of Binary Ninja and start using the LLIL.

Update (11 February 2017): A new version of Binary Ninja was released on 10 February 2017; this blog post has been updated to reflect changes to the API.

2016 Year in Review

John Oliver may have written off 2016, but we’re darn proud of all that we accomplished and contributed this year.

We released a slew of security tools that help us -and you- work smarter, and promoted a few more that deserved recognition. We helped the New York City InfoSec community build a foundation for future growth. Perhaps most importantly, we weighed in when we believed the record needed to be set straight.

Here are 14 reasons we’re counting 2016 as a success and feeling good about 2017.

1. Brought automatic bug discovery to market

2016 will go down in history as the year that software began finding and patching vulnerabilities automatically. Our Cyber Reasoning System (CRS), built to compete in DARPA’s Cyber Grand Challenge, made history on its own when it audited zlib. As far as we know, our CRS was the first to audit more code in less time, in greater detail, and at a lower cost than a human could. Read the audit report and Mozilla’s announcement.

In January, we used our CRS to help settle a $1,000 bet about libotr, a popular library used in secure messaging software. Discover our insights about the challenges encrypted communications systems present for automated testing, how we solved them, and our testing methodology. And find out who won the bet.

Our CRS is available for commercial engagements, but we’ve open sourced one of its companion tools: GRR, a high-throughput fuzzer built specifically for the CRS. Read about the challenges we overcame while designing and building GRR.

Released And Reviewed Foundational Tools

2. Created a standardized benchmark suite for security tools

DARPA released the source code for over 100 challenge programs used in the Cyber Grand Challenge (CGC). The CGC challenge programs are realistic programs that contain documented vulnerabilities. Unfortunately, the programs don’t run on Windows, Linux, or macOS. We fixed this. We ported and packaged the challenge programs into a cross-platform benchmark suite. Now researchers in academia and industry can reliably evaluate and compare their program analysis tools and bug mitigations.

3. Ported Facebook’s osquery to Windows

Facebook’s osquery allows you to easily ask security and performance questions about your endpoint infrastructure. Until earlier this year, it was only available for macOS and Linux. We ported osquery to Windows, overcoming numerous technical challenges: completely different process and security models, incompatible APIs, and divergent compiler behavior. The port was worth the effort. Before, similar functionality would require cobbling together manual solutions or expensive and proprietary commercial products. Now, there’s a better option.

4. Released Algo, a secure, user-friendly, and free VPN

We built Algo, a self-hosted VPN server designed for ease of deployment and security. Algo servers are not shared with other users, use only modern protocols and ciphers, and include only the minimal software you need. Since Algo deploys to nearly all popular cloud computing platforms, it provides an effectively unlimited set of egress locations. For anyone who is privacy conscious, travels for work frequently, or can’t afford a dedicated IT department, this one’s for you.

5. Showed how to automatically generate exploits with Binary Ninja

Ryan showed how to use Vector35’s Binary Ninja, a promising new interactive static analysis and reverse engineering platform, to generate exploits for 2,000 unique binaries in this year’s DEFCON CTF qualifying round. Its feature-rich and accessible APIs beat all competing products. Had he used IDA or radare2, Ryan would have had to divert his time and attention to implementing a stack model or using fragile heuristics instead of focusing on the true goal of the CTF: exploiting binaries. Read his full review.

6. Released Protofuzz, a protobuf fuzzer

Protofuzz helps find bugs in applications utilizing Google Protocol Buffers (protobuf). Applications use protobufs to specify the structure of messages, which are passed between processes or across networks. Protobufs automate the error-prone task of generating message serialization and deserialization code. Typical fuzzers that rely on random permutation cannot explore past the correct, auto-generated code. Protofuzz creates valid protobuf-encoded structures composed of malicious values — ensuring that permuted data passes through the hard auto-generated shell into the soft underbelly of the target program.

7. Made iOS’s Secure Enclave usable with Tidas

Apple’s Secure Enclave Crypto API promised to liberate us from the annoyance of passwords and using your Google account to log into Pokemon Go. But it was unusable in its original state. So, we filled the gap with Tidas, a simple SDK drop-in for iOS apps that provides cryptographically proven — and passwordless — authentication. The introduction of the T1 chip and TouchID on new MacBook Pros opens exciting new potential for Tidas on macOS. If you dream of a passwordless future, check out our DIY toolkit in Swift or Objective-C.

Shared With The Community

8. Explained Control Flow Integrity, and how to use it to mitigate exploits

Control Flow Integrity (CFI) prevents bugs from becoming exploits. Before indirect control flow transfers, CFI validates that the target of the flow belongs to a pre-determined set of valid targets. Unfortunately, instructions for using CFI are hard to find and very confusing. We fixed the problem. We published two blog posts that describe how CFI prevents exploitation, and how to use it with clang and Visual Studio. We also provided working examples showing how CFI protects Linux, macOS, and Windows applications.

9. Pulled back the veil on PointsTo, our whole-program static analysis tool

Dynamic program analysis tools like AddressSanitizer and Valgrind can tell developers when running code accesses uninitialized memory, leaks memory, or uses memory after it’s been freed. Despite this, memory bugs are still shipped and exploited in the wild. That’s because bugs have a nasty habit of hiding behind rarely executed code paths, and dynamic analysis tools can’t check every possible program path. PointsTo can. It’s a whole-program static analysis tool that we developed and use to find and report on potential use-after-free bugs in large codebases. Read about PointsTo.

10. Continued to enrich NYC’s InfoSec community

Our bi-monthly meetup –Empire Hacking– really picked up steam in 2016. Membership passed 500 people, 100 of whom regularly attend our meetings. We heard 14 superb presentations on pragmatic security research and new discoveries in attack and defense. Many thanks to our hosts: Spotify, Two Sigma, DigitalOcean, and WeWork!

Everyone is welcome at Empire Hacking, but it may not be for everyone. So we put together nyc-infosec.com, a directory of all of the gatherings, companies, and university programs in NYC that we could find. Our goals with nyc-infosec.com are to promote collaboration, to elevate NYC’s community to its rightful place on the InfoSec ‘stage,’ and to help researchers and practitioners benefit from each other’s work.

One local event even earned our sponsorship: O’Reilly’s Security Conference.

11. Hired three interns for meaningful work

This winter we are once again giving college students paid internships to work on interesting security problems. We work with each student to create a project that is interesting both for them and beneficial for us, and provide them with resources and mentorship to make progress. This year, our interns are working on applying machine learning to fuzzing, porting challenge binaries to Windows, and improving our program analysis tools. We’ll have a post summarizing their work when it’s done; meanwhile, you can read about our past interns’ experience. We’ll also be hiring interns for the summer: contact us if you are interested.

12. Delivered 9 new talks at 13 separate conferences

When we can, we share the science that fuels our work – the successes and the failures. This year we spoke with an exceptional number of people at conferences all over the world.

Spoke The Truth

13. Called out Verizon for publishing bad data

Verizon’s Data Breach Investigations Report (DBIR) represents a collaborative effort involving 60+ organizations’ proprietary data. It’s the single best source of information for enterprise defenders, which is why it was a travesty that the report’s section on vulnerabilities used in data breaches contained misleading data, analysis, and recommendations. We chided Verizon and Kenna -the section’s contributor- and offered suggestions that would improve the quality of future DBIRs. In response to criticism, Verizon and Kenna posted mea culpas on their blogs.

14. Clarified the technical details of the Apple-FBI standoff

In February, a federal judge ordered Apple to help the FBI recover encrypted data from the San Bernardino gunman’s iPhone 5C. Many argued the FBI’s request was technically infeasible given the support for strong encryption on iOS devices. Based on our reading of the request and knowledge of iOS, we explained why Apple could comply with the FBI in this instance. (If the iPhone had had a Secure Enclave, it would have been much harder.) For more detail, listen to Dan’s interview on the Risky Business podcast.

2017 won’t know what hit it

This year, we are looking forward to publicizing more of our research, continuing our commitment to our open source projects, and releasing more of our internal tools. We will:

  • Release Manticore, a ruthlessly effective hybrid symbolic-concrete (“concolic”) execution system that scales to large programs with numerous dependencies, complex interactions, and manual setup.
  • Add ARM support to McSema so we can lift binaries from all kinds of embedded systems such as hard drive firmware, phones, and IoT devices.
  • Publicly release a tool that combines a set of LLVM passes to detect side-channel vulnerabilities in sensitive codebases.
  • Sponsor Infiltrate 2017 and attend en masse. We really appreciate the forum they provide for a focused, quality review of the techniques attackers use to break systems. It’s a service to the community. We’re happy to support it.
  • Deliver a project inspired by Mr. Robot. Yeah, the TV show. More on that soon.

Let’s talk about CFI: Microsoft Edition

We’re back with our promised second installment discussing control flow integrity. This time, we will talk about Microsoft’s implementation of control flow integrity. As a reminder, control flow integrity, or CFI, is an exploit mitigation technique that prevents bugs from turning into exploits. For a more detailed explanation, please read the first post in this series.

Security researchers should study products that people use, and Microsoft has an overwhelming share of the desktop computing market. New anti-exploitation measures in Windows and Visual Studio are a big deal. These can and do directly impact a very large number of people.

For the impatient who want to know about control flow guard right now: add /guard:cf to both your compiler and linker flags, and take a look at our examples showing what CFG does and does not do.

Microsoft’s CFI

Microsoft’s implementation of CFI is called Control Flow Guard (CFG), and it requires both operating system and compiler support. The minimum supported operating system is Windows 8.1 Update 3 and the minimum compiler version is Visual Studio 2015 (VS2015 Update 3 is recommended). All the examples in this blog post use Visual Studio 2015 and Windows 10 on x86-64.

Control Flow Guard is very well documented — there is an official documentation page, the documentation for the compiler option, and even a blog post from when the feature was in development. CFG is a very straightforward implementation of CFI:

  • First, the compiler identifies all indirect branches in a program
  • Next, it determines which branches must be protected. For instance, indirect branches that have a statically identifiable target don’t need CFI checks.
  • Finally, the compiler inserts lightweight checks at potentially vulnerable branches to ensure the branch target is a valid destination (see the sketch after this list).
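
To make the three steps concrete, here is a minimal hypothetical C sketch (not one of our example binaries) of the kind of call site CFG instruments. The function names are made up:

#include <stdio.h>

typedef void (*handler_t)(void);

static void hello(void)   { puts("hello"); }
static void goodbye(void) { puts("goodbye"); }

int main(int argc, char **argv) {
    (void)argv;
    /* The target is not statically known, so this is exactly the kind of
       indirect call CFG instruments. Built with /guard:cf, a lightweight
       check runs before the call and kills the process if the pointer
       (perhaps corrupted by an attacker) is not a valid call target. */
    handler_t h = (argc > 1) ? goodbye : hello;
    h();
    return 0;
}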

As in the previous blog post, we will not explore the technical implementation of CFG. There is already plenty of excellent literature on the subject. Instead this blog post will focus on how to use CFG in your programs, and show what CFG does and does not protect. However, we will mention some important differences between CFG and Clang’s CFI implementation.

Comparing CFG with Clang’s CFI

This comparison is meant to show the differences between how each implementation translates theoretical ideas behind control flow integrity into shipping application protection mechanisms. Neither implementation is better or worse than the other; they target different software ecosystems. Each works within real-world constraints (e.g. source availability, performance, ease of use, API/ABI stability, backwards compatibility, etc.) to achieve meaningful software protection.

What’s protected?

Programs protected with Microsoft’s CFG or Clang’s CFI execute lightweight checks before indirect control flow transfers. The check validates that the target of the flow belongs to a pre-determined set of valid targets.

Windows programs have many indirect calls that cannot be hijacked. For instance, API calls are performed via an indirect call through the IAT, which is set to read-only after program load. The Visual Studio compiler safely omits CFG checks for these calls.

Clang’s CFI also includes checks that are not exactly CFI related, such as runtime validation of pointer casts. See the previous blog post for more details and examples.

What is a valid target?

Control Flow Guard has a single per-process mapping of all valid control flow targets. Anything in the mapping is considered a valid target (Figure 1b). CFG provides a way to adjust the valid target map at runtime, via the aptly named SetProcessValidCallTargets API. This is especially helpful when dealing with JITted code or manually loading dynamic libraries.
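
As a rough sketch of what such an adjustment might look like (a hypothetical helper; it assumes the region is one you allocated for generated code, that the offsets and size meet the API’s alignment requirements, and that you have set up the SDK as described in the caveats later in this post):

#include <windows.h>
/* Requires #define _WIN32_WINNT 0x0A00 before including windows.h,
   and linking against mincore.lib. */

/* Mark `target`, which lives inside a JITted code region, as a valid
   indirect-call destination. Error handling is omitted for brevity. */
BOOL allow_call_target(PVOID region, SIZE_T region_size, PVOID target) {
    CFG_CALL_TARGET_INFO info;
    info.Offset = (ULONG_PTR)target - (ULONG_PTR)region;
    info.Flags = CFG_CALL_TARGET_VALID;
    return SetProcessValidCallTargets(GetCurrentProcess(),
                                      region, region_size, 1, &info);
}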

CFG also provides three compiler directives that control CFG behavior in a specified method. These directives are defined in ntdef.h in the Windows SDK, but not well documented. We would like to thank Matt Miller from Microsoft for explaining what they do:

  • __declspec(guard(ignore)) will disable CFG checks for all indirect calls inside a method, and ignore any function pointers referenced in the method.
  • __declspec(guard(nocf)) will disable CFG checks for all indirect calls inside a method, but track any function pointers referenced in the method and add those functions to the valid destination map (sketched after this list).
  • __declspec(guard(suppress)) will prevent an exported function from being a valid CFG destination. This is used to prevent security sensitive functions from being called indirectly (for instance, SetProcessValidCallTargets is protected in this way).
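
Here is a quick hypothetical sketch of the second directive in use (the function and parameter names are made up):

__declspec(guard(nocf))
void call_unchecked(void (*fn)(void)) {
    /* No CFG check is emitted for this indirect call. Unlike
       guard(ignore), any function whose address is taken inside
       this method would still be added to the valid-target map. */
    fn();
}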

Clang’s CFI is more fine grained in its protection. The target of each indirect control flow transfer must match an expected type signature (Figure 1a). Depending on the options enabled, calls to class member functions are also verified to be within the proper class hierarchy. Effectively, there is a valid target mapping per type signature and per class hierarchy. The target sets are fixed at compile time and cannot be changed.

[Figure 1: Differences in the valid call targets for the cfg_icall example. The true valid destination is in green, and everything else is in red. (a) Valid destinations at the indirect call for Clang’s CFI: only functions matching the expected function signature are in the list. (b) Valid destinations at the indirect call for CFG using Visual Studio 2015: every legal function entry point is in the list.]

How is protection enforced?

Control Flow Guard splits enforcement duties between the compiler and the operating system. The compiler inserts the checks and provides an initial valid target set, and the operating system maintains the target set and verifies destinations.

Clang’s CFI does all enforcement at the compiler level; the operating system is not aware of CFI.

What about dynamic libraries, JITed code, and other edge cases?

Control Flow Guard supports cross-library calls, but enforcement only occurs if the library is also compiled with Control Flow Guard. Dynamically generated code pages can be added to or excluded from the valid target map. External functions retrieved via GetProcAddress are always valid call targets*.

Clang’s CFI supports cross-library calls via the -fsanitize-cfi-cross-dso flag. Both the library and the application must be compiled with this flag. As far as we can tell, dynamically generated code does not receive CFI protection. External functions retrieved via dlsym are automatically added as a valid target when -fsanitize-cfi-cross-dso is used, otherwise these calls trigger a CFI violation.

* The exception to this rule is functions protected with __declspec(guard(suppress)). These functions must be linked via the import table or they will not be callable.

Using CFI with Visual Studio 2015

Using Control Flow Guard with Visual Studio is extremely simple. There is a fantastic documentation page on the MSDN website that describes how to enable CFG, both via the GUI and via the command line. The quick summary: add /guard:cf to your compiler and linker flags. That’s it.
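
For example, a minimal command-line build might look like the following (a sketch, assuming a Visual Studio developer prompt; cl forwards options after /link to the linker):

cl /guard:cf cfg_icall.c /link /guard:cf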

There are a few caveats, which are only applicable if you are going to dynamically adjust valid indirect call targets via SetProcessValidCallTargets. First, you will need a newish version of the Windows SDK: the version that came by default with our Visual Studio 2015 install didn’t have the proper definitions, so we had to install the latest (as of this writing) version, 10.0.14393.0. Second, you must set the SDK to target Windows 10 (#define _WIN32_WINNT 0x0A00). Third, you must link with mincore.lib, as it includes the necessary import definitions.

Control Flow Guard Examples

We have created samples with specially crafted bugs to show how to use CFG, and some errors CFG protects against. The bugs in these examples are not statically identified by the compiler, but are detected at runtime by CFG. Where possible, we simulate potentially malicious behavior that CFG prevents, and show which malicious behavior CFG does not prevent.

These CFG examples are modified from the Clang CFI examples to show the different meaning of a valid call destination between the two implementations. Each example builds two binaries, one with CFG (e.g. cfg_icall.exe) and one without CFG (e.g. no_cfg_icall.exe). These binaries are built from the same source, and used to illustrate CFG features and protections.

We have provided the following examples:

cfg_icall

This example is an analogue of the cfi_icall example from the Clang CFI blog post, but modified slightly to work with Visual Studio 2015 and Control Flow Guard. The example binary accepts a single command line argument, with the valid values being 0-3. Each value demonstrates different aspects of indirect call protection.

  • Option 0 is a normal, valid indirect call that should always work. This should run properly under any CFI scheme.
  • Option 1 is an invalid indirect call (the destination is read from outside array bounds), but the destination is a function with the same function signature as a valid call. This works under both Clang’s CFI and CFG, but it could fail under some future scheme.
  • Option 2 is an invalid indirect call, and the destination is a valid function entry but with a signature different than the caller expects. This call fails under Clang’s CFI but works under CFG.
  • Option 3 is an invalid indirect call to a destination that is an invalid function entry point. This should fail under any CFI scheme, and this call fails under Clang’s CFI and CFG.
  • All other options should point to uninitialized memory, and correctly fail for both tested CFI implementations.

cfg_vcall

The cfg_vcall example (derived from the cfi_vcall example from the previous post) shows that virtual calls are protected by CFG when the destination is not a valid entry point. The example shows two simulated bugs: the first bug is an invalid cast that simulates something like a type confusion vulnerability. This will fail under Clang’s CFI, but succeed under CFG. The second bug simulates a use-after-free or similar memory corruption, where the object pointer is replaced by an attacker-created object whose function pointer points to the middle of a function. The bad call is blocked by both Clang’s CFI and CFG.

[Figure 2: A Control Flow Guard violation as seen in WinDbg.]

cfg_valid_targets

This example is cfg_icall but modified to show how to use SetProcessValidCallTargets. The CFG bitmap is manually updated to remove bad_int_arg and float_arg from the valid call target list. Only option 0 will work; every other option will return a CFG error.

cfg_guard_ignore

Control flow guard can be disabled for certain methods; this example shows how to use the __declspec(guard(ignore)) compiler directive to completely disable CFG inside the specified method.

cfg_guard_nocf

Control flow guard can be partially disabled for certain methods; this example shows how to use the __declspec(guard(nocf)) compiler directive to disable CFG for indirect calls in a specified method, but still enable CFG for any referenced function pointers. The example compares the effects of __declspec(guard(nocf)) to __declspec(guard(ignore)).

cfg_guard_suppress and cfg_suppressed_export

Sometimes a library has security sensitive methods that should never be called indirectly. The __declspec(guard(suppress)) directive will prevent exported functions from being called via function pointer. These two examples work together to show how suppressed exports work: cfg_suppressed_export is a DLL with a suppressed export and a normal export, and cfg_guard_suppress tries to call both exports via a pointer retrieved via GetProcAddress.

All flows must end

Now that you know what Control Flow Guard is and how it can protect your applications, go turn it on for your software! Enabling CFG is very simple: just add /guard:cf to your compiler and linker flags. To see real examples of how CFG can protect your software, take a look at our CFG examples showcase. We hope that Microsoft continues to improve CFG with future Visual Studio releases.

Meet Algo, the VPN that works

I think you’ll agree when I say: there’s no VPN option on the market designed with equal emphasis on security and ease of use.

That changes now.

Today we’re introducing Algo, a self-hosted personal VPN server designed for ease of deployment and security. Algo automatically deploys an on-demand VPN service in the cloud that is not shared with other users, relies on only modern protocols and ciphers, and includes only the minimal software you need.

And it’s free.

For anyone who is privacy conscious, travels for work frequently, or can’t afford a dedicated IT department, this one’s for you.

Don’t bother with commercial VPNs

They’re crap.

Really, the paid-for services are just commercial honeypots. If an attacker can compromise a VPN provider, they can monitor a whole lot of sensitive data.

Paid-for VPNs tend to be insecure: they share keys, their weak cryptography gives a false sense of security, and they require you to trust their operators.

Even if you’re not doing anything wrong, you could be sharing the same endpoint with someone who is. In that case, your network traffic will be analyzed too when law enforcement seizes that endpoint.

Streisand is no better

Good concept. Poor implementation.

It installs ~40 services, including numerous remote access services, a Tor relay node, and out-of-date software. It leaves you with dozens of keys to manage and it allows weak crypto.

That’s a hefty footprint and it’s too complicated for any reasonable person to secure. If you set up an individual server just for yourself, you’d never know if or when an attacker compromised it.

OpenVPN: Requires client software

OpenVPN’s lack of out-of-the-box client support on any major desktop or mobile operating system introduces unnecessary complexity. The user experience suffers.

Speaking of users, they’re required to update and maintain this software too. That is a recipe for disaster.

Worst of all, OpenVPN depends on the security of TLS, both the protocol and its implementations. Between that, and past security incidents, we simply trust it less.

Other VPNs’ S/WAN song

The original attempt at free VPN software -FreeS/WAN- died in the early 2000s when its dev team fractured. Three people forked it into LibreSwan, strongSwan and Openswan.

To use any of them today, you need something approaching tribal knowledge. The available documentation stymied and appalled us:

  • Little differentiation – If you search for information about strongSwan’s configuration, you could easily end up at a LibreSwan page. The terms will look familiar, but the instructions will be wrong.
  • Impenetrable language – Instead of using standard terms like ‘client, server, remote and local,’ they use ‘sun, moon, bob, carol,’ and a bunch of other arbitrary words.
  • Brittle methodology – The vast majority of documentation and guides insist on using ‘tried and true’ methods such as L2TP and IKEv1, even though IKEv2 is simpler and stronger. Since Apple added IKEv2 to iOS 8, there’s no reason not to use it.

Only the strongest S/WAN survived

After wading through the convoluted quagmire that is the S/WAN triplets, we settled on strongSwan.

Its documentation -such as it is- is the best of the bunch. It was rewritten recently from scratch to support IKEv2 (a positive step when supporting a major new protocol version). It’s the only IPSEC software that even offers the option for a trusted key store.

And the community is helpful. Special thanks to Thermi.

But it’s still super-complicated. Too many contributors made it very arcane. Again, you need that tribal knowledge to make IPSEC do what you want.

These are examples of why cryptography software has a well-earned reputation for poor usability. A tightly knit development community only communicating with itself tends to lead to a profusion of options that should be deprecated. There’s no sign that the user interface or experience has been reviewed on behalf of less-experienced users. For anyone bold enough to consider these points, here lies the path to widespread adoption.

So, we built Algo

Algo is a set of Ansible scripts that simplifies the setup of a personal IPSEC VPN. It contains the most secure defaults available, works with common cloud providers, and does not require client software on most devices.

The ‘VP of all Networks’ is strong, secure and tidy. It uses the least amount of software necessary to get the job done.

We made Algo with corporate travelers in mind. To save bandwidth and increase security, it blocks ads and compresses what’s left.

We shared an early version of Algo at Black Hat this year and people loved it.

Algo’s Features

  • Supports only IKEv2
  • Supports only a single cipher suite w/ AES-GCM, SHA2 HMAC, and P-256 DH
  • Generates mobileconfig profiles to auto-configure Apple devices
  • Provides helper scripts to add and remove users
  • Blocks ads with a local DNS resolver and HTTP proxy
  • Based on current versions of Ubuntu and strongSwan
  • Installs to DigitalOcean, Amazon, Google, Azure or your own server

Anti-features

  • Does not support legacy cipher suites nor protocols like L2TP, IKEv1, or RSA
  • Does not install Tor, OpenVPN, or other risky servers
  • Does not depend on the security of TLS
  • Does not require client software on most platforms
  • Does not claim to provide anonymity or censorship avoidance
  • Does not claim to protect you from the FSB, MSS, DGSE, or FSM

Designed to be disposable

We wanted Algo to be easy to set up. That way, you start it when you need it, and tear it down before anyone can figure out the service you’re routing your traffic through.

Setup is automated. Just answer a few questions, and Algo will build your VPN for you.

We’ve automated the setup process for Apple devices, too. Algo just gives you a file that you AirDrop to your device. You press ‘install’ and you’ve got your VPN. Or ‘VPNs.’

You don’t have to choose just one VPN gateway. You could make yourself 20 on different services: DigitalOcean in Bangalore, EC2 in Virginia, or any other combination. You have your choice.

One last reason that Algo is such a good solution: it’s been abstracted as a set of Ansible roles that we released to the community. Ansible provides clearer documentation, ensures that we can repeat what it is that we’re doing, and allows us to monitor configuration drift.

Thanks to the roles we created in Ansible, it’s very easy for us to add and refine different features independently. Members of our team will keep up on feature requests.

We’ll make sure it’s right. You can just use it.

Try Algo today.

Want help installing Algo?

We’re planning a virtual crypto party for Friday, December 16th at 3pm EST, where we’ll walk you through installing Algo on your own. Register to join us.

Shin GRR: Make Fuzzing Fast Again


We’ve mentioned GRR before – it’s our high-speed, full-system emulator used to fuzz program binaries. We developed GRR for DARPA’s Cyber Grand Challenge (CGC), and now we’re releasing it as an open-source project! Go check it out.

Fear GRR

Bugs aren’t afraid of slow fuzzers, and that’s why GRR was designed with unique and innovative features that make it tread scarily fast.

  • GRR emulates x86 binaries within a 64-bit address space using dynamic binary translation (DBT). As a 64-bit program, GRR can use more hardware registers and memory than the original program. This enabled easy implementation of perfect isolation without complex register-rescheduling or memory remapping logic. The translated program never sees GRR coming.
  • GRR is fast. Existing DBTs re-translate the same program on every execution. They specialize in translating long-running programs, where the translation cost is amortized over time, and “hot” code is reorganized to improve performance. Fuzzing campaigns execute the same program over and over again, so all code is hot, and re-translating the same code is wasteful. GRR avoids re-translating code by caching it to disk, and it optimizes the cached code over the lifetime of the fuzzing campaign.
  • GRR eats JIT compilers and self-modifying code for breakfast. GRR translates one basic block at a time, and indexes the translated blocks in its cache using “version numbers.” A block’s version number is a Merkle hash of the contents of executable memory. Modifying the contents of an executable page in memory invalidates its hash, thereby triggering re-translation of its code when next executed.
  • GRR is efficient. GRR uses program snapshotting to skip over irrelevant setup code that executes before the first input byte is ever read. This saves a lot of cycles in a fuzzing campaign with millions or billions of program executions. GRR also avoids kernel roundtrips by emulating system calls and performing all I/O within memory.
  • GRR is extensible. GRR supports pluggable input mutators, including Radamsa, and pluggable code coverage metrics, which allow you to tune GRR’s behavior to the program being fuzzed. In the CGC, we didn’t know ahead of time what binaries we would get, and there is no one-size-fits-all way of measuring code coverage. GRR’s flexibility let us change how code coverage was measured over time. This made our fuzzer more resilient to different types of programs.

Two fists in the air

Take a look at GRR demolishing NRFIN_00006, a CGC challenge that has six binaries communicating over IPC. GRR detects the crash in the third binary.

This demo shows off two nifty features of GRR:

  1. GRR can print out the trace of system calls performed by the translated binary.
  2. GRR can print out the register state on entry to every basic block executed. Instruction-granularity register traces are available when the maximum basic block size is set to one instruction.

Dig deeper

I like to think of GRR as an excavator for bugs. It’s a lean, mean, bug-finding machine. It’s also now open-source and permissively licensed. Check it out; we welcome contributions.

Come Find Us at O’Reilly Security

We’re putting our money where our mouth is again. In continued support of New York’s growing infosec community, we’re excited to sponsor the upcoming O’Reilly Security Conference.

We expect to be an outlier there: we’re the only sponsor that offers consulting and custom engineering rather than just off-the-shelf products. We see this conference as an opportunity to learn more about the problems -and their root causes- that attendees face, and how we can help.

We’ve had tremendous success helping companies like Amazon and Facebook. But you don’t need to have Amazon- or Facebook-sized security problems to benefit from our tools and research. If you have difficult security challenges, we hope you’ll come speak with us.

Pick through our tools

We’re going to have a LIVE instance of our Cyber Reasoning System (CRS) at our booth. Recently, we used it to audit more code in less time, in greater detail, and at a lower cost than a human could. For granular details, come grab a copy of a CRS-driven security assessment we conducted on zlib for Mozilla.

Autonomous cyber defense systems happen to be the topic of the keynote by Michael Walker, the DARPA PM who ran the Cyber Grand Challenge. If you’re intrigued by what he says, swing by our booth to see the CRS in action.

The CRS is just one of a unique set of capabilities and proprietary tools that we’ve developed over the course of deep research engagements, some for DARPA. We’ll have other tools to share, such as:

  • Challenge Binaries, which make it possible to objectively compare different bug-finding tools, program-analysis tools, patching strategies and exploit mitigations.
  • Screen, which combines a set of LLVM passes that track branching behavior to help find side-channel vulnerabilities, and an associated web frontend that helps with identifying commits that introduce them.
  • ProtoFuzz, a generic fuzzer for Google’s Protocol Buffers format. Instead of defining a new fuzzer generator for custom binary formats, ProtoFuzz automatically creates a fuzzer based on the same format definition that programs use.
  • osquery for Windows, a port of Facebook’s open-source endpoint security tool. This allows you to treat your infrastructure as a database, turning operating system information into a format that can be queried using SQL-like statements. This functionality is invaluable for performing incident response, diagnosing systems operations problems, ensuring baseline security settings, and more.

That’s just the start. We’re prepared to discuss every tool that we’ve ever mentioned on this blog.

Bring us your problems.

If you’re coming to this conference with complex needs that don’t fit neatly into any one product category, come talk to us. Mark, Sophia, Yan, and Dan will be on hand to answer your questions. We’re especially keen to chat with you if you’re:

  • Building low-level software, say in C or C++.
  • Using crypto in new and interesting ways.
  • Affected by resourced threat actors, reverse engineers, or fraudsters.
  • Building your own hardware or firmware.
  • Stuck on an intractable security problem that has eluded resolution. The more difficult the better.

Come find us at booth #104 in the sponsor pavilion.

Break a leg, O’Reilly.

O’Reilly has put a lot of resources into their security conference. It fills a gap. We hope that it turns out well, and that they plan more events just like it in New York. See you there!

Let’s talk about CFI: clang edition

Our previous blog posts often mentioned control flow integrity, or CFI, but we have never explained what CFI is, how to use it, or why you should care. It’s time to remedy the situation! In this blog post, we’ll explain, at a high level, what CFI is, what it does, what it doesn’t do, and how to use CFI in your projects. The examples in this blog post are clang-specific, and have been tested on clang 3.9, the latest release as of October 2016.

This post is going to be long, so if you already know what CFI is and simply want to use it in your clang-compiled project, here’s the summary:

  • Ensure you are using a link-time optimization capable linker (like GNU gold or macOS’s ld).
  • Add -flto to your build and linker flags
  • Add -fvisibility=hidden and -fsanitize=cfi to your build flags
  • Sleep happier knowing your binary is more protected against binary level exploitation.

For an example of using CFI in your project, please take a look at the Makefile that comes with our CFI samples.

What is CFI?

Control flow integrity (CFI) is an exploit mitigation, like stack cookies, DEP, and ASLR. Like other exploit mitigations, the goal of CFI is to prevent bugs from turning into exploits. Bugs in a program, like buffer overflows, type confusion, or integer overflows, may allow an attacker to change the code a program executes, or to execute parts of the program out of order. To convert these bugs to exploits, an attacker must force the target program to follow a code path the programmer never intended. CFI works by reducing the attacker’s ability to do that. The easiest way to understand CFI is that it aims to enforce at run-time what the programmer intended at compile time.

Another way to understand CFI is via graphs. A program’s control flow may be represented as a graph, called the control flow graph (CFG). The CFG is a directed graph where each node is a basic block of the program, and each directed edge is a possible control flow transfer. CFI ensures the CFG determined by the compiler at compile time is followed by the program at run time, even in the presence of vulnerabilities that would otherwise allow an attacker to alter control flow.

There are more technical details, such as forward-edge CFI and backward-edge CFI, but these are best absorbed from the numerous academic papers published on control flow integrity.

History of CFI

The original paper on CFI from Microsoft Research was released in 2005, and since then there have been numerous improvements to the performance and functionality of various CFI schemes. Continued improvements mean that now CFI is mainstream: recent versions of both the clang compiler and Microsoft Visual Studio include some form of CFI.

Clang’s CFI

In this blog post, we will look at the various options provided by clang’s CFI implementation, what each does and does not protect, and how to use it in your projects. We will not cover technical implementation details or performance numbers; a thorough technical explanation is already available from the implementation team in their paper.

Control flow integrity support has been in mainline clang since version 3.7, invoked as a part of the supported sanitizers suite. To operate, CFI requires the full control flow graph of a program. Since programs are typically built from multiple compilation units, the full control flow is not available until link time. To enable CFI, clang requires a linker capable of link-time optimization. Our code examples assume a Linux environment, so we will be using the GNU gold linker. Both GNU gold and recent versions of clang are available as packages for common Linux distributions. GNU gold is already included in modern binutils packages; Clang 3.9 packages for various Linux distributions are available from the LLVM package repository.

Some of the CFI options in clang actually have nothing to do with control flow. Instead these options detect invalid casts or other similar violations before they turn into worse bugs. These options are spiritually similar to CFI, however, because they ensure “abstraction integrity” — that is, what the programmer intended to happen is what happens at runtime.

Using CFI in Clang

The clang CFI documentation leaves a lot to be desired. We are going to describe what each option does, what limitations it has, and example scenarios where using it would prevent exploitation. These directions assume clang 3.9 and an LTO capable linker are installed and working. Once installed, both the linker and clang 3.9 should “just work”; specific installation instructions are beyond the scope of this blog post.

Several new compilation and linking flags are needed for your project: -flto to enable link-time optimization, -fsanitize=cfi to enable all CFI checks, and -fvisibility=hidden to set default LTO visibility. For debug builds, you will also want to add -fno-sanitize-trap=all to see descriptive error messages when a CFI violation is detected. For release builds, omit this flag.

To review, your debug command line should now look like:

clang-3.9 -fvisibility=hidden -flto -fno-sanitize-trap=all -fsanitize=cfi -o [output] [input]

And your release command line should look like:

clang-3.9 -fvisibility=hidden -flto -fsanitize=cfi -o [output] [input]

You most likely want to enable every CFI check, but if you want to only enable select checks (each is described in the next section), specify them via -fsanitize=[option] in your flags.

CFI Examples

We have created samples with specially crafted bugs to test each CFI option. All of the samples are designed to compile cleanly with the absolute maximum warning levels* (-Weverything). The bugs that these examples have are not statically identified by the compiler, but are detected at runtime via CFI. Where possible, we simulate potential malicious behavior that occurs without CFI protections.

Each example builds two binaries, one with CFI protection (e.g. cfi_icall) and one without CFI protections (e.g. no_cfi_icall). These binaries are built from the same source, and used to illustrate the difference CFI protection makes.

We have provided the following examples:

  • cfi_icall demonstrates control flow integrity of indirect calls. The example binary accepts a single command line argument (valid values are 0-3, but try invalid values with both binaries!). The command line argument shows different aspects of indirect call CFI protection, or lack thereof.
  • cfi_vcall shows an example of CFI applied to virtual function calls. This example demonstrates how CFI would protect against a type confusion or similar attack.
  • cfi_nvcall shows clang’s protections for calling non-virtual member functions via something that is not an object that has those functions defined.
  • cfi_unrelated_cast shows how clang can prevent casts between objects of unrelated types.
  • cfi_derived_cast expands on cfi_unrelated_cast and shows how clang can prevent casts from an object of a base class to an object of a derived class, if the object is not actually of the derived class.
  • cfi_cast_strict showcases the very specific instance where the default level of base-to-derived cast protection, like in cfi_derived_cast, would not catch an illegal cast.

* Ok, we lied, we had to disable two warnings, one about C++98 compatibility, and one about virtual functions being defined inline. The point is still valid since those warnings do not relate to potential bugs.

CFI Option: -fsanitize=cfi

This option enables all CFI checks. Use this option! The various CFI protections will only be inserted where needed; you aren’t saving anything by not using this option and picking specific protections. So if you want to enable CFI, use -fsanitize=cfi.

The currently implemented CFI checks, as of clang 3.9, are described in more detail in the following sections.

CFI Option: -fsanitize=cfi-icall

The cfi-icall option is the most straightforward form of CFI. At each indirect call site, such as a call through a function pointer, an extra check verifies two conditions:

  • The address being called is a valid destination, like the start of a function.
  • The destination’s static function signature matches the signature determined at compile time.

When would these conditions be violated? When exploiting memory corruption bugs! Attackers want to hijack the program’s control flow to perform their bidding. These days, anti-exploitation protections are good enough to force attackers to reuse pieces of the existing program. This code re-use technique is called return-oriented programming (ROP), and the pieces are referred to as gadgets. Gadgets are almost never whole functions, but snippets of machine code close to a control flow transfer instruction. The important aspect is that these gadgets are not at the start of a function; an attacker attempting to start ROP execution will fail CFI checks.

Attackers may be clever enough to point the corrupted function pointer at the start of a valid function. For instance, think of what would happen if a call to write was changed to a call to system. The second condition mitigates this attack by ensuring that the destination’s type signature matches one of the signatures expected at the call site. Both of these condition violations are illustrated in options 2 and 3 of the cfi_icall example.
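To make this concrete, here is a minimal sketch in C++ of an indirect call that cfi-icall rejects (our hypothetical illustration, not the actual cfi_icall.c source); build with clang-3.9 -fvisibility=hidden -flto -fsanitize=cfi-icall:

 #include <cstdio>

 int ok(int x) { return std::printf("ok(%d)\n", x); }
 int evil(float f) { return std::printf("evil(%f)\n", f); } // different signature

 int main() {
     int (*fptr)(int) = ok;
     // Simulate memory corruption redirecting the pointer to a function
     // with a mismatched signature.
     fptr = reinterpret_cast<int (*)(int)>(evil);
     // cfi-icall aborts here: the destination's type 'int (float)' does
     // not match the call's static type 'int (int)'.
     return fptr(42);
 }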

Example Output

 $ ./no_cfi_icall 2
 Calling a function:
 CFI should protect transfer to here
 In float_arg: (0.000000)
 $ ./cfi_icall 2
 Calling a function:
 cfi_icall.c:83:12: runtime error: control flow integrity check for type 'int (int)' failed during indirect function call (cfi_icall+0x424610): note: (unknown) defined here
 $ ./no_cfi_icall 3
 Calling a function:
 CFI ensures control flow only transfers to potentially valid destinations
 In not_entry_point: (2)
 $ ./cfi_icall 3
 Calling a function:
 cfi_icall.c:83:12: runtime error: control flow integrity check for type 'int (int)' failed during indirect function call
 (cfi_icall+0x424730): note: (unknown) defined here

Limitations

  • Indirect call protection doesn’t work across shared library boundaries; indirect calls into shared libraries are not protected.
  • All translation units have to be compiled with -fsanitize=cfi-icall.
  • Only works on x86 and x86_64 architectures.
  • Indirect call protection does not detect calls to the same function signature. Think of changing a call from delete_user(const char *username) to make_admin(const char *username). We show this limitation in cfi_icall option 1:
     $ ./cfi_icall 1
     Calling a function:
     CFI will not protect transfer to here
     In bad_int_arg: (1)

CFI Option: -fsanitize=cfi-vcall

To explain cfi-vcall, we need a quick review of virtual functions. Recall that virtual functions are functions that can be specialized in derived classes. Virtual functions are dynamically bound — that is, the actual function called is determined at runtime, depending on the object’s type. Due to dynamic binding, all virtual calls are indirect calls. But these indirect calls may legitimately call functions with different signatures, since the class name is a part of the function signature. The cfi-vcall protection addresses this gap by verifying that a virtual function call destination is always a function in the class hierarchy of the source object.

So when would a bug like this ever occur? The classic example is type confusion bugs in complex C++-based software like PDF readers, script interpreters, and web browsers. In a type confusion bug, an object is re-interpreted as an object of a different type. The attacker can then use this mismatch to redirect virtual function calls to attacker-controlled locations. A simulated example of such a scenario is in the cfi_vcall example.
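Here is a minimal sketch of such a type confusion (our hypothetical illustration, not the actual cfi_vcall.cpp source); build with clang-3.9 -fvisibility=hidden -flto -std=c++11 -fsanitize=cfi-vcall:

 #include <cstdio>

 struct Base {
     virtual ~Base() {}
     virtual void printMe() { std::puts("Base::printMe"); }
 };
 struct Derived : Base {
     void printMe() override { std::puts("Derived::printMe"); }
 };
 struct Evil {
     virtual ~Evil() {}
     virtual void makeAdmin() { std::puts("Evil::makeAdmin"); }
 };

 int main() {
     Evil evil;
     // Simulated type confusion: an Evil object is misinterpreted as a Derived.
     Derived *d = reinterpret_cast<Derived *>(&evil);
     // Without CFI this dispatches through Evil's vtable; with cfi-vcall it
     // aborts because 'Evil' is not in Derived's class hierarchy.
     d->printMe();
 }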

Example Output

 $ ./no_cfi_vcall
 Derived::printMe
 CFI Prevents this control flow
 Evil::makeAdmin
 $ ./cfi_vcall
 Derived::printMe
 cfi_vcall.cpp:45:5: runtime error: control flow integrity check for type 'Derived' failed during virtual call (vtable address 0x00000042eb20)
 0x00000042eb20: note: vtable is of type 'Evil'
 00 00 00 00 c0 6f 42 00 00 00 00 00 d0 6f 42 00 00 00 00 00 00 70 42 00 00 00 00 00 00 00 00 00

Limitations

  • Only applies to C++ code that uses virtual functions.
  • All translation units have to be compiled with -fsanitize=cfi-vcall.
  • There can be a noticeable increase in the output binary size.
  • Need to specify the -fvisibility flag when building (for most purposes use -fvisibility=hidden)

CFI Option: -fsanitize=cfi-nvcall

The cfi-nvcall option is spiritually similar to the cfi-vcall option, except it works on non-virtual calls. The key difference is that non-virtual calls are direct calls known statically at compile time, so strictly speaking this is not a control flow integrity protection. What the cfi-nvcall option does is identify non-virtual calls and ensure the calling object’s runtime type can be derived from the type of the object known at compile time.

In simple terms, imagine a class hierarchy of Balls and a class hierarchy of Bricks. With cfi-nvcall, a compile-time call to Ball::Throw may legitimately operate on a Baseball object, but will never operate on a Brick object, even if an attacker substitutes a Brick for a Ball.
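A minimal sketch of that Ball/Brick scenario (our hypothetical illustration, not the actual cfi_nvcall.cpp source); build with clang-3.9 -fvisibility=hidden -flto -fsanitize=cfi-nvcall:

 #include <cstdio>

 struct Ball {
     virtual ~Ball() {}                          // polymorphic, so cfi-nvcall applies
     void Throw() { std::puts("Ball::Throw"); }  // non-virtual member function
 };
 struct Brick {
     virtual ~Brick() {}
     void Throw() { std::puts("Brick::Throw"); }
 };

 int main() {
     Brick brick;
     // Simulated substitution of a Brick where a Ball is expected.
     Ball *b = reinterpret_cast<Ball *>(&brick);
     // cfi-nvcall aborts here: the object's runtime type is 'Brick',
     // which cannot be derived from 'Ball'.
     b->Throw();
 }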

Situations caught by cfi-nvcall may arise from memory corruption, type confusion, and deserialization bugs. While these instances do not allow an attacker to redirect control flow on their own, they may enable data-only attacks, or corrupt enough state to make future bugs exploitable. Such a data-only attack is shown in the cfi_nvcall example: a low privilege user object is used in place of a high privilege administrator object, leading to in-application privilege escalation.

Example Output

 $ ./no_cfi_nvcall
 Admin check:
 Account name is: admin
 Would do admin work in context of: admin
 User check:
 Account name is: user
 Admin Work not permitted for a user account!
 Account name is: user
 CFI Should prevent the actions below:
 Would do admin work in context of: user
 $ ./cfi_nvcall
 Admin check:
 Account name is: admin
 Would do admin work in context of: admin
 User check:
 Account name is: user
 Admin Work not permitted for a user account!
 Account name is: user
 CFI Should prevent the actions below:
 cfi_nvcall.cpp:54:5: runtime error: control flow integrity check for type 'AdminAccount' failed during non-virtual call (vtable address 0x00000042f300)
 0x00000042f300: note: vtable is of type 'UserAccount'
 00 00 00 00 80 77 42 00 00 00 00 00 a0 77 42 00 00 00 00 00 90 d4 f0 00 00 00 00 00 41 f3 42 00

Limitations

  • The cfi-nvcall checks only apply to polymorphic objects.
  • All translation units have to be compiled with -fsanitize=cfi-nvcall.
  • Need to specify the -fvisibility flag when building (for most purposes use -fvisibility=hidden)

CFI Option: -fsanitize=cfi-unrelated-cast

This is the first of three cast-related options that are grouped with the control flow integrity protections, but have nothing to do with control flow. These cast options verify “abstraction integrity”. Using these cast checks guards against insidious C++ bugs that may eventually lead to control flow hijacking.

The cfi-unrelated-cast option performs two runtime checks. First, it verifies that casts between object types stay within the same class hierarchy. Think of this as permitting casts from a variable of type Ball* to Baseball*, but not from a variable of type Ball* to Brick*. The second runtime check verifies that casts from void* to an object type refer to objects of that type. Think of this as ensuring that a variable of type void* that points to a Ball object can only be converted back to a Ball, and not to a Brick.

This property is most effectively verified at runtime, because the compiler is forced to treat all casts from void* to another type as legal. The cfi-unrelated-cast option ensures that such casts make sense in the runtime context of the program.

When would this violation ever happen? A common use of void* pointers is to pass references to objects between different parts of a program. The classic example is the arg argument to pthread_create. The target function has no way to determine whether the void* argument is of the correct type. Similar situations occur in complex applications, especially those that use IPC, queues, or other cross-component messaging. The cfi_unrelated_cast example shows a sample scenario that is protected by the cfi-unrelated-cast option.
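A minimal sketch of such a void* mix-up (our hypothetical illustration, not the actual cfi_unrelated_cast.cpp source); build with clang-3.9 -fvisibility=hidden -flto -fsanitize=cfi-unrelated-cast:

 #include <cstdio>

 struct Foo {
     virtual ~Foo() {}
     virtual void doFoo() { std::puts("Foo::doFoo"); }
 };
 struct Bar {
     virtual ~Bar() {}
     virtual void doBar() { std::puts("Bar::doBar"); }
 };

 // A callback that expects its opaque argument to be a Foo, much like the
 // void *arg parameter of pthread_create.
 void callback(void *arg) {
     // cfi-unrelated-cast aborts here when arg does not point to a Foo.
     Foo *foo = static_cast<Foo *>(arg);
     foo->doFoo();
 }

 int main() {
     Bar bar;
     callback(&bar); // the wrong type is smuggled through void*
 }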

Example Output

 $ ./no_cfi_unrelated_cast
 I am in fooStuff
 And I would execute: system("/bin/sh")
 $ ./cfi_unrelated_cast
 cfi_unrelated_cast.cpp:55:19: runtime error: control flow integrity check for type 'Foo' failed during cast to unrelated type (vtable address 0x00000042ec40)
 0x00000042ec40: note: vtable is of type 'Bar'
 00 00 00 00 70 71 42 00 00 00 00 00 a0 71 42 00 00 00 00 00 00 00 00 00 00 00 00 00 88 ec 42 00

Limitations

  • All translation units must be compiled with -fsanitize=cfi-unrelated-cast.
  • Need to specify the -fvisibility flag when building (for most purposes use -fvisibility=hidden)
  • Some functions (e.g. allocators) legitimately allocate memory of one type and then cast it to a different, unrelated type. These functions can be blacklisted from protection.

CFI Option: -fsanitize=cfi-derived-cast

This is the second of three cast-related “abstraction integrity” options. The cfi-derived-cast option ensures that an object of a base class cannot be cast to an object of a derived class unless the object is actually of the derived class. As an example, cfi-derived-cast will prevent a variable of type Ball* from being cast to Baseball* unless it actually points to a Baseball. This is a stronger guarantee than cfi-unrelated-cast, which only verifies that the destination type is in the same class hierarchy as the source.

The potential causes of this issue are the same as for most other issues on this list, namely memory corruption, de-serialization issues, and type confusion. In the cfi_derived_cast example, we show how a hypothetical base-to-derived casting bug can be used to disclose memory contents.
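A minimal sketch of an illegal base-to-derived cast (our hypothetical illustration, not the actual cfi_derived_cast.cpp source); build with clang-3.9 -fvisibility=hidden -flto -std=c++11 -fsanitize=cfi-derived-cast:

 #include <cstdio>

 struct Base {
     virtual ~Base() {}
 };
 struct Derived : Base {
     virtual void extra() {}
     long member = 0x12345678; // not present in Base
 };

 int main() {
     Base base;
     // cfi-derived-cast aborts here: 'base' is not actually a Derived.
     Derived *d = static_cast<Derived *>(&base);
     // Without CFI, this reads memory past the end of 'base'.
     std::printf("%lx\n", d->member);
 }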

Example Output

 $ ./no_cfi_derived_cast
 I am: derived class, my member variable is: 12345678
 I am: base class, my member variable is: 7fffb6ca1ec8
 $ ./cfi_derived_cast
 I am: derived class, my member variable is: 12345678
 cfi_derived_cast.cpp:32:21: runtime error: control flow integrity check for type 'Derived' failed during base-to-derived cast (vtable address 0x00000042ef80)
 0x00000042ef80: note: vtable is of type 'Base'
 00 00 00 00 00 73 42 00 00 00 00 00 30 73 42 00 00 00 00 00 00 00 00 00 00 00 00 00 b0 ef 42 00

Limitations

  • All translation units must be compiled with -fsanitize=cfi-derived-cast.
  • Need to specify the -fvisibility flag when building (for most purposes use -fvisibility=hidden)

CFI Option: -fsanitize=cfi-cast-strict

This third and most confusing of the cast-related “abstraction integrity” options is a stricter version of cfi-derived-cast. The cfi-derived-cast check is not applied when a derived class meets a very specific set of requirements:

  • It has only a single base class.
  • It does not introduce any virtual functions.
  • It does not override any virtual functions, other than an implicit virtual destructor.

If all of the above conditions are met, the base class and the derived class have an identical in-memory layout, and casting from the base class to the derived class should not introduce any security vulnerabilities. Performing such a cast is still undefined behavior and should never be done, but apparently enough projects rely on this undefined behavior to warrant a separate CFI option. The cfi_cast_strict example shows this behavior in action.
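A minimal sketch of the specific case that only cfi-cast-strict catches (our hypothetical illustration, not the actual cfi_cast_strict.cpp source); build with clang-3.9 -fvisibility=hidden -flto -fsanitize=cfi-cast-strict:

 #include <cstdio>

 struct Base {
     virtual ~Base() {}
     void func() { std::puts("Base: func"); }
 };
 // Derived has a single base, introduces no virtual functions, and overrides
 // none except the implicit destructor, so its layout is identical to Base;
 // plain cfi-derived-cast deliberately skips the check below.
 struct Derived : Base {};

 int main() {
     Base base;
     // Only cfi-cast-strict aborts here; the cast is still undefined behavior.
     Derived *d = static_cast<Derived *>(&base);
     d->func();
 }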

Example Output

 $ ./no_cfi_cast_strict
 Base: func
 $ ./cfi_cast_strict
 cfi_cast_strict.cpp:22:18: runtime error: control flow integrity check for type 'Derived' failed during base-to-derived cast (vtable address 0x00000042e790)
 0x00000042e790: note: vtable is of type 'Base'
 00 00 00 00 10 6d 42 00 00 00 00 00 20 6d 42 00 00 00 00 00 50 6d 42 00 00 00 00 00 90 c3 f0 00

Limitations

  • All translation units must be compiled with -fsanitize=cfi-cast-strict.
  • Need to specify the -fvisibility flag when building (for most purposes use -fvisibility=hidden)
  • May break projects that rely on this undefined behavior.

Conclusion

Control flow integrity is an important exploit mitigation and should be used whenever possible. Modern compilers such as clang already support control flow integrity, and you can use it today. In this blog post, we described how to use CFI with clang, walked through example scenarios where CFI prevents exploitation or detects subtle bugs, and discussed some limitations of CFI protections.

Now that you’ve read about what clang’s CFI does, try out the examples and see how CFI can benefit your software development process.

But clang isn’t the only compiler to implement CFI! Microsoft Research originated CFI, and CFI protections are available in Visual Studio 2015. In our next installment, we will discuss Visual Studio’s control flow integrity implementation: Control Flow Guard.

Automated Code Audit’s First Customer

Last month our Cyber Reasoning System (CRS), developed for DARPA’s Cyber Grand Challenge, audited a much larger amount of code in less time, in greater detail, and at a lower cost than a human could.

Our CRS audited zlib for the Mozilla Secure Open Source (SOS) Fund. To our knowledge, this is the first instance of a paid, automated security audit by a CRS.

This represents a shift in the way that software security audits can be performed. It’s a tremendous step toward securing the Internet’s core infrastructure.

Choice where there once was none

Every year, public, private, and not-for-profit organizations spend tens of thousands of dollars on code audits.

Over a typical two-week engagement, security professionals charge a tidy fee to perform an audit. Their effectiveness is limited by the sheer volume of the code, the documentation and organization of the code, and the inherent limitations of humans — getting tired, dreaming of vacations, etc.

You can only analyze complex C code effectively for so many hours a day.

Furthermore, a human assessor might have great experience in some subset of possible flaws or the C language, but complete or nearly complete knowledge is hard to come by. We’re talking about expertise acquired over 15 years or more. That level of knowledge isn’t affordable for non-profits, nor is it common in 1-2 week assessments.

It makes more sense for a piece of software to conduct the audit instead. Software doesn’t get tired. It can audit old, obfuscated code as easily as modern, well-commented code. And software can automatically re-audit code after every update to make sure fixes are correct and don’t introduce new errors.

Mozilla’s SOS

In August, as a part of their Secure Open Source (SOS) Fund, Mozilla engaged us to perform a security assessment of zlib, an open source compression library. Zlib is used in virtually every software package that requires compression or decompression. More than one piece of software you are using to read this very text bundles zlib.

It has a relatively small code base, but that small size hides a lot of complexity. First, the code that runs on the machine may not exactly match the source, due to compiler optimizations. Some bugs may only occur occasionally due to use of undefined behavior. Others may only be triggered under extremely exceptional conditions. In a well-inspected code base such as zlib, the only bugs left might be too subtle for a human to find during a typical engagement.

To identify such especially subtle bugs with a human-powered audit, Mozilla would have had to spend many thousands of dollars more. But they’re a non-profit, and they have an array of other projects to audit and improve.

Great coverage at a great price

Automation made the engagement affordable for Mozilla, and viable for us. They paid 20% of what we normally have to charge for this kind of work.

Our automated assessment paired the Trail of Bits CRS with TrustInSoft’s verification software to identify memory corruption vulnerabilities, create inputs that stress varying program paths, and identify code that may lead to bugs in the future.

For non-profits working to secure core infrastructure of the Internet, this is a wonderful opportunity to get a detailed assessment with great coverage for a fraction of the traditional cost.

Contact us for more information.

Windows network security now easier with osquery

Today, Facebook announced the successful completion of our work: osquery for Windows.

“Today, we’re excited to announce the availability of an osquery developer kit for Windows so security teams can build customized solutions for their Windows networks… This port of osquery to Windows gives you the ability to unify endpoint defense and participate in an active open source community ready to share experiences and stories.”
Introducing osquery for Windows

osquery for Windows works with Doorman

The Windows version of osquery can talk to existing osquery fleet management tools, such as doorman. osquery for Windows has full support for TLS remote endpoints and certificate validation, just like the Unix version. In this screenshot, we are using an existing doorman instance to find all running processes on a Windows machine.

How we ported osquery to Windows

This port presented several technical challenges, which we always enjoy. Some of the problems were general POSIX to Windows porting issues, while others were unique to osquery.

Let’s start with the obvious POSIX to Windows differences:

  • Paths are different — no more ‘/’ as the path separator.
  • There are no signals.
  • Unix domain sockets are now named pipes.
  • There’s no glob() — we had to approximate the functionality.
  • Windows doesn’t fork() — the process model is fundamentally different. osquery forks worker processes. We worked around this by abstracting the worker process functionality.
  • There are no more simple integer uid or gid values — instead you have SIDs, ACLs, and DACLs.
  • And you can forget about the octal file permissions model — or use the approximation we created.

Then, the less-obvious problems: osquery is a daemon. On Windows, daemons are services, which expose a special interface and are launched by the service control manager. We added service functionality to osquery, and provided a script to register and remove the service. The parent-child process relationship is also different — there is no getppid() equivalent, but osquery worker processes need to know if their parent has stopped working, or if a shutdown event was triggered in the parent process.
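For the parent-liveness problem, waiting on a process handle can stand in for getppid(). A minimal sketch (our hypothetical illustration, not osquery’s actual implementation), assuming the parent passes its PID to the worker at startup:

 #include <windows.h>
 #include <cstdio>

 // Block until the process identified by parentPid exits. A worker can run
 // this on a dedicated thread and shut itself down when the wait returns.
 void waitForParentExit(DWORD parentPid) {
     // SYNCHRONIZE access is all that is needed to wait on a process handle.
     HANDLE parent = OpenProcess(SYNCHRONIZE, FALSE, parentPid);
     if (parent == NULL) {
         std::fprintf(stderr, "parent already gone or inaccessible\n");
         return;
     }
     WaitForSingleObject(parent, INFINITE); // returns when the parent exits
     CloseHandle(parent);
     std::fprintf(stderr, "parent exited; worker shutting down\n");
 }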

Deeper still, we found some unexpected challenges:

  • Some code that builds on clang/gcc just won’t build on Visual Studio.
  • Certain function attributes like __constructor__() have no supported Visual Studio equivalent. The functionality had to be re-created.
  • Certain standard library functions have implementation-defined behavior — for instance, fopen will open a directory for reading on Unix-based systems, but will fail on Windows.

Along the way, we also had to ensure that every library that osquery depends on worked on Windows, too. This required fixing some bugs and making substitutions, like using linenoise-ng instead of GNU readline. There were still additional complexities: the build system had to accommodate a new OS, use Windows libraries, paths, compiler options, appropriate C runtime, etc.

This was just the effort to get the osquery core running. The osquery tables (the code that retrieves information from the local machine) present their own unique challenges. For instance, the processes table needed to be re-implemented on Windows. This table retrieves information about the processes currently running on the system, and it is required for the osquery daemon to function. To implement it, we created a generic abstraction over Windows Management Instrumentation (WMI), and used existing WMI functionality to retrieve the list of running processes. We hope that this approach will support the creation of many more tables that tap into the vast wealth of system instrumentation data WMI offers.
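For a flavor of the WMI-backed approach, here is a minimal sketch of listing processes through WMI (our hypothetical illustration, not osquery’s actual abstraction); error handling is omitted for brevity, and it links against wbemuuid.lib, ole32.lib, and oleaut32.lib:

 #include <windows.h>
 #include <wbemidl.h>
 #include <cstdio>

 int main() {
     // Every HRESULT below should be checked in real code.
     CoInitializeEx(nullptr, COINIT_MULTITHREADED);
     CoInitializeSecurity(nullptr, -1, nullptr, nullptr,
                          RPC_C_AUTHN_LEVEL_DEFAULT, RPC_C_IMP_LEVEL_IMPERSONATE,
                          nullptr, EOAC_NONE, nullptr);

     IWbemLocator *locator = nullptr;
     CoCreateInstance(CLSID_WbemLocator, nullptr, CLSCTX_INPROC_SERVER,
                      IID_IWbemLocator, reinterpret_cast<void **>(&locator));

     // Connect to the local CIMV2 namespace, where Win32_Process lives.
     IWbemServices *services = nullptr;
     locator->ConnectServer(SysAllocString(L"ROOT\\CIMV2"), nullptr, nullptr,
                            nullptr, 0, nullptr, nullptr, &services);

     // WQL is WMI's SQL-like query language.
     IEnumWbemClassObject *rows = nullptr;
     services->ExecQuery(SysAllocString(L"WQL"),
                         SysAllocString(L"SELECT ProcessId, Name FROM Win32_Process"),
                         WBEM_FLAG_FORWARD_ONLY | WBEM_FLAG_RETURN_IMMEDIATELY,
                         nullptr, &rows);

     IWbemClassObject *row = nullptr;
     ULONG returned = 0;
     while (rows->Next(WBEM_INFINITE, 1, &row, &returned) == S_OK && returned) {
         VARIANT pid, name;
         row->Get(L"ProcessId", 0, &pid, nullptr, nullptr);
         row->Get(L"Name", 0, &name, nullptr, nullptr);
         std::wprintf(L"%u %s\n", pid.uintVal, name.bstrVal);
         VariantClear(&pid);
         VariantClear(&name);
         row->Release();
     }
     rows->Release(); services->Release(); locator->Release();
     CoUninitialize();
 }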

osqueryi works on Windows too!

osqueryi, the interactive osquery shell, also works on Windows. In this screenshot we are using osquery to query the list of running processes and the cryptographic hash of a file.

The port was worth the effort

Facebook sparked a lot of excitement when it released osquery in 2014.

The open source endpoint security tool allows an organization to treat its infrastructure as a database, turning operating system information into a format that can be queried using SQL-like statements. This functionality is invaluable for performing incident response, diagnosing systems operations problems, ensuring baseline security settings, and more.

It fundamentally changed security for environments running Linux distributions such as Ubuntu or CentOS, or for deployments of Mac OS X machines.

But if you were running a Windows environment, you were out of luck.

To gather similar information, you’d have to cobble together a manual solution, or pay for a commercial product, which would be expensive, force vendor reliance, and lock your organization into a proprietary (and potentially buggy) agent. Since most of these services are cloud-based, you’d also risk exposing potentially sensitive data.

Today, that’s no longer the case.

Disruption for the endpoint security market?

Because osquery runs on all three major desktop/server platforms, the open-source community can supplant proprietary, closed, commercial security and monitoring systems with free, community-supported alternatives. (Just one more example of how Facebook’s security team accounts for broader business challenges.)

We’re excited about the potential:

  • Since osquery is cross platform, network administrators will be able to monitor complex operating system states across their entire infrastructure. Those already running an osquery deployment will be able to seamlessly integrate their Windows machines, allowing for far greater efficiency in their work.
  • We envision startups launching without the need to develop agents that collect this rich set of data first, as Kolide.co has already done. We’re excited to see what’s built from here.
  • More vulnerable organizations (groups that can’t afford the ‘Apple premium,’ or don’t use Linux) will be able to secure their systems to a degree that wasn’t possible before.

Get started with osquery

osquery for Windows is only distributed via source code. You must build your own osquery. To do that, please see the official Building osquery for Windows guide.

Currently, osquery only builds on Windows 10; that is the sole prerequisite. All other dependencies and build tools are automatically installed as a part of the provisioning and building process.

There is an open issue to create an osquery chocolatey package, to allow for simple package management-style installation of osquery.

If you want our help modifying osquery’s code base for your organization, contact us.

Learn more about porting applications to Windows

We will be writing about the techniques we applied to port osquery to Windows soon. Follow us on Twitter and subscribe to our blog with your favorite RSS reader for more updates.

Plug into New York’s Infosec Community

Between the city’s size and the wide spectrum of the security industry, it’s easy to feel lost. Where are ‘your people?’ How can you find talks that interest you? You want to spend your time meeting and networking, not researching your options.

So, we put together a directory of all of the infosec gatherings, companies, and university programs in NYC that we know of at nyc-infosec.com.

Why’d we create this site?

We’re better than this. Today, when investors think ‘east coast infosec,’ they think ‘Boston.’ We believe that NYC’s infosec community deserves more recognition on the national and international stages. That will come as we engage with one another to generate more interesting work.

We need breaks from routine. It’s easy to stay uptown or downtown, or only go to forensics or software security meetings. If you don’t know what’s out there, you don’t know what you’re missing out on.

We all benefit from new ideas. That’s why we started Empire Hacking. We want to help more people learn about topics that excite and inspire action.

We want to coax academics off campus. A lot of exciting research takes place in this city. We want researchers to find the groups that will be the most interested in their work. Conversely, industry professionals have much to learn from emerging academic innovations and we hope to bring them together.

Check out a new group this month

Find infosec events, companies, and universities in the city on nyc-infosec.com. If you’re not sure where to start, we recommend:

Empire Hacking (new website!)
Information security professionals gather at this semi-monthly meetup to discuss pragmatic security research and new discoveries in attack and defense over drinks and light food.

New York Enterprise Information Security Group
Don’t be fooled by the word ‘enterprise.’ This is a great place for innovative start-ups to get their ideas in front of prospective early clients. David Raviv has created a great space to connect directly with technical people working at smart, young companies.

SummerCon
Ah, SummerCon. High-quality, entertaining talks. Inexpensive tickets. Bountiful booze. Somehow, they manage to pull together an excellent line-up of speakers each year. This attracts a great crowd, ranging from “hackers to feds to convicted felons and concerned parents.”

O’Reilly Security
Until now, New York didn’t really have one technical, pragmatic, technology-focused security conference. This newcomer has the potential to fill that gap. It looks like O’Reilly has put a lot of resources behind it. If it turns out well for them (fingers crossed), we hope that they’ll plan more events just like it.

What’d we miss?

If you know of an event that should be on the list, please let us know on the Empire Hacking Slack.