What do you wish osquery could do?

Welcome to the third post in our series about osquery. So far, we’ve described how five enterprise security teams use osquery and reviewed the issues they’ve encountered. For our third post, we focus on the future of osquery. We asked users, “What do you wish osquery could do?” The answers we received ranged from small requests to huge advancements that could disrupt the incident-response tool market. Let’s dive into those ‘super features’ first.

osquery super features

Some users’ suggestions would fundamentally expand osquery’s role beyond incident detection into prevention and response, potentially allowing it to take significant market share from the commercial tools we listed in our first blog post. This would be a big deal. A free and open source tool that gives security teams incident-response capabilities normally reserved for customers of expensive paid services would be a windfall for the community. It could democratize fleet security and strengthen the entire community’s defense against attackers. Here are the features that could take osquery to the next level:

Writable access to endpoints

What it is: Currently, osquery is limited to read-only access on endpoints. Such access allows the program to detect and report changes in the operating systems it monitors. Write-access via an osquery extension would allow it to edit registries in the operating system and change the way endpoints perform. It could use this access to enforce security policies throughout the fleet.

Why it would be amazing: Write-access would elevate osquery from a detection tool into the domain of prevention. Rather than simply observing system issues with osquery, you could harden the system right from the SQL interface. Application whitelisting and enforcement, managing licenses, partitioning firewall settings, and more could all be available.

How we could build it: If not built correctly, write-access in osquery could cause more harm than good. Write-access goes beyond the scope of osquery core. Some current users are only permitted to deploy osquery throughout their fleet because of its limited read-only permissions. Granting write-access through osquery core would bring heightened security risks as well as potential for system disruption. The right way to implement this would be to make it available to extensions that request the functionality during initialization and minimize the impact this feature has on the core.

IRL Proof: In fact, we have a pull request waiting on approval that would support write-access through extensions! The code enables write-permissions for extensions but also blocks write-permissions for tables built into core.

We built this feature in support of a client who wanted to block malicious IP addresses, domains and ports for both preventative and reactive use-cases. Once this code is committed, our clients will be able to download our osquery firewall extension to use osquery to partition firewall settings throughout their fleets.

Event-triggered responses

What it is: If osquery reads a log entry that indicates an attack, it could automatically respond with an action such as quarantining the affected endpoint(s). This super feature would add automated prevention and incident response to osquery’s capabilities.

Why it would be amazing: This would elevate osquery’s capabilities to those of commercial vulnerability detection/response tools, but it would be transparent and customizable. Defense teams could evaluate, customize, and match osquery’s incident-response capabilities to their companies’ needs, as a stand-alone solution or as a complement to another more generic response suite.

How we could build it: Automated event response for osquery could be built flexibly to allow security teams to define their own indicators of incidents and their preferred reactions. Users could select from known updated databases: URL reputation via VirusTotal, file reputation via ReversingLabs, IP reputation of the remote addresses of active connections via OpenDNS, etc. The user could pick the type of matching criteria (e.g., exact, partial, particular patterns, etc.), and prescribe a response such as ramping up logging frequency, adding an associated malicious ID to a firewall block list, or calling an external program to take an action. As an additional option, event triggering that sends logs to an external analysis tool could provide more sophisticated response without damaging endpoint performance.
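
To make this concrete, here is a minimal sketch of the kind of glue logic such a feature could formalize, written as an external watcher over osqueryd’s results log. The log path, the blocklist file, the remote_address column, and the quarantine action are all hypothetical placeholders, not osquery features.

    import json

    # Hypothetical inputs: osqueryd's JSON results log and a file of known-bad IPs.
    RESULTS_LOG = "/var/log/osquery/osqueryd.results.log"
    BLOCKLIST = {line.strip() for line in open("bad_ips.txt") if line.strip()}

    def quarantine(host, address):
        # Placeholder response: in practice this could add a firewall rule,
        # open a ticket, or call out to an orchestration API.
        print("would quarantine {} (connection to {})".format(host, address))

    with open(RESULTS_LOG) as results:
        for line in results:
            event = json.loads(line)
            columns = event.get("columns", {})
            if columns.get("remote_address") in BLOCKLIST:
                quarantine(event.get("hostIdentifier", "unknown"),
                           columns["remote_address"])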

IRL Proof: Not only did multiple interviewees long for this feature; some teams have started to build rudimentary versions of it. As discussed in “How are teams currently using osquery?”, we spoke with one team that built incident alerting with osquery by piping log data into Elasticsearch and auto-generating Jira tickets through ElastAlert upon anomaly detection. This example doesn’t demonstrate full response capability, but it shows that just-in-time, business-process reaction to incidents is possible with osquery. If osquery can monitor event-driven logs (FIM, process auditing, etc.), trigger an action when it detects a certain pattern, and administer a protective response, it can provide an effective endpoint protection platform.

Technical debt overhaul

What it is: Many open source projects carry ‘technical debt.’ That is, some of the code is engineered to be effective for short-term goals but isn’t suitable for the program’s long-term architecture. A distributed developer community, each member enhancing the technology for slightly different requirements, exacerbates this problem. Solving it requires costly coordination and effort from multiple community members to rebuild and standardize the system.

Why it would be amazing: Decreasing osquery’s technical debt would bring the program to a standard that a significantly wider range of security teams could adopt. Users in our osquery pain points research cited performance impact and reliability among organizational leadership’s top concerns about adopting osquery. Ultimately, the teams we interviewed won the argument, but there are likely many teams who didn’t get the green light to use osquery.

How we could build it: Tackling technical debt is hard enough within an organization. It’s liable to be even harder in a distributed community. Unless developers have a specific motivation for tackling very difficult high-value inefficiencies, the natural reward for closing an issue biases developers toward smaller efforts. To combat this, leaders in the community could dump and sort all technical debt issues along a matrix of value and time, leave all high-value/low-time issues for individual open source developers, and pool community resources to resolve harder problems as full-fledged development projects.

IRL Proof: We know that pooling community resources to tackle technical debt works. We’ve been doing it for over a year. Trail of Bits has been commissioned by multiple companies to build features and fixes too big for the open source community. We’ve leveraged this model to port osquery to Windows, enhance FIM and process auditing, and much more that we’re excited to share with the public over the coming months. Often, multiple clients are interested in building the same things. We’re able to pool resources to make the project less expensive for everyone involved while the entire community benefits.

Other features users want

osquery shows considerable potential to grow beyond endpoint monitoring. However, the enterprise security teams and developers whom we interviewed say that the open source tool has room for improvement. Here are some of the other requests we heard from users:

  • Guardrails & rules for queries: Right now, a malformed query or poor querying practice can hamper a user’s workflow. Interviewees wanted guidance on targeting the correct data, querying at correct intervals, gathering from recommended tables, and customized recommendations for different environments.
  • Enhanced Deployment Options: Users sought better tools for deploying throughout fleets and keeping these implementations updated. Beyond recommended QueryPacks, administrators wanted to be able to define and select platform-specific configurations of osquery across multi-platform endpoints. Automatically detecting and deploying configurations for unique systems and software was another desired feature.
  • Integrated Testing, Debugging, and Diagnostics: In addition to the current debugging tools, users wanted more resources for testing and diagnosing issues. New tools should help improve reliability and predictability, avoid performance issues, and make osquery easier to use.
  • Enhanced Event-Driven Data Collection: osquery has support for event-based data collection through FIM, Process Auditing, and other tables. However, these data sources suffer from logging implementation issues and are not supported on all platforms. Better event-handling configurations, published best practices, and guardrails for gathering data would be a great help.
  • Enhanced Performance Features: Users want osquery to do more with fewer resources. This would either lead to overall performance enhancements, or allow osquery to operate on endpoints with low resource profiles or mission-critical performance requirements.
  • Better Configuration Management: Enhancements such as custom tables and osqueryd scheduled queries for differing endpoint environments would make osquery easier to deploy and maintain on a growing fleet.
  • Support for Offline Endpoint Logging: Users reported a desire for forensic data availability to support remote endpoints. This would require offline endpoints to store data locally (including failed queries) and push it to the server upon reconnection.
  • Support for Common Platforms: Facebook built osquery for its fleet of macOS- and Linux-based endpoints. PC sysadmins were out of luck until our Windows port last year. Support for other operating systems has been growing steadily thanks to the development community’s efforts. Nevertheless, there are still limitations. Think of this as one umbrella feature request: support for all features on all operating systems.

The list keeps growing

Unfortunately for current and prospective osquery users, Facebook can’t satisfy all of these requests. They’ve shared a tremendous gift by open sourcing osquery. Now it’s up to the community to move the platform forward.

Good news: none of these feature requests is infeasible. The custom engineering is just uneconomical for individual organizations to invest in.

In the final post in this series, we’ll propose a strategy for osquery users to share the cost of development. Companies that would benefit could pool resources and collectively target specific features.

This would accelerate the rate at which companies could deprecate other full-suite tools that are more expensive, less flexible and less transparent.

If any of these items resonate with your team’s needs, or if you use osquery currently and have another request to add to the list, please let us know.

How to prepare for a security audit

You’ve just approved an audit of your codebase. Do you:

  • Send a copy of the repository and wait for the auditors’ reports, or
  • Take the extra effort to set the auditors up for success?

By the end of the audit, the difference between these answers will lead to profoundly disparate results. In the former case, you’ll waste money, lose time, and miss security issues. In the latter case, you’ll reduce your risk, protect your time, and get more valuable security guidance.

It’s an easy choice, right?

Glad you agree.

Now, here’s how to make that audit more effective, valuable, and satisfying for everybody involved.

Set a goal for the audit

This is the most important step of an audit, and paradoxically the one most often overlooked. You should have an idea of what kind of question you want answered, such as:

  • What’s the overall level of security for this product?
  • Are all client data transactions handled securely?
  • Can a user leak information about another user?

Knowing your biggest area of concern will help the auditing team tailor their approach to meet your needs.

Resolve the easy issues

Handing the code off to the auditing team is a lot like releasing the product: the cleaner the code, the better everything will go. To that end:

  • Enable and address compiler warnings. Go after the easy stuff first: turn on every single compiler warning you can find, understand each warning, then fix your code until they’re all gone. Upgrade your compiler to the latest version, then fix all the new warnings and errors. Even innocuous seeming warnings can indicate problems lying in wait.
  • Increase unit and feature test coverage. Ideally this has been part of the development process, but everyone slips up, tests don’t get updated, or new features don’t quite match the old integration tests. Now is the time to update the tests and run them all.
  • Remove dead code, stale branches, unused libraries, and other extraneous weight. You may know which branch is active and which is dead, but the auditors won’t, and they’ll waste time investigating it for potential issues. The same goes for that new feature that hasn’t seen progress in months, or that third-party library that doesn’t get used anymore.

Some issues will persist — a patch that isn’t quite ready, or a refactor that’s not integrated yet. Document any incomplete changes as thoroughly as possible, so that your auditors don’t waste a week digging into code that will be gone in two months’ time.

Document, Document, Document

Think of an audit team as a newly hired, fully remote developer: skilled at what they do, but unfamiliar with your product and code base. The more documentation, the faster they’ll get up to speed and the sooner they’ll be able to start their analysis.

  • Describe what your product does, who uses it, and how. The most important documentation is high level: what does your product do? What do users want from it? How does it achieve that goal? Use clear language to describe how systems interact and the rationale for design decisions made during development.
  • Add comments in-line with the code. Functions should have comments containing high-level descriptions of their intended behavior. Complicated sections of code should have comments describing what is happening and why this particular approach was chosen.
  • Label and describe your tests. More complicated tests should describe the exact behavior they’re testing. The expected results of tests, both positive and negative, should be documented.

  • Include past reviews and bugs. Previous audit reports can provide guidance to the new audit team. Similarly, documentation regarding past security-relevant bugs can give an audit team clues about where to look most carefully.

Deliver the code batteries included

Just like a new fully remote developer, the audit team will need a copy of the code and clear guidance on how to build and deploy your application.

  • Prepare the build environment. Document the steps to create a build environment from scratch on a computer that is fully disconnected from your internal network. Where relevant, be specific about software versions. Walk through this process on your own to ensure that it is complete. If you have external dependencies that are not publicly available, include them with your code. Fully provisioned virtual machine images are a great way to deliver a working build environment.
  • Document the build process. Include both the debug and release build processes, and also include steps on how to build and run the tests. If the test environment is distinct from the build environment, include steps on how to create the test environment. A well-documented build process enables an auditor to run static analysis tools far more efficiently and effectively.
  • Document the deploy process. This includes how to build the deployment environment. It is very important to list all the specific versions of external tools and libraries for this process, as the deployment environment is a considerable factor in evaluating the security of your product. A well-documented deployment process enables an auditor to run dynamic analysis tools in a real world environment.

The payoff

At this point you’ve handed off your code, documentation, and build environment to the auditors. All that prep work will pay off. Rather than puzzling over how to build your code or what it does, the audit team can immediately start work integrating advanced analysis tools, writing custom fuzzers, or bringing custom internal tools to bear. Knowing your specific goals will help them focus where you want them to.

An audit can produce a lot of insight into the security of your product. Having a clear goal for the audit, a clean codebase, and complete documentation will not only help the audit, it’ll make you more confident about the quality of the results.

Interested in getting an audit? Contact us to find out what we can do for you.

Checklist

Resolve the easy issues

  • Enable and address every last compiler warning.
  • Increase unit and feature test coverage.
  • Remove dead code, stale branches, unused libraries, and other extraneous weight.

Document

  • Describe what your product does, who uses it, why, and how it delivers.
  • Add comments about intended behavior in-line with the code.
  • Label and describe your tests and results, both positive and negative.
  • Include past reviews and bugs.

Deliver the code batteries included

  • Document the steps to create a build environment from scratch on a computer that is fully disconnected from your internal network. Include external dependencies.
  • Document the build process, including debugging and the test environment.
  • Document the deploy process and environment, including all the specific versions of external tools and libraries for this process.

Vulnerability Modeling with Binary Ninja

This is Part 3 in a series of posts about the Binary Ninja Intermediate Language (BNIL) family. You can read Part 1 here and Part 2 here.

In my previous post, I demonstrated how to leverage the Low Level IL (LLIL) to write an architecture-agnostic plugin that could devirtualize C++ virtual functions. A lot of new and exciting features have been added to Binary Ninja since then; in particular, Medium Level IL (MLIL) and Single Static Assignment (SSA) form[1]. In this post, I’m going to discuss both of these and demonstrate one fun use of them: automated vulnerability discovery.

Plenty of static analyzers can perform vulnerability discovery on source code, but what if you only have the binary? How can we model a vulnerability and then check a binary to see if it is vulnerable? The short answer: use Binary Ninja’s MLIL and SSA form. Together, they make it easy to build and solve a system of equations with a theorem prover that takes binaries and turns them, alchemy-like, into vulnerabilities!

Let’s walk through the process with everyone’s favorite hyped vulnerability of yesteryear, Heartbleed.

Hacking like it’s 2014: Let’s find Heartbleed!

For those who might not remember or be familiar with the Heartbleed vulnerability, let’s run through a quick refresher. Heartbleed was a remote information-disclosure vulnerability in OpenSSL 1.0.1 – 1.0.1f that allowed an attacker to send a crafted TLS heartbeat message to any service using TLS. The message would trick the service into responding with up to 64KB of adjacent process memory, which could contain sensitive information such as private cryptographic keys or personal data. This was possible because OpenSSL used a field in the attacker’s message as a size parameter for malloc and memcpy calls without first validating that the given size was less than or equal to the size of the data to read. Here’s a snippet of the vulnerable code in OpenSSL 1.0.1f, from tls1_process_heartbeat:

    /* Read type and payload length first */
    hbtype = *p++;
    n2s(p, payload);
    pl = p;

    /* Skip some stuff... */

    if (hbtype == TLS1_HB_REQUEST)
        {
        unsigned char *buffer, *bp;
        int r;

        /* Allocate memory for the response, size is 1 bytes
         * message type, plus 2 bytes payload length, plus
         * payload, plus padding
         */
        buffer = OPENSSL_malloc(1 + 2 + payload + padding);
        bp = buffer;
        
        /* Enter response type, length and copy payload */
        *bp++ = TLS1_HB_RESPONSE;
        s2n(payload, bp);
        memcpy(bp, pl, payload);
        bp += payload;
        /* Random padding */
        RAND_pseudo_bytes(bp, padding);

        r = ssl3_write_bytes(s, TLS1_RT_HEARTBEAT, buffer, 3 + payload + padding);

Looking at the code, we can see that the size parameter (payload) comes directly from the user-controlled TLS heartbeat message, is converted from network-byte order to host-byte order (n2s), and then passed to OPENSSL_malloc and memcpy with no validation. In this scenario, when the value of payload is greater than the length of the data at pl, memcpy will read past the end of the buffer starting at pl and copy the data that follows immediately after it, revealing data that it shouldn’t. The fix in 1.0.1g was pretty simple:

    hbtype = *p++;
    n2s(p, payload);
    if (1 + 2 + payload + 16 > s->s3->rrec.length)
        return 0; /* silently discard per RFC 6520 sec. 4 */
    pl = p;

This new check ensures that the memcpy won’t overflow into different data.

Back in 2014, Andrew blogged about writing a clang analyzer plugin that could find vulnerabilities like Heartbleed. A clang analyzer plugin runs on source code, though; how could we find the same vulnerability in a binary if we didn’t have the source for it? One way: build a model of a vulnerability by representing MLIL variables as a set of constraints and solving them with a theorem prover!

Model binary code as equations with z3

A theorem prover lets us construct a system of equations and:

  1. Verify whether those equations contradict each other.
  2. Find values that make the equations work.

For example, if we have the following equations:

x + y = 8
2x + 3 = 7

A theorem prover could tell us that a) a solution does exist for these equations, meaning that they don’t contradict each other, and b) a solution to these equations is x = 2 and y = 6.

For the purposes of this exercise, I’ll be using the Z3 Theorem Prover from Microsoft Research. Using the z3 Python library, the above example would look like the following:

>>> from z3 import *
>>> x = Int('x')
>>> y = Int('y')
>>> s = Solver()
>>> s.add(x + y == 8)
>>> s.add(2*x + 3 == 7)
>>> s.check()
sat
>>> s.model()
[x = 2, y = 6]

Z3 tells us that the equations can be satisfied and provides values to solve them. We can apply this technique to modeling a vulnerability. It turns out that assembly instructions can be modeled as algebraic statements. Take the following snippet of assembly:

  lea eax, [ebx+8]
  cmp eax, 0x20
  jle allocate
  int3
allocate:
  push eax
  call malloc
  ret

When we lift this assembly to Binary Ninja’s LLIL, we get the following graph:

Figure 1. LLIL makes it easy to identify the signed comparison conditional.

In this code, eax takes the value of ebx and then adds 8 to it. If this value is above 0x20, an interrupt is raised. However, if the value is less than or equal to 0x20, the value is passed to malloc. We can use LLIL’s output to model this as a set of equations that should be unsatisfiable if an integer overflow is not possible (e.g. there should never be a value of ebx such that ebx is larger than 0x20 but eax is less than or equal to 0x20), which would look something like this:

eax = ebx + 8
ebx > 0x20
eax <= 0x20

What happens if we plug these equations into Z3? Not exactly what we’d hope for.

>>> eax = Int('eax')
>>> ebx = Int('ebx')
>>> s = Solver()
>>> s.add(eax == ebx + 8)
>>> s.add(ebx > 0x20)
>>> s.add(eax <= 0x20)
>>> s.check()
unsat

There should be an integer overflow, but our equations were unsat. This is because the Int type (or “sort” in z3 parlance) represents a number in the set of all integers, which has a range of -∞ to +∞, and thus an overflow is not possible. Instead, we must use the BitVec sort, to represent each variable as a vector of 32 bits:

>>> eax = BitVec('eax', 32)
>>> ebx = BitVec('ebx', 32)
>>> s = Solver()
>>> s.add(eax == ebx + 8)
>>> s.add(ebx > 0x20)
>>> s.add(eax <= 0x20)
>>> s.check()
sat

There’s the result we expected! With this result, Z3 tells us that it is possible for eax to overflow and call malloc with a value that is unexpected. With a few more lines, we can even see a possible value to satisfy these equations:

>>> s.model()
[ebx = 2147483640, eax = 2147483648]
>>> hex(2147483640)
'0x7ffffff8'

This works really well for registers, which can be trivially represented as discrete 32-bit variables. To represent memory accesses, we also need Z3’s Array sort, which can model regions of memory. However, stack variables reside in memory and are more difficult to model with a constraint solver. Instead, what if we could treat stack variables the same as registers in our model? We can easily do that with Binary Ninja’s Medium Level IL.
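
As a quick reference, here is what the Array sort looks like on its own: a tiny, standalone Z3 example (not tied to any particular binary) that models memory as a map from 32-bit addresses to byte values.

>>> from z3 import *
>>> mem = Array('mem', BitVecSort(32), BitVecSort(8))
>>> addr = BitVec('addr', 32)
>>> s = Solver()
>>> s.add(Select(Store(mem, addr, 0x41), addr) == 0x41)
>>> s.check()
sat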

Medium Level IL

Just as LLIL abstracts native disassembly, Medium Level IL (MLIL) adds another layer of abstraction on top of LLIL. Whereas LLIL abstracted away flags and NOP instructions, MLIL abstracts away the stack, presenting both stack accesses and register accesses as variables. Additionally, during the process of mapping LLIL to MLIL, memory stores that aren’t referenced later are identified and eliminated. These processes can be observed in the example below. Notice how there are no stack accesses (i.e. push or pop instructions) and var_8 does not appear in the MLIL at all.

Figure 2a. An example function in x86.

Figure 2b. LLIL of the example function.

Figure 2c. MLIL of the example function.

Another feature you might notice in the MLIL is that variables are typed. Binary Ninja initially infers these types heuristically, but the user can override these with manual assignment later. Types are propagated through the function and also help inform the analysis when determining function signatures.

MLIL structure

Structurally, MLIL and LLIL are very similar; both are expression trees and share many of the same expression types for operations (for more on the IL’s tree structure, see my first blog post). However, there are several stark differences. Obviously, MLIL does not have analogous operations for LLIL_PUSH and LLIL_POP, since the stack has been abstracted. The LLIL_REG, LLIL_SET_REG, and other register-based operations are instead MLIL_VAR, MLIL_SET_VAR, and similar. On top of this, thanks to typing, MLIL also has a notion of structures; MLIL_VAR_FIELD and MLIL_STORE_STRUCT expressions describe these operations.

Figure 3. Types in MLIL can generate some very clean code.

Some operations are common to both LLIL and MLIL, though their operands differ. The LLIL_CALL operation has a single operand: dest, the target of the call. In contrast, the MLIL_CALL operation also specifies the output operand, which identifies the variables that receive a return value, and the params operand, which holds a list of MLIL expressions describing the function call’s parameters. A user-specified calling convention, or one determined by automated analysis of how variables are used interprocedurally, determines these parameters and return values. This allows Binary Ninja to identify things like when the ebx register is used as a global data pointer in an x86 PIC binary, or when a custom calling convention is used.

Putting all of this together, MLIL comes pretty close to decompiled code. Its abstraction of both registers and stack variables also makes MLIL ideal for translating to Z3 via Binary Ninja’s API.

MLIL and the API

Working with MLIL in the Binary Ninja API is similar to working with LLIL, though there are some notable differences. Like LLIL, a function’s MLIL can be accessed directly via the medium_level_il property of the Function class, but there is no corresponding MLIL method to get_low_level_il_at. In order to directly access a specific instruction’s MLIL, a user must first query for the LLIL. The LowLevelILInstruction class now has a medium_level_il property that retrieves its MediumLevelILInstruction form. As a single line of Python, this would look like current_function.get_low_level_il_at(address).medium_level_il. It is important to remember that this can sometimes be None, as an LLIL instruction can be optimized away completely in MLIL.
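
As a quick illustration, fetching and checking the MLIL for the instruction under the cursor might look like the sketch below (written for the Binary Ninja scripting console, where current_function and here are provided automatically):

    llil = current_function.get_low_level_il_at(here)
    mlil = llil.medium_level_il if llil is not None else None

    if mlil is None:
        print("No MLIL for this instruction; it may have been optimized away.")
    else:
        # operation is a MediumLevelILOperation enum member, e.g. MLIL_SET_VAR.
        print("{}: {}".format(mlil.operation.name, mlil))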

The MediumLevelILInstruction class introduces new convenience properties that aren’t available in the LowLevelILInstruction class. The vars_read and vars_written properties make it simple to query an instruction for a list of variables the instruction uses without parsing the operands. If we revisit an old instruction from my first blog post, lea eax, [edx+ecx*4], the equivalent MLIL instruction would look similar to the LLIL. In fact, it appears to be identical at first glance.

>>> current_function.medium_level_il[0]
<il: eax = ecx + (edx << 2)>

But, if we look closer, we can see the difference:

>>> current_function.medium_level_il[0].dest
<var int32_t* eax>

Unlike LLIL, where dest would have been an ILRegister object representing the register eax, the dest operand here is a typed Variable object, representing a variable named eax as an int32_t pointer.

There are several other new properties and methods introduced for MLIL as well. If we wanted to extract the variables read by this expression, this would be as simple as:

>>> current_function.medium_level_il[0].vars_read
[<var int32_t* ecx>, <var int32_t edx>]

The branch_dependence property returns the conditional branches of basic blocks that dominate the instruction’s basic block when only the true or false branch dominates, but not both. This is useful for determining which decisions an instruction explicitly depends on.

Two properties use the dataflow analysis to calculate the value of an MLIL expression: value can efficiently calculate constant values, and possible_values uses the more computationally-expensive path-sensitive dataflow analysis to calculate ranges and disjoint sets of values that an instruction can result in.

Figure 5. Path-sensitive dataflow analysis identifies all concrete data values that can reach a certain instruction.

With these features at our disposal, we can model registers, stack variables, and memory, but there is one more hangup we need to solve: variables are often re-assigned values that depend on their previous value. For example, if we are iterating over instructions and come across something like the following:

mov eax, ebx
lea eax, [ecx+eax*4]

When creating our equations, how do we model this kind of reassignment? We can’t just model it as:

eax = ebx
eax = ecx + (eax * 4)

This can cause all sorts of unsatisfiability, because constraints purely express mathematical truths about variables in a system of equations; they have no temporal element at all. Since constraint solving has no concept of time, we need some way to bridge this gap and transform the program to effectively remove the idea of time. Moreover, we need to be able to efficiently determine where the previous value of eax originates. The final piece of the puzzle is another feature available via the Binary Ninja API: SSA Form.

Single Static Assignment (SSA) Form

In concert with Medium Level IL’s release, Binary Ninja also introduced Static Single Assignment (SSA) form for all representations in the BNIL family. SSA form is a representation of a program in which every variable is defined once and only once. If the variable is assigned a new value, a new “version” of that variable is defined instead. A simple example of this would be the following:

a = 1
b = 2
a = a + b

In SSA form, this becomes:

a1 = 1
b1 = 2
a2 = a1 + b1

The other concept introduced with SSA form is the phi-function (or Φ). When a variable has a value that is dependent on the path the control flow took through the program, such as an if-statement or loop, a Φ-function represents all of the possible values that that variable could take. A new version of that variable is defined as the result of this function. Below is a more complicated (and specific) example, using a Φ-function:

def f(a):
    if a > 20:
        a = a * 2
    else:
        a = a + 5
    return a

In SSA form, this becomes:

def f(a0):
    if a0 > 20:
        a1 = a0 * 2
    else:
        a2 = a0 + 5
    a3 = Φ(a1, a2)
    return a3

SSA makes it easy to explicitly track all definitions and uses of a variable throughout the lifetime of the program, which is exactly what we need to model variable assignments in Z3.

SSA form in Binary Ninja

The SSA form of the IL can be viewed within Binary Ninja, but it’s not available by default. In order to view it, you must first check the “Enable plugin development debugging mode” box in the preferences. The SSA form, seen below, isn’t really meant to be consumed visually, as it’s more difficult to read than a normal IL graph. Instead, it is primarily intended to be used with the API.

Figure 6. An MLIL function (left) and its corresponding SSA form (right).

The SSA form of any of the intermediate languages is accessible in the API through the ssa_form property. This property is present in both function (e.g. LowLevelILFunction and MediumLevelILFunction) and instruction (e.g. LowLevelILInstruction and MediumLevelILInstruction) objects. In this form, operations such as MLIL_SET_VAR and MLIL_VAR are replaced with new operations, MLIL_SET_VAR_SSA and MLIL_VAR_SSA. These operations use SSAVariable operands instead of Variable operands. An SSAVariable object is a wrapper of its corresponding Variable, but with the added information of which version of the Variable it represents in SSA form. Going back to our previous re-assignment example, the MLIL SSA form would output the following:

eax#1 = ebx#0
eax#2 = ecx#0 + (eax#1 << 2)

This solves the problem of reusing variable identifiers, but there is still the issue of locating usage and definitions of variables. For this, we can use MediumLevelILFunction.get_ssa_var_uses and MediumLevelILFunction.get_ssa_var_definition, respectively (these methods are also members of the LowLevelILFunction class).
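
To see why this form is so convenient for constraint generation, note that the SSA output above maps one-to-one onto Z3 bitvectors, one per SSA version (a standalone sketch):

>>> from z3 import *
>>> eax_1, eax_2, ebx_0, ecx_0 = BitVecs('eax#1 eax#2 ebx#0 ecx#0', 32)
>>> s = Solver()
>>> s.add(eax_1 == ebx_0)
>>> s.add(eax_2 == ecx_0 + (eax_1 << 2))
>>> s.check()
sat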

Now that our bag of tools is complete, let’s dive into modeling a real-world vulnerability in Binary Ninja!

Example script: Finding Heartbleed

Our approach will be very similar to Andrew’s, as well as the one Coverity used in their article on the subject. Byte-swapping operations are a pretty good indicator that the data is coming from the network and is user-controlled, so we will use Z3 to model memcpy operations and determine if the size parameter is a byte-swapped value.
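
For reference, the n2s byte swap the model hunts for simply reassembles a 16-bit length from two single-byte reads. Expressed in Python for illustration (this is not OpenSSL’s code), it amounts to:

    def n2s(p):
        # Combine two network-order bytes (each constrained to 0-0xff)
        # into a 16-bit host-order value.
        return (p[0] << 8) | p[1]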

Figure 7. A backward static slice of the size parameter of the vulnerable memcpy in tls1_process_heartbeat of OpenSSL 1.0.1f, in MLIL

Step 1: Finding our “sinks”

It would be time-consuming and expensive to perform typical source-to-sink taint tracking, as demonstrated in the aforementioned articles. Let’s do the reverse: identify all code references to the memcpy function and examine them.

    memcpy_refs = [
        (ref.function, ref.address)
        for ref in bv.get_code_refs(bv.symbols['_memcpy'].address)
    ]

    dangerous_calls = []

    for function, addr in memcpy_refs:
        call_instr = function.get_low_level_il_at(addr).medium_level_il
        if check_memcpy(call_instr.ssa_form):
            dangerous_calls.append((addr, call_instr.address))

Step 2: Eliminate sinks that we know aren’t vulnerable

In check_memcpy, we can quickly eliminate any size parameters that Binary Ninja’s dataflow can calculate on its own[2], using the MediumLevelILInstruction.possible_values property. We’ll model whatever is left.

def check_memcpy(memcpy_call):
    size_param = memcpy_call.params[2]

    if size_param.operation != MediumLevelILOperation.MLIL_VAR_SSA:
        return False

    possible_sizes = size_param.possible_values

    # Dataflow won't combine multiple possible values from
    # shifted bytes, so any value we care about will be
    # undetermined at this point. This might change in the future?
    if possible_sizes.type != RegisterValueType.UndeterminedValue:
        return False

    model = ByteSwapModeler(size_param, bv.address_size)

    return model.is_byte_swap()

Step 3: Track the variables the size depends on

Starting from the size parameter, we compute a static backward slice to trace back through the code and track all of the variables that the size parameter depends on.

        var_def = self.function.get_ssa_var_definition(self.var.src)

        # Visit statements that our variable directly depends on
        self.to_visit.append(var_def)

        while self.to_visit:
            idx = self.to_visit.pop()
            if idx is not None:
                self.visit(self.function[idx])

The visit method takes a MediumLevelILInstruction object and dispatches a different method depending on the value of the instruction’s operation field. Recalling that BNIL is a tree-based language, visitor methods will recursively call visit on an instruction’s operands until it reaches the terminating nodes of the tree. At that point it will generate a variable or constant for the Z3 model that will propagate back through the recursive callers, very similar to the vtable-navigator plugin of Part 2.
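
The dispatch itself can be as simple as looking up a handler by the operation’s name, roughly like the sketch below (the real script adds bookkeeping around this pattern):

    def visit(self, expr):
        # expr.operation is a MediumLevelILOperation enum member, e.g. MLIL_ADD.
        handler = getattr(self, 'visit_{}'.format(expr.operation.name), None)
        if handler is None:
            # No handler for this operation; stop recursing down this branch.
            return None
        return handler(expr)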

The visitor for MLIL_ADD is fairly simple, recursively generating its operands before returning the sum of the two:

    def visit_MLIL_ADD(self, expr):
        left = self.visit(expr.left)
        right = self.visit(expr.right)

        if None not in (left, right):
            return left + right

Step 4: Identify variables that might be part of a byte swap

MLIL_VAR_SSA, the operation that describes an SSAVariable, is a terminating node of an MLIL instruction tree. When we encounter a new SSA variable, we identify the instruction responsible for the definition of this variable, and add it to the set of instructions to visit as we slice backwards. Then, we generate a Z3 variable to represent this SSAVariable in our model. Finally, we query Binary Ninja’s value-range analysis to see if this variable is constrained to a single byte (i.e., within the range 0–0xff, starting at a bit offset that is a multiple of 8). If it is, we constrain this variable to that value range in our model.

    def visit_MLIL_VAR_SSA(self, expr):
        if expr.src not in self.visited:
            var_def = expr.function.get_ssa_var_definition(expr.src)
            if var_def is not None:
                self.to_visit.append(var_def)

        src = create_BitVec(expr.src, expr.size)

        value_range = identify_byte(expr, self.function)
        if value_range is not None:
            self.solver.add(
                Or(
                    src == 0,
                    And(src <= value_range.end, src >= value_range.step)
                )
            )

            self.byte_vars.add(expr.src)

        return src

The parent operation of an MLIL instruction that we visit will generally be MLIL_SET_VAR_SSA or MLIL_SET_VAR_PHI. In visit_MLIL_SET_VAR_SSA, we can recursively generate a model for the src operand as usual, but the src operand of an MLIL_SET_VAR_PHI operation is a list of SSAVariable objects, representing each of the parameters of the Φ-function. We add each of these variables’ definition sites to our set of instructions to visit, then write an expression for our model that states dest == src0 || dest == src1 || … || dest == srcn:

        phi_values = []

        for var in expr.src:
            if var not in self.visited:
                var_def = self.function.get_ssa_var_definition(var)
                self.to_visit.append(var_def)

            src = create_BitVec(var, var.var.type.width)

            # ...

            phi_values.append(src)

        if phi_values:
            phi_expr = reduce(
                lambda i, j: Or(i, j), [dest == s for s in phi_values]
            )

            self.solver.add(phi_expr)

In both visit_MLIL_SET_VAR_SSA and visit_MLIL_SET_VAR_PHI, we keep track of variables that are constrained to a single byte, and which byte they are constrained to:

        # If this value can never be larger than a byte,
        # then it must be one of the bytes in our swap.
        # Add it to a list to check later.
        if src is not None and not isinstance(src, (int, long)):
            value_range = identify_byte(expr.src, self.function)
            if value_range is not None:
                self.solver.add(
                    Or(
                        src == 0,
                        And(src <= value_range.end, src >= value_range.step)
                    )
                )

                self.byte_vars.add(*expr.src.vars_read)

                if self.byte_values.get(
                    (value_range.step, value_range.end)
                ) is None:
                    self.byte_values[
                        (value_range.step, value_range.end)
                    ] = simplify(Extract(
                                int(math.floor(math.log(value_range.end, 2))),
                                int(math.floor(math.log(value_range.step, 2))),
                                src
                            )
                    )

And finally, once we’ve visited a variable’s definition instruction, we mark it as visited so that it won’t be added to to_visit again.

Step 5: Identify constraints on the size parameter

Once we’ve sliced the size parameter and located potential bytes used in our byte swap, we need to make sure that there aren’t any constraints that would restrict the value of the size on the path of execution leading to the memcpy. The branch_dependence property of the memcpy’s MediumLevelILInstruction object identifies mandatory control flow decisions required to arrive at the instruction, as well as which branch (true/false) must be taken. We examine the variables checked by each branch decision, as well as the dependencies of those variables. If there is a decision made based on any of the bytes we determined to be in our swap, we’ll assume this size parameter is constrained and bail on its analysis.

        for i, branch in self.var.branch_dependence.iteritems():
            for vr in self.function[i].vars_read:
                if vr in self.byte_vars:
                    raise ModelIsConstrained()
                vr_def = self.function.get_ssa_var_definition(vr)
                if vr_def is None:
                    continue
                for vr_vr in self.function[vr_def].vars_read:
                    if vr_vr in self.byte_vars:
                        raise ModelIsConstrained

Step 6: Solve the model

If the size isn’t constrained and we’ve found that the size parameter relies on variables that are just bytes, we need to add one final equation to our Z3 Solver. To identify a byte swap, we need to make sure that even though our size parameter is unconstrained, the size is still explicitly constructed only from the bytes that we previously identified. Additionally, we also want to make sure that the reverse of the size parameter is equal to the identified bytes reversed. If we just added an equation to the model for those properties, it wouldn’t work, though. Theorem checkers only care if any value satisfies the equations, not all values, so this presents a problem.

We can overcome this problem by negating the final equation. By telling the theorem solver that we want to ensure that no value satisfies the negation and looking for an unsat result, we can find the size parameters that satisfy the original (not negated) equation for all values. So if our model is unsatisfiable after we add this equation, then we have found a size parameter that is a byte swap. This might be a bug!
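
Here is the trick in miniature, using a property that obviously holds (a standalone Z3 example, unrelated to the script’s actual constraints): to prove that every x constrained to a single byte is at most 0xff, we assert the negation and look for unsat.

>>> from z3 import *
>>> x = BitVec('x', 32)
>>> s = Solver()
>>> s.add((x & 0xff) == x)      # x is constrained to a single byte
>>> s.add(Not(ULE(x, 0xff)))    # negation of the property we want to prove
>>> s.check()
unsat

The script’s final constraint, shown below, applies the same idea to the ordering of the identified bytes: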

           self.solver.add(
                Not(
                    And(
                        var == ZeroExt(
                            var.size() - len(ordering)*8,
                            Concat(*ordering)
                        ),
                        reverse_var == ZeroExt(
                            reverse_var.size() - reversed_ordering.size(),
                            reversed_ordering
                        )
                    )
                )
            )

            if self.solver.check() == unsat:
                return True

Step 7: Find bugs

I tested my script on two versions of OpenSSL: first the vulnerable 1.0.1f, and then 1.0.1g, which fixed the vulnerability. I compiled both versions on macOS with the command ./Configure darwin-i386-cc to get a 32-bit x86 version. When the script is run on 1.0.1f, we get the following:

Figure 8. find_heartbleed.py successfully identifies both vulnerable Heartbleed functions in 1.0.1f.

If we then run the script on the patched version, 1.0.1g:

Figure 9. The vulnerable functions are no longer identified in the patched 1.0.1g!

As we can see, the patched version that removes the Heartbleed vulnerability is no longer flagged by our model!

Conclusion

I’ve now covered how a missing bounds check led to Heartbleed, a major information leak in OpenSSL, and how Binary Ninja’s Medium Level IL and SSA form translate seamlessly to a constraint solver like Z3. Putting it all together, I demonstrated how a vulnerability such as Heartbleed can be accurately modeled in a binary. You can find the script in its entirety here.

Of course, a static model such as this can only go so far. For more complicated program modeling, such as interprocedural analysis and loops, explore constraint solving with a symbolic execution engine such as our open source tool Manticore.

Now that you know how to leverage the IL for vulnerability analysis, hop into the Binary Ninja Slack to share your own tools with the community and to learn even more about BNIL, SSA form, and other good stuff.

Finally, don’t miss the Binary Ninja workshop at Infiltrate 2018. I’ll be hanging around with the Vector 35 team and helping answer questions!

[1] After Part 2, Jordan told me that Rusty remarked, “Josh could really use SSA form.” Since SSA form is now available, I’ve added a refactored and more concise version of the article’s script here!
[2] This is currently only true because Binary Ninja’s dataflow does not calculate the union of disparate value ranges, such as using bitwise-or to concatenate two bytes as happens in a byte swap. I believe this is a design tradeoff for speed. If Vector 35 ever implements a full algebraic solver, this could change, and a new heuristic would be necessary.

Use our suite of Ethereum security tools

Two years ago, when we began taking on blockchain security engagements, there were no tools engineered for the work. No static analyzers, fuzzers, or reverse engineering tools for Ethereum.

So, we invested significant time and expertise to create what we needed, adapt what we already had, and refine the work continuously over dozens of audits. We’ve filled every gap in the process of creating secure blockchain software.

Today, we’re happy to share most of these tools in the spirit of helping to secure Ethereum’s foundation.

Think of what follows as the roadmap. If you are new to blockchain security, just start at the top. You have all you need. And, if you’re diligent, less reason to worry about succumbing to an attack.

Development Tools

To build a secure Ethereum codebase: get familiar with known mistakes to avoid, run a static analysis on every new checkin of code, fuzz new features, and verify your final product with symbolic execution.

1. Not So Smart Contracts

This repository contains examples of common Ethereum smart contract vulnerabilities, including real code. Review this list to ensure you’re well acquainted with possible issues.

The repository contains a subdirectory for each class of vulnerability, such as integer overflow, reentrancy, and unprotected functions. Each subdirectory contains its own readme and real-world examples of vulnerable contracts. Where appropriate, contracts that exploit the vulnerabilities are also provided.

We use these examples as test cases for our Ethereum bug-finding tools, listed below. The issues in this repository can be used to measure the effectiveness of other tools you develop or use. If you are a smart contract developer, carefully examine the vulnerable code in this repository to fully understand each issue before writing your own contracts.

2. Slither

Slither combines a set of proprietary static analyses on Solidity that detect common mistakes such as bugs in reentrancy, constructors, method access, and more. Run Slither as you develop, on every new checkin of code. We continuously incorporate new, unique bug types that we discover in our audits.

Slither is privately available to all firms that work with us, and may become available for licensing or accessible via an API if there’s enough interest. Sign up to get notified if Slither becomes available.

Running Slither is simple: $ slither.py contract.sol

Slither will then output the vulnerabilities it finds in the contract.

3. Echidna

Echidna applies next-generation smart fuzzing to EVM bytecode. Write Echidna tests for your code after you complete new features. It provides simple, high coverage unit tests that discover security bugs. Until your app has 80+% coverage with Echidna, don’t consider it complete.

Using Echidna is simple:

  1. Add some Echidna tests to your existing code (like in this example),
  2. Run ./echidna-test contract.sol, and
  3. See if your invariants hold.

If you want to write a fancier analysis (say, abstract state machine testing), we have support for that too.

4. Manticore

Manticore uses symbolic execution to simulate complex multi-contract and multi-transaction attacks against EVM bytecode. Once your app is functional, write Manticore tests to discover hidden, unexpected, or dangerous states that it can enter. Manticore enumerates the execution states of your contract and verifies critical functionality.

If your contract doesn’t require initialization parameters, then you can use the command line to easily explore all the possible executions of your smart contract as an attacker or the contract owner:

manticore contract.sol --contract ContractName --txaccount [attacker|owner]

Manticore will generate a list of all the reachable states (including assertion failures and reverts) and the inputs that cause them. It will also automatically flag certain types of issues, like integer overflows and use of uninitialized memory.

Using the Manticore API to review more advanced contracts is simple (see the sketch after this list):

  1. Initialize your contract with the proper values
  2. Define symbolic transactions to explore potential states
  3. Review the list of resulting transactions for undesirable states
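
Here is a minimal sketch of what those three steps can look like with the Manticore Python API. Treat it as illustrative rather than canonical: the contract source, the buy() method, and the argument values are hypothetical, and the exact API surface may differ between Manticore versions.

    from manticore.ethereum import ManticoreEVM

    source_code = open("contract.sol").read()
    m = ManticoreEVM()

    # 1. Initialize the contract with the proper values.
    owner = m.create_account(balance=10**18)
    attacker = m.create_account(balance=10**18)
    contract = m.solidity_create_contract(source_code, owner=owner)

    # 2. Define symbolic transactions to explore potential states.
    symbolic_amount = m.make_symbolic_value()
    contract.buy(symbolic_amount, caller=attacker)  # hypothetical method name

    # 3. Review the list of resulting transactions for undesirable states.
    m.finalize()  # test cases and a summary are written to the workspace
    print("results written to {}".format(m.workspace))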

Reversing Tools

Once you’ve developed your smart contract, or you want to look at someone else’s code, you’ll want to use our reversing tools. Load the binary contract into Ethersplay or IDA-EVM. For an instruction set reference, use our EVM Opcodes Database. If you’d like to do more complex analysis, use Rattle.

1. EVM Opcode Database

Whether you’re stepping through code in the Remix debugger or reverse engineering a binary contract, you may want to look up details of EVM instructions. This reference contains a complete and concise list of EVM opcodes and their implementation details. We think this is a big time saver when compared to scrolling through the Yellow Paper, reading Go/Rust source, or checking comments in StackOverflow articles.

2. Ethersplay

Ethersplay is a graphical EVM disassembler capable of method recovery, dynamic jump computation, source code matching, and binary diffing. Use Ethersplay to investigate and debug compiled contracts or contracts already deployed to the blockchain.

Ethersplay takes EVM bytecode as input in either ASCII hex-encoded or raw binary format. Examples of each are test.evm and test.bytecode, respectively. Open the test.evm file in Binary Ninja, and it will automatically analyze it, identify functions, and generate a control flow graph.

Ethersplay includes two Binary Ninja plugins to help. “EVM Source Code” will correlate contract source to the EVM bytecode. “EVM Manticore Highlight” integrates Manticore with Ethersplay, graphically highlighting code coverage information from Manticore output.

3. IDA-EVM

IDA-EVM is a graphical EVM disassembler for IDA Pro capable of function recovery, dynamic jump computation, applying library signatures, and binary diffing using BinDiff.

IDA-EVM allows you to analyze and reverse engineer smart contracts without source. To use it, follow the installation instructions in the readme, then open a .evm or .bytecode file in IDA.

4. Rattle

Rattle is an EVM static analyzer that analyzes the EVM bytecode directly for vulnerabilities. It does this by disassembling and recovering the EVM control flow graph and lifting the operations to a Single Static Assignment (SSA) form called EVM::SSA. EVM::SSA optimizes out all pushes, pops, dups, and swaps, often reducing the instruction count by 75%. Rattle will eventually support storage, memory, and argument recovery as well as static security checks similar to those implemented in Slither.

Rattle is privately available to all firms that work with us, and may become available for licensing or accessible via an API if there’s enough interest. Sign up to be notified if Rattle becomes available.

To use Rattle, supply it runtime bytecode from solc or extracted directly from the blockchain:

$ ./rattle -i path/to/input.evm

Work with us!

Please, use the tools, file issues in their respective repos, and participate in their feature and bug bounties. Let us know how they could be better on the Empire Hacking Slack in #ethereum.

Now that we’ve introduced each tool, we plan to write follow-up posts that dig into their details.

An accessible overview of Meltdown and Spectre, Part 2

This is the second half of our blog post on the Meltdown and Spectre vulnerabilities, describing Spectre Variant 1 (V1) and Spectre Variant 2 (V2). If you have not done so already, please review the first blog post for an accessible review of computer architecture fundamentals. This blog post will start by covering the technical details of Spectre V1 and Spectre V2, and conclude with a discussion of how these bugs lurked undetected for so long, and what we think the future holds.

Like Meltdown, the Spectre vulnerabilities rely on speculative execution and timing side channels to read memory without proper permission. The difference between Meltdown and Spectre is the method of operation and the potential impact — more computers are vulnerable to Spectre. Meltdown works by taking advantage of an asynchronous permissions check, and affects only Intel processors. Spectre works by tricking the branch predictor (Figure 1), and affects almost every processor released in the last 25 years.

Figure 1: How branch misprediction leads to speculative execution. When the branch predictor makes an incorrect guess about the destination of a conditional branch, some instructions are speculatively executed. The execution of these instructions is undone, but their effects on the cache remain. Spectre causes the branch predictor to guess wrong and speculatively execute a carefully chosen set of instructions.

Spectre (Variant 1)

The first variant of Spectre allows a program to read memory that it should not have the ability to read. Spectre V1 attacks are possible because of the confluence of two optimizations: branch prediction and speculative execution. A Spectre V1 attack tricks the branch predictor (for conditional branches) into skipping security checks and speculatively executing instructions in the wrong context. The effects of these speculatively executed instructions are visible via a cache timing side-channel.

Technical Details

Let’s walk through a hypothetical example. Even though the steps seem complex, they are quite possible to execute in a web browser via JavaScript — see the Spectre paper’s working proof of concept. The following example assumes the Spectre attack is delivered via JavaScript in a web browser, but that isn’t a requirement: Spectre works against all kinds of programs, not just web browsers. Also, some specifics are omitted for brevity and clarity.

  1. First we’ll create two large memory allocations. We’ll call one leaker and the other reader. In JavaScript, these can be created with ArrayBuffers, a structure used for storing binary data (like pictures). The leaker allocation will be used for the cache timing side-channel. The reader allocation only exists to train the branch predictor. The arrays we allocated have a fixed length. It is only legal to read items from the array starting at zero (the first element) and ending before the array length (the last element). So for a 256 byte array, the valid items are numbered 0 to 255. This range of the first to last element is called the array bounds.
  2. Next we start training the branch predictor. A Spectre attack relies on grooming the branch predictor to guess that certain security checks always pass. In this specific case, we are going to make the branch predictor guess that the values read out of reader are always in bounds. JavaScript checks bounds on array reads; an out of bounds read will fail and cause the program to stop working. Array bounds are (currently!) checked via branch instructions. For this part of the Spectre attack, we will repeatedly read in bounds values from reader. Recall that the processor assumes current behavior is predictive of future behavior. If branches are always taken (or not taken), then the branch predictor will be trained to expect that same behavior.
  3. Next we ensure that no part of leaker is cached. This is also possible to do in JavaScript. We avoid the details for brevity.
  4. Now that the preparation is out of the way, let’s get to the core Spectre V1 flaw. We will read an out-of-bounds element from reader. Let’s call this value secret. Because we read secret from outside the array bounds, it can come from any memory location we choose. The read of secret will temporarily succeed even though there is a bounds check that logically prevents it from ever happening. It succeeds only speculatively, because the branch predictor has been primed to assume that the bounds check will pass. At this point the countdown also starts for the processor to discover the mispredicted branch and un-execute the mispredicted instructions.
  5. Next, we use the value of secret as an index to read an element from leaker (i.e., leaker[secret]). This action will be speculatively executed and cause that element of leaker to be cached. At this point, the branch predictor can correct itself and un-execute all speculatively executed instructions.
  6. By measuring the time to read every element of leaker, it is possible to determine which element was cached. The index of the cached element will be the value of secret, a value the program was not permitted to read. For example, if leaker[42] was cached, the value of secret would be 42.
  7. The attack can now be repeated to read more bytes. The bandwidth of this channel is estimated at 10 KB/s.

See Figure 2 for a graphical representation of the reader and secret data memory locations at Step 4.

Figure 2: A hypothetical layout of reader and some secret data in computer memory. Because computer memory is laid out linearly, it is possible to access any part of it in terms of a positive or negative index into reader. Access to reader should be limited by its bounds, 0 to 255. If bounds checks are bypassed, even by a few speculatively executed instructions, it is possible to access memory without proper permission.

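To make the vulnerable code pattern concrete, here is a minimal, hypothetical C sketch of the bounds-checked read that the steps above abuse. The names reader, leaker, and victim mirror the walkthrough but are our own, the 4096-byte stride is an arbitrary choice that keeps each possible secret value in its own page-sized slot, and the snippet illustrates the pattern only; it is not a working exploit.

#include <stddef.h>
#include <stdint.h>

uint8_t reader[256];          /* bounds-checked array used to train the branch predictor */
uint8_t leaker[256 * 4096];   /* probe array used for the cache timing side-channel      */

void victim(size_t index) {
    if (index < 256) {                   /* the bounds check: a conditional branch       */
        uint8_t secret = reader[index];  /* out-of-bounds read, but only speculatively,  */
                                         /* when the predictor assumes the check passes  */
        (void)leaker[secret * 4096];     /* dependent load caches one slot of leaker,    */
                                         /* keyed to the value of secret                 */
    }
}

Calling victim repeatedly with in-bounds indexes is the training from step 2; calling it once with an out-of-bounds index triggers the speculative reads in steps 4 and 5.
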
What is the impact?

Spectre (Variant 1) is a really big deal to desktop, laptop, and mobile users. It lets websites break security and isolation guarantees built into web browsers like Chrome, IE, and Firefox. As an example, Spectre could allow a website open in one tab to read passwords you are typing into a different tab, or allow an ad on a checkout page to read the credit card number you are typing.

Spectre is equally devastating for cloud providers and internet companies. A lot of the code powering your favorite websites relies on isolation guarantees provided by programming languages. Spectre renders those guarantees into good intentions. Application hosting providers have to re-evaluate their security architecture, rebuild a lot of core code (with performance loss), or both.

There is no generic fix for Variant 1. Affected software has to be re-compiled to avoid using vulnerable code patterns. However, exploitation of the vulnerability can be made more difficult. This is the path taken by web-browser vendors so far. They have removed high-resolution timers (necessary to determine if something was cached) and are actively working to avoid using vulnerable code patterns.

What should I do?

Update your browser, operating system, and BIOS to the latest version. All browser vendors have released mitigations for Spectre vulnerabilities.

All major cloud providers have deployed mitigations. You as a customer should have nothing to worry about moving forward.

Spectre (Variant 2)

The second variant of Spectre allows a program to read memory that it should not have access to, regardless of whether the memory is part of the same program, another program, or the core operating system.

Like Spectre V1, Spectre V2 also relies on abusing branch prediction, speculative execution, and cache timing side-channels. Whereas V1 tricks the conditional branch predictor, V2 tricks the indirect branch predictor. The indirect branch predictor is indifferent to privilege changes, including from user to kernel, from program to program, and even from a virtual machine (think cloud computing instance) to hypervisor (the cloud control plane). For those reasons, Spectre V2 attacks can happen across most if not all privilege levels provided by modern processors.

Technical Details

Out of the three published attacks, Spectre V2 attacks are the most complex. To explain a Spectre V2 attack, we are going to describe a simplified version outlined in the Google Project Zero blog. Many technical details will be omitted with the hope of providing a more accessible understanding of the attack.

The Spectre V2 attack described here reads privileged hypervisor memory from a guest virtual machine. Google Project Zero demonstrated that this attack works with real software running on real processors.

First, a quick refresher about indirect branches and the indirect branch predictor. Indirect branches tell a processor to start executing instructions at some new location. This location can be stored in memory. If that location is not cached, the processor would have to pause a very long time (relative to instruction execution speed) while waiting to find out where to get instructions. The indirect branch predictor guesses where the processor would go next, so the processor doesn’t have to wait. If the guess turns out to be wrong, the processor un-executes some instructions and goes back to the correct place. If the guess turns out to be correct, the program runs much faster. Processors are very good at guessing correctly, so this usually results in a big speedup.

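For a concrete picture of what an indirect branch looks like in code, here is a small, hypothetical C sketch of a dispatch table, loosely analogous to an interpreter choosing a handler for each opcode; the names are ours and are not taken from KVM or eBPF.

#include <stdint.h>

typedef void (*op_handler)(void);
op_handler handlers[256];        /* possible destinations, stored in memory */

void dispatch(uint8_t opcode) {
    /* Indirect branch: the destination comes from handlers[opcode]. If that value
       is not cached, the indirect branch predictor supplies a guess so the processor
       can keep executing speculatively instead of waiting for memory. */
    handlers[opcode]();
}
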
Time for the first Spectre V2 concept: the indirect branch predictor works by keeping a history of recently executed branches in the Branch History Buffer (BHB), sometimes called the Branch History Table; it keeps this history because previous branch decisions are typically indicative of future branch decisions (Figure 3). As with Meltdown and Spectre V1, by carefully measuring how long some branch operations take, it is possible to “read” the BHB and determine the location of previously executed instructions — even if those instructions were executed at a different privilege level or in a different program.

Figure 3: The branch history buffer is used to predict the target of indirect branches, so that the processor can execute them faster. When an indirect branch executes, an entry is written into the table. The table has limited space, and is indexed based on branch address. Because of this limitation, it is possible to “poison” the branch predictor and make it guess an attacker-chosen address for any indirect branch. In this hypothetical prediction example, any branch at an address ending in 045C would have the same predicted destination.

Now for the second Spectre V2 concept: The branch predictor is shared among all programs running on a processor. Branches executed in one program change how the indirect branch predictor guesses in another program. By executing a carefully chosen series of branches, you can “write” the BHB, and force the branch predictor to guess a location you chose for a future indirect branch — even if that indirect branch will be executed at a different privilege level. This is analogous to Spectre V1, but using indirect branches instead of conditional branches.

The steps described for Google Project Zero’s Spectre V2 attack are less detailed because there isn’t a good way to condense the required information. Each step described below is extremely involved and difficult to accomplish properly. The described scenario is one where a cloud computing instance reads memory contents of the cloud control plane (Linux-KVM, specifically). That memory contains private information about what every other cloud computing instance on that physical machine is doing, including private user data, software source code, encryption keys, etc.

Attack Preparation

  1. First, allocate a large block of memory that will be used for the cache-timing side channel. Per our previous convention, we’ll call this block reader.
  2. To leak out data via speculative execution, we will need some way to force the hypervisor to read a byte of privileged memory first, and to use that byte to access reader. This code pattern may be hard to find, but luckily we can cheat. The KVM hypervisor is a part of the Linux kernel, and the Linux kernel includes a feature that runs custom programs to quickly deal with network traffic. This feature is called the eBPF interpreter. We can use the eBPF interpreter to create the code pattern we need in the hypervisor. To do that, we’ll need to:
    1. Find the eBPF interpreter in the hypervisor, and
    2. Supply some code to the eBPF interpreter, which we’ll call ebpf_code.
  3. Because Spectre V2 relies on tricking the indirect branch predictor, we need to know the location of an indirect branch in the hypervisor. We’ll call this indirect_branch.
  4. The hypervisor has a different view of computer memory than our program in the cloud instance. Because we’ll be leaking memory from the hypervisor, we’ll need to find out where in hypervisor-land reader, ebpf_code, indirect_branch and the eBPF interpreter are located.
  5. To find these items, we will need to know the location of some code in the hypervisor. We can’t simply read hypervisor code. We lack the proper privileges. Instead, we leak out code locations via the BHB. How? We’ll ask the hypervisor to do some normal operations on behalf of the cloud instance. In the course of these operations, the hypervisor executes indirect branches, the locations of which will be stored in the BHB. We then “read” the BHB to identify these code locations.
  6. Once we know some code locations in the hypervisor, with some math and multiple repeated attempts we can find where in hypervisor-land reader, ebpf_code, indirect_branch and the eBPF interpreter reside (Figure 4).
Figure 4: A simplified diagram of how a cloud computing system would look after completing Step 6 of this attack description. The attacker has gathered all information they need to conduct the attack, but the key part is still to come.

Attack Execution

  1. We ensure that no part of reader is cached.
  2. Time for the magic! We execute a series of indirect branches that write new entries into the BHB. These branches and their targets are set up to trick the indirect branch predictor into guessing that indirect_branch points to the eBPF interpreter (Figure 5).
Figure 5: This is the core of the Spectre V2 attack: specially crafted indirect branches can trick the Branch History Buffer into predicting an attacker-chosen speculative destination for an indirect branch. In this case, an indirect branch in the hypervisor is set to speculatively point to the eBPF interpreter code instead of its original location.

Leaking Secrets

  1. We set up processor state so that if it were to start executing the eBPF interpreter, then the interpreter would run ebpf_code, which would in turn read a byte of hypervisor memory and use that byte to access a part of reader.
  2. We ask the hypervisor to perform an innocuous action that is guaranteed to trigger indirect_branch. A complex chain of events now happens:
    1. The processor will guess indirect_branch points to the eBPF interpreter.
    2. The processor will start speculatively executing the eBPF interpreter. The countdown also starts for the processor to discover it guessed the wrong branch target.
    3. The eBPF interpreter will execute instructions provided by ebpf_code, which will:
      1. Read a byte of hypervisor memory.
      2. Use that byte to access a piece of reader.
      3. Cause that piece of reader to be cached.
    4. The processor figures out it executed the wrong branch, and un-executes every instruction speculatively executed thus far. However, a piece of reader is now in the cache (Figure 6).
  3. As with Meltdown and Spectre V1, we time access to every piece of reader and identify which piece is read much faster than the others. The index of that piece is the value of the byte we read from hypervisor memory.
  4. We can repeat this process to read more bytes, although with less setup since we can skip the Attack Preparation steps.
Figure 6: A visualization of how secrets can be leaked in a Spectre V2 attack. The eBPF interpreter speculatively executes attacker-specified eBPF code which will read a secret value and use it to access reader. The effects of speculative execution can be observed via a cache-timing side channel.

Again, the steps described above are greatly simplified to show the general concepts and some of the exciting trickery involved in a working Spectre V2 attack. Each step is very complex on its own (i.e., one does not simply “read” the BHB).

What is the impact?

This is very bad news for everyone, whether you are a cloud provider, an internet company, or just a citizen of the web. The impact combines the worst parts of Spectre (Variant 1) with the worst parts of Meltdown. Unlike Variant 1, there is no proof of concept that targets web browsers, but the possibility can’t be ruled out.

The news for cloud providers is worse. There is a proof-of-concept exploit that breaks the strongest isolation mechanisms used to separate tenants on cloud computing systems. And it is not only Intel: processors from various vendors, across multiple CPU architectures, are vulnerable to Variant 2.

There are multiple mitigations available, each of which has some performance cost. Processor vendors have updates available that tune how their processors work to lessen the impact. When that is not possible, software can be re-compiled to avoid using indirect branch instructions. We have not been able to find reliable numbers for the performance penalty of these fixes, but they are certainly not zero and must be paid in addition to the penalties for fixing Meltdown.

What should I do?

Update your operating system and firmware (i.e., BIOS or UEFI) to the latest version. The latest operating systems and BIOS updates will mitigate the most serious instances of Variant 2 either via processor microcode updates, workarounds to control indirect branch predictor behavior, or recompilation to avoid indirect branches.

All major cloud providers have deployed mitigations. You as a customer should have nothing to worry about.

How Did This Happen?

So how did a fundamental computer design flaw go unnoticed for the past 25 years? The answer has two parts: our fundamental assumptions about computing have changed, and the many hints about these security implications weren’t put together in a working proof-of-concept until now.

When speculative execution made its consumer processor debut in the 90s, the way we used computers was different. Most machines were single user, and the most popular operating systems of the day — Windows 9x and Mac OS Classic — lacked memory protection. You didn’t need Spectre or Meltdown to read (or even write!) another application’s memory — you could just do it. In that environment, the performance gains were real, and the security implications weren’t.

Nowadays, multi-tenant cloud computing, whether using virtual machines or containers, powers a huge part of the web. It is a regular occurrence for web browsers to download and run untrusted code (i.e. JavaScript) that is meant to be sandboxed from other untrusted code (i.e. other tabs you have open). In this environment, leaking information from one isolated memory compartment to another (e.g. between browser tabs) is a huge problem.

There have been numerous hints that speculative execution could lead to leaks of privileged information. The Google Project Zero blog post cites multiple sources (including one by our very own Sophia D’Antoine) that insinuate the problem. Multiple, independent researchers identified and reported the Spectre and Meltdown vulnerabilities, and others were very close. There is, however, a big difference between thinking there may be a problem and writing a proof-of-concept showing the problem is real. The work done by the researchers reporting this issue was fantastic. I hope this blog post shows just how difficult it was.

Conclusion

We hope you have a better understanding of how Meltdown and Spectre work at a technical level, their impact, and the available mitigations. This blog post was written to be accessible to someone without a computer architecture background, and we sincerely hope we succeeded. To our more technical readers, the Meltdown and Spectre papers, and the Project Zero blog post are better sources for the gory details.

Looking forward, micro-architectural attacks on computing platforms are going to be an exciting area of computer security. Because so many deployed platforms are vulnerable to Meltdown and Spectre, micro-architectural attacks will continue to be relevant and dangerous for many years to come.

“AMD Flaws” Technical Summary

Two weeks ago, we were engaged by CTS Labs as independent consultants at our standard consulting rates to review and confirm the technical accuracy of their preliminary findings. We participated neither in their research nor in their subsequent disclosure process. Our recommendation to CTS was to disclose the vulnerabilities through a CERT.

Our review of the vulnerabilities was based on documentation and proof-of-concept code provided by CTS. We confirmed that the proof-of-concept code worked as described on the hardware we tested, but we will defer to AMD for a final determination of their full impact, patches, and remediation recommendations.

Most of the discussion after the public announcement of the vulnerabilities has been focused on the way they were disclosed rather than their technical impact. In this post, we have tried to extract the relevant technical details from the CTS whitepaper so they can be of use to the security community without the distraction of the surrounding disclosure issues.

Technical Summary

The security architecture of modern computer systems is based on defense in depth. Security features like Windows Credential Guard, TPMs, and virtualization can be used to prevent access to sensitive data, even by an administrator or root.

The AMD Platform Security Processor (PSP) is a security coprocessor that resides inside AMD CPUs and is implemented as a separate ARM CPU. It is similar to Intel ME or the Apple Secure Enclave. It runs applications that provide security features like the TPM or Secure Encrypted Virtualization. The PSP has privileged access to the lowest level of the computer system.

The PSP firmware can be updated through a BIOS update, but it must be cryptographically signed by AMD. Physical access is usually not required to update the BIOS and this can be done with administrator access to the computer. The MASTERKEY vulnerability bypasses the PSP signature checks to update the PSP with the attacker’s firmware. Cfir Cohen on the Google Cloud Security Team discovered a similar issue in an adjacent area of the AMD PSP in September 2017.

The PSP also exposes an API to the host computer. The FALLOUT and RYZENFALL vulnerabilities exploit the PSP APIs to gain code execution in the PSP or the SMM.

The “chipset” is a component on the motherboard used to broker communication between the processor, memory, and peripherals. The chipset has full access to the system memory and devices. The CHIMERA vulnerability abuses exposed interfaces of the AMD Promontory chipset to gain code execution in the chipset processor.

Exploitation requirements

  • All exploits require the ability to run an executable as admin (no physical access is required)
  • MASTERKEY additionally requires issuing a BIOS update + reboot

Potential technical impact

  • Code execution in the PSP and SMM (no visibility to typical security products)
  • Persistence across OS reinstallation and BIOS updates
  • Block or infect further BIOS updates, or brick the device
  • Bypass Windows Credential Guard
  • Bypass Secure Encrypted Virtualization (SEV)
  • Bypass Secure Boot
  • Bypass or attack security features implemented on top of the PSP (e.g., fTPM)

There is no immediate risk of exploitation of these vulnerabilities for most users. Even if the full details were published today, attackers would need to invest significant development effort to build attack tools that utilize these vulnerabilities. This level of effort is beyond the reach of most attackers (see https://www.usenix.org/system/files/1401_08-12_mickens.pdf, Figure 1).

These types of vulnerabilities should not surprise any security researchers; similar flaws have been found in other embedded systems that have attempted to implement security features. They are the result of simple programming flaws, unclear security boundaries, and insufficient security testing. In contrast, the recent Meltdown and Spectre flaws required previously unknown techniques and novel research advances to discover and exploit.

Echidna, a smart fuzzer for Ethereum

Today at EthCC we released Echidna, our next-generation EVM smart fuzzer. It’s the first-ever fuzzer to target smart contracts, and it has powerful features like abstract state-machine modeling and automatic minimal test case generation. We’ve been working on it for quite some time, and we’re thrilled to finally share it with the world.

Different interfaces for different applications

Echidna ships with an echidna-test executable, which can start finding bugs in your Solidity code in minutes, and a powerful and comprehensive library for writing your own analyses. echidna-test requires nothing other than simple Solidity assertions to find deep bugs and comes with a clear UI to make understanding its output easy. See a video of it in action:

As a library, Echidna provides a myriad of tools to write custom analyses for more complex contracts. Want to model the progression of time to test your token launch contract? Echidna does that. Want to write a symbolic model of your contract’s underlying state machine? Echidna does that. Want to do something even fancier? No promises, but ask around on Empire Hacking and there’s a good chance Echidna does that too.

Batteries included

Echidna isn’t just a powerful fuzzer.

We took care to write a beautiful UI, ensure generated test cases are minimal, test with continuous integration, and provide examples so you can use it more easily. We’re still actively developing it, so there may be some rough edges. On the off chance you run into one, you should file an issue and we’ll do our best to take care of it. We use Echidna on real audits, so we’re committed to it being as correct and powerful as possible.

It’s easy to get started

Echidna uses stack, so setup should be a breeze, and the only dependency you’ll need is whatever version of solc your contract uses (and, of course, stack itself). If you do run into issues, we’ve tried to document the process and potential workarounds.

Once it’s installed, you can simply run echidna-test solidity/cli.sol and you’re off to the races! If you have any issues, open one on Github or ask in #ethereum on the Empire Hacking Slack and we’ll do our best to help.

2017 in review

What a roller coaster of a year! Well, outside of our office. Inside, 2017 was excellent.

We published novel research that advanced, among other areas, the practices of automated bug discovery, symbolic execution, and binary translation. In the process, we improved many foundational tools that an increasing number of security researchers will come to rely on. We scaled up our work on securing smart contracts and established ourselves as a premier blockchain security firm. Finally, as in years past, we shared what lessons we could and supported others to do the same.

Whether you’re a client, a long-time follower, or a budding security researcher, thank you for your interest and contribution.

Below, find 12 highlights from 2017; each one is a reason to stick around in 2018.

Novel research

Automated bug discovery entered the real world

This field really picked up momentum in 2017. If you weren’t paying close attention, the flurry of developments was easy to miss. That’s why we gave a tour of the field’s recent advances at IT Defense, BSidesLisbon, and CyCon.

But ‘unused tools don’t find bugs,’ and many roadblocks still stand in the way of widespread adoption. We’re changing that in the defense industry. We won contracts with Lockheed Martin and the Department of Defense’s DIUx to apply and scale our Cyber Reasoning System.

If you’re wondering about the future role of humans in the secure development lifecycle, we maintain that these tools will always require expert operators.

If you’re on a team developing an automated bug discovery tool, you’ll be happy to know that we ported the CGC’s Challenge Binaries to Windows, macOS, and Linux. Now you have an objective benchmark for evaluating your tool’s performance.

Manticore improved the state of accessible symbolic execution tooling

We open-sourced Manticore, to some applause in the community. Manticore is a highly flexible symbolic execution tool, which we rely on for binary analysis and rapid prototyping of new research techniques.

Parts of Manticore underpinned our symbolic execution capabilities in the Cyber Grand Challenge. Since then, it has been an integral component of our research for DARPA’s LADS (Leveraging the Analog Domain for Security) program.

Within a year of Manticore’s public release, we adapted the tool to amplify the abilities of smart contract auditors and to contribute to the security of the Ethereum platform. In December, we explained how we use Manticore for our work on Ethereum Virtual Machine (EVM) bytecode. When applied to Ethereum, symbolic execution can automatically discover functions in a compiled contract, generate transactions to trigger contract states, and check for failure states.

McSema 2.0 brought us closer to treating binaries like source code

In early 2017, we decided to give McSema a fresh coat of paint. We cleaned up the code, and made it more portable and easier to install. It ran faster. The code it produced was better. But we knew we could push it further.

Since we released McSema four years ago, programs have adopted modern x86 features at an increasing rate, and our lifting goals have expanded to include AArch64, the architecture used by modern smartphones.

So, we made a series of major enhancements. For example, we completely separated the instruction semantics from the control flow recovery and created Remill. McSema is now a client that uses the Remill library for binary lifting. To borrow an analogy, McSema is to Remill as Clang is to LLVM. If you want access to lifting capabilities in your own app, you should use Remill.

Next, we demonstrated a series of use cases for McSema, including: binary patching and modification, symbolic execution with KLEE, and reuse of existing LLVM-based tools (e.g. libFuzzer).

Foundational Tools

Ethereum’s foundation firmed up

In response to the surge of interest in Ethereum smart contracts and blockchain technology, we launched new services and created tools that offer verifiable security gains to the community. We adapted Manticore into an industry-leading security tool, and developed a suite of additional tools that help others write more secure smart contracts.

In a short period of time, we’ve become one of the industry’s most trusted providers of audits, tools, and best practices for securing smart contracts and their adjacent technologies. We’ve secured token launches, decentralized apps, and entire blockchain platforms. See our public reports for RSK and DappHub’s Sai.

We focused the year’s final Empire Hacking meetup on how to write secure smart contracts, and hack them. Two of the six speakers came from our team. In November, we were the first to finish Zeppelin’s Ethereum CTF, Ethernaut.

We became the first information security company to join the Enterprise Ethereum Alliance (EEA), the world’s largest open source blockchain initiative. As one of the industry’s top smart contract auditors, we’re excited to contribute our unparalleled expertise and unique toolset to the EEA’s working groups.

osquery expanded its reach and abilities

Following our port of Facebook’s open source endpoint instrumentation and monitoring agent to Windows in 2016, we’ve continued to contribute to osquery’s development and adoption.

We made foundational enhancements that increased the framework’s raw capabilities. Adding auditd-based file integrity monitoring required a redesign from the ground up. As a result, end users get better performance, no fake or broken events, and new file integrity monitoring.

Among numerous other improvements, we showed how osquery can find notable industry issues like the CCleaner malware, and contributed the features needed to detect them. For additions that aren’t native operating system functions, we’ve created a maintained repository of osquery extensions.

In an effort to promote osquery’s long-term success, we shared the experiences, pains and wishes of users at five major tech firms. We hope the findings will help the community to chart a course forward, and help the undecided to determine if and how to deploy osquery in their companies.

iVerify satisfied a fundamental need for iPhone users

We released iVerify, an App-Store-compatible library of the most comprehensive iOS jailbreak checks in the industry. The checks are maintained by our team of experts, some of the world’s foremost authorities on iOS security internals.

App developers deserve to know when their apps are installed on jailbroken phones. However, ineffective jailbreak detection can be worse than no jailbreak detection at all.

iVerify detects jailbreaks on iOS 10 and 11 right now. We’re committed to updating the library as new versions of iOS are released, and as more effective checks capable of finding known and unknown jailbreaks are developed.

Algo brought self-hosted VPN services to the masses

In late 2016, we released Algo, our self-hosted personal VPN server. Algo is designed for ease of deployment and security: it relies only on modern protocols and ciphers, includes only the minimal software you need, and is free.

Then, in 2017, interest in protecting one’s online activity exploded. We can’t bring ourselves to thank the FCC for relaxing ISP commercialization rules, but we are glad that more people are putting more thought into their digital privacy.

And yes, we are very grateful to:

We’ll continue to work aggressively toward simplifying and automating Algo’s installation so those who lack the technical expertise to build and maintain their own VPNs aren’t left exposed.

Learn & Share

Helped the industry deploy new exploit mitigations

Following our discussions of Control Flow Integrity (CFI) and Control Flow Guard (CFG), we shared our attempt to compare clang’s implementation of CFI against Visual Studio’s Control Flow Guard by applying both to osquery. Instead of a direct comparison, we generated a case study of how seemingly small tradeoffs in security mitigations have serious implications for usability. Our discussion shows developers how to use these mitigations and includes sample programs that showcase the bugs they mitigate.

Months later, when Microsoft was caught on the wrong end of a ‘tradeoff’ with serious implications for its users, we applied AppJailLauncher-rs to Windows Defender on the software giant’s behalf. The result, Flying Sandbox Monster, is the industry’s first sandboxed anti-virus scanner for Windows. We described the process and results of creating the tool, as well as its Rust-based framework to contain untrustworthy apps in AppContainers.

Combining Control Flow Integrity with sandboxing makes for an incredible challenge for attackers. Unfortunately, these mitigations are also a challenge for developers to use! In creating the materials above, we lowered the learning curve for the community.

Shone a spotlight on Binary Ninja

We think that Vector35’s versatile reversing platform doesn’t get the respect it deserves. We worked to help others understand Binary Ninja’s capabilities by:

  • Describing the fundamentals of Binary Ninja’s Low Level IL, and how the Python API can be used to interact with it.
  • Demonstrating how to easily develop platform-agnostic tools harnessing the power of Binary Ninja’s LLIL and its dataflow analysis.
  • Explaining how we analyzed this year’s DEF CON CTF challenges with our own Binary Ninja processor module, now available for anyone interested in trying out the challenges.
  • Sharing at Infiltrate and Summercon how Binary Ninja makes program analysis more accessible and useful.

Sponsored the causes that matter to us

The next generation. We care about giving younger people opportunities to learn and develop skills in the industry, so we continued our sponsorship of capture the flag competitions like UIUC CTF, HSCTF, and CSAW. We contributed both financial support and unique challenges.

The InfoSec community. We want to share our research with a larger audience and help others gain access to it, so we sponsored conferences like GreHack, Infiltrate, and ISSISP. We provided both financial support and workshops on new techniques and Manticore.

The Truth. We care about getting accurate information out there, so we’re always happy to sponsor the industry’s best podcast host: Patrick Gray at Risky Business. We appreciate his cutting commentary on industry news. Listen to our interviews in episodes #449 and #474 on exploit mitigations and security engineering, respectively.

Advanced the public’s understanding of security

As in years past, when we come across something that would improve the state of security, and it isn’t covered under an NDA, we share it. To that end, we:

Grew as a team

This has been another wonderful year for our team. We expanded our numbers. We went to Infiltrate in Miami and Whistler for company retreats. Josselin earned his PhD. We tacked on more NOP certifications, and hosted some wonderful interns.

Well done, everyone!

More in store for 2018

This year, we will continue to publish more of our research, advance our commitment to our open source projects, and share more of the tools we’ve developed in-house. Look for more soon about:

  • DIUx – The Department of Defense’s experimental innovation unit DIUx recently awarded us a seven-figure contract to take our Cyber Reasoning System (CRS) to the next level as part of project Voltron.
  • Blockchain – As this area becomes a larger part of our business, expect to see more of our discoveries about the security of smart contracts, the security implications of the Solidity language and the Ethereum Virtual Machine.
  • Open source support – We are taking new projects under our wing (Google Santa, Google Omaha, and more), in addition to the major contributions we have in the works for osquery.
  • iVerify – We plan to release a standalone version that allows anyone to check whether their phone has been jailbroken. The service is intended for high-risk users like journalists and activists operating in high threat environments.
  • Algo – We’ll be making it easier to use for those who don’t want to use a terminal.
  • Accessible tooling – We’ll make advanced tools and technologies available to greater numbers of software engineers with new releases of DeepState, Manticore, and fcd-remill.
  • And finally, Operation Waking Shark – Keep an eye out for these team fleeces at an upcoming Empire Hacking.

Parity Technologies engages Trail of Bits

We’re helping Parity Technologies secure their Ethereum client. We’ll begin by auditing their codebase, and we look forward to publishing the results and the knowledge we gain in the future.

Parity Technologies combines cryptography, cellular systems, peer-to-peer technology and decentralized consensus to solve the problems that have gone unaddressed by conventional server-client architecture.

Their Ethereum client is designed for foundational use in enterprise environments, so businesses and organizations can capitalize on the new opportunities blockchain technology presents.

Parity selected us for several reasons

  • Our expert staff brings decades of security knowledge to the field of smart contracts, deep experience with Rust and Solidity, and rapid command of the latest developments in Ethereum security.
  • We can dig deeper into the construction of smart contracts, the security implications of the Solidity language, and the Ethereum Virtual Machine (EVM) than any other team because of our proprietary tools such as Manticore, Ethersplay, Slither, and Echidna.
  • Finally, Parity was attracted to our enthusiasm for jointly publishing discoveries in our audit, and possibly even educational material for the benefit of the broader blockchain community.

Bottom line: we’re one of the leading blockchain security firms, a natural choice for their needs.

What you can expect next

Over the course of the next few weeks, we will audit the beta branch of Parity and the corresponding jsonrpc library. We’ll review Parity’s key generation and storage, RPCs that use private keys and are responsible for permissions, and an assortment of smart contracts. Once the report is made public we plan to write about our lessons learned and results.

We’re excited to work with Parity to help secure the Ethereum ecosystem!

An accessible overview of Meltdown and Spectre, Part 1

In the past few weeks the details of two critical design flaws in modern processors were finally revealed to the public. Much has been written about the impact of Meltdown and Spectre, but there is scant detail about what these attacks are and how they work. We are going to try our best to fix that.

This article explains how the Meltdown and Spectre attacks work, but in a way that is accessible to people who have never taken a computer architecture course. Some technical details will be greatly simplified or omitted, but you’ll understand the core concepts and why Spectre and Meltdown are so important and technically interesting.

This article is divided into two parts for easier reading. The first part (what you are reading now) starts with a crash course in computer architecture to provide some background and explains what Meltdown actually does. The second part explains both variants of Spectre and discusses why we’re fixing these bugs now, even though they’ve been around for the past 25 years.

Background

First, a lightning-fast overview of some important computer architecture concepts and some basic assumptions about hardware, software, and how they work together. These are necessary to understand the flaws and why they work.

Software is Instructions and Data

All the software that you use (e.g. Chrome, Photoshop, Notepad, Outlook, etc.) is a sequence of small individual instructions executed by your computer’s processor. These instructions operate on data stored in memory (RAM) and also in a small table of special storage locations, called registers. Almost all software assumes that a program’s instructions execute one after another. This assumption is both sound and practical — it is equivalent to assuming that time travel is impossible — and it enables us to write functioning software.

These Intel x86-64 processor instructions in notepad.exe on Windows show what software looks like at the instruction level. The arrows flow from branch instructions to their possible destinations.

Processors are designed to be fast. Very, very fast. A modern Intel processor can execute ~300 billion instructions per second. Speed drives new processor sales. Consumers demand speed. Computer engineers have found some amazingly clever ways to make computers fast. Three of these techniques — caching, speculative execution, and branch prediction — are key to understanding Meltdown and Spectre. As you may have guessed, these optimizations are in conflict with the sequential assumption of how the hardware in your computer executes instructions.

Caching

Processors execute instructions very quickly (one instruction every ~2 ns). These instructions need to be stored somewhere, as does the data they operate on. That place is called main memory (i.e. RAM). Reading or writing to RAM is 50-100x slower (~100 ns/operation) than the speed at which processors execute instructions.

Because reading from and writing to memory is slow (relative to the instruction execution speed), a key goal of modern processors is to avoid this slowness. One way to achieve this is to assume a common behavior across most programs: they access the same data over and over again. Modern processors speed up reads and writes to frequently accessed memory locations by storing copies of the contents of those memory locations in a “cache.” This cache is located on-chip, near the processor cores that execute instructions. This closeness makes accessing cached memory locations considerably faster than going off-chip to the main storage in RAM. Cache access times vary by the type of cache and its location, but they are on the order of ~1ns to ~3ns, versus ~100ns for going to RAM.

An image of an Intel Nehalem processor die (first generation Core i7, taken from this press release). There are multiple levels of cache, numbered by how far away they are from the execution circuitry. The L1 and L2 cache are in the core itself (likely the bottom right/bottom left of the core images). The L3 cache is on the die but shared among multiple cores. Cache takes up a lot of expensive processor real estate because the performance gains are worth it.

Cache capacity is tiny compared to main memory capacity. When a cache fills, any new items put in the cache must evict an existing item. Because of the stark difference in access times between memory and cache, it is possible for a program to tell whether or not a memory location it requested was cached by timing how long the access took. We’ll discuss this in depth later, but this cache-based timing side-channel is what Meltdown and Spectre use to observe internal processor state.

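As a rough illustration of how a program can tell cached memory from uncached memory, here is a hypothetical C sketch that times individual accesses with the x86 timestamp counter (the __rdtscp intrinsic available in GCC and Clang). Real side-channel code needs far more care with noise, serialization, and prefetching; this only sketches the idea.

#include <stdint.h>
#include <x86intrin.h>   /* __rdtscp on x86 with GCC/Clang */

/* Time a single memory access: a small result suggests the cache, a large one suggests RAM. */
static uint64_t time_access(volatile uint8_t *addr) {
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);
    (void)*addr;                         /* the access being timed */
    uint64_t end = __rdtscp(&aux);
    return end - start;
}

/* Given 256 page-spaced slots, return the index of the slot that reads fastest,
   i.e., the one most likely to already be in the cache. */
static int fastest_slot(volatile uint8_t *probe) {
    int best = 0;
    uint64_t best_time = UINT64_MAX;
    for (int i = 0; i < 256; i++) {
        uint64_t t = time_access(&probe[i * 4096]);
        if (t < best_time) { best_time = t; best = i; }
    }
    return best;
}

A readout loop like fastest_slot is, in a much more careful form, how the Meltdown and Spectre attacks described in these posts recover a secret byte from cache state.
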
Speculative Execution

Executing one instruction at a time is slow. Modern processors don’t wait. They execute a bundle of instructions at once, then re-order the results to pretend that everything executed in sequence. This technique is called out-of-order execution. It makes a lot of sense from a performance standpoint: executing 4 instructions one at a time would take 8ns (4 instructions x 2 ns/instruction). Executing 4 instructions at once (realistic on a modern processor) takes just 2ns — a 75% speedup!

While out-of-order execution and speculative execution have different technical definitions, for the purposes of this blog post we’ll be referring to both as speculative execution. We feel justified in this because out-of-order execution is by nature speculative. Some instructions in a bundle may not need to be executed. For example, an invalid operation like a division by zero may halt execution, thus forcing the processor to roll back operations performed by subsequent instructions in the same bundle.

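As a toy illustration of the gain described above (and pictured in the figure below), here is a hypothetical C function whose four additions do not depend on one another, so a processor with four free execution units can issue them all at once rather than one after another:

/* Four independent additions: none needs the result of another, so an
   out-of-order processor can execute all four simultaneously. */
int independent_sum(int a, int b, int c, int d) {
    int w = a + 1;
    int x = b + 2;
    int y = c + 3;
    int z = d + 4;
    return w + x + y + z;   /* only this final sum depends on the results above */
}
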

A visualization of performance gained by speculatively executing instructions. Assuming 4 execution units and instructions that do not depend on each other, in-order execution will take 8ns while out-of-order execution will take 2ns. Please note that this diagram is a vast oversimplification and purposely doesn’t show many important things that happen in real processors (like pipelining).

Sometimes, the processor makes incorrect guesses about what instructions will execute. In those cases, speculatively executed instructions must be “un-executed.” As you may have guessed, researchers have discovered that some side-effects of un-executed instructions remain.

There are many situations that can cause speculative execution to guess wrong, but we’ll focus on the two that are relevant to Meltdown and Spectre: exceptions and branch instructions. Exceptions happen when the processor detects that a rule is being violated. For example, a divide instruction could divide by zero, or a memory read could access memory without permission. We discuss this more in the section on Meltdown. The second cause, branch instructions, tells the processor what to execute next. Branch instructions are critical to understanding Spectre and are further described in the next section.

Branch Prediction

Branch instructions control execution flow; they specify where the processor should get the next instruction. For this discussion we are only interested in two kinds of branches: conditional branches and indirect branches. A conditional branch is like a fork in the road, because the processor must select one of two choices depending on the value of a condition (e.g., A > B, C = 0, etc.). An indirect branch is more like a portal, because the processor can go anywhere. In an indirect branch, the processor reads a value that tells it where to fetch the next instruction.

A conditional branch and its two potential destinations. If the two operands of the cmp instruction are equal, this branch will be taken (i.e. the processor will execute instructions at the green arrow). Otherwise the branch will not be taken (i.e. the processor will execute instructions at the red arrow). Code taken from notepad.exe.

An indirect branch. This branch will redirect execution to whatever address is at memory location 0x10000c5f0. In this case, it will call the initterm function. Code taken from notepad.exe.

Branches happen very frequently and get in the way of speculative execution. After all, the processor can’t know which code to execute until after the branch condition calculation completes. The way processors get around this dilemma is called branch prediction. The processor guesses the branch destination. When it guesses incorrectly, the already-executed actions are un-executed and new instructions are fetched from the correct location. Incorrect guesses are uncommon: modern branch predictors are easily 96+% accurate on normal workloads.

When the branch predictor is wrong, the processor speculatively executes instructions in the wrong context. Once the mistake is noticed, these phantom instructions are un-executed. As we’ll explain, the Spectre bug shows that it is possible both to control the branch predictor and to determine some effects of those un-executed instructions.

Meltdown

Now let’s apply the above computer architecture knowledge to explain Meltdown. The Meltdown bug is a design flaw in (almost) every Intel processor released since 1995. Meltdown allows a specially crafted program to read core operating system memory that it should not have permission to access.

Processors typically have two privilege modes: user and kernel. The user part is for normal programs you interact with every day. The kernel part is for the core of your operating system. The kernel is shared among all programs on your machine, making sure they can function together and with your computer hardware, and contains sensitive data (keystrokes, network traffic, encryption keys, etc) that you may not want exposed to all of the programs running on your machine. Because of that, user programs are not permitted to read kernel memory. The table that determines what part of memory is user and what part is kernel is also stored in memory.

Imagine a situation where some kernel memory content is in the cache, but its permissions are not. Checking permissions will be much slower than simply reading the value of the content (because it requires a memory read). In these cases, Intel processors check permissions asynchronously: they start the permission check, read the cached value anyway, and abort execution if permission was denied. Because processors are much faster than memory, dozens of instructions may speculatively execute before the permission result arrives. Normally, this is not a problem. Any instructions that happen after a permissions check fails will be thrown away, as if they were never executed.

What researchers figured out was that it was possible to speculatively execute a set of instructions that would leave an observable sign (via a cache timing side-channel), even after un-execution. Furthermore, it was possible to leave a different sign depending on the content of kernel memory — meaning a user application could indirectly observe kernel memory content, without ever having permission to read that memory.

Technical Details

A graphical representation of the core issue in Meltdown, using terminology from the steps described below. It is possible to speculatively read core system memory without permission. The effects of these temporary speculative reads are supposed to be invisible after instruction abort and un-execution. It turns out that cache effects of speculative execution cannot be undone, which creates a cache-based timing side channel that can be used to read core system memory without permission.

At a high level, the attack works as follows:

  1. A user application requests a large block of memory, which we’ll call bigblock, and ensures that none of it is cached. The block is logically divided into 256 pieces (bigblock[0], bigblock[1], bigblock[2], ... bigblock[255]).
  2. Some preparation takes place to ensure that a memory permissions check for a kernel address will take a long time.
  3. The fun begins! The program will read one byte from a kernel memory address — we’ll call this value secret_kernel_byte. As a refresher, a byte can be any number in the range of 0 to 255. This action starts a race between the permissions check and the processor.
  4. Before the permissions check completes, the hardware continues its speculative execution of the program, which uses secret_kernel_byte to read a piece of bigblock (i.e. x = bigblock[secret_kernel_byte]). This use of a piece of bigblock will cache that piece, even if the instruction is later undone.
  5. At this point the permissions check returns permission denied. All speculatively executed instructions are un-executed and the processor pretends it never read memory at bigblock[secret_kernel_byte]. There is just one problem: a piece of bigblock is now in the cache, and it wasn’t before.
  6. The program will time how long it takes to read every piece of bigblock. The piece cached due to speculative execution will be read much faster than the rest.
  7. The index of the piece in bigblock is the value of secret_kernel_byte. For example, if bigblock[42] was read much faster than any other entry, the value of secret_kernel_byte must be 42.
  8. The program has now read one byte from kernel memory via a cache timing side-channel and speculative execution.
  9. The program can now continue to read more bytes. The Meltdown paper authors claim they can read kernel memory at a rate of 503 KB/s using this technique.

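To connect the steps above to code, here is a minimal, hypothetical C sketch of the access pattern in steps 3 through 5. It illustrates the idea only; a real attack also has to survive or suppress the fault and then perform the timing readout from steps 6 and 7, and the 4096-byte slot spacing is an arbitrary choice for the example.

#include <stdint.h>

uint8_t bigblock[256 * 4096];   /* one page-spaced slot per possible byte value */

/* The kernel read below will be denied, but the dependent access may execute
   speculatively before the permission check completes, leaving a cache footprint. */
void meltdown_probe(const volatile uint8_t *kernel_addr) {
    uint8_t secret_kernel_byte = *kernel_addr;      /* step 3: the racing read          */
    (void)bigblock[secret_kernel_byte * 4096];      /* step 4: caches the matching slot */
}
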
What is the impact?

Malicious software can use Meltdown to more easily gain a permanent foothold on your desktop and to spy on your passwords and network traffic. This is definitely bad. You should go apply the fixes. However, malicious software could already do those things, albeit with more effort.

If you are a cloud provider (like Amazon, Google, or Microsoft) or a company with major internet infrastructure (like Facebook), then this bug is an absolute disaster. It’s hard to overstate just how awful this bug is. Here’s the problem: the cloud works by dividing a massive datacenter into many virtual machines rentable by the minute. A single physical machine can have hundreds of different virtual tenants, each running their own custom code. Meltdown breaks down the walls between tenants: each of those tenants could potentially see everything the others are doing, like their passwords, encryption keys, source code, etc. Note: how the physical hardware was virtualized matters. Meltdown does not apply in some cases. The details are beyond the scope of this post.

The fix for Meltdown incurs a performance penalty. Estimates vary widely: some sources say 5-30%, some say it is negligible, and others say anywhere from single digits to quite noticeable. What we know for sure is that older Intel processors are impacted much more than newer ones. For a desktop machine, this is slightly inconvenient. For a large cloud provider or internet company, a 5% performance penalty across their entire infrastructure is an enormous price. For example, Amazon is estimated to have 2 million servers. A 5 to 30% slowdown could mean buying and installing 100,000 (5%) to 600,000 (30%) additional servers to match prior capacity.

What should I do?

Please install the latest updates to your operating system (i.e. MacOS, Windows, Linux). All major software vendors have released fixes that should be applied by your automatic updater.

All major cloud providers have deployed fixes internally, and you as a customer have nothing to worry about.

To be continued…

We hope you have a better understanding of computer architecture concepts and the technical details behind Meltdown. In the second half of this blog post we will explain the technical details of Spectre V1 and Spectre V2 and discuss why these bugs managed to stay hidden for the past 25 years. The technical background will get more complicated, but the bugs are also more interesting.

Finally, we’d like to remind our readers that this blog post was written to be accessible to someone without a computer architecture background, and we sincerely hope we succeeded in explaining some difficult concepts. The Meltdown and Spectre papers, and the Project Zero blog post are better sources for the gory details.