Summer @ Trail of Bits

This summer I’ve had the incredible opportunity to work with Trail of Bits as a high school intern. In return, I am obligated to write a blog post about this internship. So without further ado, here it is.

Starting with Fuzzing

The summer kicked off with fuzzing, a technique I had heard of but had never tried. The general concept is to throw input at a program until it crashes, then analyze the crash to find a vulnerability. Because of time constraints, it didn’t make sense to write my own fuzzer, so I began looking for pre-existing fuzzers online. The first tool that I found was CERT’s Failure Observation Engine (FOE), which seemed very promising. FOE has many options that allow for precise fine-tuning of the fuzzer, so it can be tweaked specifically for the target. However, my experience with FOE was fickle. With certain targets, the tool would run once and stop, instead of running continuously (as a fuzzer should). Just wanting to get started, I decided to move on to other tools instead.

I settled on american fuzzy lop (afl) for Linux and Microsoft MiniFuzz for Windows. Each had its pros and cons. Afl works best with source code, which limits the scope to open-source software (there is experimental support for closed-source binaries; however, it is considerably slower). Compiling from source with afl allows the fuzzer to ascertain code coverage and provide helpful feedback in its interface. MiniFuzz is the opposite: it runs on closed-source Windows software and provides very little feedback while it runs. However, the crash data is very helpful, as it gives the values of all registers at the time of the program crash — something the other fuzzers did not provide. MiniFuzz was very click-and-run compared to afl’s more involved compiling setup.

Examining a Crash

Once the fuzzers were set up and running on targets (VideoLAN’s VLC, Wireshark, and ImageMagick, just to name a few) it was time to start analyzing the crashes. Afl reported several crashes in VLC. While verifying that these crashes were reproducible, I noticed that several were segfaults while trying to free the address 0x7d. This struck me as odd because the address was so small, so on a hunch I opened up the crashing input in a hex editor and searched for ‘7d’. Sure enough, deep in the file was a match: 0x0000007d. I changed this to something easily recognizable, 0x41414141, and ran the file through again. This time the segfault was on, you guessed it, 0x41414141! Encouraged by the knowledge that I could control an address in the program from a file, I set out to find the bug. This involved a long process of becoming undesirably acquainted with both gdb and the VLC source code. The bug allows for the freeing of two arbitrary, user-controlled pointers.
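The byte-swapping experiment above is easy to reproduce with a short Python sketch. The filenames are placeholders; the dword values are the ones from my crashing input:

```python
import struct

def mark_dword(data: bytes, old: int, new: int) -> bytes:
    """Replace the first little-endian dword `old` with `new`."""
    needle = struct.pack("<I", old)    # 0x0000007d as stored in the file
    marker = struct.pack("<I", new)    # 0x41414141 -- easy to spot in gdb
    offset = data.find(needle)
    if offset == -1:
        raise ValueError("dword not found")
    return data[:offset] + marker + data[offset + 4:]

# Hypothetical usage:
# data = open("crash_input", "rb").read()
# open("crash_marked", "wb").write(mark_dword(data, 0x7d, 0x41414141))
```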

The Bug in Detail

VLC reads in different parts of a file as boxes, which it categorizes in a tagged union. The bug is the result of a type confusion when the size of the stsd box in the file is changed, increasing its size so that it considers the following box, an stts box, to be its child. VLC reads boxes from the file by indexing into a function table based on the type of the box and the type of its parent. But with the wrong parent, it finds no match and instead uses a default read, which reads the file in as a vide type box. Later, when freeing the box, VLC looks up the function by checking only the box’s own type, so it finds the stts free routine. It thus tries to free an stts box that was read in as a generic vide box, and frees two addresses taken straight from the stts box data.



Controlling two freed addresses is plausibly exploitable, so it was time to report the bug. I went through oCERT, who were very helpful in communicating with the VLC developers to fix the issue and in getting a CVE assigned (CVE-2015-5949). After some back and forth it was settled, and time to move on to something new.

Switching Gears to the Web

With half a summer done and another half to learn something new, I began to explore web security. I had slightly more of a background in this from some CTFs and from NYU Hack Night, but I wanted to get a more in-depth and practical understanding. Unlike fuzzing, where it was easy to hit the ground running, web security required a bit more knowledge beforehand. I spent a week trying to learn as much as possible from The Web Application Hacker’s Handbook and the corresponding MDSec labs. Armed with a solid foundation, I put this training to good use.

Bounty Hunting

HackerOne has a directory of companies that have bug bounty programs, and this seemed the best place to start. I sorted by date joined and picked the newest companies – they probably had not been looked at much yet. Using BurpSuite, an indispensable tool, I poked through these websites looking for anything amiss. Looking through several of these sites, I searched for vulnerable functions and security issues, and submitted a few reports. I’ve had some success, but the reports are still going through disclosure.

Security Assessment

To conclude the internship, I performed a security assessment of a tech startup in NYC, applying the skills I’ve acquired. I found bugs in application logic, access controls, and session management, the most severe of which was a logic flaw that posed significant financial risk to the company. I then had the opportunity to present these bugs in a meeting with the company. The report was well-received and the company is now implementing fixes.

Signing Off

This experience at Trail of Bits has been fantastic. I’ve gotten a solid foundation in application and web security, and it’s been a great place to work. I’m taking a week off to look at colleges, but I’ll be back working part time during my senior year.

Flare-On Reversing Challenges 2015

This summer FireEye’s FLARE team hosted its second annual Flare-On Challenge targeting reverse engineers, malware analysts, and security professionals. In total, there were eleven challenges, each using different anti-reversing techniques and each in different formats. For example, challenges ranged from simple password crack-mes to kernel drivers to stego in images.

This blog post will highlight four of the eleven challenges (specifically 6, 7, 9, and 11) that we found most interesting, as well as some of the more useful tools and materials that would help with future challenges like these.

Challenge Six

  • Summary: Challenge Six was an obfuscated Android app crack-me that took and verified your input
  • Techniques Used: Remote Android debugging, IDAPython

The novelty of this level was that it wasn’t a Windows binary (the majority of the challenges targeted the Windows platform; clearly looking for some Windows reversers ;] ) and it required knowledge of ARM reversing.

At the heart of this level was the ARM shared object library that contained the algorithm for checking the key. Launching the app on either a spare malware-designated Android phone or an emulator, we see this screen:


Taking a stab at gambling, we try entering “password”. No luck.


Opening it in IDA (if you did this first without running it… you’re in good company), we see that the important part of the library is the compare.


Tracing this compare backwards we find the function which generates the expected input value. All we need to do is statically reverse this. The main part of this decryption function is the factorization of the encrypted password stored in the binary.

The logic from this function can be ported into Python along with the encrypted string. Using IDAPython to extract the necessary data from the binary makes this process a lot easier. For those who have never used IDAPython, the script is included below.

Main IDAPython Script

The above logic was exfiltrated from the obfuscated binary through static reversing. IDAPython helped with carving out the right data segments from the app.

IDAPython script to dump prime index map

IDAPython script to dump “rounds”

Running the final Python script to decrypt the string prints the intended password.
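The full IDAPython scripts appeared as images in the original post, so here is only a loose, hypothetical sketch of the core idea: factor the encrypted values, then map the factors back to characters through the dumped prime index map. The mapping below is invented for illustration; only the factorization step reflects the described algorithm:

```python
def trial_factor(n: int) -> list:
    """Prime factorization by trial division (fine for small values)."""
    factors, p = [], 2
    while p * p <= n:
        while n % p == 0:
            factors.append(p)
            n //= p
        p += 1
    if n > 1:
        factors.append(n)
    return factors

# PRIME_INDEX_MAP and the decode step below are purely illustrative
# stand-ins for the tables dumped out of the binary with IDAPython:
# password = "".join(PRIME_INDEX_MAP[p] for p in trial_factor(encrypted_round))
```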




Aside from reversing statically, remote debugging can also be done with gdb, either by attaching to the app running on a phone or by attaching to an emulated Android server.

A breakpoint can then be set at the compare and the decrypted flag read out of the debugger. To do this, extract the Android APK, set up the Android debugging environment, and break at the calls to the shared, obfuscated object.

There are a few good resources online which show how to set up a remote gdb environment on Android; some of the most useful can be found at the bottom of this post.

Challenge Seven: YUSoMeta

  • Summary: Challenge 7 was an obfuscated .NET application that verified a user-supplied password.
  • Techniques Used: .NET deobfuscation, Windbg special breakpoints

Challenge 7, YUSoMeta, was a .NET Portable Executable format application. Like every good reverser, we load the .NET application into IDA Pro.

Glancing at the Functions window reveals quite a few peculiarly named methods. Many of the class and class field names do not consist exclusively of ASCII characters (as those of “normal” .NET applications do). This suggests the presence of obfuscation.


Opening the application in a hex editor (our particular choice is HxD), we find an interesting string: “Powered by SmartAssembly”.


SmartAssembly is an obfuscator (much like Trail of Bits’ MAST) for .NET applications. Luckily, de4dot is a tool that deobfuscates SmartAssembly-protected applications. Once deobfuscated, the Common Intermediate Language (CIL) assembly can be decompiled back into C# with tools such as .NET Reflector. Using this, we find a password verification function.


The challenge captures the password obtained by user input and compares it to the expected password as generated by a series of somewhat complex operations. The easiest way to obtain the expected password is to use Windbg.

First, we set up Windbg by loading the SOS debugging extensions to introspect managed programs (aka .NET applications).

In Windbg

Second, we need to set up the symbols path to obtain debugging symbols.

In Windbg

Afterwards, we set a breakpoint on the string equality comparison function, System.String.op_Equality in mscorlib.dll. Note: we run !name2ee twice because it always fails on the first issuance.

In Windbg

Upon breaking, we examine the stack objects using !dumpstackobjects. The password used to extract the key should be on the .NET stack.

In Windbg


Challenge Nine: you_are_very_good_at_this

  • Summary: Challenge 9 was an obfuscated command line password checking application
  • Techniques Used: Intel PIN, Windbg, Python, IDA Pro

Challenge 9, you_are_very_good_at_this, was an x86 Portable Executable command line application that took an argument and verified it against the expected password – a basic crack-me.


Like all good reversers, we immediately open the application in IDA Pro, which revealed an enormous wall of obfuscated code – clearly dynamic analysis is the way to go.


To us, there are two clear ways of solving this challenge. The first uses a Pintool; the second, Windbg.

First Solution: Pin

We know that the crack-me is checking the command line input somehow, character by character, through mass amounts of operations. Luckily for us, we don’t really need to know more than that.

Using a simple instruction count Pintool (inscount0.cpp from Pin’s tutorial works perfectly), we can count the instructions executed to check our input and determine whether it failed on the 1st character or on the nth character. This allows us to, byte by byte, brute force the password.


It is apparent that there are more instructions executed in the second case, where the nth character is incorrect and exit() isn’t called until later in the execution of the program. We can use this knowledge to determine that the first n-1 characters are correct inputs.


Using Python, we script the pintool to give us the instruction count of the binary’s execution using every possible printable character for the first character of the password.

Python Pseudocode

Doing this, all inputs give us the same instruction count result except for the input containing the correct first character. This is because the correct character is the only one which passes the application’s validator. When this happens, the binary executes additional instructions that aren’t otherwise run.

Now that we know one character, we add a for loop to our script to check for an outlier, and do the same thing for every character of the password… successfully leaking the password!
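A driver along these lines automates the search. The paths and the output format of inscount.out are assumptions based on Pin’s inscount0 example, which writes its instruction total to a file:

```python
import string
import subprocess

# Assumed paths to the Pin kit, the compiled tool, and the target binary.
PIN, TOOL, TARGET = "./pin", "obj-ia32/inscount0.so", "challenge9.exe"

def count_instructions(guess: str) -> int:
    """Run the target under the instruction-count Pintool and parse the total."""
    subprocess.run([PIN, "-t", TOOL, "--", TARGET, guess], capture_output=True)
    with open("inscount.out") as f:        # inscount0 writes "Count <n>"
        return int(f.read().split()[1])

def pick_outlier(counts: dict) -> str:
    """The correct character executes extra validation code, so its
    instruction count stands out from the otherwise identical rest."""
    return max(counts, key=counts.get)

def brute_force(length: int, counter=count_instructions) -> str:
    known = ""
    for _ in range(length):
        counts = {c: counter(known + c) for c in string.printable.strip()}
        known += pick_outlier(counts)
    return known
```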

Last year’s Flare-On Challenge 6 was also solvable in exactly this way; thanks to @gaasedelen for his detailed writeup on it.

Second Solution: Windbg/Python/IDA Pro

To solve this challenge the old-fashioned way, we launch WinDbg and set a breakpoint on kernel32!ReadFile. We trace the kernel32!ReadFile caller and manually deobfuscate the password checking loop by cross-analyzing in IDA Pro.

The password checking loop uses a CMPXCHG instruction to compare the characters of the user supplied password and the expected password.


We determined the registers of interest are the AL and BL registers. Tracing the dataflow for the registers of interest reveals that the AL register encounters some transformations, as a function of CL and AH, but ultimately derives from the user supplied buffer. This implies that the BL register contains the transformed character of the expected password.

Fortunately, we are able to precisely breakpoint at an instruction in the password verification loop and extract the necessary register values (namely the BL, CL, and AH registers) to decode the actual password.


In Windbg

To decode the expected password, we take the printed BL, CL, and AH register values for each “character round” and implement a Python function to reverse the XOR/ROL transformation done on AL.

Python Pseudocode

We unearth the key by joining the output of ror_xor for each “character round”.
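A sketch of that decoder, assuming the forward transform was a rotate-left by CL followed by an XOR with AH (the exact order came from the disassembly, so treat it as an assumption here):

```python
def ror(value: int, count: int, width: int = 8) -> int:
    """Rotate an 8-bit value right by `count` bits."""
    count %= width
    mask = (1 << width) - 1
    return ((value >> count) | (value << (width - count))) & mask

def ror_xor(bl: int, cl: int, ah: int) -> int:
    """Invert `bl == rol(al, cl) ^ ah` to recover the original AL byte."""
    return ror(bl ^ ah, cl)

# key = "".join(chr(ror_xor(bl, cl, ah)) for bl, cl, ah in rounds)
```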


Challenge Eleven: CryptoGraph

  • Summary: Challenge Eleven was a JPEG image encrypted with the RC5 algorithm and an obfuscated key. It turned out to be a solid ‘reversing’ challenge.
  • Techniques Used: RC5 Crypto Algorithm, Windbg, Resource segment carving

The final challenge. Challenge Eleven, CryptoGraph.exe, was a command line binary which, when no arguments were passed, created a junk JPEG file on the system. Looking closer, we see that the binary does accept one command line argument; however, when any number is passed, the binary loops ‘forever’.


Opening the binary in IDA Pro, we assume that the flag will somehow appear in a properly created image. This means we start reversing by tracing up the calls from “WriteFile.”


A few functions up, we realize that resource #124 is being loaded, decrypted, and saved as this image file.


The decryption algorithm is easily identifiable as RC5 through Google. The first result is Kaspersky’s report on the Equation Group and their use of RC5/6.


Now all we need is the RC5 decryption key. Unlucky for us, the key is 16 bytes long and cannot be easily brute forced. However, reversing further, we realize that the key is the result of two distinct RC5 decryption stages.

The first decryption produces a buffer that is indexed into using a random index byte between 0x0 and 0xf, creating an 8-byte key.

This key is then used in another RC5 decryption, which takes the encrypted source (Resource_122) at an offset that is a function of the same random index byte. This second stage decrypts only 16 bytes: the 16-byte RC5 key needed for the encrypted JPEG, Resource_124.
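Put together, the two-stage derivation looks roughly like this sketch. Every helper (rc5_decrypt, index_of) and the slicing arithmetic are stand-ins for logic recovered from the disassembly:

```python
def derive_jpeg_key(resource_121: bytes, resource_122: bytes,
                    index_byte: int, rc5_decrypt, index_of) -> bytes:
    """Sketch of the two RC5 stages that produce the Resource_124 key."""
    # Stage 1: decrypt Resource_121, then use the 0x0-0xf index byte to
    # carve an 8-byte intermediate key out of the decrypted buffer.
    stage1 = rc5_decrypt(resource_121, key=None)
    offset = index_of(index_byte)          # hypothetical indexing algorithm
    key8 = stage1[offset:offset + 8]
    # Stage 2: with that key, decrypt only 16 bytes of Resource_122 at an
    # offset derived from the same index byte -- the final 16-byte RC5 key.
    return rc5_decrypt(resource_122[offset:offset + 16], key=key8)
```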

Diagram showing the different decryption stages

Breaking in Windbg, we realize that the decryption of Resource_121 is what causes the program to seemingly loop forever. In fact, the loops, which run from 0x0 to 0x20, take exponentially longer to execute with each iteration.

Given the RC5 key length and the algorithm used for indexing into the decrypted Resource_121 (which yields this RC5 key), we determine that only one section of the resource is necessary.

Indexing Algorithm

Decrypting only the relevant bits of Resource_121 reduces the execution time significantly. The indexing algorithm, which is not entirely deterministic, can index at most into the first 784 bytes of the decrypted resource.

Because each loop decrypts 48 bytes (hardcoded as an argument passed to the decrypt function), we need to let the main decryption loop run past 0x10 iterations before breaking out of the function.

Math Used to Calculate Loops Needed

Using Windbg, we break at the loops to stop after the 0x10th iteration. This means only part of the key Resource_121 will be decrypted, but thankfully, that’s the only part needed for the 8-byte key.

One last thing that needs to be brute forced is a single byte value between 0x0 and 0xf which affects the indexing algorithm. This byte affects the generation of the previously discussed 8-byte key, as well as the index into Resource_122 from which the 16-byte key is decrypted.

Python-Windbg Pseudo Code

Scripting this in Windbg (full script found here), we let the binary run 0xf times; each time stopping the loops after 0x10 iterations.

On the 0x9 iteration (the magic indexing byte), the correctly decrypted image is saved to a file, and the flag can be read out :].



Thanks to the FLARE team at FireEye for putting these challenges together and successfully forcing me to crack open my Windows VM and learn some new reversing tools. Hope next year’s are just as fun, obscure, and maybe a little harder.

References, Guides, & Tools

Things that we found useful for the challenges.

Hardware Side Channels in the Cloud

At REcon 2015, I demonstrated a new hardware side channel that targets co-located virtual machines in the cloud. This attack exploits the CPU’s pipeline, as opposed to the cache tiers often used in side channel attacks. When designing or looking for hardware-based side channels, specifically in the cloud, I analyzed a few universal properties that define the ‘right’ kind of vulnerable system, as well as unique properties tailored to the hardware medium.

Slides and full research paper found here.

The relevance of side channel attacks will only increase, especially attacks that target the vulnerabilities inherent to systems that share hardware resources, such as cloud platforms.

Virtualization of Physical Resources

Figure 1: virtualization of physical resources


Any meaningful information that you can leak from the environment running the target application (in this case, the victim virtual machine) counts as a side channel. However, some information is better than others. Here, a process (the attacker) must be able to repeatedly record an environment ‘artifact’ from inside one virtual machine.

In the cloud, these environment artifacts are the shared physical resources used by the virtual machines. The hypervisor dynamically partitions each resource, and each virtual machine sees its partition as a private resource. The side channel model (Figure 2) illustrates this.

Knowing this, the attacker can affect that resource partition in a recordable way, such as by flushing a line in the cache tier, waiting until the victim process uses it for an operation, then requesting that address again – recording what values are now there.
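That flush-wait-probe loop can be sketched as follows. Pure Python cannot flush cache lines or take cycle-accurate timings, so `flush`, `wait_for_victim`, and `reload_time` stand in for the clflush/rdtsc-style primitives a real implementation would use:

```python
def probe_line(addr, flush, wait_for_victim, reload_time,
               threshold_cycles: int) -> bool:
    """One flush+reload round: True if the victim touched the line."""
    flush(addr)                  # evict the shared cache line
    wait_for_victim()            # give the victim a chance to run
    cycles = reload_time(addr)   # time our own access to the line
    # A fast reload means the line was re-fetched into the cache,
    # i.e. the victim used that address in the interval.
    return cycles < threshold_cycles
```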

Figure 2: side channel model


Great! So we can record things from our victim’s environment – but now what? Depending on what the victim’s process is doing we can actually employ several different types of attacks.

1. crypto key theft

Crypto keys are great, private crypto keys are even better. Using this hardware side channel, it’s possible to leak the bytes of the private key used by a co-located process. In one scenario, two virtual machines are allocated the same space on the L3 cache at different times. The attacker flushes a certain cache address, waits for the victim to use that address, then queries it again – recording the new values that are there [1].

2. process monitoring ( what applications is the victim running? )

This is possible when you record enough of the target’s behavior, i.e. CPU or pipeline usage or values stored in memory. A mapping between the recording and a specific running process can then be constructed with varying degrees of certainty. Warning: this does rely on at least a rudimentary knowledge of machine learning.

3. environment keying ( great for proving co-location! )

Using the environment recordings taken off of a specific hardware resource, you can also uniquely identify one server from another in the cloud. This is useful to prove that two virtual machines you control are co-resident on the same physical server. Alternatively, if you know the behavior signature of a server your target is on, you can repeatedly create virtual machines, recording the behavior on each system until you find a match [2].

4. broadcast signal ( receive messages without the internet :0 )

If a colluding process is purposefully generating behavior on a pre-arranged hardware resource, such as filling a cache line with 0’s and 1’s, the attacker (your process) can record this behavior in the same way it would record a victim’s behavior. You can then translate the recorded values into pre-agreed messages. Recording from different hardware mediums results in channels with different bandwidths [3].

The Cache is Easy, the Pipeline is Harder

Now, all of the above examples used the cache to record the environment shared by both victim and attacker processes. The cache is the most widely used resource for constructing side channels, in both literature and practice, as well as being the easiest to record artifacts from. Basically, everyone loves cache.

The cache isn’t the only shared resource: co-located virtual machines also share the CPU execution pipeline. To use the CPU pipeline, we must be able to record a value from it. However, there is no easy way for a process to query the state of the pipeline over time – it is like a virtual black box. The only things a process can know are the instruction order it submits for execution on the pipeline and the result the pipeline returns.

out-of-order execution

( the pipeline’s artifact )

We can exploit this pipeline optimization as a means to record the state of the pipeline. A known input instruction order will result in two different return values: one is the expected result(s), the other is the result if the pipeline executes them out of order.

Figure 3: foreign processes can share the same pipeline

strong memory ordering

Our target, cloud processors, can be assumed to be x86/64 architecture – implying a usually strongly-ordered memory model [4]. This is important because the pipeline will optimize the execution of instructions but attempt to maintain the correct order of stores to and loads from memory.

…HOWEVER, stores and loads from different threads may be reordered by out-of-order execution. This reordering is observable if we’re clever.

recording instruction reorder ( or how to be clever )

In order for the attacker to record the “reordering” artifact from the pipeline, we must record two things for each of our two threads:

  • input instruction order
  • return value

Additionally, the instructions in each thread must contain a STORE to memory and a LOAD from memory. The LOAD from memory must reference the location stored to by the opposite thread. This setup ensures the possibility of the four cases illustrated below. The last is the artifact we record; recording it several thousand times gives us averages over time.

Figure 4: the attacker can record when its instructions are reordered

sending a message

To make our attacks more interesting, we want to be able to force the number of recorded out-of-order executions. This ability is useful for other attacks, such as constructing covert communication channels.

In order to do this, we need to alter how the pipeline’s optimization works, either by increasing or decreasing the probability that it will reorder our two threads. The easiest approach is to enforce a strong memory order, guaranteeing that the attacker will observe fewer out-of-order executions.

memory barriers

In the x86 instruction set, there are specific barrier instructions that stop the processor from reordering the four possible combinations of STOREs and LOADs. What we’re interested in is forcing a strong order when the processor encounters a STORE followed by a LOAD.

The instruction mfence does exactly this.

By having the colluding process inject these memory barriers into the pipeline, the attacker’s instructions will not be reordered, forcing a noticeable decrease in the recorded averages. Doing this in distinct time frames allows us to send a binary message.
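Receiving the message then reduces to thresholding the averages per time frame. A sketch, with the framing and threshold invented for illustration (by pre-agreement here, an mfence-suppressed frame with few reorders is a 0 and a noisy frame is a 1):

```python
def decode_message(reorder_counts, frame_size: int, threshold: float):
    """Turn a stream of per-sample reorder counts into bits, one per frame."""
    bits = []
    for i in range(0, len(reorder_counts), frame_size):
        frame = reorder_counts[i:i + frame_size]
        average = sum(frame) / len(frame)
        # mfence in the sender suppresses reordering -> low average -> 0
        bits.append(0 if average < threshold else 1)
    return bits
```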

Figure 5: mfence ensures the strong memory order on pipeline


The takeaway is that even with virtualization separating your virtual machine from the hundreds of other alien virtual machines, the pipeline can’t distinguish your process’s instructions from all the other ones, and we can use that to our advantage. :0

If you would like to learn more about this side channel technique, please read the full paper here.


How We Fared in the Cyber Grand Challenge

The Cyber Grand Challenge qualifying event was held on June 3rd, at exactly noon Eastern time. At that instant, our Cyber Reasoning System (CRS) was given 131 purposely built insecure programs. During the following 24 hour period, our CRS was able to identify vulnerabilities in 65 of those programs and rewrite 94 of them to eliminate the bugs built into their code. This proves, without a doubt, that it is not only possible but achievable to automate the actions of a talented software auditor.

Despite the success of our CRS at finding and patching vulnerabilities, we did not qualify for the final event, to be held next year. A fatal flaw lowered our overall score to 9th place, below the 7th-place threshold for qualification. In this blog post we’ll discuss how our CRS works, how it performed against competitor systems, what doomed its score, and what we are going to do next.

Cyber Grand Challenge Background

The goal of the Cyber Grand Challenge (CGC) is to combine the speed and scale of automation with the reasoning capabilities of human experts. Multiple teams create Cyber Reasoning Systems (CRSs) that autonomously reason about arbitrary networked programs, prove the existence of flaws in those programs, and automatically formulate effective defenses against those flaws. How well these systems work is evaluated through head-to-head tournament-style competition.

The competition has two main events: the qualifying event and the final event. The qualifying event was held on June 3, 2015. The final event is set to take place during August 2016. Only the top 7 competitors from the qualifying event proceed to the final event.

During the qualifying event, each competitor was given the same 131 challenges, or purposely built vulnerable programs, each of which contained at least one intentional vulnerability. For 24 hours, the competing CRSes faced off against each other and were scored according to four criteria. The full details are in the CGC Rules, but here’s a quick summary:

  • The CRS had to work without human intervention. Any teams found to use human assistance were disqualified.
  • The CRS had to patch bugs in challenges. Points were gained for every bug successfully patched. Challenges with no patched bugs received zero points.
  • The CRS could prove bugs exist in challenges. The points from patched challenges were doubled if the CRS could generate an input that crashed the challenge.
  • The patched challenges had to function and perform almost as well as the originals. Points were lost based on performance and functionality loss in the patched challenges.

A spreadsheet with all the qualifying event scores and other data used to make the graphs in this post is available from DARPA (Trail of Bits is the ninth place team). With the scoring in mind, let’s review the Trail of Bits CRS architecture and the design decisions we made.


We’re a small company with a distributed workforce, so we couldn’t physically host a lot of servers. Naturally, we went with cloud computing to do processing; specifically, Amazon EC2. Those who saw our tweets know we used a lot of EC2 time. Most of that usage was purely out of caution.

We didn’t know how many challenges would be in the qualifying event — just that it would be “more than 100.” We prepared for a thousand, with each accompanied by multi-gigabyte network traffic captures. We were also terrified of an EC2 region-wide failure, so we provisioned three different CRS instances, one in each US-based EC2 region, affectionately named Biggie (us-east-1), Tupac (us-west-2), and Dre (us-west-1).

It turns out that there were only 131 challenges and no gigantic network captures in the qualifying event. During the qualifying event, all EC2 regions worked normally. We could have comfortably done the qualifying event with 17 c4.8xlarge EC2 instances, but instead we used 297. Out of an abundance of caution, we over-provisioned by a factor of ~17x.

Bug Finding

The Trail of Bits CRS was ranked second by the number of verified bugs found (Figure 1). This result is impressive considering that we started with nothing while several other teams already had existing bug finding systems prior to CGC.

Figure 1: Teams in the qualifying event ranked by number of bugs found. Orange bars signify finalists.

Our CRS used a multi-pronged strategy to find bugs (Figure 2). First, there was fuzzing. Our fuzzer is implemented with a custom dynamic binary translator (DBT) capable of running several 32-bit challenges in a single 64-bit address space. This is ideal for challenges that feature multiple binaries communicating with one another. The fuzzer’s instrumentation and mutation are separated, allowing for pluggable mutation strategies. The DBT framework can also snapshot binaries at any point during execution. This greatly improves fuzzing speed, since it’s possible to avoid replaying previous inputs when exploring new input space.

Figure 2: Our bug finding architecture. It is a feedback-based architecture that explores the state space of a program using fuzzing and symbolic execution.

In addition to fuzzing, we had not one but two symbolic execution engines. The first operated on the original unmodified binaries, and the second operated on the translated LLVM from mcsema. Each symbolic execution engine had its own strengths, and both contributed to bug finding.

The fuzzer and symbolic execution engines operate in a feedback loop mediated by a system we call MinSet. The MinSet uses branch coverage to maintain a minimum set of maximal coverage inputs. The inputs come from any source capable of generating them: PCAPs, fuzzing, symbolic execution, etc. Every tool gets original inputs from MinSet, and feeds any newly generated inputs into MinSet. This feedback loop lets us explore the possible input state with both fuzzers and symbolic execution in parallel. In practice this is very effective. We log the provenance of our crashes, and most of them look something like:

Network Capture ⇒ Fuzzer ⇒ SymEx1 ⇒ Fuzzer ⇒ Crash
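The coverage-driven selection MinSet performs can be sketched as a greedy set cover. This is an illustration of the idea, not the CRS’s actual implementation:

```python
def minset(corpus):
    """Greedily pick a minimal set of inputs that together preserve
    maximal branch coverage (illustrative sketch only)."""
    remaining = dict(corpus)          # input name -> set of branches it covers
    covered, keep = set(), []
    while remaining:
        # pick the input that adds the most not-yet-covered branches
        name = max(remaining, key=lambda n: len(remaining[n] - covered))
        gain = remaining.pop(name) - covered
        if not gain:                  # every leftover input is redundant
            break
        keep.append(name)
        covered |= gain
    return keep, covered
```

Inputs that cover nothing new are discarded, which keeps the corpus small no matter how many tools are feeding it.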

Some bugs can only be triggered when the input replays a previous nonce, which would be different on every execution of the challenge. Our bug finding system can produce inputs that contain variables based on program outputs, enabling our CRS to handle such cases.

Additionally, our symbolic executors are able to identify which inputs affect program state at the point of a crash. This is a key requirement for the success of any team competing in the final as it enables the CRS to create a more controlled crash.


Our CRS’s patching effectiveness, as measured by the security score, ranks fourth (Figure 3).

Figure 3: Teams in the qualifying event ranked by patch effectiveness (security score). Orange bars signify finalists.


Our CRS patches bugs by translating challenges into LLVM bitcode with mcsema. Patches are applied to the LLVM bitcode, optimized, and then converted back into executable code. The actual patching works by gracefully terminating the challenge when invalid memory accesses are detected. Patching the LLVM bitcode representation of challenges provides us with enormous power and flexibility:

  • We can easily validate any memory access and keep track of all memory allocations.
  • Complex algorithms, such as dataflow tracking, dominator trees, dead store elimination, loop detection, etc., are very simple to implement using the LLVM compiler infrastructure.
  • Our patching method can be used on real-world software, not just CGC challenges.

We created two main patching strategies: generic patching and bug-based patching. Generic patching is an exclusion-based strategy: it first assumes that every memory access must be verified, and then excludes accesses that are provably safe. The benefit of generic patching is that it patches all possible invalid memory accesses in a challenge. Bug-based patching is an inclusion-based strategy: it first assumes only one memory access (where the CRS found a bug) must be verified, and then includes nearby accesses that may be unsafe. Each patching strategy has multiple heuristics to determine which accesses should be included or excluded from verification.
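In sketch form, the two strategies are mirror images. The helper below is purely illustrative (the real patcher operates on LLVM bitcode, and these names are ours, not the CRS’s):

```python
def select_checks(accesses, provably_safe, bug_sites, nearby):
    """Pick which memory accesses get runtime verification.
    Generic: verify everything, then exclude provably safe accesses.
    Bug-based: verify known bug sites, then include suspect neighbors."""
    generic = set(accesses) - set(provably_safe)        # exclusion-based
    bug_based = set(bug_sites)
    for site in bug_sites:
        bug_based |= nearby(site) & set(accesses)       # inclusion-based
    return generic, bug_based
```

The `nearby` heuristic is the tunable part: widening it trades performance for coverage, which is exactly the tradeoff described below.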

The inclusion and exclusion heuristics generate patched challenges with different security/performance tradeoffs. The patched challenges generated by these heuristics were tested for performance and security to determine which heuristic performed best while still fixing the bug. For the qualifying event, we evaluated both generic and bug-based patching, but ultimately chose a generic-only patching strategy. Bug-based patching was slightly more performant, but generic patching was more comprehensive and it patched bugs that our CRS couldn’t find.

Functionality and Performance

Functionality and performance scores combine to create an availability score. The availability score is used as a scaling factor for points gained by patching and bug finding. This scaling factor only matters for successfully patched challenges, since those are the only challenges that can score points. The following graphs only consider functionality and performance of successfully patched challenges.


Out of the 94 challenges that our CRS successfully patched, 56 retained full functionality, 30 retained partial functionality, and 8 were nonfunctional. Of the top 10 teams in the qualifying event, our CRS ranks 5th in terms of fully functional patched challenges (Figure 4). We suspect our patched challenges lost functionality due to problems in mcsema, our x86 to LLVM translator. We hope to verify and address these issues once DARPA open-sources the qualifying event challenges.

Figure 4: The count of perfectly functional, partially functional, and nonfunctional challenges submitted by each of the top 10 teams in the qualifying event. Orange bars signify finalists.



The performance of patched challenges is how our CRS snatched defeat from the jaws of victory. Of the top ten teams in the qualifying event, our CRS placed last in terms of patched challenge performance (Figure 5).

Figure 5: Average and median performance scores of the top ten qualifying event participants. Orange bars signify finalists.


Our CRS produces slow binaries for two reasons: technical and operational. The technical reason is that the overhead of our patched challenges is an artifact of our patching process, which translates challenges into LLVM bitcode and then re-emits them as executable binaries. The operational reason is that our patching was developed late and optimized for the wrong performance measurements.

So, why did we optimize for the wrong performance measurements? The official CGC performance measurement tools were kept secret, because the organizers wanted to ensure that no one could cheat by gaming the measurements. Therefore, we had to measure performance ourselves, and our metrics showed that the CPU overhead of our patched challenges was usually negligible. The main flaw we observed was that our patched challenges used too much memory. Because of this, we spent time and effort optimizing our patching to use less memory at the cost of more CPU time.

It turns out we optimized for the wrong thing, because our self-measurement did not agree with the official measurement tools (Table 1). When self-measuring, our worst-performing patching method had a median CPU overhead of 33% and a median memory overhead of 69%. The official qualifying event measured us at 76% CPU overhead and 28% memory overhead. Clearly, our self-measurements were considerably different from official measurements.

Measurement                           Median CPU Overhead   Median Memory Overhead
Worst Self-Measured Patching Method   33%                   69%
Official Qualifying Event             76%                   28%

Table 1: Self measured CPU and memory overhead and the official qualifying event CPU and memory overhead.

We measured our CRS’s overall score with our own performance metrics. The self-measured score was 106, which would have put us in second place. The real overall score was 21.36, putting us in ninth.

An important aspect of software development is choosing where to focus your efforts, and we chose poorly. CGC participants had access to the official measuring system during two scored events held during the year, one in December 2014 and one in April 2015. We should have evaluated our patching system thoroughly during both scored events. Unfortunately, our patching wasn’t fully operational until after the second scored event, so we had no way to verify the accuracy of our self-measurement. The performance penalty of our patching isn’t a fundamental issue. Had we known how bad it was, we would have fixed it. However, according to our own measurements the patching was acceptable so we focused efforts elsewhere.

What’s Next?

According to the CGC FAQ (Question 46), teams are allowed to combine after the qualifying event. We hope to join forces with another team that qualified for the CGC final event, and use the best of both our technologies to win. The technology behind our CRS will provide a significant advantage to any team that partners with us. If you would like to discuss a potential partnership for the CGC final, please contact us at

If we cannot find a partner for the CGC final, we will focus our efforts on adapting our CRS to automatically find and patch vulnerabilities in real software. Our system is up to the task: it has already proven that it can find bugs, and all of its core components were derived from software that works on real Linux binaries. Several components even have Windows and 64-bit support, and adding support for other platforms is a possibility. If you are interested in commercial applications of our technology, please get in touch with us at

Finally, we plan to contribute back fixes and updates to the open source projects utilized in our CRS. We used numerous open source projects during development, and have made several custom fixes and modifications. We look forward to contributing these back to the community so that everyone benefits from our improvements.

How to Harden Your Google Apps

Never let a good incident go to waste.

Today, we’re using the OPM incident as an excuse to share with you our top recommendations for shoring up the security of your Google Apps for Work account.

More than 5 million companies rely on Google Apps to run their critical business functions, like email, document storage, calendaring, and chat. As a result, a huge amount of data pools inside Google Apps just waiting for an attacker to gain access to it. In any modern company, this is target #1.

This guide is for small businesses that want to avoid the worst security problems while expending minimal effort. If you’re in a company with more than 500 employees and have dedicated IT staff, this guide is not for you.


A lot can go wrong with computers, even when you eliminate the complexity of client applications and move to a cloud-hosted platform like Google Apps. Many people think about security too abstractly to reason about the concrete steps that would improve it. In this context, here are the attacks we’re concerned about:

  • Password management. Users occasionally reuse passwords, surrender them to successful phishing, or lose all of them due to poor choice of password manager.
  • Cross-Site Scripting (XSS). Google has an enormous number of web applications under active development. They routinely acquire and add new companies to their domain. Some new vulnerabilities might be tucked into this torrent of fresh code. Any one XSS can result in a lost cookie that logs an attacker into your Google account.
  • Inadvertent Disclosure. Permissions management is hard. The user interface for Google Docs does not make it easier. Internal documents, calendars, and more can end up publicly available and indexed by search.
  • Backdoored Accounts. In the event of a successful compromise of one user’s account, the attacker will seek to preserve access so they can come back later. Backdoored Google Apps accounts can continue to leak emails even after you format an infected computer.
  • Exploits and Malware. Even with an all-Chromebook fleet (which we wholeheartedly recommend), there is a chance that computers will get infected and malware will ride on the back of legitimate sessions to gain access to your accounts.

Top 8 Google Apps Security Enhancements

If you make these few changes, you’ll be miles ahead of most other people and at considerably less risk to any of the above scenarios.

1. Create a secure Super Administrator account

Create a new admin account for your domain. You’ll only use this account to administer your domain; no email, no chat. Stay logged out of it. Set the secondary, recovery email to a secure mail host (like your personal Gmail). Turn on 2FA or use a Security Key for both accounts.

Separate the role for administrative access to your domain


2. Plug the leaks in your email policy

Gmail provides a wealth of options that allow users to forward, share, report, or disclose their emails to third parties. Any of these options could enable an inadvertent disclosure or provide a handy backdoor to an attacker who has lost their primary method of access. Disable read receipts, mail delegation, emailing profiles, automatic forwarding, and outbound gateways.

Limit what can go wrong with email


Disable automatic forward


Keep your mail to yourself

Keep work email configurations clean

3. Enable 2-Step Verification (2SV) and review your enrollment reports

2SV (or, as it’s more commonly known, 2-factor authentication or 2FA) will save your ass. With 2FA switched on, stolen passwords won’t be enough to compromise accounts. Hundreds of services support it. You should encourage your users to turn it on everywhere. Heck, just buy a bunch of Security Keys and hand them out like health workers would condoms.

Why is this even an option? Turn it on already!


Note: The advanced settings expose an option to force 2FA on every user on your domain. To use this feature properly, you must create an exception group to allow new users to set up their accounts. tl;dr Ignore the enforcement feature and just go bop your users over the head when you see they haven’t turned 2FA on yet.

4. Delete or suspend unmaintained user accounts

Stale accounts have accumulated sensitive data yet have no one to watch over them. Over the lifetime of an account, it may have connected to dozens of apps, left its password saved in mobile and client apps, and shared public documents now left forgotten and unmaintained. Reduce the risk of these accounts by deleting or suspending them.

Delete or suspend unmaintained accounts


5. Reduce your data’s exposure to third parties

The default settings for Mail, Drive, Talk, and Sites can lead to over-sharing of data. Retain the flexibility for employees to choose the appropriate setting, but tighten the defaults to start with the data private and warn users when it is not. Currently, there is no universal control; you have to make changes to each Google app individually.


Disable contact sharing (a great way to determine who your CEO talks to)

Stricter defaults for Drive


Help users recognize who they are talking to



Don’t overstore data if you don’t need to

Help users understand who can see their Site


6. Prevent email forgery using your domain name

Left unprotected, it is easy for an attacker to spoof an email that looks like it came from your CEO and send it to your staff, partners, or clients. Ensure this does not happen. Turn on SPF and DKIM to authenticate email for your domain. Both require modifications to TXT records in your DNS settings.

Turn on DKIM for your domain and get this green check

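Both records are published as DNS TXT entries. Google documents `include:_spf.google.com` for SPF; the DKIM selector and key below are placeholders you would take from the admin console, and `example.com` stands in for your domain:

```
; SPF: authorize Google's mail servers to send for example.com
example.com.                   IN TXT "v=spf1 include:_spf.google.com ~all"

; DKIM: publish the public key under the selector chosen in the admin console
google._domainkey.example.com. IN TXT "v=DKIM1; k=rsa; p=<base64-public-key>"
```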

7. Disable services from Google that you don’t need

Cross-site Scripting (XSS) and other client-side web application flaws are an underappreciated method for performing targeted hacks. DOM XSS can be used as a method of persistence. Labelling a bug as “post-authentication” means little when you stay logged into your Google account all day. Disable access to Google services you don’t use. That will help limit the amount of code your cookies are exposed to.


There are dozens of services you’ll never use. Disable them.

8. Set booby traps for the hacker that makes it in anyway

Your defenses will give way at some point. When this happens, you’ll want to know it, fast. Enable predefined alerts to receive an email when major changes are made to your Google Apps. Turn on alerts for suspicious login activity, admin privileges added or revoked, users added or deleted, and any settings changes. Send the alerts to a normal user, since you wouldn’t be logged into the Super Administrator regularly.

Turn on alerts and be liberal with who gets them


Security Wishlist for Google Apps

Google Apps offers one of the most secure platforms for running outsourced IT services for your company. However, even the configuration above leaves some blind spots.

Better support for inbound attachment filtering

Attackers will email your users malicious attachments or links. This problem is largely one for the endpoint (and Google offers Chromebooks as one solution), but an email provider can do more to mitigate this tactic.

The Google Apps settings for Gmail offers an “attachment compliance” feature that, while not specifically made for security, could be enhanced to protect users from malicious attachments. Gmail could prepend a message to the email subject that includes a warning about certain attachments, quarantining attachments with certain features (e.g. macros), sending attachments to a third-party service for analysis via an ICAP-like protocol, or converting attachments (say, doc to docx).

If we took this idea even further, Gmail could strip the attachments entirely and place them in Google Drive. This would make it easier to remove access to the attachment in the event it was identified as malicious and it would make it easier to perform repeated analyses of past attachments to discover previously unknown malicious content.


Tune attachment compliance options to protect users from malicious attachments

Better management of 2FA enforcement

Google was the first major service provider to roll out 2FA to all their users. Their support for this technology has been nothing short of tremendous. But it’s still too hard to enforce across your domain in Google Apps.

Turning on organization-wide enforcement requires setting up an exception group and performing extra work each time you add a new user to your domain. Could Google require 2FA on first sign-in, or give new users a configurable X-day grace period during which they could use just a password? How about bulk discounts on Security Keys?

Built-in management and reporting for DMARC

Domain Message Authentication Reporting and Conformance (DMARC), like SPF and DKIM, was designed to enhance the security and deliverability of the email you send. DMARC can help you discover how and when other people may be sending email in your name. If you want to turn on DMARC for your Google Apps, you’re pretty much on your own.

Google should make it easier to turn on DMARC and provide the tools to help manage it. This should be a no-brainer, considering email is their flagship feature.
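Until then, enabling DMARC yourself is a single TXT record. A conservative monitor-mode policy, which collects aggregate reports without affecting delivery, looks like this (the domain and report address are placeholders):

```
_dmarc.example.com. IN TXT "v=DMARC1; p=none; rua=mailto:dmarc-reports@example.com"
```

Once the reports confirm that all legitimate mail passes SPF and DKIM, the policy can be tightened from `p=none` to `p=quarantine` or `p=reject`.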

End-to-end crypto on all their services

If the data for your organization were stored encrypted on Google servers, you wouldn’t have to worry as much about password disclosures, snooping Google employees, or security incidents at Google. Anyone who gained access to your data, but lacked the proper key, would be unable to read it.

Google’s End-to-End project will help users deploy email crypto. If you want this feature today, the S/MIME standard is supported out-of-the-box on iOS, Outlook, Thunderbird, and more. Amazon WorkMail, a competitor to Google Apps, allows client-managed keys. If you encrypt 100% of your internal email, its contents are unreadable to third parties that happen to gain access to your accounts.

However, this still leaves sensitive data that lives unprotected on other services, like Hangouts and Drive. Yes, there are alternatives, but none are ideal in this scenario. You could deploy your own, in-house secure videoconferencing or consider adopting tarsnap, but the inconvenience is still too great. This problem is still waiting for a solution in Google Apps.

If You Have a Problem

By now, your Google Apps domain should be less vulnerable. So, what happens if you discover one of your users has been hacked? Google has you covered here. Review the “Administrator security checklist” if you think you have a problem. Their step-by-step guide is nearly everything you need to get started responding to a security incident.


I hope that you have found this guide useful. What do you use to help secure your Google Apps? Are there features on your wishlist for Google Apps that I missed?


GCHQ released a guide for securing Google Apps in November 2015.

Introducing the RubySec Field Guide

Vulnerabilities have been discovered in Ruby applications with the potential to affect vast swathes of the Internet and attract attackers to lucrative targets online.

These vulnerabilities take advantage of features and common idioms such as serialization and deserialization of data in the YAML format. Nearly all large, tested and trusted open-source Ruby projects contain some of these vulnerabilities.

Few developers are aware of the risks.

In our RubySec Field Guide, you’ll cover recent Ruby vulnerability classes and their root causes. You’ll see demonstrations and develop real-world exploits. You’ll study the patterns behind the vulnerabilities and develop software engineering strategies to avoid them in your projects.
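The root cause, serialization formats that let input direct object construction, is not unique to Ruby’s YAML. Python’s standard pickle module exhibits the same pattern, and a few lines make the danger concrete (an illustrative sketch; the guide itself covers the Ruby variants):

```python
import os
import pickle

class Evil:
    # pickle calls __reduce__ to learn how to rebuild the object:
    # here, "rebuilding" means calling an attacker-chosen function.
    def __reduce__(self):
        return (os.path.expanduser, ("~",))

payload = pickle.dumps(Evil())    # what an attacker would send over the wire
result = pickle.loads(payload)    # deserializing runs os.path.expanduser("~")
assert result == os.path.expanduser("~")
```

Swap `os.path.expanduser` for any other importable callable and deserialization becomes arbitrary code execution, which is why untrusted data must never reach loaders like this.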

You Will Learn

  • The mechanics and root causes of past Rails vulnerabilities
  • Methods for mitigating the impact of deserialization flaws
  • Rootkit techniques for Rack-based applications via YAML deserialization
  • Mitigation techniques for YAML deserialization flaws
  • Defensive Ruby programming techniques
  • Advanced testing techniques and fuzzing with Mutant

We’ve structured this field guide so you can learn as quickly as you want, but if you have questions along the way, contact us. If there’s enough demand, we may even schedule an online lecture.

Now, to work.

-The Trail of Bits Team

Closing the Windows Gap

The security research community is full of grey beards that earned their stripes writing exploits against mail servers, domain controllers, and TCP/IP stacks. These researchers started writing exploits on platforms like Solaris, IRIX, and BSDi before moving on to Windows exploitation. Now they run companies, write policy, rant on twitter, and testify in front of Congress. I’m not one of those people; my education in security started after Windows Vista and then expanded through Capture the Flag competitions when real-world research got harder. Security researchers entering the industry post-2010[1] learn almost exclusively via Capture the Flag competitions.

Occasionally, I’ll try to talk a grey beard into playing capture the flag. It’s like trying to explain Pokemon to adults. Normally such endeavors are an exercise in futility; however, on a rare occasion they’ll get excited and agree to try it out! They then get frustrated and stuck on the same problems I do – it’s fantastic for my ego[2].

“Ugh, it’s 90s shellcoding problems applied today.”
— muttered during DEFCON 22 CTF Quals

Following a particularly frustrating CTF we were discussing challenges and how there are very few Windows challenges despite Windows being such an important part of our industry. Only the Russian CTFs release Windows challenges; none of the large American CTFs do.

Much like Cold War-era politics, the Russian CTFs have edged out a Windows superiority: a Windows gap.

Projected magnitude of the Windows gap


The Windows gap exists outside of CTF as well. Over the past few years the best Windows security research has come out of Russia[3] and China. So, why are the Russians and Chinese so good at Windows? Well, because they actually use Windows…and for some reason western security researchers don’t.

Let’s close this Windows gap. Windows knowledge is important for our industry.

Helping the CTF community

If Capture the Flag competitions are how today’s greenhorns cut their teeth, we should have more Windows-based challenges and competitions. To facilitate this, Trail of Bits is releasing AppJailLauncher, a framework for making exploitable Windows challenges!

This man knows Windows and thinks you should too.


As a contest organizer, securing your infrastructure is the biggest priority and securing Windows services has always been a bit tricky until Windows 8 and the introduction of AppContainers. AppJailLauncher uses AppContainers to keep everything nice and secure from griefers. The repository includes everything you need to isolate a Windows TCP service from the rest of the operating system.

Additionally, we’re releasing the source code to greenhornd, a 2014 CSAW challenge I wrote to introduce people to Windows exploitation and the best debugger yet developed: WinDbg. The repository includes the binary as released, deployment directions, and a proof-of-vulnerability script.

We’re hoping to help drag the CTF community kicking and screaming into Windows expertise.

Windows Reactions

Releasing a Windows challenge last year at CSAW was very entertaining. There was plenty of complaining[4]:

<dwn> how is this windows challenge only 200 points omg
<dwn> making the vuln obvious doesn’t make windows exploitation any easier ;_;

<mserrano> RyanWithZombies: dude but its fuckin windows
<mserrano> even I don’t use windows anymore
<@RyanWithZombies> i warned you guys for months
<mserrano> also man windows too hard

<geohot> omg windows
<geohot> is so hard
<geohot> will do tomorrow
<geohot> i don’t have windows vm

<ebeip90> zomg a windows challenge
<ebeip90> ❤
[ hours later ]
<ebeip90> remember that part a long time ago when I said “Oh yay, a Windows challenge”?

<ricky> Windows is hard
<miton> ^

Some praise:

<cai_> i liked your windows one btw 🙂

<MMavipc> RyanWithZombies pls more windows pwning/rce

<CTFBroforce> I was so confused I have never done a windows exploit
<CTFBroforce> this challenge is going to make me look into windows exploits
<CTFBroforce> I dont know how to write windows shell code

<spq> thx for the help and the force to exploit windows with shellcode for the first time 🙂

It even caused some arguments among competitors:

<clockish> dudes, shut up, windows is hard
<MMavipc> windows is easy
<MMavipc> linux is hard

We hope AppJailLauncher will be used to elicit more passionate responses over the next few years!

  1. Many of the most popular CTFs started in 2010 and 2011: Ghost in the Shellcode (2010), RuCTFe (2010), PlaidCTF (2011), Codegate (2011), PHDays (2011). Very few predate 2010.
  2. Much like watching geohot fail at format string exploitation during a LiveCTF broadcast.
  3. Try searching for obscure Windows kernel symbols, you’ll end up on a Russian forum.
  4. The names have not been changed to shame the enablers.

Empire Hacking, a New Meetup in NYC

Today we are launching Empire Hacking, a bi-monthly meetup that focuses on pragmatic security research and new discoveries in attack and defense.


It’s basically a security poetry jam

Empire Hacking is technical. We aim to bridge the gap between weekend projects and funded research. There won’t be any product pitches here. Come prepared with your best ideas.

Empire Hacking is exclusive. Talks are by invitation-only and are under the Chatham House Rule. We will discuss ongoing research and internal projects you won’t hear about anywhere else.

Empire Hacking is engaging. Talk about subjects you find interesting, face to face, with a community of experts from across the industry.

Each meetup will consist of short talks from three expert speakers and run from 6-9pm at Trail of Bits HQ. Tentative schedule: Even months, on Patch Tuesday (the 2nd Tuesday). Beverages and light food will be provided. Space is limited. Please apply on our Meetup page.

Our inaugural meetup will feature talks from Chris Rohlf, Dr. Byron Cook, and Nick DePetrillo on Tuesday, June 9th.

Offense at Scale

Chris will discuss the effects of scale on vulnerability research, fuzzing and real attack campaigns.

Chris Rohlf runs the penetration testing team at Yahoo in NYC. Before Yahoo he was the founder of Leaf Security Research, a highly-specialized security consultancy with expertise in vulnerability discovery, reversing and exploit development.

Automatically proving program termination (and more!)

Byron will discuss research advances that have led to practical tools for automatically proving program termination and related properties.

Dr. Byron Cook is professor of computer science at University College London.

Cellular Baseband Exploitation

Baseband exploitation has been a topic of interest for many, however, few have described the effort required to make such attacks practical. In this talk, we explore the challenges towards reliable, large-scale cellular baseband exploitation.

Nick DePetrillo is a principal security engineer at Trail of Bits with expertise in cellular hardware and infrastructure security.

Keep up with Empire Hacking by following us on Twitter. See you at a meetup!

Frequently Asked Questions

Why is Empire Hacking a membership-based group?

To cultivate a tight-knit community. This should be a place where members feel free to discuss private or exclusive research and data, knowing that it will remain within the group. Furthermore, we believe that a membership process increases motivation to make a high-quality contribution.

To protect against abuse. Everyone is expected to treat his or her fellow members with respect and decency. Violators lose membership and all access to the group, including membership lists, meeting locations, and our discussion board.

To follow the crowd. Not really. But seriously, we are hardly the first private meetup or group in security. Consider that NCC Open Forum “is by invite only and is limited to engineers and technical managers”, NY Information Security Meetup charges $5 to attend, and Ops-T “does not accept applications for membership.”

Why does Empire Hacking use the Chatham House Rule?

We welcome everyone to apply to Empire Hacking, even journalists. But we don’t want participants to worry that their personal thoughts will be relayed to outsiders, or used against them or their employers. We enforce the Chatham House Rule to preserve the balance between candor and discretion.

How can I attend a meetup?

Please apply on our page. If you have any trouble, feel free to reach out to any of the Trail of Bits staff, including on our Slack community for Empire Hacking.

The Foundation of 2015: 2014 in Review

We need to do more to protect ourselves. 2014 overflowed with front-page proof: Apple, Target, JPMorgan Chase, etc., etc.

The current, vulnerable status quo begs for radical change, an influx of talented people, and substantially better tools. As we look ahead to driving that change in 2015, we’re proud to highlight a selection of our 2014 accomplishments that will underpin that work.

1. Open-source framework to transform binaries to LLVM bitcode

Our framework for analyzing and transforming machine-code programs into LLVM bitcode became a new tool in the program analysis and reverse engineering communities. McSema connects the world of LLVM program analysis and manipulation tools to binary executables. It currently translates x86 programs and supports subsets of integer arithmetic, floating point, and vector operations.

2. Shaped smarter public policy

The spate of national-scale computer security incidents spurred anxious conversation and action. To pre-empt poorly conceived laws from poorly informed lawmakers, we worked extensively with influential think tanks to help educate our policy makers on the finer points of computer security. The Center for a New American Security’s report “Surviving on a Diet of Poisoned Fruit” was just one result of this effort.

3. More opportunities for women

As part of our ongoing collaboration with NYU-Poly, Trail of Bits put its support behind the CSAW Program for High School Women and Career Discovery in Cyber Security Symposium. These events are intended to help guide talented and interested women into careers in computer security. We want to create an environment where women have the resources to contribute and excel in this industry.

4. Empirical data on secure development practices

In contrast with traditional security contests, Build-it, Break-it, Fix-it rewards secure software development under the same pressures that lead to bugs: tight deadlines, performance requirements, competition, and the allure of money. We were invited to share insights from the event at Microsoft’s Bluehat v14.

5. Three separate Cyber Fast Track projects

Under DARPA’s Program Manager Peiter ‘Mudge’ Zatko, we completed three distinct projects in the revolutionary Cyber Fast Track program: CodeReason, MAST, and PointsTo. Five of our employees went to the Pentagon to demonstrate our creations to select members of the Department of Defense. We’re happy to have participated and been recognized for our work. We’re now planning to give back: CodeReason will see an open-source release in 2015!

6. Taught machines to find Heartbleed

Heartbleed, the infamous OpenSSL vulnerability, went unnoticed for so long because bugs of its kind are hard for static analyzers to detect. So, Andrew Ruef took on the challenge and wrote a checker for clang-analyzer that can find Heartbleed and other bugs like it automatically. We released the code for others to learn from.

7. A resource for students of computer security

One of the most fun and effective ways to learn computer security is by competing in Capture the Flag events. But many fledgling students don’t know where to get started. So we wrote the Capture the Flag Field Guide to help them get involved and encourage them to take the first steps down this career path.

8. The iCloud Hack spurs our two-factor authentication guide

Adding two-factor authentication is always a good idea. Just ask anyone whose account has been compromised. If you store any sensitive information with Google, Apple ID or Dropbox, you’ll want to know about our guide to adding an extra layer of protection to your accounts.

9. Accepted into DARPA’s Cyber Grand Challenge

The prize: $2 million. The challenge: Build a robot that can repair insecure software without human input. If successful, this program will have a profound impact on the way companies secure their data in the future. We were selected as one of seven funded teams to compete.

10. THREADS 2014: How to automate security

Our CEO Dan Guido chaired THREADS, a research and development conference that takes place at NYU-Poly’s Cyber Security Awareness Week (CSAW). This year’s theme focused on scaling security — ensuring that security is an integral and automated part of software development and deployment models. We believe that the success of automated security is essential to our ever more internetworked society and devices. See talks and slides from the event.

Looking ahead

This year, we’re excited to develop and share more code, including: improvements to McSema (i.e. support for LLVM 3.5, lots more SSE and FPU instruction support, and a new control flow recovery module based on JakStab), a private videochat service, and an open-source release of CodeReason. We’re also excited about Ghost in the Shellcode (GitS) — a capture the flag competition at ShmooCon in Washington DC in January that three of our employees are involved in running. And don’t forget about DARPA’s Cyber Grand Challenge qualifying event in June.

For now, we hope you’ll connect with us on Twitter or subscribe to our newsletter.

Close Encounters with Symbolic Execution (Part 2)

This is part two of a two-part blog post that shows how to use KLEE with mcsema to symbolically execute Linux binaries (see the first post!). This part will cover how to build KLEE and mcsema, and provide a detailed example of using them to symbolically execute an existing binary. The binary we’ll be symbolically executing is an oracle for a maze with hidden walls, as promised in Part 1.

As a visual example, we’ll show how to get from an empty maze to a solved maze:

[Figure: the maze, before and after KLEE solves it]

Building KLEE with LLVM 3.2 on Ubuntu 14.04

One of the hardest parts about using KLEE is building it. The official build instructions cover KLEE on LLVM 2.9 and LLVM 3.4 on amd64. To analyze mcsema generated bitcode, we will need to build KLEE for LLVM 3.2 on i386. This is an unsupported configuration for KLEE, but it still works very well.

We will be using the i386 version of Ubuntu 14.04. The 32-bit version of Ubuntu is required to build a 32-bit KLEE. Do not try adding -m32 to CFLAGS on a 64-bit version. It will take away hours of your time that you will never get back. Get the 32-bit Ubuntu. The exact instructions are described in great detail below. Be warned: building everything will take some time.

# These are instructions for how to build KLEE and mcsema. 
# These are a part of a blog post explaining how to use KLEE
# to symbolically execute closed source binaries.
# install the prerequisites
sudo apt-get install vim build-essential g++ curl python-minimal \
  git bison flex bc libcap-dev cmake libboost-dev \
  libboost-program-options-dev libboost-system-dev ncurses-dev nasm
# we assume everything KLEE related will live in ~/klee.
cd ~
mkdir klee
cd klee
# Get the LLVM and Clang source, extract both
# (these are the historical llvm.org release download paths for 3.2)
wget http://llvm.org/releases/3.2/llvm-3.2.src.tar.gz
wget http://llvm.org/releases/3.2/clang-3.2.src.tar.gz
tar xzf llvm-3.2.src.tar.gz
tar xzf clang-3.2.src.tar.gz
# Move clang into the LLVM source tree:
mv clang-3.2.src llvm-3.2.src/tools/clang
# normally you would use cmake here, but today you HAVE to use autotools.
cd llvm-3.2.src
# For this example, we are only going to enable the x86 target.
# Building will take a while. Go make some coffee, take a nap, etc.
./configure --enable-optimized --enable-assertions --enable-targets=x86
make
# add the resulting binaries to your $PATH (needed for later building steps)
export PATH=`pwd`/Release+Asserts/bin:$PATH
# Make sure you are using the correct clang when you execute clang — you may 
# have accidentally installed another clang that has priority in $PATH. Let's 
# verify the version, for sanity. Your output should match what's below.
#$ clang --version
#clang version 3.2 (tags/RELEASE_32/final)
#Target: i386-pc-linux-gnu
#Thread model: posix
# Once clang is built, it's time to build STP and uClibc for KLEE.
cd ~/klee
git clone https://github.com/stp/stp.git
# Use CMake to build STP. Compared to LLVM and clang,
# the build time of STP will feel like an instant.
cd stp
mkdir build && cd build
cmake -G 'Unix Makefiles' -DCMAKE_BUILD_TYPE=Release ..
make
# After STP builds, let's set ulimit for STP and KLEE:
ulimit -s unlimited
# Build uclibc for KLEE
cd ../..
git clone --depth 1 --branch klee_0_9_29 https://github.com/klee/klee-uclibc.git
cd klee-uclibc
./configure -l --enable-release
make
cd ..
# It’s time for KLEE itself. KLEE is updated fairly often and we are 
# building on an unsupported configuration. These instructions may not 
# work for future versions of KLEE. These examples were tested with 
# commit 10b800db2c0639399ca2bdc041959519c54f89e5.
git clone https://github.com/klee/klee.git
# Proper configuration of KLEE with LLVM 3.2 requires this long voodoo command
cd klee
./configure --with-stp=`pwd`/../stp/build \
  --with-uclibc=`pwd`/../klee-uclibc \
  --with-llvm=`pwd`/../llvm-3.2.src \
  --with-llvmcc=`pwd`/../llvm-3.2.src/Release+Asserts/bin/clang \
  --with-llvmcxx=`pwd`/../llvm-3.2.src/Release+Asserts/bin/clang++
make
# KLEE comes with a set of tests to ensure the build works. 
# Before running the tests, libstp must be in the library path.
# Change $LD_LIBRARY_PATH to ensure linking against libstp works. 
# A lot of text will scroll by with a test summary at the end.
# Note that your results may be slightly different since the KLEE 
# project may have added or modified tests. The vast majority of 
# tests should pass. A few tests fail, but we’re building KLEE on 
# an unsupported configuration so some failure is expected.
export LD_LIBRARY_PATH=`pwd`/../stp/build/lib
make check
#These are the expected results:
#Expected Passes : 141
#Expected Failures : 1
#Unsupported Tests : 1
#Unexpected Failures: 11
# KLEE also has a set of unit tests so run those too, just to be sure. 
# All of the unit tests should pass!
make unittests
# Now we are ready for the second part: 
# using mcsema with KLEE to symbolically execute existing binaries.
# First, we need to clone and build the latest version of mcsema, which
# includes support for linked ELF binaries and comes with the necessary
# samples to get started.
cd ~/klee
git clone https://github.com/trailofbits/mcsema.git
cd mcsema
git checkout v0.1.0
mkdir build && cd build
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release ..
make
# Finally, make sure our environment is correct for future steps
export PATH=$PATH:~/klee/llvm-3.2.src/Release+Asserts/bin/
export PATH=$PATH:~/klee/klee/Release+Asserts/bin/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/klee/stp/build/lib/

Translating the Maze Binary

The latest version of mcsema includes the maze program from Felipe’s blog in the examples as demo_maze. In the instructions below, we’ll compile the maze oracle to a 32-bit ELF binary and then convert the binary to LLVM bitcode via mcsema.

# Note: tests/ completes these steps automatically
cd ~/klee/mcsema/mc-sema/tests
# Load our environment variables
# Compile the demo to a 32-bit ELF executable
${CC} -ggdb -m32 -o demo_maze demo_maze.c
# Recover the CFG using mcsema's bin_descend
${BIN_DESCEND_PATH}/bin_descend -d -func-map=maze_map.txt -i=demo_maze -entry-symbol=main
# Convert the CFG into LLVM bitcode via mcsema's cfg_to_bc
${CFG_TO_BC_PATH}/cfg_to_bc -i demo_maze.cfg -driver=mcsema_main,main,raw,return,C -o demo_maze.bc
# Optimize the bitcode
${LLVM_PATH}/opt -O3 -o demo_maze_opt.bc demo_maze.bc

We will use the optimized bitcode (demo_maze_opt.bc) generated by this step as input to KLEE. Now that everything is set up, let’s get to the fun part — finding all maze solutions with KLEE.

# create a working directory next to the other KLEE examples.
cd ~/klee/klee/examples
mkdir maze
cd maze
# copy the bitcode generated by mcsema into the working directory
cp ~/klee/mcsema/mc-sema/tests/demo_maze_opt.bc ./
# copy the register context (needed to build a driver to run the bitcode)
cp ~/klee/mcsema/mc-sema/common/RegisterState.h ./

Now that we have the maze oracle binary in LLVM bitcode, we need to tell KLEE which inputs are symbolic and when a maze is solved. To do this we will create a small driver that will intercept the read() and exit() system calls, mark input to read() as symbolic, and assert on exit(1), a successful maze solution.

To make the driver, create a file named maze_driver.c with contents from this gist and use clang to compile the maze driver into bitcode. Every function in the driver is commented to help explain how it works.

clang -I../../include/ -emit-llvm -c -o maze_driver.bc maze_driver.c

We now have two bitcode files: the translation of the maze program and a driver to start the program and mark inputs as symbolic. They need to be combined into one bitcode file for use with KLEE, which we can do with llvm-link. There will be a compatibility warning, which is safe to ignore in this case.

llvm-link demo_maze_opt.bc maze_driver.bc > maze_klee.bc

Running KLEE

Once we have the combined bitcode, let’s do some symbolic execution. Lots of output will scroll by, but we can see KLEE solving the maze and trying every state of the program. If you recall from the driver, we can recognize successful states because they will trigger an assert in KLEE. There are four solutions to the original maze, so there should be 4 assert results — a good sign (note: your test numbers may be different):

klee --emit-all-errors -libc=uclibc maze_klee.bc
# Lots of things will scroll by
ls klee-last/*assert*
# For me, the output is:
# klee-last/test000178.assert.err  klee-last/test000315.assert.err
# klee-last/test000270.assert.err  klee-last/test000376.assert.err

Now let’s use a quick bash script to look at the outputs and see if they match the original results. The solutions identified by KLEE from the mcsema bitcode are:

  • sddwddddsddw
  • ssssddddwwaawwddddsddw
  • sddwddddssssddwwww
  • ssssddddwwaawwddddssssddwwww

… and they match the results from Felipe’s original blog post!
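The "quick bash script" step above can be sketched as a short loop. This is a hypothetical sketch, not the exact script we used: it assumes KLEE's ktest-tool utility is on your $PATH and that each .assert.err file sits next to a matching .ktest file, which is KLEE's usual output layout.

```shell
# Hypothetical sketch: map each assert error back to its .ktest file and,
# if KLEE's ktest-tool is available, dump the concrete input bytes KLEE chose.
for err in klee-last/*.assert.err; do
  [ -e "$err" ] || continue           # glob matched nothing; skip
  ktest="${err%.assert.err}.ktest"    # test000178.assert.err -> test000178.ktest
  echo "== $ktest =="
  ktest-tool "$ktest"
done
```

Depending on how many symbolic input bytes the driver requested, each dump may end with leftover bytes the program never consumed; the maze solution is the leading run of w/a/s/d characters.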


Symbolic execution is a powerful tool that can execute programs on all inputs at once. Using mcsema and KLEE, we can symbolically execute existing closed source binary programs. In this example, we found all solutions to a maze with hidden walls — starting from an opaque binary. KLEE and mcsema could do this while knowing nothing about mazes and without being tuned for string inputs.

This example is simple, but it shows what is possible: using mcsema we can apply the power of KLEE to closed source binaries. We could generate high code coverage tests for closed source binaries, or find security vulnerabilities in arbitrary binary applications.

Note: We’re looking for talented systems engineers to work on mcsema and related projects (contract and full-time). If you’re interested in being paid to work on or with mcsema, send us an email!