ReMASTering Applications by Obfuscating during Compilation
In this post, we discuss the creation of a novel software obfuscation toolkit, MAST, implemented in the LLVM compiler and suitable for denying program understanding to even the most well-resourced adversary. Our implementation is inspired by effective obfuscation techniques used by nation-state malware and techniques discussed in academic literature. MAST enables software developers to protect applications with technology developed for offense.
MAST is a product of Cyber Fast Track, and we would like to thank Mudge and DARPA for funding our work. This project would not have been possible without their support. MAST is now a commercial product offering of Trail of Bits and companies interested in licensing it for their own use should contact info@trailofbits.com.
Background
There are a lot of risks in releasing software these days. Once upon a time, reverse engineering software presented a challenge best solved by experienced and skilled reverse engineers at great expense. It was worthwhile for reasonably well-funded groups to reverse engineer and recreate proprietary technology or for clever but bored people to generate party tricks. Despite the latter type of people causing all kinds of mild internet havoc, reverse engineering wasn’t widely considered a serious threat until relatively recently.
Over time, however, the stakes have risen; criminal entities, corporations, even nation-states have become extremely interested in software vulnerabilities. These entities seek to either defend their own network, applications, users, or to attack someone else’s. Historically, software obfuscation was a concern of the “good guys”, who were interested in protecting their intellectual property. It wasn’t long before malicious entities began obfuscating their own tools to protect captured tools from analysis.
A recent example of successful obfuscation is that used by the authors of the Gauss malware; several days after discovering the malware, Kaspersky Lab, a respected malware analysis lab and antivirus company, posted a public plea for assistance in decrypting a portion of the code. That even a company of professionals had trouble enough to ask for outside help is telling: obfuscation can be very effective. Professional researchers have been unable to deobfuscate Gauss to this day.
Motivation
With all of this in mind, we were inspired by Gauss to create a software protection system that leapfrogs available analysis technology. Could we repurpose techniques from software exploitation and malware obfuscation into a state-of-the-art software protection system? Our team is quite familiar with publicly available tools for assisting in reverse engineering tasks and considered how to significantly reduce their efficacy, if not deny it altogether.
Software developers seek to protect varying classes of information within a program. Our system must account for each with equal levels of protection to satisfy these potential use cases:
- Algorithms: adversary knowledge of proprietary technology
- Data: knowledge of proprietary data (the company’s or the user’s)
- Vulnerabilities: knowledge of vulnerabilities within the program
In order for the software protection system to be useful to developers, it must be:
- Easy to use: the obfuscation should be transparent to our development process, not alter or interfere with it. No annotations should be necessary, though we may want them in certain cases.
- Cross-platform: the obfuscation should apply uniformly to all applications and frameworks that we use, including mobile or embedded devices that may run on different processor architectures.
- Protect against state-of-the-art analysis: our obfuscation should leapfrog available static analysis tools and techniques and require novel research advances to see through.
Finally, we assume an attacker will have access to the static program image; many software applications are going to be directly accessible to a dedicated attacker. For example, an attacker interested in a mobile application, anti-virus signatures, or software patches will have the static program image to study.
Our Approach
We decided to focus primarily on preventing static analysis; in this day and age there are a lot of tools that can be run statically over application binaries to gain information with less work and time required by attackers, and many attackers are proficient in generating their own situation-specific tools. Static tools can often very quickly be run over large amounts of code, without necessitating the attacker having an environment in which to execute the target binary.
We decided on a group of techniques that compose together, comprising opaque predicate insertion, code diffusion, and – because our original scope was iOS applications – mangling of Objective-C symbols. These make the protected application impossible to understand without environmental data, impossible to analyze with current static analysis tools due to alias analysis limitations, and deny the effectiveness of breakpoints, method name retrieval scripts, and other common reversing techniques. In combination, these techniques attack a reverse engineer’s workflow and tools from all sides.
Further, we did all of our obfuscation work inside of a compiler (LLVM) because we wanted our technology to be thoroughly baked into the entire program. LLVM can use knowledge of the program to generate realistic opaque predicates or hide diffused code inside of false paths not taken, forcing a reverse engineer to consult the program’s environment (which might not be available) to resolve which instruction sequences are the correct ones. Obfuscating at the compiler level is more reliable than operating on an existing binary: there is no confusion about code vs. data or missing critical application behavior. Additionally, compiler-level obfuscation is transparent to current and future development tools based on LLVM. For instance, MAST could obfuscate Swift on the day of release — directly from the Xcode IDE.
Symbol Mangling
The first and simplest technique was to hinder quick Objective-C method name retrieval scripts; this is certainly the least interesting of the transforms, but would remove a large amount of human-readable information from an iOS application. Without method or other symbol names present for the proprietary code, it’s more difficult to make sense of the program at a glance.
Opaque Predicate Insertion
The second technique we applied, opaque predicate insertion, is not a new technique. It’s been done before in numerous ways, and capable analysts have developed ways around many of the common implementations. We created a stronger version of predicate insertion by inserting predicates with opaque conditions and alternate branches that look realistic to a script or person skimming the code. Realistic predicates significantly slow down a human analyst, and will also slow down tools that operate on program control flow graphs (CFGs) by ballooning the graph to be much larger than the original. Increased CFG size impacts the size of the program and the execution speed but our testing indicates the impact is smaller or consistent with similar tools.
Code Diffusion
The third technique, code diffusion, is by far the most interesting. We took the ideas of Return-Oriented Programming (ROP) and applied them in a defensive manner.
In a straightforward situation, an attacker exploits a vulnerability in an application and supplies their own code for the target to execute (shellcode). However, since the introduction of non-executable data mitigations like DEP and NX, attackers have had to find ways to execute malicious code without the introduction of anything new. ROP is a technique that makes use of code that is already present in the application. Usually, an attacker would compile a set of short “gadgets” in the existing program text that each perform a simple task, and then link those together, jumping from one to the other, to build up the functionality they require for their exploit — effectively creating a new program by jumping around in the existing program.
We transform application code such that it jumps around in a ROP-like way, scrambling the program’s control flow graph into disparate units. However, unlike ROP, where attackers are limited by the gadgets they can find and their ability to predict their location at runtime, we precisely control the placement of gadgets during compilation. For example, we can store gadgets in the bogus programs inserted during the opaque predicate obfuscation. After applying this technique, reverse engineers will immediately notice that the handy graph is gone from tools like IDA. Further, this transformation will make it impossible to use state-of-the-art static analysis tools, like BAP, and impedes dynamic analysis techniques that rely on concrete execution with a debugger. Code diffusion destroys the semantic value of breakpoints, because a single code snippet may be re-used by many different functions and not used by other instances of the same function.
The figures above demonstrate a very simple function before and after the code diffusion transform, using screenshots from IDA. In the first figure, there is a complete control flow graph; in the second, however, the first basic block no longer jumps directly to either of the following blocks; instead, it must refer at runtime to a data section elsewhere in the application before it knows where to jump in either case. Running this code diffusion transform over an entire application reduces the entire program from a set of connected-graph functions to a much larger set of single-basic-block “functions.”
Code diffusion has a noticeable performance impact on whole-program obfuscation. In our testing, we compared the speed of bzip2 before and after our return-oriented transformation and slowdown was approximately 55% (on x86).
Environmental Keying
MAST does one more thing to make reverse engineering even more difficult — it ties the execution of the code to a specific device, such as a user’s mobile phone. While using device-specific characteristics to bind a binary to a device is not new (it is extensively used in DRM and some malware, such as Gauss), MAST is able to integrate device-checking into each obfuscation layer as it is woven through the application. The intertwining of environmental keying and obfuscation renders the program far more resistant to reverse-engineering than some of the more common approaches to device-binding.
Rather than acquiring any copy of the application, an attacker must also acquire and analyze the execution environment of the target computer as well. The whole environment is typically far more challenging to get ahold of, and has a much larger quantity of code to analyze. Even if the environment is captured and time is taken to reverse engineer application details, the results will not be useful against the same application as running on other hosts because every host runs its own keyed version of the binary.
Conclusions
In summary, MAST is a suite of compile-time transformations that provide easy-to-use, cross-platform, state-of-the-art software obfuscation. It can be used for a number of purposes, such as preventing attackers from reverse engineering security-related software patches; protecting your proprietary technology; protecting data within an application; and protecting your application from vulnerability hunters. While originally scoped for iOS applications, the technologies are applicable to any software that can be compiled with LLVM.