Using Static Analysis and Clang To Find Heartbleed

Background

Friday night I sat down with a glass of Macallan 15 and decided to write a static checker that would find the Heartbleed bug. I planned to write it as an out-of-tree clang analyzer plugin, evaluate it on a few very small functions that had the spirit of the Heartbleed bug in them, and then finally run it on the vulnerable OpenSSL code base itself.

The Clang project ships an analysis infrastructure with its compiler, invoked via scan-build. scan-build hooks whatever existing make system you have to interpose the clang analyzer into the build process, and the analyzer is invoked with the same arguments as the compiler. This way, the analyzer can ‘visit’ every compilation unit in the program that compiles under clang. There are some limitations to the clang analyzer that I’ll touch on in the discussion section.

This exercise added to my list of things that I can only do while drinking: I have the best success with first-order logic while drinking beer, and I have the best success with clang analyzer while drinking scotch.

Strategy

Coverity recently proposed one approach to identifying Heartbleed statically: taint the return values of calls to ntohl and ntohs as input data. One problem with doing static analysis on a big state machine like OpenSSL is that the analysis has two options. It can know the state machine well enough to track which values are attacker-influenced across the whole program. Otherwise, it needs some kind of annotation in the program that tells it where input data is used.

I like this observation because it is pretty actionable. You mark ntohl calls as producing tainted data. That is a heuristic, but a pretty good one, because programmers probably won’t byte-swap their own data with ntohl.

Our clang analyzer plugin should identify locations in the program where variables are written using ntohl, taint them, and then alert when those tainted values are used as the size parameter to memcpy. That alone isn’t quite right, though, because the use could be safe. So we’ll also check the constraints on the tainted values at the location of the call. If a tainted value hasn’t been constrained in some way by the program logic, and it’s used as an argument to memcpy, we alert on a bug. This could still miss some bugs, but I’m writing this over a 24h period with some Scotch, so increasing precision can come later.

Clang analyzer details

The clang analyzer implements a type of symbolic execution to analyze C/C++ programs. Plugging into this framework as an analyzer requires bending your mind around the clang analyzer’s view of program state. This is where I consumed the most scotch.

Under the hood, the analyzer performs a symbolic/abstract exploration of program state. This exploration is flow- and path-sensitive, so it differs from traditional compiler data-flow analysis, which merges facts from different paths. The analysis maintains a “state” object for each path through the program, and this state object holds constraints and facts about the program’s execution on that path. Your analyzer can query this state object, and it can also change the state to include information produced by your analysis.

This was one of my biggest hurdles when writing the analyzer. Once I have a “symbolic variable” in a particular state, how do I query the range of that symbolic variable? Say there is a program fragment that looks like this:

int data = ntohl(pkt_data);
if(data >= 0 && data < sizeof(global_arr)) {
  // CASE A
  ...
} else {
  // CASE B
  ...
}

From the analyzer’s point of view, the state “splits” at the if into two different states, A and B. In state A there is a constraint that data is within certain bounds, and in state B there is a constraint that data is NOT within those bounds. How do you access this information from your checker?

If your checker calls the “dump” method on the “state” object it is given, output like the following is printed:

Ranges of symbol values:
 conj_$2{int} : { [-2147483648, -2], [0, 2147483647] }
 conj_$9{uint32_t} : { [0, 6] }

In this example, conj_$9{uint32_t} is our ‘data’ value from above, and the state is state A. We have a range on ‘data’ that places it between 0 and 6. How can we, as the checker, observe that this range differs from an unconstrained range of, say, [-2147483648, 2147483647]?

The answer is that we create a formula that tests the symbolic value of ‘data’ against a condition we choose. We then ask the state which program states exist when this formula is true and when it is false. If a new formula contradicts an existing constraint, that state is infeasible and no state is generated for it. So we create a formula that says, roughly, “data > 500” to ask whether data could ever be greater than 500. When we ask the state for new states where this is true and where it is false, it will only give us a state where it is false.

This is the idiom used inside the clang analyzer to answer questions about constraints on state. The array bounds checkers use this trick to identify states where an array’s size is not used to constrain indexes into the array.
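The feasibility question can be illustrated outside the analyzer with a toy model. The C sketch below is an assumption, not clang’s API. `Range`, `assume_gt`, and `could_exceed` are hypothetical names, and each symbol’s constraint is simplified to a single closed interval (the real analyzer keeps a set of ranges, as the dump above shows). Splitting on “sym > k” yields a true side and a false side, and a side whose interval is empty is infeasible.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for one symbol's constraint: a single closed
   interval. (The real analyzer tracks a set of ranges per symbol.) */
typedef struct {
    int64_t lo, hi;
    bool feasible;
} Range;

/* Split `r` on the formula "sym > k". Each output keeps only the values
   consistent with its branch; an empty interval marks that state as
   infeasible, the way the analyzer declines to create a contradictory state. */
static void assume_gt(Range r, int64_t k, Range *st_true, Range *st_false) {
    st_true->lo = r.lo > k + 1 ? r.lo : k + 1;
    st_true->hi = r.hi;
    st_true->feasible = r.feasible && st_true->lo <= st_true->hi;

    st_false->lo = r.lo;
    st_false->hi = r.hi < k ? r.hi : k;
    st_false->feasible = r.feasible && st_false->lo <= st_false->hi;
}

/* The checker's question: can the value exceed `limit` on this path? */
static bool could_exceed(Range r, int64_t limit) {
    Range t, f;
    assume_gt(r, limit, &t, &f);
    return t.feasible;
}
```

With the state-A range [0, 6], asking about “data > 500” leaves only the false side feasible, while an unconstrained uint32_t range keeps both sides. Inside the analyzer, the corresponding operation is `ProgramState::assume`, which returns a pair of states and yields a null state for an infeasible branch.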

Implementation

Your analyzer is implemented as a C++ class. You define “check” member functions for the events you want to be notified about while the analyzer explores program state. For example, if your analyzer wants to consider the arguments to a function call before the function is called, you create a member method with a signature like this:

void checkPreCall(const CallEvent &Call, CheckerContext &C) const;

Your analyzer can then match on the function about to be (symbolically) invoked. So our implementation works in three stages:

  1. Identify calls to ntohl/ntohs
  2. Taint the return value of those calls
  3. Identify unconstrained uses of tainted data

We accomplish the first and second with a checkPostCall visitor that roughly does this:

void NetworkTaintChecker::checkPostCall(const CallEvent &Call,
                                        CheckerContext &C) const {
  const IdentifierInfo *ID = Call.getCalleeIdentifier();

  // No identifier (e.g. an indirect call): nothing to match.
  if(ID == NULL) {
    return;
  }

  if(ID->getName() == "ntohl" || ID->getName() == "ntohs") {
    ProgramStateRef State = C.getState();
    SymbolRef       Sym   = Call.getReturnValue().getAsSymbol();

    // Only a symbolic return value can carry taint.
    if(Sym) {
      ProgramStateRef newState = State->addTaint(Sym);
      C.addTransition(newState);
    }
  }
}

Pretty straightforward: we get the return value, taint it if it is symbolic, and add the state with the tainted return value as an output of our visit via ‘addTransition’.

For the third goal, we have a checkPreCall visitor that considers a function call’s parameters like so:

void NetworkTaintChecker::checkPreCall(const CallEvent &Call,
                                       CheckerContext &C) const {
  const IdentifierInfo *ID = Call.getCalleeIdentifier();

  if(ID == NULL) {
    return;
  }

  if(ID->getName() == "memcpy") {
    // memcpy(dst, src, n): argument index 2 is the size.
    SVal            SizeArg = Call.getArgSVal(2);
    ProgramStateRef state   = C.getState();

    if(state->isTainted(SizeArg)) {
      SValBuilder      &svalBuilder = C.getSValBuilder();
      Optional<NonLoc>  SizeArgNL   = SizeArg.getAs<NonLoc>();

      // Tainted and insufficiently bounded on this path: end the path and report.
      if(this->isArgUnConstrained(SizeArgNL, svalBuilder, state) == true) {
        ExplodedNode *loc = C.generateSink();
        if(loc) {
          BugReport *bug = new BugReport(*this->BT,
              "Tainted, unconstrained value used in memcpy size", loc);
          C.emitReport(bug);
        }
      }
    }
  }
}

This is also relatively straightforward. Our logic for deciding whether a value is unconstrained is hidden in ‘isArgUnConstrained’. If a tainted, symbolic value has insufficient constraints on it along the current path, we report a bug.
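The post does not show isArgUnConstrained. The plain-C sketch below shows the overall decision. It assumes the helper asks whether the size could still exceed some fixed bound; the 500 from the earlier example is used purely for illustration. `SizeFact`, `is_arg_unconstrained`, and `UNCONSTRAINED_LIMIT` are hypothetical names, not the checker’s real code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary of what is known about memcpy's size argument
   on the current path. */
typedef struct {
    bool tainted;   /* derived from an ntohl/ntohs return value */
    uint64_t max;   /* largest value still feasible on this path */
} SizeFact;

/* Hypothetical cutoff: a size that can still exceed this counts as
   "unconstrained". The real bound used by the checker is not shown. */
#define UNCONSTRAINED_LIMIT 500u

static bool is_arg_unconstrained(SizeFact s) {
    return s.max > UNCONSTRAINED_LIMIT;
}

/* Report only when the size is both attacker-influenced and unbounded. */
static bool should_report_memcpy(SizeFact size) {
    return size.tainted && is_arg_unconstrained(size);
}
```

Keeping taint and boundedness as separate conditions mirrors the two-part rule from the Strategy section. Taint alone would flag safe, bounds-checked copies, and missing bounds alone would flag sizes that never came from the network.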

Some implementation pitfalls

It turns out that OpenSSL doesn’t use ntohs/ntohl. Instead, it has n2s/n2l macros that re-implement the byte-swapping logic. If this were LLVM IR, it would be tractable to write a “byte-swapping recognizer” that uses some amount of logic to prove when a piece of code approximates the semantics of a byte swap.
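A function-style sketch of the big-endian reads that n2s and n2l perform shows why a name-based match misses OpenSSL’s input. This is an assumed rewrite of the pattern, not OpenSSL’s macros verbatim, and `read_be16` and `read_be32` are hypothetical names. The byte swap is nothing but shifts and ORs, so no call named ntohs or ntohl ever appears in the AST.

```c
#include <stdint.h>

/* 16-bit network-order (big-endian) read built from shifts and ORs, in the
   spirit of n2s. A checker matching callee names sees no ntohs here. */
static uint16_t read_be16(const unsigned char *p) {
    return (uint16_t)(((unsigned)p[0] << 8) | (unsigned)p[1]);
}

/* Same idea for 32 bits, in the spirit of n2l. */
static uint32_t read_be32(const unsigned char *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}
```

A “byte-swapping recognizer” would have to prove that expressions like these are equivalent to ntohs/ntohl on a little-endian host, rather than relying on a function name.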

There is also some behavior in clang’s construction of the AST for OpenSSL that I have not figured out: calls to ntohs are replaced with __builtin_pre(__x), which has no IdentifierInfo and thus no name. To work around this, I replaced the n2s macro with a call to a function named xyzzy, which resulted in linking failures, and adapted my function check from above to look for a function named xyzzy. This worked well enough to identify the Heartbleed bug.

Solution output with demo programs and OpenSSL

First let’s look at some little toy programs. Here is one toy example with output:

$ cat demo2.c

...

int data_array[] = { 0, 18, 21, 95, 43, 32, 51};

int main(int argc, char *argv[]) {
  int   fd;
  char  buf[512] = {0};

  fd = open("dtin", O_RDONLY);

  if(fd != -1) {
    int size;
    int res;

    res = read(fd, &size, sizeof(int));

    if(res == sizeof(int)) {
      size = ntohl(size);

      if(size < sizeof(data_array)) {
        memcpy(buf, data_array, size);
      }

      memcpy(buf, data_array, size);
    }

    close(fd);
  }

  return 0;
}

$ ../docheck.sh
scan-build: Using '/usr/bin/clang' for static analysis
/usr/bin/ccc-analyzer -o demo2 demo2.c
demo2.c:30:7: warning: Tainted, unconstrained value used in memcpy size
      memcpy(buf, data_array, size);
      ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1 warning generated.
scan-build: 1 bugs found.
scan-build: Run 'scan-view /tmp/scan-build-2014-04-26-223755-8651-1' to examine bug reports.

And finally, here it is catching Heartbleed in both locations where the bug was present in OpenSSL:

[Screenshots: the checker's warnings at both Heartbleed locations in OpenSSL]

Discussion

The approach needs some improvement, because we reason about whether a tainted value is “appropriately” constrained in a very coarse-grained way. Sometimes that’s the best you can do, though. If your analysis doesn’t know how large a particular buffer is, perhaps it’s enough to show an analyst “hey, this value could be larger than 5000 and it is used as a parameter to memcpy, is that okay?”

I really don’t like the clang analyzer’s limitation of operating on ASTs. I spent a lot of time fighting with the clang AST representation of ntohs, and I still don’t understand what the source of the problem was. I kind of just want to consider a program’s semantics in a virtual machine with very simple semantics, so LLVM IR seems ideal to me. This might just be my PL roots showing, though.

I really do like the clang analyzer’s interface to path constraints. I think that interface is pretty powerful. Once you get your head around framing your problem as asking a state whether new states satisfying your constraints are feasible, it’s pretty straightforward to write new analyses.

Edit: Code Post

I’ve posted the code for the checker to GitHub, here.

Semantic Analysis of Native Programs, introducing CodeReason

Introduction

Have you ever wanted to query a native-mode program for the program locations that write a specific value to a register? Have you ever wanted to automatically deobfuscate obfuscated strings?

Reverse engineering a native program involves understanding its semantics at a low level until a high-level picture of its functionality emerges. One challenge facing a principled understanding of a native-mode program is that this understanding must extend to every instruction the program uses. Your analysis must know what effect each instruction has on memory, calls, and registers.

We’d like to introduce CodeReason, a machine code analysis framework we produced for DARPA Cyber Fast Track. CodeReason provides a framework for analyzing the semantics of native x86 and ARM code. We like CodeReason because it provides us a platform to make queries about the effects that native code has on overall program state. CodeReason does this by having a deep semantic understanding of native instructions.

Building this semantic understanding is time-consuming and expensive. There are existing systems, but they have high barriers to entry, don’t do precisely what we want, or don’t apply simplifications and optimizations to their semantics. We want those simplifications because they can reduce otherwise hairy expressions to simple ones that are easy to understand. To motivate this, we’ll give an example of a time we used CodeReason.

Simplifying Flame

Around the time the Flame malware was revealed, some of its binaries were posted onto malware.lu. Flame’s overall scheme is to store each obfuscated string in a structure in global data. The structure looks something like this:


struct ObfuscatedString {
  char padding[7];
  char hasDeobfuscated;
  short stringLen;
  char string[];
};

Each structure has variable-length data at the end, and 7 leading bytes that appear to be unused.

There are two fun things here. First, I used CodeReason to write a string deobfuscator in C. The original program logic performs string deobfuscation in three steps.

The first function checks the hasDeobfuscated field. If the field is zero, it returns a pointer to the first element of the string. If it is not zero, the function calls the second function and then sets hasDeobfuscated to zero.

The second function iterates over every character in the ‘string’ array. At each character, it calls a third function, subtracts the value returned by the third function from the character, and writes the result back into the array. So it looks something like:


void inplace_buffer_decrypt(unsigned char *buf, int len) {
  int counted = 0;
  while( counted < len ) {
    unsigned char *cur = buf + counted;
    unsigned char newChar = get_decrypt_modifier_f(counted);
    *cur -= newChar;
    ++counted;
  }
  return;
}

What about the third function, ‘get_decrypt_modifier’? This function is one basic block long and looks like this:


lea ecx, [eax+11h]
add eax, 0Bh
imul ecx, eax
mov edx, ecx
shr edx, 8
mov eax, edx
xor eax, ecx
shr eax, 10h
xor eax, edx
xor eax, ecx
retn

An advantage of having a system that understands native code semantics is that I could capture this block, feed it to CodeReason, and have it tell me what the equation for ‘eax’ looks like. This would tell me what this block ‘returns’ to its caller, and would let me capture the semantics of get_decrypt_modifier in my deobfuscator.

It would also be possible to decompile this snippet to C. However, what I’m really concerned with is the effect of the code on ‘eax’, not something as high-level as what the code “looks like” from a C decompiler’s view of the world. C decompilers also use a semantics translator, but they then pass the results of that translation through an attempt at translating to C. CodeReason lets us skip the last step and consider just the semantics, which can sometimes be more powerful.

Using CodeReason

Getting this from CodeReason looks like this:


$ ./bin/VEEShell -a X86 -f ../tests/testSkyWipe.bin
blockLen: 28
r
...
EAX = Xor32[ Xor32[ Shr32[ Xor32[ Shr32[ Mul32[ Add32[ REGREAD(EAX), I:U32(0xb) ], Add32[ REGREAD(EAX), I:U32(0x11) ] ], I:U8(0x8) ], Mul32[ Add32[ REGREAD(EAX), I:U32(0xb) ], Add32[ REGREAD(EAX), I:U32(0x11) ] ] ], I:U8(0x10) ], Shr32[ Mul32[ Add32[ REGREAD(EAX), I:U32(0xb) ], Add32[ REGREAD(EAX), I:U32(0x11) ] ], I:U8(0x8) ] ], Mul32[ Add32[ REGREAD(EAX), I:U32(0xb) ], Add32[ REGREAD(EAX), I:U32(0x11) ] ] ]
...
EIP = REGREAD(ESP)

This is cool, because if I implement functions for Xor32, Mul32, Add32, and Shr32, I have this function in C, like so:


unsigned char get_decrypt_modifier_f(unsigned int a) {
  return Xor32(
           Xor32(
             Shr32(
               Xor32(
                 Shr32(
                   Mul32(
                     Add32( a, 0xb),
                     Add32( a, 0x11) ),
                   0x8 ),
                 Mul32(
                   Add32( a, 0xb ),
                   Add32( a, 0x11 ) ) ),
               0x10 ),
             Shr32(
               Mul32(
                 Add32( a, 0xb ),
                 Add32( a, 0x11 ) ),
               0x8 ) ),
           Mul32(
             Add32( a, 0xb ),
             Add32( a, 0x11 ) ) );
}

And this also is cool because it works.


C:\code\tmp>skywiper_string_decrypt.exe
CreateToolhelp32Snapshot
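The post does not show the four helpers or the surrounding driver. Below is a minimal sketch of both. It assumes the helpers are plain wrapping 32-bit operations and that stringLen counts the bytes to decrypt. `get_deobfuscated` and `obf_alloc` are hypothetical names for the first function and a test allocator. The sketch follows the three-function structure described earlier.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helpers matching CodeReason's operator names. Unsigned
   32-bit arithmetic wraps modulo 2^32, like the x86 instructions. */
static uint32_t Add32(uint32_t a, uint32_t b) { return a + b; }
static uint32_t Mul32(uint32_t a, uint32_t b) { return a * b; }  /* low 32 bits, as imul r32, r32 */
static uint32_t Xor32(uint32_t a, uint32_t b) { return a ^ b; }
static uint32_t Shr32(uint32_t a, uint32_t n) { return a >> n; } /* logical shift; n < 32 */

/* CodeReason's EAX expression, with the repeated product
   (a + 0xb) * (a + 0x11) factored out. */
static unsigned char get_decrypt_modifier_f(unsigned int a) {
    uint32_t m = Mul32(Add32(a, 0xb), Add32(a, 0x11));
    return (unsigned char)Xor32(Xor32(Shr32(Xor32(Shr32(m, 0x8), m), 0x10),
                                      Shr32(m, 0x8)),
                                m);
}

/* Layout described above. */
struct ObfuscatedString {
    char padding[7];
    char hasDeobfuscated;  /* nonzero: still obfuscated, per the description */
    short stringLen;
    char string[];
};

/* Second function: subtract the per-index modifier in place. */
static void inplace_buffer_decrypt(unsigned char *buf, int len) {
    for (int counted = 0; counted < len; ++counted)
        buf[counted] -= get_decrypt_modifier_f((unsigned int)counted);
}

/* First function (hypothetical name): decrypt once if flagged, clear the
   flag, return the string. There is no locking, so the race remains. */
static char *get_deobfuscated(struct ObfuscatedString *s) {
    if (s->hasDeobfuscated != 0) {
        inplace_buffer_decrypt((unsigned char *)s->string, s->stringLen);
        s->hasDeobfuscated = 0;
    }
    return s->string;
}

/* Hypothetical helper: zeroed record with room for len bytes plus a NUL. */
static struct ObfuscatedString *obf_alloc(short len) {
    return calloc(1, sizeof(struct ObfuscatedString) + (size_t)len + 1);
}
```

Each helper wraps exactly like the corresponding x86 instruction, so the function computes the same low byte of EAX as the original basic block.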

We’re extending CodeReason into an IDA plugin that allows us to make these queries directly from IDA, which should be really cool!

The second fun thing here is that this string deobfuscator has a race condition. If two threads try to deobfuscate the same string at the same time, both can see hasDeobfuscated still set, so both run the decryption loop. The string is then decrypted twice and corrupted forever. This could be bad if you were trying to do something important with an obfuscated string, since it would result in passing bad data to a system service or something similar, which could have very bad effects.

I’ve used CodeReason to attack string obfuscations that were implemented like this:


xor eax, eax
push eax
sub eax, 0x21ece84
push eax

Here, the sequence of native instructions turns non-string immediate values into string values (through a clever use of the semantics of two’s complement arithmetic). It then pushes them onto the stack in the correct order, building the string dynamically each time the deobfuscation code runs. CodeReason was able to look at this and, using a very simple peephole optimizer, convert the code into a sequence of memory writes of string immediate values, like:


MEMWRITE[esp] = '.dll'
MEMWRITE[esp-4] = 'nlan'
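A simple optimization pass of this kind boils down to constant-folding the register and turning each push into a concrete store. The sketch below is a hypothetical illustration, not CodeReason’s optimizer. Its three-instruction encoding (`Op`, `Insn`, `fold_pushes`) is an assumption. The immediates in the usage are chosen for the example: 0x93939BD2 and 0x919E9392 are the values whose negations spell “.dll” and “nlan” in little-endian byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mini instruction set covering only the pattern above. */
typedef enum { OP_XOR_EAX_EAX, OP_SUB_EAX_IMM, OP_PUSH_EAX } Op;
typedef struct { Op op; uint32_t imm; } Insn;

/* Constant-fold eax through the sequence and record each push as a concrete
   4-byte store. The stack grows down from the end of `stack`. Returns the
   number of bytes pushed, or 0 if a push uses an unknown eax or overflows. */
static size_t fold_pushes(const Insn *code, size_t n,
                          unsigned char *stack, size_t cap) {
    uint32_t eax = 0;
    int known = 0;          /* eax is unknown until it is cleared */
    size_t sp = cap;
    for (size_t i = 0; i < n; ++i) {
        switch (code[i].op) {
        case OP_XOR_EAX_EAX:
            eax = 0;
            known = 1;
            break;
        case OP_SUB_EAX_IMM:
            eax -= code[i].imm;  /* wraps mod 2^32: two's complement */
            break;
        case OP_PUSH_EAX:
            if (!known || sp < 4) return 0;
            sp -= 4;
            for (int b = 0; b < 4; ++b)  /* x86 stores little-endian */
                stack[sp + b] = (unsigned char)(eax >> (8 * b));
            break;
        }
    }
    return cap - sp;
}
```

Folding a sequence like `xor eax, eax; push eax; sub eax, 0x93939BD2; push eax; xor eax, eax; sub eax, 0x919E9392; push eax` leaves the bytes of “nlan.dll” followed by a NUL at the top of the stack. That matches the shape of the MEMWRITE output above.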

Conclusions

Having machine code in a form where it can be optimized and understood can be kind of powerful, especially when that form is available from a programmatic library! Using CodeReason, we were able to extract the semantics of string obfuscation functions and automatically implement a string de-obfuscator. Further, we were able to simplify obfuscating code into a form that expressed the de-obfuscated string values on their own. We plan to cover additional uses and capabilities of CodeReason in future blog posts.

Ending the Love Affair with ExploitShield

Introduction

ExploitShield has been marketed as offering protection “against all known and unknown 0-day day vulnerability exploits, protecting users where traditional anti-virus and security products fail.” I found this assertion quite extraordinary and exciting! Vulnerabilities in software applications are real problems for computer users worldwide. So far, we have been pretty bad at providing actual technology to help individual users defend against vulnerabilities in software.

In my opinion, Microsoft has made the best advances with its Enhanced Mitigation Experience Toolkit (EMET). EMET changes the behavior of the operating system to increase the effort attackers have to expend to produce working exploits. There are blog posts that document exactly what EMET does.

In general, I believe that systems that are upfront and public about their methodologies are more trustworthy than “secret sauce” systems. EMET is very upfront about its methodologies, while ExploitShield conceals its own in an attempt to derive additional security from obscurity.

I analyzed the ExploitShield system and technology, and the results of my analysis follow. In summary, the system is very predictable. Attackers can easily study it and adapt their attacks to overcome it, and the implementation itself creates new attack surface. After this analysis, I do not believe that this system would help an individual or organization defend against an attacker with any capability to write their own exploits, 0-day or otherwise.

Caveat

The analysis I performed was on the “Browser” edition. It’s possible that something far more advanced is in the “Corporate” edition; I honestly can’t say, because I haven’t seen it. However, given the ‘tone’ of the implementation I analyzed and the implementation flaws in it, I doubt this and believe that the “Corporate” edition represents just “more of the same.” I welcome being proven wrong.

Initial Analysis

Usually we can use some excellent free tools to get a sense of a piece of software’s footprint. I like to use GMER for this. GMER surveys the entire system and uses a cross-view technique to identify patches made to running programs.

If you recall ExploitShield’s marketing information, it shows popup boxes that look like this:

This screenshot has some tells in it. For example, why is a path specified? If ExploitShield were really blocking the ‘exploit’, shouldn’t the attack never get as far as specifying a path on the file system?

In the following sections, I’ll go over each phase of my analysis as it relates to a component of or a concept within ExploitShield.

ExploitShield uses a Device Driver

One component of the ExploitShield system is a device driver. The device driver uses an operating-system-supported mechanism (PsSetCreateProcessNotifyRoutine) to receive notification from the operating system when a process is started.

Each time a process starts, the device driver examines the process and optionally loads its own user-mode code module into it. Whether the module is loaded depends on whether the starting process is one that ExploitShield is protecting.

User-Mode Component

The user-mode component seems to exist only to hook/detour specific functions.

The act of function hooking, also called function detouring, involves making modifications to the beginning of a function such that when that function is invoked, another function is invoked instead. The paper on Detours by MS Research explains the concept pretty thoroughly.

Function hooking is commonly used as a way to implement a checker or reference monitor for an application. A security system can detour a function, such as CreateProcessA, and make a heuristics-based decision on the arguments to CreateProcessA. If the heuristic indicates that the behavior is suspect, the security system can take some action, such as failing the call to CreateProcessA or terminating the process.

Hooked Functions

ExploitShield seems to function largely by detouring the following functions:

  • WinExec
  • CreateProcessW/A
  • CreateFileW/A
  • ShellExecute
  • URLDownloadToFileW/A
  • URLDownloadToCacheFileW/A

Here we can get a sense of what the authors of ExploitShield meant when they said “After researching thousands of vulnerability exploits ZeroVulnerabilityLabs has developed an innovative patent-pending technology that is able to detect if a shielded application is being exploited maliciously”. These are functions commonly used by shellcode to drop and execute some other program!

Function Hook Behavior

Each hook implements a straightforward heuristic. Before any procedure (on x86) is invoked, the address to return to after the procedure finishes is pushed onto the stack. Each hook retrieves the return address from the stack and asks questions about its attributes:

  • Are the page permissions of the address RX (read-execute)?
  • Is the address located within the bounds of a loaded module?

If either of these two tests fails, ExploitShield reports that it has discovered an exploit!
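The two tests can be modeled portably. In the sketch below, `Module`, `Region`, and `flags_as_exploit` are hypothetical names, and the caller supplies the module and permission tables. A real hook on Windows would get this information from the OS, for example with VirtualQuery for page protection. Only the single return address on top of the stack is inspected.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, portable stand-ins for OS-provided information. */
typedef struct { uintptr_t base; size_t size; } Module;
typedef struct { uintptr_t base; size_t size; bool readable, writable, executable; } Region;

static bool in_range(uintptr_t addr, uintptr_t base, size_t size) {
    return addr >= base && addr - base < size;
}

static const Region *find_region(uintptr_t addr, const Region *regions, size_t n) {
    for (size_t i = 0; i < n; ++i)
        if (in_range(addr, regions[i].base, regions[i].size)) return &regions[i];
    return NULL;
}

/* True when the return address fails either test: its page must be exactly
   RX (not writable), and it must lie inside a loaded module. */
static bool flags_as_exploit(uintptr_t ret_addr,
                             const Region *regions, size_t nregions,
                             const Module *modules, size_t nmodules) {
    const Region *r = find_region(ret_addr, regions, nregions);
    bool is_rx = r && r->readable && r->executable && !r->writable;
    bool in_module = false;
    for (size_t i = 0; i < nmodules && !in_module; ++i)
        in_module = in_range(ret_addr, modules[i].base, modules[i].size);
    return !is_rx || !in_module;
}
```

Any address that is inside a loaded module and on an RX page passes, regardless of how control actually reached it.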

A Confusion of Terms

  • Vulnerability: A vulnerability is a property of a piece of software that allows for some kind of trust violation. Vulnerabilities have a really broad definition. Memory corruption vulnerabilities have had such an impact on computer security that ‘vulnerability’ is often used simply as shorthand for ‘memory corruption vulnerability’. However, other kinds of vulnerabilities do exist, for example information disclosure or authentication bypass vulnerabilities. An information disclosure vulnerability can sometimes be worse for individual privacy than a memory corruption vulnerability.
  • Exploit: An exploit is software or a procedure that uses a vulnerability to effect some action, usually to execute a payload.
  • Payload: Attacker-created software that executes after a vulnerability has been used to compromise a system.

It is my belief that when ExploitShield uses the term ‘exploit’, they really mean ‘payload’.

A Good Day for ExploitShield

So what does a play-by-play of ExploitShield functioning as expected look like? Let’s take a look, abstracting away the details of exactly which exploit is used:

  1. A user is fooled into navigating to a malicious web page under the attacker’s control. They can’t really be blamed too much for this. They only need to make this mistake once, and the visit could be the result of an attacker compromising a legitimate website and using it to serve malware.
  2. This web page contains an exploit for a vulnerability in the user’s browser. The web browser loads the document that contains the exploit and begins to parse and process it.
  3. The data in the exploit document has been crafted so that the program parsing the document does something bad. Let’s say the exploit convinces the web browser to overwrite a function pointer stored somewhere in memory with the address of data that is also supplied by the exploit. Next, the vulnerable program calls this function pointer.
  4. Now the web browser executes code supplied by the exploit. At this point, the web browser has been exploited: the user is running code supplied by the attacker, and anything could happen. Note that we’ve made it all the way through the ‘exploitation’ stage of this process and ExploitShield hasn’t entered the picture yet.
  5. The executed code calls one of the hooked functions, say WinExec. For this example, let’s say that the calling code lives on a heap page, so its permissions are RWX (read-write-execute). ExploitShield’s WinExec hook sees a return address on an RWX page rather than an RX page in a loaded module, and reports an exploit.

ExploitShield is great if the attacker doesn’t know it’s there and it isn’t widely deployed enough to matter to attackers at large. If the attacker knows it’s there, and cares, they can bypass it trivially.

A Bad Day for ExploitShield

If an attacker knows about ExploitShield, how much effort does it take to create an exploit that does not set off the alarms monitored by ExploitShield? I argue it does not take much effort at all. Two immediate possibilities come to mind:

  • Use a (very) primitive form of ROP (Return-Oriented Programming). Identify a ret instruction in a loaded module. First push your real return address onto the stack, then push the address of that ret instruction as the return address the hook will see. The checks made by ExploitShield will pass, and the ret instruction then returns to your code.
  • Use a function that is equivalent to one of the hooked functions but is not itself hooked. If CreateProcess is hooked, use NtCreateProcess instead.

Both of these would defeat the protections I discovered in ExploitShield. Additionally, these techniques would function on systems where ExploitShield is absent, meaning that if an attacker cared to bypass ExploitShield when it was present they would only need to do the work of implementing these bypasses once.

Obscurity Isn’t Always Bad

The principle of ‘security through obscurity’ is often cited by security nerds as a negative property for a security system to hold. However, obscurity does actually make systems more secure as long as the defensive system remains obscure or unpredictable. The difficulty for obscurity-based defensive techniques lies in finding an obscure change that can be made with little cost and that the attacker can’t adapt to before they are disrupted by it, or a change that can be altered for very little cost when its obscurity is compromised.

For example, consider PatchGuard from Microsoft. PatchGuard ‘protects’ the system by crashing it when modifications are detected. Microsoft conceals PatchGuard’s operation and does not publish it. As long as PatchGuard’s operation remains obscure and secret, it can protect systems by crashing them when it detects modifications made by a rootkit.

However, PatchGuard has been frequently reverse engineered and studied by security researchers. Each time a researcher has sat down with the intent to bypass PatchGuard, they have met with success. The interesting thing is what happens next: at some point in the future, Microsoft silently releases an update that changes the behavior of PatchGuard such that it still accomplishes its goal of crashing the system if modifications are detected, but is not vulnerable to attacks created by security researchers.

In this instance, obscurity works. It’s very cheap for Microsoft to make a new PatchGuard, indeed the kernel team might have ten of them “on the bench” waiting for the currently fielded version to be dissected and bypassed. This changes the kernel from a static target into a moving target. The obscurity works because it is at Microsoft’s initiative to change the mechanism, changes are both cheap and effective, and the attacker can’t easily prepare to avoid these changes when they’re made.

The changes that ExploitShield introduces are extremely brittle and cannot be modified as readily. Perhaps if ExploitShield were an engine that could quickly deliver a broad variety of runtime changes and randomly vary them per application, this dynamic would be different.

Some Implementation Problems

Implementing a HIPS correctly is a lot of work! There are fiddly engineering decisions to make everywhere and as the author you are interposing yourself into a very sticky security situation. ExploitShield makes some unnecessary implementation decisions.

The IOCTL Interface

The driver exposes an interface that is accessible to all users. Traditional best practices for legacy Windows drivers ask that a driver’s interfaces be accessible only to the users that should access them. The ExploitShield interface, however, is accessible to the entire system, including unprivileged users.

The driver processes messages that are sent to it. I didn’t fully work out what type of messages these are, or their format. However, IOCTL handling code is full of possibilities for subtle mistakes. Any mistake inside the IOCTL handling code could lead to a kernel-level vulnerability, which would compromise the security of your entire system.

This interface creates additional attack surface.

The Hook Logic

Each hook invokes a routine to check whether the return address is located in a loaded module. This routine uses a global list of modules that is populated only once, by a call to EnumerateLoadedModules with a programmer-supplied callback. There are two bugs in ExploitShield’s method for retrieving the list of loaded modules.

The first bug is that there is apparently no mutual exclusion around the critical section of populating the global list. Multiple threads can call CreateProcessA at once, so it is theoretically possible for the user-mode logic to place itself into an inconsistent state.

The second bug is that the modules are only enumerated once. Once EnumerateLoadedModules has been invoked, a global flag is set to true and then EnumerateLoadedModules is never invoked again. If the system observes a call to CreateProcess, and then a new module is subsequently loaded, and that module has a call to CreateProcess, the security system will erroneously flag that module as an attempted exploit.
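The second bug can be reproduced deterministically in a small, single-threaded model. All names below are hypothetical. The model only mimics the behavior described above: the module list is copied once behind a global flag, with no lock, and never refreshed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uintptr_t base; size_t size; } Module;

#define MAX_MODULES 16

/* What is actually loaded in the (modeled) process right now. */
static Module g_loaded[MAX_MODULES];
static size_t g_nloaded;

/* Once-only cache: filled on first use behind a global flag, never refreshed. */
static Module g_cache[MAX_MODULES];
static size_t g_ncache;
static bool g_enumerated;

static void load_module(uintptr_t base, size_t size) {
    if (g_nloaded < MAX_MODULES) g_loaded[g_nloaded++] = (Module){ base, size };
}

static bool cached_in_module(uintptr_t addr) {
    if (!g_enumerated) {                  /* unsynchronized check-then-act */
        for (size_t i = 0; i < g_nloaded; ++i) g_cache[i] = g_loaded[i];
        g_ncache = g_nloaded;
        g_enumerated = true;              /* never cleared: the list goes stale */
    }
    for (size_t i = 0; i < g_ncache; ++i)
        if (addr >= g_cache[i].base && addr - g_cache[i].base < g_cache[i].size)
            return true;
    return false;
}
```

A module loaded after the first check is invisible to the cache, so a legitimate return address inside it fails the “inside a loaded module” test.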

Neither of these flaws exposes the user to any additional danger; they just indicate poor programming practice.

Why Hook At All?

An especially baffling decision made in the implementation of ExploitShield is the use of hooks at all! For each event that ExploitShield concerns itself with (process creation and file write), there are robust callback infrastructures present in the NT kernel. Indeed, authors of traditional anti-virus software so frequently reduced system stability with overly zealous use of hooks that Microsoft very strongly encouraged them to use this in-kernel monitoring API.

ExploitShield uses unnecessarily dangerous programming practices to achieve effects possible by using legitimate system services, possibly betraying a lack of understanding of the platform they aim to protect.

The Impossibility of ExploitShield’s success

What can ExploitShield do to change this dynamic? The problem is, not much. Defensive systems like this are wholly dependent on obscurity. Once studied by attackers, the systems lose their value. In the case of software like this, one problem is that the feedback loop does not inform the authors or users of the security software that the attacker has adapted to the security system. Another problem is that the obscurity of a system is difficult to maintain. The software has to be used by customers, so it has to be available in some sense, and if it is available for customers, it will most likely also be available for study by an attacker.

What Hope Do We Have?

EMET differs from ExploitShield in an important regard: EMET aims to disrupt the act of exploiting a program, while ExploitShield aims to disrupt the act of executing a payload on a system. These might seem like fine points; however, the distinction comes down to “how many effective choices does the attacker have?” When it comes to executing payloads, the attacker’s choices are nearly infinite, since they are already executing arbitrary code.

Unlike ExploitShield, EMET is generally not based on obscurity. The authors of EMET are very willing to discuss in great detail the different mitigation strategies they implement, while the author of ExploitShield has yet to do so.

Generally, I believe that if a defensive technique makes a deterministic change to program or run-time behavior, an attack will fail until it is adapted to that technique. How long the technique stays effective depends on its obscurity and on whether the change affects the vulnerability, the exploit, or the payload. If the attack cannot be adapted to the modified environment at all, the obscurity of the mitigation is irrelevant.

However, what if the technique were not obscure but unpredictable? What if a defensive technique randomly adjusted the system’s implementation behavior while preserving its semantic behavior as experienced by the program? What is needed is to identify properties of a system that, if changed, would affect the functioning of attacks but not the functioning of programs.

When these properties are varied randomly, the attacker has fewer options. Perhaps they know of a vulnerability that works under every permutation of implementation details. If not, whether their attack succeeds is entirely a matter of chance.
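A simple model makes “a matter of chance” concrete. Assume, purely for illustration, that a randomized property takes one of N equally likely values and that a given attack works for exactly one of them. The success probabilities are then:

```latex
P(\text{success in one attempt}) = \frac{1}{N}

P(\text{success within } k \text{ attempts, re-randomized each time}) = 1 - \left(1 - \frac{1}{N}\right)^{k}

P(\text{success within } k \le N \text{ distinct guesses, never re-randomized}) = \frac{k}{N}
```

With N = 256, a single attempt succeeds with probability 1/256 (about 0.4%), and 256 re-randomized attempts succeed with probability of roughly 0.63. If the value is never re-randomized, 256 distinct guesses are guaranteed to hit it, which is why it also matters how often the implementation details are varied.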

Conclusion

ExploitShield is a time capsule containing the best host-based security technology that 2004 had to offer. In my opinion, it doesn’t represent a meaningful change in the computer security landscape. The techniques it uses hinge wholly on obscurity and secrecy, require very little work to overcome, and affect only the later stage of an attack (the payload), not the exploit.

When compared to other defensive technologies, ExploitShield comes up short. It uses poorly implemented techniques that work against phases of the attack that require very little attacker adaptation to overcome. Once ExploitShield gains enough market traction, malware authors and exploit writers will automate techniques that work around it.

ExploitShield even increases your attack surface by installing a kernel-mode driver that processes messages sent by any user on the system. Any flaw in that driver could introduce a privilege escalation bug into your system.

The detection logic it uses to find shellcode is not wholly flawed. It contains an implementation error that could produce some false positives, but in general a call to a runtime library function whose return address lies outside the bounds of every loaded module is suspicious. The problem with this detection signature is that an attacker can trivially work around it while achieving the same effect. Additionally, it is not novel: HIPS products have implemented this check for a long time.

This is a shame, because, in my opinion, there is still some serious room for innovation in this type of software…
