McSema is Officially Open Source!

We are proud to announce that McSema is now open source! McSema is a framework for analyzing and transforming machine-code programs to LLVM bitcode. It supports translation of x86 machine code, including integer, floating point, and SSE instructions. We previously covered some features of McSema in an earlier blog post and in our talk at ReCON 2014.

Our talk at ReCON where we first described McSema

Build instructions and demos are available in the repository and we encourage you to try them on your own. We have created a mailing list, mcsema-dev@googlegroups.com, dedicated to McSema development and usage. Questions about licensing or integrating McSema into your commercial project may be directed to opensource@trailofbits.com.

McSema is permissively licensed under a three-clause BSD license. Some code and utilities we incorporate (e.g. Intel PIN for semantics testing) have their own licenses and need to be downloaded separately.

Finally, we would like to thank DARPA for their sponsorship of McSema development and their continued support. This project would not have been possible without them.

A Preview of McSema

On June 28th Artem Dinaburg and Andrew Ruef will be speaking at REcon 2014 about a project named McSema. McSema is a framework for translating x86 binaries into LLVM bitcode. This translation is the opposite of what happens inside a compiler. A compiler translates LLVM bitcode to x86 machine code. McSema translates x86 machine code into LLVM bitcode.

Why would we do such a crazy thing?

Because we wanted to analyze existing binary applications, and reasoning about LLVM bitcode is much easier than reasoning about x86 instructions. Not only is it easier to reason about LLVM bitcode, but it is easier to manipulate and re-target bitcode to a different architecture. There are many program analysis tools (e.g. KLEE, PAGAI, LLBMC) written to work on LLVM bitcode that can now be used on existing applications. Additionally it becomes much simpler to transform applications in complex ways while maintaining original application functionality.

McSema brings the world of LLVM program analysis and manipulation tools to binary executables. There are other x86 to LLVM bitcode translators, but McSema has several advantages:

  • McSema separates control flow recovery from translation, permitting the use of custom control flow recovery front-ends.
  • McSema supports FPU instructions.
  • McSema is open source and licensed under a permissive license.
  • McSema is documented, works, and will be available soon after our REcon talk.

This blog post will be a preview of McSema and will examine the challenges of translating a simple function that uses floating point arithmetic from x86 instructions to LLVM bitcode. The function we will translate is called timespi. It it takes one argument, k and returns the value of k * PI. Source code for timespi is below.

long double timespi(long double k) {
    long double pi = 3.14159265358979323846;
    return k*pi;
}

When compiled with Microsoft Visual Studio 2010, the assembly looks like the IDA Pro screenshot below.

IDA_TimesPi_Original

This is what the original timespi function looks like in IDA.

After translating to LLVM bitcode with McSema and then re-emitting the bitcode as an x86 binary, the assembly looks much different.

IDA_TimesPi_Lifted

How timespi looks after translation to LLVM and re-emission back as an x86 binary. The new code is considerably larger. Below, we explain why.

You may be saying to yourself: “Wow, that much code bloat for such a small function? What are these guys doing?”

We specifically wanted to use this example because it shows floating point support — functionality that is unique to McSema, and because it showcases difficulties inherent in x86 to LLVM bitcode translation.

Translation Background

McSema models x86 instructions as operations on a register context. That is, there is a register context structure that contains all registers and flags and an instruction semantics are expressed as modifications of structure members. This concept is easiest to understand with a simplified pseudocode example. An operation such as ADD EAX, EBX would be translated to context[EAX] += context[EBX].

Translation Difficulties

Now let’s examine why a small function like timespi presents serious translation challenges.

The value of PI is read from the data section.

Control flow recovery must detect that the first FLD instruction references data and correctly identify the data size. McSema separates control flow recovery from translation, and hence can leverage IDA’s excellent CFG recovery via an IDAPython script.

The translation needs to support x86 FPU registers, FPU flags, and control bits.

The FPU registers aren’t like integer registers. Integer registers (EAX, ECX, EBX, etc.) are named and independent. Instructions referencing EAX will always refer to the same place in a register context.

FPU registers are a stack of 8 data registers (ST(0) through ST(7)), indexed by the TOP flag. Instructions referencing ST(i) actually refer to st_registers[(TOP + i) % 8] in a register context.

figure_8_2

This is Figure 8-2 from the Intel IA-32 Software Development Manual. It very nicely depicts the FPU data registers and how they are implicitly referenced via the TOP flag.

Integer registers are defined solely by register contents. FPU registers are partially defined by register contents and partially by the FPU tag word. The FPU tag word is a bitmap that defines whether the contents of a floating point register are:

  • Valid (that is, a normal floating point value)
  • The value zero
  • A special value such as NaN or Infinity
  • Empty (the register is unused)

To determine the value of an FPU register, one must consult both the FPU tag word and the register contents.

The translation needs to support at least the FLDFSTP, and FMUL instructions.

The actual instruction operation such as loads, stores, and multiplication is fairly straightforward to support. The difficult part is implementing FPU execution semantics.

For instance, the FPU stores state about FPU instructions, like:

  • Last Instruction Pointer: the location of the last executed FPU instruction
  • Last Data Pointer: the address of the latest memory operand to an FPU instruction
  • Opcode: The opcode of the last executed FPU instruction

Some of these concepts are easier to translate to LLVM bitcode than others. Storing the address of the last memory operand translates very well: if the translated instruction references memory, store the memory address in the last data pointer field of the register context. Other concepts simply don’t translate. As an example, what does the “last instruction pointer” mean when a single FPU instruction is translated into multiple LLVM operations?

Self-referencing state isn’t the end of translation difficulties. FPU flags like the precision control and rounding control flags affect instruction operation. The precision control flag affects arithmetic operation, not the precision of stored registers. So one can load a double extended precision values in ST(0) and ST(1) via FLD, but FMUL may store a single precision result in ST(0).

Translation Steps

Now that we’ve explored the difficulties of translation, let’s look at the steps needed to translate just the core of timespi, the FMUL instruction. The IA-32 Software Development Manual manual defines this instance of FMUL as “Multiply ST(0) by m64fp and store result in ST(0).” Below are just some of the steps required to translate FMUL to LLVM bitcode.

  • Check the FPU tag word for ST(0), make sure its not empty.
  • Read the TOP flag.
  • Read the value from st_registers[TOP]. Unless the FPU tag word said the value is zero, in which case just read a zero.
  • Load the value pointed to by m64fp.
  • Do the multiplication.
  • Check the precision control flag. Adjust the result precision of the result as needed.
  • Write the adjusted result into st_registers[TOP].
  • Update the FPU tag word for ST(0) to match the result. Maybe we multiplied by zero?
  • Update FPU status flags in the register context. For FMUL, this is just the C1 flag.
  • Update the last FPU opcode field
  • Did our instruction reference data? Sure did! Update the last FPU data field to m64fp.
  • Skip updating the last FPU instruction field since it doesn’t really map to LLVM bitcode… for now

Thats a lot of work for a single instruction, and the list isn’t even complete. In addition to the work of translating raw instructions, there are additional steps that must be taken on function entry and exit points, for external calls and for functions that have their address taken. Those additional details will be covered during the REcon talk.

Conclusion

Translating floating point operations is a tricky, difficult business. Seemingly simple floating point instructions hide numerous operations and translate to a large amount of LLVM bitcode. The translated code is large because McSema exposes the hidden complexity of floating point operations. Considering that there have been no attempts to optimize instruction translation, we think the current output is pretty good.

For a more detailed look at McSema, attend Artem and Andrew’s talk at REcon and keep following the Trail of Bits blog for more announcements.

EDIT: McSema is now open-source. See our announcement for more information.

iOS 4 Security Evaluation

This year’s BlackHat USA was the 12th year in a row that I’ve attended and the 6th year in a row that I’ve participated in as a presenter, trainer, and/or co-organizer/host of the Pwnie Awards. And I made this year my busiest yet by delivering four days of training, a presentation, the Pwnie Awards, and participating on a panel. Not only does that mean that I slip into a coma after BlackHat, it also means that I win at conference bingo.

Reading my excuses for the delay in posting my slides and whitepaper, however, is not why you are reading this blog post. It is to find the link to download said slides and whitepaper:

Attacker Math 101

At SOURCE Boston this year, I gave my first conference keynote presentation. I really appreciate the opportunity that Stacy Thayer and the rest of the SOURCE crew gave me. The presentation was filmed by AT&T and you can watch it on the AT&T Tech Channel. Another thanks goes out to Ryan Naraine for inviting me to give an encore presentation of it for Kaspersky’s SAS conference in Malaga, Spain.

If slides are more your style, you can check out the more recent version from Kaspersky’s SAS 2011: Attacker Math 101.

Upcoming Events in 2011

I’m going to start out 2011 pretty busy on the information security events circuit.  Here are some of the events that I’ll be participating in over the first few months in 2011:

So there you have it: a workshop, a presentation, a round-table, a panel, a training, and a keynote on both coasts of North America and both sides of the Atlantic.  I win at conference bingo!  I’m pretty excited about giving my first ever conference keynote presentation at SOURCE.  I’ll be giving a food-for-thought type of presentation, not the technical sort that I’m used to.  However, just to keep things interesting, I might randomly drop some 0day in the middle of the presentation anyway.

Hacking at Mach 2!

At BayThreat last month, I gave an updated (and more much sober) version of my “Hacking at Mach Speed” presentation from SummerC0n.  Now, since the 0day Mach RPC privilege de-escalation vulnerability has been fixed, I can include full details on it.  The presentation is meant to give a walkthrough on how to identify and enumerate Mach RPC interfaces in bootstrap servers on Mac OS X.  Why would you want to do this?  Hint: there are other uses for these types of vulnerabilities besides gaining increased privileges on single-user Mac desktops.  Enjoy!

  • “Hacking at Mach 2!” (PDF)

BlackHat USA 2010

BlackHat is going to be a busy one for me this year because I am still trying to quit my nasty over-committing habit. But hopefully, I should have something that interests just about everybody.

If you love/hate Macs and like hacking, you should check out the Mac Hacking Class training that I am giving with Vincenzo Iozzo. We’ll be covering a lot of material including discovering and exploiting vulnerabilities, Mac OS X and Mach internals, and writing exploit payloads.

If Windows is more your style, you should check out my presentation, Return-Oriented Exploitation. I’ll be talking about using a variety of return-oriented techniques to bypass DEP/NX and ASLR on modern Windows operating systems, using my exploit for the “Operation Aurora” Internet Explorer vulnerability as an example and live demo. My presentation will be on Thursday at 1:45pm in the Exploitation Track (Augustus 1-2).

Finally, if you don’t really care about Macs or Windows, but do love security vulnerabilities and/or the infosec drama circus (b/c who really cares about the actual work we do?), you should check out the Pwnie Awards. For the 4th year in a row, Alex Sotirov and I have organized the Pwnie Awards to celebrate the achievements and failures of the information security industry. Along with our fellow esteemed judges (Dave Aitel, Mark Dowd, Halvar Flake, Dave Goldsmith, and HD Moore), we will be hosting the Pwnie Awards at 6:00pm on Wednesday, July 28th or July 29th (there seems to be some confusion on exactly which day it’ll be on and where currently). Follow the Pwnie Awards on Twitter for late-breaking updates.

Mac OS X Return-Oriented Exploitation

In The Mac Hacker’s Handbook and a few Mac-related presentations last year, I described my return-oriented exploitation technique for Mac OS X Leopard (10.5) for x86. This technique involved returning into the setjmp() function within dyld (the Mac OS X dynamic linker, which is loaded at a static location) to write out the values of controlled registers to a chosen location in writable and executable memory. By subsequently returning into that location, a few bytes of chosen x86 instructions could be executed. Performing this sequence twice will allow the attacker to execute enough chosen instructions to copy their traditional machine code payload into executable memory and execute it. In Snow Leopard (10.6), Apple has removed setjmp() from dyld, so I had to go back to the drawing board.

For my talk at REcon this year, Mac OS X Return-Oriented Exploitation, I applied my recent research in return-oriented programming and exploitation to Mac OS X to develop a few techniques against Snow Leopard x86 (32-bit) processes. I also talk about why attackers don’t really have to care about 64-bit x86_64 processes on Snow Leopard just yet. If you missed REcon this year (and why would you ever allow that to happen?!), you can download my slides here: Mac OS X Return-Oriented Exploitation.

Hacking at Mach Speed!

The first ever NYC SummerCon last weekend was a blast and everyone seemed to have a great time. As promised, there was 0day at the conference and hopefully no one remembered it because they were too drunk. Here are the slides for my presentation, (they are really no substitute for the live SummerCon experience). This presentation was a mix of some technical background on local Mach RPC on Mac OS X, a bug I found the day before the conference, and some miscellaneous rants from my presentation at BSidesSF.

It was awesome bringing the conference up to NYC and I had a great time opening up for Dr. Raid’s “Busticating DEP” presentation/freestyle busticati rap.

Practical Return-Oriented Programming

At a number of conferences this spring, I am presenting “Practical Return-Oriented Programming.” The talk is about taking the academic and applying it in the real world to developing exploits for Windows that bypass Permanent DEP using my BISC (Borrowed Instructions Synthetic Computer) library.  In the talk, I demonstrate exploitation of the Internet Explorer “Operation Aurora” vulnerability on Windows 7.  These techniques are not at all new, only my implementation is, and it owes much to previous research by Sebastian Krahmer’s “Borrowed Code Chunks” technique , Hovav Shacham’s Return-Oriented Programming, and Pablo Sole’s DEPLIB.

Follow

Get every new post delivered to your Inbox.

Join 3,629 other followers