By Alan Chang
Today, we are releasing Magnifier, an experimental reverse engineering user interface I developed during my internship. Magnifier asks, “What if, as an alternative to taking handwritten notes, reverse engineering researchers could interactively reshape a decompiled program to reflect what they would normally record?” With Magnifier, the decompiled C code isn’t the end—it’s the beginning.
Decompilers are essential tools for researchers. They transform program binaries from machine code into source code, typically represented as C-like code. A researcher’s job starts where decompilers leave off. Researchers must make sense of a decompiled program’s logic, and the best way to drill down on specific program paths or values of interest is often pen and paper. This is tedious and cumbersome, so we chose to prototype an alternative method.
Decompilation at Trail of Bits
Trail of Bits is working on multiple open-source projects related to program decompilation: Remill, Anvill, Rellic, and now Magnifier. The Trail of Bits strategy for decompilation is to progressively lift compiled programs through a tower of intermediate representations (IRs); Remill, Anvill, and Rellic work together to achieve this. This multi-stage approach helps break down the problem into smaller components:
- Remill represents machine instructions in terms of LLVM IR.
- Anvill transforms machine code functions into LLVM functions.
- Rellic transforms the LLVM IR into C code via the Clang AST.
In principle, a program can be transformed at any stage of this pipeline, and Magnifier demonstrates this in practice. Using Magnifier, researchers can interactively transform Anvill’s LLVM IR and immediately view the C code that Rellic produces from it.
It started as a REPL
Magnifier started its life as a command-line read-eval-print loop (REPL) that lets users perform a variety of LLVM IR transformations using concise commands. Here is an example of one of these REPL sessions. The key transformations exposed were:
- Function optimization using LLVM
- Function inlining
- Value substitution with/without constant folding
- Function pointer devirtualization
Magnifier’s first goal was to describe the targets being transformed; depending on the type of transformation, these targets could be instructions, functions, or other objects. To describe these targets consistently and hide some implementation details, Magnifier assigns a unique, opaque ID to all functions, function parameters, basic blocks, and IR instructions.
Magnifier’s next important goal was to track instruction provenance across transformations and understand how instructions are affected by operations. To accomplish this, it introduces an additional source ID. (For unmodified functions, source IDs are the same as current IDs.) Then during each transformation, a new function is created that propagates the source IDs but generates new, unique current IDs. This solution ensures that no function is mutated in place, facilitating before-and-after comparisons of transformations while tracking their provenance.
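To make the two-ID scheme concrete, here is a minimal Python sketch of the idea. It is not Magnifier's implementation: the `Instr` class, the counter-based ID allocator, and the `transform` helper are all hypothetical names chosen for illustration. The sketch only shows that a transformation builds a new function with fresh current IDs while copying the source IDs forward, and that the original function is left untouched.

```python
from dataclasses import dataclass
from itertools import count
from typing import Callable, List

# Hypothetical ID allocator. Magnifier's real IDs are opaque, and the
# post does not describe how they are allocated.
_next_id = count(1)


@dataclass(frozen=True)
class Instr:
    text: str
    current_id: int
    source_id: int


def make_function(lines: List[str]) -> List[Instr]:
    """Build an unmodified function: source ID equals current ID."""
    out = []
    for text in lines:
        new_id = next(_next_id)
        out.append(Instr(text, new_id, new_id))
    return out


def transform(func: List[Instr], rewrite: Callable[[str], str]) -> List[Instr]:
    """Return a new function; the input function is never mutated."""
    return [Instr(rewrite(i.text), next(_next_id), i.source_id) for i in func]


original = make_function(["%x = add i32 %a, 1", "ret i32 %x"])
substituted = transform(original, lambda t: t.replace("%a", "10"))

# Print each instruction in a "current|source" style.
for ins in substituted:
    print(f"{ins.current_id}|{ins.source_id}  {ins.text}")
```

Because `original` is still available after the call, a before-and-after comparison only needs to match instructions by source ID.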
Lastly, for transformations such as value substitution, Magnifier can perform follow-up transformations such as constant folding. These extra transformations are often desirable, but not in every case. To accommodate different use cases, Magnifier provides granular control over each transformation in the form of a universal substitution interface. This interface allows users to monitor all the transformations and selectively allow, reject, or modify substitution steps as they see fit.
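One way to picture such an interface is as a callback that is consulted for every proposed step. The Python sketch below is an assumption about the general shape only; `Decision`, `run_substitution`, and the observer signature are hypothetical and are not taken from Magnifier's API. Each step is a pair of the old value and the proposed replacement, and the observer may allow it, reject it, or allow it with a different replacement.

```python
from enum import Enum
from typing import Callable, List, Tuple


class Decision(Enum):
    ALLOW = "allow"
    REJECT = "reject"


# A proposed step: (old value, proposed replacement).
Step = Tuple[str, str]
# The observer returns a decision plus the replacement to use, which is
# how it can modify a step instead of only accepting or rejecting it.
Observer = Callable[[str, str], Tuple[Decision, str]]


def run_substitution(steps: List[Step], observer: Observer) -> List[Step]:
    """Return the steps that were applied, after the observer's decisions."""
    applied = []
    for old, proposed in steps:
        decision, value = observer(old, proposed)
        if decision is Decision.REJECT:
            continue
        applied.append((old, value))
    return applied


def no_folding(old: str, proposed: str) -> Tuple[Decision, str]:
    # Accept the parameter substitution, but refuse the constant fold.
    if old.startswith("add"):
        return Decision.REJECT, old
    return Decision.ALLOW, proposed


steps = [("%a", "i32 10"), ("add i32 10, 1", "i32 11")]
print(run_substitution(steps, no_folding))
```

Passing an observer that always returns `Decision.ALLOW` with the proposed value would reproduce the default behavior of accepting every step.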
Here’s an example of transformations in action using Magnifier REPL.
First, a few functions are defined as follows:
Here’s the same “mul2” function in annotated LLVM IR:
The opaque IDs and the provenance IDs are shown. “XX|YY” means “XX” is the current ID, and “YY” is the source ID. The IDs in this example are:
- Parameter “a”: 45
- Basic block (entry): 51
- Instruction “ret i32”: 50
Now, substitution takes place that sets the parameter “a” to 10:
The “perform substitution” message at the top shows that a value substitution has happened. Looking at the newly transformed function, each instruction has a new current ID, but the source IDs still track the original function and instructions. Also, a call to “@llvm.assume” is inserted to document the value substitution.
Next, the “b” parameter is replaced with 20, and the two calls to “addOne” are inlined:
The end result is surprisingly simple. We now have a function that calls “@llvm.assume” on “a” and “b” then returns just 231. The constant folding here shows Magnifier’s ability to evaluate simple functions.
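The sequence of substitution, inlining, and constant folding can be modeled on a toy expression tree. The function bodies below are an assumption: the sketch supposes that “addOne” returns its argument plus one and that “mul2” multiplies “addOne(a)” by “addOne(b)”, which is consistent with the reported result (11 × 21 = 231) but is not confirmed by the text. The tuple-based expression format and the helper names are likewise hypothetical, and the sketch does not model “@llvm.assume” or LLVM itself.

```python
from typing import Tuple, Union

# Expression nodes: an int constant, a parameter name, or a tuple
# ("add" | "mul", lhs, rhs) or ("call", function_name, argument).
Expr = Union[int, str, Tuple]

# ASSUMPTION: hypothetical bodies, chosen to be consistent with 231.
FUNCS = {"addOne": ("x", ("add", "x", 1))}  # name -> (parameter, body)
MUL2: Expr = ("mul", ("call", "addOne", "a"), ("call", "addOne", "b"))


def substitute(e: Expr, env: dict) -> Expr:
    """Replace parameter names with the values given in env."""
    if isinstance(e, str):
        return env.get(e, e)
    if isinstance(e, tuple):
        if e[0] == "call":
            return ("call", e[1], substitute(e[2], env))
        return (e[0], substitute(e[1], env), substitute(e[2], env))
    return e


def inline(e: Expr) -> Expr:
    """Replace each call with the callee body applied to the argument."""
    if isinstance(e, tuple):
        if e[0] == "call":
            param, body = FUNCS[e[1]]
            return substitute(body, {param: inline(e[2])})
        return (e[0], inline(e[1]), inline(e[2]))
    return e


def fold(e: Expr) -> Expr:
    """Evaluate add/mul nodes whose operands are both constants."""
    if isinstance(e, tuple) and e[0] in ("add", "mul"):
        lhs, rhs = fold(e[1]), fold(e[2])
        if isinstance(lhs, int) and isinstance(rhs, int):
            return lhs + rhs if e[0] == "add" else lhs * rhs
        return (e[0], lhs, rhs)
    return e


step1 = substitute(MUL2, {"a": 10, "b": 20})  # a = 10, b = 20
step2 = inline(step1)                         # inline both addOne calls
print(fold(step2))                            # fold what is left
```

Without the substitution step, inlining and folding alone leave the multiplication in place, since neither operand is a constant.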
MagnifierUI: A More Intuitive Interface
While the combination of a shared library plus REPL is a simple and flexible solution, it’s not an ideal setup for researchers who just want to use Magnifier as a tool to reverse-engineer binaries. This is where the MagnifierUI comes in.
The MagnifierUI consists of a Vue.js front end and a C++ back end, and it uses multi-session WebSockets to facilitate communication between the two. The MagnifierUI not only exposes most of the features Magnifier has to offer, but it also integrates Rellic, the LLVM IR-to-C code decompiler, to show side-by-side C code decompilation results.
We can try performing the same set of actions as before using the MagnifierUI:
Use the Upload button to open a file.
The Terminal view exposes the same set of Magnifier commands, which we can use to substitute the value for the first argument.
The C code view and the IR view are automatically updated with the new value. We can do the same for the second parameter.
Clicking an IR instruction selects it and highlights the related C code. We can then inline the selected instruction using the Inline button. The same can be done for the other call instruction.
After inlining both function calls, we can now optimize the function using the Optimize button. This uses all the available LLVM optimizations.
The function has now been simplified down to returning a constant value.
Compared to using the REPL, the MagnifierUI is more visual and intuitive. In particular, the side-by-side view and instruction highlighting make reading the code a lot easier.
Capturing the flag with LLVM optimizations
As briefly demonstrated above, we can leverage the LLVM library in various ways, including using its IR optimizations to simplify code. However, a new example is needed to fully demonstrate the power of Magnifier’s approach.
Here we have a “mystery” function that calls “fibIter(100)” to obtain a secret value:
It would be convenient to find this secret value without running the program dynamically (which could be difficult if anti-debugging methods are in place) or manually reverse-engineering the “fibIter” function (which can be time-consuming). Using the MagnifierUI, we can solve this problem in just two clicks!
Select the “fibIter” function call instruction and click the “Inline” button.
With the function inlined, we can now “Optimize”!
Here’s our answer: 3314859971, the “100th Fibonacci number” that Rellic has tried to fit into an unsigned integer.
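The value can be checked independently. The Python sketch below uses a hypothetical `fib_iter` as a stand-in for “fibIter” (the original function body is not shown here), computes the exact 100th Fibonacci number with arbitrary-precision integers, and then keeps only the low 32 bits, as an unsigned 32-bit integer would.

```python
def fib_iter(n: int) -> int:
    """Iterative Fibonacci with F(1) = F(2) = 1.

    Hypothetical stand-in for the post's fibIter, not its actual code.
    """
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


exact = fib_iter(100)     # Python integers do not overflow
wrapped = exact % 2**32   # keep only the low 32 bits

print(exact)    # 354224848179261915075
print(wrapped)  # 3314859971
```

The exact result needs 69 bits, so the constant shown in the decompiled code is the true value reduced modulo 2^32.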
This example shows Magnifier’s great potential for simplifying the reverse-engineering process and making researchers’ lives easier. By leveraging all the engineering wisdom behind LLVM optimizations, Magnifier can reduce even a relatively complex function like “fibIter,” which contains loops and conditionals, down to a constant.
Looking toward the Future of Magnifier
I hope this blog post sheds some light on how Trail of Bits approaches the program decompilation challenge at a high level and provides a glimpse of what an interactive compiler can achieve with the Magnifier project.
Magnifier certainly needs additional work, from adding support for more transformation types (with the hope of eventually expressing full patch sets) to integrating the MagnifierUI with tools like Anvill to directly ingest binary files. Still, I’m very proud of what I’ve accomplished with the project thus far, and I look forward to what the future holds for Magnifier.
I would like to thank my mentor Peter Goodman for all his help and support throughout my project as an intern. I learned a great deal from him, and in particular, my C++ skills improved a lot with the help of his informative and detailed code reviews. He has truly made this experience unique and memorable!