Revisiting 2000 cuts using Binary Ninja’s new decompiler

April 17, 2020

Page content

It’s been four years since my blog post “2000 cuts with Binary Ninja.” Back then, Binary Ninja was in a private beta and the blog post response surprised its developers at Vector35. Over the past few years I’ve largely preferred to use IDA and HexRays for reversing, and then use Binary Ninja for any scripting. My main reason for sticking with HexRays? Well, it’s an interactive decompiler, and Binary Ninja didn’t have a decompiler…until today!

Today, Vector35 released their decompiler to their development branch, so I thought it’d be fun to revisit the 2000 cuts challenges using the new decompiler APIs.

Enter High Level IL

The decompiler is built on yet another IL, the High Level IL. High Level IL (HLIL) further abstracts the existing ILs: Low Level IL, Medium Level IL, and five others largely used for scripts. HLIL has aggressive dead code elimination, constant folding, switch recovery, and all the other things you’d expect from a decompiler, with one exception: Binary Ninja’s decompiler doesn’t target C. Instead, they’re focusing on readability, and I think that’s a good thing, because C is full of many idiosyncrasies and implicit operations/conversions, which makes it very difficult to understand.

Using one of our 2000 cuts challenges, let’s compare Binary Ninja’s decompiler to its IL abstractions.

First, the original disassembly:

Binary Ninja disassembly (ASM) view of the challenge binary — ASM (interactive version)

The ASM disassembly has 43 instructions, of which three are arithmetic, 27 memory operations (loads, stores), four transfers, one comparison, and eight control flow instructions. Of these instructions, we care mainly about the control flow instructions and a small slice of memory operations.

Next, Low Level IL (LLIL):

Binary Ninja Low Level IL (LLIL) view of the challenge binary — LLIL (interactive version)

LLIL mode doesn’t remove any instructions; it simply provides us a consistent interface to write scripts against, regardless of architecture being disassembled.

With Medium Level IL (MLIL), things start to get better:

Binary Ninja Medium Level IL (MLIL) view of the challenge binary — MLIL (interactive version)

In MLIL, stack variables are identified, dead stores are eliminated, constants are propagated, and calls have parameters. And we’re down to 17 instructions (39% of the original). Since MLIL understands the stack frame, all the arithmetic operations are removed and all the “push” and “pop” instructions are eliminated.

Now, in the decompiler, we’re down to only six instructions (or eight, if you count the variable declarations):

Binary Ninja High Level IL (HLIL) decompiler view of the challenge binary — HLIL (interactive version)

Also available in linear view:

Binary Ninja High Level IL (HLIL) linear view of the challenge binary — HLIL Linear View

Using the new decompiler API to solve 2000 cuts

For the original challenge, we had to extract hard-coded stack canaries out of hundreds of nearly identical executables. I was able to do this with very little difficulty using the LLIL API, but I had to trace some values around and it was a bit unergonomic.

Let’s try it again, but with the HLIL API¹:

#!/usr/bin/env python3

import sys
import binaryninja

# 4 years ago we had to specify the loader format and sleep while analysis
# finished
target = sys.argv[1]
bv = binaryninja.BinaryViewType.get_view_of_file(target, update_analysis=True)

# start is our entry point and has two calls, the second call is main
# grabbing start isn't too different than before
start = bv.get_function_at(bv.entry_point)
start_blocks = list(start.high_level_il) # start only has one block
start_calls = [x for x in start_blocks[0] if x.operation == binaryninja.HighLevelILOperation.HLIL_CALL]
call_main = start_calls[1] # second call is main

# Main has a single call to our handler function
main = bv.get_function_at(call_main.dest.constant)
main_blocks = list(main.high_level_il) # main only has one block
main_calls = [x for x in main_blocks[0] if x.operation == binaryninja.HighLevelILOperation.HLIL_CALL]
call_handler = main_calls[0] # first call is handler

# Here's where the real improvements lie
# Handler has our cookie, which is compared with memcmp in the
# last call of the first block, but our call is folded into the if condition
handler = bv.get_function_at(call_handler.dest.constant)
handler_blocks = list(handler.high_level_il)

# grab all the HLIL_IF instructions out of the first block, which there should only be one.
if_insn = [x for x in handler_blocks[0] if x.operation == binaryninja.HighLevelILOperation.HLIL_IF]

# The call to memcmp is the left side of the condition, the right side is '0':
#   if(memcmp(buf, "cookie", 4) == 0)
call_memcmp = if_insn[0].condition.left

# Now pull the cookie's data pointer out of the call to memcmp
#   arg0 is our input buffer
#   arg1 is the cookie data pointer
#   arg2 is the size of the compare
cookie_ptr = call_memcmp.params[1].constant

# Read the first 4 bytes to get the cookie value, we could also use the
# count of the memcmp here
cookie = bv.read(cookie_ptr, 4)
print(f"Cookie: {cookie}")

2000_cuts_hlil.py

Running the script produces the desired result:

Asciinema recording of the HLIL script processing the 1000 cuts challenge binaries — We quickly process 1000 cuts

Conclusion

The HLIL solution is a bit more concise than our original, and easier to write. Overall, I’m enjoying the Binary Ninja’s new decompiler and I’m pretty excited to throw it at architectures not supported by other decompilers, like MIPS, 68k, and 6502. Hopefully, this is the decade of the decompiler wars!

If you’re interested in learning more about Binary Ninja, check out our Automated Reverse Engineering with Binary Ninja training.

Footnotes

For help writing Binary Ninja plugins that interact with the ILs, check out my bnil-graph plugin, which was recently updated to support HLIL.