Close Encounters with Symbolic Execution (Part 2)

This is part two of a two-part blog post that shows how to use KLEE with mcsema to symbolically execute Linux binaries (see the first post!). This part covers how to build KLEE and mcsema, and walks through a detailed example of using them to symbolically execute an existing binary. The binary we’ll be symbolically executing is an oracle for a maze with hidden walls, as promised in Part 1.

As a visual example, we’ll show how to get from an empty maze to a solved maze:

[Images: Maze (Before), Maze (After)]

Building KLEE with LLVM 3.2 on Ubuntu 14.04

One of the hardest parts about using KLEE is building it. The official build instructions cover KLEE on LLVM 2.9 and LLVM 3.4 on amd64. To analyze mcsema-generated bitcode, we need to build KLEE for LLVM 3.2 on i386. This is an unsupported configuration for KLEE, but it still works very well.

We will be using the i386 version of Ubuntu 14.04. The 32-bit version of Ubuntu is required to build a 32-bit KLEE. Do not try adding -m32 to CFLAGS on a 64-bit version. It will take away hours of your time that you will never get back. Get the 32-bit Ubuntu. The exact instructions are described in great detail below. Be warned: building everything will take some time.
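
Before starting, it can be worth confirming that the machine really is a 32-bit install. The following is a minimal sketch and not part of the original instructions; the helper name `is_32bit_arch` is made up for illustration. Note that `uname -m` reports the kernel's architecture, so treat the result as a quick sanity check rather than proof.

```shell
# Hypothetical helper (not from the original instructions): succeeds when the
# given machine name, as printed by `uname -m`, is a 32-bit x86 variant.
is_32bit_arch() {
  case "$1" in
    i386|i486|i586|i686) return 0 ;;
    *) return 1 ;;
  esac
}

# Report what this machine looks like before spending hours on the build.
if is_32bit_arch "$(uname -m)"; then
  echo "32-bit x86 detected: OK to continue"
else
  echo "This looks like $(uname -m); install the i386 version of Ubuntu 14.04 first"
fi
```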

# These are instructions for how to build KLEE and mcsema. 
# These are a part of a blog post explaining how to use KLEE
# to symbolically execute closed source binaries.
 
# install the prerequisites
sudo apt-get install vim build-essential g++ curl python-minimal \
  git bison flex bc libcap-dev cmake libboost-dev \
  libboost-program-options-dev libboost-system-dev ncurses-dev nasm
 
# we assume everything KLEE related will live in ~/klee.
cd ~
mkdir klee
cd klee
 
# Get the LLVM and Clang source, extract both
wget http://llvm.org/releases/3.2/llvm-3.2.src.tar.gz
wget http://llvm.org/releases/3.2/clang-3.2.src.tar.gz
tar xzf llvm-3.2.src.tar.gz
tar xzf clang-3.2.src.tar.gz
 
# Move clang into the LLVM source tree:
mv clang-3.2.src llvm-3.2.src/tools/clang
 
# normally you would use cmake here, but today you HAVE to use autotools.
cd llvm-3.2.src
 
# For this example, we are going to enable only the x86 target.
# Building will take a while. Go make some coffee, take a nap, etc.
./configure --enable-optimized --enable-assertions --enable-targets=x86
make
 
# add the resulting binaries to your $PATH (needed for later building steps)
export PATH=`pwd`/Release+Asserts/bin:$PATH
 
# Make sure you are using the correct clang when you execute clang; you may
# have accidentally installed another clang that has priority in $PATH. Let's
# verify the version, for sanity. Your output should match what's below.
# 
#$ clang --version
#clang version 3.2 (tags/RELEASE_32/final)
#Target: i386-pc-linux-gnu
#Thread model: posix
 
# Once clang is built, it's time to build STP and uClibc for KLEE.
cd ~/klee
git clone https://github.com/stp/stp.git
 
# Use CMake to build STP. Compared to LLVM and clang,
# the build time of STP will feel like an instant.
cd stp
mkdir build && cd build
cmake -G 'Unix Makefiles' -DCMAKE_BUILD_TYPE=Release ..
make
 
# After STP builds, let's set ulimit for STP and KLEE:
ulimit -s unlimited
 
# Build uclibc for KLEE
cd ../..
git clone --depth 1 --branch klee_0_9_29 https://github.com/klee/klee-uclibc.git
cd klee-uclibc
./configure -l --enable-release
make
cd ..
 
# It’s time for KLEE itself. KLEE is updated fairly often and we are 
# building on an unsupported configuration. These instructions may not 
# work for future versions of KLEE. These examples were tested with 
# commit 10b800db2c0639399ca2bdc041959519c54f89e5.
git clone https://github.com/klee/klee.git
 
# Proper configuration of KLEE with LLVM 3.2 requires this long voodoo command
cd klee
./configure --with-stp=`pwd`/../stp/build \
  --with-uclibc=`pwd`/../klee-uclibc \
  --with-llvm=`pwd`/../llvm-3.2.src \
  --with-llvmcc=`pwd`/../llvm-3.2.src/Release+Asserts/bin/clang \
  --with-llvmcxx=`pwd`/../llvm-3.2.src/Release+Asserts/bin/clang++ \
  --enable-posix-runtime
make
 
# KLEE comes with a set of tests to ensure the build works. 
# Before running the tests, libstp must be in the library path.
# Change $LD_LIBRARY_PATH to ensure linking against libstp works. 
# A lot of text will scroll by with a test summary at the end.
# Note that your results may be slightly different since the KLEE 
# project may have added or modified tests. The vast majority of 
# tests should pass. A few tests fail, but we’re building KLEE on 
# an unsupported configuration so some failure is expected.
export LD_LIBRARY_PATH=`pwd`/../stp/build/lib
make check
 
#These are the expected results:
#Expected Passes : 141
#Expected Failures : 1
#Unsupported Tests : 1
#Unexpected Failures: 11
 
# KLEE also has a set of unit tests so run those too, just to be sure. 
# All of the unit tests should pass!
make unittests
 
# Now we are ready for the second part: 
# using mcsema with KLEE to symbolically execute existing binaries.
 
# First, we need to clone and build the latest version of mcsema, which
# includes support for linked ELF binaries and comes with the necessary
# samples to get started.
cd ~/klee
git clone https://github.com/trailofbits/mcsema.git
cd mcsema
git checkout v0.1.0
mkdir build && cd build
cmake -G "Unix Makefiles" -DCMAKE_BUILD_TYPE=Release ..
make
 
# Finally, make sure our environment is correct for future steps
export PATH=$PATH:~/klee/llvm-3.2.src/Release+Asserts/bin/
export PATH=$PATH:~/klee/klee/Release+Asserts/bin/
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/klee/stp/build/lib/
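
With the environment set, a quick check that the freshly built tools are actually reachable can save confusion later. This is a minimal sketch, not part of the original instructions: the helper name `missing_tools` is hypothetical, and the list of tool names is an assumption based on the builds above (in particular, that `ktest-tool` ends up alongside `klee` in the same bin directory).

```shell
# Hypothetical helper: prints the name of every listed command that cannot
# be found in $PATH, one per line. Prints nothing if all are present.
missing_tools() {
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

# Tool names assumed from the LLVM, clang, and KLEE builds above.
missing=$(missing_tools clang opt llvm-link klee ktest-tool)
if [ -n "$missing" ]; then
  echo "Not found in PATH:" $missing
else
  echo "All tools found"
fi
```

If anything is reported missing, re-check the `export PATH` lines above before moving on.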

Translating the Maze Binary

The latest version of mcsema includes the maze program from Felipe’s blog in the examples as demo_maze. In the instructions below, we’ll compile the maze oracle to a 32-bit ELF binary and then convert the binary to LLVM bitcode via mcsema.

# Note: tests/demo_maze.sh completes these steps automatically
cd ~/klee/mcsema/mc-sema/tests
# Load our environment variables
source env.sh
# Compile the demo to a 32-bit ELF executable
${CC} -ggdb -m32 -o demo_maze demo_maze.c
# Recover the CFG using mcsema's bin_descend
${BIN_DESCEND_PATH}/bin_descend -d -func-map=maze_map.txt -i=demo_maze -entry-symbol=main
# Convert the CFG into LLVM bitcode via mcsema's cfg_to_bc
${CFG_TO_BC_PATH}/cfg_to_bc -i demo_maze.cfg -driver=mcsema_main,main,raw,return,C -o demo_maze.bc
# Optimize the bitcode
${LLVM_PATH}/opt -O3 -o demo_maze_opt.bc demo_maze.bc

We will use the optimized bitcode (demo_maze_opt.bc) generated by this step as input to KLEE. Now that everything is set up, let’s get to the fun part — finding all maze solutions with KLEE.

# create a working directory next to the other KLEE examples.
cd ~/klee/klee/examples
mkdir maze
cd maze
# copy the bitcode generated by mcsema into the working directory
cp ~/klee/mcsema/mc-sema/tests/demo_maze_opt.bc ./
# copy the register context (needed to build a driver to run the bitcode)
cp ~/klee/mcsema/mc-sema/common/RegisterState.h ./

Now that we have the maze oracle binary in LLVM bitcode, we need to tell KLEE which inputs are symbolic and when a maze is solved. To do this we will create a small driver that will intercept the read() and exit() system calls, mark input to read() as symbolic, and assert on exit(1), a successful maze solution.

To make the driver, create a file named maze_driver.c with the contents of this gist and use clang to compile the maze driver into bitcode. Every function in the driver is commented to help explain how it works.

clang -I../../include/ -emit-llvm -c -o maze_driver.bc maze_driver.c

We now have two bitcode files: the translation of the maze program and a driver to start the program and mark inputs as symbolic. The two need to be combined into one bitcode file for use with KLEE. The two files can be combined using llvm-link. There will be a compatibility warning, which is safe to ignore in this case.

llvm-link demo_maze_opt.bc maze_driver.bc > maze_klee.bc
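
If one of the inputs is missing, the redirect above still creates maze_klee.bc, just empty or incomplete. As a hedged alternative, the sketch below wraps the same link step in a small function that checks its inputs first and writes the result with llvm-link's `-o` option instead of a shell redirect. The function name `link_bitcode` is hypothetical and not part of mcsema or KLEE.

```shell
# Hypothetical wrapper: link_bitcode OUTPUT INPUT...
# Refuses to run if any input file is missing, then calls llvm-link.
link_bitcode() {
  out=$1
  shift
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing input: $f" >&2
      return 1
    fi
  done
  # -o names the output file, so no redirect is needed
  llvm-link -o "$out" "$@"
}
```

It would be called as `link_bitcode maze_klee.bc demo_maze_opt.bc maze_driver.bc`, which should produce the same file as the redirect form.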

Running KLEE

Once we have the combined bitcode, let’s do some symbolic execution. Lots of output will scroll by, but we can see KLEE solving the maze and trying every state of the program. If you recall from the driver, we can recognize successful states because they will trigger an assert in KLEE. There are four solutions to the original maze, so we expect four assertion failures. Listing them shows four results, which is a good sign (note: your test numbers may be different):

klee --emit-all-errors -libc=uclibc maze_klee.bc
# Lots of things will scroll by
ls klee-last/*assert*
# For me, the output is:
# klee-last/test000178.assert.err  klee-last/test000315.assert.err
# klee-last/test000270.assert.err  klee-last/test000376.assert.err

Now let’s use a quick bash script to look at the outputs and see if they match the original results. The solutions identified by KLEE from the mcsema bitcode are:

  • sddwddddsddw
  • ssssddddwwaawwddddsddw
  • sddwddddssssddwwww
  • ssssddddwwaawwddddssssddwwww

… and they match the results from Felipe’s original blog post!
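
The script used to inspect the outputs is not reproduced in this post. One way it might look is sketched below, under two assumptions: that KLEE writes a `.ktest` file next to each `.assert.err` report with the same test number, and that `ktest-tool` prints each recorded object's contents on a line containing `data:`. The helper name `ktest_for_err` is made up for illustration.

```shell
# Hypothetical helper: maps an error report such as
# klee-last/test000178.assert.err to its test case, klee-last/test000178.ktest.
ktest_for_err() {
  printf '%s\n' "${1%.assert.err}.ktest"
}

# Print the symbolic input recorded for each failed assertion.
for err in klee-last/*.assert.err; do
  [ -e "$err" ] || continue   # no matches: the glob is left unexpanded
  ktest-tool "$(ktest_for_err "$err")" | grep 'data:'
done
```

Each printed line should contain one sequence of moves that can be compared against the list above.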

Conclusion

Symbolic execution is a powerful tool that can execute programs on all inputs at once. Using mcsema and KLEE, we can symbolically execute existing closed source binary programs. In this example, we found all solutions to a maze with hidden walls — starting from an opaque binary. KLEE and mcsema could do this while knowing nothing about mazes and without being tuned for string inputs.

This example is simple, but it shows what is possible: using mcsema we can apply the power of KLEE to closed source binaries. We could generate high code coverage tests for closed source binaries, or find security vulnerabilities in arbitrary binary applications.

Note: We’re looking for talented systems engineers to work on mcsema and related projects (contract and full-time). If you’re interested in being paid to work on or with mcsema, send us an email!

8 thoughts on “Close Encounters with Symbolic Execution (Part 2)”

  1. I think this is wrong:

    ${BIN_DESCEND_PATH}/bin_descend_wrapper.py -d -func-map=maze_map.txt -i=demo_maze -entry-symbol=main

    and should actually be replaced by:

    ${BIN_DESCEND_PATH}/bin_descend -d -func-map=maze_map.txt -i=demo_maze -entry-symbol=main

  2. You already have the source but instead of compiling directly to LLVM bitcode with clang and running that through KLEE, you’re needlessly complicating the process by inserting your own machinery. The mind boggles.

    > We could generate high code coverage tests for closed source binaries
    You clearly don’t understand the scope of the problem.

  3. I have an error when I compile maze_driver.c
    clang -I/klee/mcsema/mc-sema/common/ -emit-llvm -c -o maze_driver.bc maze_driver.c
    maze_driver.c:65:5: error: conflicting types for ‘exit’
    int exit(int status) {
    ^
    /usr/include/stdlib.h:543:13: note: previous declaration is here
    extern void exit (int __status) __THROW __attribute__ ((__noreturn__));
    ^
    1 error generated.
