ProtoFuzz: A Protobuf Fuzzer

At Trail of Bits, we occasionally perform source code audits. A recent one targeted an application that used Protocol Buffers extensively.

Google’s Protocol Buffers (protobuf) is a common method of serializing data, typically found in distributed applications. Protobufs simplify the generally error-prone task of parsing binary data by letting a developer define the type of data, and letting a protobuf compiler (protoc) generate all the serialization and deserialization code automatically.

Fuzzing a service expecting protobuf-encoded structures directly is not likely to achieve satisfactory code coverage. First, protobuf deserialization code is fairly mature and has seen scrutiny. Second, we are not typically interested in flaws in the protobuf implementation itself. Our main goal is to target the code behind protobuf decoding. Our aim becomes to create valid protobuf-encoded structures that are composed of malicious values.

This library is in sufficiently widespread use that we found it worthwhile to create a generic Protobuf message generator to help with assessments. The message generator is a Python3 library with a simple interface: provided a protobuf definition, it creates Python generators for various permutations of all defined messages. We call it ProtoFuzz.

For data itself, we use the fuzzdb database as the source of values that are generated, but it’s relatively straightforward to define your own collection of values.

Installation

When installing in Ubuntu:

pip install py3-protobuffers
sudo add-apt-repository -y ppa:5-james-t/protobuf-ppa
sudo apt-get -qq update
sudo apt-get -y install protobuf-compiler
git clone --recursive git@github.com:trailofbits/protofuzz.git
cd protofuzz/
python3 setup.py install

Usage

Message generation is handled by ProtobufGenerator instances. Each instance backs a Protobuf-produced class. This class has two functions: create fuzzing strategies and create field dependencies.

A fuzzing strategy defines how fields are permuted. So far just two are defined: linear and permutation. A linear strategy creates a stream of protobuf objects that are the equivalent of Python’s zip() across all values that can be generated. A permutation produces a stream that is a cartesian product of all the values that can be generated. A linear() permutation can be used to get a sense of the kinds of values that will be generated without creating a multitude of values.

Field dependencies force the values of some fields to be created from the values of others via any callable object. This is used for fields that probably shouldn’t be fuzzed, like lengths, CRC checksums, magic values, etc.

The entry point into the library is the `protofuzz.protofuzz` module. It defines three functions:

protofuzz.from_description_string()

Create a dict of ProtobufGenerator objects from a string Protobuf definition.

from protofuzz import protofuzz
message_fuzzers = protofuzz.from_description_string("""
    message Address {
     required int32 house = 1;
     required string street = 2;
    }
""")
for obj in message_fuzzers['Address'].permute():
    print("Generated object: {}".format(obj))
Generated object: house: -1
street: "!"

Generated object: house: 0
street: "!"

Generated object: house: 256
street: "!"

protofuzz.from_file()

Create a dict of ProtobufGenerator objects from a path to a .proto file.

from protofuzz import protofuzz
message_fuzzers = protofuzz.from_file('test.proto')
for obj in message_fuzzers['Person'].permute():
    print("Generated object: {}".format(obj))
Generated object: name: "!"
id: -1
email: "!"
phone {
  number: "!"
  type: MOBILE
}

Generated object: name: "!\'"
id: -1
email: "!"
phone {
  number: "!"
  type: MOBILE
}
...

protofuzz.from_protobuf_class()

Create a ProtobufGenerator from an already-loaded Protobuf class.

Creating Linked Fields

Some fields shouldn’t be fuzzed. For example, fields like magic values, checksums, and lengths should not be mutated. To this end, protofuzz supports resolving selected field values from other fields. To create a linked field, use ProtobufGenerator’s add_dependency method. Dependencies can also be created between nested objects. For example,

fuzzer = protofuzz.from_description_string('''
message Contents {
  required string header = 1;
  required string body = 2;
}
message Payload {
  required int32 length = 1;
  required Contents contents = 2;
}
''')

fuzzer['Payload'].add_dependency('length', 'contents.body', len)
for idx, obj in zip(range(3), fuzzer['Payload'].permute()):
  print("Generated object: {}".format(obj))
Generated object: length: 1
contents {
  header: "!"
  body: "!"
}

Generated object: length: 2
contents {
  header: "!"
  body: "!\'"
}

Generated object: length: 29
contents {
  header: "!"
  body: "!@#$%%^#$%#$@#$%$$@#$%^^**(()"
}
...

Miscellaneous

Although not related to fuzzing directly, Protofuzz also includes a simple logging class that’s implemented as a ring buffer to aid in fuzzing campaigns. See protobuf.log.

Conclusion

We created Protofuzz to assist with security assessments. It gave us the ability to quickly test message-handling code with minimal ramp up.

The library itself is implemented with minimal dependencies, making it appropriate for integration with continuous integration (CI) and testing tools.

If you have any questions, please feel free to reach out at yan@trailofbits.com or file an issue.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s