QueryCon 2019: A Turning Point for osquery

Has it really been 3 months since Trail of Bits hosted QueryCon? We’ve had such a busy and productive summer that we nearly forgot to go back and reflect on the success of this event!

On June 20-21, Trail of Bits partnered with Kolide and Carbon Back to host the 2nd annual QueryCon, at the Convene Old Slip Convention Center in downtown New York. We beat last year’s attendance with 150 attendees from around the globe. The 14 speakers presented talks on osquery ranging from technical presentations on Linux security event monitoring to discussions of end-user research. We saw familiar faces from last year’s event in San Francisco, and we met many new teams interested in osquery.

_OSS4146

Tania McCormack of Carbon Black presented her user research on introducing osquery to new audiences.

Last year’s inaugural QueryCon brought us all together in person for the first time. QueryCon 2019 strengthened our sense of community and proved a catalyst for positive change: Our productive collaboration generated community-based and technical changes that have put this project back on track.

A new foundation

On June 18th, the day before QueryCon, the Linux Foundation officially announced that they would be taking over ownership of osquery from Facebook. Under the Linux Foundation, the new osquery Foundation will be directed by a Technical Steering Committee (TSC) consisting of engineers and developers from Facebook, Trail of Bits, Google, and Kolide—companies that are using osquery and have committed to supporting the project. The TSC members are:

  • Teddy Reed (Facebook)
  • Alessandro Gario (Trail of Bits)
  • Zachary Wasserman (independent consultant)
  • Victor Vrantchan (from Google, but working independently)
  • Joseph Sokol-Margolis (Kolide)

This change was exciting news to a growing list of companies who rely on osquery for endpoint protection. As we reported in April, osquery outgrew its original home as a Facebook project, and its community’s expectations and needs now exceed what Facebook can be expected to manage on its own. A new community-based governance model was needed, and conference attendees were eager to discuss the change. We hosted a panel discussion with Facebook’s lead osquery maintainer, Teddy Reed, and representatives from the new osquery TSC.

 

 

How the foundation will work

The Linux Foundation functions as a steward for osquery, providing various funding and management platforms. (Learn more about their stewardship model here.) The new osquery TSC will guide and maintain the project with the help of contributions from the greater community, and Trail of Bits will commit to biweekly office hours for public comment and transparent project governance.

Meanwhile, Facebook will turn over credentials and control of funding, infrastructure, hosting, and engineer review to a new committee of maintainers (of which Facebook will remain a member). The organizations on the TSC are contributing significant engineering time to establish build and release processes, and a forthcoming funding platform on CommunityBridge will allow sponsorship.

Technical decisions

The TSC has a significant backlog of contributions to work through, but we’re already seeing a massive acceleration of activity on the project.

First, osquery core will be updated to feature parity with osql, the community- oriented osquery fork by Trail of Bits. The initial goal is a monthly release, with alternating “developer” and “stable” releases. Another big priority is to merge all major independent efforts and private forks into a single canonical osquery that everyone can benefit from.
Once Trail of Bits resolves the technical debt that has accrued on the project—build toolchains, dependency management, CI systems—it will maintain these components and focus on client-driven engineering requests for osquery. Other stakeholders are also contributing a backlog of Pull Requests, which will be prioritized and merged as soon as possible.

A proliferation of committed PRs

One way to track the health and activity of a project on GitHub is by Pull Requests. Over nine months, from September 2018 to the day before QueryCon, there were roughly 35 PRs merged to the osquery project, with only a few from the community outside Facebook. In just the 12 weeks since QueryCon, nearly 90 PRs were successfully merged (representing about 113 commits). More importantly, the majority of those contributions were from outside Facebook.

Trail of Bits alone is responsible for approximately 44 of the PRs merged this summer.

Some highlights from our recent contributions:

  • #5604 and #5610: The new osquery Foundation version of osquery was kicked off by merging the Facebook and Trail of Bits versions of osquery. This meant we restored CMake build support and set up the public CI, which were key improvements brought over from the osql fork.
  • #5706: We refactored the build scripts so that all of osquery’s third-party library dependencies will build from source on Linux. This absolves the need for the project to host and distribute the correct versions of pre-built binaries for these libraries (a job that previously relied on Facebook); improves compatibility across different varieties of Linux; is a prerequisite for our goals of reproducible builds and offline builds; and, finally, avoids incompatibilities arising from system-installed versions of these dependencies.
  • #5696, #5697, and #5640: We fixed and vastly improved the table for querying the certificates stores on Windows. It is now possible to use osquery to list all per-user certificates, whether they are in a registry hive or on the filesystem, and whether or not those users are even logged in at the time of the query. Searching your fleet for anomalous certificates is an important security monitoring and incident response capability, as it can be an indicator of compromise.
  • #5665: We fixed several bugs that we found with the use of dynamic analysis (fuzzing and Clang’s Address Sanitizer). Soon, we plan to incorporate both static and dynamic analysis passes into the CI system, for proactive detection of code vulnerabilities in osquery. This is a best practice that Trail of Bits recommends for all of our clients, and we’re happy to contribute it to the security of osquery.

A new stable release

DSC_6478

During a community workshop at the end of the conference, osquery users and TSC members discussed the best path to the next stable release.

Prior to QueryCon 2019, the most recent major cross-platform release was August 2018. Seven days after the conference, Trail of Bits’ Alessandro Gario provided a pre-release of the new version of osquery. For the past nine months Facebook had refactored osquery around Buck, a build system created and used by Facebook that had long been problematic for the greater community. Our pre-release restored CMake support, CI and packaging, and a few fixes not related to the build system.

Now the first full stable release of osquery is out! It’s a significant effort to improve the build system for the future of osquery, ensuring that:

  • Building osquery from source will no longer rely on Facebook to maintain and host the downloads for pre-built dependencies
  • The osquery project once again has a public-facing Continuous Integration server, automatically testing all proposed contributions to detect any developer mistakes or regressions
  • All contributors can use their preferred build tools: developers inside Facebook can use their build tool, Buck, and developers in the greater community can use the standard build tool, CMake
  • An all-new custom build toolchain for Linux will enable broader Linux support, and eventually reproducible builds

New features for osquery users:

  • The process_events table detects more kinds of events, on Linux
  • More powerful query syntax: osquery now supports a regex_match function to allow searches over a particular column of a given table
  • Initial support for eBPF-based event tracing, on Linux
  • New macOS tables for detecting T1/T2 chips, querying the list of running apps, and listing the installed Atom packages on macOS or Linux
  • The certificates, logged_in_users, and logical_drives tables on Windows are all greatly improved
  • Initial implementation of a new high-performance eventing framework that will enable more types of event-based monitoringImproved ability to profile and benchmark tables’ performance
  • New detections added to the macOS query pack

But wait, there’s more! Dozens of bugs have been squashed, additional security hardening mitigations have been turned on, certain performance cases have been improved and resource leaks plugged, the documentation has been updated…we could go on and on. For a full list of changes in this release, refer to the comprehensive change notes.

QueryCon and beyond

DSC_5954

The hosts of the QueryCon 2019 posed for a team group shot!

We had so much fun hosting QueryCon this year and we want to thank everyone who attended. This event was a catalyst for positive change in our community thanks to the thoughts, discussions, and passion of this year’s attendees. We can’t wait to see how osquery improves now that its development has been unlocked.

What’s next for osquery? We want you to tell us! If you’re using osquery in your organization, let’s talk about what features and fixes should be next. Thanks to a revolutionary meeting of the minds, we now have the power to make it happen.

Crypto 2019 Takeaways

This year’s IACR Crypto conference was an excellent blend of far-out theory and down-to-earth pragmatism. A major theme throughout the conference was the huge importance of getting basic cryptographic primitives right. Systems ranging from TLS servers and bitcoin wallets to state-of-the-art secure multiparty computation protocols were broken when one small sub-component was either chosen poorly or misused. People need to stop using RSA, drop AES-CBC, and make sure they’re generating randomness in a cryptographically secure way.

It wasn’t all attacks and bad news, though. The ascendance of cryptographic tools for privacy-preserving computation continues apace. Zero-knowledge proofs, secure multiparty computation, and secure messaging systems all had major breakthroughs this year. These areas of cryptography have become efficient enough to transcend purely theoretical interest: Companies large and small, from niche blockchain startups to tech giants like Facebook, are deploying cutting-edge cryptographic protocols. Even the city of Boston has used secure multiparty computation in a study on pay equity.

All of the papers presented at Crypto were remarkable and groundbreaking. From this impressive pool, we’ve highlighted the papers we believe will have a substantial impact outside of academia in the near future.

Attacks

Breaking OCB2

The Best Paper award went to the Cryptanalysis of OCB2: Attacks on Authenticity and Confidentiality. This amalgamation of three papers released last fall demonstrates attacks against OCB2, an ISO-standard authenticated encryption scheme, for forging signatures on arbitrary messages and full plaintext recovery. This result is especially shocking because all three versions of OCB have been standardized, and were thought to have airtight security proofs. Fortunately, OCB1 and OCB3 are not vulnerable to this attack because it relies on details specific to how OCB2 applies the XEX* mode of operation.

ECDSA Nonce Bias and Reuse

A well-known weakness in the ECDSA signature scheme is that nonces must be generated uniformly at random; otherwise, an attacker can recover the signer’s private key. In Biased Nonce Sense, Breitner and Heninger demonstrate the real-world practicality of these attacks by scraping the Bitcoin and Ethereum blockchains in search of either duplicate or slightly biased nonces. These attacks allowed them to recover the private keys for over 300 Bitcoin accounts and several dozen Ethereum accounts. In most instances, these accounts had a nonzero balance, which indicates that active users of these blockchains are using libraries with bad nonce generation. This result confirms the need to move towards deterministic nonce generation in ECDSA, as specified in RFC6979—or, even better, to stop using ECDSA entirely and use Ed25519 instead.

Automating TLS Padding Oracle Attack Discovery and Implementation

Developing tools for vulnerability detection proved to be a popular theme this year. One group of researchers developed a tool that automatically scans websites for CBC padding oracle vulnerabilities in their TLS protocol. They found that roughly 1.83% of the Alexa Top Million websites were vulnerable. Matthew Green’s team at Johns Hopkins used SAT and SMT solvers to automate the development of new padding oracle attacks (they actually consider a broader class of attacks called format oracles), which eliminates the laborious task of discovering and writing such attacks by hand. Their tool was able to rediscover common examples of such attacks, including Bleichenbacher’s attack on PKCS#1 v1.5.

Secure Computation

Efficient Zero-Knowledge Proofs

This year was huge for pushing zero-knowledge proofs further into the realm of practicality. Among the most impressive results was the development of the Libra scheme (no relation to the Facebook cryptocurrency). Libra is notable for several reasons. First, it has a pre-processing phase that is only linear in the witness size, not linear in the statement size like SNARKs. Second, it has a prover that runs in linear time with respect to the computation being run in zero-knowledge.

Comparison of zero-knowledge protocols

Only Bulletproofs achieve the same asymptotic efficiency, although they run much slower in practice because they require the prover to perform many expensive cryptographic operations. On the other hand, Bulletproofs have no trusted setup phase and make somewhat more standard cryptographic assumptions. The table above, taken from the Libra paper, shows a comprehensive overview of the most performant zero-knowledge proofs.

Breaking Secure Multiparty Computation

Over in the secure multiparty computation world, Jonathan Katz delivered a keynote talk about a devastating class of vulnerabilities that affects nearly all MPC implementations. The fundamental issue is that these protocols are extremely complex and often leave low-level details up to the implementer. Since MPC protocols are extremely resource intensive, practical implementations often apply optimizations in a haphazard way.

Here’s some background: MPC protocols require the computation of many hash functions, and even relatively simple functions like RSA encryption require the computation of tens of millions of hashes in this context. While we often think of SHA3 as rather fast, in extreme settings like this it’s actually quite slow and is one of the main bottlenecks of the protocol. This led to researchers using fixed-key AES instead, since it’s roughly 50 times faster to compute than SHA3. Originally, this optimization was placed on firm cryptographic ground in the JustGarble system. However, the security proof does not extend to many modern systems. In fact, Katz et al showed that using fixed-key AES in most widespread protocols completely undermines the privacy guarantees of MPC. However, they also showed that a simple modification to fixed-key AES was secure and equally performant.

This attack highlights the recklessness of rushing to deploy cutting-edge cryptography. These protocols are often extremely slow and complex, and few people understand the subtle details of the security proof. More work must be done to quantify the concrete security of these protocols as they are actually instantiated, not just asymptotically using idealized functionalities.

Content Moderation and Signatures

Metadata-Private Message Franking

A fundamental problem in end-to-end messaging systems is how content moderation should be handled. Facebook has been particularly concerned with this issue due to the widespread use of WhatsApp and Messenger, so they developed something called a message franking system. This system allows users to report abusive content without allowing Facebook to see the content of all users’ messages. Message franking provides the following security guarantees:

  • Message privacy: Platform should only learn messages that are reported
  • Accountability: Moderator should always be able to verify that the alleged sender actually sent the reported message
  • Deniability: Only the moderator should be able to verify the reported message

Unfortunately, prior work on message franking is not metadata private—the moderator is able to see the sender and recipient of every message, even those that aren’t reported. This has been remedied in a new scheme that extends the functionality of message franking to private end-to-end encryption protocols using zero-knowledge proofs. Unlike the zero-knowledge proofs discussed above, the ones used in this franking scheme are extremely efficient and produce signatures that are only around 500 bytes.

Ring Signatures With Deniability

Another popular signature-type primitive that has been especially useful in blockchain systems is the ring signature. Ring signatures allow one member of a ring (i.e., a designated group of people) to anonymously sign messages on behalf of the entire group. For example, a ring could be a workers’ union, and a ring signature would allow someone from the union to anonymously make a complaint while verifying that they’re actually part of the union. It would be beneficial in some ring signature use cases if individuals could claim responsibility for signing the message. Conversely, it may also be useful for members of a ring to prove that they did not sign a given message. Park and Sealfon formalized these security properties and developed new ring signature constructions that satisfy them.

Post-Quantum Cryptography

As the NIST post-quantum cryptography standardization effort enters its second phase, it’s essential to understand the concrete security of proposed cryptosystems. Quantum cryptanalysis in the RAM model: Claw-finding attacks on SIKE, one of the two papers chosen for the Best Young Researcher award, develops a new model for thinking about post-quantum security and applies that model to the SIKE cryptosystem. SIKE is a key encapsulation mechanism that uses supersingular isogenies to achieve post-quantum security.

One big takeaway of the paper is that SIKE is more secure than previously thought. But the paper’s formulation of a new way to quantify post-quantum security may have a more enduring impact. This method involves thinking about quantum computers as a classical RAM machine controlling arrays of bits and qubits, then using this model to reason about time/memory tradeoffs in different attack strategies. As a result, the researchers determined that attacks previously thought to be especially potent against SIKE would require so much classical RAM that it would be more efficient to simply run the best known classical attack.

Looking Forward

Navigating the complex landscape of cryptographic protocols and parameter choices continues to be a key point of difficulty for developers. We need to agree as a community that developers should not be responsible for security-critical configuration choices, and move towards building libraries that are misuse resistant, such as Libsodium and Tink. The use of these libraries alone would have prevented many of the attacks on deployed systems we saw this year.

However, we realize that not all systems can realistically support a complete cryptography overhaul, and some are often stuck using a specific library or primitive for legacy reasons. While we saw lots of activity this year around automating vulnerability detection and exploit writing, we’d like to see the community emphasize bug-fixing tools as well. For example, we recently released a tool called Fennec that can rewrite functions in compiled binaries, and we used this tool to detect and repair the use of a static IV in AES-CBC without access to source.

On the theory side, we’d like to see a more stringent examination of the concrete security of cutting-edge protocols like zero-knowledge proofs and MPC. As we saw at Crypto this year, implementation details matter, and many of these complex systems are implemented in ways that either render them insecure or dramatically reduce their security level. This is made worse by the fact that many of these new protocols aren’t based on standard security assumptions like factoring or the discrete log problem. For example, many blockchain companies are rushing to roll out verifiable delay functions, which rely on a very new and poorly understood property of imaginary quadratic number fields. We need more thorough analyses of assumptions like this and how they impact security.

Finally, NIST will select the third round of candidates for their post-quantum cryptography standardization program in 2020. To succeed, NIST will need to rigorously assess the security of the round two candidates. We need more work like the paper on SIKE, which helps us compare classical and quantum security in a more precise way.

DeepState Now Supports Ensemble Fuzzing

by Alan Cao, Francis Lewis High School, Queens, NY

We are proud to announce the integration of ensemble fuzzing into DeepState, our unit-testing framework powered by fuzzing and symbolic execution. Ensemble fuzzing allows testers to execute multiple fuzzers with varying heuristics in a single campaign, while maintaining an architecture for synchronizing generated input seeds across fuzzer queues.

This new ensemble fuzzer includes a new deepstate-ensembler tool and several notable fuzzer engines powered by a new and open-sourced DeepState front-end API/SDK. This SDK enables developers to integrate front-end executors for DeepState and provides seed synchronization support for our ensembler tool.

The Fallacy of Smart Fuzzers

Fuzzers are one of the most effective tools in a security researcher’s toolbox. In recent years, they have been widely studied to build better heuristics, or strategies that fuzzers use to explore programs. However, one thing remains clear: fuzzer heuristics rarely live up to their hype. When scaling up so-called smart fuzzers for real-world programs, their performance often falters, and we end up defaulting back to our “dumb” standard tooling, like AFL and LibFuzzer.

Since we have explored the topic of evaluating fuzzing research in the past, let’s take a tangent and instead explore the possibility of combining various fuzzer heuristics to maximize fuzzer performance, without giving up time in our testing workflow. This led to my summer internship project of integrating ensemble fuzzing into DeepState.

What is Ensemble Fuzzing?

The insight from ensemble fuzzing is that, while certain heuristics work well in certain contexts, combining them should produce greater results than just a single fuzzer with a single strategy. This idea was first introduced in Chen et al.’s EnFuzz paper, however, there are no available open-source or commercial implementations.

Our implementation of an ensemble fuzzer follows the architecture implemented in EnFuzz, as seen below:

alan_1

Given a set of pre-determined diverse base fuzzers with respective local seed queues, we can integrate a global-asynchronous and local-synchronous (GALS) seed synchronization mechanism that pulls interesting seeds from local queues to a shared global queue during an execution cycle. Therefore, as a base fuzzer’s heuristics fail to improve coverage or discover interesting input seeds, it can now pull other fuzzers’ seeds from this global queue in the next execution cycle. Furthermore, once the campaign is terminated, we can receive any fuzzing feedback from the ensembler regarding base fuzzer performance, crash triaging/deduplication, or any other post-processing statistics.

Using ensemble fuzzing and our already powerful unit-testing framework for fuzzing/symbolic execution, DeepState, we are able to approach the following problems during testing:

  • Fuzzer performance diversity – do different fuzzer heuristics contribute varying useful seeds, maximizing the potential for improving coverage and crashes discovered?
  • Fuzzer workflow – how can we do exhaustive fuzz testing and/or symbolic execution while simplifying our workflow?
  • Invariant consistency – do different fuzzers return different results, indicating that there might be a source of nondeterminism in our test?

Spinning up Front-ends

Since DeepState already supports Eclipser as a backend, we chose to first build a front-end API, where a developer can write a front-end wrapper for a fuzzer backend. This orchestrates the running fuzzer process, and performs compile-time instrumentation, pre- and post-processing, and seed synchronization. It also simplifies the fuzzing environment setup by unifying how we can construct tools while implementing functionality.

The snippet below shows an example of a front-end wrapper for AFL. It inherits from a base DeepStateFrontend class and includes methods that define fuzzer-related functionality.

alan_2

Example front-end wrapper for AFL. One inherited method, pre_exec, allows the user to perform sanity checks before execution. Both environment properties (i.e. core dump pattern) and argument parsing are checked.

In order to build a front-end wrapper, we should have the following methods in our fuzzer object:

alan_6Each fuzzer has its own ensemble method, which provides a specialized rsync call to push and pull seeds from a global and local queue directory:

alan_3

Ensemble method for seed synchronization. Each __sync_seeds() call invokes a specialized rsync command to transfer seeds between a local and global queue directory.

Once built, we can use a front-end wrapper as so:

# compile a DeepState harness with AFL instrumentation
$ deepstate-afl --compile_test MyDeepStateHarness.cpp --compiler_args=”-Isomelib/include -lsomelib -lotherlib”

# execute the AFL fuzzer through DeepState
$ deepstate-afl -i seeds -o out ./out.afl

For a more detailed look into the fuzzer front-end API and how you can implement your own frontends, see this tutorial. DeepState has existing front-end executors for AFL, libFuzzer, Angora, Eclipser, and Honggfuzz.

https://asciinema.org/a/262023

Building the Ensembler

Using the unified API, we can now build an ensemble fuzzer that provisions front-end objects and executes fuzzers concurrently while maintaining seed synchronization.

To start, take a DeepState test harness input, and “ensemble compile” multiple instrumented binaries to a workspace directory with our fuzzers through the exposed compile() call from each frontend object.

alan_4

Provisioning a test workspace with binaries with the “ensemble compilation” strategy.

Once complete, each parallel fuzzer process is instantiated through run(). Since each front-end wrapper invokes rsync-style synchronization through ensemble(), the ensembler simply calls it from each front-end after a specified sync cycle (in seconds) to synchronize seeds.

This implementation is surprisingly simple, and was built with around 300 lines of code in Python. Here’s a quick demo running the ensembler on one of our test examples, Crash.cpp.

Fuzz ‘Em All!

Inspired by Google’s fuzzer-test-suite and the aforementioned work in fuzzer-performance testing, we decided that a DeepState-powered test suite, deepstate-test-suite, could help fuzzer performance A/B tests and actual bug-hunting. With our easy-to-use fuzzers and ensembler, let’s evaluate how they stand against real-world test-case benchmarks!

Bignum vulnerabilities are an especially interesting class of bugs because edge cases are much more difficult, and even probabilistic, to discover. This makes them ideal targets for DeepState property tests.

We benchmarked our base and ensembler fuzzers for their performance reproducing an existing test case and a real-world bignum vulnerability—a carry mis-propagation bug in TweetNaCl. Following the evaluation methods in this fuzzing evaluation paper, the average time was measured for each test case using 10 crash instances from well-formed initial inputs:

alan_5

These results provide some interesting insights about fuzzer diversity. Running smarter fuzzers like Angora and Eclipser on smaller contained test cases, like the Runlen example, work well. However, their performance falters when scaled up to the context of actual bug discovery in real-world software, like the TweetNaCl bug. The ensemble fuzzer’s performance shows it can scale up well for both of these test cases.

What’s in the future for Ensemble Fuzzing?

Ensemble fuzzing is a powerful technique that scales to real-world software libraries and programs. Integrating an ensemble fuzzer into DeepState gives power to unit-testing with a simplified workflow, and it opens up the possibility for many other research and engineering efforts.

Based on our current benchmark results, we can’t definitely say that ensemble fuzzing is the best fuzzing strategy, but it’s worth noting that there are always elements of randomness and probabilistic behavior when evaluating fuzzers. Effective ensemble fuzzing may be dependent on base fuzzer selection—determining which fuzzers to invoke and when based on the type of target or bug class being analyzed.

Maybe our current ensemble of fuzzers works effectively on reproducing bignum vulnerabilities, but would they work just as well on other classes of bugs? Would it be even more effective if we invoke fuzzers in a specific order? These are the questions we can answer more accurately with more benchmark tests on diverse targets.

Thanks!

Being a security engineering intern at Trail of Bits has been a wonderful experience, as always. Working with awesome employees and interns has really propelled my understandings in security research and how we can turn insightful academic research into working software implementations, just like my previous work with analyzing cryptographic primitives with Manticore. I’m especially excited to continue to do this at NYU, where I’ll be starting in the spring!

Rewriting Functions in Compiled Binaries

by Aditi Gupta, Carnegie Mellon University

As a summer intern at Trail of Bits, I’ve been working on building Fennec, a tool to automatically replace function calls in compiled binaries that’s built on top of McSema, a binary lifter developed by Trail of Bits.

The Problem

Let’s say you have a compiled binary, but you don’t have access to the original source code. Now, imagine you find something wrong with your program, or something you’d like to change. You could try to fix it directly in the binary—for example, by patching the file in a hex editor—but that becomes tedious very quickly. Instead, being able to write a C function and swap it in would massively speed up the process.

I spent my summer developing a tool that allows you to do so easily. Knowing the name of the function you want to replace, you can write another C function that you want to use instead, compile it, and feed it into Fennec, which will automatically create a new and improved binary.

A Cryptographic Example

To demonstrate what Fennec can do, let’s look at a fairly common cryptographic vulnerability that has shown up in the real world: the use of a static initialization vector in the CBC mode of AES encryption. In the very first step of CBC encryption, the plaintext block is XOR’ed with an initialization vector (IV). An IV is a block of 128 bits (the same as the block size) that is used a single time in any encryption to prevent repetition in ciphertexts. Once encrypted, this ciphertext plays the role of the IV for the next block of plaintext and is XOR’ed with this plaintext block.

aditi_1

This process can become insecure when an initialization vector is constant throughout plaintexts. Under a fixed IV, if every message begins with the same block of plaintext, they will all correspond to the same ciphertext. In other words, a static IV can allow an attacker to analyze multiple ciphertexts as a group rather than as individual messages. Below is an example of an IV being generated statically.

unsigned char *generate_iv() {
  return (unsigned char *)"0123456789012345";
}

Sometimes, developers will use cryptography libraries like OpenSSL to do the actual encryption, but write their own functions to generate IVs. This can be dangerous, since non-random IVs can make AES insecure. I built Fennec to help fix issues like this one—it checks whether IV generation was random or static and replaces the function with a new, secure IV if necessary.

The Process

The end goal was to lift executable binaries to LLVM bitcode with McSema and combine it with some LLVM manipulation to replace any function automatically. I started by understanding my cryptographic example and exploring different ways of patching binaries as a bit of background before getting started.

My first step was to work through a couple of the Matasano Cryptopals Challenges to learn about AES and how it can be used and broken. This stage of the project gave me working encryption and decryption programs in C, both of which called OpenSSL, as well as a few Python scripts to attack my implementations. My encryption program used a static IV generation function, which I was hoping to replace automatically later.

I kept using these C binaries throughout the summer. Then, I started looking at binary patching. I spent some time looking into both LD_PRELOAD and the Witchcraft Compiler Collection, which would work if my IV generation function was dynamically linked into the program. The goal of my project, however, was to replace function calls within binaries, not just dynamically loaded functions.

I didn’t want to complicate everything with lifted bitcode yet, so I started by using clean bitcode that generated directly from source code. I wanted to run an LLVM pass on this bitcode to change the functionality of part of my program—namely, the part that generated an IV.

I started by trying to change the function’s bitcode directly in my pass, but soon moved to writing a new function in C and making my original program call that function instead. Every call to the old function would be replaced with a call to my new function.

After some experiments, I created an LLVM pass that would replace all calls to my old function with calls to a new one. Before moving to lifted bitcode, I added code to make sure I would still be able to call the original function if I wanted to. In my cryptographic example, this meant being able to check whether the original function was generating a static IV and, if so, replace it with the code below, as opposed to assuming it was insecure and replacing it no matter what.

// a stub function that represents the function in original binary
unsigned char *generate_iv_original() {
  unsigned char *result = (unsigned char *)"";
  // the contents of this function do not matter
  return result;
}

unsigned char *random_iv() {
  unsigned char *iv = malloc(sizeof(int) * 16);
  RAND_bytes(iv, 16);  // an OpenSSL call
  return iv;
}

unsigned char *replacement() {
  unsigned char *original = generate_iv_original();
  for (int i = 0; i < 10; i++) {
    unsigned char *iv = generate_iv_original();
    if (iv == original) { . // if the IV is static
      return random_iv();
    }
  }
  return original;
}

With my tool working on clean bitcode, it was time to start looking at lifted bitcode. I familiarized myself with how McSema worked by lifting and recompiling binaries and looking through the intermediate representation. Because McSema changes the way functions are called, it took some extra effort to make my tool work on lifted bitcode in the same way that it had on clean bitcode. I had to lift both the original binary and the replacement with McSema. Additional effort was required because the replacement function in a non-lifted binary doesn’t follow McSema’s calling conventions, so it couldn’t be swapped in trivially.

aditi_2

Function names and types are more complex through McSema, but I eventually made a working procedure. Like the tool for clean bitcode, the original function could be kept for use in the replacement.

The last step was to generalize my process and wrap everything into a command line tool that others could use. So I tested it on a variety of targets (including stripped binaries and dynamically-loaded functions), added tests, and tested my installation process.

The Function Replacement Pass

The complete process consists of three primary steps: 1) lifting the binaries to bitcode with McSema, 2) using an LLVM pass to carry out the function replacement within the bitcode, and 3) recompiling a new binary. The LLVM pass is the core of this tool, as it actually replaces the functions. The pass works by iterating through each instruction in the program and checking whether it is a call to the function we want to replace. In the following code, each instruction is checked for calls to the function to replace.

for (auto &B : F) {
  for (auto &I : B) {
    // check if instruction is call to function to be replaced
    if (auto *op = dyn_cast(&I)) {
      auto function = op->getCalledFunction();
      if (function != NULL) {
        auto name = function->getName();
        if (name == OriginalFunction) {
...

Then, we find the replacement function by looking for a new function with the specified name and same type as the original.

Type *retType = function->getReturnType();
FunctionType *newFunctionType =
  FunctionType::get(retType, function->getFunctionType()->params(), false);

// create new function
newFunction = (Function *)(F.getParent()->getOrInsertFunction(ReplacementFunction, newFunctionType));

The next step is to pass the original function’s arguments to the new call.

CallSite CS(&I);

// get args to original function to be passed to replacement
std::vector arguments;

for (unsigned int i = 0; i uses()) {
User* user = U.getUser();
user->setOperand(U.getOperandNo(), newCall);
}

The Complete Tool

Although the LLVM pass does the work of replacing a given function, it is wrapped with the other steps in a bash script that implements the full process. First, we disassemble and lift both input binaries using McSema.

aditi_3

Lifts binaries with McSema

Next, we analyze and tweak the bitcode to find the names of the functions as McSema represents them. This section of code includes support for both dynamically-loaded functions and stripped binaries, which affect the names of functions. We need to know these names so that we can pass them as arguments to the LLVM pass when we actually do the replacement. If we were to look for the names from the original binary, the LLVM pass wouldn’t be able to find any matching functions, since we’re using lifted bitcode.

aditi_4

Finds the names of functions to be replaced

Finally, we run the pass. If we don’t need access to the original function, we only need to run the pass on the original binary. If, however, we want to call the original function from the replacement, we run the pass on both the original binary and the replacement binary. In this second case, we are replacing the original function with the replacement function, and the stub function with the original function. Lastly, we recompile everything to a new working binary.

aditi_5

Runs the pass and compiles a new binary from updated bitcode

Results

Fennec uses binary lifting and recompilation to make a difficult problem relatively manageable. It’s especially useful for fixing security bugs in legacy software, where you might not have access to source code.

Using this tool, it becomes possible to automatically fix a cryptographic IV vulnerability. As seen below, the original binary encrypts a message identically each time using a static IV. After running Fennec, however, the newly created binary uses a different IV, thereby producing a unique ciphertext each time it is run, even on the same plaintext (blue).

# Original binary
aditi@nessie:~/ToB-Summer19$ ./encrypt ""
MDEyMzQ1Njc4OTAxMjM0NQ==/reJh+5rktBatDpyuJNQEBo++0pyIRGZiNsmZkN09HTPIOBVqQ9ov6CrxPXO7dC4cUJGYzBEsejHuTQyjVQh+XsLCHyDkURmfCuJ+a97raPY+o8pKKt8yf/xTmYMtyq2zf7EQxqPxv2bXKdP+6K+h9KyuO3q4+3JbuJFTesNLy8Np1m9ShJ9UAHvAdO6LCZvQ
N91kz0ytIH+s7LgajIWyises+yz26UBQwOzZLeLcQp4=
176

aditi@nessie:~/ToB-Summer19$ ./encrypt ""
MDEyMzQ1Njc4OTAxMjM0NQ==/reJh+5rktBatDpyuJNQEBo++0pyIRGZiNsmZkN09HTPIOBVqQ9ov6CrxPXO7dC4cUJGYzBEsejHuTQyjVQh+XsLCHyDkURmfCuJ+a97raPY+o8pKKt8yf/xTmYMtyq2zf7EQxqPxv2bXKdP+6K+h9KyuO3q4+3JbuJFTesNLy8Np1m9ShJ9UAHvAdO6LCZvQ
N91kz0ytIH+s7LgajIWyises+yz26UBQwOzZLeLcQp4=
176

aditi@nessie:~/ToB-Summer19$ ./encrypt ""
MDEyMzQ1Njc4OTAxMjM0NQ==/reJh+5rktBatDpyuJNQEBo++0pyIRGZiNsmZkN09HTPIOBVqQ9ov6CrxPXO7dC4cUJGYzBEsejHuTQyjVQh+XsLCHyDkURmfCuJ+a97raPY+o8pKKt8yf/xTmYMtyq2zf7EQxqPxv2bXKdP+6K+h9KyuO3q4+3JbuJFTesNLy8Np1m9ShJ9UAHvAdO6LCZvQ
N91kz0ytIH+s7LgajIWyises+yz26UBQwOzZLeLcQp4=
176

aditi@nessie:~/ToB-Summer19$ bash run.sh 2 ../mcsema-2.0.0-ve/remill-2.0.0/remill-build-2/ /home/aditi/ToB-Summer19/ida-6.9/idal64 encrypt replaceIV generate_iv replacement generate_iv_original -lcrypto

# Fennec's modified binary
aditi@nessie:~/ToB-Summer19$ ./encrypt.new ""
L+PYRFiOKMcu18hSqdGQEw==/aK2hYm/GXHwA2tqZxPmoNccQwW+Zhj7E0PQUSRF+lOLJiEMwOc7yv+/Z2AA0pEJjP7Jq4lHMpq2eIVl73lvav0pJiVlOcmfnFwQ9cu0MW0EWqUdgl2FCsWKtO/TAfGhcQPopJyvP8KD/LHlru4QIfZiym7//tt0V9vvabFCLNiSTRG350XKO/zoydeuRFfSu
0HmNNQbAcLSQkcUETH424RyQ4SxmcreW3krOw30kfJY=
176

aditi@nessie:~/ToB-Summer19$ ./encrypt.new ""
hYnowxN2Z3QyPIzwNaFzJw==/pzCq+V1q5ipHoqJXZ9MaeDr+nMdV5E1RbeI+YrcQqXjFHcVmDSq4yZboEuIJJjkbNbdO5DG6n3CQnZ1C7CumGdaZsddaYJueORROk7X+PnQZUq5bKqvdN7ZJEhK7qaerjogOF4TAotDV3ryLC6l/EWY01DkhGrf0hlXAkjQnOz28lXF40GNMd6pIjcoIbZze
V72v5s5q67fVdKdCzVE3BH76qX8qYS9YnN5JkGLERYA=
176

aditi@nessie:~/ToB-Summer19$ ./encrypt.new ""
r3/wMu5nD3rEFn7N88fCjQ==/MisK9RcK8RLsqjV2nrAfprghBYrBmeJS3FbJ4YG6zHBk+uA0CcZ+R4CSDolAaAPlCmkupfxy6bFHNEqyMVv7moPaiJEAkHDDU/FKen8eAJjMvz9+RK+xmQja238jk7xmaS6JbJOdh8teQ2XiMzlHsBYBVpw89UBFrTqOSN8qtlgU3aR4xUVlwZAA1+Pg2GHy
2CIWQI6ioHGDhN3P3po7MaOldJAgHGZO5d2GluroI70=
176

You can download Fennec and find instructions for its use here.

If you have questions or comments about the tool, you can find Aditi on Twitter at @aditi_gupta0!

Binary symbolic execution with KLEE-Native

by Sai Vegasena, New York University, and Peter Goodman, Senior Security Engineer

KLEE is a symbolic execution tool that intelligently produces high-coverage test cases by emulating LLVM bitcode in a custom runtime environment. Yet, unlike simpler fuzzers, it’s not a go-to tool for automated bug discovery. Despite constant improvements by the academic community, KLEE remains difficult for bug hunters to adopt. We’re working to bridge this gap!

My internship project focused on KLEE-Native, a fork of KLEE that operates on binary program snapshots by lifting machine code to LLVM bitcode.

What doesn’t kill you makes you stronger

KLEE’s greatest strength is also its biggest weakness: It operates on LLVM bitcode. The most apparent strength of operating on bitcode is that KLEE can run on anything that the Clang compiler toolchain can compile: C, C++, Swift, Rust, etc. However, there is a more subtle benefit to KLEE’s approach that is often overlooked. Operating on bitcode means the “supporting code” or “runtime” can be implemented in C or C++, compiled to bitcode, linked against the system under test (SUT), and then subjected to the same symbolic path exploration as the SUT itself.

This provides flexibility and power. For example, a KLEE runtime can implement I/O-related system calls (read, write​, etc.) as plain old C functions. These functions are subject to symbolic exploration, just like the SUT, and contribute to code coverage. This allows KLEE to “see into” the OS kernel, and explore paths that might lead to tricky corner cases.

Now for the downside of operating in bitcode. Typically, KLEE is used on programs where source code is available, but getting bitcode from source code is not easy because of the difficulties created by build systems, configurations, and dependencies. Even when bitcode is available, a vulnerability researcher may have to manually inject KLEE API calls into the source code, link in the KLEE runtime, and possibly stub out or manually model external library dependencies. These tasks become daunting when dealing with large code bases and complicated build systems. McSema is a black-box option when source code is not available, but the limitations of binary control-flow graph recovery and occasional inaccuracies may not produce acceptable results.

KLEE-Native runs on binary program snapshots

First, we focused on getting bitcode for any program, and the approach we took was to operate on bitcode lifted from machine code, as opposed to compiled source. Using a dynamic approach based on snapshots like with GRR, we can start KLEE-Native deep into a program’s execution, which isn’t possible with mainline KLEE.

Snapshotting

By default, the KLEE-Native snapshotter captures the program’s memory and register state right before the first instruction is executed. This means KLEE-Native needs to emulate pre-main code (e.g., loading shared libraries), which isn’t ideal. To avoid emulating that type of deterministic setup, we implemented a feature that works by injecting an INT3 breakpoint instruction at a user-specified virtual address, via ptrace.

In this mode, the target process executes natively until the breakpoint instruction is hit. Once it’s hit, the snapshotter reclaims control of the target and subsequently dumps the target process memory and a register state structure compatible with Remill into a “workspace” directory. Memory mappings of the original process can then be recreated from this workspace.

$ klee-snapshot-7.0 --workspace_dir ws --breakpoint 0x555555555555 --arch amd64 -- ./a.out

For address-space layout randomization (ASLR) binaries, the --dynamic flag instructs the snapshotter to interpret the breakpoint address as an offset within the main program binary. To do this, we use a neat trick of parsing the /proc/[pid]/maps file for the target process to discover the base virtual address of the loaded program. Do some arithmetic and voila, we have our breakpoint location!

$ klee-snapshot-7.0 --workspace_dir ws --dynamic --breakpoint 0x1337 --arch amd64_avx -- ./a.out

Side note: One interesting side effect involves CPU feature testing. Libc checks for available CPU features using the CPUID instruction and uses it to determine whether to use specialized versions of some functions (e.g., a hand-coded, SSE4-optimized memset). If a snapshot is taken after CPUID is executed natively, then you must specify an AVX option for the architecture of the snapshot. Otherwise those kinds of instructions might not lift.

Dynamically lifting machine code to bitcode

Now that we have our snapshot, we can ask KLEE-Native to dynamically lift and execute the program. We can do this with the following command:

$ klee-exec-7.0 --workspace_dir ws

The following diagram shows the control flow of klee-exec.

Screenshot from 2019-08-09 15-11-26

At a high level, KLEE-Native just-in-time decodes and lifts traces, which are simply LLVM function objects that contain a logical segment of the lifted machine code. Traces contain LLVM instructions emulating machine code and calls to Remill intrinsics and other lifted traces.

Intrinsic calls may be handled by KLEE’s “special function handler” capability, which allows for runtime bitcode to have direct access to KLEE’s executor and state. For example, Remill’s memory read-and-write intrinsics use the special function handler to interact with the snapshotted address space.

When emulating a function like malloc, traces are created from libc’s implementation, and execution is able to continue smoothly. All is good in the world. We see the light and everything makes sense…

… Just kidding!

brk and mmap come along, and now we have to execute a system call. What do we do?

The runtime is the kernel and the machine

KLEE-Native lifts all machine code down to the raw system call instructions. Using Remill, system call instructions are handled by a call to the intrinsic function __remill_async_hyper_call in the runtime bitcode. Remill doesn’t specify the semantics of this function, but the intent is that it should implement any hardware- or OS-specific functionality needed to carry out a low-level action.

In our case, KLEE-Native implements the __remill_async_hyper_call function, so it passes execution from lifted bitcode back into the runtime bitcode, where each Linux system call wrapper is implemented.

System call wrappers are parameterized by an Application Binary Interface (ABI) type, which finds the system call number and arguments and stores the return value. Here is an example the SysOpen function, which wraps the POSIX open system call implemented by KLEE’s runtime.

trent_edit

Snippet of emulated open system call

This wrapper performs some of the error checking that the actual OS kernel would do. (i.e., making sure the file path name can be read from the snapshotted address space, checking the path length, etc.) Here is where we play to KLEE’s strengths: All of these error-checking paths are subject to symbolic exploration if any of the data being tested is symbolic.

We have now dynamic-lifted machine code to bitcode, and we’ve emulated system calls. That means we can run the lifted machine code and have it “talk” to KLEE’s own runtime the same way that bitcode compiled from source might do.

Our call to malloc continues executing and allocating a chunk of memory. Life is good again, but things are starting to get slow. And boy, do I mean sloooooooow.

Recovering source-level abstractions with paravirtualization

Operating on machine code means that we must handle everything. This is problematic when translating seemingly benign libc functions. For example, strcpy, strcmp, and memset require significant lifting effort, as their assembly is comprised of SIMD instructions. Emulating the complex instructions that form these functions ends up being more time consuming than if we had emulated simplistic versions. This doesn’t even address the sheer amount of state forking that can occur if these functions operate on symbolic data.

We paravirtualized libc to resolve this issue. This means we introduced an LD_PRELOAD-based library into snapshotted programs that lets us interpose and define our own variants of common libc functions. In the interceptor library, our paravirtualized functions are thin wrappers around the POSIX originals, and executing them results in the original POSIX functions being called before the snapshot.

Their purpose is to be a single point of entry that we find and patch during the snapshot phase. In the following example, the snapshotter will patch over a JMP with a NOP instruction so that malloc ends up invoking INT 0x81 when it is emulated by KLEE-Native.

trent_edits3

LD_PRELOAD malloc interceptor

When an interrupt is hit during emulation, we check the interrupt vector number and hook to a corresponding paravirtualized libc runtime function. Sometimes our paravirtualized versions of the libc functions cannot handle all cases, so we support a fallback mechanism, where we give control to the original libc function. To do this, we increment the emulated program counter by one and jump over the RETN instruction, which leads to the original function being executed.

Here are the libc functions that our LD_PRELOAD library currently intercepts. Each of the numbers (e.g., malloc’s 0x81) is the interrupt vector number that corresponds with the paravirtualized version of that function.

inter_alias_nbc

libc​ functions with paravirtualized equivalents

Wonderful! Our call to malloc has, instead, hit an interrupt, and we are able to hook to its paravirtualized version in KLEE. What exactly are we doing with it, and why do we care?

Modeling heap memory with libc interceptors

A call to mmap or brk during an emulated malloc will layout memory for allocations. While this is an accurate representation of the lifted machine code instructions, it is not a productive model for finding bugs. The problem: mmaps are very coarse grained.

Every access to memory can be seen, but it is unclear where a given allocation begins or ends. As a result, it is difficult to do things like bounds checks to detect overflows and underflows. Furthermore, there is no oversight on bug classes like double frees and use-after-frees, when allocations are opaque blobs.

That’s why we have interposed on malloc and other allocation routines to formulate a crystalline memory model that demarcates memory allocations. This approach makes it trivial for KLEE-Native to classify and report heap vulnerabilities. What’s unique about this approach is that we have invented a totally new address format for allocations to make our lives easier. It contains metadata about each allocation, which makes it simple to locate in our memory model.

trent_edit2

Union showing the components of our custom address encoding

Allocations backed by our paravirtualized malloc don’t truly “exist” in a traditional address space. Instead, they exist within allocation lists. These are structures that allow individual allocations to coexist so that they’re easy to access, track, and manipulate, which makes bounds checks, double-free detection, and overflow/underflow detection extremely transparent. Furthermore, allocation lists give us the flexibility to recover from issues like heap-based buffer overflows by expanding the backing “allocation” in place.

Huzzah! We’ve achieved a clear-cut representation of the target’s allocated memory using allocation lists. But wait. Isn’t this supposed to be a symbolic execution engine? All of this is really only concrete execution. What’s going on?

Eager concretization for an improved forking model

Our approach to symbolic execution is a departure from the typical scheduling and forking model used by KLEE. Where KLEE is a “dynamic symbolic executor,” KLEE-Native is closer to SAGE, a static symbolic executor.

KLEE-Native’s approach to forking favors eager concretization and depth-first exploration. The kick is that we can do it without sacrificing comprehensiveness. Meaning, it is always possible to go to a prior point in execution and request the next possible concretization, if that is our policy. This is different than something like KLEE or Manticore, which eagerly fork (i.e., generate a multitude of feasible states, and then defer to a scheduler to choose them over time).

The mechanism we created to enable eager concretization, without sacrificing forking potential, is implemented using state continuations. State continuations are like Python generators. In KLEE-Native, they package up and hold on to a copy of the execution state prior to any forking, as well as any meta-data needed in order to produce the next possible concretization (thus giving us comprehensiveness). The executor can then request the next possible forked state from a given continuation. Thus, each request gives us back a new execution state, where some condition has been concretized (hence the term “eager concretization”).

For now, we store state continuations on a stack. The result is that KLEE-Native “goes deep” before it “goes wide.” This is because at each point where the state could be forked, we create a continuation, get the first state from that continuation, and push it onto the stack. When a state is done executing (e.g. it exits, or an unrecoverable error is encountered), we look at the last continuation on the stack and ask for its next state. This process continues until a continuation is exhausted. If that happens, it is popped off and we go to the next continuation on the stack. In the future, we will explore alternative strategies.

trent4

How to fork when eagerly concretizing

Our approach was motivated by a need to handle symbolic memory addresses. We started adding symbols, but couldn’t always get KLEE to explore all paths. KLEE was concretizing memory addresses, but not in a comprehensive way. This was honestly expected, because symbolic memory is a hard problem.

Implementing concrete memory is easy, because there is essentially a map of addresses to byte values. However, what does it mean when an address could take on many values? We decided the best policy was to create a generic mechanism to concretize the address immediately without throwing away all the other possibilities, and then leave it up to policy handlers to make more substantive approaches. Examples of more substantive policies could be sampling, min/max, etc.

Wonderful! We can now explore the program’s state space. Let’s go hunting.

Applying KLEE-Native in the real world

Because we have control over emulated address space and memory allocations, classifying different types of memory corruption vulnerabilities becomes easy with KLEE-Native, and vulnerability triaging is a fantastic use case for this. Furthermore, our eager concretization strategy ensures we will stick to the code path of interest.

Here is CVE-2016-5180 from the Google fuzzer-test-suite. It is a one-byte-write heap buffer overflow in c-ares that was used in a ChromeOS exploit chain.

We first snapshot the program at main with a dynamic breakpoint:

$ klee-snapshot-7.0 --workspace_dir ws_CVE --dynamic --breakpoint 0xb33 --arch amd64_avx -- ./c_ares

And simply run the klee-exec command:

$ klee-exec-7.0 --workspace_dir ws

Here we get KLEE-Native detecting a one-byte heap overflow.

repro_c

So what makes KLEE-Native special compared to AddressSanitizer or Valgrind? This is where our policy handler comes in. One policy to handle memory access violations like this one is replacing overflow bytes with symbolic ones. As execution continues, we could potentially diagnose the severity of the bug by reporting the range of symbolic overflow bytes at the end. This could let a vulnerability researcher distinguish states that allow limited possibility for an overflow from ones that could potentially allow a write-what-where primitive.

In KLEE-Native, undefined behavior can be the new source for symbolic execution. This enables vulnerability triaging without prior knowledge of your threat model and the need for tedious reverse engineering.

Au revoir!

My internship produced KLEE-Native; a version of KLEE that can concretely and symbolically execute binaries, model heap memory, reproduce CVEs, and accurately classify different heap bugs. The project is now positioned to explore applications made possible by KLEE-Native’s unique approaches to symbolic execution. We will also be looking into potential execution time speed-ups from different lifting strategies. As with all articles on symbolic execution, KLEE is both the problem and the solution.

Reverse Taint Analysis Using Binary Ninja

by Henry Wildermuth, Horace Mann High School

We open-sourced a set of static analysis tools, KRFAnalysis, that analyze and triage output from our system call (syscall) fault injection tool KRF. Now you can easily figure out where and why, KRF crashes your programs.

During my summer internship at Trail of Bits, I worked on KRF, a fuzzer that directly faults syscalls to cause crashes. KRF works extremely well and pumps out core dumps like nobody’s business. However, it is difficult to determine which faulted syscall caused a particular crash since there could be hundreds of faulted syscalls in a single run. Manually tracing the cause of the crash through the source or disassembled binary is tedious, tiring, and prone to errors.

I set out to solve this problem using a technique I call Reverse Taint Analysis, and implemented my solution using the Binary Ninja API. The script gives a short list of possible causes of the crash, drastically limiting the amount of manual work required. Here, I describe the process I went through to create the algorithm and script, and give a brief overview of the additional tools built to ease its use.

Human in the Loop

How can we reliably determine the source of a crash? Well, how would a human determine the cause of a crash? First, we would look at the stack trace and figure out where the crash occurred. Let’s take a look at this vulnerable example program:

#include 
#include

void fillBuffer(char * string, unsigned len) {
  for (unsigned i = 0; i &lt; len; ++i) {
      string[i] = 'A'; // if string = NULL, segfaults on invalid write
  }
}

int main() {
  char *str;

  // Memory allocated
  str = (char *) malloc(16); // if malloc fails, str = NULL 
  fillBuffer(str, 16); // str not checked for malloc errors!
  free(str);
  return 0;
}

Running KRF against this program caused a fault. In this case, we can easily guess why the crash occurred—a faulted brk or mmap caused malloc to return NULL, which produced a segfault when fillBuffer tried to write to NULL. But let’s figure out the cause of the crash for certain, acting as if we didn’t have access to the source code.

First, let’s open up the core dump’s stack trace with gdb and see what caused the crash:

(gdb) bt
#0 0x00005555555546a8 in fillBuffer ()
#1 0x00005555555546e1 in main ()

Next, let’s take a look at our memory maps for the process so we can find the instruction in the binaries:

(gdb) info proc mappings
Mapped address spaces:
Start Addr End Addr Size Offset objfile
0x555555554000 0x555555555000 0x1000 0x0 /vagrant/shouldve_gone_for_the_head
[output truncated]

Now, cross-referencing the address from the stack trace, we see that the instructions of the top stack frame, at 0x555555554000, are in the binary /vagrant/shouldve_gone_for_the_head. We can calculate the instruction pointer’s offset in the binary by subtracting its location in the mapped address space from the beginning of the memory-mapped objfile and adding the offset:

0x00005555555546a8 - 0x555555554000 + 0x0 = 0x6a8.

Great! Now we can examine the binary itself in our disassembler of choice (Binary Ninja) and see what went wrong.

binja1

Here, we can see the disassembly of the fillBuffer() function, with the instruction that causes the segfault highlighted in red. This instruction sets the byte pointed to by rax to the character code for A. So, the issue must be an invalid value of rax. Looking back, we see that rax = rax + rdx, which are both previously set to the local variables string and i, respectively. We can see in the instruction at 0x68e that string was originally stored in rdi, which is the first argument to the function. i is initialized to zero and is only incremented, so we can ignore it, since we know it could not have been tainted by a function call or the function’s arguments.

Knowing that the first argument to fillBuffer() is tainted, we can go to the next frame in the stack trace and see what happened. We perform the same subtraction with the memory map to the address in the stack trace, 0x00005555555546e1, and get the actual address:

0x00005555555546e1 - 0x555555554000 + 0x0 = 0x6e1.​

This address is going to be one instruction after the function call to fillBuffer() since it is the return address. So, we want to examine the instruction directly before the one at 0x6e1. Let’s open it up in Binary Ninja!

binja2

Here, we have the instruction at 0x6e1 highlighted in blue, and the previous instruction highlighted in red. We know from our manual analysis of fillBuffer that the first parameter is stored in rdi, so we should track the data being stored in rdi. In the instruction before, we see that rdi is set to rax, and above that, there is a call to malloc, which stores its return value in rax.

Great! Now we know that the output of malloc gets passed into fillBuffer, where it causes the segfault. We’ve figured it out! But that was really annoying. If only there were a better way…

Enter MLIL Static Single Assignment

Well, it turns out there is a better way! Binary Ninja can decompile code into something called Medium Level IL (MLIL), which is a more human-readable form of assembly. It can then convert that MLIL into a form called Static Single Assignment (SSA), where every variable is assigned exactly once. This becomes really useful, because we don’t need to worry about things changing a variable other than its definition. As an example of SSA, consider this pseudocode function:

def f(a):
  if a < 5:
    a = a * 2
  else:
    a = a - 5
  return a

In SSA form is:

def f(a0):
  if a0 < 5:
    a1 = a0 * 2
  else:
    a2 = a0 - 5
  a3 = Φ(a1, a2) // meaning “a3 is either a1 or a2”
  return a3

So, let’s look at our same example again through the lens of SSA MLIL. Here’s fillBuffer in SSA MLIL:

binja3

Here, we can easily trace rax_2#4 to rax_1#3 + rdx_1#2, then trace rax_1#3 to string#1, which we see is arg1. We can also easily trace back i and see that it is set to 0. We have once again discovered that the first argument to fillBuffer is the source of the crash. So now, let’s look at main.

binja4

This is where we really see the benefits of SSA MLIL over regular disassembly. It lets us see what arguments are passed into fillBuffer, and what values are returned by malloc, making the analysis much easier. By tracing the sources of rdi#1 backwards, we again see that malloc is tainting the first argument of fillBuffer and, therefore, causing the crash.

We’re in the endgame now

So now that we’ve realized (for the second time) that malloc is the cause of our issues, let’s write out the process we’ve been applying, so we can easily convert it to code:

1. Make an empty stack. 
2. Push the crashing instruction to the stack. 
3. While the stack is not empty: 
4.   Pop an instruction off the stack. 
5.   If it is a MLIL function call instruction: 
6.     The return value of that function call may be cause of crash 
7.   Otherwise: 
8.     For each SSA variable used in the MLIL instruction: 
9.        If it’s not assigned in this function: 
10.         # It’s a function argument 
11.         We will have to go another frame up our stack trace. 
12.         # The same as going to main after finding arg1 was tainted
13.       Otherwise: 
14.         Add the instruction assigning SSA variable to the stack.

This is going to be easy! We just have to write it out in Python using the Binary Ninja API. We need to write a function that takes our instruction’s address and a BinaryView (a class holding information on the binary), and prints out the taint sources of the instruction.

def checkFunction(self, inst_addr, bv): 
  # Get MLILFunction obj for the function containing the instruction 
  func = bv.get_functions_containing(inst_addr)[0].medium_level_il   
  
  # Get the MLILInstruction obj for instruction at inst_addr
  inst = func[func.get_instruction_start(inst_addr)].ssa_form
  
  # Convert MLILFunction to SSA form
  func = func.ssa_form 
  
  # Keep track of what is seen
  visited_instructions = set()
  
  # Variables we are interested in
  var_stack = []  
  
  # Add the vars used by first instruction to stack
  for v in inst.vars_read: 
    var_stack.append(v)
 
  # Continuously run analysis while elements are in the stack
  while len(var_stack) > 0:
    var = var_stack.pop()
    if var not in visited_instructions:
      # Add to list of things seen
      visited_instructions.add(var) 

    # Get variable declaration
    decl = func.get_ssa_var_definition(var)
    
    # Check if its an argument
    if decl is None: 
      print("Argument " + var.var.name + " tainted from function call")
      continue

    # Check if its a function call
    if decl.operation == MediumLevelILOperation.MLIL_CALL_SSA:
      # If direct call
      if decl.dest.value.is_constant:
        # Get MLILFunction object of callee from address 
        func_called = bv.get_function_at(decl.dest.value.value) 
        print("Tainted by call to", func_called.name, "(" + hex(decl.dest.value.value) + ")")
      else: 
        # Indirect calls
        print("Tainted by indirect call at instruction", hex(decl.address))
      continue

    # If not an argument or call, add variables used in instruction to the stack. Constants are filtered out
    for v in decl.vars_read:
      var_stack.append(v)

The power of SSA is used in the vars_read and get_ssa_var_definition methods. MLIL makes detecting calls easy using decl.operation == MediumLevelILOperation.MLIL_CALL_SSA.

Extending the script

We can expand on a lot here with error handling, edge cases, automatically analyzing the frame above in the stack trace, automatically extracting information from the stack trace, etc. Thankfully, I’ve already done some of that with a set of python scripts.

python3 main.py binary coredump1 [coredump2] …

Automatically extracts the needed information from the core dumps, then inserts that information and binaries into a tarball to be copied to another computer, including libraries that are called in the stack trace.

gdb.py

Uses GDB Python API to extract data from each core dump. It’s called by main.py, so they must be in the same directory.

python3 analyze.py tarball.tar.gz

Takes a tarball output by main.py and automatically runs reverse taint analysis on each core dump in it, automatically cascading tainted arguments to the next frame. It uses krf.py to run the analysis, so they must be in the same directory.

krf.py contains the analysis code, which is a more featured version of the script written in this blog post. (Requires the Binary Ninja API.)

Let’s try them on our test binary:

$ # Linux VM with KRF
$ python3 main.py shouldve_gone_for_the_head core
Produced tar archive krfanalysis-shouldve_gone_for_the_head.tar.gz in /vagrant

$ # Machine with Binary Ninja
$ python3 analyze.py krfanalysis-shouldve_gone_for_the_head.tar.gz
Analyzing binary shouldve_gone_for_the_head
Done
Analyzing crash krfanalysis-shouldve_gone_for_the_head/cores/core.json
Tainted by call to malloc (0x560)
All paths checked

Conclusion

Writing this analysis script has shown me the Binary Ninja API is amazing. The versatility and automatic analysis it allows is incredible, especially considering it acts directly on binaries, and its intermediate languages are easy to use and understand.

I’d also like to mention LLVM, another framework for static analysis, which has a very similar API to Binary Ninja. It has many benefits over Binary Ninja, including better access to debug and type information, being free, having a more mature codebase, and always-perfect analysis of calling conventions. Its downside is that it needs the source code or LLVM IR of what you are analyzing.

Three LLVM passes are available in the KRFAnalysis repository to run static analysis: one detecting race conditions caused by checking the state of a system before use (i.e. time-of-check, time-of-use or TOC/TOU), another detecting unchecked errors from standard library calls, and a third reimplementing reverse taint analysis.

My summer: A small price to pay for salvation

I am incredibly grateful to everyone at Trail of Bits for my internship. I gained some amazing technical experience and got the chance to work with the Linux Kernel, FreeBSD Kernel, and LLVM—codebases I had previously considered to be mystical.

Some of my highlights:

  • I ported KRF to FreeBSD
  • Added the ability for KRF to target processes by PID, GID, UID, or if it had a specific file open
  • Wrote LLVM passes for static analysis
  • Upstreamed LLVM changes
  • Learned how to use Binary Ninja and its API
  • Picked up good coding practices
  • Gained a sense of the security industry

I also met some incredible people. I would like to give special thanks to my mentor Will Woodruff (@8x5clPW2), who was always willing to talk over an implementation, idea, or review my pull requests. I can’t wait to apply what I’ve learned at Trail of Bits as I move forward in my career.

Wrapper’s Delight

by Patrick Palka, University of Illinois at Chicago

During my summer at Trail of Bits, I took full advantage of the latest C++ language features to build a new SQLite wrapper from scratch that is easy to use, lightweight, high performance, and concurrency friendly—all in under 750 lines of code. The wrapper is available at https://github.com/trailofbits/sqlite_wrapper under the Apache 2.0 license. Comments and pull requests are welcome.

The motivation for this new SQLite wrapper came from working on an internal code-auditing tool built on top of Clang that uses a database to store and perform queries on semantic information about source code. Originally, RocksDB was chosen as the backing database, but I quickly found myself wrestling with the rigidity of a key-value database as queries became more complex. Wishing that we had a more expressive relational database, I began to explore switching to SQLite.

Initial experiments suggested that switching would not degrade performance if the database was properly tuned (see below), so I began looking at existing C++ SQLite wrappers to see which, if any, could suit our needs. We wanted something that would let us perform raw SQL queries that fit all our criteria for being both lightweight and able to handle concurrency. Unfortunately, none of the existing wrappers satisfied all of these, so I set out to write one from scratch.

After migrating over the backend to SQLite, we were impressed by the scalability and feature-richness of SQLite. It has a command line interface that makes debugging and prototyping easy, handles databases on the order of 100GB without a sweat and even has a built-in full-text search (FTS) extension. The wrapper makes interfacing with SQLite in C++ about as easy as possible, too.

Details

  • An example of its usage can be found at https://gist.github.com/patrick-palka/ffd836d0294f71d183f4199d0e842186. For simplicity, we chose to model parameter and column bindings as variadic function calls, so that all binds are specified at once.
  • Some of the modern C++ language features you’ll notice are inline and template variables, constexpr if and auto template parameters, generic lambdas, fold expressions, and thread_local variables.
  • This wrapper supports user-defined serialization and deserialization hooks. (https://gist.github.com/patrick-palka/d22df0eceb9b73ed405e6dfec10068c7)
  • This wrapper also supports automatic marshaling of C++ functions into SQL user functions. (https://gist.github.com/patrick-palka/00f002c76ad35ec55957716879c87ebe)
  • There is no database or connection object to explicitly initialize. Because the wrapper utilizes thread_local objects to manage connections to the database, a connection is made implicitly before the first use of the connection and is disconnected once the thread exits.
  • The database name and query strings are passed as template arguments instead of function arguments. This creates compile-time separation of the thread_local connection objects, which are per-database, and the thread_local prepared-statement caches, which are per-database, per-query-string. This design decision discourages the use of dynamically-generated query strings, since non-type template arguments must be compile-time constant expressions. In cases where a database name or a query string must be dynamically generated, the wrapper does support passing a lambda, which builds and returns the query string at runtime.
  • Every single prepared statement made by this wrapper is cached and reused, so the bare minimum number of calls to sqlite3_prepare is made throughout the lifetime of the program.
  • Downside: This wrapper cannot be used to manually manage connections to the database. It currently handles connections using a thread_local object, so a connection is created before the first query is performed on a given thread and is destroyed during thread exit. If you find yourself needing fine-grained control of when to connect or disconnect from your SQLite database, this wrapper may not work well for you. But this is limitation may be amended in the future.

Fine-tuning SQLite

Here are some SQLite tuning tips to maximize performance. Our wrapper does the first three for you automatically.

  1. Prefer “external” FTS tables when using SQLite’s FTS extension. Build the table after the data is inserted using the ‘rebuild’ command. (https://www.sqlite.org/fts5.html#the_rebuild_command)
  2. Reuse prepared statements. Create them using the sqlite_prepare_v3() routine and pass in the SQLITE_PREPARE_PERSISTENT option. (https://www.sqlite.org/c3ref/prepare.html)
  3. Use the SQLITE_STATIC option when binding text and blob values via the sqlite3_bind_*() routines. Ensure that the underlying memory is valid until the first call to sqlite3_step(). This avoids a redundant copy of the text or blob data. (https://www.sqlite.org/c3ref/bind_blob.html)
  4. Perform your insertions and updates in bulk transactions when possible. The speedup relative to using unit-size transactions grows nearly linearly with the size of the transaction, so inserting 10,000 rows per transaction is thousands of times faster than inserting 1 row per transaction.
  5. Create your indexes after all most or all of your data has been inserted, and choose your indices wisely. Creating indices once is going to be faster overall than continuously building and rebuilding them as more data gets inserted. Use the SQLite command-line interface to double-check each query’s plan, and install a log callback to have SQLite inform you whenever it decides to create a temporary index.
  6. Don’t use the same database connection or prepared statement object concurrently. SQLite serializes access to these objects. (https://sqlite.org/threadsafe.html) Also, the isolation component of ACID is guaranteed only between separate connections to the same database. (https://www.sqlite.org/isolation.html)
  7. Consider pragma temp_store = memory when storing temporary tables and data structures in memory. (https://www.sqlite.org/pragma.html#pragma_temp_store)

SQLite C API Tips

Finally, here are some miscellaneous tips to simplify working with the SQLite C API, where the first two are done for you by our wrapper.

  1. Install a concurrency-friendly sqlite_busy_handler to avoid having to check for `SQLITE_BUSY` after every API call. (https://www.sqlite.org/c3ref/busy_handler.html)
  2. Set up a log callback to have errors and other notices, like hints on where to add indices, printed automatically. (https://www.sqlite.org/errlog.html)
  3. Bundle a copy of SQLite into your project. This is the recommended way to use SQLite in your application. (https://www.sqlite.org/custombuild.html) Doing so also lets you enable SQLite’s full-text-search extension and its other useful disabled-by-default extensions.
  4. Use C++11 raw string literals to format query strings.

Final Thoughts

This summer, some of my takeaways were that when locally storing a moderate amount of structured data without large concurrency demands, sooner or later you will want to perform complex queries on this data. Unless you have a clear vision for your data-access patterns from the outset, using a key-value database will quickly back you into a corner whenever your data-access pattern changes. On the other hand, relational databases make it easy to adapt your database to continuously changing access patterns. And finally, modern C++ can help make interfacing with SQLite and other C APIs concise and easy, and when configured properly, SQLite is quite scalable.

A Day in the Life of Alessandro Gario, Senior Security Engineer

People interested in joining Trail of Bits often ask us what it’s like to work on the Engineering Services team. We felt that the best answer would be a profile of some of the talented individuals on our team, and let them describe their experiences at Trail of Bits in their own words.

Today, we’re featuring Alessandro Gario, a member of our Engineering Team who lives in Italy. Alessandro works on open-source projects implementing new functionalities, reducing technical debt and improving their performance overall.

unnamed

How did you end up at Trail of Bits?

I first learned of Trail of Bits in my Twitter feed. I was on the lookout for new opportunities, so I started sniffing around the company and learning about its many open-source projects. I began with McSema, a project for lifting compiled executables to LLVM IR. Originally, I just wanted to try out the software, but I wanted to talk to the developers so I ended up in the Empire Hacking slack, where Trail of Bits engineers answer questions about their open-source work. My main contributions on the project were to the CMake code, improving the build experience and implementing better dependency management.

Dan Guido (CEO, Trail of Bits) noticed my contributions to McSema, and happened to have an immediate need for someone to work on osquery issues, so he made me an offer. Dan sent me a contract to work on a single task, and I officially became a Trail of Bits contractor! I had so much fun; it was the first time I was allowed so much freedom in working on a project — both in when I could work, and how I could direct my own tasking.

My contract ended when I finished the osquery task. With more time on my hands, it was now the perfect opportunity to engage more with the community, and take on the bugfixes and feature requests submitted by the users. Eventually, Trail of Bits had received enough requests for osquery work that they sent me a full-time job offer, and the rest is history.

What projects are you working on currently?

Primarily I am involved with the osquery project, having dedicated so much of my time to it that I was accepted as a member of the five-person committee of maintainers! The project, and especially its build toolchain, is currently being renovated to operate independently of its old home at Facebook.

I also provide Qt SDK UI work for an internal project where we are creating a powerful source code navigation and cross-referencing utility for security auditors. Beyond that, I occasionally help out our other projects with their CMake-related issues.

On the side, I’ve continued to pursue experiments with how to fit eBPF into osquery, which is part of an ongoing effort to improve osquery’s event-driven detection capability on Linux. I recently spoke on this topic at QueryCon.

How do you schedule your workday?

When I don’t have to work alone, for any kind of collaboration I need to align my schedule with the rest of the team. Because of the time zone differences, I have to be flexible. If I were to stick to a strict 9am-6pm work shift, it wouldn’t really work. I organize my workday around my preferred schedule, but also that of the US-based Trail of Bits employees and customers who are 6 to 9 hours behind/earlier than me here in Italy. When it’s late afternoon in New York, it’s nighttime in Milan. Most of my meetings are around 5pm or 6pm my time, which suits me. It has never been a problem; I really like the schedule.

What are the most challenging aspects of your job?

Sometimes a task’s requirements, at least the way the task is initially envisioned, are hard to implement due to technical or design hard constraints. That’s difficult, because you have to find a creative compromise that works for everyone.

On rare occasions that I get stuck, I get the help of the team. Our Slack channels are like a mini StackOverflow website: you can just ask, and get immediate answers from experts. That is one of the great things about working here.

When contributing to any open-source project with external maintainers, you will eventually have to work with people outside the company to finish your job and get the work integrated into the next release. Sometimes, you have to work a little extra after you think the task is “finished,” because you still have to work with the upstream project to make everyone happy.

What is the most rewarding aspect of your work at Trail of Bits?

I was always interested in information security. I would look at Twitter and see all of these conferences, events, and people who were building great things. I am finally able to travel to these events and meet these people. I even gave my first conference talk last month, at QueryCon!

I am exposed to challenging issues that make me learn, especially when I get other people at the company involved. The ability to work with, and learn from, a talented group of experienced engineers is a reward in itself.

When I am given a task, I am trusted with the responsibility to see it through to the end, and work on it on my own. I do my best work and feel the most motivated when I am trusted this way.

What is some career advice for someone who wants to join us here?

Whenever I sought positions in the security engineering field, they seemed to be mostly for external pen-testing web services, which wasn’t particularly interesting to me. I’ve done a little bit of reverse-engineering and CTFs, but vulnerability research is not really my field either. I like to apply my engineering skills working on projects to build software. I’ve decided that you have to actively seek something that challenges and interests you, and carve out your own opportunity.

My advice is to find a relevant project you would like to support, and look for easy issues to solve, or even just review an open Pull Request, or improve the documentation. Once you get to know the project, it becomes easier to start contributing cool changes.

This is exactly what has worked for me personally. I know it is hard, because most people don’t have the time for after-hours work. And there’s no guarantee that you will get hired. But choose projects that are intrinsically motivating to you, and keep doing cool stuff as much as possible in your spare time. Have fun, and in the end you will get noticed.

The Engineering Services Team is Hiring

We appreciate Alessandro taking the time from his projects to talk about what it’s like to work here. Our Engineering Services Team is distributed around the globe, and each of our engineers brings a unique set of skills and contributions. Our work is public-facing, open-source, and client-driven. In close partnership with our customers, we are continuously working to extend and improve endpoint security solutions like osquery, Santa, and gVisor. Our recent work includes the implementation of 2FA support within PyPI, the Python package management system. We contribute to security event alerting pipeline projects like StreamAlert or the Carbon Black API, and are always working to improve our own security analysis tools like McSema and Remill. Our customers rely on us to solve their open-source security software challenges.

We are currently hiring for another Senior Security Engineer. Please apply if you are interested and feel you are qualified!

246 Findings From our Smart Contract Audits: An Executive Summary

Until now, smart contract security researchers (and developers) have been frustrated by limited information about the actual flaws that survive serious development efforts. That limitation increases the risk of making critical smart contracts vulnerable, misallocating resources for risk reduction, and missing opportunities to employ automated analysis tools. We’re changing that. Today, Trail of Bits is disclosing the aggregate data from every full smart contract security review we’ve ever done. The most surprising and impactful results we extracted from our analysis are:

  • Smart contract vulnerabilities are more like vulnerabilities in other systems than the literature would suggest.
  • A large portion (about 78%) of the most important flaws (those with severe consequences that are also easy to exploit) could probably by detected using automated static or dynamic analysis tools.
  • On the other hand, almost 50% of findings are not likely to ever be found by any automated tools, even if the state-of-the-art advances significantly.
  • Finally, manually produced unit tests, even extensive ones, likely offer either weak or, at worst, no protection against the flaws an expert auditor can find.

Continue reading this post for a summary of our study and more details on these results.

Everyone wants to prevent disastrous vulnerabilities in Ethereum smart contracts. Academic researchers have supported that effort by describing some categories of smart contract vulnerabilities. However, the research literature and most online discourse is usually focused on understanding a relatively small number of real-world exploits, typically with a bias towards the highly visible. For example, reentrancy bugs are widely discussed because they were responsible for the infamous DAO attack and are somewhat unique to smart contracts. However, is reentrancy the most common serious problem in real smart contracts? If we don’t know, then we cannot effectively allocate resources to preventing smart contract vulnerabilities. It’s not enough to understand detection techniques. We have to know how and where to apply them. Smart contracts are new. Decision makers have a relatively shallow pool of developer/analyst experience upon which to base their actions. Having real data to draw conclusions from is essential.

So, we collected the findings from the full final reports for twenty-three paid security audits of smart contract code we performed, five of which have been kept private. The public audit reports are available online, and make informative reading. We categorized all 246 smart-contract related findings from these reports, in some cases correcting the original audit categorization for consistency, and we considered the potential for both static and dynamic analysis tools, in the long-run, to detect each finding. We also compared the frequencies of categories to those for fifteen non-smart-contract audits we performed. Using paid, expert audits of code means our statistics aren’t overwhelmed by the large number of relatively silly contracts on the blockchain. Using many audit findings instead of a handful of exploited vulnerabilities gives us a better picture of potential problems to keep watch for in the future.

Category Frequencies are Different than Other Audits… But Not as Much as You’d Think

The most common type of smart contract finding is also the most common kind of finding in the 15 non-smart-contract audits we examined in order to compare smart contract and other kinds of audits: data validation flaws are extremely common in every setting, constituting 36% of smart contract findings, and 53% of non-smart contract findings. This is no surprise; accepting inputs that should be rejected, and lead to bad behavior, will always be easy to do, and always be dangerous. Access control is another common source of problems in smart contracts (10% of findings) and in other systems we audited (18%); it’s easy to accidentally be too permissive, and access control can also be disastrous if too restrictive (for example, when even the owner of a contract can’t perform critical maintenance tasks in some states, due to a contract bug).

Some categories of problems are much less common in smart contracts: unsurprisingly, denial of service, configuration, and cryptography issues are less frequent in a context where the blockchain abstracts away communication issues and operating system/platform-specific behavior changes, and gas limits reduce the temptation to roll your own cryptography. Data exposure problems are also less common in smart contracts; most developers seem to understand that data on the blockchain is inherently public, so there seem to be fewer misunderstandings about the consequences of “surprise” visibility. But these cases are somewhat unusual; for a majority of types of finding, including overflow/underflow and arithmetic precision, patching, authentication, timing, error reporting, and auditing and logging, the percentages in findings are within 10% of those for non-smart-contract audits.

The Worst of the Worst

In addition to counting how many findings there were in each category, we looked at how serious those findings tended to be; their potential severity, and the difficulty for an attacker to exploit them. We refer to the worst findings as high-low: high severity, low difficulty. These issues can allow an attacker to inflict major damage with relative ease.

Many of our twenty-two categories had no high-low findings, but a small number had high-low rates greater than 10%: access controls (25% high-low), authentication (25%), timing (25%), numerics (23%), undefined behavior (23%), data validation (11%) and patching (11%). Note that much-dreaded reentrancy, while often serious (50% of reentrancy findings were high severity) had no high-low findings at all, and accounted for only 4 of the 246 total findings.

Tools and Automation: We Can Do a Lot Better

For each finding, we determined, to the best of our knowledge, if it could potentially be detected by automated static analysis (e.g., our Slither tool), using a reasonable detector without too many false positives, or by automated dynamic analysis (e.g., with property-based testing like Echidna or symbolic execution like Manticore), either with off-the-shelf properties like standard ERC20 semantics, or using custom invariants. Rather than restricting ourselves to current tools, we looked to the future, and rated a finding as detectable if a tool that could be produced with significant engineering effort, but without unprecedented advances in the state-of-the-art in software analysis, could potentially find the problem. That is, we asked “could we write a tool that would find this, given time and money?” not “can current tools definitely find this?” Obviously, this is a somewhat subjective process, and our exact numbers should not be taken as definitive; however, we believe they are reasonable approximations, based on careful consideration of each individual finding, and our knowledge of the possibilities of automated tools.

Using this standard, 26% of the full set of findings could likely be detected using feasible static approaches, and 37% using dynamic methods (though usually only with the addition of a custom property to check). However, the potential for automated tools is much better when we restrict our attention to only the worst, high-low, findings. While static tools have less potential for detecting high-low findings than dynamic ones (33% vs. 63%), four of the high-low issues could probably only be detected by static analysis tools, which are also much easier to apply, and require less user effort. Combining both approaches, in the best-case scenario, would result in automatic detection of 21 of the 27 high-low findings: almost 78% of the most important findings. Our estimates of how effective static or dynamic analysis tools might be, in the limit, also vary widely by the kind of finding:

Category Dynamic Static
Access controls 50% 4%
API inconsistency 0% 0%
Auditing/logging 0% 38%
Authentication 25% 0%
Code quality 0% 67%
Coding bug 67% 50%
Configuration 0% 0%
Cryptography 0% 100%
Data exposure 0% 0%
Data validation 57% 22%
Denial of service 40% 0%
Documentation 0% 0%
Error reporting 29% 14%
Front-running 0% 0%
Logic 0% 0%
Missing logic 67% 0%
Numerics 46% 69%
Patching 17% 33%
Race condition 6% 59%
Reentrancy 75% 100%
Timing 50% 25%
Undefined behavior 0% 31%

Of course, in some cases, these percentages are not particularly informative; for instance, there was only one cryptography finding in our audit set, so it isn’t safe to assume that all cryptography bugs are easy to catch with a static analysis tool. Similarly, the coding bug category, containing what amount to “typos” in code, is likely to have an even higher percentage of easily statically detected problems, but there were only a handful of such problems in our audits.

Our best guess is that the combination of major impact on system behavior (thus high severity) and low difficulty (thus easy to find, in some sense) is not only a boon to would-be attackers, but a big help to automated analysis tools. That’s good news. Ongoing efforts to improve smart contract analysis tools are well worth the effort. That’s part of our motivation in releasing Crytic, a kind of Travis CI for smart contracts — with built-in support for running static analysis (including some Slither detectors not yet available in the public release) and, soon, dynamic analysis, on your code, automatically.

Perhaps the most important upshot here is that using high-quality automated static analysis is a best practice with almost no downside. If you’re writing important smart contracts, looking at a relatively small number of false positives in order to detect, with almost no developer effort, some of the most critical flaws is simply the right thing to do.

Tools and Automation: No Silver Bullet

However, a lot of the findings (almost 49%) are almost impossible to imagine detecting with a tool. In most of these cases, in fact, a tool isn’t even very likely to help. Slither’s code understanding features may assist in finding some issues, but many problems, and almost a quarter of the most important problems, require deeper understanding of the larger context of blockchains and markets. For example, tools can’t inform you about most front-running. Problems requiring human attention are not limited to the obvious categories, such as front-running, configuration, and documentation, either: there are 35 data validation findings, 12 access controls findings, and 10 undefined behavior findings, for example, that are unlikely to be detectable by automated tools – and 3 of these are high-low findings. A full 35% of high severity findings are unlikely to be detected automatically.

Even in the best of possible near-future automated tool worlds, and even with full formal verification (which might take up to 9x the developer effort), a great many problems simply require human attention. Security is a Strong-AI Hard problem. Until the robots replace us, independent expert attention will remain a key component of security for the most essential contracts.

Unit Tests are Great… But Maybe Not For This

Finally, what about unit testing? We didn’t add any unit tests during our audits, but we can look at whether the presence of significant unit tests was correlated with fewer findings during audits, or at least fewer high-low findings. The number of data points is, of course, too small to draw any solid conclusions, but we didn’t find any statistically significant correlation between our estimate of unit test quantity and quality and the presence of either findings in general or high-low findings. In fact, the insignificant relationships we did detect were in the wrong direction: more unit tests means more problems (the positive relationship was at least weaker for high-low findings, which is comforting). We hope and believe that’s just noise, or the result of some confounding factor, such as a larger attack surface for more complex contracts. While our basis for this result is subject to a number of caveats, including our ability to gauge the quality of unit tests without examining each line of code in detail, we do believe that if there were a causal relationship between better unit tests and fewer audit findings, with a large and consistent effect size, we’d have seen better evidence for it than we did.

Obviously, this doesn’t mean you shouldn’t write unit tests! It means that the kinds of things attackers and auditors are looking for may not overlap significantly with the kinds of problems unit tests help you avoid. Unit testing can probably improve your development process and make your users happier, but it may not help you actually be much more secure. It’s widely known that the problems developers can imagine happening, and write unit tests to check for, do not often overlap with the problems that cause security vulnerabilities. That’s why fuzzing and property-based testing are so valuable.

One key point to take away here is that the bugs found by property-based testing can be added as new unit tests to your code, giving you the best of both worlds. The pyfakefs module for creating high-fidelity mock file systems in Python, originally developed at Google, was a widely used software system, with a fairly extensive set of well-written unit tests. However, using the TSTL property-based testing tool for Python revealed over 100 previously undetected problems in pyfakefs (all of which were fixed), and let the developers of pyfakefs add a large number of new, more powerful, unit tests to detect regressions and new bugs. The same workflow can be highly effective with a well-unit-tested smart contract and Echidna; in fact, it can be easier, because Echidna does a better job of automatically figuring out the public interface to a contract than most property-based testing tools do when interacting with a library API.

Stay Tuned for More Details

We’ll be publishing the full results after we add a few more of our own smaller-scale audits and validate our results by comparing to estimates for audits performed by other companies. In the meantime, use our preliminary results to inform your own thinking about defects in smart contracts.

From The Depths Of Counterfeit Smartphones

In an age of online second-hand retailers, marketplace exchanges, and third-party refurb shops, it’s easier than ever to save hundreds of dollars when buying a phone. These channels provide an appealing alternative for people foregoing a retail shopping experience for a hefty discount.

However, there is an additional option for those bargain hunters seeking even more savings: counterfeits of popular phone models. These knock-offs have become a burgeoning industry, transforming cheap hardware and free software into mass profits at almost no cost to the manufacturers. These clones often sell at under 1/10th of the retail price and are often very convincing replicas at first glance.

Last year, we helped Motherboard Vice with an investigative teardown of a counterfeit iPhone X. We were haunted by the many security concerns and vulnerabilities we’d discovered, so we worked with DeviceAssure—a company dedicated to anti-counterfeit solutions for mobile platform—to do a deeper dive into these dangerous duplicates.

This post details what we found and provides some insight into exactly what you are getting when you use one of these phones. Whether it’s intentional malice or just dangerous ineptitude, there is plenty to be concerned about!

First Impressions

We looked at two counterfeits: an iPhone 6 and a Samsung S10.

Front and back of the counterfeit iPhone

Front and back of counterfeit Samsung S10

The visual aesthetic of the devices are very convincing. Proper attention to peripheral layout, dimensions, and overall finish is almost identical to their retail counterparts. The various switches and buttons correspond to what you would expect in the real devices to control the phone lock, adjust the volume, and turn them on and off. The counterfeit iPhone even uses a lightning cable in its charge port!

Both models are equipped with haptic feedback and fingerprint sensors that do indeed work … mostly. Facial biometrics are also included, though they had a considerably higher failure rate and would often not work at all.

From an initial glance at the underlying guts, both devices rely on controllers from Mediatek, a Chinese hardware company that provides an ARM chipset for embedded devices that is both incredibly cheap and reasonably capable. They also rely, as many counterfeits do, on custom, largely community-built ROMs of the Android runtime—a telltale sign that functionality will be non-standard and rife with one of hundreds of variant-specific quirks.

The Good – They look and work somewhat like the real thing… sometimes

You get a phone that looks and works vaguely like the one it counterfeited, with a few exceptions. What’s good other than that?

Nothing.

No really, even if these devices had hypothetically sported pristine ROMs (which, hint: they didn’t), they still came with a slew of critical problems, even if they weren’t outright backdoored with preinstalled malware (which, hint: they were).

We can give a C+ for effort to some of the detail rendered into the system UI. A good majority of modal popups and panel settings are faithfully intercepted and recreated using extensions to the native resource framework.

In particular, the Samsung counterfeit uses the native launcher, UI/Icon pack, and theming engine for its variant of Android; it is almost indistinguishable from the original. It even includes legitimate portals for both Samsung and Google play app stores.

The iPhone, however, quickly falls apart after minutes of exploration. The ROM system layer initially presents a believable iOS UI, but edge cases in event behaviors (WiFi connection errors, application exceptions, certain input text types, etc.) reveal stock Android screens. In addition, the “iOS” apps all displayed in noticeably low resolution and contain creatively broken english (in stark contrast to the ROM system layer, suggesting separate authors).

The Bad – They are full of unpatched vulnerabilities and insecure bloatware

Both phones report running the latest version of Android Pie 9.0; a relatively hardened OS in most regards. However, it’s not true.

In the case of the iPhone, further digging revealed that it runs a far older version of Android: Kitkat 4.4.0. Kitkat’s last update came in 2014. As you can imagine, hundreds of CVEs have appeared since then, not to mention inherent design flaws that have since been reworked: sandbox mechanisms, file system partitions, and dangerous permission APIs to name a few.

We probed a few well-known weaknesses and confirmed they were unpatched. The device is susceptible to the notorious Stagefright bugs which exploit media processing in the SMS/MMS messages to gain remote control of the device. In addition, several vulnerabilities in old Android system daemons, including the Mediaserver and Surfaceflinger, exhibited unpatched functionality. Because these AOSP ROMs are compiled out-of-band and maintained ad-hoc in the depths of board-hacking and system-modding forums, it is unlikely that users could ever patch these for themselves. There is certainly no over-the-air upgrade capability.

The S10 runs a slightly newer Android: Lollipop 5.1. Last updated in 2015, Lollipop replaced the Dalvik VM with the modern ART VM, and added Material UI theming elements thus allowing our counterfeit to use the Samsung UI components.

However, there is an even more serious problem that plagues both phones: outdated kernels. In addition to the Android runtime updates, the Linux kernel in Android phones often requires vendor participation to downstream security fixes onto the phone. Even in legitimate Android devices, this process often lags behind security releases and requires additional engineering effort by the vendor. The volunteer community of Mediatek ROM maintainers aren’t going to keep up with daily security updates, so outdated kernels in counterfeits are inevitable. Both phones had vulnerable kernels that were successfully exploited by known bugs, like DirtyCow (a copy-on-write memory race condition) and Towelroot (Futex timing bug ported to Android). No doubt a wide host of other kernel bugs are available for a potential attacker to abuse.

The Mediatek device drivers and daemons are a source of abundant vulnerabilities as well, often leading to kernel-level execution. Again, the ability or likelihood that a user would be able to appropriately find and patch these systems is highly unlikely.

Another pitfall of these phones is the presence of debug and testing utilities that expose dangerous system-level permissions in the Mediatek baseline ROM packages. This was observed on both of these devices, as well as on multiple other counterfeit variants we’ve researched. The Galaxy S10 counterfeit features a remote debugging server that allows remote control over media files, logging SMS messages, and deleting phone numbers.

The Mediatek Android daemon (MTKAndroidSuite package) on the Galaxy S10 counterfeit starts a local FTP server that can be used to manipulate files due to the elevated permissions of the service.

Still on the S10, incoming SMS’ are saved to the application’s SQLite database, which is not protected by access controls and can be read by other applications.

An overview displaying some of the Mediatek daemon capabilities (as indicated via class filenames) shows that the daemon can also retrieve media files, dump phone contacts, delete messages from the phone, and more.

These counterfeits are undeniably insecure. Both lie about their Android versions. The ROM versions used were severely outdated and vulnerable to public exploits, as were their kernels. They include bloatware, like remote debugging services, that enable abuse. This is what you’d expect from a phone that’s built around a volunteer-maintained, outdated Android ROM.

The ability for vendors and developers to seamlessly integrate and enforce security updates across their devices has been a massive win for mobile security. This requires a larger ecosystem that extends beyond the phone itself. Not only are these clones lacking in the latest and greatest hardware mitigations, but being isolated from the larger ecosystem and its security safety net is an inherent risk that can never truly be mitigated by these knockoffs.

The Ugly – They contain malware and rootkits

Both the Galaxy S10 and iPhone 6 counterfeits we assessed contained malware and rootkits.

The first issue we noticed in both devices was the presence of Umeng, an invasive analytics library, embedded into many of the applications and system libraries. Based out of China, Umeng has been caught employing malware in their operations. It collects and sends user information, including name, gender, IMEI numbers, serials, and more, back to their servers regularly without prompting any of the usual permission-consent disclaimers.

In the case of the S10, we found the SystemUI framework was modified to embed a server that can arbitrarily download, install, and run .dex files, in addition to reporting event information collected from system events such as geolocation, contact creation, and package installation and removal. For example, the library components used for facial recognition came bundled with functionality that can install arbitrary Android applications on demand.

On the S10, a hidden component in the SystemUI downloads files off the internet in the background. Note the Mandarin logs at the bottom of the screenshot!

Monitoring the S10’s network activity, we found it periodically reaching out to an unknown server. This is the origin of those requests, found embedded inside the SystemUI framework library.

One example of a component that has the capability to install additional applications on-demand in the facial recognition software. “ReadFace” is a third-party library, integrated inside the SystemUI framework, that seems to simulate biometric facial recognition. Within this code, it seems that there is the ability to arbitrarily install APKs.

Finally, the S10 included a RAT masquerading as a font extension system service (“LovelyFonts”) that allows for remote native code execution, complete with a shell, arbitrary file upload/download, and logging of system events. This RAT provides unlimited access to the person who planted it there, enabling total compromise of the phone and all its data. We observed that certain events, such as installing packages or sending text messages, would trigger connections to exchange encrypted payloads remotely related to this backdoor. As a note, while this specific malware wasn’t present on the particular iPhone 6 that we studied, we have encountered variants of it on other counterfeit iPhone ROMs in the past.

This is a function inside the Lovelyfonts library that invokes a direct system call. The Lovelyfonts service comes with a library that allows a remote user to execute code directly on the machine, bypassing the Android Runtime.

Here, the malware is saying that it’s trying to instantiate a “network interceptor,” ostensibly interfering with network traffic.

The RAT malware detects whenever an app is installed or uninstalled and generates an encrypted payload to send to a remote API server.

Insecure, outdated ROMs are bad. Actual evidence of malicious intent is ugly. The phones we looked at both had Umeng, a known invasive analytics library that steals user data, embedded in multiple applications. The S10 had a server embedded in the SystemUI framework that can download, install, and run applications, and collect system data, and it had malware that grants unlimited access to the device to whoever planted it there.

The moral of the story? If you’re using counterfeit phones, there’s a high likelihood that it will provide bad actors access to your data by design. Embedding malware here is easy. It is trivial for a counterfeit manufacturer to implant and modify the ROM before distribution. Tracking or detecting either action is impossible for most users. While it is theoretically possible to find a ‘clean’ distribution, it is a gamble to make, never mind the inherent risk of using an insecure baseline system.

Conclusion – If you used one of these phones, you’d already be hacked

As the price point for handheld devices continues to climb, there will always be a temptation to seek cheaper alternatives. Counterfeit smartphones will continue to evolve in sophistication, performance, and threat to users. Using them puts your data at risk and may enable abuse of the applications and networks that you access and use.

Often times, it’s not obvious to buyers that they’re purchasing counterfeits. Fake versions like these are often acquired through Craigslist or other 3rd parties. Some are sold as scam upgrades or gifts. In some countries, it can be difficult to determine genuine sellers from counterfeit vendors because all phones are purchased independently from cellular contracts. Buying direct from Apple or Samsung is the best way to ensure nothing malicious comes preinstalled on your phone, and enables you to receive new software updates that patch security issues (well, at least theoretically). If you’re a company that allows employees to access corporate data on their phones, consider verifying devices for genuine software.

We hope that this investigation helped illuminate the dangers of opting into an “off-brand” device. If this was helpful, or sounds similar to a security concern you or your organization confront, reach out! We offer a wide range of services, including iVerify – a personal security app for iOS – for further securing your phone.

We’d like to again thank DeviceAssure for reaching out and providing us with the hardware to conduct this analysis as well as the opportunity to do some digging into this matter. They will be at Blackhat this year and so will some of us, so stop by and say hi. And as always, we love to hear about weird and strange products out there in the wild, so drop a line if there is something you think we should look at!