Work For Us: Fall and Winter Internship Opportunities

If you’re studying in a degree program and you thrive at the intersection of software development and cyber security, you should apply to our fall or winter internship programs. It’s a great way to add paid experience (and a publication) to your resume, and to get a taste of what it’s like to work in a commercial infosec setting.

You’d work remotely through the fall semester or over winter break on a meaningful problem, producing or improving tools that we (Trail of Bits and the infosec community) need to make security better. Your work won’t culminate in a flash-in-the-pan report on an isolated problem; it will make a measurable impact on modern security problems.

Two Ex-Interns Share Their Experiences

  • Sophia D’Antoine (now one of our security engineers) spent her internship working on part of what would later become MAST.
  • Evan Jensen accepted a job at MIT Lincoln Labs as a security researcher before interning for us, and still credits the experience as formative.

Why did you take this internship over others?

SD: I hadn’t planned on taking a winter internship until I heard about the type of work I could do at Trail of Bits. I’d get my own project, not just a slice of someone else’s. The chance to take responsibility for something with a measurable impact was very appealing. It didn’t hurt that ToB’s reputation would add some weight to my resume.

EJ: I saw this as a chance to extend the class I took from Dan: “Penetration Testing and Vulnerability Analysis.” Coincidentally, I lined up a summer internship in the office while Dan was there. As soon as he suggested I tell my interviewer what I was working on in class, the interview ended with an offer for the position.

What did you work on during your internship?

SD: MAST’s obfuscating passes that transform the code. This wasn’t anywhere near the focus of my degree; I was studying electrical engineering. But I was playing CTFs for fun, and [ToB] liked that I was willing to teach myself. I didn’t want a project that could just be researched with Google.

EJ: I actually did two winternships at ToB. During my first, I analyzed malware that the “APT1” group used in their intrusion campaigns. During my second, I worked on generating training material for a CTF-related DARPA grant that eventually became the material in the CTF Field Guide.

What was your experience like?

SD: It was great. I spent my entire break working on my project, and loved it. I like to have an end-goal and parameters, and the independence to research and execute. The only documentation I could find for my project was the LLVM compiler’s source code. There was no tutorial online to build an obfuscator like MAST. Beyond the technical stuff, I learned about myself, the conditions where I work best, and the types of projects that interest me most.

EJ: Working at ToB was definitely enlightening. It was the first time I actually got to use a licensed copy of IDA Pro. It was great working with other established hackers. They answered every question I could think of. I learned a lot about how to describe the challenges reverse engineers face and I picked up a few analysis tricks, too.

Why would you recommend this internship to students?

SD: So many reasons. It wasn’t a lot of little tasks. You own one big project. You start it. You finish it. You create something valuable. It’s cool paid work. It’s intellectually rewarding. You learn a lot. ToB is one of the best companies to have on your resume; it’s great networking.

EJ: People will never stop asking you about Trail of Bits.

Here’s What You Might Work On

We always have a variety of projects going on, and tools that could be honed. Your project will be an offshoot of our work, such as:

  • Our Cyber Reasoning System (CRS), which we developed for the Cyber Grand Challenge and currently use for paid engagements, has the potential to do a lot more. It’s a really complicated distributed system with a multitude of open-source components at play, including symbolic executors, x86 lifting, dynamic binary translation, and more.
  • PointsTo, an LLVM-based static analysis that discovers object life-cycle (e.g. use-after-free) vulnerabilities in large software projects such as web browsers and network servers. Learn more.
  • McSema, an open-source framework that performs static translation of x86 and x86-64 binaries to the LLVM intermediate representation. McSema enables existing LLVM-based program analysis tools to operate on binary code. See the code.
  • MAST, a collection of several whole-program transformations using the LLVM compiler infrastructure as a platform for iOS software obfuscation and protection.

In general, you’ll make a meaningful contribution to the development of reverse engineering and software analysis tools. Not many places promise that kind of work to interns, nor pay for it.

Interested?

You must have experience with software development. We want you to help us refine tools that find and fix problems for good, not just for a one-time report. Show us code that you’ve written, examples on GitHub, or CTF write-ups you’ve published.

You must be motivated. You’ll start with a clear project and an identified goal. How you get there is up to you. Apart from in-person kick-off and debrief meetings in our Manhattan offices, you will work remotely.

But you won’t be alone. We take advantage of all the latest technology to get work done, including platforms like Slack, GitHub, Trello, and Hangouts. You’ll find considerable expertise available to you. We’ll do our best to organize everything we can up front so that you’re positioned for success. We will make good use of your time and effort.

If you’re headed into the public sector (maybe you took a Scholarship For Service), you may be wondering what it’s like to work at a commercial firm. If you want some industry experience before getting absorbed into a government agency, intern with us.

A fuzzer and a symbolic executor walk into a cloud

Finding bugs in programs is hard. Automating the process is even harder. We tackled the harder problem and produced two production-quality bug-finding systems: GRR, a high-throughput fuzzer, and PySymEmu (PSE), a binary symbolic executor with support for concrete inputs.

From afar, fuzzing is a dumb, brute-force method that works surprisingly well, and symbolic execution is a sophisticated approach, involving theorem provers that decide whether or not a program is “correct.” Through this lens, GRR is the brawn while PSE is the brains. There isn’t a dichotomy though — these tools are complementary, and we use PSE to seed GRR and vice versa.

Let’s dive in and see the challenges we faced when designing and building GRR and PSE.

GRR, the fastest fuzzer around

GRR is a high-speed, full-system emulator that we use to fuzz program binaries. A fuzzing “campaign” involves executing a program thousands or millions of times, each time with a different input. The hope is that spamming a program with an overwhelming number of inputs will trigger a bug that crashes the program.

Note: GRR is pronounced with two fists held in the air

During DARPA’s Cyber Grand Challenge, we went web-scale and performed tens of billions of input mutations and program executions — in only 24 hours! Below are the challenges we faced when making this fuzzer, and how we solved those problems.

  1. Throughput. Typically, program fuzzing is split into discrete steps: a sample input is given to an input “mutator,” which produces input variants; each variant is then separately tested against the program in the hope that it will crash or execute new code (see the sketch after this list). GRR internalizes these steps and, in doing so, completely eliminates disk I/O and program-analysis ramp-up times, which account for a significant portion of the time spent in a fuzzing campaign with other common tools.
  2. Transparency. Transparency requires that the program being fuzzed cannot observe or interfere with GRR. GRR achieves transparency via perfect isolation. GRR can “host” multiple 32-bit x86 processes in memory within its 64-bit address space. The instructions of each hosted process are dynamically rewritten as they execute, guaranteeing safety while maintaining operational and behavioral transparency.
  3. Reproducibility. GRR emulates both the CPU architecture and the operating system, thereby eliminating sources of non-determinism. GRR records program executions, enabling any execution to be faithfully replayed. GRR’s strong determinism and isolation guarantees let us combine the strengths of GRR with the sophistication of PSE. GRR can snapshot a running program, enabling PSE to jump-start symbolic execution from deep within a given program execution.
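
For contrast, here’s a minimal sketch (ours, in Python; not GRR’s actual code) of the traditional discrete-step loop described in item 1. Every iteration pays for disk I/O and a fresh process, which is exactly the overhead GRR eliminates by hosting and mutating programs in memory:

import random
import subprocess
import tempfile

def mutate(seed):
    # Flip a few random bytes of the seed to produce a variant.
    data = bytearray(seed)
    for _ in range(random.randint(1, 8)):
        data[random.randrange(len(data))] = random.randrange(256)
    return bytes(data)

def fuzz_once(target, seed):
    variant = mutate(seed)
    with tempfile.NamedTemporaryFile() as f:  # disk I/O on every test case
        f.write(variant)
        f.flush()
        with open(f.name, 'rb') as stdin:
            proc = subprocess.run([target], stdin=stdin)  # new process each run
    return proc.returncode < 0  # negative return code: killed by a signal

seed = b'A' * 64
while not fuzz_once('./challenge', seed):
    pass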

PySymEmu, the PhD of binary symbolic execution

Symbolic execution as a subject is hard to penetrate. Symbolic executors “reason about” every path through a program, there’s a theorem prover in there somewhere, and something something… bugs fall out the other end.

At a high level, PySymEmu (PSE) is a special kind of CPU emulator: it has a software implementation for almost every hardware instruction. When PSE symbolically executes a binary, what it really does is perform all the ins-and-outs that the hardware would do if the CPU itself was executing the code.

[Image: PSE explores the relationship between the life and death of programs in an unorthodox scientific experiment]

CPU instructions operate on registers and memory. Registers are names for super-fast but small data storage units. Typically, registers hold four to eight bytes of data. Memory on the other hand can be huge; for a 32-bit program, up to 4 GiB of memory can be addressed. PSE’s instruction simulators operate on registers and memory too, but they can do more than just store “raw” bytes — they can store expressions.

A program that consumes some input will generally do the same thing every time it executes. This happens because that “concrete” input will trigger the same conditions in the code, and cause the same loops to merry-go-round. PSE operates on symbolic input bytes: free variables that can initially take on any value. A fully symbolic input can be any input and therefore represents all inputs. As PSE emulates the CPU, if-then-else conditions impose constraints on the originally unconstrained input symbols. An if-then-else condition that asks “is input byte B less than 10” will constrain the symbol for B to be in the range [0, 10) along the true path, and to be in the range [10, 256) along the false path.

If-then-elses are like forks in the road when executing a program. At each such fork, PSE will ask its theorem prover: “if I follow the path down one of the prongs of the fork, then are there still inputs that satisfy the additional constraints imposed by that path?” PSE will follow each yay path separately, and ignore the nays.
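
Here’s a toy version of that query using the z3 solver’s Python bindings (an illustration of the idea, not PSE’s internals):

from z3 import BitVec, Not, Solver, ULT

B = BitVec('B', 8)  # one fully symbolic input byte: any value in 0..255

true_path = Solver()
true_path.add(ULT(B, 10))  # constraint imposed along the true path
print(true_path.check())   # sat: some input follows this path, keep exploring

false_path = Solver()
false_path.add(Not(ULT(B, 10)))  # constraint along the false path
print(false_path.check())        # also sat: B can be anything in [10, 256)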

So, what challenges did we face when creating and extending PSE?

  1. Comprehensiveness. Arbitrary program binaries can exercise any of the thousands of instructions available to x86 CPUs. PSE implements simulation functions for hundreds of x86 instructions, and falls back on a custom, single-instruction “micro-executor” in cases where an instruction emulation is not or cannot be provided. In practice, this setup enables PSE to comprehensively emulate the entire CPU.
  2. Scale. Symbolic executors try to follow all feasible paths through a program by forking at every if-then-else condition, and constraining the symbols one way or another along each path. In practice, there are an exponential number of possible paths through a program. PSE handles the scalability problem by selecting the best path to execute for the given execution goal, and by distributing the program state space exploration process across multiple machines.
  3. Memory. Symbolic execution produces expressions representing simple operations, like adding two symbolic numbers together or constraining the possible values of a symbol down one path of an if-then-else code block. PSE gracefully handles the case where addresses pointing into memory are themselves symbolic: memory accessed via a symbolic address can potentially point anywhere, to both “good” and “bad” (i.e. unmapped) memory (see the sketch after this list).
  4. Extensibility. PSE is written using the Python programming language, which makes it easy to hack on. However, modifying a symbolic executor can be challenging — it can be hard to know where to make a change, and how to get the right visibility into the data that will make the change a success. PSE includes smart extension points that we’ve successfully used for supporting concolic execution and exploit generation.
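
To make item 3 concrete, here’s what a symbolic memory read looks like at the solver level, again with z3 standing in for PSE’s machinery:

from z3 import Array, BitVec, BitVecSort, Solver, ULT

mem = Array('mem', BitVecSort(32), BitVecSort(8))  # memory: 32-bit address -> byte
addr = BitVec('addr', 32)                          # a fully symbolic address

s = Solver()
s.add(mem[addr] == 0x41)   # the read "can point anywhere" that satisfies...
s.add(ULT(addr, 0x1000))   # ...the path constraints gathered so far
print(s.check())           # sat: a concrete address and memory contents exist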

Measuring excellence

So how do GRR and PSE compare to the best publicly available tools?

GRR

GRR is both a dynamic binary translator and a fuzzer, so it’s apt to compare it to AFLPIN, a hybrid of the AFL fuzzer and Intel’s PIN dynamic binary translator. During the Cyber Grand Challenge, DARPA helpfully provided a tutorial on how to use PIN with DECREE binaries. At the time, we benchmarked PIN and found that GRR, before we even started optimizing it, was already twice as fast!

The more important comparison metric is in terms of bug-finding. AFL’s mutation engine is smart and effective, especially in terms of how it chooses the next input to mutate. GRR internalizes Radamsa, another too-smart mutation engine, as one of its many input mutators. Eventually we may also integrate AFL’s mutators. During the qualifying event, GRR went face-to-face with AFL, which was integrated into the Driller bug-finding system. Our combination of GRR+PSE found more bugs. Beyond this one data point, a head-to-head comparison would be challenging and time-consuming.

PySymEmu

PSE can be most readily compared with KLEE, a symbolic executor of LLVM bitcode, or the angr binary analysis platform. LLVM bitcode is a far cry from x86 instructions, so it’s an apples-to-oranges comparison. Luckily we have McSema, our open-source and actively maintained x86-to-LLVM bitcode translator. Our experiences with KLEE have been mostly negative; it’s hard to use, hard to hack on, and it only works well on bitcode produced by the Clang compiler.

Angr uses a customized version of the Valgrind VEX intermediate representation. Using VEX enables angr to work on many different platforms and architectures. Many of the angr examples involve reverse-engineering CTF challenges rather than exploitation challenges; these RE problems often require manual intervention or state knowledge to proceed. PSE, by contrast, is designed to try to crash the program at every possible emulated instruction. For example, PSE uses its knowledge of symbolic memory to hunt for invalid array-like memory accesses instead of just solving for unconstrained paths. During the qualifying event, angr went face-to-face with GRR+PSE, and we found more bugs. Since then, we have improved PSE to support user interaction, concrete and concolic execution, and taint tracking.

I’ll be back!

Automating the discovery of bugs in real programs is hard. We tackled this challenge by developing two production-quality bug-finding tools: GRR and PySymEmu.

GRR and PySymEmu have been a topic of discussion in recent presentations about our CRS, and we suspect that these tools may be seen again in the near future.

Your tool works better than mine? Prove it.

No doubt, DARPA’s Cyber Grand Challenge (CGC) will go down in history for advancing the state of the art in a variety of fields: symbolic execution, binary translation, and dynamic instrumentation, to name a few. But there is one contribution that we believe has been overlooked so far, and that may prove to be the most useful of them all: the dataset of challenge binaries.

Until now, if you wanted to ‘play along at home,’ you would have had to install DECREE, a custom Linux-derived operating system that has no signals, no shared memory, no threads, and only seven system calls. Sound like a hassle? We thought so.

One metric for all tools

Competitors in the Cyber Grand Challenge identify vulnerabilities in challenge binaries (CBs) written for DECREE on the 32-bit Intel x86 architecture. Since 2014, DARPA has released the source code for over 100 of these vulnerable programs. These programs were specifically designed with vulnerabilities that represent a wide variety of software flaws. They are more than simple test cases: they approximate real software with enough complexity to stress both manual and automated vulnerability discovery.

If the CBs become widely adopted as benchmarks, they could change the way we solve security problems, mirroring the rapid evolution of the SAT and ML communities once standardized benchmarks and regular competitions were established. The challenge binaries, valid test inputs, and sample vulnerabilities create an industry-standard benchmark suite for evaluating:

  • Bug-finding tools
  • Program-analysis tools (e.g. automated test coverage generation, value range analysis)
  • Patching strategies
  • Exploit mitigations

The CBs are a more robust test set than previous approaches to measuring the quality of software analysis tools (e.g. the SAMATE tests, the NSA Juliet tests, or the STONESOUP test cases). First, the CBs are complex programs like games, content management systems, and image processors, rather than mere snippets of vulnerable code. After all, to be effective, analysis tools must process real software with a fairly low bug density. Second, unlike open-source projects with added bugs, we have very high confidence that all the bugs in the CBs have been found, so analysis tools can be compared against an objective standard. Finally, the CBs come with extensive functionality tests, triggers for the introduced bugs, patches, and performance-monitoring tools, enabling benchmarking of patching tools and bug-mitigation strategies.

Creating an industry-standard benchmark set will solve several problems that hamper the development of future program analysis tools:

First, the absence of standardized benchmarks prevents an objective determination of which tools are “best.” Real applications don’t come with triggers for complex bugs, nor an exhaustive list of those bugs. The CBs provide metrics for comparison, such as:

  • Number of bugs found
  • Number of bugs found per unit of time or memory
  • Categories of bugs found and missed
  • Variances in performance from configuration options

Next, which mitigations are most effective? CBs come with inputs that stress original program functionality, inputs that check for the presence of known bugs, and performance measuring tools. These allow us to explore questions like:

  • What is the potential effectiveness and performance impact of various bug mitigation strategies (e.g. Control Flow Integrity, Code Pointer Integrity, Stack Cookies, etc)?
  • How much slower does the resulting program run?
  • How good is a mitigation compared to a real patch?

Play Along At Home

The teams competing in the CGC have had years to hone and adapt their bug-finding tools to the peculiarities of DECREE. But the real world doesn’t run on DECREE; it runs on Windows, Mac OS X, and Linux. We believe that research should be guided by real-world challenges and parameters. So, we decided to port* the challenge binaries to run in those environments.

It took us several attempts to find a porting approach that minimized code changes while preserving as much original code as possible across platforms. The eventual solution was fairly straightforward: build each compilation unit without standard include files (as all CBs are statically linked), implement CGC system calls using their native equivalents, and make various minor fixes for compatibility with more compilers and standard libraries.

We’re excited about the potential of multi-platform CBs on several fronts:

  • Since there’s no need to set up a virtual machine just for DECREE, you can run the CBs on the machine you already have.
  • With that hurdle out of the way, we all now have an industry benchmark to evaluate program analysis tools. We can make comparisons such as:
    • How do the CGC tools compare to existing program analysis and bug-finding tools?
    • When a new tool is released, how does it stack up against the current best?
    • Do static analysis tools that work with source code find more bugs than dynamic analysis tools that work with binaries?
    • Are tools written for Mac OS X better than tools written for Linux, and are they better than tools written for Windows?
  • When researchers open source their code, we can evaluate how well their findings work for a particular OS or compiler.

Before you watch the competitors’ CRSs duke it out, explore the challenges that the robots will attempt to solve in an environment you’re familiar with.

Get the CGC’s Challenge Binaries in the most common operating systems.

* Big thanks to our interns, Kareem El-Faramawi and Loren Maggiore, for doing the porting, and to Artem, Peter, and Ryan for their support.

Why I didn’t catch any Pokemon today

tl;dr While the internet went crazy today, we went fact finding. Here are our notes on Pokemon Go’s permissions to your Google account.

Here’s what Jay and I set out to do at around 6pm today:

  • Find what permissions Pokemon Go is actually requesting
  • Investigate what the permissions actually do
  • Replicate the permissions in a test app

Our first instinct was to go straight to the code, so we began by loading up the iOS app in a jailbroken phone. The Pokemon Go app uses jailbreak detection to prevent users with modified devices from accessing the game. As we have commonly found with such protections, they were trivial to bypass and, as a result, afforded no real protection. We recommend that firms contact us about MAST if they need more formidable application protection.

Niantic issues an OAuth request to Google with their scope set to the following (note: “scope” determines the level of access that Niantic has to your account and each requested item is a different class of data):

The OAuthLogin scope stands out in this list. It is mainly used by applications from Google, such as Chrome and the iOS Account Manager, though we were able to find a few Github projects that used it too.

It’s not possible to use this OAuth scope from Google’s own OAuth Playground; it only gives various “not authorized” error messages. This means that the OAuth Playground, Google’s own service for testing access to its APIs, cannot exactly replicate the permissions requested by Pokemon Go.

It might be part of the OAuth 1.0 API, which was deprecated by Google in 2012 and shut down in 2015. If so, we’re not sure why Pokemon Go was able to use it. We checked, and accounts that migrate up to the OAuth 2.0 API are no longer able to access the older 1.0 API.

We found changelogs in the source code for Google Chrome that refer to this OAuth scope as the “Uber” token where it is passed with the “IssueUberAuth” GET parameter.

It does not appear possible to create our own app that uses this OAuth scope through normal or documented means. In order to properly test the level of access provided by this OAuth token, we would probably need to hook an app with access to one (e.g., via a Cydia hook).

The Pokemon Go login flow does not describe what permissions are being requested and silently re-enables them after they’ve been revoked. Further, the available documentation fails to adequately describe what token permissions mean to anyone trying to investigate them.

It’s clear that this access is not needed to identify user accounts in Pokemon Go. While we were writing this, we expected Niantic to ultimately respond by reducing the privileges they request. By the time we hit publish, they had released a statement confirming they will.

For once, we agree with a lot of comments on Hacker News.

This seems like a massive security fail on Google’s part. There’s no reason the OAuth flow should be able to request admin privileges silently. As a user, I really must get a prompt asking me (and warning me!). — ceejayoz

We were able to query for specific token scopes through Google Apps for Work but we have not found an equivalent for personal accounts. Given that these tokens are nearly equivalent to passwords, it seems prudent to enable greater investigation and transparency about their use on all Google accounts for the next inevitable time that this occurs.

[Image: Google Apps for Work lets you query individual token scopes]

By the time we got this far, Niantic released a statement that confirmed they had far more access than needed:

We recently discovered that the Pokémon GO account creation process on iOS erroneously requests full access permission for the user’s Google account. However, Pokémon GO only accesses basic Google profile information (specifically, your User ID and email address) and no other Google account information is or has been accessed or collected. Once we became aware of this error, we began working on a client-side fix to request permission for only basic Google profile information, in line with the data that we actually access. Google has verified that no other information has been received or accessed by Pokémon GO or Niantic. Google will soon reduce Pokémon GO’s permission to only the basic profile data that Pokémon GO needs, and users do not need to take any actions themselves.

After Google and Niantic follow through with the actions described in their statement, this will completely resolve the issue. As best we can tell, Google plans to find the already issued tokens and “demote” them, in tandem with Niantic no longer requesting these permissions for new users.

Thanks for reading and let us know if you have any further details! Please take a second to review what apps you have authorized via the Google Security Checkup, and enable 2FA.

Update 7/12/2016: It looks like we were on the right track with the “UberAuth” tokens. This OAuth scope initially gains access to very little but can be exchanged for new tokens that allow access to all data in your Google account, including Gmail, through a series of undocumented methods. More details: https://gist.github.com/arirubinstein/fd5453537436a8757266f908c3e41538

Update 7/13/2016: The Pokemon Go app has been updated to request only basic permissions. Niantic’s statement indicated they would de-privilege all the erroneously issued tokens themselves, but if you want to jump ahead of them, go to your App Permissions, revoke the Pokemon Go access, sign out of the Pokemon Go app, and then sign back in.


Start using the Secure Enclave Crypto API

tl;dr – Tidas is now open source. Let us know if your company wants help trying it out.

When Apple quietly released the Secure Enclave Crypto API in iOS 9 (kSecAttrTokenIDSecureEnclave), it allowed developers to liberate their users from the annoyance of strong passwords or OAuth.

That is, if the developers could make do without documentation.

The required attribute was entirely undocumented. The key format was incompatible with OpenSSL. Apple didn’t even say what cipher suite was used (it’s secp256r1). It was totally unusable in its original state. The app-developer community was at a loss.

We filled the gap

We approached this as a reverse-engineering challenge. Ryan Stortz applied his considerable skill and our collective knowledge of the iOS platform to figure out how to use this new API.

Once Ryan finished a working set of tools to harness the Secure Enclave, we took the next step. We released a service based on this feature: Tidas.

When your app is installed on a new device, the Tidas SDK generates a unique encryption key identifying the user and registers it with the Tidas web service. This key is stored on the client device in the Secure Enclave and is protected by Touch ID, requiring the user to use their fingerprint to unlock it. Client sign-in generates a digitally-signed session token that your backend can pass to the Tidas web service to verify the user’s identity. The entire authentication process is handled by our easy-to-use SDK and avoids transmitting users’ sensitive data. They retain their privacy. You minimize your liability.


David Schuetz of NCC Group assessed Tidas’s protocol in this tidy write-up. His graphic accurately describes the Tidas wire protocol.

Tidas’s authentication protocol, combined with secure key storage in the Secure Enclave, provides strong security assurances and prevents attacks like phishing and replays. It significantly lowers the bar to adopting token-only authentication in a mobile-first development environment.

We saw enormous potential for security in enabling applications to use private keys that are safely stored outside of iOS and away from any potential malware: easily unlocking your computer with a press of Touch ID, stronger password managers, more trustworthy mobile payments, and more.

We thought the benefits were clear, so we put together a website and released this product to the internet.

Today, Tidas becomes open source.

Since its February release, Tidas has raised a lot of eyebrows. The WSJ wrote an article about it. We spoke with a dozen different banks that wanted Tidas for its device-binding properties and potential reduction to fraud. Meanwhile, we courted mobile app developers directly for trial runs.

Months later, none of this potential has resulted in clients.

Authentication routines are the gateway to your application. The developers we spoke with were unwilling to modify them in the slightest if it risked locking out honest paying customers.

Banks liked the technology, but none would consider purchasing a point solution for a single device (iOS).

So, Tidas becomes open source today. All its code is available at https://github.com/tidas. If you want to try using the Secure Enclave on your own, check out our DIY toolkit: https://github.com/trailofbits/SecureEnclaveCrypto. It resolves all the Apple problems we mentioned above by providing an easy-to-use wrapper around the Secure Enclave API. Integration with your app could not be easier.

If your company is interested in trying it out and wants help, contact us.

It’s time to take ownership of our image

Gloves
Goggles
Checkered body suits

The representation of hackers in stock media spans a narrow band of reality between the laughable and the absurd.

It overshadows the fact that lots of hackers are security professionals. They may dress differently, but they serve a critical function in the economy.

It’s easy to satirize the way the media and Hollywood portray hackers. Dorkly and Daniel J. Solove have excellently skewered many of them.

What’s harder (and more productive) would be a repository of stock assets of real-life hackers wearing, yes, hoodies, but also more formal attire. Some scenes may show dark rooms at night. Others will be in daytime offices.

If the media used the repository, maybe it’d change the public’s perception. Maybe it would show aspiring hackers, boys and girls alike, that we’re just like them, and that if they work hard they could join our ranks.

We’re kicking off this “Hacker Anthology” by contributing stock video footage of our own employees and a hacker typer script that we made last year for fun.

In a few weeks, I’ll be in Las Vegas for Blackhat and Defcon with many of you. If there’s enough interest, I’ll hire a photographer for a few hours to build up our portfolio of stock photos. It should be a fun time. Get in touch with me if you’d be interested in contributing.

—–

I pored through dozens of truly awful and hilarious photos while writing this blog post. Here are some of my favorites that I stumbled upon around the net:

I have met DAOAttacker and can confirm this is what they look like:

Play a hacker on TV, become a hacker in real life:

One of my favorite novelty Twitter accounts:

In some cases, bad stock photography can be physically harmful:

I, too, look intently at screens that are turned off:

If I had a nickel for every time I saw this photo used:

Alex Sotirov schooling the kids on cyberpunk style before the Hackers 15th anniversary party:

What are your favorite hacker stock photos? Leave a comment below.

2000 cuts with Binary Ninja

Using Vector35’s Binary Ninja, a promising new interactive static analysis and reverse engineering platform, I wrote a script that generated “exploits” for 2,000 unique binaries in this year’s DEFCON CTF qualifying round.

If you’re wondering how to remain competitive in a post-DARPA DEFCON CTF, I highly recommend you take a look at Binary Ninja.

Before I share how I slashed through the three challenges — 334 cuts, 666 cuts, and 1,000 cuts — I have to acknowledge the tool that made my work possible.

Compared to my experience with IDA, which is held together with duct tape and prayers, Binary Ninja’s workflow is a pleasure. It does analysis on its own intermediate language (IL), which is exposed through Python and C++ APIs. It’s comparatively simple to query blocks of code and functions, trace execution flow, inspect register states, and perform many other tasks that seem herculean within IDA.

This brought a welcome distraction from the slew of stack-based buffer overflows and unhardened heap exploitation that have come to characterize DEFCON’s CTF.

Since the original point of CTF competitions was to help people improve, I limited my options to what most participants could use. Without Binary Ninja, I would have had to:

  1. Use IDA and IDAPython; a more expensive and unpleasant proposition.
  2. Develop a Cyber Reasoning System; an unrealistic option for most participants.
  3. Reverse the binaries by hand; effectively impossible given the number of binaries.

None of these are nearly as attractive as Binary Ninja.

How Binary Ninja accelerates CTF work

This year’s qualifying challenges were heavily focused on preparing competitors for the Cyber Grand Challenge (CGC). A full third of the challenges were DECREE-based. Several required CGC-style “Proof of Vulnerability” exploits. This year the finals will be based on DECREE so the winning CGC robot can ‘play’ against the human competitors. For the first time in its history, DEFCON CTF is abandoning the attack-defense model.

Challenge #1 : 334 cuts

334 cuts
http://download.quals.shallweplayaga.me/22ffeb97cf4f6ddb1802bf64c03e2aab/334_cuts.tar.bz2
334_cuts_22ffeb97cf4f6ddb1802bf64c03e2aab.quals.shallweplayaga.me:10334

The first challenge, 334 cuts, didn’t offer much in terms of direction. I started by connecting to the challenge service:

$ nc 334_cuts_22ffeb97cf4f6ddb1802bf64c03e2aab.quals.shallweplayaga.me 10334
send your crash string as base64, followed by a newline
easy-prasky-with-buffalo-on-bing

Okay, so it wants us to crash the service. No problem; I already had a crashing input string for that service from a previous challenge.

$ nc 334_cuts_22ffeb97cf4f6ddb1802bf64c03e2aab.quals.shallweplayaga.me 10334
send your crash string as base64, followed by a newline
easy-prasky-with-buffalo-on-bing
YWFhYWFhYWFhYWFhYWFhYWFhYWFsZGR3YWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYWFhYQo=
easy-biroldo-with-mayonnaise-on-multigrain

I wasn’t expecting a second challenge name after the first. I’m guessing I’ll need to crash a few services now. Next, I extracted the tarball.

$ tar jxf 334_cuts.tar.bz2
$ ls 334_cuts
334_cuts/easy-alheira-with-bagoong-on-ngome*
334_cuts/easy-cumberland-with-gribiche-on-focaccia*
334_cuts/easy-kielbasa-with-skhug-on-scone*
334_cuts/easy-mustamakkara-with-pickapeppa-on-soda*
334_cuts/easy-alheira-with-garum-on-pikelet*
334_cuts/easy-cumberland-with-khrenovina-on-white*
334_cuts/easy-krakowska-with-franks-on-pikelet*
334_cuts/easy-mustamakkara-with-shottsuru-on-naan*
...
$ ls 334_cuts | wc -l
334

Hmm, there are 334 DECREE challenge binaries, all with food-related names. Well, time to throw them into Binja, starting with easy-biroldo-with-mayonnaise-on-multigrain. DECREE challenge binaries are secretly ELF binaries (as used on Linux and FreeBSD), so they load just fine with Binja’s ELF loader.

[Image: Binary Ninja has a simple and smooth interface]

This challenge binary is fairly simple and nearly identical to easy-prasky-with-buffalo-on-bing. Each challenge binary is stripped of symbols and has a static stack buffer, a canary, and a stack-based buffer overflow. The canary is copied to the stack and checked against a hard-coded value; if it is overwritten with anything else, the challenge terminates and does not crash, so any overflow has to rewrite the canary with the expected value. It turns out all 334 challenges differ in only four ways:

  1. The size of the buffer you overflow
  2. The canary string and its length
  3. The size of the stack buffer in the recvmsg function
  4. The amount of data the writemsg function processes for each iteration of its write loop

Our crashing string has to exactly overflow the stack buffer and pass the canary check in each of the 334 binaries. It’s best to automate collecting this information. Thankfully, Binja can be used as a headless analysis engine from Python!

We start by importing Binja into our Python script and creating a binary view. The binary view is our main interface to Binja’s analysis.
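
Something like the following sketch (the headless API has changed over time, so treat these names as illustrative rather than exact):

from binaryninja import BinaryViewType

# Load one challenge binary headlessly and wait for auto-analysis to finish.
bv = BinaryViewType['ELF'].open('easy-biroldo-with-mayonnaise-on-multigrain')
bv.update_analysis_and_wait()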

Initially, I was trying to create a generic solution without looking at most of the challenge binaries, so I found the main function programmatically, starting at the entry point and knowing that it made two calls.

From the entry point, there were two calls, and the second was the one I wanted. Similarly, the next function had one call, and following it led to main. All my analysis used Binja’s LowLevelIL.
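
A rough reconstruction of that walk (not my original script; it assumes direct calls with constant destinations, and attribute names vary between Binary Ninja versions):

from binaryninja import LowLevelILOperation

def call_targets(func):
    # Destination addresses of every LLIL_CALL in a function, in order.
    return [il.dest.constant
            for block in func.low_level_il
            for il in block
            if il.operation == LowLevelILOperation.LLIL_CALL]

entry = bv.entry_function
helper = bv.get_function_at(call_targets(entry)[1])  # second call from the entry point
main = bv.get_function_at(call_targets(helper)[0])   # its only call leads to main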

Once we have our reference to main, the real fun begins.

[Image: Binary Ninja in LowLevelIL mode]

The first thing we needed to figure out was the canary string. The approach I took was to collect references to all the call instructions:
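
# A reconstruction of the missing snippet, not the original code:
calls = [il.address
         for block in main.low_level_il
         for il in block
         if il.operation == LowLevelILOperation.LLIL_CALL]
memcmp_call_addr = calls[2]  # third call: the canary memcmp (see below)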

Then I knew that the first call was to a memcpy, the second was to recvmsg, and the third was to the canary memcmp. Small hiccup here: sometimes the compiler would inline the memcpy. This happened when the canary string was less than 16 bytes long.

[Image: This challenge binary has an inline memcpy]

This was a simple fix: I now counted the number of calls in the function and adjusted my offsets accordingly.

To extract the canary and the size of the canary buffer, I used the newly introduced get_parameter_at() function. This function is fantastic: at any call site, it allows you to query the function parameters with respect to calling convention and system architecture. I used it to query all the parameters of the call to memcmp.
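
In outline, pulling the memcmp arguments looked something like this (a sketch; the exact get_parameter_at() signature differs across API versions):

# memcmp(stack_copy, canary, length): parameters 1 and 2 are the interesting ones.
canary_addr = main.get_parameter_at(memcmp_call_addr, None, 1).value
canary_len = main.get_parameter_at(memcmp_call_addr, None, 2).value
canary = bv.read(canary_addr, canary_len)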

Next, I need to know how big the buffer to overflow is. To do this, I once again use get_parameter_at() to query the first argument of the read_buf call. This points to the stack buffer we’ll overflow; its size is the difference between its stack offset and the canary buffer’s stack offset.

It turns out the other two variables were inconsequential. These two bits of information were all we needed to craft our crashing string.
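
Concretely, assembling the crashing string from those two values might look like this (variable names are mine):

import base64

# Fill the buffer up to the canary, supply the expected canary bytes so the
# check passes, then keep writing to clobber the saved return address.
crash = b'A' * buffer_size + canary + b'A' * 64
print(base64.b64encode(crash).decode())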

I glued all this logic together and threw it at the 334 cuts challenge. It prompted me for 10 crashing strings before giving me the flag: baby's first crs cirvyudta.

Challenge #2: 666 cuts

666 cuts
http://download.quals.shallweplayaga.me/e38431570c1b4b397fa1026bb71a0576/666_cuts.tar.bz2
666_cuts_e38431570c1b4b397fa1026bb71a0576.quals.shallweplayaga.me:10666

To start, I once again connected with netcat:

$ nc 666_cuts_e38431570c1b4b397fa1026bb71a0576.quals.shallweplayaga.me 10666
send your crash string as base64, followed by a newline
medium-chorizo-with-chutney-on-bammy

I’m expecting 666 challenge binaries.

$ tar jxf 666_cuts.tar.bz2
$ ls 666_cuts
666_cuts/medium-alheira-with-khrenovina-on-marraqueta*
666_cuts/medium-cumberland-with-hollandaise-on-bannock*
666_cuts/medium-krakowska-with-doubanjiang-on-pita*
666_cuts/medium-newmarket-with-pickapeppa-on-cholermus*
…
$ ls 666_cuts | wc -l
666

Same game as before: I throw a random binary into Binja, and it’s nearly identical to the set from 334. At this point I wonder if the same script will work for this challenge. I modify it to connect to the new service and run it. The new service provides 10 challenge binary names to crash, my script provides 10 crashing strings, and out comes the flag: you think this is the real quaid DeifCokIj.

Challenge #3: 1000 cuts

1000 cuts
http://download.quals.shallweplayaga.me/1bf4f5b0948106ad8102b7cb141182a2/1000_cuts.tar.bz2
1000_cuts_1bf4f5b0948106ad8102b7cb141182a2.quals.shallweplayaga.me:11000

You get the idea, 1000 challenges, same script, flag is: do you want a thousand bandages gruanfir3.

Room For Improvement

Binary Ninja shows a lot of promise, but it still has a ways to go. In future versions I hope to see the addition of SSA and a flexible type system. Once SSA is added to Binary Ninja, it will be easier to identify data flows through the application, tell when types change, and determine when stack slots are reused. It’s also a foundational feature that helps build a decompiler.

Conclusion

From its silky smooth graph view to its intermediate language to its smart integration with Python, Binary Ninja provides a fantastic interface for static binary analysis. With minimal effort, it allowed me to extract data from 2000 binaries quickly and easily.

That’s the bigger story here: It’s possible to enhance our capabilities and combine mechanical efficiency with human intuition. In fact, I’d say it’s preferable. We’re not going to become more secure if we rely on machines entirely. Instead, we should focus on building tools that make us more effective; tools like Binary Ninja.

If you agree, give Binary Ninja a chance. In less than a year of development, it’s already punching above its weight class. Expect more fanboyism from myself and the rest of Trail of Bits — especially as Binary Ninja continues to improve.

My (slightly updated) script is available here. For the sake of history, the original is available here.

Binary Ninja is currently in a private beta and has a public Slack.

Update (25 August 2016): Binary Ninja is now publicly available in two flavors: commercial ($399) and personal ($99). The script presented here uses the “GUI-less processing” feature that’s only available in the commercial edition.

Empire Hacking Turns One

In the year since we started this bi-monthly meetup, we’ve been thrilled by the community that it has attracted. We’ve had some excellent presentations on pragmatic security research, shared our aspirations and annoyances with our work, and made some new friends. It’s a wonderful foundation for an even better year two!

To mark the group’s ‘birthday,’ we took a moment to reflect on all that has happened.

By the numbers:

  • 312 – Number of members on meetup.com
  • 75 – Largest turnout for a single event
  • 46 – Times Jay said “there’s a Python module for that”
  • 785 – Beers served
  • 14 – Superb presentations given
  • 154 – Members on Empire Slacking, our Slack organization

Presentations

June 2015

Offense at Scale

  • Chris Rohlf from Yahoo discussed the effects of scale on vulnerability research, fuzzing and real attack campaigns.

Automatically proving program termination (and more!)

  • Dr. Byron Cook, Professor of Computer Science at University College London, shared research advances that have led to practical tools for automatically proving program termination and related properties.

Cellular Baseband Exploitation

  • Nick DePetrillo, one of our security engineers, explored the challenges of reliable, large-scale cellular baseband exploitation.

August 2015

Exploiting the Nintendo 3DS

  • Luke Arntson, a hobbyist security researcher, reverse engineer, and hardware hacker, highlighted the origins of the Nintendo DS Profile exploit, the obfuscated Gateway browser exploit, and the payloads used by both.

Trail of Bits Cyber Grand Challenge (CGC) Demo

  • Ryan Stortz, one of our security engineers, described the high-level architecture of the system we built to fight and destroy insecure software as part of a DARPA competition, how well it worked, and difficulties we overcame during the development process.

OS X Malware

  • Jay Little, another of our security engineers, gave a code review of Hacking Team’s OS X kernel rootkit in just 10 minutes.

October 2015

The PointsTo Use-After-Free Detector

  • Peter Goodman, our very own dynamic binary translator, presented the design of PointsTo, an LLVM-based static analysis system that automatically finds use-after-free vulnerabilities in large codebases.

Protecting Virtual Function Calls in COTS C++ Binaries

  • Aravind Prakash, an assistant professor in the Dept. of Computer Science at Binghamton University, showed how vfGuard protects virtual function calls in C++ from control subversion attacks.

December 2015

Exploiting Out-of-Order Execution for Covert Cross-VM Communication

  • Sophia D’Antoine, one of our security engineers, demonstrated a novel side channel that exploits out-of-order execution to enable cross-VM communication.

Experiments building and visualizing hypergraphs of security data

  • Richard Lethin, President of Reservoir Labs, discussed data structures and algorithms that enable the representation and analysis of big data (such as security logs) as hypergraphs.

February 2016

Reverse Engineering the Tytera MD380 2-way Radio

  • Travis Goodspeed, a neighbor, explained how the handheld digital radio was jailbroken to allow for patching and firmware extraction, as well as the tricks used to patch the firmware for new features, such as promiscuous mode and a secondary application.

The Mobile Application Security Toolkit (MAST)

  • Sophia D’Antoine addressed the design of the Mobile Application Security Toolkit (MAST), which ties together jailbreak detection, anti-debugging, and anti-reversing in LLVM to protect iOS applications.

April 2016

Putting the Hype in Hypervisor

  • Brandon Falk, a software security researcher, operating system developer, and fuzzing enthusiast, presented various ways of gathering code coverage information without binary modification and how to use code coverage to direct fuzzing.

Crypto Challenges and Fails

  • Ben Agre, a computer security consultant, distinguished successful crypto challenges from failures through the lens of challenges offered by RSA, Telegram, and several smaller examples.

Join us on Empire Slacking

Last September, we created a Slack organization for our members. That’s where we discuss meetups, the latest security news, and our open-source projects. Everyone is welcome. Join through our auto-inviter, and feel free to share the link: https://empireslacking.herokuapp.com/

Big thanks to our event partners

WeWork hosted all but one of our meetups. The April 2016 meetup took place at Digital Ocean. We are very grateful for their hosting.

We would also like to thank the New York C++ Developers Group for co-hosting our October 2015 meetup.

With all that momentum, we’re excited for the year ahead.

Speaking of the future…

Next Meetup: June 7 at 6pm

Marcin Wielgoszewski will be speaking about Doorman, an osquery fleet manager. Doorman makes it easy for network administrators to monitor the security of thousands of devices with osquery. Doorman is open-source and under active development.

Following Marcin, Nick Esposito of Trail of Bits will discuss the design of Tidas, a solution for password-free authentication for iOS software developers. Tidas takes advantage of our unique capability to generate and store ECC keys inside the Secure Enclave. Hear all about how we built Tidas at the next Empire Hacking.

Our June event is hosted at Spotify. Beverages and light food will be provided. Space is limited, so please RSVP on the meetup page.

Don’t miss it!


ProtoFuzz: A Protobuf Fuzzer

At Trail of Bits, we occasionally perform source code audits. A recent one targeted an application that used Protocol Buffers extensively.

Google’s Protocol Buffers (protobuf) is a common method of serializing data, typically found in distributed applications. Protobufs simplify the generally error-prone task of parsing binary data by letting a developer define the type of data, and letting a protobuf compiler (protoc) generate all the serialization and deserialization code automatically.

Fuzzing a service that expects protobuf-encoded structures is not likely to achieve satisfactory code coverage. First, protobuf deserialization code is fairly mature and has seen scrutiny. Second, we are not typically interested in flaws in the protobuf implementation itself; our goal is to target the code behind the protobuf decoding. The aim, then, is to create valid protobuf-encoded structures composed of malicious values.

This library is in sufficiently widespread use that we found it worthwhile to create a generic Protobuf message generator to help with assessments. The message generator is a Python3 library with a simple interface: provided a protobuf definition, it creates Python generators for various permutations of all defined messages. We call it ProtoFuzz.

For data itself, we use the fuzzdb database as the source of values that are generated, but it’s relatively straightforward to define your own collection of values.

Installation

When installing in Ubuntu:

pip install py3-protobuffers
sudo add-apt-repository -y ppa:5-james-t/protobuf-ppa
sudo apt-get -qq update
sudo apt-get -y install protobuf-compiler
git clone --recursive git@github.com:trailofbits/protofuzz.git
cd protofuzz/
python3 setup.py install

Usage

Message generation is handled by ProtobufGenerator instances. Each instance backs a Protobuf-produced class and provides two capabilities: fuzzing strategies and field dependencies.

A fuzzing strategy defines how fields are permuted. So far, two are defined: linear and permutation. A linear strategy creates a stream of protobuf objects equivalent to Python’s zip() across all the values that can be generated; a permutation strategy produces the Cartesian product of those values. The linear strategy can be used to get a sense of the kinds of values that will be generated without creating a multitude of objects, as illustrated below.
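
A rough illustration of the difference in plain Python (the value lists here are stand-ins; protofuzz draws its real values from fuzzdb):

import itertools

houses = [-1, 0, 256]
streets = ['!', 'A' * 1024, '%n']

linear = list(zip(houses, streets))                     # lock-step: 3 objects
permutation = list(itertools.product(houses, streets))  # all combinations: 9 objects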

Field dependencies force the values of some fields to be created from the values of others via any callable object. This is used for fields that probably shouldn’t be fuzzed, like lengths, CRC checksums, magic values, etc.

The entry point into the library is the `protofuzz.protofuzz` module. It defines three functions:

protofuzz.from_description_string()

Create a dict of ProtobufGenerator objects from a string Protobuf definition.

from protofuzz import protofuzz
message_fuzzers = protofuzz.from_description_string("""
    message Address {
     required int32 house = 1;
     required string street = 2;
    }
""")
for obj in message_fuzzers['Address'].permute():
    print("Generated object: {}".format(obj))

Generated object: house: -1
street: "!"

Generated object: house: 0
street: "!"

Generated object: house: 256
street: "!"

protofuzz.from_file()

Create a dict of ProtobufGenerator objects from a path to a .proto file.

from protofuzz import protofuzz
message_fuzzers = protofuzz.from_file('test.proto')
for obj in message_fuzzers['Person'].permute():
    print("Generated object: {}".format(obj))

Generated object: name: "!"
id: -1
email: "!"
phone {
  number: "!"
  type: MOBILE
}

Generated object: name: "!\'"
id: -1
email: "!"
phone {
  number: "!"
  type: MOBILE
}
...

protofuzz.from_protobuf_class()

Create a ProtobufGenerator from an already-loaded Protobuf class.
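
For example (a hypothetical sketch: addressbook_pb2 and Person stand in for whatever module and message your protoc run generated):

from protofuzz import protofuzz
import addressbook_pb2  # protoc-generated module; the name is an assumption

fuzzer = protofuzz.from_protobuf_class(addressbook_pb2.Person)
for obj in fuzzer.permute():
    print("Generated object: {}".format(obj))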

Creating Linked Fields

Some fields shouldn’t be fuzzed. For example, fields like magic values, checksums, and lengths should not be mutated. To this end, protofuzz supports resolving selected field values from other fields. To create a linked field, use ProtobufGenerator’s add_dependency method. Dependencies can also be created between nested objects. For example,

fuzzer = protofuzz.from_description_string('''
message Contents {
  required string header = 1;
  required string body = 2;
}
message Payload {
  required int32 length = 1;
  required Contents contents = 2;
}
''')

fuzzer['Payload'].add_dependency('length', 'contents.body', len)
for idx, obj in zip(range(3), fuzzer['Payload'].permute()):
  print("Generated object: {}".format(obj))

Generated object: length: 1
contents {
  header: "!"
  body: "!"
}

Generated object: length: 2
contents {
  header: "!"
  body: "!\'"
}

Generated object: length: 29
contents {
  header: "!"
  body: "!@#$%%^#$%#$@#$%$$@#$%^^**(()"
}
...

Miscellaneous

Although not related to fuzzing directly, ProtoFuzz also includes a simple logging class, implemented as a ring buffer, to aid in fuzzing campaigns. See protofuzz.log.

Conclusion

We created Protofuzz to assist with security assessments. It gave us the ability to quickly test message-handling code with minimal ramp up.

The library itself is implemented with minimal dependencies, making it appropriate for integration with continuous integration (CI) and testing tools.

If you have any questions, please feel free to reach out at yan@trailofbits.com or file an issue.

The DBIR’s ‘Forest’ of Exploit Signatures

If you follow the recommendations in the 2016 Verizon Data Breach Investigations Report (DBIR), you will expose your organization to more risk, not less. The report’s most glaring flaw is the assertion that the TLS FREAK vulnerability is among the ‘Top 10’ most exploited on the Internet. No experienced security practitioner believes that FREAK is widely exploited. Where else did Verizon get it wrong?

This question undermines the rest of the report. The DBIR is a collaborative effort involving 60+ organizations’ proprietary data. It’s the single best source of information for enterprise defenders, which is why it’s a travesty that its section on vulnerabilities used in data breaches contains misleading data, analysis, and recommendations.

Verizon must ‘be better.’ They have to set a higher standard for the data they accept from collaborators. I recommend they base their analysis on documented data breaches, partner with agent-based security vendors, and include a red team in the review process. I’ll elaborate on these points later.

Digging into the vulnerability data

For the rest of this post, I’ll focus on the DBIR’s Vulnerability section (pages 13-16). There, Verizon uses bad data to discuss trends in software exploits used in data breaches. This section was contributed by Kenna Security (formerly Risk I/O), a vulnerability management startup with $10 million in venture funding. Unlike the rest of the report, nothing in this section is based on data breaches.

[Image: The Kenna Security website claims they authored the Vulnerabilities section in the 2016 DBIR]

It’s easy to criticize the analysis in the Vulnerabilities section. It repeats common tropes long attacked by the security community, like simply counting known vulnerabilities (Figures 11, 12, and 13). Counting vulnerabilities fails to consider the number of assets, their importance to the business, or their impact. There’s something wrong with the underlying data, too.

Verizon notes in the section’s header that portions of the data come from vulnerability scanners. In footnote 8, they share some of the underlying data: a list of the top 10 exploited vulnerabilities as detected by Kenna. According to the report, these vulnerabilities represent 85% of successful exploit traffic on the Internet.

[Image: Footnote 8 lists the vulnerabilities most commonly used in data breaches]

Jericho at OSVDB was the first to pick apart this list of CVEs. He noted that the DBIR never explains how successful exploitation is detected (their subsequent clarification doesn’t hold water), nor what successful exploitation means in the context of a vulnerability scanner. Worse, he points out that among the ‘top 10’ are obscure local privilege escalations, denial of service flaws for Windows 95, and seemingly arbitrary CVEs from Oracle CPUs.

Rory McCune at NCC was the second to note discrepancies in the top ten list. Rory zeroed in on the fact that one of Kenna’s top 10 was the FREAK TLS flaw, which requires a network man-in-the-middle position, a vulnerable server, and a vulnerable client to exploit, plus substantial computational power to pull it off at scale. Additionally, successful exploitation produces no easily identifiable network signature. In the face of all this evidence against widespread exploitation of FREAK, Kenna’s extraordinary claims require extraordinary evidence.

When questioned about similar errors in the 2015 DBIR, Kenna’s Chief Data Scientist Michael Rohytman explained, “the dataset is based on the correlation of ids exploit signatures with open vulns.” Rohytman later noted that disagreements about the data likely stem from differing opinions about the meaning of “successful exploitation.”

These statements show that the vulnerability data is unlike all other data used in the DBIR. Rather than resulting from confirmed data breaches, the “successful exploit traffic” of these “mega-vulns” was synthesized by correlating vulnerability scanner output with intrusion detection system (IDS) alerts. The result of this correlation describes neither the frequency nor the tactics of real exploits used in the wild.

Obfuscating with fake science

Faced with a growing chorus of criticism, Verizon and Kenna published a blog post that ignores critics, attempts to obfuscate their analysis with appeals to authority, substitutes jargon for a counterargument, and reiterates dangerous enterprise security policies from the report.

Kenna’s blog post begins with appeals to authority and ad hominem attacks on critics

The first half of the Kenna blog post moves the goalposts. They present a new top ten list that, in many ways, is even more disconnected from data breaches than the original. Four of the ten are now Denial of Service (DoS) flaws, which do not permit unauthorized access to data. Two more are FREAK, which, if successfully exploited, only permits access to HTTPS traffic. Three are 15-year-old UPnP exploits that only affect Windows XP SP0 and earlier. The final exploit is Heartbleed, which, despite its potentially devastating impact, can be traced to few confirmed data breaches since its discovery.

Kenna’s post does answer critics’ calls for the methodology used to define a ‘successful exploitation’: an “event” where 1) a scanner detects an open vulnerability, 2) an IDS triggers on that vulnerability, and 3) one or more post-exploitation indicators of compromise (IOCs) are triggered, presumably all on the same host. This approach fails to account for the biggest challenge with security products: false positives.
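
To see why false positives matter, consider a toy model of this three-step correlation; every rate below is an assumption for illustration, not a measured value. Even small per-step error rates, multiplied across millions of monitored hosts and repeated scans, manufacture “successful exploitations” out of pure noise.

# Toy model of the three-step correlation; all numbers are assumed.
hosts = 10_000_000        # hosts contributing scanner and IDS data
true_events = 1_000       # actual successful exploitations

scan_fp = 0.05            # scanner flags a vuln that isn't exploitable
ids_fp = 0.03             # IDS signature fires on benign traffic
ioc_fp = 0.02             # "post-exploitation IOC" fires spuriously

# Hosts where all three checks coincidentally misfire in one pass:
noise_per_pass = hosts * scan_fp * ids_fp * ioc_fp
print(noise_per_pass)          # 300 phantom events per pass
print(noise_per_pass * 365)    # ~110,000 per year of daily passes
# The phantom events outnumber the true ones by roughly 100 to 1.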

Kenna is using a synthetic benchmark for successful exploitation based on IDS signatures

Flaws in the data

As mentioned earlier, the TLS FREAK vulnerability is the most prominent error in the DBIR’s Vulnerabilities section. FREAK requires special access as a network Man-in-the-Middle (MITM). Successful exploitation only downgrades the protections from TLS. An attacker would then have to factor a 512-bit RSA modulus to decrypt the session data, an attack that cost US$75 per session around the time the report was in production. After decrypting the result, they’d have just a chat log: no access to either the client or the server. Given all this effort, the low payoff, and the comparative ease and promise of other exploits, it’s impossible that the TLS FREAK flaw was among the ten most exploited vulnerabilities in 2015.
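
A back-of-the-envelope model makes the economics plain. The US$75 figure comes from the paragraph above; the session count is an assumption for illustration.

# Cost to exploit FREAK at any meaningful scale, per the figures above.
cost_per_session = 75       # USD to factor one 512-bit RSA key (2015-2016)
sessions_per_day = 1_000    # assumed traffic at a modest MITM vantage point

daily_cost = cost_per_session * sessions_per_day
print(daily_cost)           # $75,000 per day just to read chat logs,
                            # with no code execution on either endpoint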

The rest of the section’s data is based on correlations between intrusion detection systems and vulnerability scanners. This approach yields questionable results.

All available evidence (threat intel reports, the Microsoft SIR, etc.) shows that real attacks occur on the client side: Office, PDF, Flash, browsers. These vulnerabilities, which figure so prominently in Microsoft data and in DFIR reports about APTs, don’t appear in the DBIR. How can exploit kits and APTs be using Flash as a vector while Kenna’s top 10 fails to list a single Flash vulnerability? Because, by and large, these attacks are not visible to IDS or vulnerability scanners. Kenna’s data comes from sources that cannot see the actual attacks.

Intrusion detection systems are designed to inspect traffic and apply a database of known signatures to the specific protocol fields. If a match appears, most products will emit an alert and move on to the next packet. This “first exit” mode helps with performance, but it can lead to attack shadowing, where the first signature to match the traffic generates the only alert. This problem gets worse when the first signature to match is a false positive.
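
A simplified sketch of first-exit matching shows how shadowing happens. This is a toy matcher, not any vendor’s engine: a noisy signature earlier in the list consumes the alert, and the signature for the real attack never fires.

from collections import namedtuple

Signature = namedtuple("Signature", ["name", "pattern"])

def match_packet(payload, signatures):
    """First-exit matching: alert on the first hit, then move on."""
    for sig in signatures:
        if sig.pattern in payload:
            return sig.name   # later signatures are never evaluated
    return None

signatures = [
    Signature("SNMP-default-community", "public"),  # noisy, often benign
    Signature("exploit-shellcode", "\x90\x90"),     # the actual attack
]

# A payload containing both patterns yields only the benign alert;
# the exploit alert is shadowed by the earlier match.
payload = "public\x90\x90\xcc\xcc"
print(match_packet(payload, signatures))  # -> SNMP-default-community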

The SNMP vulnerabilities reported by Kenna (CVE-2002-0012, CVE-2002-0013) highlight the problem of relying on IDS data. The IDS signatures for these vulnerabilities are often triggered by benign security scans and network discovery tools. It is highly unlikely that a 14-year-old DoS attack would be one of the most exploited vulnerabilities across corporate networks.

Vulnerability scanners are notorious for false positives. These products often depend on credentials to gather system information, but fall back to less-reliable testing methods as a last resort. The UPnP issues reported by Kenna (CVE-2001-0877, CVE-2001-0876) are false positives from vulnerability scanning data. Similar to the SNMP issues, these vulnerabilities are often flagged on systems that are not Windows 98, ME, or XP, and are considered line noise by those familiar with vulnerability scanner output.

It’s unclear how the final step of Kenna’s three-step algorithm, detection of post-exploitation IOCs, supports correlation. In the republished top ten list, four of the vulnerabilities are DoS flaws and two enable HTTPS downgrades. What is a post-exploitation IOC for a DoS? In all of the cases listed, the target host would crash, stop receiving further traffic, and likely reboot. It’s more accurate to interpret post-exploitation IOCs to mean, “more than one IDS signature was triggered.”

The simplest explanation for Kenna’s results? A serious error in the correlation methodology.

Issues with the methodology

Kenna claims to have 200+ million successful exploit events in their dataset. In nearly all the cases we know about, attackers use very few exploits. Duqu duped Kaspersky with just two exploits. Phineas Phisher hacked Hacking Team with just one. Stuxnet stuck with four. The list goes on. If every breach burned even four exploits, 200+ million exploit events would imply 50+ million breaches, and there are not 50+ million breaches in a year. This is a sign of poor data quality. Working back from the three-step algorithm described earlier, I conclude that Kenna counted IDS signatures fired, not successful exploit events.

There are significant limitations to relying on data collected from scanners and IDS. Of the thousands of companies that employ these devices -and that share the resulting data with Kenna- only a marginal number go to the effort of configuring their systems properly. Without that configuration, the resulting data is a useless cacophony of false positives. Aggregating thousands of customers’ noisy datasets is no way to tune into a meaningful signal. But that’s precisely what Kenna asks the DBIR’s readers to accept as the basis for the Vulnerabilities section.

Let’s remember the hundreds of companies, public initiatives, and bots scanning the Internet. Take the University of Michigan’s Scans.io as one example. They scan the entire Internet dozens of times per day. Many of these scans would pass Kenna’s three-part test for a successful exploit. Weighting the results by the number of times an IDS event triggers lets this benign traffic dominate the dataset; unless the results are normalized for scan frequency, these large numbers skew every insight drawn from them.
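
The arithmetic of that skew is stark. Every figure below is an assumption for illustration only.

# How benign internet-wide scanning dominates counts weighted by raw events.
scans_per_day = 30       # internet-wide research scans probing FREAK
sensors = 50_000         # IDS sensors in the shared dataset

benign_events = scans_per_day * sensors * 365
real_attacks = 100       # generous assumption for actual FREAK attacks

print(benign_events)                  # 547,500,000 benign "exploit" events
print(real_attacks / benign_events)   # real signal: about 2e-7 of the total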

Kenna weighted their results by the number of IDS events

Finally, there’s the issue of enterprises running honeypots. A honeypot responds positively to any attempt to hack into it. This would also “correlate” with Kenna’s three-part algorithm. There’s no indication that such systems were removed from the DBIR’s dataset.

In the course of performing research, scientists frequently build models of how they think the real world operates, then back-test them with empirical data. High-quality sources of empirical exploit-incidence data are available from US-CERT, which coordinates security incidents for all US government agencies, and from Microsoft, which has unique data sources like Windows Defender and crash reports from millions of PCs. Of the vulnerabilities in their reports, only Heartbleed appears in Kenna’s list. The rest of the data and recommendations from US-CERT and Microsoft match each other. Neither agrees with Kenna.

Ignore the DBIR’s vulnerability recommendations

“This is absolutely indispensable when we defenders are working together against a sentient attacker.” — Kenna Security

Even if you take the DBIR’s vulnerability analysis at face value, there’s no basis for assuming human attackers behave like bots. Scan and IDS data does not correlate to what real attackers would do. The only way to determine what attackers truly do is to study real attacks.

Kenna Security advocates a dangerous patch strategy based on faulty assumptions

Empirical data disagrees with this approach: attacks spike whenever new exploits and vulnerabilities come out. This misguided recommendation has the potential to cause real, preventable harm. In fact, the Vulnerabilities section of the DBIR advocates this position and then refutes it only one page later.

The DBIR presents faulty information on page 13…

… then directly contradicts itself only one page later

Recommendations from this section fall victim to the same criticisms as pure vulnerability counting: they fail to consider the number of assets, their criticality, the impact of each vulnerability, and how vulnerabilities are used by real attackers. Without acknowledging the source of the data, Verizon and Kenna walk the reader down a dangerous path.

Improvements for the 2017 DBIR

“It would be a shame if we lost the forest for the exploit signatures.”
— Michael Rohytman, Chief Data Scientist, Kenna

This closing remark from Kenna’s rebuttal encapsulates the issue: exploit signatures were used in lieu of data from real attacks. They skipped important steps while collecting data over the past year, jumped to assumptions based on scanners and IDS devices, and appeared to hope that their conclusions would align with what security professionals see on the ground. Above all, this incident demonstrates the folly of applying data science without sufficient input from practitioners. The resulting analysis and recommendations should not be taken seriously.

Kenna’s 2015 contribution to the DBIR received similar criticism, but they didn’t change course for 2016. Instead, Verizon expanded the Vulnerability section and used it as the basis for recommendations. It’s alarming that Verizon and Kenna aren’t applying critical thinking to their own performance. They need to be more ambitious about how they collect and analyze their data.

Here’s how the Verizon 2017 DBIR could improve on its vulnerability reporting:

  1. Collect exploit data from confirmed data breaches. This is the basis for the rest of the DBIR’s data, and the analysis of exploits should be just as rigorous. Contrary to what I was told on Twitter, there is enough data to achieve statistical relevance. With the 2017 report a year away, there’s enough time to correct the processes for collecting and analyzing exploit data. Information about vulnerability scans and IDS signatures doesn’t serve the information security community, nor Verizon’s customers.
  2. That said, if Verizon wants more time to refine the quality of the data they receive from their partners, why not partner with agent-based security vendors in the meantime? Host-based collection is far closer to exploitation than network data. CrowdStrike, FireEye, Bit9, Novetta, Symantec, and others all have agents on hosts that can detect successful exploitation based on process execution and memory inspection, which are far more reliable indicators.
  3. Finally, include a red team in the review process of future reports. It wasn’t until the 2014 DBIR that attackers’ patterns were separated into nine categories, a practice that practitioners had developed years earlier. That insight would have been readily available if the team behind the DBIR had spoken to practitioners who understand how to break and defend systems. Involving a red team in the review process would strengthen the report’s conclusions and recommendations.

Be better

For the 2016 DBIR, Verizon accepted a huge amount of low-quality data from a vendor and reprinted the analysis verbatim. Clearly, no one who understands vulnerabilities was involved in the review process. The DBIR team tossed in some data-science vocabulary for credibility, added a few distracting jokes, and asked for readers’ trust.

Worse, Verizon stands behind the report, rather than acknowledge and correct the errors.

Professionals and businesses around the world depend on this report to make important security decisions. It’s up to Verizon to remain the dependable source for our industry.

I’d like to thank HD Moore, Thomas Ptacek, Grugq, Dan Rosenberg, Mike Russell, Kelly Shortridge, Rafael Turner, the entire team at Trail of Bits, and many others that cannot be cited for their contributions and comments on this blog post.

UPDATE 1:

Rory McCune has posted a follow-up noting that a huge spike in Kenna’s observed exploitation of FREAK occurs at exactly the same time that the University of Michigan was scanning the entire Internet for it. This supports the theory that benign internet-wide scans made it into Kenna’s dataset, where they were scaled by their frequency of occurrence.

Kenna’s data on FREAK overlaps precisely with internet-wide scans from the University of Michigan

Further, an author of FREAK has publicly disclaimed any notion that it was widely exploited.

UPDATE 2:

Rob Graham has pointed out that typical IDS signatures for FREAK do not detect attacks; they only detect TLS clients that offer weak cipher suites. This supports the theory that the underlying data was never inspected, and practitioners were never consulted, before this data was used in the DBIR.

UPDATE 3:

Justin Kennedy has shared exploit data from five years of penetration tests conducted against his clients and noted that FREAK and denial-of-service attacks never once assisted in compromising a target. This supports the theory that the exploitation data in the DBIR distorts the view on the ground.

UPDATE 4:

Threatbutt has immortalized this episode with the release of their Danger Zone Incident Retort (DZIR).

UPDATE 5:

Karim Toubba, Kenna Security CEO, has posted a mea culpa on their blog. He notes that they did not confirm any of their analysis with their own product before delivering it to Verizon for inclusion in the DBIR.

Kenna’s contribution to the DBIR was not validated by their own product

Further, Karim notes that their platform ranks FREAK as a “25 out of 100”; however, even this ranking is orders of magnitude too high based on the available evidence. This raises the question of whether the problems exposed in Kenna’s DBIR analysis extend to their product as well.

Kenna’s product prioritizes FREAK an order of magnitude higher than it likely should be

Finally, I consider the criticisms in this blog post applicable to their entire contribution to the DBIR and not only their “top ten successfully exploited vulnerabilities” list. Attempts to pigeonhole this criticism to the top ten miss the fact that other sections of the report are based on the same or similar data from Kenna.

UPDATE 6:

Verizon published their own mea culpa.

 
