Empire Hacking Turns One

In the year since we started this bi-monthly meetup, we’ve been thrilled by the community that it has attracted. We’ve had some excellent presentations on pragmatic security research, shared our aspirations and annoyances with our work, and made some new friends. It’s a wonderful foundation for an even better year two!

To mark the group’s ‘birthday,’ we took a moment to reflect on all that has happened.

By the numbers:

  • 312 – Number of members on meetup.com
  • 75 – Largest turnout for a single event
  • 46 – Times Jay said “there’s a Python module for that”
  • 785 – Beers served
  • 14 – Superb presentations given
  • 154 – Members on Empire Slacking, a Slack organization for our members

Presentations

June 2015

Offense at Scale

  • Chris Rohlf from Yahoo discussed the effects of scale on vulnerability research, fuzzing and real attack campaigns.

Automatically proving program termination (and more!)

  • Dr. Byron Cook, Professor of Computer Science at University College London, shared research advances that have led to practical tools for automatically proving program termination and related properties.

Cellular Baseband Exploitation

  • Nick DePetrillo, one of our security engineers, explored the challenges of reliable, large-scale cellular baseband exploitation.

August 2015

Exploiting the Nintendo 3DS

  • Luke Arntson, a hobbyist security researcher, reverse engineer, and hardware hacker, highlighted the origins of the Nintendo DS Profile exploit, the obfuscated Gateway browser exploit, and the payloads used by both.

Trail of Bits Cyber Grand Challenge (CGC) Demo

  • Ryan Stortz, one of our security engineers, described the high-level architecture of the system we built to fight and destroy insecure software as part of a DARPA competition, how well it worked, and difficulties we overcame during the development process.

OS X Malware

  • Jay Little, another of our security engineers, gave a code review of Hacking Team’s OS X kernel rootkit in just 10 minutes.

October 2015

The PointsTo Use-After-Free Detector

  • Peter Goodman, our very own dynamic binary translator, presented the design of PointsTo, an LLVM-based static analysis system that automatically finds use-after-free vulnerabilities in large codebases.

Protecting Virtual Function Calls in COTS C++ Binaries

  • Aravind Prakash, an assistant professor in the Dept. of Computer Science at Binghamton University, showed how vfGuard protects virtual function calls in C++ from control subversion attacks.

December 2015

Exploiting Out-of-Order Execution for Covert Cross-VM Communication

  • Sophia D’Antoine, one of our security engineers, demonstrated a novel side channel that exploits out-of-order execution to enable cross-VM communication.

Experiments building and visualizing hypergraphs of security data

  • Richard Lethin, President of Reservoir Labs, discussed data structures and algorithms that enable the representation and analysis of big data (such as security logs) as hypergraphs.

February 2016

Reversing Engineering the Tytera MD380 2-way Radio

  • Travis Goodspeed, a neighbor, explained how the handheld digital radio was jailbroken to allow for patching and firmware extraction, as well as the tricks used to patch the firmware for new features, such as promiscuous mode and a secondary application.

The Mobile Application Security Toolkit (MAST)

  • Sophia D’Antoine addressed the design of the Mobile Application Security Toolkit (MAST) which ties together jailbreak detection, anti-debugging, and anti-reversing in LLVM to address these risks.

April 2016

Putting the Hype in Hypervisor

  • Brandon Falk, a software security researcher, operating system developer, and fuzzing enthusiast, presented various ways of gathering code coverage information without binary modification and how to use code coverage to direct fuzzing.

Crypto Challenges and Fails

  • Ben Agre, a computer security consultant, distinguished successful crypto challenges from failures through the lens of challenges offered by RSA, Telegram, and several smaller examples.

Join us on Empire Slacking

Last September, we created a Slack organization for our members. That’s where we discuss meetups, the latest security news, and our open-source projects. Everyone is welcome. Join through our auto-inviter, and feel free to share the link: https://empireslacking.herokuapp.com/­

Big thanks to our event partners

WeWork hosted all but one of our meetups. The April 2016 meetup took place at Digital Ocean. We are very grateful for their hosting.

We would also like to thank the New York C++ Developers Group for co-hosting our October 2015 meetup.

With all that momentum, we’re excited for the year ahead.

Speaking of the future…

Next Meetup: June 7 at 6pm

Marcin Wielgoszewski will be speaking about Doorman, an osquery fleet manager. Doorman makes it easy for network administrators to monitor the security of thousands of devices with osquery. Doorman is open-source and under active development.

Following Marcin, Nick Esposito of Trail of Bits will discuss the design of Tidas, a solution for password-free authentication for iOS software developers. Tidas takes advantage of our unique capability to generate and store ECC keys inside the Secure Enclave. Hear all about how we built Tidas at the next Empire Hacking.

Our June event is hosted at Spotify. Beverages and light food will be provided. Space is limited, so please RSVP on the meetup page.

Don’t miss it!

Next:

ProtoFuzz: A Protobuf Fuzzer

At Trail of Bits, we occasionally perform source code audits. A recent one targeted an application that used Protocol Buffers extensively.

Google’s Protocol Buffers (protobuf) is a common method of serializing data, typically found in distributed applications. Protobufs simplify the generally error-prone task of parsing binary data by letting a developer define the type of data, and letting a protobuf compiler (protoc) generate all the serialization and deserialization code automatically.

Fuzzing a service expecting protobuf-encoded structures directly is not likely to achieve satisfactory code coverage. First, protobuf deserialization code is fairly mature and has seen scrutiny. Second, we are not typically interested in flaws in the protobuf implementation itself. Our main goal is to target the code behind protobuf decoding. Our aim becomes to create valid protobuf-encoded structures that are composed of malicious values.

This library is in sufficiently widespread use that we found it worthwhile to create a generic Protobuf message generator to help with assessments. The message generator is a Python3 library with a simple interface: provided a protobuf definition, it creates Python generators for various permutations of all defined messages. We call it ProtoFuzz.

For data itself, we use the fuzzdb database as the source of values that are generated, but it’s relatively straightforward to define your own collection of values.

Installation

When installing in Ubuntu:

pip install py3-protobuffers
sudo add-apt-repository -y ppa:5-james-t/protobuf-ppa
sudo apt-get -qq update
sudo apt-get -y install protobuf-compiler
git clone --recursive git@github.com:trailofbits/protofuzz.git
cd protofuzz/
python3 setup.py install

Usage

Message generation is handled by ProtobufGenerator instances. Each instance backs a Protobuf-produced class. This class has two functions: create fuzzing strategies and create field dependencies.

A fuzzing strategy defines how fields are permuted. So far just two are defined: linear and permutation. A linear strategy creates a stream of protobuf objects that are the equivalent of Python’s zip() across all values that can be generated. A permutation produces a stream that is a cartesian product of all the values that can be generated. A linear() permutation can be used to get a sense of the kinds of values that will be generated without creating a multitude of values.

Field dependencies force the values of some fields to be created from the values of others via any callable object. This is used for fields that probably shouldn’t be fuzzed, like lengths, CRC checksums, magic values, etc.

The entry point into the library is the `protofuzz.protofuzz` module. It defines three functions:

protofuzz.from_description_string()

Create a dict of ProtobufGenerator objects from a string Protobuf definition.

from protofuzz import protofuzz
message_fuzzers = protofuzz.from_description_string("""
    message Address {
     required int32 house = 1;
     required string street = 2;
    }
""")
for obj in message_fuzzers['Address'].permute():
    print("Generated object: {}".format(obj))
Generated object: house: -1
street: "!"

Generated object: house: 0
street: "!"

Generated object: house: 256
street: "!"

protofuzz.from_file()

Create a dict of ProtobufGenerator objects from a path to a .proto file.

from protofuzz import protofuzz
message_fuzzers = protofuzz.from_file('test.proto')
for obj in message_fuzzers['Person'].permute():
    print("Generated object: {}".format(obj))
Generated object: name: "!"
id: -1
email: "!"
phone {
  number: "!"
  type: MOBILE
}

Generated object: name: "!\'"
id: -1
email: "!"
phone {
  number: "!"
  type: MOBILE
}
...

protofuzz.from_protobuf_class()

Create a ProtobufGenerator from an already-loaded Protobuf class.

Creating Linked Fields

Some fields shouldn’t be fuzzed. For example, fields like magic values, checksums, and lengths should not be mutated. To this end, protofuzz supports resolving selected field values from other fields. To create a linked field, use ProtobufGenerator’s add_dependency method. Dependencies can also be created between nested objects. For example,

fuzzer = protofuzz.from_description_string('''
message Contents {
  required string header = 1;
  required string body = 2;
}
message Payload {
  required int32 length = 1;
  required Contents contents = 2;
}
''')

fuzzer['Payload'].add_dependency('length', 'contents.body', len)
for idx, obj in zip(range(3), fuzzer['Payload'].permute()):
  print("Generated object: {}".format(obj))
Generated object: length: 1
contents {
  header: "!"
  body: "!"
}

Generated object: length: 2
contents {
  header: "!"
  body: "!\'"
}

Generated object: length: 29
contents {
  header: "!"
  body: "!@#$%%^#$%#$@#$%$$@#$%^^**(()"
}
...

Miscellaneous

Although not related to fuzzing directly, Protofuzz also includes a simple logging class that’s implemented as a ring buffer to aid in fuzzing campaigns. See protobuf.log.

Conclusion

We created Protofuzz to assist with security assessments. It gave us the ability to quickly test message-handling code with minimal ramp up.

The library itself is implemented with minimal dependencies, making it appropriate for integration with continuous integration (CI) and testing tools.

If you have any questions, please feel free to reach out at yan@trailofbits.com or file an issue.

The DBIR’s ‘Forest’ of Exploit Signatures

If you follow the recommendations in the 2016 Verizon Data Breach Investigations Report (DBIR), you will expose your organization to more risk, not less. The report’s most glaring flaw is the assertion that the TLS FREAK vulnerability is among the ‘Top 10’ most exploited on the Internet. No experienced security practitioner believes that FREAK is widely exploited. Where else did Verizon get it wrong?

This question undermines the rest of the report. The DBIR is a collaborative effort involving 60+ organizations’ proprietary data. It’s the single best source of information for enterprise defenders, which is why it’s a travesty that its section on vulnerabilities used in data breaches contains misleading data, analysis, and recommendations.

Verizon must ‘be better.’ They have to set a higher standard for the data they accept from collaborators. I recommend they base their analysis on documented data breaches, partner with agent-based security vendors, and include a red team in the review process. I’ll elaborate on these points later.

Digging into the vulnerability data

For the rest of this post, I’ll focus on the DBIR’s Vulnerability section (pages 13-16). There, Verizon uses bad data to discuss trends in software exploits used in data breaches. This section was contributed by Kenna Security (formerly Risk I/O), a vulnerability management startup with $10 million in venture funding. Unlike the rest of the report, nothing in this section is based on data breaches.

image01.png

The Kenna Security website claims they authored the Vulnerabilities section in the 2016 DBIR

It’s easy to criticize the analysis in the Vulnerabilities section. It repeats common tropes long attacked by the security community, like simple counting of known vulnerabilities (Figures 11, 12, and 13). Counting vulnerabilities fails to consider the number of assets, their importance to the business, or their impact. There’s something wrong with the underlying data, too.

Verizon notes in the section’s header that portions of the data come from vulnerability scanners. In footnote 8, they share some of the underlying data, a list of the top 10 exploited vulnerabilities as detected by Kenna. According to the report, these vulnerabilities represent 85% of successful exploit traffic on the Internet.

image03.png

Footnote 8 lists the vulnerabilities most commonly used in data breaches

Jericho at OSVDB was the first to pick apart this list of CVEs. He noted that the DBIR never explains how successful exploitation is detected (their subsequent clarification doesn’t hold water), nor what successful exploitation means in the context of a vulnerability scanner. Worse, he points out that among the ‘top 10’ are obscure local privilege escalations, denial of service flaws for Windows 95, and seemingly arbitrary CVEs from Oracle CPUs.

Rory McCune at NCC was the second to note discrepancies in the top ten list. Rory zeroed in on the fact that one of Kenna’s top 10 was the FREAK TLS flaw which requires network man-in-the-middle, a vulnerable server, a vulnerable client to exploit, and substantial computational power to pull it off at scale. Additionally, successful exploitation produces no easily identifiable network signature. In the face of all this evidence against the widespread exploitation of FREAK, Kenna’s extraordinary claims require extraordinary evidence.

When questioned about similar errors in the 2015 DBIR, Kenna’s Chief Data Scientist Michael Rohytman explained, “the dataset is based on the correlation of ids exploit signatures with open vulns.” Rohytman later noted that disagreements about the data likely stem from differing opinions about the meaning of “successful exploitation.”

These statements show that the vulnerability data is unlike all other data used in the DBIR. Rather than the result of a confirmed data breach, the “successful exploit traffic” of these “mega-vulns” was synthesized by correlating vulnerability scanner output with intrusion detection system (IDS) alerts. The result of this correlation does not describe the frequency nor tactics of real exploits used in the wild.

Obfuscating with fake science

Faced with a growing chorus of criticism, Verizon and Kenna published a blog post that ignores critics, attempts to obfuscate their analysis with appeals to authority, substitutes jargon for a counterargument, and reiterates dangerous enterprise security policies from the report.

image05.png

Kenna’s blog post begins with appeals to authority and ad hominem attacks on critics

The first half of the Kenna blog post moves the goalposts. They present a new top ten list that, in many ways, is even more disconnected from data breaches than the original. Four of the ten are now Denial of Service (DoS) flaws which do not permit unauthorized access to data. Two more are FREAK which, if successfully exploited, only permit access to HTTPS traffic. Three are 15-year-old UPnP exploits that only affect Windows XP SP0 and lower. The final exploit is Heartbleed which, despite potentially devastating impact, can be traced to few confirmed data breaches since its discovery.

Kenna’s post does answer critics’ calls for the methodology used to define a ‘successful exploitation’: an “event” where 1) a scanner detects an open vulnerability, 2) an IDS triggers on that vulnerability, and 3) one or more post-exploitation indicators of compromise (IOCs) are triggered, presumably all on the same host. This approach fails to account for the biggest challenge with security products: false positives.

image02.png

Kenna is using a synthetic benchmark for successful exploitation based on IDS signatures

Flaws in the data

As mentioned earlier, the TLS FREAK vulnerability is the most prominent error in the DBIR’s Vulnerabilities section. FREAK requires special access as a network Man-in-the-Middle (MITM). Successful exploitation only downgrades the protections from TLS. An attacker would then have to factor a 512-bit RSA modulus to decrypt the session data; an attack that cost US$75 for each session around the time the report was in production. After decrypting the result, they’d just have a chat log; no access to either the client nor server devices. Given all this effort, the low pay-off, and the comparative ease and promise of other exploits, it’s impossible that the TLS FREAK flaw would have been one of the ten most exploited vulnerabilities in 2015.

The rest of the section’s data is based on correlations between intrusion detection systems and vulnerability scanners. This approach yields questionable results.

All available evidence (threat intel reports, the Microsoft SIR, etc.) show that real attacks occur on the client side: Office, PDF, Flash, Browsers, etc. These vulnerabilities, which figure so prominently in Microsoft data and DFIR reports about APTs, don’t appear in the DBIR. How come exploit kits and APTs are using Flash as a vector, yet Kenna’s top 10 fails to list a single Flash vulnerability? Because, by and large, these sorts of attacks are not visible to IDS nor vulnerability scanners. Kenna’s data comes from sources that cannot see the actual attacks.

Intrusion detection systems are designed to inspect traffic and apply a database of known signatures to the specific protocol fields. If a match appears, most products will emit an alert and move on to the next packet. This “first exit” mode helps with performance, but it can lead to attack shadowing, where the first signature to match the traffic generates the only alert. This problem gets worse when the first signature to match is a false positive.

The SNMP vulnerabilities reported by Kenna (CVE-2002-0012, CVE-2002-0013) highlight the problem of relying on IDS data. The IDS signatures for these vulnerabilities are often triggered by benign security scans and network discovery tools. It is highly unlikely that a 14-year old DoS attack would be one of the most exploited vulnerabilities across corporate networks.

Vulnerability scanners are notorious for false positives. These products often depend on credentials to gather system information, but fall back to less-reliable testing methods as a last resort. The UPnP issues reported by Kenna (CVE-2001-0877, CVE-2001-0876) are false positives from vulnerability scanning data. Similar to the SNMP issues, these vulnerabilities are often flagged on systems that are not Windows 98, ME, or XP, and are considered line noise by those familiar with vulnerability scanner output.

It’s unclear how the final step of Kenna’s three-step algorithm, detection of post-exploitation IOCs, supports correlation. In the republished top ten list, four of the vulnerabilities are DoS flaws and two enable HTTPS downgrades. What is a post-exploitation IOC for a DoS? In all of the cases listed, the target host would crash, stop receiving further traffic, and likely reboot. It’s more accurate to interpret post-exploitation IOCs to mean, “more than one IDS signature was triggered.”

The simplest explanation for Kenna’s results? A serious error in the correlation methodology.

Issues with the methodology

Kenna claims to have 200+ million successful exploit events in their dataset. In nearly all the cases we know about, attackers use very few exploits. Duqu duped Kaspersky with just two exploits. Phineas Phisher hacked Hacking Team with just one exploit. Stuxnet stuck with four exploits. The list goes on. There are not 50+ million breaches in a year. This is a sign of poor data quality. Working back from the three-step algorithm described earlier, I conclude that Kenna counted IDS signatures fired, not successful exploit events.

There are some significant limitations to relying on data collected from scanners and IDS. Of the thousands of companies that employ these devices -and who share the resulting data with Kenna- a marginal number go through the effort of configuring their systems properly. Without this configuration, the resulting data is a useless cacophony of false positives. Aggregating thousands of customers’ noisy datasets is no way to tune into a meaningful signal. But that’s precisely what Kenna asks the DBIR’s readers to accept as the basis for the Vulnerabilities section.

Let’s remember the hundreds of companies, public initiatives, and bots scanning the Internet. Take the University of Michigan’s Scans.io as one example. They scan the entire Internet dozens of times per day. Many of these scans would trigger Kenna’s three-part test to detect a successful exploit. Weighting the results by the number of times an IDS event triggers yields a disproportionate number of events. If the results aren’t normalized for another factor, the large numbers will skew results and insights.

Screen Shot 2016-05-05 at 11.57.35 PM

Kenna weighted their results by the number of IDS events

Finally, there’s the issue of enterprises running honeypots. A honeypot responds positively to any attempt to hack into it. This would also “correlate” with Kenna’s three-part algorithm. There’s no indication that such systems were removed from the DBIR’s dataset.

In the course of performing research, scientists frequently build models of how they think the real world operates, then back-test them with empirical data. High-quality sources of empirical exploit incidence data are available from US-CERT, which coordinates security incidents for all US government agencies, and Microsoft, which has unique data sources like Windows Defender and crash reports from millions of PCs. From their reports, only the Heartbleed vulnerability appears in Kenna’s list. The rest of the data and recommendations from US-CERT and Microsoft match. Neither of them agree with Kenna.

Ignore the DBIR’s vulnerability recommendations

“This is absolutely indispensable when we defenders are working together against a sentient attacker.” — Kenna Security

Even if you take the DBIR’s vulnerability analysis at face value, there’s no basis for assuming human attackers behave like bots. Scan and IDS data does not correlate to what real attackers would do. The only way to determine what attackers truly do is to study real attacks.

image07.png

image04.png

Kenna Security advocates a dangerous patch strategy based on faulty assumptions

Empirical data disagrees with this approach. Whenever new exploits and vulnerabilities come out, attacks spike. This misguided recommendation has the potential to cause real, preventable harm. In fact, the Vulnerabilities section of the DBIR both advocates this position and then refutes it only one page later.

image06.png

The DBIR presents faulty information on page 13…

image00.png

… then directly contradicts itself only one page later

Recommendations from this section fall victim to many of the same criticisms of pure vulnerability counting: they fail to consider the number of assets, the criticality of them, the impact of vulnerabilities, and how they are used by real attackers. Without acknowledging the source of the data, Verizon and Kenna walk the reader down a dangerous path.

Improvements for the 2017 DBIR

“It would be a shame if we lost the forest for the exploit signatures.”
— Michael Rohytman, Chief Data Scientist, Kenna

This closing remark from Kenna’s rebuttal encapsulates the issue: exploit signatures were used in lieu of data from real attacks. They skipped important steps while collecting data over the past year, jumped to assumptions based on scanners and IDS devices, and appeared to hope that their conclusions would align with what security professionals see on the ground. Above all, this incident demonstrates the folly of applying data science without sufficient input from practitioners. The resulting analysis and recommendations should not be taken seriously.

Kenna’s 2015 contribution to the DBIR received similar criticism, but they didn’t change for 2016. Instead, Verizon expanded the Vulnerability section and used it for the basis of recommendations. It’s alarming that Verizon and Kenna aren’t applying critical thinking to their own performance. They need to be more ambitious with how they collect and analyze their data.

Here’s how the Verizon 2017 DBIR could improve on its vulnerability reporting:

  1. Collect exploit data from confirmed data breaches. This is the basis for the rest of the DBIR’s data. Their analysis of exploits should be just as rigorous. Contrary to what I was told on Twitter, there is enough data to achieve statistical relevance. With the 2017 report a year away, there’s enough time to correct the processes of collecting and analyzing exploit data. Information about vulnerability scans and IDS signatures don’t serve the information security community, nor their customers.
  2. That said, if Verizon wants to take more time to refine the quality of the data they receive from their partners, why not partner with agent-based security vendors in the meantime? Host-based collection is far closer to exploits than network data. CrowdStrike, FireEye, Bit9, Novetta, Symantec and more all have agents on hosts that can detect successful exploitation based on process execution and memory inspection; more reliable factors.
  3. Finally, include a red team in the review process of future reports. It wasn’t until the 2014 DBIR that attackers’ patterns were separated into nine categories; a practice that practitioners had developed years earlier. That technique would have been readily available if the team behind the DBIR had spoken to practitioners who understand how to break and defend systems. Involving a red team in the review process would strengthen the report’s conclusions and recommendations.

Be better

For the 2016 DBIR, Verizon accepted a huge amount of low-quality data from a vendor. They reprinted the analysis verbatim. Clearly, no one who understands vulnerabilities was involved in the review process. The DBIR team tossed in some data-science vocab for credibility, and a few distracting jokes, and asked for readers’ trust.

Worse, Verizon stands behind the report, rather than acknowledge and correct the errors.

Professionals and businesses around the world depend on this report to make important security decisions. It’s up to Verizon to remain the dependable source for our industry.

I’d like to thank HD Moore, Thomas Ptacek, Grugq, Dan Rosenberg, Mike Russell, Kelly Shortridge, Rafael Turner, the entire team at Trail of Bits, and many others that cannot be cited for their contributions and comments on this blog post.

UPDATE 1:

Rory McCune has posted a followup where he notes a huge spike in Kenna’s observed exploitation of FREAK occurs at exactly the same time that the University of Michigan was scanning the entire internet for it. This supports the theory that benign internet-wide scans made it into Kenna’s data set where they were scaled by their frequency of occurrence.

Kenna's data on FREAK overlaps precisely with internet-wide scans from the University of Michigan

Kenna’s data on FREAK overlaps precisely with internet-wide scans from the University of Michigan

Further, an author of FREAK has publicly disclaimed any notion that it was widely exploited.

UPDATE 2:

Rob Graham has pointed out that typical IDS signatures for FREAK do not detect attacks but rather only detect TLS clients that offer weak cipher suites. This supports the theory that the underlying data was not inspected nor were practitioners consulted prior to using this data in the DBIR.

UPDATE 3:

Justin Kennedy has shared exploit data from five years of penetration tests conducted against his clients and noted that FREAK and Denial of Service attacks never once assisted compromising a target. This supports the theory that exploitation data in the DBIR distorts the view on the ground.

UPDATE 4:

Threatbutt has immortalized this episode with the release of their Danger Zone Incident Retort (DZIR).

UPDATE 5:

Karim Toubba, Kenna Security CEO, has posted a mea culpa on their blog. He notes that they did not confirm any of their analysis with their own product before delivering it to Verizon for inclusion in the DBIR.

What is the point of Kenna's contribution if it was not backed by their insights?

Kenna’s contribution to the DBIR was not validated by their own product

Further, Karim notes that their platform ranks FREAK as a “25 out of 100”, however, even this ranking is orders of magnitude too high based on the available evidence. This introduces the question of whether the problems exposed in Kenna’s analysis from the DBIR extend into their product as well.

Screen Shot 2016-05-12 at 1.40.36 PM

Kenna’s product prioritizes FREAK an order of magnitude higher than it likely should be

Finally, I consider the criticisms in this blog post applicable to their entire contribution to the DBIR and not only their “top ten successfully exploited vulnerabilities” list. Attempts to pigeonhole this criticism to the top ten miss the fact that other sections of the report are based on the same or similar data from Kenna.

UPDATE 6:

Verizon published their own mea culpa.

 

Hacker Handle Bounty

It’s time to close this chapter of our industry’s past. To distance ourselves from the World Wrestling Federation and comic book superheroes.

Hulk Hogan or Terry Bollea?

We’re talking about hacker handles: Dildog, Thomas Dullien, Matt Blaze etc.

When the Internet was young and fancy-free, hacker handles had their place. They afforded anonymity and supported the curious to explore the limits of this new frontier. They felt cool. Mysterious.

No more. When you’re at a security conference how does it feel when you refer to a hacker by her handle? Maybe a little awkward?

What’s more, Google’s Project Zero has shown that handles are dangerous when leaked.

“I retired my hacker handle in 2006. It wasn’t easy. I worried I’d feel exposed at conferences. Instead I felt a lightness almost immediately after going through with it. I was free! From the constraints of an identity that didn’t really fit me any longer. Free from a box that I’d built around myself without realizing it. If I’d known how good it would feel, I would’ve done it much earlier.”
– Alexander “Solar Eclipse” Sotirov, Co-Founder & CTO

Come out of the Shadows

Today, we’re launching a bounty on hacker handles. To participate, you reject your handle in the comments section of this post.

The bounty on offer: an exclusive invitation to an Italian dinner preceding the next Empire Hacking event, to be catered by yours truly. Expect tasty goodness.

Rewards Program

Once you retire your handle, you can earn points in two ways. First, you can post old tweets of yours that turned out to be wrong. The more erroneous, the more points you’ll earn. Second, you can refer your friends. Public outing is encouraged. It’s for the common good.

If, after three months, no one has seen you using your handle, and you’ve earned enough points, you’ll receive a black hat challenge coin.

Please note, if you retire your handle and change to another one later, you’ll owe us money. The fine will correspond to the number of points you’ve accrued so far, and the severity of the offending handle.

We’re calling for the retirement of these handles to help us launch the program:

  • WeldPond
  • Dildog
  • drag0rn
  • Mudge
  • Thomas Dullien
  • Gynvael Coldwind
  • Matt Blaze
  • Redpantz
  • Ian Beer
  • j00ru
  • lcamtuf
  • Simple Nomad
  • Invisigoth
  • Jolly
  • Rattle
  • Decius
  • Space Rogue
  • Solar Designer
  • HDM
  • Dark Tangent
  • Taylor Swift
  • JDuck
  • Travis Normandy

Join our bounty program

Nominate yourself, hacker friends and peers who still use handles. None will be turned away.

The Problem with Dynamic Program Analysis

Developers have access to tools like AddressSanitizer and Valgrind that will tell them when the code that they’re running accesses uninitialized memory, leaks memory, or uses memory after it’s been freed. Despite the availability of these excellent tools, memory bugs still persist, still get shipped to users, and still get exploited in the wild.

Most of today’s bug-finding finding tools are dynamic: they identify bugs in programs while those programs are running. This is great because all programs have massive test suites that exercise every line of code… right? Wrong. Large test suites are the exception, not the rule. Test suites definitely help find and reduce bugs, but bugs still get through.

Perhaps the solution is to pay to have your code audited by professionals. More eyes on your code is a good thing™, but the underlying issue remains. Analyses run inside the heads of experts are still “dynamic”: thinking through every code path is just not tractable.

So dynamic analyses can miss bugs because they can’t check every possible program path. What can check every possible program path?

Finding use-after-frees in millions of lines of code

We use static analysis to analyze millions of lines of code, without ever running the code. The analysis technique, called data-flow tracking, enables us to analyze and summarize properties about every possible program path. This solves the aforementioned problem of missing bugs that occur when certain program paths are not exercised.

How does an analysis that sees everything actually work? Below we describe the 1-2-3 of an actual whole-program static analysis tool that we developed and regularly use. The tool, PointsTo, finds and reports on potential use-after-free bugs in large codebases.

Step 1: Convert to LLVM bitcode

PointsTo operates on the LLVM bitcode representation of a program. We chose LLVM bitcode because it is a convenient intermediate representation for performing program analyses. Unsurprisingly, the first stage of our analysis pipeline converts a program’s source code into an LLVM bitcode database. We use an internal tool named CompInfo to produce these databases. An alternative, open-source tool for doing something similar is whole-program-llvm.

image04

Step 2: Create the data-flow graph

The key idea behind PointsTo is to analyze how pointers to allocated objects flow through the program. What we care about are assignments to and copies of pointers, pointer dereferences, and frees of pointers. These operations on pointers are represented using a data-flow graph.

Four steps to creating a data-flow graph

The most interesting step in the process is the why and how of transforming allocations and frees into special assignments. The “why” is that this transformation lets us repurpose an existing program analysis to find paths from FREE definitions to pointer dereferences. The how is more subtle: how does PointsTo know that it should change “new A” into an ALLOC and “delete a” into a FREE?

Imagine a hypothetical embedded system where programs are starved for memory and so the natural choice is to use a custom memory allocator called ration_memory. We created a Python modelling language to feed PointsTo information about higher-level function behaviors. Our modelling scripts tell PointsTo that “new A” returns a new object, and so we can use it to say the same thing about ration_memory.

Segue: Hidden data flows

The transformation from source code into a data flow graph looked pretty simple, but that was because the source code we started with was simple. It had no function calls, and more importantly, it had no function pointers or method calls! What happens if callback below is a function pointer? What happens if callback frees x?

int *x = malloc(4);
callback(x);
*x += 1;

This is the secret sauce and namesake of PointsTo: we perform a context- and path-sensitive pointer analysis that tells us which function pointers point to which functions and when. Altogether, we can produce an error report that follows x through callback and back again.

Step 3: Dénouement

It’s time to report potential errors for expert analysis. PointsTo searches through the data-flow graph, looking for flows from assignments to FREE down to dereferences. These flows are converted into a program slice of the source code lines, showing the path that execution needs to follow in order to produce a use-after-free. Here’s an example program slice of a real bug:

LightHTTPD Use-After-Free

When describing this system to compiler folks, the usual first question is: but what about false-positives? What if we get a report about a use-after-free and it isn’t one? Here is where the priorities of program analysis for compilers and for vulnerabilities diverge.

False-positives in a compiler analysis can introduce bugs, and so compilers are usually conservative. That is, they trade false-positives for false-negatives. They might miss some optimization opportunities because they can’t prove something, but at least the program will be compiled correctly *cough*.

For vulnerability analysis, this is a bad trade. False-positives in a vulnerability analysis are inconvenient, but they’re a drop in the ocean when millions of lines of code need to be looked at. False-negatives, however, are unacceptable. A false-negative is a bug that is missed and might make it to production. A tool that always finds the bug and sometimes warns you about sketchy but correct code is an investment that saves time and money during code audits.

In summary, we conclude

Analyzing programs for bugs is hard. Industry best-practices like having extensive test suites should be followed. Developers should regularly run their programs through dynamic analysis tools to pick the low-hanging fruit. But more importantly, developers should understand that test suites and dynamic analyses are not a panacea. Bugs have a nasty habit of hiding behind rarely executed code paths. That’s why all paths need to be looked at. That’s why we made PointsTo.

PointsTo was a topic of discussion at a recent Empire Hacking, a bi-monthly meetup in NYC. The talk I gave there includes more information about the design and implementation of PointsTo and, for curious readers, the slides and video are reproduced below. We hope to release more videos from Empire Hacking in the future.

PointsTo was originally produced for Cyber Fast Track and we would like to thank DARPA for funding our work. Consultants at Trail of Bits use PointsTo and other internal tools for application security reviews. Contact us if you’re interested in a detailed audit of your code supported by tools like PointsTo and our CRS.

 

Apple can comply with the FBI court order

Earlier today, a federal judge ordered Apple to comply with the FBI’s request for technical assistance in the recovery of the San Bernadino gunmen’s iPhone 5C. Since then, many have argued whether these requests from the FBI are technically feasible given the support for strong encryption on iOS devices. Based on my initial reading of the request and my knowledge of the iOS platform, I believe all of the FBI’s requests are technically feasible.

The FBI’s Request

In a search after the shooting, the FBI discovered an iPhone belonging to one of the attackers. The iPhone is the property of the San Bernardino County Department of Public Health where the attacker worked and the FBI has permission to search it. However, the FBI has been unable, so far, to guess the passcode to unlock it. In iOS devices, nearly all important files are encrypted with a combination of the phone passcode and a hardware key embedded in the device at manufacture time. If the FBI cannot guess the phone passcode, then they cannot recover any of the messages or photos from the phone.

There are a number of obstacles that stand in the way of guessing the passcode to an iPhone:

  • iOS may completely wipe the user’s data after too many incorrect PINs entries
  • PINs must be entered by hand on the physical device, one at a time
  • iOS introduces a delay after every incorrect PIN entry

As a result, the FBI has made a request for technical assistance through a court order to Apple. As one might guess, their requests target each one of the above pain points. In their request, they have asked for the following:

  1. [Apple] will bypass or disable the auto-erase function whether or not it has been enabled;
  2. [Apple] will enable the FBI to submit passcodes to the SUBJECT DEVICE for testing electronically via the physical device port, Bluetooth, Wi-Fi, or other protocol available on the SUBJECT DEVICE; and
  3. [Apple] will ensure that when the FBI submits passcodes to the SUBJECT DEVICE, software running on the device will not purposefully introduce any additional delay between passcode attempts beyond what is incurred by Apple hardware.

In plain English, the FBI wants to ensure that it can make an unlimited number of PIN guesses, that it can make them as fast as the hardware will allow, and that they won’t have to pay an intern to hunch over the phone and type PIN codes one at a time for the next 20 years — they want to guess passcodes from an external device like a laptop or other peripheral.

As a remedy, the FBI has asked for Apple to perform the following actions on their behalf:

[Provide] the FBI with a signed iPhone Software file, recovery bundle, or other Software Image File (“SIF”) that can be loaded onto the SUBJECT DEVICE. The SIF will load and run from Random Access Memory (“RAM”) and will not modify the iOS on the actual phone, the user data partition or system partition on the device’s flash memory. The SIF will be coded by Apple with a unique identifier of the phone so that the SIF would only load and execute on the SUBJECT DEVICE. The SIF will be loaded via Device Firmware Upgrade (“DFU”) mode, recovery mode, or other applicable mode available to the FBI. Once active on the SUBJECT DEVICE, the SIF will accomplish the three functions specified in paragraph 2. The SIF will be loaded on the SUBJECT DEVICE at either a government facility, or alternatively, at an Apple facility; if the latter, Apple shall provide the government with remote access to the SUBJECT DEVICE through a computer allowed the government to conduct passcode recovery analysis.

Again in plain English, the FBI wants Apple to create a special version of iOS that only works on the one iPhone they have recovered. This customized version of iOS (*ahem* FBiOS) will ignore passcode entry delays, will not erase the device after any number of incorrect attempts, and will allow the FBI to hook up an external device to facilitate guessing the passcode. The FBI will send Apple the recovered iPhone so that this customized version of iOS never physically leaves the Apple campus.

As many jailbreakers are familiar, firmware can be loaded via Device Firmware Upgrade (DFU) Mode. Once an iPhone enters DFU mode, it will accept a new firmware image over a USB cable. Before any firmware image is loaded by an iPhone, the device first checks whether the firmware has a valid signature from Apple. This signature check is why the FBI cannot load new software onto an iPhone on their own — the FBI does not have the secret keys that Apple uses to sign firmware.

Enter the Secure Enclave

Even with a customized version of iOS, the FBI has another obstacle in their path: the Secure Enclave (SE). The Secure Enclave is a separate computer inside the iPhone that brokers access to encryption keys for services like the Data Protection API (aka file encryption), Apple Pay, Keychain Services, and our Tidas authentication product. All devices with TouchID (or any devices with A7 or later A-series processors) have a Secure Enclave.

When you enter a passcode on your iOS device, this passcode is “tangled” with a key embedded in the SE to unlock the phone. Think of this like the 2-key system used to launch a nuclear weapon: the passcode alone gets you nowhere. Therefore, you must cooperate with the SE to break the encryption. The SE keeps its own counter of incorrect passcode attempts and gets slower and slower at responding with each failed attempt, all the way up to 1 hour between requests. There is nothing that iOS can do about the SE: it is a separate computer outside of the iOS operating system that shares the same hardware enclosure as your phone.

The Hardware Key is stored in the Secure Enclave in A7 and newer devices

The Hardware Key is stored in the Secure Enclave in A7 and newer devices

As a result, even a customized version of iOS cannot influence the behavior of the Secure Enclave. It will delay passcode attempts whether or not that feature is turned on in iOS. Private keys cannot be read out of the Secure Enclave, ever, so the only choice you have is to play by its rules.

Passcode delays are enforced by the Secure Enclave in A7 and newer devices

Passcode delays are enforced by the Secure Enclave in A7 and newer devices

Apple has gone to great lengths to ensure the Secure Enclave remains safe. Many consumers became familiar with these efforts after “Error 53” messages appeared due to 3rd party replacement or tampering with the TouchID sensor. iPhones are restricted to only work with a single TouchID sensor via device-level pairing. This security measure ensures that attackers cannot build a fraudulent TouchID sensor that brute-forces fingerprint authentication to gain access to the Secure Enclave.

For more information about the Secure Enclave and Passcodes, see pages 7 and 12 of the iOS Security Guide.

The Devil is in the Details

“Why not simply update the firmware of the Secure Enclave too?” I initially speculated that the private data stored within the SE was erased on updates, but I now believe this is not true. Apple can update the SE firmware, it does not require the phone passcode, and it does not wipe user data on update. Apple can disable the passcode delay and disable auto erase with a firmware update to the SE. After all, Apple has updated the SE with increased delays between passcode attempts and no phones were wiped.

If the device lacks a Secure Enclave, then a single firmware update to iOS will be sufficient to disable passcode delays and auto erase. If the device does contain a Secure Enclave, then two firmware updates, one to iOS and one to the Secure Enclave, are required to disable these security features. The end result in either case is the same. After modification, the device is able to guess passcodes at the fastest speed the hardware supports.

The recovered iPhone is a model 5C. The iPhone 5C lacks TouchID and, therefore, lacks a Secure Enclave. The Secure Enclave is not a concern. Nearly all of the passcode protections are implemented in software by the iOS operating system and are replaceable by a single firmware update.

The End Result

There are still caveats in these older devices and a customized version of iOS will not immediately yield access to the phone passcode. Devices with A6 processors, such as the iPhone 5C, also contain a hardware key that cannot ever be read. This key is also “tangled” with the phone passcode to create the encryption key. However, there is nothing that stops iOS from querying this hardware key as fast as it can. Without the Secure Enclave to play gatekeeper, this means iOS can guess one passcode every 80ms.

Passcodes can only be guessed once every 80ms

Passcodes can only be guessed once every 80ms with or without the Secure Enclave

Even though this 80ms limit is not ideal, it is a massive improvement from guessing only one passcode per hour with unmodified software. After the elimination of passcode delays, it will take a half hour to recover a 4-digit PIN, hours to recover a 6-digit PIN, or years to recover a 6-character alphanumeric password. It has not been reported whether the recovered iPhone uses a 4-digit PIN or a longer, more complicated alphanumeric passcode.

Festina Lente

Apple has allegedly cooperated with law enforcement in the past by using a custom firmware image that bypassed the passcode lock screen. This simple UI hack was sufficient in earlier versions of iOS since most files were unencrypted. However, since iOS 8, it has become the default for nearly all applications to encrypt their data with a combination of the phone passcode and the hardware key. This change necessitates guessing the passcode and has led directly to this request for technical assistance from the FBI.

I believe it is technically feasible for Apple to comply with all of the FBI’s requests in this case. On the iPhone 5C, the passcode delay and device erasure are implemented in software and Apple can add support for peripheral devices that facilitate PIN code entry. In order to limit the risk of abuse, Apple can lock the customized version of iOS to only work on the specific recovered iPhone and perform all recovery on their own, without sharing the firmware image with the FBI.


For more information, please listen to my interview with the Risky Business podcast.

  • Update 1: Apple has issued a public response to the court order.
  • Update 2: Software updates to the Secure Enclave are unlikely to erase user data. Please see the Secure Enclave section for further details.
  • Update 3: Reframed “The Devil is in the Details” section and noted that Apple can equally subvert the security measures of the iPhone 5C and later devices that include the Secure Enclave via software updates.

Tidas: a new service for building password-less apps

For most mobile app developers, password management has as much appeal as a visit to the dentist. You do it because you have to, but it is annoying and easy to screw up, even when using standard libraries or protocols like OAUTH.

Your users feel the same way. Even if they know to use strong passwords and avoid reusing them, mobile devices make this difficult. Typing a strong p@4sw0r%d on a tiny keyboard is a hassle.

Today, we’ve got some good news for app developers. We’re releasing a simple SDK drop-in for iOS apps called Tidas. This SDK allows you to completely replace passwords with a simple touch to log into an app. It relies on strong encryption built into iOS to validate the user’s identity without the need to transmit any private information outside of the device.

Tidas: Make passwords obsolete

Tidas: Make passwords obsolete

When your app is installed on a new device, the Tidas SDK generates a unique encryption key identifying the user and registers it with the Tidas backend. This key is stored on the device in the iOS Secure Enclave chip and is protected by Touch ID, requiring the user to use their fingerprint to sign into the app. Signing in generates a digitally signed session token that your backend can pass to the Tidas backend to verify the user’s identity. The entire authentication process is handled by the SDK and does not require you to touch any of the user’s sensitive data.

Start a free trial to see our source code

Preserve user privacy and minimize your liability

Tidas is built by Trail of Bits, a security research company dedicated to advancing Internet security. From the ground up, we have designed Tidas to be safe even in the worst case scenario. If the Tidas backend or your servers were breached tomorrow, the attackers would gain nothing: they would find no passwords and no personally identifying information.

That’s because Tidas doesn’t store any sensitive data outside the mobile device. A user’s encryption keys never leave their device’s Secure Enclave chip and cannot be compromised even if the app, the device or the server are hacked.

Tidas doesn’t collect or have any access to the user’s fingerprints either. That’s Touch ID’s job: it collects users’ fingerprints for authentication and stores them in the Secure Enclave, so they remain completely opaque to Tidas. By design, Tidas protects user’s privacy, and you never have to worry about how to handle their login credentials.

Free access until March 31, 2016

Tidas is free until March 31st. There’s no billing, and no usage limits. Just sign up to gain unfettered access to Tidas’s API. We’ll also provide all the Ruby middleware and Objective-C client libraries you need.

Go to passwordlessapps.com now and download the Tidas SDK now!

Read more about the fast-approaching death of the password in the Wall St Journal and our press release about Tidas this morning.

Join us at Etsy’s Code as Craft

We’re excited to announce that Sophia D’Antoine will be the next featured speaker at Etsy’s Code as Craft series on Wednesday, February 10th from 6:30-8pm in NYC.

What is Code as Craft?

Etsy Code as Craft events are a semi-monthly series of guest speakers who explore a technical topic or computing trend, sharing both conceptual ideas and practical advice. All talks will take place at the Etsy Labs on the 7th floor at 55 Washington Street in beautiful Brooklyn (Suite 712). Come see an awesome speaker and take a whirl in our custom photo booth. We hope to see you at an upcoming event!

In her talk, Sophia will discuss the latest in iOS security and the cross-section between this topic and compiler theory. She will discuss one of our ongoing projects, MAST, a mobile application security toolkit for iOS, which we discussed on this blog last year. Since then, we’ve continued to work on it, added new features, and transitioned it from a proof-of-concept DARPA project to a full-fledged mobile app protection suite.

What’s the talk about?

iOS applications have become an increasingly popular targets for hackers, reverse engineers, and software pirates. In this presentation, we discuss the current state of iOS attacks, review available security APIs, and reveal why they are not enough to defend against known threats. For high-risk applications, novel protections that go beyond those offered by Apple are required. As a solution, we discuss the design of the Mobile Application Security Toolkit (MAST) which ties together jailbreak detection, anti-debugging, and anti-reversing in LLVM to address these risks.

We hope to see you there. If you’re interested in attending, follow this link to register. MAST is still a beta product, so if you’re interested in using it on your own iOS applications after seeing this talk, contact us directly.

Software Security Ideas Ahead of Their Time

Every good security researcher has a well-curated list of blogs they subscribe to. At Trail of Bits, given our interest in software security and its intersections with programming languages, one of our favorites is The Programming Language Enthusiast by Michael Hicks.

Our primary activity is to describe and discuss research about — and the practical development and use of — programming languages and programming tools (PLPT). PLPT is a core area of computer science that bridges high-level algorithms/designs and their executable implementations. It is a field that has deep roots in mathematical logic and the theory of computation but also produces practical compilers and analysis tools.

One of our employees and PhD student at UMD, Andrew Ruef, has written a guest blog post for the PL Enthusiast on the topic of software security ideas that were ahead of their time.

As researchers, we are often asked to look into a crystal ball. We try to anticipate future problems so that work we begin now will address problems before they become acute. Sometimes, a researcher foresees a problem and its possible solution, but chooses not to pursue it. In a sense, she has found, and discarded, an idea ahead of its time.

Recently, a friend of Andrew’s pointed him to a 20-year-old email exchange on the “firewalls” mailing list that blithely suggests, and discards, problems and solutions that are now quite relevant, and on the cutting edge of software security research. The situation is both entertaining and instructive, especially in that the ideas are quite squarely in the domain of programming languages research, but were not considered by PL researchers at the time (as far as we know).

Read on for a deep dive into the firewalls listserv from 1995, prior to the publication of Smashing the Stack for Fun and Profit, as a few casual observers correctly anticipate the next 20 years of software security researchers.

If you enjoyed Andrew’s post on the PL Enthusiast, we recommend a few others that touch upon software security:

Hacking for Charity: Automated Bug-finding in LibOTR

At the end of last year, we had some free time to explore new and interesting uses of the automated bug-finding technology we developed for the DARPA Cyber Grand Challenge. While the rest of the competitors are quietly preparing for the CGC Final Event, we can entertain you with tales of running our bug-finding tools against real Linux applications.

Like many good stories, this one starts with a bet:

image01

On November 4, 2014, Thomas Ptacek (of Starfighter) bet Matthew Green (of Johns Hopkins) that libotr, a popular library used in secure messaging software, would have a high severity (e.g. remote code execution, information disclosure) bug in the next 12 months. Here at Trail of Bits, we like a good wager, especially when the proceeds go to charity. And we just happened to have an automated bug-finding system laying around, itching for something to do. The temptation was too much to resist: we decided to use our automated bug-finding system from the Cyber Grand Challenge to look for bugs in libotr.

Before we go on, we should state that this was not a security audit. We simply wanted to test how well our automated bug-finding system works on real Linux software and maybe win some money for charity.

We successfully enhanced our bug-finding system to support the libotr library and tested it extensively. Our system confirmed that there were no critical bugs in code paths that we tested; since no one else reported any bugs, the bet ended with Matthew Green donating $1000 to Partners in Health.

Read on to discover the challenges encrypted communications systems present for automated testing, how we solved them, and our testing methodology. Of course, just because our system didn’t find bugs in libotr does not mean that libotr is bug-free.

Background

The automated bug-finding system, known as a Cyber Reasoning System (CRS), that we built for the Cyber Grand Challenge operates on binary code for the DECREE operating system. While DECREE is based on Linux, it differs considerably from plain Linux. DECREE has no signals, no shared memory, no threads, no sockets, no files, and only seven system calls. This means that DECREE is not binary or source compatible with Linux libraries like libotr.

After weighing our options, we decided the easiest and fastest way to test libotr was to port it to DECREE, instead of adding full Linux support to our CRS. We attempted the port in a generic manner, to ensure we could use the lessons learned to test future Linux software.

To port libotr, we had to solve two major issues: shared library dependencies (libotr depends on libgpgerror and libgcrypt) and libc support. We used LLVM to solve both problems at once. First, we used whole-program-llvm to compile libotr and all dependencies to LLVM bitcode. We then merged all the shared libraries at the bitcode level, and aggressively optimized the resulting bitcode. In one move, we eliminated the need for shared libraries, and drastically reduced the amount of libc we’d have to implement, because unused libc calls were optimized out of the resulting bitcode. To build a libc that works on DECREE, we combined libc implementations from the challenge binaries, stubbed functions that don’t make sense in DECREE, and created new implementations based on DECREE calls where appropriate.

Automated Testing

Encrypted communications applications are, by design, difficult to automatically audit. This makes perfect sense: if an automated system can reason how ciphertext relates to plaintext, the encrypted communication system is already broken. These systems are also difficult to audit by random testing (e.g. fuzzing), because recipients will verify the integrity of every message. Typically when testing encrypted systems, the encryption is turned off (or data is manipulated prior to encryption or after decryption). We wanted to simulate testing a black-box binary, so we did not modify libotr in any way. Instead, we thought the best path was to make our CRS simulate a man-in-the-middle (MITM) attack. Because we tested an unmodified libotr, our CRS could not effectively attack code past message integrity checks. However, there was still much in the way of attack surface: message control data, headers, and possibility of flaws in decryption/authentication code. The problem was that our CRS was not designed to MITM. We instead architected the test application (not libotr) to be easier to attack, which results in the convoluted architecture below.

image02

The CRS acts as a man-in-the-middle between two applications communicating using libotr.

Creating the test application was more difficult than porting libotr to DECREE. The porting process was fairly straightforward and took about two weeks. The sample application took a bit longer, and was a much more frustrating experience: the official libotr distribution has no sample code, and the documentation leaves a lot to be desired.

Our testing was limited by the features of libotr exercised by our sample application (for instance, it doesn’t use SMP), and by the unusual test application we created. Additionally, some vulnerabilities may only occur after decryption, and modification of encrypted and authenticated data will never trigger these bugs.

Results

The results of testing libotr are very encouraging. We ran 48 Xeon CPUs for 24 hours against our libotr sample application, and did not identify any memory safety violations.

image00

The CRS acts as a man-in-the-middle between two applications communicating using libotr.

This negative result does not mean that libotr is bug free. We only tested a subset of libotr, and there are considerable parts that our CRS never audited. The lack of obvious bugs is however a very good sign.

Conclusion

The timeframe of the libotr bet has expired without any reported high severity vulnerabilities. We audited parts of libotr with our automated bug-finding tools, and also didn’t find memory corruption vulnerabilities. In the process of setting up this test, we learned how to port Linux applications to DECREE and verified that our CRS can identify real bugs in Linux programs. Better documentation, tests, and sample applications that exercise every libotr feature would simplify both automated and manual auditing. For this experiment we constrained ourselves to an unmodified libotr. We are planning a future test where we modify libotr to enable easier automated testing.

Follow

Get every new post delivered to your Inbox.

Join 5,511 other followers