How safe browsing fails to protect user privacy

Recently, security researchers discovered that Apple was sending safe browsing data to Tencent for all Chinese users. This revelation has brought the underlying security and privacy guarantees of the safe browsing protocol under increased scrutiny. In particular, safe browsing claims to protect users by providing them with something called k-anonymity. In this post we’ll show that this definition of privacy is somewhat meaningless in the context of safe browsing. Before jumping into why k-anonymity is insufficient, let’s take a look at how the safe browsing protocol works.

How does safe browsing work?

A while back, Google thought it would be useful to leverage their knowledge of the web to provide a database of malicious sites to web clients. Initially, users would submit their IP address and the URL in question to Google, which would subsequently be checked against a malware database. This scheme was called the Lookup API and is still available today. However, people quickly became uneasy about surrendering so much of their privacy. This reasonable concern led to the development of the current safe browsing scheme, called the Update API, which is used by both Google and Tencent.

Screen Shot 2019-10-29 at 6.19.23 PM

Safe browsing Update API flowchart from Gerbet et al

At a high level, Google maintains a list of malicious URLs and their 256-bit hashes. To save on bandwidth when distributing this list to browsers, they only send out a 32-bit prefix of each hash. This means that when the user’s browser checks whether or not a site is malicious, they might get a false positive, since many 256-bit URL hashes will contain the same 32-bit prefix. To remedy this, if a match occurs, the browser will send the 32-bit prefix in question to Google and get a full list of URLs whose 256-bit hash contains that prefix. To recap, the safe browsing Update API goes through the following steps every time a user tries to visit a new URL:

  1. Browser hashes the URL and checks it against the (local) list of 32-bit prefixes.
  2. If there is a match, the browser sends Google the 32-bit prefix.
  3. Google then sends back all blacklisted URLs whose 256-bit hash contains the prefix.
  4. If there is a match in the updated list, the browser issues a warning to the user.

Intuitively, this safe browsing scheme is more private than the original, since Google only learns the 32-bit prefix of each potentially malicious site the user visits. Indeed, Google has argued that it provides users with something called k-anonymity—a metric used by privacy analysts to determine how unique a piece of identifying information is. Let’s take a look at what exactly k-anonymity is, and to what extent safe browsing satisfies this definition.

What is k-anonymity

Traditionally, k-anonymity has been used to remove personal identifying information from a database. At a high level, it involves removing pieces of sensitive data until everyone in the dataset “looks like” at least k other people with respect to certain traits. For example, if we had the table of medical records in Figure 1, we could modify the Name and Age fields to make patients 2-anonymous with respect to Name, Age, Gender, and State, as shown in Figure 2.

Name Age Gender State Disease
Candace 26 Female NY Flu
Richard 23 Male CA Flu
Robin 15 Nonbinary NY None
Alyssa 52 Female CA Cancer
Omar 29 Male CA None
Kristine 17 Nonbinary NY Cancer
Emily 58 Female CA Heart-disease
Jasmine 20 Female NY None

Figure 1

Anyone trying to use this data will get all the info they need to perform some kind of statistical analysis (ostensibly your name won’t really affect your likelihood of getting TB), but anyone represented in the database will “look like” at least two other people. That way, an attacker trying to de-anonymize people will fail because they won’t be able to distinguish between the three entries that look alike. Obviously the bigger the k, the better; if the attacker is an insurance provider trying to use medical data as a way to justify hiking up your premiums, a database providing 2-anonymity might not be enough. In Figure 2, if the insurance company knows you are represented in the database and a 52 year old woman from California, they will be able to deduce that you have either cancer or heart disease and start charging you more money.

Name Age Gender State Disease
* 20-30 Female NY Flu
* 20-30 Male CA Flu
* 10-20 Nonbinary NY None
* 50-60 Female CA Cancer
* 20-30 Male CA None
* 10-20 Nonbinary NY Cancer
* 50-60 Female CA Heart-disease
* 20-30 Female NY None

Figure 2

Back to safe browsing: We can see how restricting the URLs viewable by Google or Tencent to a 32-bit hash prefix renders both providers unable to distinguish your request from any other URL with that same hash prefix. The question is, how many such collisions can we expect to occur? In 2015 Gerbet et al concluded that each prefix occurred roughly 14757 times across the web, implying that users of safe browsing can expect their browsing data to be roughly 14757-anonymous. In other words, Google/Tencent only knows that the website you attempted to go to is contained in a set of size approximately 14757, which is likely big enough to contain plenty of generic websites that would not be politically (or commercially) very revealing.

Why k-anonymity fails to protect users

Despite the fact that safe browsing satisfies the definition of k-anonymity, it actually isn’t very hard for Google to recover your browsing data from these queries. This insecurity is due to the fact that the privacy guarantees of k-anonymity don’t account for Google’s ability to cross-reference multiple safe browsing queries and narrow down which specific website corresponds with a given 32-bit prefix.

As a first example of such an attack, recall that Google uses cookies for safe browsing and can therefore see when multiple queries come from the same IP address. Now, suppose both www.amazon.com and https://www.amazon.com/gp/cart/view.html?ref_=nav_cart share a 32-bit hash prefix with two different malicious websites. If a user visits both Amazon and their shopping cart in rapid succession, Google will receive both 32-bit hash prefixes. Since it is unlikely the user visited two unrelated malicious websites back to back, Google can be reasonably sure that they were shopping on Amazon. This attack only works when two related websites both share a 32-bit hash prefix with malicious websites and the user visits them within a small window of time. However, this example already shows that k-anonymity isn’t so useful when faced with an adversary capable of correlating multiple queries.

The situation is actually much worse, though, because the safe browsing protocol often forces users to submit several highly correlated URLs at the same time. These multiple queries occur because many URLs are in some sense “too specific” for Google to keep track of, since malicious websites can create new URLs faster than Google can report each specific one. To account for this, users submit a set of “URL decompositions” for each query, which is constructed by progressively stripping pieces of the URL off. For example, when visiting the URL http://a.b.c/1/2 the browser would simultaneously check the following URLs against the safe browsing database:

  1. a.b.c/1/2
  2. a.b.c/1/
  3. a.b.c/
  4. b.c/
  5. b.c/1/

Using the full URL decomposition allows Google to provide users with a high degree of confidence that the website they are visiting isn’t malicious. However, submitting many highly correlated 32-bit hash prefixes all at once ruins much of the privacy originally provided by the safe browsing protocol. If Google receives the 32-bit hash prefix corresponding to both a.b.c/ and a.b.c/1 in the same query, it can easily de-anonymize the user’s browsing data. Even in circumstances where submitting multiple prefixes doesn’t lead to full URL recovery, there may be sufficient information to learn sensitive details about the user’s browsing history.

To bring things down to earth, Gerbet et al. showed that this URL decomposition attack can be used to identify a user’s taste in pornography—something an oppressive government would certainly be interested in monitoring. Even worse, since these malware databases aren’t made public, it’s difficult to determine if hash prefixes haven’t been adversarially included to track users. While we may trust Google not to rig the database so they can determine when users visit pro-Hong Kong websites, it’s easy to imagine Tencent taking advantage of this vulnerability.

Looking forward

While safe browsing undoubtedly provides real security benefits to users, it fails to protect them from companies or governments that want to monitor their browsing habits. Unfortunately, this lack of privacy is obscured by the fact that the protocol provides users with a weak, but technically precise, notion of anonymity. As both the technology and legal communities rally around tools like k-anonymity and differential privacy, we need to keep in mind that they are not one-size-fits-all techniques, and systems that theoretically satisfy such definitions might provide no real meaningful privacy when actually deployed.

If you’re considering using tools like differential privacy or k-anonymity in your application, our cryptographic services team can help you navigate the inherent subtleties of these systems. Whether you need help with protocol design or auditing an existing codebase, our team can help you build something your users will be able to trust.

Grace Hopper Celebration (GHC) 2019 Recap

by Rachel Cipkins, Stevens Institute of Technology, Hoboken, NJ

A few weeks ago I had the inspiring experience of attending the annual Grace Hopper Celebration (GHC), the world’s largest gathering of women in technology. Over four days in Orlando, Florida, GHC hosted a slew of workshops and presentations, plus a massive career fair with over 450 vendors (by comparison, Black Hat USA had about 300 vendors this year). The conference attracted over 25,000 attendees from 83+ countries—a mix of women in technology as well as a significant population of male allies. And at a time when still only 25% of computing jobs are held by women, it was encouraging to see GHC garner vast coverage from the technology industry.

As an aspiring security professional, I was also pleased that GHC 2019 had extensive coverage of cybersecurity (at least 15 talks!), even though the conference was not dedicated to it. It was uplifting to represent women in computing from Stevens along with 20 of my peers (half engineering majors and half computer science majors). We took full advantage of the jam-packed conference program, which included highlights like:

  • Keynote speakers focused on the importance of giving women the courage to explore careers in technology.
  • The PitchHER competition, where female entrepreneurs leading early-stage startups compete for prize money.
  • Open Source day, where participants can contribute to open source projects with the goal of making a positive impact on the world.
  • Diversity group meetups for organizations such as Lesbians Who Tech and Black Girls Who Code.
  • 20 workshop tracks scheduled in between events such as Artificial Intelligence, Emerging Technology, and Open Source.

That awesome, unexpected cybersecurity focus

Naturally, I took in as many of the cybersecurity events as I could. I especially appreciated the featured academic presentations from three ACM Award Winning Research in Cybersecurity winners—I highly recommend digging into their research papers:

  • Aisha Ali-Gombe, Towson University, Leveraging Software Instrumentation for Android Security Assessment. Ali-Gombe outlined ways to maliciously use Android security flaws, such as methods for spying on users, privilege escalation, data exploitation, and botnet. She also described her process for combating these threats, which included developing DroidScraper and AspectDroid to eliminate malicious Android use without low-level modification of the OS/framework.
  • Alexandra Dmitrienko, Institute of Information Security, ETH Zurich, Secure Free-Floating Car Sharing For Offline Cars. Free-floating car sharing is efficient, cost effective, and reduces traffic and air pollution. However, it does not support offline cars, and modifications have to be made to the car in order for the service to work. Dmitrienko created a solution that uses RFID chips in off-the-shelf cars to overcome these issues. She also implemented two-factor authentication on mobile platforms for enhanced security.
  • Shagufta Mehnaz, Purdue University, A Fine-Grained Approach for Anomaly Detection in File System Accesses. Mehnaz’ research explored how access control mechanisms cannot always prevent authorized users from misusing or stealing sensitive data. She created fine-grained profiles of file system users, then used these profiles to detect anomalous access to file systems.

I attended most of the Security/Privacy track workshops, which covered all knowledge levels. One of my peers, a third-year doctoral student in cybersecurity research, agreed the workshops covered a wide range of skill levels and presented the problems well, but thought the time constraints (~1 hour) didn’t allow for enough background information to help everyone understand solutions. Still, it was a generous attempt to allow non-technical attendees to benefit as well.

THE place to find female tech talent

GHC is a well-known recruiting ground for women in technology; the conference is scheduled to coincide with the start of the recruiting season for new graduates and summer internships. The career fair spans the duration of the conference and includes companies as well as universities.

One of the most useful things GHC provides is a resume database companies and universities can use to recruit potential candidates before the conference starts. Recruiters can also rent space in the interview hall to host interviews at the conference, and larger companies tend to host private networking events at offsite locations. Almost everyone in my group received an offer from a company they interviewed with, or was contacted to schedule an interview with a company not holding interviews at the conference.

Nearly every company I talked to at the career fair perked up at the mention of cybersecurity. Despite the high demand for security expertise, though, there were few security sponsors. Notable security companies that were recruiting at the conference included MITRE, Red Balloon, and Crowdstrike, but only one of them conducted interviews on-site. Hopefully, we’ll see more security companies at GHC next year.

You want to be there

Attending the Grace Hopper Celebration was an empowering experience for me as a woman in technology and especially as a woman in security. The coverage of cybersecurity-related workshops and talks at GHC is definitely growing, and it has proven to be a great place to recruit female security talent.

It was also just an incredible and unique experience to spend the week surrounded by amazing women. Everyone I spoke to described the energy at the conference as “electric.” My peers and I were able to make new professional connections, learn about new technology trends, and bring back new ideas to the various women in STEM groups at Stevens. We left the conference with a sense of courage to continue to grow our careers in technology and to inspire other women to pursue the same path.

GHC 2020 will be held in Orlando, Florida, Sept. 29–Oct. 2, and it is definitely an event worth considering for both general attendees and sponsors.

Formal Analysis of the CBC Casper Consensus Algorithm with TLA+

by Anne Ouyang, Piedmont Hills High School, San Jose, CA

As a summer intern at Trail of Bits, I used the PlusCal and TLA+ formal specification languages to explore Ethereum’s CBC Casper consensus protocol and its Byzantine fault tolerance. This work was motivated by the Medium.com article Peer Review: CBC Casper by Muneeb Ali, Jude Nelson, and Aaron Blankstein, which indicated that CBC Casper’s liveness properties impose stricter Byzantine fault thresholds than those suggested by the safety proof. To explore this, I specified the Casper the Friendly Binary consensus protocol in TLA+.

As expected, it was impossible to determine finality without a Byzantine fault threshold of less than one-third of the total weight of all validators, which is consistent with Lamport et al.’s original paper on the Byzantine Generals Problem. However, as long as that condition was satisfied, CBC Casper appeared to function as intended.

First, a Little Background

The CBC Casper (Correct By Construction) protocol is a PoS consensus protocol designed to one day be used for Ethereum. However, the current state of CBC Casper hasn’t undergone much formal analysis, and its under-specification poses a challenge to the introduction of Ethereum 2.0.

In a distributed network, individual nodes, called validators, use the Minimal CBC Casper family of consensus protocols to make consistent decisions. The protocol’s five parameters are the names and weights of validators, fault tolerant threshold, consensus values, and estimator. Validators make individual decisions based on their current state, which is defined in terms of the messages received, which have three components:

  • Sender: the name of a validator sending the message
  • Estimate: a subset of values in the consensus values set
  • Justification: a set of messages (state) received to arrive at the estimate

As a result, the sending and receiving of messages can be defined as state transitions.

Equivocation occurs when a validator sends a pair of messages that do not have each other in their justifications. The future states are all reachable states, where the equivocation fault is less than the fault tolerant threshold. Finality is defined by safety oracles, which detect when a certain property holds for all future states.

TLA+ and PlusCal

TLA+ is a formal specification language describing behaviors with states and state transitions. The specifications and state machine models can be checked with the TLC model checker. TLC performs a breadth-first search over the defined state transitions and checks for the properties that need to be satisfied.

The Byzantine Generals Problem

For context, The Byzantine Generals Problem is an analogy for decision-making in distributed systems in the presence of malicious individuals. The problem states that a commanding general must send an order to n-1 lieutenant generals such that 1) all of them obey the same order and 2) if the commanding general is loyal, then every loyal lieutenant obeys the order. A solution to the problem must ensure that all the loyal generals agree on the same plan, and a small number of traitors cannot lead the generals to a bad plan.

Now Let’s Dive into the Process

To start, I specified the definitions in the TLA+ language and defined the states and state transitions in terms of sets and set operations. Figure 1 shows a snippet of the specification.

anne_1

Figure 1: Snippet of specification in TLA+

I specified the message relay in PlusCal, which is more pseudocode-like and can be automatically transpiled to TLA+.

anne_2

Figure 2: The CBC Casper message relay specified in PlusCal.

The assumption is that all the messages are sent and received successfully without time delay. The specification does not include message authentication, because it is assumed that the validators can verify the authenticity and source of messages. In this implementation, equivocating validators behave such that they take different subsets of all received messages, use these subsets to obtain the estimates, and send different messages to different validators.

The TLC model checker checks that eventually a clique will be found where all the non-Byzantine nodes can mutually see each other agreeing with a certain estimate in messages, and cannot see each other disagreeing. When this condition is met, finality is achieved, and the model checker should terminate without giving errors, as shown in Figure 3.

anne_3

Figure 3: TLC model checking results.

When finality cannot be reached, the model checker will detect that a temporal property has been violated, as shown in Figure 4.

anne_4

Figure 4: TLC errors.

The temporal property checks for the existence of a clique of validators with a total weight greater than:

CodeCogsEqn (2)

All the validators in the clique satisfy the following:

  • None of the validators are Byzantine-faulty. They do not send equivocating messages.
  • All the validators can mutually see each other agreeing. Each validator has exactly one latest message, and they have the same estimate.
  • None of the validators can mutually see each other disagreeing. A validator has a latest message in the other’s justification, but has a new latest message that doesn’t have the same estimate as the latest message of the other validator.

In practice, the failure to achieve finality would mean the blockchain stops producing new blocks, since there is no guarantee that a transaction (i.e. an estimate) will not be reversed in the future.

Conclusion

We find that when the fault-tolerant threshold is set to greater than one-third of the total weight of all the validators, e-cliques cannot be formed. This means finality can be reached when the fault-tolerant threshold is less than one-third of the total weight. This is consistent with “The Byzantine Generals Problem,” which states that the problem “is solvable if and only if more than two-thirds of the generals are loyal.” Intuitively, for the protocol using the clique oracle, a higher fault-tolerant threshold would cause there to be too few validators to form a clique.

While the CBC Casper specification provides a proof of safety, it does not address liveness properties. For example, the CBC Casper blockchain protocol may encounter the problem in which no more blocks can be finalized. Further work is needed in specifying liveness, because finality is a liveness problem and is necessary for a switch to a PoS method. I found no liveness faults, but only tested binary consensus with a very small number of validators. Liveness faults may exist in more sophisticated instantiations of CBC Casper.

Some Thoughts on Formal Verification and TLA+

Developing an abstract mathematical model and systematically exploring the possible states is an interesting and important way to check the correctness of algorithms. However, I encountered the following challenges:

  • Few examples of implementation details of the TLA+ language and good practices for formal verification. Therefore, writing specifications can involve a lot of trial and error.
  • Unhelpful error messages generated by the TLC model checker when something fails to compile. The error messages are vague and do not pinpoint a specific location where the error occurs. In addition, the TLA+ toolbox is a Java application, so the error messages are often Java exceptions. Figuring out what’s wrong with the TLA+ specification, given the Java exceptions, is difficult.
  • Limited documentation of formal verification methods. Googling a question specific to TLA+ yields very few results. As of [date], there were only 39 questions on Stack Overflow with “TLA+” as a tag.

Thanks

Working at Trail of Bits as an intern this summer was an amazing experience for me. I learned a lot about distributed systems and formal verification and greatly enjoyed the topics. I am glad to have experienced working in security research, and I am motivated to explore more when I go to college.

Watch Your Language: Our First Vyper Audit

A lot of companies are working on Ethereum smart contracts, yet writing secure contracts remains a difficult task. You still have to avoid common pitfalls, compiler issues, and constantly check your code for recently discovered risks. A recurrent source of vulnerabilities comes from the early state of the programming languages available. Most developers are using Solidity, which is infamous for its numerous unsafe behaviors. Now Vyper, a Python-like language, aims to provide a safer language. And since community interest in Vyper is growing, we had to review Vyper contracts on a recent audit with Computable.

Overall, Vyper is a promising language that:

  • Includes built-in security checks,
  • Increases code readability, and
  • Makes code review simpler.

However, Vyper’s age is showing; our review confirmed that this young language will benefit from more testing and tools. For instance, we found a bug in the compiler, which indicates a lack of in-depth testing. Also, Vyper does not yet benefit from the third-party tool integrations that Solidity does, but we’re on the case: We recently added Vyper support to crytic-compile, allowing Manticore and Echidna to work on the Vyper contracts, and the Slither integration is in progress. For now, you can check out the details of our Vyper audit and our recommendations below.

The Good

Integer checks are built-in

Vyper comes with built-in integer overflow checks, and will revert if one is detected. Since integer overflows are frequently at the root of vulnerabilities, overflow protection by default is definitely a good step towards safer contracts. And with this protection, you don’t need to use libraries like SafeMath anymore.

The main caveat here, though, is the higher gas cost. For example, the compiler will add two SLOAD for the following code:

Screen Shot 2019-10-23 at 5.07.03 PM

Figure 1: Integer overflow check.

screen-shot-2019-10-23-at-2.49.06-pm-e1571865015924.png

Figure 2: evm_cfg_builder result for the Figure 1 example.

Nevertheless, overflow protection by default is still the best strategy. In the future, Vyper could reduce the gas cost through optimizations (e.g., removing two SLOADs from the example above), or by adding unsafe types in the language for developers with specific needs.

Unsafe functionality is restricted

Vyper comes with a lot of restrictions compared to Solidity, including:

  • No inheritance
  • No recursive code
  • No infinite length loop
  • No dynamically sized array
  • No assembly code
  • Inability to import logic from another file
  • Inability to create one contract from another

Although these restrictions might seem excessive, most contracts can be implemented while still following these rules.

Solidity allows multiple inheritance, which is frequently overused by developers. We saw many codebases with an overly complex inheritance graph, which made the code review much harder than it should be. In fact, contracts are so difficult to track with multiple inheritance, we had to build a dedicated printer to output the inheritance graph in Slither. Preventing multiple inheritance will force developers to create better designs.

Solidity also allows assembly code, which is frequently used to compensate for inadequate compiler optimizations. When it’s impossible to write these optimizations at the developer level, there’s more pressure on the Vyper compiler team to write good compiler optimizations. This is not a bad thing—optimization should rely on the compiler, not the developers.

Overall, one-third of Slither’s detectors are not needed when using Vyper, thanks to Vyper’s many language restrictions. Vyper-specific detectors can be written, but the simplicity of the language tends to make it safer than Solidity by design.

The Not-So-Good

Vyper has not been tested or reviewed enough

Vyper’s Readme warns its users:

screen-shot-2019-10-23-at-2.52.11-pm

As a result, the compiler is likely to have bugs, and the language’s syntax and semantics might change. Vyper’s users must be careful, follow its development closely, and review the generated EVM bytecode.

For example, until 0.1.0b12, public functions were callable from the contract itself, which created a security risk due to the way Vyper handles msg.sender and msg.value. Since 0.1.0b12, all public functions are the equivalent of external functions in Solidity, removing the risk of this vulnerability.

The compiler bug we found shows that the compiler would benefit from more testing (see details below). It would not be a surprise to see previous solc bugs present in Vyper. For example, the following bugs were either recently fixed or are still present:

Some restrictions are cumbersome

While many of Vyper’s restrictions are good steps toward safer code, some may create problems.

For instance, the total absence of inheritance makes it more difficult to test the code. The creation of mock contracts, or the addition of properties for testing with Echidna, require copying and pasting the code—an error-prone process. Although multiple inheritance is frequently abused by developers, it won’t hurt to allow simple inheritance to facilitate testing.

Like the lack of inheritance, the absence of contract creation is also inconvenient— it increases the complexity of mock contracts, unit tests, and automated testing.

Finally, each contract has to be written in a separate file and import has a partial support. If contract A calls contract B, A needs to know B’s interface. It is then the developer’s responsibility to copy and paste the latest interface version. If B is updated, but its interface in A is not, A will be buggy and error-prone in handling the contract’s dependencies. To prevent these types of vulnerabilities, we built slither-dependencies, a tool that will check the correct interfaces in the codebase.

Our Solutions

Compiler bug: Function collision

Vyper follows the function dispatcher standard used by Solidity: To call a function, the first four bytes of the keccack of the function signature will be used as an identifier. A so-called dispatcher takes care to match the identifier with the correct code to execute. In Figure 3, the dispatcher checks for two different function id:

  • 0x0e8927fbc (pushed at 0x94): increase()
  • 0x61bc221a (pushed at 0xcb): counter()

screen-shot-2019-10-23-at-3.12.40-pm

Figure 3: Dispatcher of the example in Figure 1.

This strategy has a shortcoming: Four bytes is small, and collisions are possible. For example, both gsf() and tgeo() will lead to an id of 0x67e43e43. Figure 4 shows the dispatcher generated with vyper 0.1.0b10:

screen-shot-2019-10-23-at-3.16.54-pm

Figure 4: Function id collision.

As a result, calling tgeo() will execute gsf() code, and tgeo() will never be executable. This issue creates the perfect conditions for backdoored contracts. We reported this bug to the Vyper team and it was fixed in July. Their initial fix did not consider the corner case of a collision with the fallback function, but this is also properly fixed now.

Finally, we implemented a detector in Slither that will catch this bug. Use Slither if you are concerned about interacting with Vyper contracts.

Crytic tools integration

Vyper is now natively supported by most of our tools (including Manticore, Echidna and evm-cfg-builder) as of crytic-compile 0.1.3.

Manticore

Manticore is a symbolic execution framework that lets you prove assertions in your code. It works at the EVM level, which is necessary to avoid potential compiler bugs. For example, the following token has a bug that will give free tokens to anyone requesting fewer than 10 tokens:

Screen Shot 2019-10-23 at 6.24.48 PM

Figure 5: Vyper buggy token.

The following Manticore script will detect this issue:

Screen Shot 2019-10-23 at 6.26.41 PM

Figure 6: Manticore example working with Vyper.

The script will generate a transaction showing inputs leading to the bug:

Function call:
buy(1) -> STOP (*)

Developers can then integrate the script to their CI to detect the bug, and prove its absence once it has been fixed.

Check out Computable Manticore scripts for more examples.

Echidna

Echidna is a property-testing fuzzer: It tries different combinations of inputs until it succeeds in breaking a given property. Like Manticore, it works at the EVM level. In the following example, Echidna tries a combination of calls until the echidna_test function returns false:

screen-shot-2019-10-23-at-3.32.41-pm

Figure 7: Echidna example.

When running Echidna on the example in Figure 7, the result is:

screen-shot-2019-10-23-at-3.34.08-pm

Figure 8: Echidna running on Vyper code.

Similar to Manticore, Echidna can be integrated with CI to detect bugs during development. Keep an eye on crytic.io for an easy solution for using Echidna.

Slither

We are working to support Vyper in our static analyzer. Slither is already capable of:

  1. Detecting if code is vulnerable to the collision id compiler bug we discovered.
  2. Detecting if there is an incorrect external contract definition (via slither-dependencies).

Once the Vyper support is complete, Vyper contracts will benefit from our intermediate representation (SlithIR), and have access to all the vulnerability detectors and code analyses already present in our framework.

Conclusion

Vyper is a good step towards a better smart contract language. We loved its simplicity and its focus on security. However, the language is a bit too young to recommend for production. If you want to use Vyper, we highly recommend using Manticore and Echidna to check the EVM code, and to follow along Slither’s development.

Already loving Vyper and want to secure your code? Contact us!

Multi-Party Computation on Machine Learning

During my internship this summer, I built a multi-party computation (MPC) tool that implements a 3-party computation protocol for perceptron and support vector machine (SVM) algorithms.

MPC enables multiple parties to perform analyses on private datasets without sharing them with each other. I developed a technique that lets three parties obtain the results of machine learning across non-public datasets. It is now possible to perform data analytics on private datasets that was previously impossible due to data privacy constraints.

MPC Primer

For MPC protocols, a group of parties, each with their own set of secret data, xi, share an input function, f, and each is able to obtain the output of f(x1,…,xn) without learning the private data of other parties.

One of the first demonstrations of MPC was Yao’s Millionaire Problem, where two people want to determine who has more money without revealing the specific amounts they each have. Each person has a private bank account balance, xi, and they want to compute f(x1,x2) = x1 > x2. Yao demonstrated how to securely compute this, along with any other function, using garbled circuits.

Each person may infer some information about the other’s data via the MPC function’s output. The leaked information is limited to whether the other party’s amount is greater or less than their own; the exact amount is never disclosed.

Boolean and arithmetic circuits are used by MPC protocols to compute arbitrary functions (see Figure 1). Any computable function can be represented as a combination of Boolean (AND/OR) and/or arithmetic (ADD/MULT) gates. Thus, a secure MPC protocol is established through a series of secure MPC addition and multiplication protocols. In practice, we want to convert a function into a circuit and then run MPC protocols on that circuit.

MPC protocols must also operate in a finite field, so many MPC tools are designed to operate over the integers modulo a prime p (a finite field). This makes machine learning applications difficult and, at times, incompatible, because they require fixed-point arithmetic. Almost all of these tools require a synthesis tool to convert a program (typically a C program) into its corresponding arithmetic circuit representation.

Here are several challenges that make MPC machine learning hard.

  • Fixed-point arithmetic and truncation protocols are not well-suited for the finite fields that MPC protocols operate in.
  • The circuit size can be as large as the input data. When functions are evaluated as circuits, all input values must be loaded into the circuit. The circuit will have input gates for every input value, which makes the input layer of the circuit as large as the dataset.
  • Storing the circuit and intermediate values requires a significant amount of memory.
  • Machine learning introduces loops whose exit conditions depend on intermediate values; however, each party must share its circuit with the other parties at the start of the protocol. They can’t depend on intermediate values.

The goal of my project was to create a novel MPC tool that can train machine learning models from secret data held by multiple parties. We wanted to implement a more-than-two-party computation protocol that included fixed-point arithmetic for arbitrary precision for machine learning. We also wanted to provide an alternative to circuit synthesis as a way of performing MPC with complex functions.

Figure 1: (left) A simple arithmetic circuit. (middle) A simple boolean circuit. (right) The roadmap for performing multi-party computation.

Secure MPC Protocols

For three parties, each party will split their data into “secret shares” and send them to the others. Secret shares must only reveal secret value information when they are all combined, otherwise all secret values must remain hidden. This property is formally defined as privacy, and is illustrated by the following:

Let n=3 parties. The parties agree to use the integers mod p, Zp, with p=43, a random small prime. Party 1 has x1=11 and wants to create secret shares for this value. Party 1 will then generate two random values in Zp, say r2=21 and r3 = 36, and send r2 to Party 2 and r3 to Party 3. Each party then stores the following values as their secret shares:

share1 = (x1r2r3) mod 43 = 40 mod 43

share2 = r2 = 21 mod 43  

share3 = r3 = 36 mod 43

Notice that any one or combination of two secret shares reveal information about x1. This is where the strength of secret sharing lies.

After creating secret shares for all their private data, the parties can begin to perform secure addition or multiplication on their inputs. The protocols for both operations depend on the method of secret sharing used. I will demonstrate how we can add and multiply for the specific secret sharing scheme used above.

Let’s build the secure addition and multiplication primitives for MPC.

Addition

Consider the case of n=3, using field Zp and p=43. Here, we will take the three input values to be x1 = 11, x2 = 12, x3 = 13. Each party generates and distributes random values to obtain the following secret shares. Again, notice that:

(share1,i + share2,i + share3,i) mod 43 = xi mod 43:

Party 1 shares:  share1,1 = 40   share1,2 = 13   share1,3 = 8

Party 2 shares:  share2,1 = 21   share2,2 = 2 share2,3 = 17

Party 3 shares:  share3,1 = 36   share3,2 = 40   share3,3 = 31

To securely add the values x1, x2, and x3, the three parties simply add their shares of each of those values.

Party 1: share+,1 = share1,1 + share1,2 + share1,3 = 40 + 13 + 8 = 18 (mod 43)

Party 2: share+,2 = share2,1 + share2,2 + share2,3 = 21 + 2 + 17 = 40 (mod 43)

Party 3: share+,3 = share3,1 + share3,2 + share3,3 = 36 + 40 + 31 = 21 (mod 43)

That’s it! Now, each party has a secret share for the value:

x1 + x2 + x3 = 11 + 12 + 13 = (18 + 40 + 21) mod 43 = 36.

Scalar Addition and Multiplication

Scalar addition and multiplication refer to adding/multiplying a secret value by a public value. Suppose each party has secret shares of some value z and they want to obtain shares of say 5 + z or 5z. This actually turns out to be easy as well. To perform addition, Party 1 adds 5 to their share and the rest of the parties do nothing, and now they all have secret shares of 5 + z. To perform multiplication, all of the parties simply multiply their shares of z by 5, and they have all obtained secret shares of 5z. You can easily check that these hold in general for any public integer.

Multiplication

Secure multiplication is more complicated than addition, because all of the parties must interact. This slows down performance when computing non-trivial functions that require several multiplications, especially if the parties are operating on a high-latency network. Therefore, the goal of most of these schemes is to minimize the amount of communication required to securely multiply.

Using Beaver Triples is a well-known method for multiplication that breaks the operation into a series of secure addition, scalar addition, and scalar multiplication. In more concrete terms, this method takes advantage of the following property:

Secret shares of Beaver Triples are created along with secret shares of the input values. Beaver Triples are two random values, a and b, along with their product, ab = c. Secret shares for each of these values are sent to the other parties, so each party has secret shares for x1, x2, a, b, and c.

To compute x1x2, every party first computes shares of (x1 – a) and (x2 – b) to be broadcast to the other parties. The share of (x1 – a) are obtained by scalar multiplying by -1 and adding the result to x1. The same is done for (x2 – b). Once those shares are computed and broadcast,  everyone can now compute yz. In the final step, parties obtain shares for yb and az through scalar multiplication, leaving everyone with the value yz and the secret shares for yb, az, and c.

Secure addition is used on the shares for yb, az, and c, in order to obtain secret shares of (yb + az + c). Then, they perform a scalar addition with yz and their shares of (yb + az + c). Now, the shares for (yz + yb + az + c) = x1 ✕ x2 are obtained, and the protocol is completed.

Optimization

A major bottleneck in MPC is the communication between parties due to multiplication, which cannot typically be done in parallel. To optimize, the choices are to either reduce the communication exchanges and/or reduce the number of multiplications needed for a function. My work focused on the latter.

Integer comparison is difficult for arithmetic circuits, and if the transformation from function to circuit isn’t clever enough, comparisons may be very inefficient. But in fact, all arithmetic circuits can be represented by a polynomial in the input values.

Let’s consider the polynomial representing f(x1,x2) = x1 < x2, where the number of zeros is half the input space and proportional to the field size. Since this function is not constant zero, the polynomial degree is proportional to the number of zeroes. So the arithmetic circuit that is computing this has as many multiplication gates as the size of the field. Incredibly inefficient!But we can actually obtain an exponentially smaller arithmetic circuit by breaking everything down into building blocks using the methodologies from Octavian Catrina and Sebastiaan de Hoogh’s paper. The major building block needed is an efficient protocol for truncation, and they develop this via a modular remainder protocol that is built from other sub-protocols. What’s important is that instead of circuits needing to be proportional to the field itself, they allow us to use circuits proportional to the bit-size of the field, making them exponentially smaller.

Results

I chose to implement the MPC protocol from Araki, et. al., including their methods for secret sharing, addition, and multiplication. I also developed the specialized secure comparison and truncation protocols, as well as a dot product protocol. Combining complex protocols with addition and multiplication is known as the hybrid model of secure computation. This hybrid model gives us really compact, intuitive circuits that can even be written by hand! (see figures 2 and 3)

I applied this MPC protocol to both perceptron and SVM algorithms, which requires converting them into the building block operations. The SVM algorithm can be manually synthesized into a compact, hybrid circuit using complex building blocks of comparison and dot product operations. Specifically, one iteration of each algorithm only requires around 15 gates. For comparison, synthesizing AES using a synthesis tool will result in thousands of addition and multiplication gates.

Figure 2: (left) Pseudocode for the perceptron algorithm. (right) Hybrid circuit representing one iteration of the perceptron algorithm.

To avoid massive circuits, proportional to the dataset, I only synthesized circuits for one iteration of each protocol. This avoided having to deal with loops and yielded a very compact circuit that can be reused for any dataset. Lastly, using one iteration gives flexibility on how to handle the convergence of our protocol. This approach provides three options:

  • the parties agree on a fixed number of iterations beforehand,
  • the parties reveal an epsilon value publicly, or
  • the parties compute a convergence check according to another MPC.

Figure 3: (left) Pseudocode for the support vector machine (using partial gradient descent). (right) Hybrid circuit representing one iteration of a support vector machine.

Once the hybrid circuits were manually synthesized, I performed the algorithms on test datasets and achieved arbitrary fixed-point precision. The classification accuracy was also the same as the raw algorithms. Therefore, this tool can be used to train secret data using either perceptron or SVMs with arbitrary precision.

Concluding Thoughts

I spent the summer tackling several difficulties related to MPC with machine learning. By the end of the summer, I had an efficient solution to each of these problems and was able to run two different machine learning algorithms securely across three parties. I enjoyed working on this project. Before interning at Trail of Bits, I had just completed the second year of my Ph.D. I initially thought the transition from school to industry would be drastic. But I quickly noticed that in this internship, my project would be very similar to Ph.D. research, which is one of the many things that makes Trail of Bits such a great place to work.

TSC Frequency For All: Better Profiling and Benchmarking

Have you ever tried using LLVM’s X-Ray profiling tools to make some flame graphs, but gotten obscure errors like:

==65892==Unable to determine CPU frequency for TSC accounting.

==65892==Unable to determine CPU frequency.

Or worse, have you profiled every function in an application, only to find the sum of all function runtimes accounted for ~15 minutes of a 20-minute run? Where did those five minutes go!?

Well, we’ve run into both situations, so we built a solution in the form of a Linux kernel module called tsc_freq_khz. We’ll take a quick but deep dive into x86 timers to explain the problem, but first, let’s get to the goods. The tsc_freq_khz module enhances the performance of profiling and benchmarking tools like X-Ray in virtualized environments. No more “Unable to determine CPU frequency” errors!

The tsc_freq_khz module also makes these tools more accurate on newer Intel processors. X-Ray uses the x86 Time Stamp Counter (TSC), a low-latency monotonically increasing clock, to measure event duration, and assumes that TSC frequency is equivalent to maximum clock speed. This assumption is wrong on newer CPUs, and leads to improper time measurement. The tsc_freq_khz module provides a way to read the actual TSC frequency. No more missing minutes in your profiling data!

The tsc_freq_khz module works by exporting the Linux kernel’s internal value of the x86 Time Stamp Counter (TSC) frequency, tsc_khz, to userspace via sysfs. Programs can query this value by reading /sys/devices/system/cpu/cpu0/tsc_freq_khz.

Several open-source projects, like X-Ray and Abseil, already check for the presence of tsc_freq_khz, but until now, that file was only present on Google’s production kernels. The tsc_freq_khz module enables TSC frequency export to userspace for everyone else.

The trouble with timestamps

Before we explain what we did and why this works, let’s do a quick introduction to timestamps on the x86 architecture.

An x86 machine has at least six different ways to measure time:

Each method has unique and subtle flaws that make it completely unworkable for certain applications. This cornucopia of timers is what happens when you maintain 30 years of backwards compatibility and never remove a feature because someone, somewhere, is depending on it.

Despite its many flaws, the TSC is useful for benchmarking and profiling. It has extremely low latency, because the circuitry is directly on the CPU, and it is accessible directly from user-mode applications. Very useful profiling tools, like X-Ray, rely on the TSC for accurate measurements.

However, TSC measures time in ticks, which are not comparable across different processors and are therefore largely meaningless. For profiling and benchmarking, we want to measure time in comparable units, like nanoseconds.

From ticks to nanoseconds

So how many nanoseconds are in a tick? The basic formula for converting some duration of ticks to nanoseconds is simple:

time_in_nanoseconds = (tsc_count_end - tsc_count_start) * tsc_frequency

Unfortunately, determining TSC frequency is difficult, and in some cases, e.g., cloud-based or other virtualized environments, impossible. Until now.

The core issue is that the Linux kernel does not provide a way for applications to know the TSC frequency, although the Linux perf utility does attempt to calculate TSC frequency via Intel PT. There are reasonable arguments for not exposing the value directly, because TSC frequency is fairly obscure, and there are cases where the value is completely meaningless. Until recently, the processor’s maximum clock speed was also an accurate approximation of TSC frequency.

However, this is no longer true. Using maximum clock speed as the TSC frequency will give the wrong results on newer Intel CPUs. This is the cause of the missing minutes when profiling.

Additionally, the maximum clock speed, accessible in Linux via /sys/devices/system/cpu/cpu0/cpufreq/cpuinfo_max_freq, is not available in cloud-based or other virtualized environments. This is the cause of “Unable to determine CPU Frequency” errors.

cpuinfo_max_freq is populated by the cpufreq driver used for frequency scaling—that is, making the CPU run faster or slower depending on power-saving settings. Naturally, frequency scaling is not typically permitted in virtualized environments, because each physical CPU is shared with multiple virtual tenants. Hence, this value is not present.

A hint from Google

Timestamp measurement code in X-Ray refers to a mystery sysfs entry named /sys/devices/system/cpu/cpu0/tsc_freq_khz, which sounds like it provides exactly what we need: TSC frequency in kilohertz. Unfortunately, there are absolutely no references to that file in the Linux kernel source code. What’s going on?

More searching reveals the following hint in the Abseil source code:

  // Google's production kernel has a patch to export the TSC
  // frequency through sysfs. If the kernel is exporting the TSC
  // frequency use that. There are issues where cpuinfo_max_freq
  // cannot be relied on because the BIOS may be exporting an invalid
  // p-state (on x86) or p-states may be used to put the processor in
  // a new mode (turbo mode). Essentially, those frequencies cannot
  // always be relied upon. The same reasons apply to /proc/cpuinfo as
  // well.
  if (ReadLongFromFile("/sys/devices/system/cpu/cpu0/tsc_freq_khz", &freq)) {
    return freq * 1e3;  // Value is kHz.
  }

Jackpot! The comment tells us that:

  • Google runs a custom Linux kernel in production.
  • Google is aware that cpuinfo_max_freq should not be used for benchmarking.
  • The kernel’s calculation of TSC frequency is sane, and
  • Google’s internal kernels have a patch to export TSC frequency via sysfs, but the patch has not been upstreamed to the main kernel tree.

TSC frequency For all!

Why should only Google kernels have access to the TSC frequency? Since the Linux kernel is open source, we can write our own kernel module that does the same thing. Fortunately, the Linux kernel already computes the TSC frequency during boot and stores it in the tsc_khz variable. All we had to do was export the variable via sysfs.

Our kernel module, tsc_freq_khz, does exactly this. It creates a sysfs entry that reads the tsc_khz variable defined by the kernel and exports it via /sys/devices/system/cpu/cpu0/tsc_freq_khz. The module is extremely simple but also quite useful.

Follow the build instructions on Github and test the module by inserting it into your kernel:

$ sudo insmod ./tsc_freq_khz.ko
$ dmesg | grep tsc_freq_khz
[14045.345025] tsc_freq_khz: starting driver
[14045.345026] tsc_freq_khz: registering with sysfs
[14045.345028] tsc_freq_khz: successfully registered

The file at /sys/devices/system/cpu/cpu0/tsc_freq_khz should now be populated. (The values on your system will be different.)

$ cat /sys/devices/system/cpu/cpu0/tsc_freq_khz
2712020

Warning: Please do not use this code in a real production system. While it shouldn’t cause problems, it does make some assumptions, e.g., that CPU0 exists. There are also no sanity checks to warn if the TSC value is nonsensical or unreliable.

Conclusion

Seemingly simple problems like this one can lead us down a fascinating rabbit hole. In time, Google’s internal patches may make it into the Linux kernel, or kernel developers may create their own official patch to export TSC frequency to userspace. Meanwhile, tsc_freq_khz can be a stopgap for those who want accurate measurements with LLVM’s excellent X-Ray profiling tools.

Enjoyed reading this? Well, we thrive on interesting issues and challenging problems. We’d love to work with you to solve your security or software engineering challenges — please contact us.

Tethered jailbreaks are back

Earlier today, a new iPhone Boot ROM exploit, checkm8 (or Apollo or Moonshine), was published on GitHub by axi0mX, affecting the iPhone 4S through the iPhone X. The vulnerability was patched in devices with A12 and A13 CPUs. As of this writing, the iPhone XS, XS Max, XR, 11, 11 Pro and 11 Pro Max are all safe from this exploit.

We strongly urge all journalists, activists, and politicians to upgrade to an iPhone that was released in the past two years with an A12 or higher CPU. All other devices, including models that are still sold — like the iPhone 8, are vulnerable to this exploit. Regardless of your device, we also recommend an alphanumeric passcode, rather than a 6-digit numeric passcode. A strong alphanumeric passcode will protect the data on your phone from this and similar attacks.

It’s been a long time since the release of a public Boot ROM exploit. The last one was discovered in 2010 by George Hotz (geohot), and it affected the iPhone 3GS and iPhone 4.

What was released

checkm8 exploits the Boot ROM to allow anyone with physical control of a phone to run arbitrary code. The Boot ROM, also called the Secure ROM, is the first code that executes when an iPhone is powered on and cannot be changed, because it’s “burned in” to the iPhone’s hardware. The Boot ROM initializes the system and eventually passes control to the kernel. It’s the root of trust for the trusted boot chain of iOS and verifies the integrity of the next stage of the boot process before passing execution control.

Screen Shot 2019-09-27 at 12.26.05 PM

The Boot ROM is the powerhouse of the cell root of trust of the secure boot chain (source: Apple iOS Security Guide)

checkm8 does not include code required to boot a jailbroken kernel, but it does include the first steps. Likely, the community will soon release a full-tethered jailbreak that will slowly evolve to support all devices and versions.

The exploit also includes the ability to enable debugging features like JTAG on the iPhone CPU—a big win for security researchers and jailbreakers. Apple refers to this as “demoting” the phone in their 2016 BlackHat presentation. This is probably not what Apple intended when they said they were releasing “research devices.”

Impact and use cases

The arbitrary (i.e., not signed by Apple) code that can be executed during the boot process can be used to boot already jailbroken kernels, as was done in ~2011 with redsn0w. This early access also provides access to the AES engine, enabling decryption with the GID key of Apple-encrypted firmware like iBoot.

In 2013, we used the previous Boot ROM exploit to get access to user data by brute-forcing passcodes. In the years since, Apple introduced the Secure Enclave (SEP), a separate processor that manages encryption keys for user data. The SEP has been hardened against passcode brute-forcing using replay counters and backoff timers. It’s currently unclear how much access this new exploit provides to the SEP, so we can’t accurately judge the impact to device privacy. In Apple’s 2016 presentation, they point out that demoting the device forces the SEP to change the UID key, which is entangled with the passcode. Changing the UID key has the effect of permanently protecting user data on the device by making it impossible to decrypt.

When designing the SEP, Apple’s threat model included “adversarial” situations such as another Boot ROM exploit.

When designing the SEP, Apple’s threat model included “adversarial” situations such as another Boot ROM exploit.

It would not be surprising if the vulnerability exploited by checkm8 has been used by Cellebrite for forensic analysis, but they would need another trick or exploit to undermine the SEP’s protections.

Future

The vulnerability has been fixed on new hardware (iPhone XS, iPhone XR, iPhone 11), but Apple still sells hardware vulnerable to this exploit (iPhone 8 and also some iPads and iPods). They would need to create new CPU masks and release updated versions of these devices to secure them. This is unlikely, since Apple did not release a patched iPhone 4 when the previous Boot ROM exploit was released.

It is likely we will see one large community project that will apply common patches and install Cydia and other alternate App Stores. This is a boon to researchers but also to pirates.

Advice for at-risk users

Upgrade to an iPhone that was released in the past two years with an A12 or higher CPU. All other devices are vulnerable to this exploit. After upgrading, wipe your previous phone by selecting “Erase All Content and Settings” from Settings > General > Reset. Devices patched for this issue include:

  • iPhone 11, 11 Pro, 11 Pro Max
  • iPhone XS, XS Max
  • iPhone XR

Set an alphanumeric passcode, rather than a 6-digit numeric passcode. Even with this exploit, an attacker must brute-force your password to gain access to your data. A strong alphanumeric passcode will protect your data against these attacks. To configure an alphanumeric passcode:

  1. On iPhone X and later, go to Settings > Face ID & Passcode.
    On earlier iPhone models, go to Touch ID & Passcode. On devices without Touch ID, go to Settings > Passcode.
  2. Tap Change passcode.
  3. Enter your current passcode.
  4. When prompted to enter a new passcode, tap Passcode Options and select “Custom Alphanumeric Code”

Detecting these jailbreaks

It will still be possible to detect jailbroken phones in the common case, but it is not possible to detect if an iPhone has been exploited with checkm8. Jailbreak detection will have to continue to rely on identifying side-effects of the exploitation. iVerify Core continues to offer a comprehensive jailbreak detection suite, and we will continue to update it as new jailbreaks are released. Contact us to learn more about iVerify Core.

QueryCon 2019: A Turning Point for osquery

Has it really been 3 months since Trail of Bits hosted QueryCon? We’ve had such a busy and productive summer that we nearly forgot to go back and reflect on the success of this event!

On June 20-21, Trail of Bits partnered with Kolide and Carbon Back to host the 2nd annual QueryCon, at the Convene Old Slip Convention Center in downtown New York. We beat last year’s attendance with 150 attendees from around the globe. The 14 speakers presented talks on osquery ranging from technical presentations on Linux security event monitoring to discussions of end-user research. We saw familiar faces from last year’s event in San Francisco, and we met many new teams interested in osquery.

_OSS4146

Tania McCormack of Carbon Black presented her user research on introducing osquery to new audiences.

Last year’s inaugural QueryCon brought us all together in person for the first time. QueryCon 2019 strengthened our sense of community and proved a catalyst for positive change: Our productive collaboration generated community-based and technical changes that have put this project back on track.

A new foundation

On June 18th, the day before QueryCon, the Linux Foundation officially announced that they would be taking over ownership of osquery from Facebook. Under the Linux Foundation, the new osquery Foundation will be directed by a Technical Steering Committee (TSC) consisting of engineers and developers from Facebook, Trail of Bits, Google, and Kolide—companies that are using osquery and have committed to supporting the project. The TSC members are:

  • Teddy Reed (Facebook)
  • Alessandro Gario (Trail of Bits)
  • Zachary Wasserman (independent consultant)
  • Victor Vrantchan (from Google, but working independently)
  • Joseph Sokol-Margolis (Kolide)

This change was exciting news to a growing list of companies who rely on osquery for endpoint protection. As we reported in April, osquery outgrew its original home as a Facebook project, and its community’s expectations and needs now exceed what Facebook can be expected to manage on its own. A new community-based governance model was needed, and conference attendees were eager to discuss the change. We hosted a panel discussion with Facebook’s lead osquery maintainer, Teddy Reed, and representatives from the new osquery TSC.

 

 

How the foundation will work

The Linux Foundation functions as a steward for osquery, providing various funding and management platforms. (Learn more about their stewardship model here.) The new osquery TSC will guide and maintain the project with the help of contributions from the greater community, and Trail of Bits will commit to biweekly office hours for public comment and transparent project governance.

Meanwhile, Facebook will turn over credentials and control of funding, infrastructure, hosting, and engineer review to a new committee of maintainers (of which Facebook will remain a member). The organizations on the TSC are contributing significant engineering time to establish build and release processes, and a forthcoming funding platform on CommunityBridge will allow sponsorship.

Technical decisions

The TSC has a significant backlog of contributions to work through, but we’re already seeing a massive acceleration of activity on the project.

First, osquery core will be updated to feature parity with osql, the community- oriented osquery fork by Trail of Bits. The initial goal is a monthly release, with alternating “developer” and “stable” releases. Another big priority is to merge all major independent efforts and private forks into a single canonical osquery that everyone can benefit from.
Once Trail of Bits resolves the technical debt that has accrued on the project—build toolchains, dependency management, CI systems—it will maintain these components and focus on client-driven engineering requests for osquery. Other stakeholders are also contributing a backlog of Pull Requests, which will be prioritized and merged as soon as possible.

A proliferation of committed PRs

One way to track the health and activity of a project on GitHub is by Pull Requests. Over nine months, from September 2018 to the day before QueryCon, there were roughly 35 PRs merged to the osquery project, with only a few from the community outside Facebook. In just the 12 weeks since QueryCon, nearly 90 PRs were successfully merged (representing about 113 commits). More importantly, the majority of those contributions were from outside Facebook.

Trail of Bits alone is responsible for approximately 44 of the PRs merged this summer.

Some highlights from our recent contributions:

  • #5604 and #5610: The new osquery Foundation version of osquery was kicked off by merging the Facebook and Trail of Bits versions of osquery. This meant we restored CMake build support and set up the public CI, which were key improvements brought over from the osql fork.
  • #5706: We refactored the build scripts so that all of osquery’s third-party library dependencies will build from source on Linux. This absolves the need for the project to host and distribute the correct versions of pre-built binaries for these libraries (a job that previously relied on Facebook); improves compatibility across different varieties of Linux; is a prerequisite for our goals of reproducible builds and offline builds; and, finally, avoids incompatibilities arising from system-installed versions of these dependencies.
  • #5696, #5697, and #5640: We fixed and vastly improved the table for querying the certificates stores on Windows. It is now possible to use osquery to list all per-user certificates, whether they are in a registry hive or on the filesystem, and whether or not those users are even logged in at the time of the query. Searching your fleet for anomalous certificates is an important security monitoring and incident response capability, as it can be an indicator of compromise.
  • #5665: We fixed several bugs that we found with the use of dynamic analysis (fuzzing and Clang’s Address Sanitizer). Soon, we plan to incorporate both static and dynamic analysis passes into the CI system, for proactive detection of code vulnerabilities in osquery. This is a best practice that Trail of Bits recommends for all of our clients, and we’re happy to contribute it to the security of osquery.

A new stable release

DSC_6478

During a community workshop at the end of the conference, osquery users and TSC members discussed the best path to the next stable release.

Prior to QueryCon 2019, the most recent major cross-platform release was August 2018. Seven days after the conference, Trail of Bits’ Alessandro Gario provided a pre-release of the new version of osquery. For the past nine months Facebook had refactored osquery around Buck, a build system created and used by Facebook that had long been problematic for the greater community. Our pre-release restored CMake support, CI and packaging, and a few fixes not related to the build system.

Now the first full stable release of osquery is out! It’s a significant effort to improve the build system for the future of osquery, ensuring that:

  • Building osquery from source will no longer rely on Facebook to maintain and host the downloads for pre-built dependencies
  • The osquery project once again has a public-facing Continuous Integration server, automatically testing all proposed contributions to detect any developer mistakes or regressions
  • All contributors can use their preferred build tools: developers inside Facebook can use their build tool, Buck, and developers in the greater community can use the standard build tool, CMake
  • An all-new custom build toolchain for Linux will enable broader Linux support, and eventually reproducible builds

New features for osquery users:

  • The process_events table detects more kinds of events, on Linux
  • More powerful query syntax: osquery now supports a regex_match function to allow searches over a particular column of a given table
  • Initial support for eBPF-based event tracing, on Linux
  • New macOS tables for detecting T1/T2 chips, querying the list of running apps, and listing the installed Atom packages on macOS or Linux
  • The certificates, logged_in_users, and logical_drives tables on Windows are all greatly improved
  • Initial implementation of a new high-performance eventing framework that will enable more types of event-based monitoringImproved ability to profile and benchmark tables’ performance
  • New detections added to the macOS query pack

But wait, there’s more! Dozens of bugs have been squashed, additional security hardening mitigations have been turned on, certain performance cases have been improved and resource leaks plugged, the documentation has been updated…we could go on and on. For a full list of changes in this release, refer to the comprehensive change notes.

QueryCon and beyond

DSC_5954

The hosts of the QueryCon 2019 posed for a team group shot!

We had so much fun hosting QueryCon this year and we want to thank everyone who attended. This event was a catalyst for positive change in our community thanks to the thoughts, discussions, and passion of this year’s attendees. We can’t wait to see how osquery improves now that its development has been unlocked.

What’s next for osquery? We want you to tell us! If you’re using osquery in your organization, let’s talk about what features and fixes should be next. Thanks to a revolutionary meeting of the minds, we now have the power to make it happen.

Crypto 2019 Takeaways

This year’s IACR Crypto conference was an excellent blend of far-out theory and down-to-earth pragmatism. A major theme throughout the conference was the huge importance of getting basic cryptographic primitives right. Systems ranging from TLS servers and bitcoin wallets to state-of-the-art secure multiparty computation protocols were broken when one small sub-component was either chosen poorly or misused. People need to stop using RSA, drop AES-CBC, and make sure they’re generating randomness in a cryptographically secure way.

It wasn’t all attacks and bad news, though. The ascendance of cryptographic tools for privacy-preserving computation continues apace. Zero-knowledge proofs, secure multiparty computation, and secure messaging systems all had major breakthroughs this year. These areas of cryptography have become efficient enough to transcend purely theoretical interest: Companies large and small, from niche blockchain startups to tech giants like Facebook, are deploying cutting-edge cryptographic protocols. Even the city of Boston has used secure multiparty computation in a study on pay equity.

All of the papers presented at Crypto were remarkable and groundbreaking. From this impressive pool, we’ve highlighted the papers we believe will have a substantial impact outside of academia in the near future.

Attacks

Breaking OCB2

The Best Paper award went to the Cryptanalysis of OCB2: Attacks on Authenticity and Confidentiality. This amalgamation of three papers released last fall demonstrates attacks against OCB2, an ISO-standard authenticated encryption scheme, for forging signatures on arbitrary messages and full plaintext recovery. This result is especially shocking because all three versions of OCB have been standardized, and were thought to have airtight security proofs. Fortunately, OCB1 and OCB3 are not vulnerable to this attack because it relies on details specific to how OCB2 applies the XEX* mode of operation.

ECDSA Nonce Bias and Reuse

A well-known weakness in the ECDSA signature scheme is that nonces must be generated uniformly at random; otherwise, an attacker can recover the signer’s private key. In Biased Nonce Sense, Breitner and Heninger demonstrate the real-world practicality of these attacks by scraping the Bitcoin and Ethereum blockchains in search of either duplicate or slightly biased nonces. These attacks allowed them to recover the private keys for over 300 Bitcoin accounts and several dozen Ethereum accounts. In most instances, these accounts had a nonzero balance, which indicates that active users of these blockchains are using libraries with bad nonce generation. This result confirms the need to move towards deterministic nonce generation in ECDSA, as specified in RFC6979—or, even better, to stop using ECDSA entirely and use Ed25519 instead.

Automating TLS Padding Oracle Attack Discovery and Implementation

Developing tools for vulnerability detection proved to be a popular theme this year. One group of researchers developed a tool that automatically scans websites for CBC padding oracle vulnerabilities in their TLS protocol. They found that roughly 1.83% of the Alexa Top Million websites were vulnerable. Matthew Green’s team at Johns Hopkins used SAT and SMT solvers to automate the development of new padding oracle attacks (they actually consider a broader class of attacks called format oracles), which eliminates the laborious task of discovering and writing such attacks by hand. Their tool was able to rediscover common examples of such attacks, including Bleichenbacher’s attack on PKCS#1 v1.5.

Secure Computation

Efficient Zero-Knowledge Proofs

This year was huge for pushing zero-knowledge proofs further into the realm of practicality. Among the most impressive results was the development of the Libra scheme (no relation to the Facebook cryptocurrency). Libra is notable for several reasons. First, it has a pre-processing phase that is only linear in the witness size, not linear in the statement size like SNARKs. Second, it has a prover that runs in linear time with respect to the computation being run in zero-knowledge.

Comparison of zero-knowledge protocols

Only Bulletproofs achieve the same asymptotic efficiency, although they run much slower in practice because they require the prover to perform many expensive cryptographic operations. On the other hand, Bulletproofs have no trusted setup phase and make somewhat more standard cryptographic assumptions. The table above, taken from the Libra paper, shows a comprehensive overview of the most performant zero-knowledge proofs.

Breaking Secure Multiparty Computation

Over in the secure multiparty computation world, Jonathan Katz delivered a keynote talk about a devastating class of vulnerabilities that affects nearly all MPC implementations. The fundamental issue is that these protocols are extremely complex and often leave low-level details up to the implementer. Since MPC protocols are extremely resource intensive, practical implementations often apply optimizations in a haphazard way.

Here’s some background: MPC protocols require the computation of many hash functions, and even relatively simple functions like RSA encryption require the computation of tens of millions of hashes in this context. While we often think of SHA3 as rather fast, in extreme settings like this it’s actually quite slow and is one of the main bottlenecks of the protocol. This led to researchers using fixed-key AES instead, since it’s roughly 50 times faster to compute than SHA3. Originally, this optimization was placed on firm cryptographic ground in the JustGarble system. However, the security proof does not extend to many modern systems. In fact, Katz et al showed that using fixed-key AES in most widespread protocols completely undermines the privacy guarantees of MPC. However, they also showed that a simple modification to fixed-key AES was secure and equally performant.

This attack highlights the recklessness of rushing to deploy cutting-edge cryptography. These protocols are often extremely slow and complex, and few people understand the subtle details of the security proof. More work must be done to quantify the concrete security of these protocols as they are actually instantiated, not just asymptotically using idealized functionalities.

Content Moderation and Signatures

Metadata-Private Message Franking

A fundamental problem in end-to-end messaging systems is how content moderation should be handled. Facebook has been particularly concerned with this issue due to the widespread use of WhatsApp and Messenger, so they developed something called a message franking system. This system allows users to report abusive content without allowing Facebook to see the content of all users’ messages. Message franking provides the following security guarantees:

  • Message privacy: Platform should only learn messages that are reported
  • Accountability: Moderator should always be able to verify that the alleged sender actually sent the reported message
  • Deniability: Only the moderator should be able to verify the reported message

Unfortunately, prior work on message franking is not metadata private—the moderator is able to see the sender and recipient of every message, even those that aren’t reported. This has been remedied in a new scheme that extends the functionality of message franking to private end-to-end encryption protocols using zero-knowledge proofs. Unlike the zero-knowledge proofs discussed above, the ones used in this franking scheme are extremely efficient and produce signatures that are only around 500 bytes.

Ring Signatures With Deniability

Another popular signature-type primitive that has been especially useful in blockchain systems is the ring signature. Ring signatures allow one member of a ring (i.e., a designated group of people) to anonymously sign messages on behalf of the entire group. For example, a ring could be a workers’ union, and a ring signature would allow someone from the union to anonymously make a complaint while verifying that they’re actually part of the union. It would be beneficial in some ring signature use cases if individuals could claim responsibility for signing the message. Conversely, it may also be useful for members of a ring to prove that they did not sign a given message. Park and Sealfon formalized these security properties and developed new ring signature constructions that satisfy them.

Post-Quantum Cryptography

As the NIST post-quantum cryptography standardization effort enters its second phase, it’s essential to understand the concrete security of proposed cryptosystems. Quantum cryptanalysis in the RAM model: Claw-finding attacks on SIKE, one of the two papers chosen for the Best Young Researcher award, develops a new model for thinking about post-quantum security and applies that model to the SIKE cryptosystem. SIKE is a key encapsulation mechanism that uses supersingular isogenies to achieve post-quantum security.

One big takeaway of the paper is that SIKE is more secure than previously thought. But the paper’s formulation of a new way to quantify post-quantum security may have a more enduring impact. This method involves thinking about quantum computers as a classical RAM machine controlling arrays of bits and qubits, then using this model to reason about time/memory tradeoffs in different attack strategies. As a result, the researchers determined that attacks previously thought to be especially potent against SIKE would require so much classical RAM that it would be more efficient to simply run the best known classical attack.

Looking Forward

Navigating the complex landscape of cryptographic protocols and parameter choices continues to be a key point of difficulty for developers. We need to agree as a community that developers should not be responsible for security-critical configuration choices, and move towards building libraries that are misuse resistant, such as Libsodium and Tink. The use of these libraries alone would have prevented many of the attacks on deployed systems we saw this year.

However, we realize that not all systems can realistically support a complete cryptography overhaul, and some are often stuck using a specific library or primitive for legacy reasons. While we saw lots of activity this year around automating vulnerability detection and exploit writing, we’d like to see the community emphasize bug-fixing tools as well. For example, we recently released a tool called Fennec that can rewrite functions in compiled binaries, and we used this tool to detect and repair the use of a static IV in AES-CBC without access to source.

On the theory side, we’d like to see a more stringent examination of the concrete security of cutting-edge protocols like zero-knowledge proofs and MPC. As we saw at Crypto this year, implementation details matter, and many of these complex systems are implemented in ways that either render them insecure or dramatically reduce their security level. This is made worse by the fact that many of these new protocols aren’t based on standard security assumptions like factoring or the discrete log problem. For example, many blockchain companies are rushing to roll out verifiable delay functions, which rely on a very new and poorly understood property of imaginary quadratic number fields. We need more thorough analyses of assumptions like this and how they impact security.

Finally, NIST will select the third round of candidates for their post-quantum cryptography standardization program in 2020. To succeed, NIST will need to rigorously assess the security of the round two candidates. We need more work like the paper on SIKE, which helps us compare classical and quantum security in a more precise way.

DeepState Now Supports Ensemble Fuzzing

by Alan Cao, Francis Lewis High School, Queens, NY

We are proud to announce the integration of ensemble fuzzing into DeepState, our unit-testing framework powered by fuzzing and symbolic execution. Ensemble fuzzing allows testers to execute multiple fuzzers with varying heuristics in a single campaign, while maintaining an architecture for synchronizing generated input seeds across fuzzer queues.

This new ensemble fuzzer includes a new deepstate-ensembler tool and several notable fuzzer engines powered by a new and open-sourced DeepState front-end API/SDK. This SDK enables developers to integrate front-end executors for DeepState and provides seed synchronization support for our ensembler tool.

The Fallacy of Smart Fuzzers

Fuzzers are one of the most effective tools in a security researcher’s toolbox. In recent years, they have been widely studied to build better heuristics, or strategies that fuzzers use to explore programs. However, one thing remains clear: fuzzer heuristics rarely live up to their hype. When scaling up so-called smart fuzzers for real-world programs, their performance often falters, and we end up defaulting back to our “dumb” standard tooling, like AFL and LibFuzzer.

Since we have explored the topic of evaluating fuzzing research in the past, let’s take a tangent and instead explore the possibility of combining various fuzzer heuristics to maximize fuzzer performance, without giving up time in our testing workflow. This led to my summer internship project of integrating ensemble fuzzing into DeepState.

What is Ensemble Fuzzing?

The insight from ensemble fuzzing is that, while certain heuristics work well in certain contexts, combining them should produce greater results than just a single fuzzer with a single strategy. This idea was first introduced in Chen et al.’s EnFuzz paper, however, there are no available open-source or commercial implementations.

Our implementation of an ensemble fuzzer follows the architecture implemented in EnFuzz, as seen below:

alan_1

Given a set of pre-determined diverse base fuzzers with respective local seed queues, we can integrate a global-asynchronous and local-synchronous (GALS) seed synchronization mechanism that pulls interesting seeds from local queues to a shared global queue during an execution cycle. Therefore, as a base fuzzer’s heuristics fail to improve coverage or discover interesting input seeds, it can now pull other fuzzers’ seeds from this global queue in the next execution cycle. Furthermore, once the campaign is terminated, we can receive any fuzzing feedback from the ensembler regarding base fuzzer performance, crash triaging/deduplication, or any other post-processing statistics.

Using ensemble fuzzing and our already powerful unit-testing framework for fuzzing/symbolic execution, DeepState, we are able to approach the following problems during testing:

  • Fuzzer performance diversity – do different fuzzer heuristics contribute varying useful seeds, maximizing the potential for improving coverage and crashes discovered?
  • Fuzzer workflow – how can we do exhaustive fuzz testing and/or symbolic execution while simplifying our workflow?
  • Invariant consistency – do different fuzzers return different results, indicating that there might be a source of nondeterminism in our test?

Spinning up Front-ends

Since DeepState already supports Eclipser as a backend, we chose to first build a front-end API, where a developer can write a front-end wrapper for a fuzzer backend. This orchestrates the running fuzzer process, and performs compile-time instrumentation, pre- and post-processing, and seed synchronization. It also simplifies the fuzzing environment setup by unifying how we can construct tools while implementing functionality.

The snippet below shows an example of a front-end wrapper for AFL. It inherits from a base DeepStateFrontend class and includes methods that define fuzzer-related functionality.

alan_2

Example front-end wrapper for AFL. One inherited method, pre_exec, allows the user to perform sanity checks before execution. Both environment properties (i.e. core dump pattern) and argument parsing are checked.

In order to build a front-end wrapper, we should have the following methods in our fuzzer object:

alan_6Each fuzzer has its own ensemble method, which provides a specialized rsync call to push and pull seeds from a global and local queue directory:

alan_3

Ensemble method for seed synchronization. Each __sync_seeds() call invokes a specialized rsync command to transfer seeds between a local and global queue directory.

Once built, we can use a front-end wrapper as so:

# compile a DeepState harness with AFL instrumentation
$ deepstate-afl --compile_test MyDeepStateHarness.cpp --compiler_args=”-Isomelib/include -lsomelib -lotherlib”

# execute the AFL fuzzer through DeepState
$ deepstate-afl -i seeds -o out ./out.afl

For a more detailed look into the fuzzer front-end API and how you can implement your own frontends, see this tutorial. DeepState has existing front-end executors for AFL, libFuzzer, Angora, Eclipser, and Honggfuzz.

https://asciinema.org/a/262023

Building the Ensembler

Using the unified API, we can now build an ensemble fuzzer that provisions front-end objects and executes fuzzers concurrently while maintaining seed synchronization.

To start, take a DeepState test harness input, and “ensemble compile” multiple instrumented binaries to a workspace directory with our fuzzers through the exposed compile() call from each frontend object.

alan_4

Provisioning a test workspace with binaries with the “ensemble compilation” strategy.

Once complete, each parallel fuzzer process is instantiated through run(). Since each front-end wrapper invokes rsync-style synchronization through ensemble(), the ensembler simply calls it from each front-end after a specified sync cycle (in seconds) to synchronize seeds.

This implementation is surprisingly simple, and was built with around 300 lines of code in Python. Here’s a quick demo running the ensembler on one of our test examples, Crash.cpp.

Fuzz ‘Em All!

Inspired by Google’s fuzzer-test-suite and the aforementioned work in fuzzer-performance testing, we decided that a DeepState-powered test suite, deepstate-test-suite, could help fuzzer performance A/B tests and actual bug-hunting. With our easy-to-use fuzzers and ensembler, let’s evaluate how they stand against real-world test-case benchmarks!

Bignum vulnerabilities are an especially interesting class of bugs because edge cases are much more difficult, and even probabilistic, to discover. This makes them ideal targets for DeepState property tests.

We benchmarked our base and ensembler fuzzers for their performance reproducing an existing test case and a real-world bignum vulnerability—a carry mis-propagation bug in TweetNaCl. Following the evaluation methods in this fuzzing evaluation paper, the average time was measured for each test case using 10 crash instances from well-formed initial inputs:

alan_5

These results provide some interesting insights about fuzzer diversity. Running smarter fuzzers like Angora and Eclipser on smaller contained test cases, like the Runlen example, work well. However, their performance falters when scaled up to the context of actual bug discovery in real-world software, like the TweetNaCl bug. The ensemble fuzzer’s performance shows it can scale up well for both of these test cases.

What’s in the future for Ensemble Fuzzing?

Ensemble fuzzing is a powerful technique that scales to real-world software libraries and programs. Integrating an ensemble fuzzer into DeepState gives power to unit-testing with a simplified workflow, and it opens up the possibility for many other research and engineering efforts.

Based on our current benchmark results, we can’t definitely say that ensemble fuzzing is the best fuzzing strategy, but it’s worth noting that there are always elements of randomness and probabilistic behavior when evaluating fuzzers. Effective ensemble fuzzing may be dependent on base fuzzer selection—determining which fuzzers to invoke and when based on the type of target or bug class being analyzed.

Maybe our current ensemble of fuzzers works effectively on reproducing bignum vulnerabilities, but would they work just as well on other classes of bugs? Would it be even more effective if we invoke fuzzers in a specific order? These are the questions we can answer more accurately with more benchmark tests on diverse targets.

Thanks!

Being a security engineering intern at Trail of Bits has been a wonderful experience, as always. Working with awesome employees and interns has really propelled my understandings in security research and how we can turn insightful academic research into working software implementations, just like my previous work with analyzing cryptographic primitives with Manticore. I’m especially excited to continue to do this at NYU, where I’ll be starting in the spring!