Better Encrypted Group Chat

Broadly, an end-to-end encrypted messaging protocol is one that ensures that only the participants in a conversation, and no intermediate servers, routers, or relay systems, can read and write messages. An end-to-end encrypted group messaging protocol is one that ensures this for all participants in a conversation of three or more people.

End-to-end encrypted group messaging is a necessary problem to solve. Whether it be for limiting liability, providing verifiable client-side security, or removing a single point of failure, there are good reasons for a group messaging host to use an end-to-end encrypted protocol.

End-to-end encrypted group messaging is also a hard problem to solve. Existing solutions such as Signal, WhatsApp, and iMessage have inherent problems with scaling, which I’ll discuss in detail, that make it infeasible to conduct group chats of more than a few hundred people. The Message Layer Security (MLS) protocol aims to make end-to-end encrypted group chat more efficient while still providing security guarantees like forward secrecy and post-compromise security.1

To these ends, I’ve been working on molasses, a Rust implementation of MLS, designed with safety, ease-of-use, and difficulty-of-misuse in mind.

Molasses has helped refine the MLS spec

The primary contribution of molasses has been in detecting errors in the specification and other implementations through unit and interoperability testing. Molasses implements most of MLS draft 6. Why not all of draft 6? There was an error in the spec that made it impossible for members to be added to any group. This broke all the unit tests that create non-trivial groups. Errors like this are hard to catch just by reading the spec; they require some amount of automated digging. Once they are found, the necessary revisions tend to be pretty obvious, and they are swiftly incorporated into the subsequent draft.

Iterating this discovery/patching process using molasses has given me a chance to put the spec through its paces and help make things clearer. This winter internship (“winternship”) project has been a great experience, especially as a first-time IETF contributor.

How to build encrypted group chat

In this section we derive why MLS is constructed the way it is (hint: for efficiency reasons), and how it compares to other solutions (hint: it’s better).

First off, MLS works on a lower level than most chat applications. It is a protocol upon which applications can be built. For example, MLS does not govern group permissions such as who can add people to the chat — this would be done by an application using MLS under the hood). Thus, we can leave things like formal rule systems out of the conversation entirely when analyzing the protocol. Here, we’re only going to consider the sending of messages and the removal of members.

The following section makes use of cryptographic primitives such as digital signatures, Diffie-Hellman key exchange, (a)symmetric encryption, and key-derivation functions. If the reader feels underprepared in any of these areas, a quick skim of the sections in Serious Cryptography on ECIES and Authenticated Diffie-Hellman should be sufficient.

Without further ado,

A Motivating Problem

Wilna is planning a retirement party for an acquaintance, Vince. The logistics are a nightmare, so she invites her friends Xavier, Yolanda, and Zayne to help her plan. They would like to make a group chat on Slack so they can all stay on the same page, but they remember that Vince is an infrastructure manager for Slack—he can see all the messages sent over any Slack server in the world. This is a problem, since they want to give Vince a nice long vacation upstate and they want it to be a surprise. Vince’s position poses even more problems: he happens to manage every single server in town. Even if Wilna purchases her own server to mediate the group chat, Vince will be tasked with managing it, meaning that he can read everything the server stores.

What Wilna needs is a centralized end-to-end encrypted group chat, i.e., a group chat where every member can broadcast messages and read all incoming messages, but the single server that mediates these messages cannot read anything. For clarity, we’ll distinguish between application messages, which carry the textual content of what a group member wants to say to everyone else in the group, and auxiliary messages (called “Handshake messages” in MLS), which members use to manage group membership and cryptographic secrets. Since this is all mediated through one server, the members can rely on the server to broadcast their messages to the rest of the group.

With the setup out of the way, what are the options?

Solution #1: Pairwise Channels

Suppose Wilna, Xavier, Yolanda, and Zayne all know each other’s public keys for digital signatures. This means that each pair of people can do an authenticated Diffie-Hellman key exchange over some auxiliary messages and derive a shared symmetric key called the pair key. This process produces six separate pairwise channels, represented here:

If Wilna wants to send an application message m to the group, she has to encrypt it three separate times (once for each member of the group) and send all the ciphertexts:

The grey arrows represent application messages encrypted under a symmetric key.

Note that Wilna isn’t making use of the server’s ability to broadcast messages, since each member in the group can only decrypt messages encrypted under their own pair keys. Generalizing this, if there is a group of size N, sending an application message requires a member to encrypt and send N-1 times. Roughly speaking, this is how iMessage does group chat.2

Great, so that’s just three encryptions per person. This probably takes at most a few milliseconds on a phone. What’s the issue? The issue is what about the WhatsApp group with >10,000 members where my aunts talk about who’s getting married next? Do you want them to do 9,999 encryptions every time they send something? I do, but they probably don’t. To accommodate my aunts, we need to get cleverer.

Solution #2: Sender Keys

Instead of having a key between every user in the group, let’s give every user a sender key that they use to encrypt application messages. This is roughly what Signal2, WhatsApp2, and Keybase do. If you’re a group member, you have to go through the following setup:

  1. Randomly generate your sender key
  2. For every user in the group, encrypt your sender key with your pair key that you share with that user
  3. Send every user their encrypted copy of your sender key as an auxiliary message

After the setup, which requires N-1 encryptions for each user in a group of size N (that’s Θ(N2) total auxiliary messages), we finally see some efficient behavior. To send an application message m, Wilna:

  1. Encrypts m with her sender key precisely once
  2. Broadcasts the ciphertext to the group

The grey arrows represent application messages encrypted under a symmetric key.

The grey arrows represent application messages encrypted under a symmetric key.

Although there are three arrows here, they are all the same ciphertext, so the application message only needs to be encrypted and broadcast once. Thus, after the setup phase, each outgoing application message only costs a single encryption. So we’re done, right? Wrong, of course wrong. Because…

What about Removal?

The fallacy here is that the setup phase runs once. It actually runs every time the group is modified. Suppose in the process of premeditating this “retirement party,” the group finds out that Zayne has been leaking details to Vince the whole time. Naturally, they kick Zayne out. Now Zayne still knows all the sender keys, so if he talks to Vince and gets an encrypted transcript of the group conversation that happened after his departure, he would still be able to decrypt it. This is a no-no, since Zayne has already defected. To prevent this from happening, each remaining user in the group has to create a new sender key and share it with everyone else through their pairwise channels. Again, this is Θ(N2) total auxiliary messages, which can be a lot. So if we want to tolerate tons of group modifications,3 we’re going to have to find a way to bring down the number of auxiliary messages sent during the setup phase, while still being able to keep using sender keys for application messages. A well-known secret in computer science is that when the naïve solutions of pairs and lists don’t work, there’s a next logical step:

Solution #3: Trees

We would like to have sender keys (since they make application messages efficient). We also want to be able to transmit new sender keys to subsets of the group without using too many auxiliary messages. The important insight here is that, when we remove a member, we shouldn’t need to individually send new keying information to every single remaining member like we had to in the previous solution. After all, we need to send this to the whole group minus just one person. So why not have public keys that cover large subsets of the group, and use those for sending auxiliary messages? This is exactly what the MLS ratchet tree (a.k.a. TreeKEM) affords us.

The MLS ratchet tree is a binary tree4 whose leaves correspond to members of the group, and whose non-leaf nodes, called intermediate nodes, carry a Diffie-Hellman public key and private key. Intermediate nodes don’t represent people, computers, or locations on a network; they’re just pieces of data that facilitate auxiliary message sending. We also allow nodes to be blank, meaning that they do not have an associated keypair. A node that does have an associated keypair is said to be filled. Every member in the group retains a copy of the ratchet tree, minus the private keys. Knowledge of the private keys follows the ratchet tree property:

Ratchet Tree Property If a member M is a descendant of intermediate node N, then M knows the private key of N.

*deep breath* Sender keys are derived via key-derivation function (KDF) from the root node’s private key, and private keys are derived via KDF from its most-recently updated child’s private key.5 Upon the removal of a user, new private keys are distributed to the resolutions of the copath nodes, i.e, the maximal non-blank nodes of the subtrees whose root is the sibling of an updated node.

That paragraph alone took about 10 minutes to write, so let’s just see…

A Small Example

We start off with a group like so:

Zayne wants out, so Yolanda removes him.6 To remove him, Yolanda will first blank out Zayne and all his ancestors:

The boxes with red slashes through them represent blank nodes.

The boxes with red slashes through them represent blank nodes.

Yolanda needs to contribute new keying information to the new group so that the new sender keys can be derived from the new root’s private key. To do this, she generates a new personal keypair pubY’ and privY’ and derives all her ancestors’ keypairs by iteratively applying a KDF to the private key and computing its corresponding public key (this is called “ratcheting,” whence “ratchet tree”).

The green circles indicate recently updated nodes.

But Yolanda isn’t done. Wilna and Xavier need to be told about these new keys somehow. It’s Yolanda’s job to share this info. In particular,

  1. Every member needs to get a copy of the public keys of all updated nodes (i.e., Yolanda’s own public key and all her ancestors’). This is important. The public keys are part of the shared group state, and shared group state is how a bunch of values in the MLS protocol are derived.
  2. Every member needs to get a copy of the private keys of their nearest modified ancestor. This is in order to preserve the ratchet tree property.

Remember that the end goal is still to derive the sender keys, which means that Wilna and Xavier need to be told the value of the root private key, privY”’. This will be a consequence of item two above.

Since everyone needs public keys and public keys are not secret, Yolanda can just broadcast them as unencrypted auxiliary messages. But private keys are more sensitive. She needs to encrypt them for just the members who need them. This is where we use the ratchet tree property. If she wants Wilna and Xavier to be able to read an auxiliary message containing privY”’, she need only encrypt the message under pubWX, since Wilna and Xavier are descendants of the WX intermediate node, and will therefore be able to decrypt anything encrypted under pubWX.7 This describes how the auxiliary messages are sent to the rest of the group:

The solid black arrows above indicate public-key encrypted messages. The dashed arrows indicate plaintext messages. The arrows do not indicate who is doing the sending (since that’s all Yolanda). They’re just meant to illustrate where in the tree the values are coming from and whom they’re intended for.

Now Wilna and Xavier will update their view of the tree by saving the public keys and decrypting the root private key. Thus, everyone is on the same page and the ratchet tree property is preserved. Finally, everyone re-derives their sender keys, and the removal is complete:

Note that Zayne’s position remains blank after the removal. This saves the members from the computational overhead of shuffling themselves around and recomputing their ancestors’ keypairs. MLS defines two ways to prevent removed members from overcrowding the tree: it allows blank nodes to be removed from the right end of the tree after removals (not applicable in the example above), and it allows new members to be added in the position of previously removed members. So if the “party-planners” above wanted to replace Zayne, they could do so without making the tree bigger.

This example illustrates the smaller details in updating keys, but it doesn’t do a particularly good job at illustrating which node secrets are sent to which other nodes in the resolutions of the copath nodes. To give an idea, here’s…

A Much Bigger Example

Suppose Zayne wants to break out and go solo, but still feels the desire to be in a boy band. After cloning himself 15 times, Zayne #1 notices that one of the clones, Zayne #11, keeps hinting at breaking off and doing a solo career of his own. Zayne #1 acquiesces and removes him from the group. He sees what he’s created. Zayne #1 looks up at the stars. War soon.

Let’s see what auxiliary messages were sent when Zayne #11 was booted. In this removal process, Zayne #1 generates new secrets, ratchets them all the way up the tree, and shares them with the appropriate subtrees:

The green circles still represent the updated nodes. The solid arrows represent the private key of its tail being encrypted under the public key of its head.

The green circles still represent the updated nodes. The solid arrows represent the private key of its tail being encrypted under the public key of its head.

Notice on the right hand side of the tree, since you can’t encrypt to a blank node, the root private key needs to be encrypted under three separate public keys. The dashed arrows were omitted for clarity, but it’s still true that the public keys of all the circled nodes are broadcasted in this step.
With this larger example, you might start to see some pattern in how many auxiliary messages are sent per tree update. Let’s play

Can You Eyeball the Asymptotic Behavior?

We got efficient application messages with sender keys, and we’d like to say that we got efficient auxiliary messages with TreeKEM so we can call it a day. Is this true? Absolutely not, at least not entirely. Let’s first talk about the example above, where we start off with a tree whose nodes are all filled.

Removal in a Filled Tree

The Zayne example is actually worst-case removal behavior in a filled tree in terms of number of auxiliary messages (you should prove this to yourself: what would happen if Zayne #1 removed Zayne #6 instead?). If there are N many members in the group, there are at most log(N)-1 encrypted auxiliary messages that don’t have to deal with blank nodes, and another log(N)-1 that do. Plus, there are log(N) many public keys to share. So, to complete the sage wisdom from computer scientists of days past, if you use trees, you get O(log(N)) behavior. This is way better than the quadratic number of auxiliary messages we saw in solution #2. The same WhatsApp group of kibbitzing mumehs now only takes about 3log2(10,000) ≈ 40 total auxiliary messages to establish a new set of sender keys (assuming a filled tree) instead of the N(N-1) ≈ 99 million total auxiliary messages required previously.

Removal in a Tree with Blanks

This logarithmic behavior is fantastic, but we only checked for the very specific case where we start with a full group and then remove one person. How efficient is it when we remove a single person from a group that already has some blanks? The good news is that it’s still better than Θ(N2). The bad news is that the worst case is… well let me just show you.

Suppose every odd-numbered Zayne was removed from the group besides Zayne #1. Finally, Zayne #2 deals the finishing blow, removing Zayne #1 and restoring peace. Here is what the update looks like:

That’s N-1 messages to remove a single person! As mentioned before, this can be a prohibitively large number of auxiliary messages for large N. Even worse, it may be possible for malicious group members to strategically remove people until the tree reaches the worst-case state, thus slowing down group operations for everyone in the group.

Dealing with this situation is an open issue, and people are actively working on resolving or at least mitigating it. As of this writing, though, there are no proposed solutions that would materially improve the worst-case behavior.

Conclusion and More Info

It’s underwhelming to end at an open issue, but this is where the protocol stands today. Efficiently updating keys is at the crux of end-to-end group messaging. The TreeKEM method, edge cases and all, is one of the most important singular contributions that MLS makes. Given that there’s still at least one open issue in the spec, you may wonder

How close is the protocol to being done?

No clue. MLS has plenty of open issues (nine as of this writing) and is being tweaked constantly. Draft 7 landed just this month, and it completely overhauled the symmetric key schedule. Inefficiencies are being shaved down as issues around authenticity, confidentiality, deniability, etc. are being patched.

What are the other implementations?

The unofficial reference implementation, mlspp, is used to create test vectors that we implementers all test against. There’s also MLS*, a project at Inria to implement and formally model the protocol in F*. And there’s even another Rust implementation, melissa, being written at Wire.

Remind me why you’re writing yet another Rust implementation?

The more implementations the better. Writing this implementation has helped find errors in mlspp and the specification itself.

Errors found in mlspp include missing important fields (missing protocol version and missing hash of WelcomeInfo, which enforces sequencing), incorrect tree addressing (using leaf indices instead of node indices and vice-versa), and incorrectly generated test vectors. Errors in the specification that we found include ambiguities (how are removed nodes pruned from the ratchet tree?), logical impossibilities (how can you add a user to the group if your WelcomeInfo doesn’t include the current decryption keys?), and deontological omissions (SHOULD8 a user verify the broadcasted pubkeys against their derived pubkeys or not?).

Ok great, but why Rust?

*cracks knuckles*

I thought it would be nice to have an MLS implementation that has a clear API (thanks to molasses’ careful design and Rust’s strong typing), memory-safe semantics (thanks to the Rust borrow checker), thorough documentation (thanks to cargo doc and molasses’ current 43% comment-code ratio), and good performance (thanks to ZERO-COST-ABSTRACTIONS). Of course, none of these features make up for the fact that molasses is not formally verified like MLS* and may never be, but hey, nobody ever complained that cotton isn’t as bulletproof as kevlar, cuz those are for different things.

How can I help?

I don’t recommend filing issues with molasses quite yet. The spec is moving too quickly and the library has to be redesigned accordingly each time. If you would like to contribute, the MLS IETF page has a mailing list where you can read and participate in discussions. The organizers are helpful and patient, and I appreciate them immensely. If you want to write your own implementation, see the implementers’ Github repo for organizing info and test vectors.

If you are interested in reading more about the protocol and seeing some of the other open issues, you should give the spec9 a read.

“I want your expertise”

Well that’s going to cost you. We offer consulting in end-to-end protocol design, engineering, and auditing. Drop us a line on our contact page if you’re interested.

Thanks for reading! If you have questions or corrections, please feel free to email me at michael.rosenberg@trailofbits.com.

Footnotes:

  1. Full post-compromise security, i.e., the problem of non-deterministically deriving all new shared data so as to make the excluded parties unable to participate, is actually not easily achieved in this scheme. There is ongoing research in characterizing how post-compromise secure MLS is after a certain number of group updates.
  2. Source. This is a fantastic paper which provides a lot of context for this article. Seriously, if you want to understand this topic better, you should read the MLS spec and this paper and compare the two, since they differ in pretty subtle but significant ways. E.g., the ART scheme used in the paper does not allow intermediate nodes to be blank, which affects confidentiality of messages sent to offline members.
  3. The problem of Removal in this article is a placeholder for (a weaker form of) post-compromise security. Here, “group modifications” includes updating key material without changing group membership.
  4. Specifically, it is a left-balanced binary tree. This is fancy computer talk for “every left subtree is full,” which itself is fancy computer talk for “it behaves good when stuffed into an array.”
  5. Both these statements are technically false, but it’s way easier to think of things this way, and it’s close enough to the truth imo. In reality, sender keys are derived from a long chain of secret values relating to group state and state transitions. Node private keys are simpler, but they are also derived from chains of other secrets called “node secrets” and “path secrets.” As always, see the spec for more details.
  6. MLS doesn’t allow users to remove themselves. This is a quirk of the protocol, but it doesn’t really affect anything.
  7. If you’re confused why I say all these keys are DH keys and then use public-key encryption, it’s because the public-key encryption in MLS is done with ECIES. More specifically, it’s HPKE.
  8. The all-caps “SHOULD” means something specific in IETF RFCs. Its meaning is governed by not one but two RFCs, which are referred to as Best Current Practice 14. The linguistic conventions of RFCs are super cool and alone make it worth skimming a few specs and paying attention to their “conventions and terminology” sections. TLS is as good a place to start as any.
  9. If you want a particularly nice reading experience, you should compile the spec yourself from source. It really is appreciably better.

SVGs of the images in this post are available.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s