Sinter: New user-mode security enforcement for macOS

TL;DR: Sinter is the first available open-source endpoint protection agent written entirely in Swift, built from the ground up on Apple’s new EndpointSecurity API. Sinter demonstrates how to build a successful event-authorization security agent, and incorporates solutions to many of the challenges that endpoint protection agents will face as they migrate from kernel mode to user mode ahead of the release of macOS 11 Big Sur.

Simple, open-source, and Swift

Sinter is our new open-source endpoint security enforcement agent for macOS 10.15 and above, written in Swift. We built it from scratch as a 100% user-mode agent leveraging the new EndpointSecurity API to receive authorization callbacks from the macOS kernel for a set of security-relevant event types. Sinter is controlled with simple rules to allow or deny events—and uses none of the expensive full-system scans or signature-based detection of traditional anti-virus solutions.

Grab an installer for the beta version today and try it out!

Currently, Sinter lets you write a set of rules to block or allow process execution events, and to provide the rules to the agent with a Santa-compatible sync server or with a local configuration file (here is an example rule that demonstrates an explicit-allow). However, we’re planning to develop a more sophisticated rule syntax and add blocking capability for the many other kinds of events supported by the API, which would also mean an end to the Santa rule compatibility.

The quest for the 100% user-mode security agent

Implementing an endpoint security solution (e.g., anti-virus, anti-malware) requires interception and authorization of OS-level events in real time. Historically, that has meant the use of kernel-mode callback APIs or hooking kernel-mode operating system code when a proper API was not provided. Operating system developers have long known that third-party kernel-mode code like this was the leading source of system instability and insecurity, because any small error in kernel code tends to have large consequences.

Enter the macOS EndpointSecurity API. In late 2019, Apple announced that support for all third-party kernel extensions would be deprecated in macOS, and that they would introduce user-mode APIs and frameworks to replace the functionality needed for third-party products. All security vendors were put on notice: Deprecate your existing kernel-mode solutions within the next year and migrate to the EndpointSecurity API before the next release of macOS (macOS 11 Big Sur). It’s clearly not a fun prospect for many teams, and soon after the announcement a client tapped us to develop a user-mode solution that would make migration less painful.

What is the EndpointSecurity API?

EndpointSecurity is an API that implements a callback from the macOS kernel, in real time, as a particular event is about to happen. EndpointSecurity clients subscribe to one or more event types that are either a NOTIFY type or an AUTH (Authorization) type. Notify is just what it sounds like, and is useful for capturing a simple activity log on the host. An Authorization callback is much more powerful; it lets a client process make a decision to allow or deny the event from happening.

EndpointSecurity replaces the kernel-mode equivalents for real-time event authorization on macOS (the Kauth KPI and other unsupported kernel methods), as well as the read-only OpenBSM audit trail used for event monitoring. Any real-time monitoring or protection product for macOS must be rewritten to use EndpointSecurity for macOS 11 Big Sur.

Note that there are no network-related events in the EndpointSecurity API (except UNIX domain sockets). All of these are in the Network Extension framework. You can combine the use of both APIs from one System Extension, but here we focus on the EndpointSecurity API specifically.

Using this API, Stephen Davis at FireEye and Patrick Wardle at Objective-See quickly released event monitoring applications that could display, for example, process-related and file-related events in real time. But read-only monitoring tools following in the footsteps of Process Monitor (“ProcMon”), while useful, are only using the Notify half of the functionality of the EndpointSecurity API (the ability to monitor). Google’s Santa, an open-source macOS process allow/deny solution in Objective-C, demonstrates the ability to authorize events using EndpointSecurity: Its agent now receives and makes allow/deny decisions for process events from EndpointSecurity.

We saw that it would be critically important to master the EndpointSecurity API, as many teams would need to migrate to it in their existing macOS security applications. With the development of Sinter, we’ve delved into EndpointSecurity, learned some lessons from the experience, and incorporated solutions for various challenges we encountered—so you don’t have to. Sinter also demonstrates an implementation of an EndpointSecurity client in the Swift programming language, which promises better memory safety and performance than Objective-C, while maintaining compatibility with all of the other new macOS APIs.

Developing Sinter: Not for the faint of heart

Implementing an event-authorizing agent is an order of magnitude more difficult than implementing a read-only event-subscriber. We also learned—the hard way—certain shortcomings of the EndpointSecurity API. Here’s some of the more significant heavy lifting we did in the course of Sinter’s development.

1. Making decisions in real time without impacting the system

Possibly the most difficult part of implementing a security event authorization agent is that authorization decisions must be made in real time. You cannot block forever to make a decision, and EndpointSecurity enforces a deadline for each authorization message: If your client blows the deadline, EndpointSecurity will terminate your client process to preserve a functioning system.

Decisions shouldn’t be made synchronously: Sinter uses es_copy_message to dequeue each message from EndpointSecurity so that the dispatcher can immediately deliver the next one. Decisions should instead be made on separate threads, each responding asynchronously as soon as possible. Some decisions take longer than others to process, and the APIs needed to perform signature checks often can’t be interrupted.

With Sinter, we ran into this problem head-on when a quick burst of execution events involving large programs caused Sinter to lock up the machine. We solved this by implementing an efficient queuing system, with one queue for small programs and another for big programs, so events never get stuck waiting in the queue. The big-programs queue works out-of-process so a long-running verification can be aborted whenever necessary. This new approach has performed reliably in all of our tests.
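
To make the pattern concrete, here is a minimal sketch of the idea in Python. Sinter itself is written in Swift against the EndpointSecurity C API, so nothing below is real API code; the names, size threshold, and deadline handling are all illustrative.

import concurrent.futures

SMALL_BINARY_LIMIT = 16 * 1024 * 1024   # hypothetical cutoff between the two queues
small_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)
large_pool = concurrent.futures.ThreadPoolExecutor(max_workers=2)  # stand-in for the out-of-process verifier

def verify_code_signature(path):
    """Placeholder for the (potentially slow) code-signing verification."""
    return True

def authorize(event, respond, deadline_seconds):
    """Queue the verification and always answer before the deadline, failing closed."""
    pool = small_pool if event["size"] <= SMALL_BINARY_LIMIT else large_pool
    future = pool.submit(verify_code_signature, event["path"])
    try:
        # Leave headroom so we respond before the hard deadline.
        allowed = future.result(timeout=deadline_seconds * 0.8)
    except concurrent.futures.TimeoutError:
        future.cancel()          # give up on this verification rather than blow the deadline
        allowed = False          # deny by default; a real agent could re-verify out of band
    respond(allowed)

# Example: authorize a large binary with a 5-second deadline.
authorize({"path": "/Applications/Big.app/Contents/MacOS/Big", "size": 2_000_000_000},
          respond=lambda ok: print("allow" if ok else "deny"),
          deadline_seconds=5.0)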

2. Mitigating the TOCTOU risks in real-time security decisions

The TOCTOU (time of check, time of use) race condition vulnerability pattern commonly occurs when making security decisions. Any security agent performing a check must not allow the checked resource to be modified in the time between the check and the action’s approval.

When authorizing macOS execution events, the resource being checked is the executable file, which is mapped into memory before executing. Here’s one TOCTOU attack scenario:

A malicious actor executes Bad.app. The bad executables are mapped into memory and an execution authorization event is emitted by EndpointSecurity. But then the attacker immediately replaces or modifies the executable file to make it Good.app. The EndpointSecurity client gets the event, verifies that the bundle and its files all look good, and allows the execution.

This problem is not unique to EndpointSecurity; it was always a risk with the Kauth KPI that preceded it (e.g., an issue was raised about this TOCTOU in Santa not long ago). It’s still a challenge that must be solved by any agent that wants to authorize events. Sinter attempts to monitor file events to catch these TOCTOU attacks. It would have been much easier if Apple handled this responsibility within the EndpointSecurity API itself (submitted to Apple as a developer feedback suggestion FB8352031; see OpenRadar).

3. macOS executable files live in application bundles

Execution events occur in the context of a single executable file, but most macOS executables exist within application bundles, the directory-like structure that appears as a single “.app” file in macOS Finder. A bundle itself is code-signed, and code-signing verification must be done at the bundle level. This means that a security agent that catches an execution event must discover if the executable has a containing app bundle, then verify the code signature on the entire bundle—these are tasks not performed by EndpointSecurity itself. Some bundles like Apple’s Xcode.app are upwards of a gigabyte in size, and processing a verification in real time isn’t possible. Execution events have to be denied at first, until the verification completes.

EndpointSecurity does provide a built-in caching mechanism, a single cache shared by all EndpointSecurity clients. However, as a client you cannot invalidate a single entry in this cache; you can only clear the entire cache at once. EndpointSecurity will automatically invalidate a cache item if the related file is changed/deleted/etc., but it does this on a per-file basis, not per-application-bundle. Currently, Sinter works with two caches: the one managed by EndpointSecurity, and another custom cache containing the application bundle code-signing verification results.

In theory, malware could be added into an application bundle and EndpointSecurity would not react by clearing a cached approval decision, if the previously approved executable file in the bundle had not changed. EndpointSecurity clients would have to monitor for this themselves, and invalidate the entire cache in response. This is less than ideal, and we hope Apple will make improvements to this caching mechanism. In the near term, EndpointSecurity clients may have to implement their own integrity monitoring on application bundles to avoid being circumvented this way. Sinter attempts its own bundle file integrity monitoring capability to detect when this custom cache should be cleared.

4. Advantages of installing your agent as a System Extension

“System Extensions” is Apple’s name for “user-mode components that extend the system” and is the umbrella term for what replaces the now-deprecated third-party Kernel Extensions. EndpointSecurity is one API under this umbrella; DriverKit and Network Extensions are a couple of others. A System Extension is also a new kind of managed plug-in package for macOS through which you can install your executable.

Installing an EndpointSecurity client as a System Extension is not required—you can implement all of the EndpointSecurity functionality from any kind of executable, even a basic command-line application—but is highly encouraged. There are additional benefits and system-enforced protections for your agent when it is installed as a System Extension. System Extensions can opt to be loaded before all other third-party applications at startup. Apple also announced that macOS extends SIP (System Integrity Protection) to cover System Extensions, meaning it prevents even root users from unloading your security agent. Historically this was only possible if you developed your own kernel-mode anti-tamper logic, but installing your agent as a System Extension frees you from reinventing this wheel. Sinter is currently a background daemon, but now that Apple has documented the anti-tamper protection benefit of installing your agent as a System Extension, we will be converting Sinter to this format.

5. Mastering the Entitlements, Signing, and Notarization workflow

The EndpointSecurity API is only usable within code-signed and notarized applications by Apple-approved developers like Trail of Bits. In other words, the API is gated by a special entitlement. Unlike most entitlements, this one requires a manual application and approval by Apple, after which you are granted a code-signing certificate with the EndpointSecurity entitlement. In our case, the time to be approved was six calendar weeks, but your mileage may vary. Apple is apparently being careful with this entitlement, because a misbehaving or malicious EndpointSecurity client could put a halt to everything on a host system.

Apple’s code-signing and notarization steps are difficult to troubleshoot when they fail, so it’s essential to set up and automate the process early: you will immediately notice when it breaks and can easily narrow down the breaking change. For Sinter, we created our own CMake-driven approach that automates the workflow for Apple’s notarization, packaging, package signing, and package notarization steps. All of that now integrates perfectly into our CI with minimal fuss.

One last entitlement that EndpointSecurity agents need is related to user privacy. Because most agents will be inspecting files (whether in the context of file events or the executables of process events), they need the user’s permission to access the filesystem. On or before the first run of your application, the user must manually go to Privacy settings in System Preferences, and enable “Full Disk Access.” There are MDM payloads that can automatically enable the permission and sidestep this manual user approval step.

Those were the thornier challenges we addressed when writing Sinter, and of course there were more miscellaneous gotchas and lessons learned (e.g., determining whether files are binaries, signature verification, and multiple EndpointSecurity clients). We’ll update the most compelling details as development continues—stay tuned.

The upshot

With the deprecation of kernel extensions, Apple is leveling the playing field for endpoint protection agents: Everyone must use the same user-mode APIs. This will benefit everyone with improved system stability and reduced attack surface, but existing security product developers first have to replace their kernel extensions with a user-mode approach. In user mode, they can now work in any language, not just C/C++.

So instead of starting from scratch with just the example code in C, we hope organizations will help us build and rely upon an open-source platform in Swift, a forward-looking choice for long-term investment as Apple’s successor to Objective-C.

Get involved with Sinter

The beta version of Sinter is available today. It’s a significant first step, and here’s a peek at some of the larger items we’re working on now:

We invite you to partner with us to sponsor the continued development of Sinter, or to discuss the integration of EndpointSecurity-based capability into your existing agent—just contact us to get started.

Contributors are welcome, too! Give us your feedback on GitHub, or join us in the #sinter channel on the Empire Hacking Slack.

Accidentally stepping on a DeFi lego

The initial release of yVault contained logic for computing the price of yUSDC that could be manipulated by an attacker to drain most (if not all) of the pool’s assets. Fortunately, Andre, the developer, reacted incredibly quickly and disabled the faulty code, securing the approximately 400,000 USD held at the time. However, this bug still highlights the risk stemming from increased complexity caused by composition in the DeFi space.

What is yVault?

On July 25th 2020, yEarn launched a new service called yVault: Users could deposit tokens in the vault, which would then be supplied to a DeFi protocol chosen to maximize their interest.

The initial release supported USDC and integrated with the USDC/MUSD Balancer pool. Any USDC held by the vault would be supplied to the Balancer pool as liquidity, and the vault would receive BPT tokens in return.

To use the vault, a user sends USDC and is minted yUSDC. Similarly, USDC can be withdrawn by burning yUSDC. These two operations rely on a dynamically calculated exchange rate, defined as the ratio of the value of the BPT held by the contract and the total supply of yUSDC. Since the value of BPT goes up when fees are paid by traders, the value of each yUSDC token slowly goes up over time.
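
As a rough sketch of that accounting (our own variable names, not yVault’s actual code), the exchange rate and the resulting mint/burn amounts look like this:

def exchange_rate(bpt_value_usdc, yusdc_supply):
    """Price of one yUSDC: value of the vault's BPT divided by the yUSDC outstanding."""
    return bpt_value_usdc / yusdc_supply

def yusdc_minted_for_deposit(usdc_in, bpt_value_usdc, yusdc_supply):
    """yUSDC minted when depositing USDC at the current rate."""
    return usdc_in / exchange_rate(bpt_value_usdc, yusdc_supply)

def usdc_returned_for_withdrawal(yusdc_in, bpt_value_usdc, yusdc_supply):
    """USDC returned when burning yUSDC at the current rate."""
    return yusdc_in * exchange_rate(bpt_value_usdc, yusdc_supply)

# As trading fees accrue to the BPT, bpt_value_usdc grows while yusdc_supply stays
# fixed, so each yUSDC slowly becomes redeemable for more USDC.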

Within an hour of yVault’s release, users had already deposited around 400,000 USDC, so I knew I had to take a look at the code for myself.

What was the bug?

Since the initial release integrated with Balancer, let’s consider how Balancer works. Balancer removes the need for liquidity providers to manually rebalance their portfolio by incentivizing rational market actors to do so instead. If a token goes up in price, the pool will become unbalanced. While normally a liquidity provider may need to pay fees in order to sell a token that has increased in value, Balancer incentivizes external users to pay a fee for the privilege of purchasing the token at a profit instead. The fees paid are then distributed to the liquidity providers.

Figure 1 presents the equation used to calculate the amount of tokens received based on the state of the Balancer pool and the amount of tokens sent. For the remainder of this post, let’s refer to the MUSD/USDC 50/50 pool. The swap fee is 0.05%.

/**********************************************************************************************
// calcOutGivenIn                                                                            //
// aO = tokenAmountOut                                                                       //
// bO = tokenBalanceOut                                                                      //
// bI = tokenBalanceIn              /      /            bI             \    (wI / wO) \      //
// aI = tokenAmountIn    aO = bO * |  1 - | --------------------------  | ^            |     //
// wI = tokenWeightIn               \      \ ( bI + ( aI * ( 1 - sF )) /              /      //
// wO = tokenWeightOut                                                                       //
// sF = swapFee                                                                              //
**********************************************************************************************/

Figure 1: Token output given input.

First, to get a sense of how this function behaves, we’ll see what happens when a rational market actor swaps a pool back into balance and when an irrational market actor swaps a pool out of balance.

Suppose the pool is currently out of balance and contains 1,100,000 USDC and 900,000 MUSD. If a rational market actor pays 90,000 MUSD, they’ll receive 99,954 USDC in exchange and make 9,954 USDC in profit. A very good deal!

Now suppose the pool is currently balanced and contains 1,000,000 USDC and 1,000,000 MUSD. What happens if an irrational market actor pays 100,000 USDC? Well, they would receive 90,867 MUSD for a loss of 9,133 MUSD. Not such a great deal.
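
To make the arithmetic easy to check, here is a small Python version of the Figure 1 formula; it reproduces both of the numbers above for a 50/50 pool with a 0.05% swap fee. (This is a simplified floating-point model, not Balancer’s fixed-point Solidity code.)

def calc_out_given_in(balance_in, weight_in, balance_out, weight_out, amount_in, swap_fee):
    """Tokens received for a given input, per the calcOutGivenIn formula in Figure 1."""
    ratio = balance_in / (balance_in + amount_in * (1 - swap_fee))
    return balance_out * (1 - ratio ** (weight_in / weight_out))

FEE = 0.0005  # 0.05% swap fee

# Rational trade: pay 90,000 MUSD into the unbalanced pool, receive ~99,954 USDC.
print(calc_out_given_in(900_000, 0.5, 1_100_000, 0.5, 90_000, FEE))

# Irrational trade: pay 100,000 USDC into the balanced pool, receive ~90,867 MUSD.
print(calc_out_given_in(1_000_000, 0.5, 1_000_000, 0.5, 100_000, FEE))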

Although the second trade results in an immediate loss and thus seems rather useless, pairing it with the first trade results in some interesting behavior.

Consider a user who first performs The Bad Trade: The user converts 100,000 USDC to 90,867 MUSD, losing 9,133 USD in the process. Then, the user performs The Good Trade and converts 90,867 MUSD to 99,908 USDC, earning 9,041 USD in the process. This results in a net loss of 92 USD. Not ideal, but certainly not as bad as the 9,133 USD lost on The Bad Trade alone.

Now consider the valuation of BPT during this process. If you held 1% of the total BPT, at the start of the transaction your tokens would have been worth 1% of 2,000,000 USD, or 20,000 USD. At the end of the transaction, your tokens would have been worth 1% of 2,000,092 USD, or 20,000.92 USD. Yet for a magical moment, right in the middle of the transaction, your tokens were worth 1% of 2,009,133 USD, or 20,091.33 USD. This is the crux of the vulnerability at hand.
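
Reusing calc_out_given_in and FEE from the sketch above (and valuing both stablecoins at 1 USD each), chaining the two trades shows the temporary spike in the pool’s total value:

usdc, musd = 1_000_000.0, 1_000_000.0
print(0.01 * (usdc + musd))            # before: 20,000.00 for a 1% BPT holder

musd_out = calc_out_given_in(usdc, 0.5, musd, 0.5, 100_000, FEE)   # The Bad Trade
usdc, musd = usdc + 100_000, musd - musd_out
print(0.01 * (usdc + musd))            # mid-transaction: ~20,091.33

usdc_out = calc_out_given_in(musd, 0.5, usdc, 0.5, musd_out, FEE)  # The Good Trade
usdc, musd = usdc - usdc_out, musd + musd_out
print(0.01 * (usdc + musd))            # after: ~20,000.9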

Knowing this, I applied the same reasoning to yVault. Before The Bad Trade, the vault holds some BPT worth some amount of USD. After The Good Trade, the vault holds the same amount of BPT worth a slightly larger amount of USD. However, between The Bad Trade and The Good Trade, the vault holds some BPT worth a significantly larger amount of USD.

Recall that the value of yUSDC is directly proportional to the value of the BPT it holds. If we bought yUSDC before The Bad Trade and sold yUSDC before The Good Trade, we would instantaneously make a profit. Repeat this enough times, and we would drain the vault.

How was it fixed?

It turns out that accurately calculating the true value of BPT and preventing attackers from extracting profit from slippage is a difficult problem to solve. Instead, the developer, Andre, deployed and activated a new strategy that simply converts USDC to MUSD and supplies it to the mStable savings account.

Future Recommendations

DeFi composability is hard, and it’s easy to accidentally expose your new protocol to unexpected risk. If you integrate multiple tokens, any one token could compromise the security of your entire platform. On the other hand, if you integrate multiple platforms, your protocol could suffer from complex interactions.

Security tooling can be used to help prevent most simple bugs in code:

  • Crytic uses an advanced version of Slither to automatically detect up to 90 types of vulnerabilities
  • Echidna asserts specific properties through fuzz testing
  • Manticore can symbolically analyze your code

Of course, tooling isn’t a panacea for security. In our study “What are the Actual Flaws in Important Smart Contracts (and How Can We Find Them)?” we discovered that almost 50% of findings were unlikely to be detected by tooling, even if the technology significantly improves. For complex codebases and DeFi projects, reach out to us to arrange a security assessment, or sign up for our Ethereum security office hours.

Contract verification made easier

Smart contract authors can now express security properties in the same language they use to write their code (Solidity) and our new tool, manticore-verifier, will automatically verify those invariants. Even better, Echidna and Manticore share the same format for specifying property tests.

In other words, smart contract authors can now write one property test and have it tested with fuzzing and verified by symbolic execution! Ultimately, manticore-verifier reduces the initial effort and cost involved in symbolic testing of arbitrary properties.

How it works

A smart contract’s behavior—and its potential bugs—are often unique and depend heavily on unspoken contract invariants. Let’s test a simple contract:

contract Ownership{
    address owner = msg.sender;
    function Owner() public{
        owner = msg.sender;
    }
    modifier isOwner(){
        require(owner == msg.sender);
        _;
    }
}
contract Pausable is Ownership{
    bool is_paused;
    modifier ifNotPaused(){
        require(!is_paused);
        _;
    }
    function paused() isOwner public{
        is_paused = true;
    }
    function resume() isOwner public{
        is_paused = false;
    }
}
contract Token is Pausable{
    mapping(address => uint) public balances;
    function transfer(address to, uint value) ifNotPaused public{
        balances[msg.sender] -= value;
        balances[to] += value;
    }
}

This contract maintains a balance sheet and allows for simple transactions. Users can send their tokens to other users, but the total amount of tokens must remain fixed—in other words, tokens can’t be created after the contract has started. So under this invariant, a valid property could state: “If there are only 10,000 tokens, no user could own more than that.”

We can express this property as a Solidity method: “crytic_test_balance.”

import "token.sol";
contract TestToken is Token {
    constructor() public{
        balances[msg.sender] = 10000;
    }
    // the property
    function crytic_test_balance() view public returns(bool){
        return balances[msg.sender] <= 10000;
    }   
}

The emulated world

ManticoreEVM compiles and then creates the contract in a fully emulated symbolic blockchain.

Different normal accounts are also created there to replicate real-world situations. A deployer account is used to deploy the contract, others are used to explore the contract and try to break the properties, and, finally, a potentially different account is used to test the properties.

ManticoreEVM detects the property type methods present in high-level source code and checks them after every combination of symbolic transactions. A normal property is considered failed if the method returns false.

The loop (exploration)

The deployer account initially creates the target contract via a CREATE transaction. Then manticore-verifier simulates all possible interleaving transactions originating from the contract testers until (for example) no more coverage is found. After each symbolic transaction, the properties are checked in the name of the property-checker account, and if anything looks broken, a report of the reproducible exploit trace is generated. Normal properties like crytic_test_balance() are expected to return true; any other result is reported as a problem.

manticore-verifier dapp.sol --contract TestToken

It’s a command-line-based tool

Several aspects of the exploration, the stopping condition, and the user accounts employed can be modified by command-line arguments. Try $manticore-verifier --help for a thorough list. Here’s an excerpt of it in action:

$manticore-verifier dapp.sol  --contract TestToken

# Owner account: 0x28e9eb58c2f5be87161a261f412a115eb85946d9
# Contract account: 0x9384027ebe35100de8ef216cb401573502017f7
# Sender_0 account: 0xad5e556d9699e9e35b3190d76f75c9bf9997533b
# PSender account: 0xad5e556d9699e9e35b3190d76f75c9bf9997533b
# Found 1 properties: crytic_test_balance
# Exploration will stop when some of the following happens:
# * 3 human transaction sent
# * Code coverage is greater than 100% measured on target contract
# * No more coverage was gained in the last transaction
# * At least 1 different properties where found to be breakable. (1 for fail fast)
# * 240 seconds pass
# Starting exploration...
Transaction 0. States: 1, RT Coverage: 0.0%, Failing properties: 0/1
Transaction 1. States: 2, RT Coverage: 60.66%, Failing properties: 0/1
Found 1/1 failing properties. Stopping exploration.
60.66% EVM code covered 
+---------------------+------------+
|    Property Named   |   Status   |
+---------------------+------------+
| crytic_test_balance | failed (0) |
+---------------------+------------+
Checkout testcases here:./mcore_kkgtybqb

Note that each failing property will have a test case number associated with it. More details can be found at the specified test case files: ./mcore_kkgtybqb/user_000000.tx

Bug Found!

In our example, manticore-verifier finds a way to break the specified property. When trying to transfer an incredibly large amount of tokens, an internal integer representation exceeds its limits and makes it possible to boost the sender’s savings, i.e., create tokens out of thin air.

transfer(0,115792089237316195422001709574841237640532965826898585773776019699400460720238) -> STOP (*)

Conclusion: Interoperability = 101%

manticore-verifier lowers the initial cost to symbolically test arbitrary properties. It also allows our symbolic executor to work more tightly with Solidity, Echidna, and slither-prop.

The same methodology can be used with our Ethereum fuzzer, Echidna. As a result, you can write the properties once and test them with symbolic execution and fuzzing with no extra effort.

manticore-verifier can check automatically generated ERC20 properties. Moreover, slither-prop, our static analyzer, has detailed information about what an ERC20 contract should do, and can automatically produce ERC20 properties that manticore-verifier can then check.

So get your contract, add the property methods, and test with manticore-verifier at will. If you have any questions please join the Empire Hacking Slack.

Advocating for change

As a company, we believe Black lives matter. In the face of continued police brutality, racial disparities in law enforcement, and limited accountability, we demand an end to systemic racism, endorse restrictions on police use of force, and seek greater accountability for police actions. We believe police misconduct, militarization of police, and unchecked abuse of power are issues that we as Americans should protest.

Giving time, money, and attention

In this spirit, I have reaffirmed our employees’ right to protest without reprisal or retaliation. While there’s certainly no account of who has and hasn’t, I’m aware that many of our employees have recently marched to end systemic racism and police brutality.

To support understanding and discussion, we created a #solidarity channel on our company Slack. Conversations there grew rapidly as we shared research on social policy and upcoming legislation, including policies that have been analyzed by social scientists studying criminal justice:

  1. A large-scale analysis of racial disparities in police stops across the United States
  2. Collective Bargaining Rights and Police Misconduct: Evidence from Florida
  3. Evidence that curtailing proactive policing can reduce major crime
  4. Good Cop, Bad Cop: Using Civilian Allegations to Predict Police Misconduct
  5. The Wandering Officer

Many of our employees also decided to “protest with our wallets” and use our existing charitable donation matching program to support organizations we believe can effect change. In the last two weeks, employees have donated $12k and the company matched $12k ($24k total) to a number of related non-profits, including:

More we can do now: Calls to action

Advocacy is not new to us—Trail of Bits is among the largest employers of cybersecurity professionals in NYC, and has frequently advocated for policy change as part of Tech:NYC and the Coalition for Responsible Cybersecurity. As an NYC-based company, we urge the NYC Council to take action.

The June 18 legislative session of the NYC Council will be livestreamed, and we’ll be watching. We urge our representatives to:

  • Pass all five bills that were heard in the last meeting of the Public Safety Committee
  • Pass the POST Act and require reporting on NYPD use of surveillance technology
  • Commit to NYC Budget Justice and reallocate funds towards social programs

While policing is largely a state and local matter in the United States, federal action has a strong effect on state and local policies. We call on the US Congress to:

Local and state action may have the most direct impact on policing practices. If you want to lobby your representatives as an individual, use “Who are my representatives?” to find their contact information and give them a call. Personal, authentic contact with local representatives can be very effective at shaping policy decisions.

If you’re an individual motivated to support a charitable organization, consider reviewing the following resources first:

When donating, strongly consider a charitable donation matching program. If your employer does not offer one, suggest that they sign up for the Technology Partner program from RaisedBy.Us. Trail of Bits uses their service to facilitate donation matching through Brightfunds.

If you are planning to attend a protest, research what local activists in your area are recommending to protect yourself and others. There are widespread disparities in treatment of protesters across the United States: a “March for Families” in NYC may be completely unlike a similarly named event in Oregon. Consider advice from the Legal Aid Society of NYC or Vice (and their digital guide) and put on a mask before attending a protest.

We can always do more

We know our efforts are modest, and that the problems will not be fixed by a few waves of donations and legislation. Our own efforts to advocate for change started small, but they are growing.

We also recognize the diversity deficit in our own company. As part of our effort to close that gap, we are working with diversity and inclusion-focused recruiting groups and conducting implicit bias training. We’ve created the CTF Field Guide to help eliminate the knowledge gap for industry newcomers and we host yearly winternships that provide inroads for people new to computer security. We’re also increasing the matching for our existing charity matching program and making the most of our diversity-focused donation to the Summercon Foundation. Finally, to help ensure this is not a one-off effort, we are listening to our employees and community to hold us accountable.

The protests have been extraordinarily effective in moving legislation forward; so much so, it can be tough to keep up. We realize it’s only a long-overdue beginning, but the more we know about what’s gaining ground, the better we can advocate for it. To help, we’ve assembled a summary of the changes we’ve seen at the NYC, New York State, and federal levels.

Upgradeable contracts made safer with Crytic

Upgradeable contracts are not as safe as you think. Architectures for upgradeability can be flawed, locking contracts, losing data, or sabotaging your ability to recover from an incident. Every contract upgrade must be carefully reviewed to avoid catastrophic mistakes. The most common delegatecall proxy comes with drawbacks that we’ve catalogued before.

Crytic now includes a comprehensive suite of 17 upgradeability checks to help you avoid these pitfalls.

The how-to

Reviewing upgradeable contracts is a complex low-level task that requires investigating the storage layout and organization of functions in memory. We created a sample token that supports upgradeability to help walk through the steps in crytic/upgradeability-demo. This simple demo repository includes:

  • MyToken, our initial implementation of a simple token
  • Proxy, our proxy

Any call to Proxy will use a delegatecall on MyToken to execute its logic, while the storage variables will be held on Proxy. This is a standard setup for most upgradeable contracts.

Consider that these two contracts are already deployed on mainnet. However, the code for MyToken has become stale and you need to change its features. It’s time for MyTokenV2! The code for MyTokenV2 is similar to MyToken, with the exception of removing the init() function and its associated state variable.

Let’s use Crytic to ensure that deploying MyTokenV2 does not introduce new security risks.

Configuration

First, tell Crytic about your upgradeable contracts. Go to your Crytic settings and find this panel:

Here you can configure:

  1. The contract being upgraded
  2. The proxy used
  3. The new version of the contract

Note: (1) and (2) are optional; Crytic will run as many checks as are appropriate.

For example, if you only have the upgradeable contract, and no proxy or new version, Crytic can already look for flaws in the initialization schema. If you have the upgradeable contract and the proxy, but no new version, Crytic can look for function collisions between the implementation and the proxy. If you have multiple upgradeable contracts, or multiple proxies, you can then configure any combination that fits your setup.

Back to MyToken, we have these three contracts:

Once we configure Crytic, the upgradeability checks will run on every commit and pull request, similar to security checks and unit tests:

Crytic’s Findings

Occasionally, Crytic will find serious errors in your upgradeability code (oh no!). We built one such issue into our demo. Here’s what it looks like when Crytic discovers a security issue:

The was_init storage variable was removed, so balances has a different storage offset in MyToken and MyTokenV2, breaking the storage layout of the contract.

This is a common mistake that can be particularly difficult to find by hand in complex codebases with many contracts and inheritances—but Crytic will catch the issue for you!

What else can Crytic find?

Crytic will review (depending on your configuration):

  • Storage layout consistency between the upgrades and the proxy
  • Function collisions between the proxy and the implementation
  • Correct initialization schema
  • Best practices for variable usage

Here’s the detailed list of checks:

Num | What it Detects                          | Impact        | Proxy needed | New version needed
----|------------------------------------------|---------------|--------------|-------------------
1   | Variables that should not be constant    | High          |              | X
2   | Function ID collision                    | High          | X            |
3   | Function shadowing                       | High          | X            |
4   | Missing call to init function            | High          |              |
5   | initializer() is not called              | High          |              |
6   | Init function called multiple times      | High          |              |
7   | Incorrect vars order in v2               | High          |              | X
8   | Incorrect vars order in the proxy        | High          | X            |
9   | State variables with an initial value    | High          |              |
10  | Variables that should be constant        | High          |              | X
11  | Extra vars in the proxy                  | Medium        | X            |
12  | Variable missing in the v2               | Medium        |              | X
13  | Extra vars in the v2                     | Informational |              | X
14  | Initializable is not inherited           | Informational |              |
15  | Initializable is missing                 | Informational |              |
16  | Initialize function that must be called  | Informational |              |
17  | initializer() is missing                 | Informational |              |

Check your contracts with Crytic

In addition to finding 90+ types of vulnerabilities, Crytic can now detect flaws in your upgradeability code. It is the only platform that checks your codebase in depth for so many issues. If you want to avoid catastrophic mistakes, use Crytic before deploying any upgradeable contract.

Got questions? Join our Slack channel (#crytic) or follow @CryticCI on Twitter.

ECDSA: Handle with Care

The elliptic curve digital signature algorithm (ECDSA) is a common digital signature scheme that we see in many of our code reviews. It has some desirable properties, but can also be very fragile. For example, LadderLeak was published just a couple of weeks ago, which demonstrated the feasibility of key recovery with a side channel attack that reveals less than one bit of the secret nonce.

ECDSA is fragile and must be handled with care

This post will walk you through:

  • the various ways in which ECDSA nonce bias can be exploited
  • how simple it is to attack in practice when things go wrong, and
  • how to protect yourself.

You’re probably familiar with attacks against ECDSA. Some attacks are trivial, and some involve advanced Fourier analysis and lattice math. Although these attacks can be complicated, I hope this post will demonstrate that they are easy to implement in practice. In fact, even if you don’t know anything about lattices, after reading this blog post you will be able to leverage a lattice attack to break ECDSA signatures produced with a very slightly faulty RNG using fewer than 100 lines of Python code.

Math disclaimer: to read this post, you will need to be somewhat familiar with mathematical groups, recognizing that they have a binary operation and a group generator. You do not need to be an expert on elliptic curves; you just need to know that elliptic curves can be used to form a mathematical group (and, thus, have a concept of addition and scalar multiplication). Familiarity with other math concepts like lattices is helpful, but not required.

DSA primer

ECDSA is a specific form of the digital signature algorithm (DSA). DSA is a pretty common digital signature scheme, and is defined with three algorithms: key generation, signing, and verification. The key generation algorithm generates a private and public key; the private key is responsible for creating signatures; and the public key is responsible for verifying signatures. The signature algorithm takes as input a message and private key, and produces a signature. The verification algorithm takes as input a message, signature, and public key, and returns true or false, indicating whether the signature is valid.

DSA is defined over any mathematical group, and this scheme is secure as long as the discrete log problem is hard over this group. The group typically used is the integers modulo a prime, p. Along with this group, we will have a group generator, g, and some cryptographically secure hash function, H. We can assume that p, g, and H will all be publicly known.

Key generation works by first randomly selecting a value, x, from the integers mod p. Then the value y = g^x mod p is computed. The private signing key is set to x, and the public key is y. The signing key must be kept secret, as this is what allows signatures to be made.

The signing algorithm produces a signature from a message, m, and the secret key, x. First, a random element of the group, k, is generated. This is known as the nonce, which is important when talking about attacks. Then, the values r = g^k mod p and s = k^-1(H(m) + xr) mod p are computed. Here k^-1 is the group inverse, and H(m) is the result of computing the hash of m and interpreting the result as an integer mod p. The signature is defined to be the pair (r, s). (Note: if either the r or s value equals 0, the algorithm restarts with a new k value.)

The verification algorithm receives as input the signature, (r, s), the message, m, and the public key, y. Let ŝ = s^-1; then the algorithm outputs true if and only if r, s ≠ 0 and r = (g^H(m) · y^r)^ŝ. This verification check works because g^H(m) · y^r = g^(H(m)+xr) = g^(ks), and so (g^H(m) · y^r)^ŝ = g^k = r.
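
Here is a toy Python instantiation of those three algorithms. Note one simplification in the prose above: it folds the field modulus and the group order into a single p, whereas a working example needs both, so the sketch below uses a small subgroup of prime order q inside the integers mod p (these tiny parameters are for illustration only).

import hashlib, random

p, q, g = 227, 113, 4   # 4 generates the subgroup of quadratic residues mod 227, which has prime order 113

def H(m):
    """Hash the message and interpret the digest as an integer in the exponent group."""
    return int.from_bytes(hashlib.sha256(m).digest(), "big") % q

def keygen():
    x = random.randrange(1, q)       # private signing key
    return x, pow(g, x, p)           # public key y = g^x

def sign(m, x):
    while True:
        k = random.randrange(1, q)                   # the nonce
        r = pow(g, k, p) % q
        s = pow(k, -1, q) * (H(m) + x * r) % q       # s = k^-1 (H(m) + x r)
        if r != 0 and s != 0:
            return r, s

def verify(m, r, s, y):
    if r == 0 or s == 0:
        return False
    s_inv = pow(s, -1, q)                            # ŝ = s^-1
    return pow(pow(g, H(m), p) * pow(y, r, p) % p, s_inv, p) % q == r

x, y = keygen()
r, s = sign(b"hello", x)
print(verify(b"hello", r, s, y))   # True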

A digital signature scheme is considered secure if it is unforgeable. Unforgeability has a formal cryptographic meaning, but on a high level it means that you cannot produce signatures without knowing the secret key (unless you have copied an already existing signature created from the secret key). DSA is proven to be unforgeable under the discrete log assumption.

ECDSA

DSA is defined over a mathematical group. When DSA is used with the elliptic curve group as this mathematical group, we call this ECDSA. The elliptic curve group consists of elliptic curve points, which are pairs (x, y) that satisfy the equation y^2 = x^3 + ax + b, for some a, b. For this blog post, all you need to know is that, using elliptic curves, you can define a finite group, which means you obtain a group generator, g (an elliptic curve point), and addition and scalar multiplication operations just like you can with integers. Since they form a finite group, the generator, g, will have a finite order, p. This blog post will not explain or require you to know how these elliptic curve operations work, but if you’re curious, I encourage you to read more about them here.

ECDSA works the same way as DSA, except with a different group. The secret key, x, will still be a random value from the integers mod p. Now, the public key, y, is still computed as y = g^x, except now g is an elliptic curve point. This means that y will also be an elliptic curve point (before, y was an integer mod p). Another difference occurs in how we compute the value r. We still generate a random nonce, k, as an integer mod p, just as before. We will compute g^k, but again, g is an elliptic curve point, and so g^k is as well. Therefore, we can compute (x_k, y_k) = g^k, and we set r = x_k. Now, the s value can be computed as before, and we obtain our signature (r, s), which will still be integers mod p as before. To verify, we need to adjust for the fact that we’ve computed r slightly differently. So, as before, we compute the value (g^H(m) · y^r)^ŝ, but now this value is an elliptic curve point, so we take the x-coordinate of this point and compare it against our r value.

Recovering secret keys from reused nonces

Now that we understand what ECDSA is and how it works, let’s demonstrate its fragility. Again, since it’s a digital signature scheme, it is imperative that the secret key is never revealed to anyone other than the message signer. However, if a signer ever releases a signature and also releases the nonce they used, an attacker can immediately recover the secret key. Say I release a signature (r, s) for a message m, and I accidentally reveal that I used the nonce k. Since s = k^-1(H(m) + xr), we can easily compute the secret key:

s = k^-1(H(m) + xr)

ks = H(m) + xr

ks – H(m) = xr

x = r^-1(ks – H(m))

Therefore, not only does a signer need to keep their secret key secret, but they also must keep all of their nonces they ever generate secret.
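
As a quick sanity check of that formula, here is the recovery in a few lines of Python, using the same ecdsa library as the attack code later in this post (the nonce is “leaked” simply by reusing the value we passed to sign):

import ecdsa
import random

gen = ecdsa.NIST256p.generator
order = gen.order()

x = random.randrange(1, order)                 # the secret key
pub_key = ecdsa.ecdsa.Public_key(gen, gen * x)
priv_key = ecdsa.ecdsa.Private_key(pub_key, x)

m = random.randrange(1, order)                 # stands in for H(m), already an integer mod the order
k = random.randrange(1, order)                 # the leaked nonce
sig = priv_key.sign(m, k)

# x = r^-1 (ks - H(m))
recovered = pow(sig.r, -1, order) * (k * sig.s - m) % order
print(recovered == x)                          # True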

Even if the signer keeps every nonce secret, if they accidentally repeat a single nonce (even for different messages), the secret key can immediately be recovered as well. Let (r, s_1) and (r, s_2) be two signatures produced on messages m_1 and m_2 (respectively) from the same nonce, k. Since they have the same nonce, the r values will be the same, so this is very easily detected by an attacker:

s_1 = k^-1(H(m_1) + xr) and s_2 = k^-1(H(m_2) + xr)

s_1 – s_2 = k^-1(H(m_1) – H(m_2))

k(s_1 – s_2) = H(m_1) – H(m_2)

k = (s_1 – s_2)^-1(H(m_1) – H(m_2))

Once we have recovered the nonce, k, using the formula above, we can then recover the secret key by performing the previously described attack.
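
And the repeated-nonce case, continuing with the same objects (gen, order, priv_key, pub_key, x) from the previous snippet:

m1 = random.randrange(1, order)
m2 = random.randrange(1, order)
k = random.randrange(1, order)     # the same nonce, mistakenly used twice

sig1 = priv_key.sign(m1, k)
sig2 = priv_key.sign(m2, k)
assert sig1.r == sig2.r            # repeated nonces are easy to spot: the r values match

# k = (s1 - s2)^-1 (H(m1) - H(m2)), then x = r^-1 (k*s1 - H(m1))
recovered_k = pow((sig1.s - sig2.s) % order, -1, order) * (m1 - m2) % order
recovered_x = pow(sig1.r, -1, order) * (recovered_k * sig1.s - m1) % order
print(recovered_k == k, recovered_x == x)   # True True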

Let’s take a moment to digest this.

If a nonce for a signature is ever revealed, the secret key can immediately be recovered, which breaks our entire signature scheme. Further, if two nonces are ever repeated, regardless of what the messages are, an attacker can easily detect this and immediately recover the secret key, again breaking our entire scheme. That is pretty fragile, and these are just the easy attacks!

Attacking ECDSA from leaked and biased nonces

It turns out that even leaking small parts of the nonce can also be very damaging to the signature scheme. In 1999, work by Howgrave-Graham and Smart demonstrated the feasibility of using lattice attacks to break DSA from partial nonce leakage. Later, Nguyen and Shparlinski improved on their work, and were able to recover secret keys on 160-bit DSA (here 160-bit refers to p), and later ECDSA, by knowing only three bits of each nonce from 100 signatures.

Later, Mulder et al were able to perform more attacks on partial nonce leakage. They used a different, Fourier transform-based attack derived from work by Bleichenbacher. Using these techniques, and knowing only five bits of each nonce from 4,000 signatures, they were able to recover secret keys from 384-bit ECDSA, and leveraged their techniques to break 384-bit ECDSA running on a smart card.

You may have heard of the Minerva attack: Several timing side channels were leveraged to recover partial nonce leakage, and these lattice attacks were performed on a wide variety of targets. With enough signatures, they were able to successfully attack targets even when only the size of the nonce was leaked!

Even worse, a few weeks back, the LadderLeak attack further improved on Fourier analysis attacks, and now ECDSA secret keys can be recovered if only 1 bit of the nonce is leaked! In fact, the single bit can be leaked with probability less than 1, so attackers technically need less than 1 bit. This was leveraged to attack a very small leakage in Montgomery ladders in several OpenSSL versions.

Again, let’s digest this. Even when only a few bits of the nonce are leaked—or further, even if only the size of the nonce is leaked—or further, if one bit of nonce is leaked—then, most of the time, the entire signature scheme can be broken by observing enough signatures. This is incredibly fragile!

On top of this, even if you manage to keep all of your nonces secret and never repeat a nonce, and you never leak any bits of your nonce to an attacker, you still aren’t fully protected! Work by Breitner and Heninger showed that a slightly faulty random number generator (RNG) can also catastrophically break your scheme by leveraging lattice attacks. Specifically, when using 256-bit ECDSA, if your RNG introduces a bias of just 4 bits in your nonce, your signature scheme can be broken completely by a lattice attack, even if we don’t know what those biased values are.

These attacks involve some complicated math. Like most cryptographic attacks, they formulate a series of ECDSA signatures as another hard math problem. In this case, the problem is known as the Hidden Number Problem. The Hidden Number Problem has been fairly widely studied by other researchers, so there are a lot of techniques and algorithms for solving it. This means that once we figure out how to mold a series of ECDSA signatures into an instance of the Hidden Number Problem, we can then apply existing techniques to find an ECDSA secret key.

Breaking ECDSA from bad nonces

Now, Fourier analysis, Hidden Number Problems, and lattice attacks are more complicated than your everyday cryptography, and they seem daunting. However, the fact that these attacks involve complicated math may fool some people into thinking they’re very difficult to implement in practice. This is not the case. As I mentioned in the beginning, I will teach you how to implement these attacks using fewer than 100 lines of Python code. Moreover, to perform this attack, you actually don’t need to know anything about the Hidden Number Problem or lattices. The only lattice component we need is access to the LLL algorithm. However, we can treat this algorithm as a black box; we don’t need to understand how it works or what it is doing.

We’ll be attacking signatures produced from bad nonces (i.e., bad RNG). Specifically, these nonces will have a fixed prefix, meaning their most significant bits are always the same. (The attack still works even if the fixed bits aren’t the most significant bits, but this is the easiest to follow). When using LLL, all we have to know is that we will input a matrix of values, and the algorithm will output a matrix of new values. If we use a series of ECDSA signatures to construct a matrix in a particular way, LLL will output a matrix that will allow us to recover the ECDSA private key. More specifically, because of the way we construct this matrix, one of the rows of the output of LLL will contain all of the signatures’ nonces. (It requires more complicated math to understand why, so we won’t discuss it here, but if you’re curious, see section 4 of this paper). Once we recover the nonces, we can use the basic attack described above to recover the secret key.

To perform the attack we’ll need access to an ECDSA and an LLL library in python. I chose this ECDSA library, which allows us to input our own nonces (so we can input nonces from bad RNGs to test our attack), and this LLL library. We’ll perform this attack on the NIST P-256 elliptic curve, beginning with the easiest form of the attack: We are given two signatures generated from only 128-bit nonces. First, we generate our signatures.

import ecdsa
import random

gen = ecdsa.NIST256p.generator
order = gen.order()
secret = random.randrange(1,order)

pub_key = ecdsa.ecdsa.Public_key(gen, gen * secret)
priv_key = ecdsa.ecdsa.Private_key(pub_key, secret)

nonce1 = random.randrange(1, 2**127)
nonce2 = random.randrange(1, 2**127)

msg1 = random.randrange(1, order)
msg2 = random.randrange(1, order)

sig1 = priv_key.sign(msg1, nonce1)
sig2 = priv_key.sign(msg2, nonce2)

Now that we have our signatures, we need to craft the matrix we’ll input into the LLL algorithm:

Matrix that we will input into the LLL algorithm

Here N is the order of NIST P-256 (order in the code snippet above), and B is the upper bound on the size of our nonces (which will be 2^128 in this example, because both nonces are only 128 bits in size); m_1 and m_2 are the two random messages; and (r_1, s_1) and (r_2, s_2) are the two signature pairs. In our Python code, our matrix will look like this (here modular_inv is a function for computing the inverse mod N):

# modular_inv computes the inverse mod N (Python 3.8+)
def modular_inv(a, n):
    return pow(a, -1, n)

r1 = sig1.r
s1_inv = modular_inv(sig1.s, order)
r2 = sig2.r
s2_inv = modular_inv(sig2.s, order)

matrix = [[order, 0, 0, 0], [0, order, 0, 0],
[r1*s1_inv, r2*s2_inv, (2**128) / order, 0],
[msg1*s1_inv, msg2*s2_inv, 0, 2**128]]

Now we’ll input this matrix into the black-box LLL algorithm, which will return a new matrix to us. For reasons that don’t matter here, one of the rows of this returned matrix will contain the nonces used to generate the two signatures. If we knew more about what the algorithm is actually doing, we could probably predict where the nonce is going to be. But since we don’t care about the details, we are just going to check every row in the returned matrix to see if we can find the secret key. Remember, we already showed how to recover the private key once we have the nonce, k. Specifically, we compute r^-1(ks – H(m)). An attacker in the real world would have access to the public key corresponding to these signatures. Therefore, to determine if we have found the correct private key, we will compute its corresponding public key and compare it against the known public key. The attack will look like this:

import olll

new_matrix = olll.reduction(matrix, 0.75)
r1_inv = modular_inv(sig1.r, order)
s1 = sig1.s

for row in new_matrix:
    potential_nonce_1 = row[0]
    potential_priv_key = r1_inv * ((potential_nonce_1 * s1) - msg1)

    # check if we found private key by comparing its public key with actual public key
    if ecdsa.ecdsa.Public_key(gen, gen * potential_priv_key) == pub_key:
        print("found private key!")

I should mention that there is a noticeable failure rate for this basic attack. If you run the code presented to you, you will notice this as well. But again, for the purposes of this post, don’t worry about these specifics. Also, this failure rate should decrease if you perform this same attack with more signatures.

Hopefully at this point I’ve shown why these attacks aren’t so complicated. We were able to recover the secret key from just two signatures, and we didn’t do anything overly complicated. That said, some of you would probably argue that being able to attack signatures with only 128-bit nonces isn’t that interesting. So let’s move on to more realistic attacks.

Exploiting real-world ECDSA bugs

You may have heard of a recent bug in the randomness generated in Yubikeys. Essentially, bad randomness caused as many as 80 bits of the nonce to be fixed to the same value. Attacking this real-world bug will not be much more difficult than the attack we just performed above, except we don’t know what the fixed 80-bit values are (in the previous example, we knew the fixed 128 bits were all set to 0). To overcome this, we need to add a trick to our attack.

Imagine we receive a collection of signatures whose nonces have 80 fixed bits. For ease of explanation, we will assume these 80 bits are the most significant bits (the attack is still feasible if this is not the case; you simply shift the fixed bits to the most significant bits by multiplying each signature by a power of 2). Even though we don’t know what these 80 bits are, we know that if we subtract any two nonces, the 80 most significant bits of their difference will all be zeros. Therefore, we are going to perform the same attack as above, except with our signature values subtracted. Specifically, given a set of n signatures and messages, we will build the following matrix:

Matrix that we will input into the LLL algorithm when the nonce bias is unknown

This time, we will again input this matrix into LLL and receive a new matrix back. However, since we subtracted the nth value from every entry in this matrix, instead of receiving a row full of nonces, we will actually receive a row with the difference between each nonce and the nth nonce. In other words, the matrix returned from LLL will give us the value k_1 – k_n, the difference between the nonces for signatures 1 and n. It takes some algebraic manipulation, but we can still recover the secret key from this value using the following formula:

s_1 = k_1^-1(m_1 + xr_1) and s_n = k_n^-1(m_n + xr_n)

s_1·k_1 = m_1 + xr_1 and s_n·k_n = m_n + xr_n

k_1 = s_1^-1(m_1 + xr_1) and k_n = s_n^-1(m_n + xr_n)

k_1 – k_n = s_1^-1(m_1 + xr_1) – s_n^-1(m_n + xr_n)

s_1·s_n·(k_1 – k_n) = s_n(m_1 + xr_1) – s_1(m_n + xr_n)

s_1·s_n·(k_1 – k_n) = x·s_n·r_1 – x·s_1·r_n + s_n·m_1 – s_1·m_n

x(s_1·r_n – s_n·r_1) = s_n·m_1 – s_1·m_n – s_1·s_n·(k_1 – k_n)

Secret key = x = (r_n·s_1 – r_1·s_n)^-1 (s_n·m_1 – s_1·m_n – s_1·s_n·(k_1 – k_n))

With all of that context, let’s exploit the Yubikey bug. If signatures are produced from nonces with 80 fixed bits, we only need five signatures to recover the secret key. We will build the matrix above with n = 6 to reduce the error rate:

# generate 80 most significant bits, nonce must be less than order
yubikey_fixed_prefix = random.randrange(2**176, order)

msgs = [random.randrange(1, order) for i in range(6)]
nonces = [random.randrange(1, 2**176) + yubikey_fixed_prefix for i in range(6)]
sigs = [priv_key.sign(msgs[i],nonces[i]) for i in range(6)]

matrix = [[order, 0, 0, 0, 0, 0, 0],
[0, order, 0, 0, 0, 0, 0],
[0, 0, order, 0, 0, 0, 0],
[0, 0, 0, order, 0, 0, 0],
[0, 0, 0, 0, order, 0, 0]]

row, row2 = [], []
[msgn, rn, sn] = [msgs[-1], sigs[-1].r, sigs[-1].s]
rnsn_inv = rn * modular_inv(sn, order)
mnsn_inv = msgn * modular_inv(sn, order)

# 2nd to last row: [r1(s1^-1) - rn(sn^-1), ... , rn-1(sn-1^-1) - rn(sn^-1), 2^176/order, 0 ]
# last row: [m1(s1^-1) - mn(sn^-1), ... , mn-1(sn-1^-1) - mn(sn^-1), 0, 2^176]
for i in range(5):
    row.append((sigs[i].r * modular_inv(sigs[i].s, order)) - rnsn_inv)
    row2.append((msgs[i] * modular_inv(sigs[i].s, order)) - mnsn_inv)

# add last elements of last two rows, B = 2**(256-80) for yubikey
row.append((2**176) / order)
row.append(0)
row2.append(0)
row2.append(2**176)

matrix.append(row)
matrix.append(row2)

new_matrix = olll.reduction(matrix, 0.75)

for row in new_matrix:
    potential_nonce_diff = row[0]

    # Secret key = (rns1 - r1sn)-1 (snm1 - s1mn - s1sn(k1 - kn))
    potential_priv_key = (sn * msgs[0]) - (sigs[0].s * msgn) - (sigs[0].s * sn * potential_nonce_diff)
    potential_priv_key *= modular_inv((rn * sigs[0].s) - (sigs[0].r * sn), order)

    # check if we found private key by comparing its public key with actual public key
    if ecdsa.ecdsa.Public_key(gen, gen * potential_priv_key) == pub_key:
        print("found private key!")

That’s it! We just exploited a real-world bug in about 50 lines of Python.

Some might further argue that although this was an actual bug, systems producing 80 fixed nonce bits are rare. However, this attack can be much more powerful than this one example shows! For 256-bit elliptic curves, the attack will work even if only 4 bits of the nonce are fixed, and it does not become any more complicated to implement: you simply increase the dimension of your lattice (in the matrix figure above, just increase the value of n) and repeat the attack. This increases the attack’s running time, but not its implementation complexity. You could copy that code snippet and recover ECDSA secret keys generated from nonces with as little as 4 bits of bias. On top of that, the attack against nonce leakage is of similar difficulty.

Hopefully, I’ve now convinced you of the fragility of ECDSA and how easily it can be broken in practice when things go wrong.

By the way, some of you may be wondering how we determine the value n. Remember, n is the number of signatures we need to recover the secret key. When the nonce had its first 128 bits fixed to 0, this value was 2 (it is 3 when 128 bits are fixed to a value we don’t know). When the nonce had 80 randomly fixed bits, this value was 5. If you consult the relevant publications on these attacks, you can find the exact formula and derivation of this value for a given number of fixed bits. For simplicity, I derived these values empirically by attempting the attack with different numbers of signatures and different numbers of fixed bits. I’ve compiled the results into the figure below:

The number of signatures required to use this attack for a given number of fixed nonce bits (derived empirically)

Protecting your ECDSA signatures

If ECDSA is so fragile, how can users protect themselves? Ideally, we recommend using EdDSA instead of ECDSA; EdDSA handles nonce generation much more safely by eliminating the use of RNGs and deriving nonces deterministically instead. Further, Ed25519, which is EdDSA over Curve25519, is designed to overcome the side-channel attacks that have targeted ECDSA, and it is currently being standardized by NIST.
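
As a minimal sketch of how little there is to get wrong, here is Ed25519 signing with the pyca/cryptography package (this package and the messages are my choices for illustration and are not used elsewhere in this post):

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# No per-signature nonce is ever generated or handled by the caller, so the
# entire class of nonce bugs described above disappears.
private_key = Ed25519PrivateKey.generate()
signature = private_key.sign(b"message to sign")
private_key.public_key().verify(signature, b"message to sign")  # raises InvalidSignature on failure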

If you’re required to use ECDSA, proceed with caution and handle with care! ECDSA is fragile, but it is not broken. As we saw, it is imperative that nonces used for ECDSA signatures are never repeated, never revealed (even partially), and generated safely.

To protect yourself from nonce leakage, the mitigation strategy is to write the implementation to operate in “constant time.” However, guaranteeing this can be very difficult, as we saw with OpenSSL. For instance, code can appear to be constant time, but then an optimizing compiler can introduce non-constant time behavior. Further, some assembly instructions are constant time in some architectures or processor models, but not in others. (Read more about this here).

Another technique for mitigating nonce leakage is known as blinding, where random numbers are included in your arithmetic to randomize timing information. However, evaluating the security of your blinding implementation can be tricky, and slightly weak blinding schemes can be problematic.

With both of these mitigations, keep in mind that the amount of nonce leakage is on the order of a single bit, so even the slightest changes by an optimizing compiler or the slightest leakage from your blinding technique can be catastrophic to your signature scheme.

To ensure that nonces are generated safely, most people recommend using RFC 6979, which specifies a way to generate nonces securely and deterministically (i.e., without an RNG), using the message and secret key as entropy. This nonce-generation protocol eliminates the problem of bad RNGs, which is especially valuable for devices such as Yubikeys, where generating randomness securely is difficult. The EdDSA signature scheme uses a similar nonce-generation method by default to avoid bad RNGs.
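
As a rough sketch, the python ecdsa package used in the snippets above also exposes RFC 6979 nonces through its higher-level SigningKey interface:

import hashlib
from ecdsa import NIST256p, SigningKey

sk = SigningKey.generate(curve=NIST256p)

# RFC 6979: the nonce is derived from the secret key and the message, so no RNG
# is involved and signing the same message twice yields the same signature.
sig1 = sk.sign_deterministic(b"message", hashfunc=hashlib.sha256)
sig2 = sk.sign_deterministic(b"message", hashfunc=hashlib.sha256)
assert sig1 == sig2

# verify with the corresponding public key
assert sk.verifying_key.verify(sig1, b"message", hashfunc=hashlib.sha256)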

If you are using ECDSA in your system, I encourage you to consider all of those recommendations. Hopefully, with enough care, your signature scheme won’t end up like this:

This is what happens to ECDSA when you don’t generate your nonces safely

We’re always experimenting and developing tools to help you work faster and smarter. Need help with your next project? Contact us!

How to check if a mutex is locked in Go

TL;DR: Can we check if a mutex is locked in Go? Yes, but not with a mutex API. Here’s a solution for use in debug builds.

Although you can Lock() or Unlock() a mutex, you can’t check whether it’s locked. While it is a reasonable omission (e.g., due to possible race conditions; see also Why can’t I check whether a mutex is locked?), having such functionality can still be useful for testing whether the software does what it is supposed to do.

In other words, it would be nice to have an AssertMutexLocked function solely for debug builds, which could be used like this:

// this method should always be called with o.lock locked
func (o *Object) someMethodImpl() {
    AssertMutexLocked(&o.lock)
    // (...)
}

Having such a function would allow us to confirm the assumption that a given mutex is locked and find potential bugs when it’s added into an existing codebase. In fact, there was a GitHub issue about adding this exact functionality in the official Go repository (golang/go#1366), but it was closed with a WontFix status.

I also learned via the great grep.app project that many projects have similar preconditions about mutexes, such as google/gvisor, ghettovoice/gossip, vitessio/vitess, and others.

Now let’s implement the MutexLocked (and other) functions.

Checking if a mutex is locked

To check whether a mutex is locked, we have to read its state. The sync.Mutex structure contains two fields:

type Mutex struct {
	state int32
	sema  uint32
}

The state field’s bits correspond to the following flags (source):

const (
	mutexLocked = 1 << iota // mutex is locked
	mutexWoken
	mutexStarving
	mutexWaiterShift = iota
	// (...)
)

So if a mutex is locked, its state field has the mutexLocked (1) bit set. However, we can’t just access the state field directly from a Go program, because this field is not exported (its name does not start with a capital letter). Luckily, the field can still be accessed with Go reflection, which I used in the code below to implement functions that check whether a given sync.Mutex or sync.RWMutex is locked.

package main

import (
	"fmt"
	"reflect"
	"sync"
)

const mutexLocked = 1

func MutexLocked(m *sync.Mutex) bool {
	state := reflect.ValueOf(m).Elem().FieldByName("state")
	return state.Int()&mutexLocked == mutexLocked
}

func RWMutexWriteLocked(rw *sync.RWMutex) bool {
	// RWMutex has a "w" sync.Mutex field for write lock
	state := reflect.ValueOf(rw).Elem().FieldByName("w").FieldByName("state")
	return state.Int()&mutexLocked == mutexLocked
}

func RWMutexReadLocked(rw *sync.RWMutex) bool {
	return reflect.ValueOf(rw).Elem().FieldByName("readerCount").Int() > 0
}

func main() {
	m := sync.Mutex{}
	fmt.Println("m locked =", MutexLocked(&m))
	m.Lock()
	fmt.Println("m locked =", MutexLocked(&m))
	m.Unlock()
	fmt.Println("m locked =", MutexLocked(&m))

	rw := sync.RWMutex{}
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.Lock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.Unlock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RLock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RLock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RUnlock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
	rw.RUnlock()
	fmt.Println("rw write locked =", RWMutexWriteLocked(&rw), " read locked =", RWMutexReadLocked(&rw))
}

We can see this program’s output below:

m locked = false
m locked = true
m locked = false
rw write locked = false  read locked = false
rw write locked = true  read locked = false
rw write locked = false  read locked = false
rw write locked = false  read locked = true
rw write locked = false  read locked = true
rw write locked = false  read locked = true
rw write locked = false  read locked = false

These functions can later be used to create AssertMutexLocked and other assertions. To that end, I’ve created a small library with them at trailofbits/go-mutexasserts, which enables the assertion checks only in builds with a debug build tag.

Note: Although there are other tools for detecting race conditions in Go, such as Go’s race detector or OnEdge from Trail of Bits, these tools will detect problematic situations only once they occur, and won’t allow you to assert whether the mutex precondition holds.

We’re always developing tools to help you work faster and smarter. Need help with your next project? Contact us!

Breaking the Solidity Compiler with a Fuzzer

Over the last few months, we’ve been fuzzing solc, the standard Solidity smart contract compiler, and we’ve racked up almost 20 (now mostly fixed) new bugs. A few of these are duplicates of existing bugs with slightly different symptoms or triggers, but the vast majority are previously unreported bugs in the compiler.

This has been a very successful fuzzing campaign and, to our knowledge, one of the most successful ever launched against solc. This isn’t the first time solc has been fuzzed with AFL; fuzzing solc via AFL is a long-standing practice. The compiler has even been tested on OSSFuzz since January of 2019. How did we manage to find so many previously undiscovered bugs–and bugs worth fixing fairly quickly, in most cases? Here are five important elements of our campaign.

1. Have a secret sauce

Fortunately, it’s not necessary that the novelty actually be kept secret, just that it be genuinely new and somewhat tasty! Essentially, we used AFL in this fuzzing campaign, but not just any off-the-shelf AFL. Instead, we used a new variant of AFL expressly designed to help developers fuzz language tools for C-like languages without a lot of extra effort.

The changes from standard AFL aren’t particularly large; this fuzzer just adds a number of new AFL havoc mutations that look like those used by a naive, text-based source code mutation testing tool (i.e., universalmutator). The new approach requires less than 500 lines of code to implement, most of it very simple and repetitive.

This variation of AFL is part of a joint research project with Rijnard van Tonder at Sourcegraph, Claire Le Goues at CMU, and John Regehr at the University of Utah. In our preliminary experiments comparing the method to plain old AFL, the results look good for solc and the Tiny C Compiler, tcc. As science, the approach needs further development and validation; we’re working on that. In practice, however, this new approach has almost certainly helped us find many new bugs in solc.

We found a few of the early bugs reported using plain old AFL in experimental comparisons, and some of the bugs we found easily with our new approach we also eventually duplicated using AFL without the new approach—but the majority of the bugs have not been replicated in “normal” AFL. The graph below shows the number of issues we submitted on GitHub, and underscores the significance of the AFL changes:

The big jump in bug discovery in late February came immediately after we added a few smarter mutation operations to our version of AFL. It could be coincidence, but we doubt it; we manually inspected the files generated and saw a qualitative change in the AFL fuzzing queue contents. Additionally, the proportion of files AFL generated that were actually compilable Solidity jumped by more than 10%.

2. Build on the work of others

Fuzzing a system that has never been fuzzed can certainly be effective; the system’s “resistance” to the kinds of inputs fuzzers generate is likely to be extremely low. However, there can also be advantages to fuzzing a system that has been fuzzed before. As we noted, we aren’t the first to fuzz solc with AFL. Nor were previous efforts totally freelance ad-hoc work; the compiler team was involved in fuzzing solc, and had built tools we could use to make our job easier.

The Solidity build includes an executable called solfuzzer that takes a Solidity source file as input and compiles it using a wide variety of options (with and without optimization, etc.), looking for various invariant violations and kinds of crashes. Several of the bugs we found don’t manifest with the normal solc executable unless you use specific command-line options (especially optimization) or run solc in certain other, rather unusual, ways; solfuzzer found all of these. We also learned from the experience of others that a good starting corpus for AFL fuzzing is in the test/libsolidity/syntaxTests directory tree. This was what other people were using, and it definitely covers a lot of the “what you might see in a Solidity source file” ground.

Of course, even with such existing work, you need to know what you’re doing, or at least how to look it up on Google. Nothing out there will tell you that simply compiling solc with AFL won’t actually produce good fuzzing. First, you need to notice that the fuzzing results in a very high map density, which measures the degree to which you’ve “filled” AFL’s coverage hash. Then you either need to know the advice given in the AFL User Guide, or search for the term “afl map density” and see that you need to recompile the whole system with AFL_INST_RATIO set to 10 to make it easier for the fuzzer to identify new paths. This only happens, according to the AFL docs, when “you’re fuzzing extremely hairy software.” So if you’re used to fuzzing compilers, you’ve probably seen this before, but otherwise you probably haven’t run into map density problems.

3. Play with the corpus

You may notice that the last spike in submitted bugs comes long after the last commit made to our AFL-compiler-fuzzer repository. Did we make local changes that aren’t yet visible? No, we just changed the corpus we used for fuzzing. In particular, we looked beyond the syntax tests, and added all the Solidity source files we could find under test/libsolidity. The most important thing this accomplished was allowing us to find SMTChecker bugs, because it brought in files that used the SMTChecker pragma. Without a corpus example using that pragma, AFL has essentially no chance of exploring SMTChecker behaviors.

The other late-bloom bugs we found (when it seemed impossible to find any new bugs) mostly came from building a “master” corpus including every interesting path produced by every fuzzer run we’d performed up to that point, and then letting the fuzzer explore it for over a month.

4. Be patient

Yes, we said over a month (on two cores). We ran over a billion compilations in order to hit some of the more obscure bugs we found. These bugs were very deep in the derivation tree from the original corpus. Bugs we found in the Vyper compiler similarly required some very long runs to discover. Of course, if your fuzzing effort involves more than just playing around with a new technique, you may want to throw machines (and thus money) at the problem. But according to an important new paper, you may need to throw exponentially more machines at the problem if that’s your only approach.

Moreover, for feedback-based fuzzers, just using more machines may not produce some of the obscure bugs that require a long time to find; there’s not always a shortcut to a bug that requires a mutation of a mutation of a mutation of a mutation…of an original corpus path. Firing off a million “clusterfuzz” instances will produce lots of breadth, but it doesn’t necessarily achieve depth, even if those instances periodically share their novel paths with each other.

5. Do the obvious, necessary things

There’s nothing secret about reducing your bug-triggering source files before submitting them, or trying to follow the actual issue submission guidelines of the project you’re reporting bugs to. And, of course, even if it’s not mentioned in those guidelines, performing a quick search to avoid submitting duplicates is standard. We did those things. They didn’t add much to our bug count, but they certainly sped up the process of recognizing the issues submitted as real bugs and fixing them.

Interestingly, not much reduction was usually required. For the most part, just removing 5-10 lines of code (less than half the file) produced a “good-enough” input. This is partly due to the corpus, and (we think) partly due to our custom mutations tending to keep inputs small, even beyond AFL’s built-in heuristics along those lines.

What did we find?

Some bugs were very simple problems. For instance, this contract used to cause the compiler to bomb out with the message “Unknown exception during compilation: std::bad_cast”:

contract C {
    function f() public returns (uint, uint) {
        try this() {
        } catch Error(string memory) {
        }
    }
}

The issue was easily fixed by changing a typeError into a fatalTypeError, which prevents the compiler from continuing in a bad state. The commit fixing that was only one line of code (though quite a few lines of new tests).

On the other hand, this issue, which prompted a bug bounty award and made it into a list of important bug fixes for the 0.6.8 compiler release, could produce incorrect code for some string literals. It also required substantially more code to handle the needed quoting.

Even the un-reduced versions of our bug-triggering Solidity files look like Solidity source code. This is probably because our mutations, which are heavily favored by AFL, tend to “preserve source-code-y-ness.” Much of what seems to be happening is a mix of small changes that don’t make files too nonsensical plus combination (AFL splicing) of corpus examples that haven’t drifted too far from normal Solidity code. AFL on its own tends to reduce source code to uncompilable garbage that, even if merged with interesting code, won’t make it past initial compiler stages. But with more focused mutations, splicing can often get the job done, as in this input that triggers a bug that’s still open (as we write):

contract C {
    function o (int256) public returns (int256) {
    assembly {
        c:=shl(1,a)
    }
    }

    int constant c=2 szabo+1 seconds+3 finney*3 hours;
}

The triggering input combines assembly and a constant, but there are no files in the corpus we used that contain both and look much like this. The closest is:

contract C {
  bool constant c = this;
  function f() public {
    assembly {
        let t := c
    }
  }
}

Meanwhile, the closest file containing both assembly and a shl is:

contract C {
    function f(uint x) public returns (uint y) {
        assembly { y := shl(2, x) }
    }
}

Combining contracts like this is not trivial; no instance much like the particular shl expression in the bug-exposing contract even appears anywhere in the corpus. Trying to modify a constant in assembly isn’t too likely to show up in legitimate code. And we imagine manually producing such strange but important inputs is extremely non-trivial. In this case, as happens so often with fuzzing, if you can think of a contract at all like the one triggering the bug, you or someone else probably could have written the right code in the first place.

Conclusion

It’s harder to find important bugs in already-fuzzed high-visibility software than in never-fuzzed software. However, with some novelty in your approach, smart bootstrapping based on previous fuzzing campaigns (especially for oracles, infrastructure, and corpus content), plus experience and expertise, it is possible to find many never-discovered bugs in complex software systems, even if they are hosted on OSSFuzz. In the end, even our most aggressive fuzzing only scratches the surface of truly complex software like a modern production compiler—so cunning, in addition to brute force, is required.

We’re always developing tools to help you work faster and smarter. Need help with your next project? Contact us!

Detecting Bad OpenSSL Usage

by William Wang, UCLA

OpenSSL is one of the most popular cryptographic libraries out there; even if you aren’t using C/C++, chances are your programming language’s biggest libraries use OpenSSL bindings as well. It’s also notoriously easy to mess up due to the design of its low-level API. Yet many of these mistakes fall into easily identifiable patterns, which raises the possibility of automated detection.

As part of my internship this past winter and spring, I’ve been prototyping a tool called Anselm, which lets developers describe and search for patterns of bad behavior. Anselm is an LLVM pass, meaning that it operates on an intermediate representation of code between source and compilation. Anselm’s primary advantage over source-level static analysis is that it can operate on any programming language that compiles to LLVM bitcode, or on any closed-source machine code that can be lifted back to bitcode. Anselm can target any arbitrary sequence of function calls, but its original purpose was to inspect OpenSSL usage, so let’s start there.

OpenSSL

The design of OpenSSL makes it difficult for beginners to understand and work with. It has a variety of inconsistent naming conventions across its library and offers several, arguably too many, options and modes for each primitive. For example, due to the evolution of the library, there exist both high-level (EVP) and low-level methods that can be used to accomplish the same task (e.g., DSA signatures or EC signing operations). To make matters worse, the documentation can be inconsistent and difficult to read.

In addition to being difficult to use, the library’s design makes it dangerous to use. The API is inconsistent about returning error codes and pointers (with and without ownership), and it exhibits other surprising behavior. Without rigorously checking error codes or defending against null pointers, unexpected program behavior and process termination can occur.

So what types of errors can Anselm detect? It depends on what the developer specifies, but that could be anything from mismanaging the OpenSSL error queue to reusing initialization vectors. It’s also important to remember that these are heuristics, and misidentifying both good and bad behavior is always possible. Now, let’s get into how the tool works.

Function Calls

While the primary motivation of this project was to target OpenSSL, the library itself doesn’t actually matter. One can view OpenSSL usage as a sequence of API calls, such as EVP_EncryptUpdate and EVP_EncryptFinal_ex. But we could easily replace those names with anything else, and the idea remains the same. Hence, a bad behavior is a pattern of function calls (not just OpenSSL’s) that we would like to detect.

My main approach was to search through all possible paths of execution in a function, looking for bad sequences of API calls. Throughout this post, I’ll be using OpenSSL’s symmetric encryption functions in my examples. Let’s consider EVP_EncryptUpdate, which encrypts a block of data, and EVP_EncryptFinal_ex, which pads the plaintext before a final encryption. Naturally, they should not be called out of order:

EVP_EncryptFinal_ex(ctx, ciphertext + len, &len);
...
EVP_EncryptUpdate(ctx, ciphertext, &len, plaintext, plaintext_len);

This should also be flagged, since the bad sequence remains a possibility:

EVP_EncryptFinal_ex(ctx, ciphertext + len, &len);
...
if (condition) {
  EVP_EncryptUpdate(ctx, ciphertext, &len, plaintext, plaintext_len);
}

I worked with LLVM BasicBlocks, each of which represents a list of instructions that always execute together. BasicBlocks can have multiple successors, each reflecting a different path of execution. A function, then, is a directed graph of many BasicBlocks. There is a single root node, and any leaf node represents an end of execution.

Finding all possible executions amounts to performing a depth-first search (DFS) starting from the root node. However, notice that the graph can contain cycles; this is analogous to a loop in code. If we performed a blind DFS, we could get stuck in an infinite loop. On the other hand, ignoring previously visited nodes can lead to missed behavior. I settled this by limiting the length of any explored path, after which Anselm stops exploring further (the limit can be customized).

One issue remains, which is that performing DFS over an entire codebase can be very time-consuming. Even if our exact pattern is simple, it still needs to be matched against all possible paths generated by the search. To solve this, I first prune the graph of any BasicBlock that does not contain any relevant API calls. This is done by pointing any irrelevant node’s predecessors to each of its successors, removing the middleman.

In practice, this dramatically reduces the complexity of a graph for faster path-finding: entire if statements and while loops can be eliminated without any consequences! It also makes any path limit much more reasonable.
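
To make the approach concrete, here is a toy Python sketch of those two ideas: prune blocks that make no relevant calls by rewiring their predecessors to their successors, then enumerate call sequences with a bounded DFS. The dictionary-based CFG and the call names are purely illustrative; this is not Anselm’s LLVM-based implementation.

# Toy CFG: each node maps to (relevant API calls it makes, successor nodes).
cfg = {
    "entry": ([], ["a", "b"]),
    "a":     (["EVP_EncryptFinal_ex"], ["c"]),
    "b":     ([], ["c"]),
    "c":     (["EVP_EncryptUpdate"], []),
}

def prune(cfg):
    """Drop nodes with no relevant calls, rewiring their predecessors to their successors."""
    pruned = {n: (calls, list(succs)) for n, (calls, succs) in cfg.items()}
    changed = True
    while changed:
        changed = False
        for node, (calls, succs) in list(pruned.items()):
            if calls or node == "entry":
                continue
            del pruned[node]
            for other, (ocalls, osuccs) in pruned.items():
                if node in osuccs:
                    pruned[other] = (ocalls, [s for s in osuccs if s != node] + succs)
            changed = True
            break
    return pruned

def call_paths(cfg, node="entry", limit=8):
    """Yield the sequence of relevant calls along every path, up to a length limit."""
    calls, succs = cfg[node]
    if limit == 0 or not succs:
        yield list(calls)
        return
    for succ in succs:
        for rest in call_paths(cfg, succ, limit - 1):
            yield list(calls) + rest

bad = ["EVP_EncryptFinal_ex", "EVP_EncryptUpdate"]  # "final before update"
for path in call_paths(prune(cfg)):
    if [c for c in path if c in bad] == bad:
        print("possible bad call order:", path)

The real pass does the same thing over LLVM BasicBlocks, with the Value matching and regex machinery described below standing in for the simple list comparison here.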

Matching Values

Although solely examining function calls is a good start, we can do better. Consider OpenSSL contexts, which are created by EVP_CIPHER_CTX_new and must be initialized with algorithm, key, etc. before actual use. In the following situation, we want every context to be initialized by EVP_EncryptInit_ex:

EVP_CIPHER_CTX *ctx1 = EVP_CIPHER_CTX_new();
EVP_CIPHER_CTX *ctx2 = EVP_CIPHER_CTX_new();
EVP_EncryptInit_ex(ctx1, EVP_aes_256_cbc(), NULL, key, iv);

EVP_EncryptInit_ex always follows EVP_CIPHER_CTX_new, yet ctx2 is obviously not initialized properly. A more precise pattern would be, “Every context returned from EVP_CIPHER_CTX_new should later be initialized by EVP_EncryptInit_ex.”

I addressed this by matching arguments and return values — checking whether they pointed to the same LLVM Value object in memory. Contexts are a prime situation to match values, but we can use the same technique to detect repeated IVs:

EVP_EncryptInit_ex(ctx1, EVP_aes_256_cbc(), NULL, key1, iv);
EVP_EncryptInit_ex(ctx2, EVP_aes_256_cbc(), NULL, key2, iv);

Internally, Anselm uses regex capture groups to perform this analysis; every execution path is a string of function calls and Value pointers, while a bad behavior is defined by some regex pattern.

Pattern Format

Towards the end of my internship, I also defined a format for developers to specify bad behaviors, which Anselm translates into a (somewhat messy) regex pattern. Every line names a function, followed by placeholders for its return value and arguments. If you don’t care about a value, use an underscore. Otherwise, define a token which you can use elsewhere. Hence a rule banning repeat IVs would look like this:

EVP_EncryptInit_ex _ _ _ _ _ iv
EVP_EncryptInit_ex _ _ _ _ _ iv

Since the iv token is reused, Anselm constrains its search to only match functions which contain the same Value pointer at that argument position.

I also defined a syntax to perform negative lookaheads, which tells Anselm to look for the absence of specific function calls. For example, if I wanted to prevent any context from being used before initialized, I would prepend an exclamation mark like so:

EVP_CIPHER_CTX_new ctx
! EVP_EncryptInit_ex _ ctx _ _ _ _
EVP_EncryptUpdate _ ctx _ _ _ _

In English, this pattern identifies any calls to EVP_CIPHER_CTX_new and EVP_EncryptUpdate that do not have EVP_EncryptInit_ex sandwiched in between.
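
To give a flavor of how such rules can compile down to regular expressions, here is a rough Python sketch. The path encoding used here, name(ret;arg1,arg2,...) with vN as Value IDs, is my own invention for illustration, not Anselm’s exact format:

import re

# Two hypothetical execution paths, encoded as strings of calls and Value IDs.
repeated_iv_path = (
    "EVP_EncryptInit_ex(v9;v1,v5,v0,v3,v4) "
    "EVP_EncryptInit_ex(v10;v2,v5,v0,v6,v4)"   # the same IV Value (v4) is reused
)
uninitialized_ctx_path = (
    "EVP_CIPHER_CTX_new(v1;) "
    "EVP_EncryptUpdate(v8;v1,v2,v3,v4,v5)"     # v1 never passed to EVP_EncryptInit_ex
)

# Rule "EVP_EncryptInit_ex _ _ _ _ _ iv" twice: capture the last argument of one
# init call and require a second init call with the same Value in that position.
repeated_iv = re.compile(
    r"EVP_EncryptInit_ex\([^)]*,(v\d+)\).*EVP_EncryptInit_ex\([^)]*,\1\)"
)

# Negative-lookahead rule: a context returned by EVP_CIPHER_CTX_new reaches
# EVP_EncryptUpdate without an EVP_EncryptInit_ex on that context in between.
uninitialized_ctx = re.compile(
    r"EVP_CIPHER_CTX_new\((v\d+);[^)]*\)"
    r"(?:(?!EVP_EncryptInit_ex\(v\d+;\1[,)]).)*"
    r"EVP_EncryptUpdate\(v\d+;\1[,)]"
)

print(bool(repeated_iv.search(repeated_iv_path)))              # True
print(bool(uninitialized_ctx.search(uninitialized_ctx_path)))  # True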

Final Notes

With its current set of tools, Anselm is capable of interpreting a wide range of function call patterns and searching for them in LLVM bitcode. Of course, it’s still a prototype and there are improvements to be made, but the main ideas are there and I’m proud of how the project turned out. Thanks to Trail of Bits for supporting these types of internships — it was a lot of fun!

Verifying Windows binaries, without Windows

TL;DR: We’ve open-sourced a new library, μthenticode, for verifying Authenticode signatures on Windows PE binaries without a Windows machine. We’ve also integrated it into recent builds of Winchecksec, so that you can use it today to verify signatures on your Windows executables!

[Image: svcli, μthenticode’s command-line tool]

As a library, μthenticode aims to be a breeze to integrate: It’s written in cross-platform, modern C++ and avoids the complexity of the CryptoAPI interfaces it replaces (namely WinVerifyTrust and CertVerifyCertificateChainPolicy). You can use it now as a replacement for many of SignTool’s features, and more are on the way.

A quick Authenticode primer

Authenticode is Microsoft’s code signing technology, comparable in spirit (but not implementation) to Apple’s Gatekeeper.

At its core, Authenticode supplies (or can supply, as optional features) a number of properties for signed programs:

  • Authenticity: A program with a valid Authenticode signature contains a chain of certificates sufficient for validating that signature. Said chain is ultimately rooted in a certificate stored in the user’s Trusted Publishers store, preventing self-signed certificates without explicit opt-in by the user.
  • Integrity: Each Authenticode signature includes a cryptographic hash of the signed binary. This hash is compared against the binary’s in-memory representation at load time, preventing malicious modifications.
    • Authenticode can also embed cryptographic hashes for each page of memory. These are used with forced integrity signing, which is necessary for Windows kernel drivers and requires a special Microsoft cross-signed “Software Publisher Certificate” instead of a self-signed or independently trusted Certificate Authority (CA).
  • Timeliness: Authenticode supports embedding countersignatures from a Timestamping Authority (TSA), allowing the signature on a binary to potentially outlive the expiration dates of its signing certificates. Such countersignatures also prevent backdating of a valid signature, making it more difficult for an attacker to re-use an expired signing certificate.

Like all code signing technologies, there are things Authenticode can’t do or guarantee about a program:

  • That it has no bugs: Anybody can write buggy software and sign for it, either with a self-signed certificate or by purchasing one from a CA that’s been cross-signed by Microsoft.
  • That it runs no code other than itself: The Windows execution contract is notoriously lax (e.g., the DLL loading rules for desktop applications), and many applications support some form of code execution as a feature (scripts, plugins, sick WinAMP skins, etc). Authenticode has no ability to validate the integrity or intent of code executed outside of the initial signed binary.

Similarly, there are some things that Authenticode, like all PKI implementations, is susceptible to:

  • Misplaced trust: CAs want to sell as many certificates as possible, and thus have limited incentives to check the legitimacy of entities that purchase from them. Anybody can create a US LLC for a few hundred bucks.
  • Stolen certificates: Code-signing and HTTPS certificates are prime targets for theft; many real-world campaigns leverage stolen certificates to fool users into trusting malicious code. Companies regularly check their secret material into source control systems, and code signing certificates are no exception.
  • Fraudulent certificates: Flame infamously leveraged a novel chosen-prefix attack on MD5 to impersonate a Microsoft certificate that had been accidentally trusted for code signing. Similar attacks on SHA-1 are now tractable at prices reasonable for nation states and organized crime.

All told, Authenticode (like all other forms of code signing) adds useful authenticity and integrity checks to binaries, provided that you trust the signer and their ability to store their key material.

With that said, let’s take a look at what makes Authenticode tick.

Parsing Authenticode signatures: spicy PKCS#7

In a somewhat unusual move for 2000s-era Microsoft, most of the Authenticode format is actually documented and available for download. A few parts are conspicuously under-defined or marked as “out of scope”; we’ll cover some of them below.

At its core, Authenticode has two components:

  • The certificate table, which contains one or more entries, each of which may be a SignedData.
  • SignedData objects, which are mostly normal PKCS#7 containers (marked with a content type of SignedData per RFC 2315).

The certificate table

The certificate table is the mechanism by which Authenticode signatures are embedded into PE files.

It has a few interesting properties:

  • Accessing the certificate table involves reading the certificate table directory in the data directory table. Unlike every other entry in the data directory table, the certificate directory’s RVA field is not a virtual address—it’s a direct file offset. This is a reflection of the behavior of the Windows loader, which doesn’t actually load the certificates into the address space of the program.
  • Despite this, real-world tooling appears to be inflexible about placement and subsequent parsing of the certificate table. Microsoft’s tooling consistently places the certificate table at the end of the PE (after all sections); many third-party tools naively seek to the certificate table offset and parse until EOF, allowing an attacker to trivially append additional certificates¹.

Once located, actually parsing the certificate table is straightforward: It’s an 8 byte-aligned blob of WIN_CERTIFICATE structures:


[Image: the WIN_CERTIFICATE structure]

…with some fields of interest:

  • wRevision: the “revision” of the WIN_CERTIFICATE.
    • MSDN only recently fixed the documentation for this field: WIN_CERT_REVISION_2_0=0x0200 is the current version for Authenticode signatures; WIN_CERT_REVISION_1_0=0x0100 is for “legacy” signatures. I haven’t been able to find the latter in the wild.
  • wCertificateType: the kind of encapsulated certificate data.
    • MSDN documents four possible values for wCertificateType, but we’re only interested in one: WIN_CERT_TYPE_PKCS_SIGNED_DATA.
  • bCertificate: the actual certificate data. For WIN_CERT_TYPE_PKCS_SIGNED_DATA, this is the (mostly) PKCS#7 SignedData mentioned above.

As you might have surmised, the structure of the certificate table allows for multiple independent Authenticode signatures. This is useful for deploying a program across multiple versions of Windows, particularly versions that might have legacy certificates in the Trusted Publishers store or that don’t trust a particular CA for whatever reason.
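
To make the layout concrete, here is a small Python sketch that locates the certificate table and walks its WIN_CERTIFICATE entries. The pefile package and the example path are my own choices for illustration; this is not μthenticode’s C++ code:

import struct
import pefile

raw = open("example.exe", "rb").read()        # placeholder path
pe = pefile.PE(data=raw, fast_load=True)

security = pe.OPTIONAL_HEADER.DATA_DIRECTORY[
    pefile.DIRECTORY_ENTRY["IMAGE_DIRECTORY_ENTRY_SECURITY"]]
offset, size = security.VirtualAddress, security.Size  # a raw file offset, not an RVA
table = raw[offset : offset + size]

cursor = 0
while cursor + 8 <= len(table):
    # WIN_CERTIFICATE header: DWORD dwLength, WORD wRevision, WORD wCertificateType
    length, revision, cert_type = struct.unpack_from("<IHH", table, cursor)
    if length < 8:
        break
    cert = table[cursor + 8 : cursor + length]          # bCertificate (e.g., the PKCS#7 blob)
    print(f"revision={revision:#06x} type={cert_type} length={length}")
    cursor += (length + 7) & ~7                         # entries are 8-byte aligned

Each WIN_CERT_TYPE_PKCS_SIGNED_DATA entry’s bCertificate can then be handed to a PKCS#7 parser for the checks described below.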

Authenticode’s SignedData

Microsoft helpfully² supplies this visualization of their SignedData structure:

[Image: Authenticode’s SignedData structure]

This is almost a normal PKCS#7 SignedData, with a few key deviations:

  • Instead of one of the RFC 2315 content types, the Authenticode SignedData’s contentInfo has a type of SPC_INDIRECT_DATA_OBJID, which Microsoft defines as 1.3.6.1.4.1.311.2.1.4³.
  • The structure corresponding to this object identifier (OID) is documented as SpcIndirectDataContent. Microsoft conveniently provides its ASN.1 definition:
    [Image: SpcIndirectDataContent ASN.1 definition]

    (Observe that the custom AlgorithmIdentifier is actually just X.509’s AlgorithmIdentifier—see RFC 3279 and its updates.)

    ⚠ The code below does no error handling or memory management; read the μthenticode source for the full version. ⚠

    Given the ASN.1 definitions above, we can use OpenSSL’s (hellish and completely undocumented) ASN.1 macros to parse Microsoft’s custom structures:


    [Image: OpenSSL ASN.1 macro definitions for the SPC structures]

Actually checking the signature

With our structures in place, we can use OpenSSL’s (mostly) undocumented PKCS#7 API to parse our SignedData and indirect data contents:

[Image: parsing the SignedData and SpcIndirectDataContent with OpenSSL’s PKCS#7 API]

…and then validate them:

[Image: verifying the signature with OpenSSL]

Voilà: the basics of Authenticode. Observe that we pass PKCS7_NOVERIFY, as we don’t necessarily have access to the entire certificate chain—only Windows users with the relevant cert in their Trusted Publishers store will have that.

Calculating and checking the Authenticode hash

Now that we have authenticity (modulo the root certificate), let’s do integrity.

First, let’s grab the hash embedded in the Authenticode signature, for eventual comparison:

[Image: extracting the Authenticode digest from the SpcIndirectDataContent]

Next, we need to compute the binary’s actual hash. This is a little involved, thanks to a few different fields:

  • Every PE has a 32-bit CheckSum field that’s used for basic integrity purposes (i.e., accidental corruption). This field needs to be skipped when calculating the hash, as it’s calculated over the entire file and would change with the addition of certificates.
  • The certificate data directory entry itself needs to be skipped, since relocating and/or modifying the size of the certificate table should not require any changes to pre-existing signatures.
  • The certificate table (and constituent signatures) itself, naturally, cannot be part of the input to the hash.
  • To ensure a consistent hash, Authenticode stipulates that sections are hashed in ascending order by the value of each section header’s PointerToRawData, not the order of the section headers themselves. This is not particularly troublesome, but requires some additional bookkeeping.

μthenticode’s implementation of the Authenticode hashing process is a little too long to duplicate below, but in pseudocode:

  1. Start with an empty buffer.
  2. Insert all PE headers (DOS, COFF, Optional, sections) into the buffer.
  3. Erase the certificate table directory entry and CheckSum field from the buffer, in that order (to avoid rescaling the former’s offset).
  4. Use pe-parse’s IterSec API to construct a list of section buffers. IterSec yields sections in file offset order as of #129.
  5. Skip past the certificate table and add trailing data to the buffer, if any exists.
  6. Create and initialize a new OpenSSL message digest context using the NID retrieved from the signature.
  7. Toss the buffer into EVP_DigestUpdate and finish with EVP_DigestFinal.
  8. Compare the result with the Authenticode-supplied hash.
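
For a rough idea of what those steps look like in code, here is a simplified Python sketch using the pefile and hashlib packages (my own choices for illustration, not μthenticode’s implementation). It assumes the certificate table sits at the end of the file, as Microsoft’s tooling places it, and takes the digest algorithm as a parameter rather than pulling the NID from the signature:

import hashlib
import pefile

def authenticode_hash(path, algorithm="sha256"):
    raw = open(path, "rb").read()
    pe = pefile.PE(data=raw, fast_load=True)

    sec_dir = pe.OPTIONAL_HEADER.DATA_DIRECTORY[
        pefile.DIRECTORY_ENTRY["IMAGE_DIRECTORY_ENTRY_SECURITY"]]
    cert_offset = sec_dir.VirtualAddress        # a file offset, not an RVA
    checksum_offset = pe.OPTIONAL_HEADER.get_field_absolute_offset("CheckSum")
    sec_dir_offset = sec_dir.get_file_offset()  # the 8-byte directory entry itself

    digest = hashlib.new(algorithm)

    # Steps 1-3: hash the headers, skipping the CheckSum field (4 bytes) and the
    # certificate table directory entry (8 bytes).
    headers = raw[: pe.OPTIONAL_HEADER.SizeOfHeaders]
    digest.update(headers[:checksum_offset])
    digest.update(headers[checksum_offset + 4 : sec_dir_offset])
    digest.update(headers[sec_dir_offset + 8 :])

    # Step 4: hash each section's raw data, in ascending PointerToRawData order.
    end_of_sections = pe.OPTIONAL_HEADER.SizeOfHeaders
    for section in sorted(pe.sections, key=lambda s: s.PointerToRawData):
        digest.update(raw[section.PointerToRawData :
                          section.PointerToRawData + section.SizeOfRawData])
        end_of_sections = max(end_of_sections,
                              section.PointerToRawData + section.SizeOfRawData)

    # Step 5: hash any trailing data, excluding the certificate table itself.
    digest.update(raw[end_of_sections : cert_offset] if cert_offset
                  else raw[end_of_sections:])

    # Steps 6-8 would compare this digest against the one embedded in the signature.
    return digest.hexdigest()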

Other bits and pieces

We haven’t discussed the two remaining major Authenticode features: page hashes and timestamp countersignatures.

Page hashes

As mentioned above, page hashes are conspicuously not documented in the Authenticode specification, and are described as stored in a “[…] binary structure [that] is outside the scope of this paper.”

Online information on said structure is limited to a few resources:

  • The VirtualBox source code references OIDs for two different versions of the page hashes structure:
    • SPC_PE_IMAGE_PAGE_HASHES_V1_OBJID: 1.3.6.1.4.1.311.2.3.1
    • SPC_PE_IMAGE_PAGE_HASHES_V2_OBJID: 1.3.6.1.4.1.311.2.3.2

    These OIDs are not listed in Microsoft’s OID reference or in the OID repository⁴, although they do appear in Wintrust.h.

  • At least one fork of osslsigncode has support for generating and validating page hashes, and grants us further insight:
    • The V1 OID represents SHA-1 page hashes; V2 represents SHA2-256.
    • The serializedData of each SpcSerializedObject is an ASN.1 SET, each member of which is an ASN.1 SEQUENCE, to the effect of:

      [Image: reconstructed ASN.1 definitions for the page hash structures]

      (The definitions above are my reconstruction from the body of get_page_hash_link; osslsigncode confusingly reuses the SpcAttributeTypeAndOptionalValue type for Impl_SpcPageHash and constructs the rest of the contents of SpcSerializedObject manually.)

As far as I can tell, osslsigncode only inserts one Impl_SpcPageHash for the entire PE, which it calculates in pe_calc_page_hash. The code in that function is pretty dense, but it seems to generate a table of structures as follows:

[Image: the page hash table structure generated by pe_calc_page_hash]

…where IMPL_PAGE_HASH_SIZE is determined by the hash algorithm used (i.e., by Impl_SpcPageHash.type), and the very first entry in the table is a null-padded “page hash” for just the PE headers with page_offset=0. This table is not given an ASN.1 definition—it’s inserted directly into Impl_SpcPageHash.pageHashes.

Timestamp countersignatures

Unlike page hashes, Authenticode’s timestamp countersignature format is relatively well documented, both in official and third-party sources.

Just as the Authenticode SignedData is mostly a normal PKCS#7 SignedData, Authenticode’s timestamp format is mostly a normal PKCS#9 countersignature. Some noteworthy bits include:

  • When issuing a timestamp request (TSR) to a timestamp authority (TSA), the request takes the form of an HTTP 1.1 POST containing a DER-encoded, then base64-encoded ASN.1 message:

    [Image: ASN.1 definition of the timestamp request]

    …where countersignatureType is the custom Microsoft OID 1.3.6.1.4.1.311.3.2.1 (i.e., SPC_TIME_STAMP_REQUEST_OBJID) and content is the original Authenticode PKCS#7 ContentInfo.
  • The TSA response is a PKCS#7 SignedData, from which the SignerInfo is extracted and embedded into the main Authenticode SignedData. The certificates from the TSA response are similarly embedded into the certificate list as unauthenticated attributes.

Wrapup

We’ve covered all four major components of Authenticode above: verifying the signature, checking the integrity of the file against the verified hash, calculating page hashes, and verifying any timestamp countersignatures.

μthenticode itself is still a work in progress, and currently only has support for signatures and the main Authenticode hash. You can help us out by contributing support for page hash parsing and verification, as well as timestamp signature validation!

μthenticode’s APIs are fully documented and hosted, and most can be used immediately with a peparse::parsed_pe *:

[Image: example μthenticode API usage]

Check out the svcli command-line tool for an applied example, including retrieving the embedded Authenticode hash.

Prior work and references

μthenticode was written completely from scratch and uses the official Authenticode document supplied by Microsoft as its primary reference. When that was found lacking, the following resources came in handy:

The following resources were not referenced, but were discovered while researching this post:

  • jsign: A Java implementation of Authenticode

Want the scoop on our open-source projects and other security innovations? Contact us or sign up for our newsletter!


  1. Adding additional certificates without this parsing error is still relatively trivial, but requires the attacker to modify more parts of the PE rather than just append to it.
  2. For some definitions of helpful.
  3. The OID tree for Authenticode shows lots of other interesting OIDs, most of which aren’t publicly documented.
  4. I’ve submitted them, pending approval by the repository’s administrator.