Panicking the right way in Go

A common Go idiom is to (1) panic, (2) recover from the panic in a deferred function, and (3) continue on. In general, this is okay, so long as there are no global state changes between the entry point of the function that calls defer and the point at which the panic occurs. Such global state changes can have a lasting effect on the program’s behavior. Moreover, it is easy to overlook them and to believe that all actions are undone by a call to recover.

At Trail of Bits, we have developed a tool called OnEdge to help detect such incorrect uses of the “defer, panic, recover” pattern. OnEdge reduces the problem of finding such global state changes to one of race detection. Go’s outstanding race detector can then be used to find these errors. Moreover, as we explain below, you can incorporate OnEdge into your own programs in order to find these types of errors.

OnEdge is one of the tools that we use to verify software. For example, we audit a lot of blockchain software written in Go, where it is common to panic upon receiving an invalid transaction, to recover from the panic, and to continue processing transactions. However, care must be taken to ensure that an invalid transaction is reverted completely, as a partially applied transaction could, for example, cause the blockchain to fork.

“Defer, Panic, and Recover”

The definitive reference on this technique is Andrew Gerrand’s blog post, “Defer, Panic, and Recover.” We will not give such a thorough account here, though we will walk through an example.

Figure 1 shows a simple program that employs the “defer, panic, and recover” pattern. The program randomly generates deposits and withdrawals. If there are not sufficient funds to cover a withdrawal, the program panics. The panic is caught in a deferred function that reports the error, and the program continues on.

package main

import (
	"fmt"
	"log"
	"math/rand"
)

var balance = 100

func main() {
	r := rand.New(rand.NewSource(0))
	for i := 0; i < 5; i++ {
		if r.Intn(2) == 0 {
			credit := r.Intn(50)
			fmt.Printf("Depositing %d...\n", credit)
			deposit(credit)
		} else {
			debit := r.Intn(100)
			fmt.Printf("Withdrawing %d...\n", debit)
			withdraw(debit)
		}
		fmt.Printf("New balance: %d\n", balance)
	}
}

func deposit(credit int) {
	balance += credit
}

func withdraw(debit int) {
	defer func() {
		if r := recover(); r != nil {
			log.Println(r)
		}
	}()
	balance -= debit
	if balance < 0 {
		panic("Insufficient funds")
	}
}

Figure 1: A program that uses the “defer, panic, and recover” pattern incorrectly.

Running the program in Figure 1 produces the output in Figure 2.

Depositing 14...
New balance: 114
Withdrawing 6...
New balance: 108
Withdrawing 96...
New balance: 12
Withdrawing 77...
<time> Insufficient funds
New balance: -65
Depositing 28...
New balance: -37

Figure 2: Output of the program in Figure 1.

Note that there is a bug: even though there are not sufficient funds to cover one of the withdrawals, the withdrawal is still applied. This bug is a special case of a more general class of errors; the program makes global state changes before panicking.

A better approach would be to make such global state changes only after the last point at which a panic could occur. Rewriting the withdraw function to use this approach would cause it to look something like Figure 3.

func withdraw(debit int) {
	defer func() {
		if r := recover(); r != nil {
			log.Println(r)
		}
	}()
	if balance-debit < 0 {
		panic("Insufficient funds")
	}
	balance -= debit
}

Figure 3: A better implementation of the withdraw function from Figure 1.
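The difference between the two versions can be checked directly. The following is a minimal sketch, not part of the original program: withdrawBuggy and withdrawFixed are hypothetical names for the functions from Figures 1 and 3, and balanceAfter is a helper introduced here only for illustration. It attempts the same overdrawing withdrawal against both versions and reports the resulting balance.

```go
package main

import (
	"fmt"
	"log"
)

var balance = 100

// withdrawBuggy mirrors Figure 1: it changes balance before it can panic.
func withdrawBuggy(debit int) {
	defer func() {
		if r := recover(); r != nil {
			log.Println(r)
		}
	}()
	balance -= debit
	if balance < 0 {
		panic("Insufficient funds")
	}
}

// withdrawFixed mirrors Figure 3: it changes balance only after the last
// point at which a panic could occur.
func withdrawFixed(debit int) {
	defer func() {
		if r := recover(); r != nil {
			log.Println(r)
		}
	}()
	if balance-debit < 0 {
		panic("Insufficient funds")
	}
	balance -= debit
}

// balanceAfter (hypothetical helper) resets balance to start, attempts the
// withdrawal, and returns the balance left behind.
func balanceAfter(withdraw func(int), start, debit int) int {
	balance = start
	withdraw(debit)
	return balance
}

func main() {
	fmt.Println("buggy:", balanceAfter(withdrawBuggy, 12, 77)) // buggy: -65
	fmt.Println("fixed:", balanceAfter(withdrawFixed, 12, 77)) // fixed: 12
}
```

With the version from Figure 3, the recovered panic leaves the balance exactly as it was before the call, which is the property one wants from the pattern.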

Following a brief introduction to Go’s race detector, we describe a method for finding improper global state changes like those in Figure 1.

The Go Race Detector

The Go Race Detector is a combination of compiler instrumentation and a runtime library. The compiler instruments (1) memory accesses that cannot be proven race-free, and (2) uses of known synchronization mechanisms (e.g., sending and receiving on a channel). The runtime library, based on Google’s ThreadSanitizer, provides the code to support the instrumentation. If two instrumented memory accesses conflict and cannot be proven synchronized, then the runtime library produces a warning message.

The Go race detector can produce “false negatives,” i.e., it can fail to detect some races. However, provided that synchronization mechanisms known to the runtime library are used, every warning message that it produces is a “true positive,” i.e., an actual race.

One enables the Go race detector by passing the “-race” flag to, e.g., “go run” or “go build.” The “-race” flag tells the Go compiler to instrument the code as described above and to link in the required runtime library.

Using the Go race detector is not cheap. It increases memory usage by an estimated 5-10x, and increases execution time by 2-20x. For this reason, the race detector is typically not enabled for “release” code, and is used only during development. Nonetheless, the strong guarantees that come with the detector’s reports can make the overhead worthwhile.
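To make the discussion concrete, here is a small sketch of the kind of code the race detector is designed to flag. It is unrelated to OnEdge itself, and the function names racyCount and safeCount are invented for this illustration. The first function updates a shared counter from two goroutines with no synchronization; the second guards the same update with a mutex, a mechanism the race detector understands.

```go
package main

import (
	"fmt"
	"sync"
)

// racyCount increments a shared counter from two goroutines without
// synchronizing the increments. The result is not guaranteed, and a run
// with -race would be expected to report the conflicting accesses.
func racyCount(n int) int {
	var wg sync.WaitGroup
	counter := 0
	for g := 0; g < 2; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < n; i++ {
				counter++ // unsynchronized read-modify-write
			}
		}()
	}
	wg.Wait()
	return counter
}

// safeCount does the same work, but every increment happens while holding
// a mutex, so the accesses are ordered and no race exists.
func safeCount(n int) int {
	var (
		wg      sync.WaitGroup
		mu      sync.Mutex
		counter int
	)
	for g := 0; g < 2; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < n; i++ {
				mu.Lock()
				counter++
				mu.Unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println("racy:", racyCount(1000)) // value may vary from run to run
	fmt.Println("safe:", safeCount(1000)) // safe: 2000
}
```

Saved to a file and run with “go run -race”, a program like this should produce a warning for racyCount and none for safeCount.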

Detecting Global State Changes

The problem of detecting global state changes has obvious similarities to the problem of detecting data races: both involve memory accesses. Like data races, detecting global state changes would seem amenable to dynamic analysis. So, a question that one might ask is: can one leverage the Go race detector to find global state changes? Or, more precisely, can one make a global state change look like a data race?

We solve this problem by executing code that could modify global state twice: once in a program’s main thread, and once in a second, “shadow” thread. If the code does modify global state, then there will be two conflicting memory accesses, one in each thread. So long as the two threads do not appear synchronized (which is not hard to ensure), the two memory accesses can be reported as a data race.
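The idea can be sketched in a few lines. The code below is an illustration only and is not how OnEdge is implemented: runWithShadow and executions are hypothetical names, and the sketch always runs the shadow copy concurrently, whereas OnEdge re-executes a function body only after a panic has been recovered. What it does show is that running the same body in two goroutines, with nothing ordering the two executions, turns any write to global state into a pair of conflicting accesses.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

var balance = 100

// runWithShadow (hypothetical, not an OnEdge function) runs body once in a
// "shadow" goroutine and once in the calling goroutine. Nothing orders the
// two executions relative to each other, so any write that body makes to
// shared memory occurs twice without synchronization.
func runWithShadow(body func()) {
	done := make(chan struct{})
	go func() {
		defer close(done)
		body() // shadow execution
	}()
	body() // main execution
	<-done // wait for the shadow only after the main execution has finished
}

// executions (hypothetical helper) confirms that a body is run twice. The
// body touches only an atomic counter, so it gives a race detector nothing
// to report.
func executions() int32 {
	var runs int32
	runWithShadow(func() {
		atomic.AddInt32(&runs, 1)
	})
	return atomic.LoadInt32(&runs)
}

func main() {
	fmt.Println("executions:", executions()) // executions: 2

	// This body changes global state. Under -race, the two unsynchronized
	// accesses to balance would be expected to be reported as a data race.
	runWithShadow(func() { balance -= 10 })
	fmt.Println("balance:", balance) // value not guaranteed: the race is real
}
```

Because the two executions in this sketch genuinely overlap, the final balance is not reliable. That is acceptable for a diagnostic run, where the interesting output is the race report rather than the program’s result.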

OnEdge

OnEdge detects improper global state changes using the approach described above. OnEdge is a small library that exports a handful of functions, notably, WrapFunc and WrapRecover. To incorporate OnEdge into a project, do three things:

  1. Wrap function bodies that defer calls to recover in WrapFunc(func() { … }).
  2. Within those wrapped function bodies, wrap calls to recover in WrapRecover( … ).
  3. Run the program with Go’s race detector enabled.

If a panic occurs in a function body wrapped by WrapFunc, and that panic is caught by a recover wrapped by WrapRecover, then the function body is re-executed in a shadow thread. If the shadow thread makes a global state change before calling recover, then that change appears as a data race and can be reported by Go’s race detector.

Figure 4 is the result of applying steps 1 and 2 above to the withdraw function from Figure 1.

func withdraw(debit int) {
	onedge.WrapFunc(func() {
		defer func() {
			if r := onedge.WrapRecover(recover()); r != nil {
				log.Println(r)
			}
		}()
		balance -= debit
		if balance < 0 {
			panic("Insufficient funds")
		}
	})
}

Figure 4: The withdraw function from Figure 1 with OnEdge incorporated.

A complete source file to which the above steps have been applied can be found here: account.go. Running the modified program with the race detector enabled, e.g.,

go run -race account.go

produces the output in Figure 5.

Depositing 14...
New balance: 114
Withdrawing 6...
New balance: 108
Withdrawing 96...
New balance: 12
Withdrawing 77...
==================
WARNING: DATA RACE
Read at 0x0000012194f8 by goroutine 8:
  main.withdraw.func1()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:61 +0x6d
  github.com/trailofbits/on-edge.WrapFunc.func1()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:82 +0x3d
  github.com/trailofbits/on-edge.shadowThread.func1()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:239 +0x50
  github.com/trailofbits/on-edge.shadowThread()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:240 +0x79

Previous write at 0x0000012194f8 by main goroutine:
  main.withdraw.func1()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:61 +0x89
  github.com/trailofbits/on-edge.WrapFunc.func1()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:82 +0x3d
  github.com/trailofbits/on-edge.WrapFuncR()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:132 +0x3d4
  github.com/trailofbits/on-edge.WrapFunc()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:81 +0x92
  main.withdraw()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:50 +0x84
  main.main()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:39 +0x3cf

Goroutine 8 (running) created at:
  github.com/trailofbits/on-edge.WrapFuncR()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:126 +0x3a1
  github.com/trailofbits/on-edge.WrapFunc()
      <gopath>/src/github.com/trailofbits/on-edge/onedge_race.go:81 +0x92
  main.withdraw()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:50 +0x84
  main.main()
      <gopath>/src/github.com/trailofbits/on-edge/example/account.go:39 +0x3cf
==================
<time> Insufficient funds
<time> Insufficient funds
New balance: -142
Depositing 28...
New balance: -114
Found 1 data race(s)
exit status 66

Figure 5: The output of the program from Figure 1 with OnEdge incorporated and the race detector enabled.

What’s going on here? As before, there are insufficient funds to cover one of the withdrawals, so the withdraw function panics. The panic is caught by a deferred call to recover. At that point, OnEdge kicks in. OnEdge re-executes the body of the withdraw function within a shadow thread. This causes a data race to be reported at line 61 in account.go; this line:

balance -= debit

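This line makes a global state change by writing to the balance global variable. Executing this line in both the main and shadow threads results in conflicting accesses to the same memory location (in Figure 5, a read in the shadow thread against the earlier write in the main thread), which Go’s race detector recognizes as a race.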

Limitations

Like all dynamic analyses, OnEdge’s effectiveness depends upon the workload to which one subjects one’s program. As an extreme example, if one never subjects one’s program to an input that causes it to panic, then OnEdge will have done no good.

A second limitation is that, since Go’s race detector can miss some races, OnEdge can miss some global state changes. This is due in part to a limitation of ThreadSanitizer, which keeps track of only a limited number of memory accesses to any one memory location. Once that limit is reached, ThreadSanitizer starts evicting entries randomly.

OnEdge present and future

OnEdge is a tool for detecting improper global state changes arising from incorrect uses of Go’s “defer, panic, and recover” pattern. OnEdge accomplishes this by leveraging the strength of Go’s existing tools, namely, its race detector.

We are exploring the possibility of using automation to incorporate WrapFunc and WrapRecover into a program. For now, users must do so manually. We encourage the use of OnEdge and welcome feedback.
