Confessions of a smart contract paper reviewer

Alex Groce

February 05, 2021

If you’re thinking of writing a paper describing an exciting novel approach to smart contract analysis and want to know what reviewers will be looking for, you’ve come to the right place. Deadlines for many big conferences (ISSTA tool papers, ASE, FSE, etc.) are approaching, as is our own Workshop on Smart Contract Analysis, so we’d like to share a few pro tips. Even if you’re not writing a smart contract paper, this post can also help you identify research that’s worth reading and understand its impact.

I’ve been reviewing smart contract analysis papers for a few years now—over 25 papers in seven different venues in the last year—and I’ve taken away six requirements for a good paper. I’d also like to share a little about extra points that take a paper from “good enough” to “great.”

Explain what the analysis method actually does! Some authors fail to describe how a proposed analysis algorithm works, perhaps due to page limits. It’s not essential to describe every painstaking detail of an algorithm or implementation, but a good paper does more than describe a method with high-level buzzwords. “We combine symbolic execution with swarm testing” is a great sentence for a paper’s abstract, but this level of granularity cannot be sustained throughout the paper. Reviewers see this as hand-wavy. Provide details for your audience to understand what you actually are proposing and evaluate it. For a tool paper—a short paper that basically advertises a particular tool exists—a generic explanation is sometimes fine. Still, such uninformative descriptions appear surprisingly often in full conference submissions, which are supposed to allow readers to understand and even duplicate interesting new approaches.
Understand the basics of blockchain and smart contracts. Too many papers are rejected for making obvious mistakes about how a contract or the blockchain works. This kind of problem is often foreshadowed by an introduction that includes boilerplate text describing blockchains and smart contracts. Smart contract and blockchain analyses are, in some ways, pure instances of core code analysis and test generation problems. However, if you’re still in the early stages of researching this topic, a minimum amount of homework is required before you can produce credible results. We recommend going through the Ethernaut CTF exercises to understand some basics of contract exploitation and then reading our Building Secure Contracts tutorials, including experiments with real tools used in paid audits. Blockchains and smart contracts are fast-moving targets, and many early papers concentrated on handling ether. However, much of the modern contracts’ financial value is in ERC-20 or other recently developed token types. If you’re only looking at ether-related exploits, you’re not addressing much of what’s going on now.
Base experimental results on meaningful sets of contracts. Out of all the contracts ever created on the Ethereum blockchain, only a small fraction accounts for almost all transactions and ether/token activity. Most Etherscan contracts have little practical use and are toys deployed by programmers learning Solidity. If your experiments are based on randomly selected contracts from Etherscan, your results will not reflect contracts of interest, and many are likely to be near-duplicates. A random sampling of contracts is a red flag for reviewers because the data set is noisy and may fail to include any contracts anyone actually cares about. Instead, base your experiments on active contracts that have participated in transactions, held ether or token value, or satisfying other criteria demonstrating that they’re meaningful. It also shows good judgment to include some diversity in the contract set and demonstrate that you aren’t, say, basing your results on 30 basic ERC-20 tokens with nearly identical implementations. Moreover, the fact that state-of-the-art Ethereum moves fast applies here. These days a lot of the action is not in single contracts but in multi-contract systems, where analysis based on those contracts’ composition is necessary to explore meaningful behavior. The same guidance goes for demonstrating a method for finding vulnerabilities. Finding meaningless vulnerabilities in contracts that hold no ether or token value and never participate in transactions isn’t compelling. On the other hand, there are real vulnerabilities in real contracts that participate in numerous transactions. Find those, and you have a good demonstration of your ideas! Google BigQuery is one way to get started on this since it can be difficult to extract from the blockchain.
More generally, respect the rules of (fuzzing) research. Our post on performing good fuzzing research mostly applies to fuzzing smart contracts, too. Smart contract fuzzers may not be expected to run for 24 hours, but it’s certainly essential to run tools “long enough.” You need statistics-based evidence, not a string of anecdotes. If your contract fuzzer performed better, did it do so by a statistically significant margin? What’s the estimated effect size, and how confident can we be in that estimate’s quality? Other points of good fuzzing experimental practice are just common sense but easily overlooked: for example, if you don’t use a consistent version of the Solidity compiler in your experiments or fail to report what version you used, reproducing (and understanding) your results will be complicated. Two particular aspects of these general guidelines are essential for smart contract fuzzing papers, so we’ll separate those out.
Compare against a meaningful tool baseline. You need to compare your work with widely accepted concepts and tools. Pit your tool against real competition, which may require more effort in smart contract work. People are always releasing new tools and updating old ones, and some older tools no longer work. By the time your paper is reviewed, the cutting edge may have moved. Still, it must be obvious that you went through the trouble of selecting a reasonable set of comparable tools and comparing your work against state of the art when you wrote the paper.
Draw clear boundaries explaining what your tool does and does not detect. Smart contract fuzzers report a variety of bugs, but there are no Solidity “crashes.” So tools have to look for something, whether it be an integer overflow, reentrancy, locked funds, or runaway gas usage indicating a possible denial-of-service vulnerability. Ultimately, this means that tools may excel in some areas and lag behind in others. One fuzzer may use an aggressive set of oracles, which can lead to false positives, while another may report a particular set of bugs lowering its false-positive error. Comparing apples to apples can be hard in this context, but you must show that your method finds meaningful bugs. One way to do this in fuzzing is to compare code coverage results between your tool and others.

We hope this advice helps strengthen your approach to publishing your smart contract research. In summary, you can almost guarantee that if I review your paper, and I can’t figure out what your method even is, I’d reject your paper. If you clearly didn’t do any homework on smart contracts beyond reading the Wikipedia page on Ethereum, I’d reject your paper. If you based your experiments on 50 random contracts on the blockchain that have received 10 transactions after deployment, hold a total of $0.05 worth of Ether, and are mostly duplicates of each other, I’d reject your paper. If you don’t understand the basic rules of fuzzing research, if you only compare to one outdated academic research tool, and ignore five popular open-source tools, if you claim your approach is better simply because you have a tendency to produce more false positives based on a very generous notion of “bug”… well, you can guess!

The good news is, doing all of the things this post suggests is not just part of satisfying reviewers. It’s part of satisfying yourself and your future readers (and potential tool users), and it’s essential to building a better world for smart contract developers.

Finally, to take a paper from “good” to “great,” tell me something about how you came up with the core idea of the paper and what the larger implications of the idea working might be. That is, there’s some reason, other than sheer luck, why this approach is better at finding bugs in smart contracts. What does the method’s success tell us about the nature of smart contracts or the larger problem of generating tests that aren’t just a sequence of bytes or input values but a structured sequence of function calls? How can I improve my understanding of smart contracts or testing, in general, based on your work? I look forward to reading about your research. We’d love to see your smart contract analysis papers at this year’s edition of WoSCA to be co-located with ISSTA in July!