Publications
Speedrunning the New York Subway
We optimized the route for visiting every NYC subway station using algorithms from combinatorial optimization, creating a 20-hour tour that beats the existing world record by 45 minutes.
Detecting code copying at scale with Vendetect
Vendetect is our new open-source tool for detecting copied and vendored code between repositories. It uses semantic fingerprinting to identify similar code even when variable names change or comments disappear. More importantly, unlike academic plagiarism detectors, it understands version control history, helping you trace vendored code back to its exact source commit.
Investigate your dependencies with Deptective
Deptective, our new open-source tool, automatically finds the packages needed to install software dependencies. It does so not based on the software’s self-reported requirements, but by observing what the software needs at runtime.
Preventing account takeover on centralized cryptocurrency exchanges in 2025
This blog post highlights key points from our new white paper Preventing Account Takeovers on Centralized Cryptocurrency Exchanges, which documents ATO-related attack vectors and defenses tailored to CEXes. Imagine trying to log in to your centralized cryptocurrency exchange (CEX) account and your password and username just… don’t work. You […]
libmagic: The Blathering
A couple of years ago we released PolyFile: a utility to identify and map the semantic structure of files, including polyglots, chimeras, and schizophrenic files. It’s a bit like file, binwalk, and Kaitai Struct all rolled into one. PolyFile initially used the TRiD definition database for file identification. However, […]
Are blockchains decentralized?
A new Trail of Bits research report examines unintended centralities in distributed ledgers Blockchains can help push the boundaries of current technology in useful ways. However, to make good risk decisions involving exciting and innovative technologies, people need demonstrable facts that are arrived at through reproducible methods and open data. We believe the risks inherent […]
What does your code use, and is it vulnerable? It-depends!
You just cloned a fresh source code repository and want to get a quick sense of its dependencies. Our tool, it-depends, can get you there. We are proud to announce the release of it-depends, an open-source tool for automatic enumeration of dependencies. You simply point it to a source code repository, and it will build […]
Never a dill moment: Exploiting machine learning pickle files
Many machine learning (ML) models are Python pickle files under the hood, and it makes sense. The use of pickling conserves memory, enables start-and-stop model training, and makes trained models portable (and, thereby, shareable). Pickling is easy to implement, is built into Python without requiring additional dependencies, and supports serialization of custom […]
PDF is Broken: a justCTF Challenge
Trail of Bits sponsored the recent justCTF competition, and our engineers helped craft several of the challenges, including D0cker, Go-fs, Pinata, Oracles, and 25519. In this post we’re going to cover another of our challenges, titled PDF is broken, and so is this file. It demonstrates some of the PDF file format’s idiosyncrasies in a […]
Graphtage: A New Semantic Diffing Tool
Graphtage is a command line utility and underlying library for semantically comparing and merging tree-like structures such as JSON, JSON5, XML, HTML, YAML, and TOML files. Its name is a portmanteau of “graph” and “graftage” (i.e., the horticultural practice of joining two trees together so they grow as one). Read on for what Graphtage does differently and better, why we developed it, how it works, and directions for using it as a library.
Two New Tools that Tame the Treachery of Files
Parsing is hard, even when a file format is well specified. But when the specification is ambiguous, it leads to unintended and strange parser and interpreter behaviors that make file formats susceptible to security vulnerabilities. What if we could automatically generate a “safe” subset of any file format, along with an associated, verified parser? That’s […]
Empire Hacking: Ethereum Edition 2
On December 12, over 150 attendees joined a special, half-day Empire Hacking to learn about pitfalls in smart contract security and how to avoid them. Thank you to everyone who came, to our superb speakers, and to BuzzFeed for hosting this meetup at their office. Watch the presentations again It’s hard to find such rich […]
