Buckle up, Buttercup, AIxCC’s scored round is underway!

The one and only scored round of DARPA’s AI Cyber Challenge (AIxCC) Finals Competition has officially started! Our CRS (Cyber Reasoning System), Buttercup, is now competing against six other teams to see which autonomous AI-driven system can find and patch the most software vulnerabilities. It’s been a long road to this point, and we’re excited to see the results of our hard work over the last two years building Buttercup.

After the scored round closes, DARPA and ARPA-H will announce the winners on the main DEF CON 33 stage on August 8. The top-scoring CRS will receive a $4 million prize, with the two runners-up receiving $3 million and $1.5 million. Our team will be there to watch the final reveal live and will also be involved in the larger AIxCC experience in various ways. If you're planning to come to DEF CON this August, please come see us at our booth in the AIxCC Experience and attend our talk on the AIxCC stage (date/time TBD) about the ups and downs of building Buttercup and competing in AIxCC.

What’s happening in the scored round?

Each competing CRS will be tasked with finding and patching multiple vulnerabilities in dozens of different real-world, open-source programs. These programs are chosen from the most heavily used C and Java open-source projects, and the vulnerabilities they contain are often actual historic vulnerabilities that have been strategically re-injected by the competition organizers. SQLite, Nginx, Apache Tika, Jenkins, and even the Linux kernel are among the programs used in prior rounds.

Each CRS will be tasked with waves of distinct challenges based on these open-source programs. Each challenge comes equipped with OSS-Fuzz-compatible fuzzing harnesses and, in many cases, a set of functional tests. A CRS can score points by:

  1. Proving that a vulnerability exists in the program by finding an input that crashes the program or triggers a sanitizer at runtime
  2. Fixing a vulnerability in the program with a patch that addresses the root cause of the vulnerability and does not break functional tests
  3. Classifying a static analysis alert highlighting a possible vulnerability as a true- or false-positive
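To make the first scoring task concrete, here is a minimal sketch of what an OSS-Fuzz-compatible fuzzing harness looks like. Everything here is illustrative: `parse_record` is a hypothetical vulnerable function, not code from any actual challenge. A fuzzer driving `LLVMFuzzerTestOneInput` with mutated inputs would eventually hit the oversized `memcpy`, and a sanitizer such as AddressSanitizer would flag the overflow at runtime, proving the vulnerability exists.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical vulnerable function (illustrative only): copies the
   input into a fixed-size buffer without a bounds check. */
static void parse_record(const uint8_t *data, size_t size) {
    char buf[16];
    if (size > 0 && data[0] == 'R') {
        memcpy(buf, data, size);  /* overflows buf when size > 16 */
        (void)buf;
    }
}

/* OSS-Fuzz-compatible entry point: libFuzzer calls this repeatedly
   with mutated inputs; a sanitizer catches the memory error. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_record(data, size);
    return 0;
}
```

A scoring patch for this bug would reject or truncate inputs longer than the buffer at the root cause (the unchecked `memcpy`), rather than merely suppressing the crash, and would still have to pass the challenge's functional tests.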

To accomplish this, each CRS has been given a sizable compute and third-party AI budget. The scale of AIxCC’s scored round is massive, and for good reason. The CRS that wins this competition will prove that it can immediately scale to the challenge of securing the vast open-source software ecosystem.

What’s next for our team?

While Buttercup is competing and we await the announcement of the winning teams, we’re still hard at work making Buttercup even better! In the coming month, we will be preparing Buttercup to be released as open-source software, which we expect to make available in August. We’re also working on building a version of Buttercup that can be run on commodity hardware so everyone can try it out!

Also, once the competition is over, we can finally share technical details on how Buttercup works. Stay tuned for technical deep dives on how Buttercup uses AI to accelerate traditional fuzz testing and create high-quality patches for vulnerabilities!

For more background, see our previous posts on the AIxCC.