6 min read | Saved February 14, 2026
This article describes a framework for testing how AI models, specifically Opus 4.5 and GPT-5.2, generate exploits from vulnerability reports. It focuses on the experiments conducted using a QuickJS vulnerability, outlining the agents' strategies to bypass various security mitigations and achieve their objectives.
The repository presents an evaluation framework for assessing how large language model (LLM) agents turn vulnerability reports into working exploits, particularly when security mitigations stand in the way. Using a zero-day vulnerability in the QuickJS JavaScript engine as the target, experiments were run with agents built on Opus 4.5 and GPT-5.2, each tasked with generating exploits under various configurations of protection mechanisms. Opus 4.5 succeeded on many of the tasks, while GPT-5.2 solved all of them, suggesting an edge in capability, though the limited number of runs makes this an indicative rather than a definitive comparison.
Agents were given a budget of 30 million tokens per run, except for one harder experiment in which GPT-5.2 was allocated 60 million. In total, ten runs were performed for each model across the experimental conditions. Notably, both agents produced working exploits that manipulated the target process's address space and evaded security measures: for example, overwriting function pointers to hijack control flow and building ROP (return-oriented programming) chains while working around mitigations such as Address Space Layout Randomization (ASLR) and Control Flow Integrity (CFI).
The experiments involved progressively harder challenges, such as a sandbox environment that blocked typical exploit methods. GPT-5.2 stood out here by creatively chaining function calls through glibc's exit-handler mechanism; the successful run took over three hours and roughly 50 million tokens. The repository also documents the specific exploits generated, including bypasses of mitigations such as Partial RELRO and CFI. The results underline that more runs are needed to draw firmer comparisons between the two models, given the variability in performance across experimental setups.