Escaping Isla Nublar: Coming around to LLMs for Formal Methods

6 min read | Saved February 14, 2026

llms, formal-methods, memory-safety, c-code, translation

Do you care about this?

The article describes how the author changed their mind about using large language models (LLMs) in formal methods while developing CNnotator, a tool that generates memory safety annotations for C code. It highlights the potential of LLMs to help translate code from memory-unsafe languages such as C into memory-safe languages such as Rust.

If you do, here's more

The author reflects on their initial skepticism about using large language models (LLMs) in formal methods, comparing it to Ian Malcolm's critique of scientists in Jurassic Park. During a summer internship at Galois, they developed CNnotator, a tool that uses LLMs to generate memory safety annotations for C code. To their surprise, CNnotator performed better than expected, successfully creating annotations and revealing that even older LLMs could yield impressive results.

Memory safety issues in languages like C and C++ account for a significant portion of security vulnerabilities, with estimates suggesting that roughly 70% of serious security bugs in projects like Chromium are memory safety problems. Modern languages like Java and Rust have built-in protections against such issues, which motivates translating legacy code into them. CNnotator synthesizes annotations in CN, a specification language for C, which help ensure that the code adheres to memory safety rules similar to those enforced by safe Rust. The tool follows a straightforward iterative process: it focuses on a function, generates an annotation using an LLM, tests the annotation, and then refines it as needed.
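The generate, test, refine loop described above can be sketched roughly as follows. This is a minimal illustration and not CNnotator's actual code: the names `annotate_function`, `CheckResult`, `fake_generate`, and `fake_check` are hypothetical, the LLM call and the annotation checker are replaced by stand-in functions, and the retry limit is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CheckResult:
    """Outcome of testing one candidate annotation (hypothetical type)."""
    ok: bool
    feedback: str = ""


def annotate_function(
    c_function: str,
    generate: Callable[[str, Optional[str]], str],
    check: Callable[[str, str], CheckResult],
    max_attempts: int = 3,  # assumed limit; the article does not state one
) -> Optional[str]:
    """Generate, test, and refine an annotation for a single C function."""
    feedback: Optional[str] = None
    for _ in range(max_attempts):
        # Ask the generator (an LLM in the real tool) for a candidate,
        # passing along feedback from the previous failed attempt, if any.
        annotation = generate(c_function, feedback)
        result = check(c_function, annotation)
        if result.ok:
            return annotation
        feedback = result.feedback
    return None  # no candidate passed the check within the attempt limit


# Stand-ins so the sketch runs without an LLM or a real checker.
def fake_generate(c_function: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return "/* first guess */"
    return "/* revised after: " + feedback + " */"


def fake_check(c_function: str, annotation: str) -> CheckResult:
    if "revised" in annotation:
        return CheckResult(ok=True)
    return CheckResult(ok=False, feedback="missing ownership of p")


source = "int read(int *p) { return *p; }"
result = annotate_function(source, fake_generate, fake_check)
gave_up = annotate_function(source, fake_generate, fake_check, max_attempts=1)
```

With these stand-ins the first candidate is rejected and the second, produced with the checker's feedback, is accepted; with a single allowed attempt the loop gives up and returns `None`. Passing the generator and checker in as callables keeps the loop independent of any particular model or testing tool.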

CNnotator aims to differentiate between safe and unsafe C functions: for inherently unsafe code it inserts a comment instead of an annotation and explains its reasoning to the user. Testing showed that CNnotator handles a variety of memory usage patterns and can annotate complex functions. The tool was evaluated with multiple LLMs, including OpenAI's reasoning model o3, which annotated 90% of the test functions on the first attempt. The chat model GPT-4o also performed well, annotating 65% of functions on the first attempt. Overall, CNnotator is a promising step toward automating the translation of C code into safe Rust, since the annotations it generates capture the memory safety rules that a translation must respect.