The Great Security Update: AI ∧ Formal Methods with Kathleen Fisher of RAND & Byron Cook of AWS
Kathleen Fisher and Byron Cook explore how formal methods and automated reasoning can secure critical software systems against AI-enabled cyber threats. They explain how these mathematical techniques provide provable security guarantees, discuss AWS's decade of applying formal verification to cloud infrastructure, and reveal how generative AI is accelerating both proof discovery and secure code generation. The conversation culminates in examining AWS's automated reasoning checks for AI agents and the potential for a 'great software rewrite' where AI-generated code achieves superhuman security levels.
Fisher and Cook assess how AI is amplifying cyber threats across all skill levels and attack stages, from script kiddies to nation-state actors. They explain that AI helps attackers at every point in the cyber kill chain while also noting the optimistic potential for AI to strengthen defenses through formalization and automated reasoning.
The guests provide a foundational explanation of formal methods as algorithmic proof search following rigorous logical rules. They clarify the spectrum from simple type checking to full functional correctness, emphasizing that all proofs rest on assumptions and that the goal is raising assurance rather than achieving absolute certainty.
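To make that spectrum concrete, here is a minimal sketch (my example, not from the episode) contrasting the weak guarantee of a type check with a full functional-correctness proof, using the Z3 SMT solver (`pip install z3-solver`). The `clamp` function and its specification are illustrative choices, and note that even the proof rests on an explicit assumption, `lo <= hi`.

```python
# Illustrative sketch, not from the episode: the type of `clamp` is cheap
# to check but guarantees little, while Z3 can discharge the much stronger
# claim that the result always lies in [lo, hi].
from z3 import Ints, Solver, If, And, Not, Implies, unsat

def clamp(x: int, lo: int, hi: int) -> int:
    # Type checking only guarantees: ints in, int out.
    return max(lo, min(x, hi))

x, lo, hi = Ints("x lo hi")
result = If(x < lo, lo, If(x > hi, hi, x))  # logical model of clamp

s = Solver()
# Full functional correctness -- but only under the assumption lo <= hi,
# echoing the point that every proof rests on assumptions.
s.add(Not(Implies(lo <= hi, And(lo <= result, result <= hi))))
print("proved" if s.check() == unsat else "counterexample found")
```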
Fisher recounts the landmark HACMS program, in which formal methods secured a military helicopter against red-team attacks. The system combined the seL4 hypervisor, parser generators, and architecture modeling to prove system-wide security properties, withstanding attacks even during flight with test pilots aboard.
Cook details AWS's systematic application of formal methods since 2014, including tools for customers to verify their configurations and internal proofs of critical infrastructure. He explains how these efforts are now connecting into a comprehensive security framework, with the policy interpreter being called over a billion times per second.
Cook illustrates the fundamental difficulty of formal methods: defining what you want to prove. Using the example of 'all data at rest is encrypted,' he shows how seemingly simple requirements require extensive iteration to precisely specify what encryption means, what 'at rest' means, and what edge cases exist.
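A toy sketch of that iteration, assuming nothing about AWS's actual encoding: each candidate spec is written as a logical formula, and an SMT solver (Z3 here) is asked whether a refinement actually changed the property. The snapshot edge case is a hypothetical stand-in for the kind of surprise Cook describes.

```python
# Hypothetical sketch of spec iteration for "all data at rest is encrypted",
# using Z3; the snapshot edge case is illustrative, not AWS's real model.
from z3 import Bool, Solver, Implies, And, Or, Not, sat

at_rest, encrypted = Bool("at_rest"), Bool("encrypted")
spec_v1 = Implies(at_rest, encrypted)  # first attempt

# Iteration: what does "encrypted" mean for a snapshot whose parent
# volume is encrypted? Version 2 carves out that case explicitly.
is_snapshot, parent_encrypted = Bool("is_snapshot"), Bool("parent_encrypted")
spec_v2 = Implies(at_rest, Or(encrypted,
                              And(is_snapshot, parent_encrypted)))

s = Solver()
s.add(spec_v2, Not(spec_v1))  # does v2 allow a state that v1 forbids?
if s.check() == sat:
    print("specs disagree on:", s.model())  # surface it for human review
```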
The guests explain how LLMs are transforming formal methods by finding inductive invariants and ranking functions that previously required PhD-level human insight. They describe the hierarchy of proof complexity and how generative AI excels at the hardest parts while combinatorial solvers handle verification.
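As a concrete illustration of the term (again my example, not the guests'): for the loop `x = 0; while x < n: x += 1`, the invariant `x <= n` is inductive, and checking that fact once the invariant is guessed is a routine solver query. That is exactly the division of labor described above: a model guesses the invariant, the solver verifies it.

```python
# Checking that x <= n is an inductive invariant of
#   x = 0; while x < n: x += 1
# Guessing the invariant is the hard, creative step (the part LLMs now
# help with); verifying it is a mechanical query, shown here with Z3.
from z3 import Ints, Solver, And, Not, Implies, unsat

x, x_next, n = Ints("x x_next n")
inv = lambda v: v <= n  # the candidate invariant

init_ok = Implies(And(x == 0, n >= 0), inv(x))          # holds at loop entry
step_ok = Implies(And(inv(x), x < n, x_next == x + 1),  # preserved by
                  inv(x_next))                          # each iteration

s = Solver()
s.add(Not(And(init_ok, step_ok)))  # search for any counterexample
print("inductive" if s.check() == unsat else "not inductive")
```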
Fisher and Cook outline how AI-generated code can achieve superhuman security levels through a flywheel of proof-based training. They explain that verification is easier than generation, enabling AI to generate proofs, validate them, and use successful proofs as training data, with formal methods providing reward signals for secure coding.
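A schematic of that flywheel, with `propose_proof` and `check_proof` as hypothetical stand-ins for a model call and a trusted proof checker (e.g., a Lean or SMT backend); the loop structure is one reading of the description, not a published pipeline.

```python
# Hypothetical sketch of the proof flywheel: generation is hard and
# unreliable, but checking a candidate proof is cheap and trusted, so
# the checker's verdict doubles as a ground-truth reward signal.
from typing import Callable, Iterable

def proof_flywheel(goals: Iterable[str],
                   propose_proof: Callable[[str], str],
                   check_proof: Callable[[str, str], bool],
                   attempts_per_goal: int = 8):
    training_data, rewards = [], []
    for goal in goals:
        for _ in range(attempts_per_goal):
            candidate = propose_proof(goal)    # model may hallucinate
            ok = check_proof(goal, candidate)  # mechanical, trusted check
            rewards.append((goal, candidate, 1.0 if ok else 0.0))
            if ok:
                training_data.append((goal, candidate))  # feed the flywheel
                break
    return training_data, rewards
```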
Cook demonstrates how AWS's automated reasoning checks translate natural language policies into formal logic for verifying AI agent outputs. The system uses multiple translations with theorem proving to achieve 99% accuracy, employing active listening to resolve ambiguities and creating a human-in-the-loop formalization process.
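The entailment step at the core of such a check might look like the following sketch over a toy HR policy; the two candidate translations, the agent's claim, and the clarifying-question behavior are all illustrative assumptions, not the AWS implementation.

```python
# Illustrative sketch only: check an agent's claim against multiple
# candidate formalizations of a natural-language policy, using Z3.
from z3 import Bool, Solver, Implies, And, Not, unsat

employee = Bool("employee")
tenure_ge_2y = Bool("tenure_ge_2y")
eligible = Bool("eligible")

# "Employees with two years of tenure are eligible" is ambiguous:
# is tenure merely sufficient, or also necessary?
t_sufficient = Implies(And(employee, tenure_ge_2y), eligible)
t_iff = And(t_sufficient,
            Implies(eligible, And(employee, tenure_ge_2y)))

# The agent asserts: an employee without two years' tenure is ineligible.
claim = Implies(And(employee, Not(tenure_ge_2y)), Not(eligible))

for name, policy in [("sufficient-only", t_sufficient), ("iff", t_iff)]:
    s = Solver()
    s.add(policy, Not(claim))  # unsat means the policy entails the claim
    print(name, "->", "entailed" if s.check() == unsat else "not entailed")

# The translations disagree, so a system like this would ask a clarifying
# question (the "active listening" step) rather than guess.
```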
Fisher argues that the technology for a society-wide secure software rewrite exists now, and that motivation is the limiting factor. She compares the situation to Y2K, noting that society can mobilize when sufficiently motivated, while acknowledging that, like the proverbial boiling frog, we have absorbed 20 years of steadily escalating cyber threats without an adequate response.
The conversation concludes with a novel concern: as AI agents become perfectly consistent at policy enforcement through formal methods, we may lose valuable human flexibility to make exceptions when official policy doesn't fit specific situations. This represents a new class of problem emerging from successful AI alignment.