AI Agents Are Rewriting the Rules of Code Security
Claude found 22 Firefox bugs in two weeks. OpenAI's Codex cuts false positives by 50%. The era of autonomous security review is here, and it's moving faster than anyone expected.
Software security just got its biggest upgrade in years — and it didn't come from a traditional cybersecurity firm. This week, two of the world's leading AI labs unveiled agentic systems that can hunt and patch vulnerabilities faster than any human team, sparking a new era in automated code defense.
Claude + Mozilla: 22 Bugs in 14 Days
Anthropic partnered with Mozilla to put Claude Opus 4.6 through a grueling real-world test: find novel vulnerabilities in Firefox — one of the most rigorously audited browsers on the planet. The results were startling. Claude found 22 vulnerabilities in just two weeks, with 14 classified as high severity — nearly a fifth of all high-severity Firefox bugs remediated across all of 2025.
Within twenty minutes of initial exploration, Claude flagged a use-after-free vulnerability in Firefox's JavaScript engine. By the time researchers had validated the first report, the AI had already surfaced fifty more unique crashing inputs. Mozilla shipped patches to hundreds of millions of users in Firefox 148.
OpenAI Codex Security: Smarter Triage at Scale
On the same day, OpenAI launched Codex Security into research preview. The platform analyzes code repositories, pressure-tests suspected vulnerabilities in sandboxed environments, generates proof-of-concept exploits to confirm impact, and proposes fixes — all autonomously.
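The loop described above (scan, validate in a sandbox, confirm with a proof of concept, propose a fix) can be sketched in a few lines. This is a hypothetical illustration, not OpenAI's actual API: every name below is invented, the "scanner" is a trivial pattern match, and the "sandbox" step is a stand-in for real isolated exploit testing.

```python
# Hypothetical sketch of an agentic triage pipeline. Not OpenAI's real API;
# all names and logic here are illustrative stand-ins.
from dataclasses import dataclass


@dataclass
class Finding:
    file: str
    line: int
    kind: str
    confirmed: bool = False
    suggested_fix: str = ""


def scan(source: dict[str, str]) -> list[Finding]:
    """Toy scanner: flag an obviously dangerous sink (eval on input)."""
    findings = []
    for path, code in source.items():
        for i, ln in enumerate(code.splitlines(), start=1):
            if "eval(" in ln:
                findings.append(Finding(path, i, "code-injection"))
    return findings


def validate_in_sandbox(finding: Finding, source: dict[str, str]) -> bool:
    """Stand-in for sandboxed exploit testing: here we only recheck the
    flagged line. A real system would run a proof-of-concept input in an
    isolated environment and observe whether it actually triggers the bug."""
    line = source[finding.file].splitlines()[finding.line - 1]
    return "eval(" in line


def propose_fix(finding: Finding) -> str:
    """Stand-in for a model-generated patch suggestion."""
    return "replace eval() with ast.literal_eval() or explicit parsing"


def triage(source: dict[str, str]) -> list[Finding]:
    """Scan, then keep only findings that survive validation.
    The validation gate is what cuts false positives before a human looks."""
    confirmed = []
    for f in scan(source):
        if validate_in_sandbox(f, source):
            f.confirmed = True
            f.suggested_fix = propose_fix(f)
            confirmed.append(f)
    return confirmed
```

The design point is the middle step: reporting only findings that survive an independent validation pass is what drives down false-positive noise, at the cost of extra compute per finding.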
One customer saw false positives slashed by more than 50% and noise cut by 84% since initial rollout. Codex Security also found 14 published CVEs across real open-source projects, scanning over 1.2 million commits in the process.
Head to Head: Claude Code vs. Codex
| Capability | Claude Code Security | Codex Security |
|---|---|---|
| IDOR Detection | 22% TPR ✓ | 0% TPR ✗ |
| Path Traversal | 16% TPR | 47% TPR ✓ |
| Validation Method | Multi-stage self-verify | Sandboxed exploit test |
| Real Bugs Found | 500+ zero-days | 14 published CVEs |
| False Positive Rate | Moderate | Low (cut by >50%) |
Across real open-source Python web apps, Claude Code found 46 vulnerabilities and Codex reported 21, with about 20 rated high severity across both tools. The takeaway: the models complement each other (note Claude's edge in true positive rate on IDOR versus Codex's on path traversal), and neither is a silver bullet.
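The two vulnerability classes in the table are easy to show concretely. The handlers below are simplified illustrations (no real web framework, invented data and paths), each showing the unsafe pattern a scanner hunts for next to its common fix.

```python
# Illustrative only: minimal path-traversal and IDOR examples, with fixes.
from pathlib import Path

BASE_DIR = Path("/srv/app/uploads")  # hypothetical upload directory

# Path traversal: the unsafe version joins user input directly, so a
# request for "../../etc/passwd" escapes the upload directory.
def read_path_unsafe(user_path: str) -> Path:
    return BASE_DIR / user_path  # vulnerable: no containment check

def read_path_safe(user_path: str) -> Path:
    target = (BASE_DIR / user_path).resolve()
    if not target.is_relative_to(BASE_DIR):  # Python 3.9+
        raise PermissionError("path escapes base directory")
    return target

# IDOR (insecure direct object reference): the unsafe version trusts the
# requested id without checking that the caller owns the record.
DOCUMENTS = {
    1: {"owner": "alice", "body": "alice's secret"},
    2: {"owner": "bob", "body": "bob's notes"},
}

def get_document_unsafe(doc_id: int, current_user: str) -> str:
    return DOCUMENTS[doc_id]["body"]  # vulnerable: no ownership check

def get_document_safe(doc_id: int, current_user: str) -> str:
    doc = DOCUMENTS[doc_id]
    if doc["owner"] != current_user:
        raise PermissionError("not your document")
    return doc["body"]
```

Path traversal is a local, syntactic property of one handler, which is one reason scanners disagree less on it; IDOR requires reasoning about an application-wide authorization model, which helps explain the wide spread in detection rates above.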
What This Means for You
The market felt the disruption instantly. Cybersecurity stocks sold off broadly — CrowdStrike and Zscaler each dropped an additional 10%, while pure-play code scanning vendors were hit hardest. AI labs are no longer just building software — they're guarding it too.
For developers, the message is clear: AI security agents are here now, they find real bugs, and the window in which defenders can discover vulnerabilities faster than attackers can exploit them won't stay open indefinitely. Update early, patch often, and watch this space. We shouldn't see AI as a threat; we should grow alongside it.