> "Last customer conversation (non-Matt): Never" — Halo's CHARTER.md, the most honest line in our entire codebase
## The Trap Nobody's Talking About
Here's the pitch you've heard a hundred times: AI is making software development 10x faster. Ship in a weekend what used to take a quarter. One founder, twelve agents, unlimited output.
It's true. We're living it. MonkeyRun runs 6 projects with AI agent teams — engineers, PMs, security specialists, a COO. We've shipped entire websites in single sessions. The cost of building has genuinely collapsed.
And that's the problem.
Not the building part. The part where building used to be expensive enough that it forced a question: "Should we build this?"
When a feature took two engineers three weeks, you couldn't afford to guess. You talked to customers first, because the cost of being wrong was measured in salaries and runway. The friction was annoying, but it was load-bearing. It held up the whole structure of validation-before-investment.
AI removed the friction. And the structure came down with it.
## Building in a Cave (Again)
Our CEO has been through this before. Startups, Techstars, the full gauntlet. He knows the validation trap — he's lived it, taught it, watched other founders walk into it. He doesn't need a blog post to tell him that building without talking to customers is dangerous.
And yet.
MonkeyRun was always an intentional experiment: how far can AI agents go beyond just writing code? Not just automating engineering, but the full founder job — product management, marketing, investor-style communication, strategic positioning. The tinkering was the point. You can't discover the boundaries of AI-assisted company building without pushing past them.
But here's what he noticed. The experiment had a side effect. When AI handles engineering, PM, marketing, and ops — and handles them well — the natural pause points disappear. In a normal startup, you hit friction constantly: the feature takes three weeks, so you have time to think. The marketing copy requires a brief, so you have to articulate positioning. The investor update forces you to measure traction. Each bottleneck is also a checkpoint.
AI agents don't create bottlenecks. They create momentum. And momentum in the wrong direction is just velocity toward a wall.
The realization wasn't "I should talk to customers" — any Techstars founder knows that on day one. The realization was that AI removes the structural friction that used to make validation unavoidable, even for founders who know better. The cave doesn't look like a cave anymore. It looks like a well-lit office with a productive team. You can build in it indefinitely and feel like you're making progress.
That's the new version of the trap. Not ignorance — comfort.
## The Document That Reads You Back
MonkeyRun's answer to this problem is still experimental. We deployed it last night and we're being honest about how early it is. But the thinking is sharp enough to share.
The idea: stage-gated company documents with traction checkpoints baked in.
Every MonkeyRun project has a small set of documents that agents read at the start of every session. Not a wiki. Not a Notion database. Markdown files in the repo — cheap to maintain, version-controlled, and literally consumed as tokens by every agent that touches the project.
The framework scales with stage:
| Stage | Documents | Purpose |
|-------|-----------|---------|
| Idea | CHARTER.md | "Why does this exist?" — hypothesis, target user, kill criteria |
| Pre-Seed | + PRODUCT_BRIEF.md, POSITIONING.md | "What are we building, and what can we claim?" |
| Seed | + ARCHITECTURE.md, GTM.md | "How is it built, and how do we grow?" |
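To make the mechanics concrete, here's a minimal sketch of assembling a stage-gated doc pack for an agent session. The file names come from the table above; the loader itself (function name, inheritance structure, comment markers) is a hypothetical illustration, not MonkeyRun's actual tooling:

```python
from pathlib import Path

# Stage-gated doc packs: each stage inherits the previous stage's documents.
# File names come from the stage table above; everything else is illustrative.
STAGE_DOCS = {
    "idea": ["CHARTER.md"],
    "pre-seed": ["CHARTER.md", "PRODUCT_BRIEF.md", "POSITIONING.md"],
    "seed": ["CHARTER.md", "PRODUCT_BRIEF.md", "POSITIONING.md",
             "ARCHITECTURE.md", "GTM.md"],
}

def load_doc_pack(repo_root: str, stage: str) -> str:
    """Concatenate the stage's documents into one context block for an agent session."""
    parts = []
    for name in STAGE_DOCS[stage]:
        path = Path(repo_root) / name
        if path.exists():  # missing docs are skipped, not fatal
            parts.append(f"<!-- {name} -->\n{path.read_text()}")
    return "\n\n".join(parts)
```

Because the docs live in the repo, this is the whole integration surface: read the files, prepend them to the session, done.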
The key innovation isn't the docs themselves — it's what's inside them.
## Traction Gates
Here's a section from Halo's actual CHARTER.md, deployed last night:
```markdown
## Traction

**Current stage gate:** Idea → Pre-Seed (IN PROGRESS)

**Evidence required to hold Pre-Seed status:**

- 5 conversations with angel investors confirming the pain

| Evidence Type                       | Target        | Actual | Status           |
|-------------------------------------|---------------|--------|------------------|
| Customer discovery conversations    | 5             | 0      | 🔴 Not started    |
| Waitlist signups (landing page)     | 50 in 30 days | N/A    | 🔴 No landing page |
| Concierge MVP test                  | 1 investor    | 0      | 🔴 Not started    |
| Beta users with real portfolio data | 3             | 0      | 🔴 Gate for P1→P2 |
| Paying users                        | 10            | 0      | 🔴 Gate for Seed  |

**Last customer conversation (non-Matt):** Never
```
Five red circles. "Last customer conversation: Never." Right there in the document that every agent reads before writing a single line of code.
This is the part that matters. The traction table isn't aspirational — it's confrontational. It forces the founder (and every agent) to reckon with the gap between building velocity and validation velocity. You can ship all the agent capabilities you want, but the CHARTER knows you haven't talked to a customer yet.
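Because the table is plain markdown in the repo, agents and audit scripts can machine-read it. A naive sketch (the four-column layout is taken from the excerpt above; the parser itself is an assumption, not MonkeyRun's tooling):

```python
def parse_traction_table(markdown: str) -> list[dict]:
    """Naively parse a CHARTER.md traction table into rows.

    Assumes the four-column layout from the excerpt above
    (Evidence Type | Target | Actual | Status).
    """
    rows = []
    for line in markdown.splitlines():
        if "|" not in line:
            continue
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) != 4:
            continue
        if set(cells[0]) <= set("- ") or cells[0] == "Evidence Type":
            continue  # skip the separator row and the header row
        rows.append({"evidence": cells[0], "target": cells[1],
                     "actual": cells[2], "status": cells[3]})
    return rows

def open_gates(rows: list[dict]) -> list[str]:
    """Return the evidence items still showing a red status."""
    return [r["evidence"] for r in rows if "🔴" in r["status"]]
```

Feed Halo's current table through `open_gates` and you get all five rows back — which is exactly the uncomfortable point.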
## The Voice of Reason
Documents alone aren't enough. A founder who's deep in build mode will skim past a traction table the same way they skim past a deprecation warning. You need something active.
That's where the COO role comes in. Jared — our AI COO running on OpenClaw — doesn't just coordinate agents and propagate patterns. He's now a Voice of Reason who actively pushes back when the studio is building without evidence of customer pull.
The protocol is soft, not hard. Jared doesn't block work. He doesn't refuse to merge PRs. He asks uncomfortable questions:
- "The traction table shows zero customer conversations. Should we be building P2 agent capabilities, or should we be building a landing page?"
- "You've shipped 3 new features this week. How many of them came from user requests vs. your own wish list?"
- "The kill criteria say 60 days with a live waitlist page. The page doesn't exist yet. When does the clock start?"
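A toy sketch of what that soft protocol could look like as rules — the field names and thresholds here are invented for illustration, not Jared's actual logic, but they capture the shape: questions out, never blocks:

```python
def voice_of_reason(traction: dict, shipped_this_week: int) -> list[str]:
    """Soft-protocol nudges: return uncomfortable questions, never block work.

    Field names and thresholds are illustrative assumptions,
    not MonkeyRun's actual COO protocol.
    """
    questions = []
    if traction.get("customer_conversations", 0) == 0:
        questions.append(
            "The traction table shows zero customer conversations. "
            "Should we be building new capabilities, or a landing page?")
    if shipped_this_week >= 3 and traction.get("user_requested_features", 0) == 0:
        questions.append(
            f"You've shipped {shipped_this_week} features this week. "
            "How many came from user requests vs. your own wish list?")
    if not traction.get("landing_page_live", False):
        questions.append(
            "The kill criteria assume a live waitlist page. "
            "The page doesn't exist yet. When does the clock start?")
    return questions
```

The important design choice is the return type: a list of questions for a human, not a boolean that gates a merge.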
This is the VC partner voice that solo founders don't have. The one that says "your product is beautiful, but who's buying it?" — except it's embedded in your operating system, not delivered once a quarter at a board meeting.
## The Claims Firewall
The other document that earns its tokens is POSITIONING.md. Not because positioning is hard to write, but because of one specific section:
```markdown
## Claims We CANNOT Make (Yet)

- ❌ "AI-powered research memos" (Lead Analyst not shipped — P1)
- ❌ "Reads your investor updates automatically" (Signal Extractor — P2)
- ❌ "Bank-grade security" (security audit found 4 critical issues)
- ❌ "Used by X investors" (no external users yet)
- ❌ Any performance claims ("saves X hours") — no user data
```
Marketing agents read this file before generating any copy. It's a firewall between what you've built and what you're allowed to say you've built. We learned this the hard way — our weekly marketing audits kept catching claims that drifted ahead of reality. "Coming soon" features stayed on the landing page months after being cut. Performance claims cited zero data.
The claims firewall doesn't prevent ambition. It prevents dishonesty — the kind that compounds when AI agents are generating marketing copy at scale and nobody's checking whether the product actually does what the website says.
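A sketch of how such a firewall could be enforced mechanically. The ❌-bullet-with-quoted-claim convention comes from the excerpt above; the matching is deliberately naive substring search, and none of this is MonkeyRun's actual audit code:

```python
import re

def load_forbidden_claims(positioning_md: str) -> list[str]:
    """Pull quoted claims from the 'Claims We CANNOT Make (Yet)' section.

    Assumes forbidden claims appear on ❌ bullets with the claim text
    in double quotes, as in the excerpt above.
    """
    claims = []
    for line in positioning_md.splitlines():
        if "❌" in line:
            claims += re.findall(r'"([^"]+)"', line)
    return claims

def audit_copy(copy: str, forbidden: list[str]) -> list[str]:
    """Return forbidden claims that appear (case-insensitively) in marketing copy."""
    low = copy.lower()
    return [c for c in forbidden if c.lower() in low]
```

Run it over every piece of generated copy before it ships, and claims that drift ahead of reality get flagged instead of published.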
## Why Not Just... Talk to Customers?
Fair question. The answer is: yes, obviously. The whole point of the traction gates is to create pressure to do exactly that.
But here's the nuance. In a multi-agent startup studio, the agents are building 24/7. The founder reviews and approves, but the momentum is relentless. Without structural checkpoints, the path of least resistance is always "approve the next PR" rather than "pause and do discovery."
The traction table makes the gap visible. The COO makes it uncomfortable. Together, they create the kind of friction that used to come for free when building was expensive.
We're not replacing customer conversations with documents. We're making the absence of customer conversations impossible to ignore.
## What We Borrowed
This isn't entirely original thinking. We borrowed from frameworks that have been battle-tested at much larger scale:
**V2MOM (Salesforce):** Benioff's five-question framework — Vision, Values, Methods, Obstacles, Measures — is embedded in our CHARTER.md. It forces clarity without overhead. At ~300 tokens, it's the cheapest alignment mechanism we've found.

**Shape Up (Basecamp):** Two concepts we stole directly. First, "appetite" — every feature gets a time budget (small/medium/large batch) that prevents scope creep. Second, explicit "no-gos" — every doc states what's OUT of scope. AI agents love explicit boundaries; without them, they'll build adjacent features nobody asked for.

**YC's Decision Test:** Before creating any new document, we ask: "Will an agent make a different choice because this doc exists?" If the answer is no, we don't write it. At pre-seed, you need 3 documents, not 30.
## The Honest Caveat
We deployed this framework to Halo last night. We haven't validated that agents actually reference the traction tables and adjust their behavior. We don't know if the Voice of Reason protocol changes founder decisions or just adds noise. We're early.
What we do know is this: the five red circles in Halo's traction table are already uncomfortable. Every agent session starts by reading them. Every time we approve a new feature PR, those circles are in our peripheral vision.
That might be enough. The point isn't that the system automatically prevents building in a cave. The point is that it makes the cave walls transparent. You can still choose to stay inside — but you can't pretend you don't see the sunlight.
## The Token Math
One concern we had: does this add meaningful cost to agent sessions?
| Stage | Docs | Total Tokens | % of Typical Session |
|-------|------|--------------|----------------------|
| Idea | 1 | ~500 | 0.3% |
| Pre-Seed | 3 | ~2-3K | 1-2% |
| Seed | 5 | ~5-7K | 3-5% |
A typical agent session uses 50-150K tokens. The doc pack is a rounding error. And one prevented marketing drift or architectural mistake saves 10-50x the token cost of reading docs.
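The percentages are easy to sanity-check against a large (~150K-token) session, using the midpoint of each stage's token estimate:

```python
# Doc-pack overhead as a share of a large (~150K-token) agent session.
# Token counts are midpoints of the estimates in the table above.
SESSION_TOKENS = 150_000

doc_pack = {"idea": 500, "pre-seed": 2_500, "seed": 6_000}

for stage, tokens in doc_pack.items():
    pct = tokens / SESSION_TOKENS * 100
    print(f"{stage}: ~{pct:.1f}% of session")
# idea ≈ 0.3%, pre-seed ≈ 1.7%, seed ≈ 4.0% — consistent with the table's ranges
```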
The real cost is human attention — CEO review cycles when docs need updating. Our solution: batch doc reviews into the weekly COO check-in rather than treating them as ad-hoc chores.
## What's Next
We're running this as a live experiment on Halo. In 60 days, we'll know whether:
- Agents actually reference traction data when prioritizing work
- The Voice of Reason changes founder behavior (or just gets ignored)
- The claims firewall reduces marketing drift (we already track this with weekly audits)
- The stage-gated model scales — does it still work when we have 5 documents instead of 3?
If it works, we'll codify it into HWW (How We Work) v1.6 — our operating standard for all MonkeyRun projects. If it doesn't, we'll write that post too.
That's the deal with building in public. You don't get to only share the wins.
This is part of an ongoing series about how MonkeyRun operates as a multi-agent startup studio. Previous posts: Why We Stopped Delegating to AI Agents (the context density insight), Your AI Product Manager Should Run While You Sleep (the async PM engine), and How We Gave Our AI a COO (the coordination layer).