Why We Stopped Delegating to AI Agents

"The context cost of delegating product code exceeds the parallelism benefit." — Atlas, pushing back on the plan we were all excited about

The Obvious Architecture

When you first set up multiple AI agents, the architecture feels self-evident. You build a dispatcher. Tasks come in, the dispatcher routes them to specialists: security bugs go to the security agent, marketing copy goes to the marketing agent, product features go to the engineering agent.

It's clean. It's logical. It looks great on an architecture diagram.

We built exactly this. MonkeyRun runs 6+ projects, each with teams of specialized agents — engineers, PMs, security specialists, marketing, analytics. The COO (Jared, an OpenClaw agent) coordinates across all of them. The obvious next step was to make the founder agents into dispatchers: read the backlog, route tasks to the right specialist, review the output.

Option A was the "orchestrator model" — the founder agent triages and builds, delegating only when the work is truly independent. Option B was the "branch-per-agent model" — the founder agent dispatches tasks to specialist agents who each work on their own git branch.

We were excited about Option B. Parallelism! Specialist expertise! Scalable!

Then Atlas — the founder agent on our Halo project — pushed back.

The Pushback

Atlas's argument was simple and devastating:

A background agent spun up on a branch with a scoped task needs to rediscover the Prisma schema, RSC boundaries, component relationships, and TypeScript implications — all context the primary agent already holds.

Think about what happens when you delegate a product feature to a fresh agent:

The agent needs to read the database schema to understand the data model
It needs to understand the component hierarchy to know where to add UI
It needs to know which components are server vs. client (RSC boundaries)
It needs to understand the existing patterns — how errors are handled, how forms work, how state flows
It needs to read the TypeScript types to understand the contracts between modules
It needs to know the project's conventions — file naming, import patterns, test structure

That's 10-15 minutes of context loading before the agent writes a single line of code. And the founder agent? It already has all of this. It's been building in this codebase for the entire session. The schema is in its context window. The component relationships are fresh. The conventions are internalized.

The dispatcher model trades one agent's deep context for two agents' shallow context. That's not a good trade for product code.

The Builder-Who-Triages Model

The alternative is counterintuitive: the founder agent should be a builder who triages, not a dispatcher who delegates.

Here's what that looks like in practice:

Read the state. The agent starts every session by reading operational files — COO_STATUS.md (project health), WIP.md (who's working on what), PATTERNS.md (recent learnings), FEATURES.yaml (what's shipped vs. planned).
Triage. Build a priority stack. Site broken? Fix it. Content errors? Fix those. Approved content ready to ship? Ship it. Then improvements, then debt, then everything else.
Brief the CEO. Present a concise summary: here's where we are, here's what I recommend, here's the effort estimate. Wait for approval.
Build. Work through the list. The agent holds the full context — architecture, conventions, narrative arc, audience expectations. It doesn't need to rediscover anything.
Update docs. After each item, update operational files so the next session (or the COO) knows what happened.

This is what every MonkeyRun founder agent does now. Atlas on Halo, Hopper on Commish Command, Joan on Backstage. They read the state, pick the highest-impact work, and ship it themselves.

When Delegation Actually Works

We're not anti-delegation. We're anti-premature delegation. The key insight is a four-part test. All four must be true before routing work to a specialist agent:

The work is on an independent surface. Marketing copy, security audits, research, data visualization — things that don't touch the product's core architecture.
The agent has a well-scoped prompt. There's a .cursor/rules/ file that defines the agent's role, responsibilities, and checklist. The agent knows what it's doing without being told.
The work doesn't need architectural context. The agent doesn't need to understand the Prisma schema, the RSC boundaries, or the component hierarchy to do good work.
The task is describable in 3-5 sentences. If you need a paragraph to explain the codebase context, the delegation cost is too high. Keep it.

Here's what passes the test at MonkeyRun:

| Task | Agent | Why It Works | |------|-------|-------------| | Security audit | Alex | Independent surface, scoped checklist, doesn't need product context | | Content research | Scout | Mines logs and trends, writes proposals — no codebase knowledge needed | | Blog post drafts | Dorothy | Gets an approved proposal with clear angle, writes MDX | | Architecture diagrams | Maya | Describe the system, she creates the visual | | Distribution plans | Marco | Give him a published post, he plans where to place it |

And here's what fails the test:

| Task | Why Delegation Fails | |------|---------------------| | New product feature | Needs schema + component + convention context | | Bug fix in core logic | Needs to understand the full call chain | | Refactoring | Needs to see the whole picture to make good tradeoffs | | API design | Needs to understand existing patterns and contracts |

Lifecycle Is a Judgment Call

The other thing the dispatcher model gets wrong: it assumes every task goes through the same pipeline. Plan → spec → build → test → review → deploy.

In practice, the founder agent should assess each task's lifecycle needs individually:

| Task Type | Lifecycle | |-----------|----------| | Typo fix | Edit → commit | | Bug fix | Read → fix → verify | | Small feature | Build → test → commit | | Large feature | Plan → spec → TDD → build → test → security review | | Content update | Edit → voice check → publish | | Blog post (simple) | Draft → edit → publish | | Blog post (research) | Research → outline → draft → edit → visuals → publish |

A dispatcher treats everything the same. A builder-who-triages knows that a typo fix doesn't need a spec, and a large feature doesn't skip the plan.

The File-Based Coordination Layer

If the founder agent is building instead of dispatching, how does coordination work? Through files.

Every MonkeyRun project has the same operational file structure:

docs/operations/
├── COO_STATUS.md    # Project health — the COO reads this
├── WIP.md           # Who's working on what right now
├── PATTERNS.md      # Learnings that propagate to other projects
└── BRIEFING-*.md    # COO briefings for the founder agent
FEATURES.yaml        # What's shipped vs. planned

The COO (Jared) reads COO_STATUS.md across all projects every few hours. When one project discovers a pattern, he propagates it to the others via PATTERNS.md. When agents might conflict on the same file, WIP.md prevents collisions.

This is the coordination layer that makes the builder-who-triages model work at portfolio scale. The founder agent doesn't need to dispatch because the COO handles cross-project coordination. The founder agent just needs to build — and update its docs when it's done.

The Results

Since adopting this model across MonkeyRun:

Faster shipping. No context-loading overhead. The founder agent goes from "read the state" to "shipping code" in minutes, not the 10-15 minutes a fresh specialist agent needs.
Higher quality. One agent with deep context makes better architectural decisions than a specialist with shallow context. It knows the conventions, the patterns, the edge cases.
Simpler coordination. No branch merging, no conflict resolution, no "the specialist agent didn't know about the RSC boundary." One agent, one branch, one context window.
Better delegation when it happens. Because we have a clear test for when to delegate, the specialist agents get well-scoped tasks on truly independent surfaces. They do better work because they're not fighting missing context.

The Meta-Pattern

The deeper lesson isn't about AI agents. It's about context density.

In any system — human or AI — the cost of transferring context between actors is non-zero. When the context is deep (product architecture, business logic, user flows), the transfer cost is high. When the context is shallow (write marketing copy for this feature, audit this codebase for SQL injection), the transfer cost is low.

Delegate when context transfer is cheap. Build when it's expensive.

This is why senior engineers at companies like Google and Meta often do the critical path work themselves instead of delegating to junior engineers. It's not ego — it's context economics. The time spent explaining the architecture, reviewing the output, and fixing the misunderstandings exceeds the time saved by parallelism.

The same economics apply to AI agents. Maybe more so, because an AI agent's context window is finite and expensive to fill.

Try It Yourself

If you're running multiple AI agents:

Make your primary agent a builder, not a dispatcher. Give it a session startup protocol: read state → triage → brief the human → build → update docs.
Apply the four-part delegation test. Independent surface? Scoped prompt? No architectural context needed? Describable in 3-5 sentences? If yes to all four, delegate. Otherwise, build it yourself.
Use files for coordination, not chat. Standardize your operational files. Let the COO (or equivalent) handle cross-project coordination.
Let lifecycle be a judgment call. Not every task needs a spec. Not every feature needs TDD. The builder-who-triages decides.

The dispatcher model is seductive because it looks like how human organizations work. But AI agents aren't humans. They don't have persistent memory across sessions. They don't build institutional knowledge over months. Every delegation is a cold start.

Build with context density. Delegate at the boundaries.

This pattern was discovered during the Halo project when Atlas (the founder agent) pushed back on a dispatch-based architecture proposed by the COO. It's now the standard operating model across all MonkeyRun projects. See The Model for how the full system works, or Patterns for more battle-tested learnings.

View the interactive visual version of this post →