
History and Inspirations

Andi Lamprecht · 15 min read · Draft

This is the narrative of how Metis came to exist in its current shape, told in the voice of the person who drove most of the early decisions. It is a personal account for most of its length, and it transitions to a collective voice at the end — because the project has outgrown the point at which any one person can steward it alone, and the rest of this document set is written for the people who now share that responsibility.

The Original Ambition — ADS

Before Metis was Metis, and before it was a fork of Archon, there was ADS: the Autonomous Development System. ADS was my attempt to describe, in product terms, what governed human-AI software delivery would actually look like if someone built it properly.

I wanted a platform where a team could start with an idea, refine it into a PRD, let the platform break that down into System-Level Requirements, generate implementation plans, and take each plan through a sequence of agent-authored and human-approved stages until it reached a release. I wanted eight named human gates — PRD Approval, Architecture Review, Implementation Plan Approval, Complex PR Review, Memory Promotion, Escalation Resolution, Production Release, and Design Review — so that the places where a human had to decide something were explicit, visible, and instrumented. I wanted a set of specialized agents for the roles along the pipeline, each with its own tier, credentials, and scope — a Supervisor, an Engineering Lead, a QA agent, a Story Writer, a Technical Writer, a Compliance Reporting agent, a Memory Steward, a Conflict Resolution agent, and a handful of others — so that the system’s behavior in any given stage was understood as a named actor with a known job, not an undifferentiated “the AI did it.”

I wanted four core principles to sit underneath all of this: pipeline integrity is non-negotiable, humans govern contracts and not implementation, the system learns from failure, and speed is a function of trust. I wanted trust to be earned, measured, and progressively extended — Mode 1 where agents work the bench with full human supervision, then Mode 2 through Mode 4 as automation earned its way forward. I wanted a memory system with three tiers, a knowledge graph, and a requirements hierarchy with real trace links so that any artifact could be followed back to the intent that produced it.

ADS was — and is — a work in progress. The artifacts I wrote for it captured a vision of what the end state could look like, not a contract for what anyone had committed to build. They are a reference, not a blueprint. Much of what follows is the story of how I tried to get to that end state without starting from zero.

The Pragmatic Turn

The thing about writing a detailed product vision for something the size of ADS is that once it exists on paper, the work of building it from scratch is clearly, obviously, a multi-year endeavor with a sizable team. I did not have a multi-year runway and a sizable team. I had a conviction, a set of design artifacts, and a growing sense that the industry was producing open-source substrate for this kind of platform faster than I could build my own.

So I started looking. I was willing to give up the privilege of a clean-sheet architecture if it meant getting to something usable inside a realistic horizon. The question I kept asking was: what is the shortest path from where I am now to a working harness for human-AI software delivery at our organization?

The answer I arrived at was: take the closest-fitting open-source platform, fork it, and bend it to the shape of the vision.

Archon as Baseline

The closest-fitting platform I found was Archon. Archon was a single-developer local tool for orchestrating AI coding workflows. It had a real workflow engine with YAML-defined DAGs of prompts, commands, bash steps, approvals, and loops. It had a provider abstraction for different AI agent runtimes. It had isolation via git worktrees. It had a web UI. It had a plausible data model for conversations, runs, and events. It had the bones of what I needed.
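To make that concrete, here is a hypothetical sketch of what a workflow DAG in that style could look like. The field names and step types are illustrative only, not Archon's actual schema:

```yaml
# Hypothetical workflow DAG sketch — illustrative only, not Archon's real schema.
name: feature-delivery
steps:
  - id: plan
    type: prompt              # agent-authored step
    prompt: "Draft an implementation plan for {{feature}}."
  - id: plan-approval
    type: approval            # human gate: run pauses until a person approves
    needs: [plan]
  - id: implement
    type: command
    command: implement-plan
    needs: [plan-approval]
  - id: verify
    type: bash                # shell step, run inside an isolated worktree
    run: bun test
    needs: [implement]
```

The point of the shape is that prompts, human approvals, and shell commands are peers in one declarative graph, which is what made the engine worth keeping.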

What it lacked was every dimension that turns a personal tool into a team platform. It was single-user by design. It ran on your laptop. It had no notion of identity, per-user ownership, team-shared resources, or hosted deployment. Its authentication story was “whoever is at the keyboard.” Its multi-user story was nonexistent. Its deployment story was bun run dev.

For a one-person tool it was excellent. For what I wanted, it was the starting point, not the destination. The two choices in front of me were (a) adopt Archon as-is and tolerate the personal-tool constraints until the pain of not-being-a-team-platform broke it open, or (b) fork it, treat the fork as terminal, and rebuild the operational parts around the same engine.

I chose (b). I chose it because I did not want to keep negotiating with someone else’s roadmap about decisions I needed to make immediately, and because the operational changes I needed — identity, ownership, BYOK (bring-your-own-key), hosted execution, managed infrastructure — were not minor surface edits. They were a whole-platform rewiring that would have produced a permanently diverged downstream fork whether I called it that or not. Better to call it what it was.

Metis as Hard Fork

The fork happened at Archon’s dev commit 3dedc225. I removed the upstream remote. I renamed everything. I changed the default branch convention. From that point forward, Metis is Metis — a distinct product that happens to share code heritage with Archon.

The early work on Metis is an operational story more than a product story. Identity was the first real battle: I wanted the platform accessible to every member of our GitHub organization via a single shared instance, with GitHub OAuth as the login and org membership as the auto-admit. That meant stateful DB-backed sessions, a bootstrap-admin rule, role management, and a deactivation path — none of which Archon had. Ownership and visibility were the second battle: every codebase, conversation, workflow, command, script, skill, and MCP configuration needed an owner, a visibility stance, and a shared query pattern that kept every reader from seeing every other user’s private work by default. The third battle was resources: the workflows, commands, scripts, skills, and MCP configurations that had lived as YAML files in Archon needed to become first-class, team-shared, version-history-having database objects so that people could collaborate on them in a web UI, not by handing each other files.

Then came BYOK — letting each user supply their own Anthropic and OpenAI keys while still having an enterprise fallback — because a hosted team tool that pools everyone’s AI spend onto one admin’s credit card is a broken model. Then came the execution split: moving workflow runs out of the web process and into a queue-backed worker topology, so that a long-running run could not tie up the web tier and so that workers could scale independently in the production deployment. Then came the GitHub App integration: replacing local-path and URL-based codebase registration with a proper org-installed App and a repo-picker flow.

At each step, the platform shipped something real — a migration, a set of routes, a UI surface — and then the next step built on it. The details are in 03-current-state.md. The point for this document is that nine phases of work have taken Metis from “Archon with a user table bolted on” to a functioning hosted team platform, and that body of work is the substrate the vision is now going to be built on.

Inspirations

The shape of what Metis should become is not something I derived in isolation. Four external influences pushed the design in specific directions, and one experiment I ran myself validated a piece of the direction that I would otherwise have had to take on faith.

obra/superpowers

Early in the Metis work, I started using obra/superpowers as a framework for how I worked with Claude inside the repo. Superpowers organizes itself around “skills” — composable, mandatory processes that activate when certain conditions are met. It enforces red-green-refactor discipline through a test-driven-development skill. It enforces YAGNI through a brainstorming skill that will not allow implementation to start until a design has been presented and approved. It enforces verification-before-completion through a skill that refuses to let you claim work is done without evidence. It has hard gates, not suggestions; the skills are a process the agent cannot rationalize its way out of.

What surprised me, working with superpowers day to day, was the material difference it made. I already knew these principles — TDD, YAGNI, evidence-based verification. I had written them into rules files before. What I did not expect was how much better the outcomes got when the agent’s process was a program the agent was forced to execute, not a document the agent was expected to remember. The difference between “the rule is written down” and “the harness will not let you proceed until the rule is satisfied” is the difference between guidance and enforcement. Superpowers convinced me that the governance parts of ADS — the gates, the evidence requirements, the pipeline integrity — would not work as prose in a wiki. They had to be hard walls that the platform built and maintained.

Superpowers also introduced me to the pattern of skill-as-reusable-unit. A skill is a small, focused thing — brainstorming, writing plans, executing plans, debugging, committing — that any session can invoke when it fits. The analog for Metis is clear: skills, commands, MCP configurations, and named scripts need to be first-class organizational assets, centrally managed, versioned, and injected into the right workers at the right time. Every project should inherit a baseline of organizational skills and be able to extend them without losing the baseline.

Taskmaster-AI

I used taskmaster-ai (originally just “taskmaster”) long before most of the agentic coding tooling that is common now had appeared. I used it with Cursor to take a feature, break it down into an implementation plan, and manage that plan as a tree of increasingly smaller work units. It was a way to keep track of what needed to be done and in what order, and it was a way to impose the discipline of “break the work down until each piece is actually achievable” on projects where my natural tendency was to start coding the biggest piece first.

What taskmaster-ai taught me was twofold. First, the breakdown is as important as the implementation — a project’s velocity is set not by how fast you can type, but by how clearly you can decompose. Second, a roadmap is a living thing: the act of tracking items, reorganizing them, marking them done, and surfacing the next one is itself a significant productivity multiplier, not a bureaucratic overhead. Even when I was not formally using taskmaster-ai, the principles carried into how I worked alongside superpowers.

The limitation of taskmaster-ai — and of the individual-tool pattern in general — was that it was built for one person working alone. There was no collaborative surface for product people to shape requirements, no gate for a stakeholder to approve the breakdown before execution started, no visibility across projects for leadership to see what was in flight. The things ADS called out as necessary for a team platform — human-AI collaboration on requirements, implementation plans, ADRs, cross-project observability — were missing from the individual-contributor tools, because they were by design individual-contributor tools.

Hamster

Hamster — and specifically the Hamster Studio documentation — came into my field of view after the ADS vision was already written. It is important to be precise about the direction of influence: ADS came first, Hamster came later, and seeing Hamster was a confirmation that the ADS direction was sound, not a source I copied from.

Hamster is an AI-powered team collaboration platform built around briefs (intent documents distinct from PRDs), plans (AI-generated with tasks, subtasks, dependencies, and acceptance criteria), tasks, blueprints (reusable project patterns capturing team methodology), initiatives (hierarchical grouping), and a context graph that automatically links conversations, decisions, documents, briefs, plans, and people. It has connections into Linear, Jira, Slack, Notion, Figma, Google Drive, GitHub, and Cursor. It has a real-time multiplayer mode where teams can align on briefs before execution.

What made seeing Hamster valuable was not that it was novel to me — most of its concepts I had already described in the ADS artifacts under different names — but that it was real. Briefs, plans, tasks, and blueprints were the Hamster names for concepts ADS had already called PRDs, SLRs and Implementation Plans, Epics and Stories, and agent role definitions. The context graph was the Hamster name for what ADS had called the requirements hierarchy with trace links. The multiplayer brief-review was the Hamster name for what ADS had called a human gate. Hamster was the embodiment of several of the things ADS had asked for, shipped as a product, working in the wild — which told me that the shape I had described was achievable and that someone else had arrived at it independently.

The reason to build these capabilities into Metis rather than adopt Hamster is that the capabilities are not the thing — the workflow engine is the thing. Workflows are the delivery vehicle for every capability. A brief that is not tied to a workflow that produces a plan is a document in a drawer; a plan that is not tied to a workflow that produces work is a slide in a deck. Metis has the workflow engine at its core. The rest of these capabilities need to be built into Metis so that the workflow is the axis around which intent, plan, execution, and release rotate — not bolted on as an adjacent surface.

Hamster is one reference among several for the shape of each capability. It is not a ceiling, and it is not a design target. The future-direction document describes problems Metis needs to solve, and it draws shape examples from ADS (which defined most of them first), from Hamster (which shipped adjacent solutions for several of them), and from other sources where relevant.

The Codex-Reviews-Claude Experiment

Before any of the vision work settled, I ran a small experiment on myself. I took a piece of work — a spec, a plan — that Claude had written, and I gave it to Codex with instructions to review it for YAGNI, red-green path discipline, minimal-implementation-first thinking, and the other principles superpowers had already taught me to enforce. Codex came back with substantive critique. I took Claude’s draft and Codex’s critique, gave them back to Claude, and asked for a reconciled version. The reconciled version was materially better than either input.

Then I did it the other way around. I had Claude review something Codex had produced. Same result: the reviewing agent had a broader view of the context, the reviewed agent a narrower view of the code, and forcing them to negotiate produced an output that was better than either could have produced alone.

What I was doing, in the language of the platform, was acting as a harness. I was pulling work from one agent, routing it to another agent with a different system prompt and a different orientation, carrying context between them, and stitching the results back together. The experiment worked well enough that I started doing it more often. It worked poorly enough, as a human-as-harness pattern, that I got tired fast.

The insight from that experiment, in retrospect, is one of the core pieces of the Metis vision: multi-agent critique is not a novelty, it is a design tool. Having a second agent challenge the first — especially when the two agents have asymmetric context, asymmetric skills, or asymmetric stakes — produces better output than any single agent in isolation. But this pattern has to be cheap to invoke, or it does not get invoked. It has to be built into the harness, as a first-class workflow pattern, so that “have Codex review Claude’s plan and reconcile” is a node type, not a human ritual.
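As a sketch of what that node type could look like inside a workflow definition — the agent names, fields, and schema here are hypothetical, not a committed Metis design:

```yaml
# Hypothetical cross-review fragment — illustrative only, not a committed schema.
- id: draft-plan
  type: prompt
  agent: claude
  prompt: "Write an implementation plan for {{feature}}."
- id: critique
  type: prompt
  agent: codex              # different runtime, different orientation
  prompt: "Review the plan for YAGNI, red-green discipline, and minimal-first thinking."
  inputs: [draft-plan]
- id: reconcile
  type: prompt
  agent: claude
  prompt: "Reconcile the draft with the critique into a final plan."
  inputs: [draft-plan, critique]
```

The harness does the routing and context-carrying that I was doing by hand, which is what makes the pattern cheap enough to invoke every time.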

The human-as-harness problem is a recurring theme across all of these inspirations. Superpowers demonstrated how much discipline the harness could enforce. Taskmaster-AI showed how much individual productivity the harness could unlock. Hamster showed what the harness could look like for a team. The codex-reviews-claude experiment showed what the harness could unlock when it mediated between multiple agents. The common thread is the harness — the platform — and the question Metis answers is: what if the harness were real, and good, and ours?

Where This Leaves Us

The history above is personal because, for most of the work so far, it has been. The decisions were individual decisions. The forks were individual forks. The experiments were individual experiments. That is an accurate account of the early life of the project.

It is not an accurate account of where the project is going. Metis has reached the point at which no single person can steward the full picture — the substrate is substantial, the surface area is growing, and the pieces of the vision still ahead are not the kind of work anyone should do alone. The rest of this document set is written in an institutional voice because the institutional voice is now the right one.

We carry forward the principles I learned from superpowers, the decomposition discipline I learned from taskmaster-ai, the collaborative shape I saw in Hamster, the multi-agent critique pattern I validated in the codex experiment, and the governance vision I described in the original ADS artifacts. We carry forward what Metis has become as a hosted team platform. We turn those into a harness that is real, good, and ours.

The current state is in 03-current-state.md. The principles that guide new work are in 02-principles.md. The direction the platform is headed is in 04-future-direction.md. This file captures how we got here; those three describe where we are, what we believe, and where we are going.
