Operating Principles
This document captures the principles that guide Metis design decisions. Each principle is a statement, an attribution to the source or sources that shaped it, a description of what it looks like in the platform today, and a note about what it still needs to become. The principles are organized into five clusters — agent discipline, human-in-the-loop governance, evidence and observability, platform tenets, and tensions we deliberately hold rather than resolve.
None of these principles is merely aspirational. Each one is actively applied when a design choice is being made, a review is being conducted, or a disagreement is being settled. When two principles are in tension on a specific decision, we name the tension and decide explicitly rather than pretending one does not apply.
Cluster 1 — Agent Discipline
The cluster that governs how agents operate inside the platform. These principles exist because a capable model without a disciplined harness will produce inconsistent outcomes at best and harmful ones at worst, and because most of the observed failure modes of agentic coding come from the absence of discipline, not from the absence of capability.
1. Hard Walls, Not Soft Rules
Rules files describe intent; deterministic checks enforce it. When a constraint matters — and most of the constraints we care about do — it is expressed as a check the platform runs, not as guidance we hope the agent will follow.
Drawn from obra/superpowers’ skills-as-mandatory-processes pattern, from test-driven development’s red-green-refactor discipline, and from the ADS core principle that pipeline integrity is non-negotiable.
In Metis today, this shows up as schema validation on workflow definitions, as approval gates that block workflow progress until a human acts, as workflow-level tool allowlists and denylists that constrain what an agent can do inside a node, as the visibility and ownership filters that every read passes through, and as the isolation model that runs workflow work inside a git worktree so that a runaway agent cannot damage the live checkout. The platform decides what an agent can see, touch, and invoke — not the agent.
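The "platform decides, not the agent" posture can be sketched as a deterministic gate the engine evaluates before every tool invocation. This is an illustrative sketch only: names like `ToolPolicy` and `invoke_tool` are hypothetical, since the document does not show the actual engine API.

```python
from dataclasses import dataclass, field

@dataclass
class ToolPolicy:
    """Hypothetical per-node tool policy: denials win, then the allowlist applies."""
    allow: set = field(default_factory=set)  # empty allowlist means nothing is allowed
    deny: set = field(default_factory=set)

    def permits(self, tool: str) -> bool:
        if tool in self.deny:
            return False
        return tool in self.allow

def invoke_tool(policy: ToolPolicy, tool: str) -> str:
    # The wall is enforced by the platform; the agent never gets to argue past it.
    if not policy.permits(tool):
        raise PermissionError(f"tool '{tool}' is not permitted by this node's policy")
    return f"ran {tool}"

policy = ToolPolicy(allow={"read_file", "run_tests"}, deny={"run_tests"})
```

The point of the sketch is that the check is a hard wall: a denied tool raises rather than degrading to a weaker mode.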
What it still needs to become is broader: more checks, stronger gates, and more aggressive refusal to proceed when evidence of correctness is missing. Every time a rule lives only in a rules file, we are accepting the risk that the rule will be ignored. Over time, rules that matter enough to write down should migrate into deterministic enforcement inside the workflow engine — where the platform can enforce them rather than hope the agent will remember them.
2. Multi-Agent Critique Is a Design Tool, Not a Novelty
Having a second agent challenge the first — especially when the two agents have asymmetric context, asymmetric skills, or asymmetric stakes — produces better output than any single agent in isolation. This is not a party trick. It is a reliable pattern for improving quality, and it is a pattern the platform should make cheap to invoke.
Drawn from the personal experiment of using Codex to review plans Claude wrote (and the reverse) and finding the reconciled output materially better than either input. Reinforced by the ADS vision’s separation of agent roles by tier and purpose.
In Metis today, this pattern is possible but not first-class. A workflow can invoke different providers on different nodes, and a human can route work between them — but the critique-and-reconcile pattern is something the user has to build explicitly inside a workflow each time. It is not a node type, not a bundled workflow shape, and not an observable cross-project pattern.
What it still needs to become is a first-class primitive: critique nodes, reconciliation nodes, bundled critique workflows, and measurable outcome telemetry that shows when the critique pattern is paying off versus when it is burning tokens without adding value.
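The shape users currently wire up by hand, and that a critique node type would make first-class, can be sketched as a three-step pipeline. The stub agents below stand in for real provider calls; the function and parameter names are assumptions for illustration.

```python
from typing import Callable

Agent = Callable[[str], str]  # prompt in, text out

def critique_and_reconcile(drafter: Agent, critic: Agent,
                           reconciler: Agent, task: str) -> str:
    """Draft with one agent, challenge with a second, reconcile with a third pass."""
    draft = drafter(task)
    critique = critic(f"Challenge this plan for the task '{task}':\n{draft}")
    return reconciler(
        f"Task: {task}\nDraft:\n{draft}\nCritique:\n{critique}\n"
        "Produce a reconciled plan that addresses the critique."
    )

# Stub agents stand in for real providers (e.g. one drafting, a different one critiquing).
plan = critique_and_reconcile(
    drafter=lambda p: "step 1; step 2",
    critic=lambda p: "step 2 is over-scoped",
    reconciler=lambda p: "step 1; step 2 (narrowed)",
    task="add retry logic",
)
```

Making this a node type rather than a hand-built pattern is what would let the platform also emit the outcome telemetry the paragraph above asks for.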
3. YAGNI, Red-Green, DRY-with-Rule-of-Three
You aren’t gonna need it. Get to the minimum red-green path before expanding. Duplicate small logic locally rather than extract it prematurely; extract only after the same pattern appears at least three times and has stabilized.
Drawn from superpowers’ writing-plans and TDD skills, from the ADS vision’s emphasis on pipeline integrity, and from the long-standing tradition these shorthand slogans compress.
In Metis today, these are principles the platform intends to enforce in the work its agents produce for users' projects. Bundled workflows encode the discipline where they can — brainstorming before implementation, planning before coding, red-green before expansion — and approval gates give humans the chance to reject work that violates them. The enforcement is workflow-shaped and still partial; an agent running inside a workflow can still produce a YAGNI-violating draft, and the platform relies on downstream human review to catch it.
What it still needs to become is observable and structural. Workflows should be able to critique their own outputs against these principles through dedicated critique nodes, not just rely on human reviewers to notice violations. The platform should surface patterns of violation across projects — which workflows tend to produce over-scoped plans, which node configurations correlate with expanded-beyond-red-green implementations — so that the violation itself becomes a signal the platform acts on, not a memory the reviewer has to hold.
4. Fail Fast, Explicit Errors, No Silent Fallbacks in Agent Runtimes
When an agent runtime encounters an unsupported state, an unsafe permission, or an ambiguous condition, it errors loudly with a clear message. It does not guess, it does not fall back to a weaker mode, and it does not silently broaden its capabilities. Silent fallback in an agent runtime creates unsafe or costly behavior; explicit error creates corrective behavior.
Drawn from the ADS vision’s insistence that every escalation is a signal, and from the general observation that agent systems which “degrade gracefully” most often degrade invisibly.
In Metis today, this principle is encoded in the way the workflow engine surfaces errors to humans rather than recovering blindly. A workflow run that needs an API key it does not have fails cleanly with a message that tells the user to configure one, rather than silently trying something else. Git errors and permission errors are classified and surfaced. The audit log captures every mutation, and the visible state of a run is the true state of the run; there is no ambient best-effort recovery happening behind the scenes.
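The missing-API-key case can be sketched as a resolver that raises a named, actionable error instead of falling back. This is a minimal illustration; `MissingCredentialError` and `resolve_api_key` are hypothetical names, not the platform's actual interfaces.

```python
class MissingCredentialError(RuntimeError):
    """Raised instead of silently trying an alternative provider or weaker mode."""

def resolve_api_key(config: dict, provider: str) -> str:
    # Absence surfaces to a human with a corrective message; there is no best-guess path.
    key = config.get(provider)
    if key is None:
        raise MissingCredentialError(
            f"No API key configured for provider '{provider}'. "
            "Configure one in the platform settings; the run will not fall back."
        )
    return key
```

The design choice worth noting is that the error message names the corrective action, which is what turns a loud failure into corrective behavior rather than mere noise.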
What it still needs to become is applied more broadly across agent interactions: when an agent tries to use a tool it does not have, when an MCP server is unreachable, when a skill fails to load, when an approval gate times out. The platform’s default response to ambiguity should always be “surface it to a human,” not “make a best guess and continue.”
Cluster 2 — Human-in-the-Loop Governance
The cluster that governs where humans belong in the pipeline. These principles exist because AI agents are capable contributors but not accountable actors, and because the decisions that require accountability — what we are building, what we are shipping, what risks we are accepting — are exactly the decisions that require human judgment.
5. Humans Gate Intent and Release; AI Drafts the Middle
The boundaries of a piece of work are human decisions. What are we building, and what does “done” mean? Those are the entry gate. When is this ready to ship? That is the exit gate. Between the two gates, most of the mechanical work — decomposition, drafting, implementation, review, documentation, testing — is where agents earn their keep.
Drawn from the ADS vision’s framing that humans govern contracts and not implementation, and from the hamster pattern of brief-first, plan-second, execution-third with humans in the loop at the transitions.
In Metis today, this principle is partially realized. Workflow approval nodes let a human gate a piece of work inside a workflow, and the GitHub PR merge is a natural exit gate that requires a human to authorize. But the entry gate — the act of converting an idea into an approved piece of intent — is mostly informal, happening inside conversations that are not a first-class durable artifact.
What it still needs to become is a pipeline with first-class gates at intent, at plan, and at release, all with the same discipline — a durable artifact, a human reviewer with a named role, a decision that is recorded in an audit log, and a defined set of consequences for approval and rejection.
6. AI Earns Automation Through Evidence
Trust is not declared; it is observed. When an agent or a workflow performs reliably over a sufficient sample of work, the platform can extend automation — lowering human-gate friction, widening tool allowlists, granting longer autonomous runs. When an agent or a workflow performs poorly, the platform constrains. The extension is always reversible, always observable, and always backed by telemetry.
Drawn from the ADS vision’s progressive gate trust model and Modes 1 through 4, from the general principle that “speed is a function of trust,” and from the engineering intuition that trust in any system should track the evidence that system produces, not the promises about what it will do.
In Metis today, this principle is not yet implemented as a trust model. Every workflow is treated the same regardless of past performance, and automation levels are set per-workflow by the author rather than earned by the pattern. Audit logs exist and telemetry is emitted, but the feedback loop that turns evidence into progressive automation is not built.
What it still needs to become is a real trust-calibration system — per-workflow, per-node-type, and possibly per-agent — that tracks outcomes, human-override rates, and cost-per-successful-outcome, and that exposes those metrics to platform administrators and workflow authors. Automation that cannot be justified by evidence should be constrained, and constraint that cannot be justified by evidence should be relaxed.
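A trust-calibration loop of this kind can be sketched as a pure function from observed outcomes to a reversible automation tier. The tier names, thresholds, and the `WorkflowOutcomes` record are all assumptions for illustration, not the platform's model.

```python
from dataclasses import dataclass

@dataclass
class WorkflowOutcomes:
    runs: int
    successes: int
    human_overrides: int
    total_cost: float  # dollars or tokens, whichever the telemetry records

def automation_tier(o: WorkflowOutcomes, min_runs: int = 20) -> str:
    """Map evidence to a tier; recomputed on every run, so extension is reversible."""
    if o.runs < min_runs:
        return "supervised"  # not enough evidence to extend anything yet
    success_rate = o.successes / o.runs
    override_rate = o.human_overrides / o.runs
    if success_rate >= 0.95 and override_rate <= 0.05:
        return "autonomous"
    if success_rate >= 0.80:
        return "gated"
    return "constrained"  # poor evidence narrows automation, never widens it

def cost_per_success(o: WorkflowOutcomes) -> float:
    return o.total_cost / o.successes if o.successes else float("inf")
```

Because the tier is derived from the metrics rather than stored as a declaration, constraining a misbehaving workflow is automatic: the next recomputation simply lands lower.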
Cluster 3 — Evidence and Observability
The cluster that governs what the platform knows about its own behavior. These principles exist because a platform that cannot see its own work cannot improve it, and because the link between intent and delivery has to be recoverable — not from memory, not from Slack threads, but from the platform itself.
7. Evidence-Linked Traceability: Intent to Release
Every artifact in the pipeline — brief, requirement, plan, workflow run, PR, release — carries an explicit, queryable link to the artifacts upstream and downstream of it. Any delivered change can be traced back to the intent that authorized it. Any approved intent can be traced forward to the work that fulfilled it, or the reason it did not.
Drawn from the ADS requirements hierarchy with first-class trace links, from hamster’s context graph pattern, and from the compliance-adjacent intuition that “who approved what, and did we actually build it” is a question the platform should never have to ask its human operators.
In Metis today, this principle is partially realized as implicit links: a workflow run references a conversation; a conversation references a codebase; a PR is mentioned in events. But the graph is not first-class, not queryable end-to-end, and not surfaced as a navigable UI.
What it still needs to become is a durable, indexed graph of the artifacts the platform produces, with bidirectional links, version pinning, and a UI that lets a human or an agent start from any artifact and walk to any related artifact in either direction.
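The "walk in either direction" requirement can be sketched as a small bidirectional graph over artifact identifiers. The artifact id scheme (`brief:42`, `pr:88`) and the `TraceGraph` class are hypothetical; a real implementation would be indexed in the database rather than held in memory.

```python
from collections import defaultdict, deque

class TraceGraph:
    """Illustrative bidirectional artifact graph: brief -> plan -> run -> PR."""
    def __init__(self):
        self.down = defaultdict(set)  # upstream id -> downstream ids
        self.up = defaultdict(set)    # downstream id -> upstream ids

    def link(self, upstream: str, downstream: str) -> None:
        self.down[upstream].add(downstream)
        self.up[downstream].add(upstream)

    def walk(self, start: str, direction: str = "down") -> set:
        """Breadth-first traversal: every artifact reachable from `start`."""
        edges = self.down if direction == "down" else self.up
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

g = TraceGraph()
g.link("brief:42", "plan:7")
g.link("plan:7", "run:301")
g.link("run:301", "pr:88")
```

Walking down from the brief answers "did we actually build it"; walking up from the PR answers "who approved what".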
8. Observability Is Cross-Project by Default
The platform is a substrate for many projects, and the patterns worth learning about — which workflows succeed, which fail, where humans override the defaults, which nodes cost the most, which skills earn automation — are cross-project patterns. Per-project observability is necessary but not sufficient; cross-project analytics is where the platform’s value compounds.
Drawn from the ADS vision’s telemetry dashboards, from the engineering intuition that “what can be measured can be improved,” and from the specific observation that a platform serving many projects without cross-project analytics is not meaningfully different from a pile of independent tools.
In Metis today, audit logs and workflow events exist per run and per codebase, but cross-project analytics dashboards, trust calibration telemetry, and organizational-level metrics are not built. The raw data is in the database; the views and dashboards that would make it useful are not.
What it still needs to become is a real cross-project analytics layer — dashboards for workflow success rates, cost per outcome, human-override frequency, agent performance by provider and by model, and retrospective-driven improvement tracking. The platform should be able to tell its administrators what the platform is doing, as a whole, without requiring them to assemble the picture by hand.
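The compounding step is aggregation keyed by workflow rather than by project. A minimal sketch, assuming a flat list of run records with hypothetical field names; the real data would come from the platform's audit and event tables.

```python
from collections import defaultdict

def success_rates_by_workflow(runs: list) -> dict:
    """Collapse per-run events from every project into one cross-project view."""
    totals, wins = defaultdict(int), defaultdict(int)
    for run in runs:
        key = run["workflow"]  # keyed by workflow, deliberately ignoring project
        totals[key] += 1
        wins[key] += run["succeeded"]
    return {wf: wins[wf] / totals[wf] for wf in totals}

runs = [
    {"project": "a", "workflow": "pr-review", "succeeded": True},
    {"project": "b", "workflow": "pr-review", "succeeded": False},
    {"project": "b", "workflow": "research", "succeeded": True},
]
rates = success_rates_by_workflow(runs)
```

The same shape extends to override frequency, cost per outcome, or performance by provider: only the grouping key and the summed field change.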
Cluster 4 — Platform Tenets
The cluster that governs what the platform is. These principles exist because the character of a platform — what it treats as first-class, what it allows, what it refuses — is set by its tenets, not by its features.
9. Workflows Are the Delivery Vehicle for Every Capability
If a capability matters enough to exist in the platform, it exists as a workflow. Intake is a workflow. Requirements decomposition is a workflow. Multi-agent critique is a workflow. Release review is a workflow. Retrospectives are workflows. Platform-internal automation is a workflow. The workflow engine is the axis around which every capability rotates; capabilities that live outside workflows are capabilities that do not benefit from the platform’s determinism, observability, isolation, or reusability.
Drawn from the framing that has guided Metis from the Archon fork forward, reinforced by every inspiration source that pointed in the same direction.
In Metis today, this principle is realized for the capabilities the platform already ships: workflow-backed CI, workflow-backed research, workflow-backed implementation, workflow-backed review. It is not yet realized for intake, requirements, retrospectives, and governance artifacts, because those capabilities are not yet built.
What it still needs to become is absolute. Every new capability added to the platform should be implemented as a workflow unless there is a specific, articulated reason it cannot be. A feature that would live outside the workflow engine is a feature the platform should probably not have.
10. Reusability and Centralization of Skills, MCPs, Commands, Scripts
Skills, MCP server configurations, commands, and named scripts are shared organizational assets. They are managed centrally, versioned, and injected into the right workers at the right time by the workflow that references them. Every project inherits a baseline of organizational resources and can extend them without losing the baseline. The workflow is the injection point: it decides what is, and what is not, available to the worker that executes it.
Drawn from the Metis platform-resources work (workflows, commands, scripts, skills, MCP configs as DB-backed first-class assets), from the superpowers pattern of skills-as-reusable-units, and from the hamster pattern of blueprints capturing team methodology.
In Metis today, this principle is largely realized for workflows, commands, scripts, skills, and MCP configurations — all of which are DB-backed, versioned, ownership-tracked, team-visible by default, and injected into worker runs through the platform’s materialization layer. Project-level resources in the target repo’s .metis/ directory override and extend platform-level resources.
What it still needs to become is the governance layer around it: how the organization decides which project-level patterns should be promoted to platform level, how it retires skills that are no longer useful, and how it surfaces to a project author that a platform-level skill already exists that would solve the problem they are about to solve themselves.
11. Reversibility-First for Changes
Every change should be easy to revert. Small scope, clear blast radius, rollback path defined before the change lands. Risky changes have rollback plans written down. Changes that cannot be rolled back are identified as such and given the extra scrutiny they deserve.
Drawn from the ADS vision’s explicit treatment of rollback as a first-class operation, and from the operational intuition that systems optimize for the recovery cases that have been rehearsed.
In Metis today, this principle shows up in the way the platform defaults to isolated worktrees so that a failed run does not damage the live checkout, in the way workflow runs can be abandoned and resumed, in the way resource changes are version-history-tracked so that a previous version can be inspected and restored, and in the way ownership and visibility changes are reversible through audit-logged actions.
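The version-history-with-restore behavior can be sketched as an append-only model in which a rollback is itself a new, auditable entry. `VersionedResource` is a hypothetical stand-in for the platform's DB-backed resource versioning.

```python
class VersionedResource:
    """Illustrative append-only history: restore appends, it never rewrites."""
    def __init__(self, initial: str):
        self.history = [initial]

    @property
    def current(self) -> str:
        return self.history[-1]

    def update(self, content: str) -> None:
        self.history.append(content)

    def restore(self, version: int) -> None:
        # The rollback is recorded as a new version, so the undo stays observable.
        self.history.append(self.history[version])

skill = VersionedResource("v1 body")
skill.update("v2 body")
skill.restore(0)
```

Keeping the rollback in the history is what makes reversibility compose with the audit-log principle: reverting a change never erases the evidence that the change happened.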
What it still needs to become is a platform feature extended to every artifact: every workflow run, every resource change, every ownership transition, every governance artifact should have an observable undo path where possible, and every operation that does not have an undo path should be flagged as such at the time the change is considered, not when the rollback is needed.
Cluster 5 — Tensions Held, Not Resolved
The cluster of principles that name tensions rather than resolving them. Some of the most important design decisions in a platform are the ones where two good principles pull in opposite directions, and where the right answer is not to pick a winner but to live with the tension honestly.
12. Consistency and Autonomy Pull Against Each Other; The Platform Holds the Tension
Some capabilities belong to everyone, everywhere. Some belong to a single project. A shared workflow for PR review is a consistency win; a project-specific workflow for a team’s unique release process is an autonomy win. A bundled set of organizational skills is a consistency win; a project’s custom skill for its specific stack is an autonomy win. The platform is not supposed to pick a side on this. It is supposed to make both possible, make the seams visible, make the defaults sensible, and make the overrides traceable.
Named as a principle because this tension is the single most recurring source of design disagreement on the project, and naming it allows every decision that touches it to be made in the open rather than by accident.
In Metis today, this principle is implemented through the platform-versus-project resource model: bundled defaults, platform-level team-shared resources, and project-level overrides are all present and form a three-layer precedence (project > platform > bundled). The workflow discovery layer merges them. The UI surfaces them.
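The three-layer precedence can be sketched as a straightforward layered merge. This is an illustration of the precedence rule only; the actual discovery layer's merge logic and resource shapes are not shown in this document.

```python
def resolve_resources(bundled: dict, platform: dict, project: dict) -> dict:
    """Merge resource maps with project > platform > bundled precedence."""
    merged = dict(bundled)
    merged.update(platform)  # platform-level, team-shared resources shadow bundled defaults
    merged.update(project)   # project-level overrides shadow everything else
    return merged

resolved = resolve_resources(
    bundled={"pr-review": "bundled-v1", "research": "bundled-v1"},
    platform={"pr-review": "org-v2"},
    project={"research": "custom"},
)
```

Because the merge is a pure function of the three layers, the seams stay visible: for any resolved name, it is always answerable which layer supplied it.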
What it still needs to become is a governance model for when consistency should be enforced as a hard wall versus when autonomy should be preserved. There will be specific capabilities — compliance-adjacent workflows, security-sensitive gates, release authorization — where the organization should refuse to allow project-level override. There will be others where the platform should make overriding easy and traceable rather than hard and invisible. Deciding which is which, and making the decision observable, is work we have not yet done.
How to Use This Document
When a design decision is in front of you and you are uncertain, the right question is not “what would the team want?” It is “which of these principles apply here, and what do they say about this decision?” If two principles apply and they disagree, name the disagreement explicitly and decide in the open. If no principle applies, the decision is either trivial (go ahead) or it reveals a gap in this document (bring it back and let us add the missing principle).
Principles evolve. When a new lesson lands — from a retrospective, from a failure, from an observation — it belongs here. This file is the place where what the platform has learned about itself gets written down in a form the platform can act on next time.