Keystone — Level 2 Project Harness

A self-updating harness — corpus, guides, sensors, and flywheels — that scaffolds a Level 2 project harness around your AI coding agent. One install command, owned by your repo, agent-agnostic.

~/your-app
$ brew tap tacoda/tap
$ brew install tacoda/tap/keystone
keystone installed
$ keystone init
┌─ Select your agent ────────────────────────────┐
│ claude-code   Anthropic Claude Code            │
│ codex         OpenAI Codex / Codex CLI         │
│ cursor        Cursor editor                    │
└────────────────────────────────────────────────┘
installing harness harness/{corpus,guides,sensors,…}/
installing claude-code target CLAUDE.md
keystone installed for claude-code.
Next: run the bootstrap action in claude-code.
$ claude
> /bootstrap
Thinking… (3s · esc to interrupt)

Prefer to skip the prompts? Pass every option as a flag and init runs non-interactively — handy for CI, dotfiles, and onboarding scripts:

$ keystone init \
    --agent claude-code \
    --app-type web-api \
    --architecture hexagonal,continuous-delivery \
    --testing tdd,property-based \
    --compliance soc2

Multi-select flags take comma-separated values. Run keystone options for the full catalog of labels per flag.
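For onboarding scripts, the flag-based invocation above can be assembled from a plain mapping. The following is a minimal sketch: the helper `build_init_command` and the `MULTI_SELECT` set are hypothetical names introduced here, and only the flags already shown in this section are used.

```python
import shlex

# Flags that accept several labels, per the flag list in this document.
# The helper itself is illustrative and not part of keystone.
MULTI_SELECT = {"architecture", "testing", "compliance"}


def build_init_command(options: dict) -> list:
    """Turn an options mapping into an argv list for a non-interactive init."""
    argv = ["keystone", "init"]
    for flag, value in options.items():
        if flag in MULTI_SELECT and not isinstance(value, str):
            value = ",".join(value)  # multi-select flags take comma-separated values
        argv += [f"--{flag}", value]
    return argv


command = build_init_command({
    "agent": "claude-code",
    "app-type": "web-api",
    "architecture": ["hexagonal", "continuous-delivery"],
    "testing": ["tdd", "property-based"],
    "compliance": ["soc2"],
})
print(shlex.join(command))
```

Passing the list to a process runner as argv, rather than a single string, avoids shell quoting issues.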

What it is

Keystone generates a Level 2 project harness — codebase-level scaffolding an AI coding agent reads on every task. No language runtime, no daemon. The keystone CLI is a single Go binary with the entire harness embedded; keystone init writes the harness and an agent menu file into your repo, then gets out of the way.

After install, the binary is no longer required. The harness is plain markdown you own and version alongside your code. It has four components: guides (rules — always loaded), corpus (informational reference — loaded on demand), sensors (automated checks that fire at phase boundaries), and two flywheels — Learning adds rules and reasoning, Pruning removes stale ones (guides regularly, corpus rarely).

Your agent drives a six-phase workflow (spec → planning → implementation → verification → review → release) under the always-loaded guide rules, reaching for corpus reasoning only when the rules aren't enough.

Where it sits in the harness stack

The harness stack has five levels. Keystone generates a Level 2 (Project) harness — codebase-scoped scaffolding the agent reads on every task. It blurs into Level 3 (Organization) because the upper corpus layers — principles, process, adapters — are not codebase-specific and naturally travel across an organization's projects, even though the install target is a single repository.

LEVEL 0 · Model Harness
  The AI tool itself: Claude Code, Cursor, Copilot. You select it; you don't configure it.
LEVEL 1 · Agent Harness
  Global config that travels with you across projects. Memory, user-level settings, global ~/.claude/CLAUDE.md.
LEVEL 2 · Project Harness ← keystone
  Codebase-level scaffolding. Slash commands, hooks, subdirectory CLAUDE.md files, static analysis config: the terrain a single codebase presents to its agents.
LEVEL 3 · Organization Harness ← blurs here
  Cross-project consistency: shared conventions, tool configs, policies that apply everywhere. The currently most underbuilt level.
LEVEL 4 · Orchestrator Harness
  Fleet-level coordination: LangGraph, Devin, CrewAI. Task decomposition, distribution, reassembly across multiple agents.

Why the blur? Keystone's corpus/principles/, guides/principles/, guides/process/, and sensors/ are universal; adapters/ are per-agent, not per-project. Cherry-pick those into every repo in your org and you have de facto Level 3 coverage — without a separate Level 3 mechanism. Only corpus/idioms/, corpus/domain/, guides/idioms/, guides/domain/, and corpus/state/ are genuinely project-scoped.

Architecture
Embedded Harness — //go:embed
guides/ Rules — always loaded. principles · idioms · domain · process.
corpus/ Reference — on demand. principles · idioms · domain · state.
sensors/ Automated checks — invoked at phase boundaries. 12 sensors.
flywheels learning/ + archive/. Self-update — additive + subtractive.
Four-component harness + per-agent adapters + opt-in content, baked into the binary at build time.
keystone init
Installer — keystone binary
detect agent · prompt missing options (huh TTY)
preflight (git, CI)
copy harness/ from embed.FS
merge menu files (additive · idempotent)
install opt-in content from optional/
write INSTALL_PROFILE.md
print per-agent capability gaps · next steps
writes to disk
Consumer Repository
harness/ The harness — corpus, guides, sensors, flywheels, plus per-agent adapters. Project-owned, agent-agnostic, versioned with the code.
CLAUDE.md · AGENTS.md · .cursor/rules/ · … The agent's menu file — merges additively under your existing H1, bracketed by markers so re-installs refresh in place. Format depends on the agent.
After install, keystone is gone. The harness lives in the repo and evolves with the code.
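The "additive · idempotent" menu-file merge described above can be pictured as a marker-bracketed replace. This is a sketch under stated assumptions, not the installer's code: the marker strings `BEGIN` and `END` are invented for illustration, and the block is appended at the end of the file rather than placed directly under the H1.

```python
# Hypothetical marker strings; the real installer's markers may differ.
BEGIN = "<!-- keystone:begin -->"
END = "<!-- keystone:end -->"


def merge_menu(existing: str, block: str) -> str:
    """Insert or refresh the bracketed block without touching other content."""
    bracketed = f"{BEGIN}\n{block.strip()}\n{END}"
    start = existing.find(BEGIN)
    end = existing.find(END)
    if start != -1 and end > start:
        # Re-install: replace only what sits between the markers.
        return existing[:start] + bracketed + existing[end + len(END):]
    if not existing.strip():
        return bracketed + "\n"
    # First install: keep the user's content and append the block below it.
    return existing.rstrip("\n") + "\n\n" + bracketed + "\n"
```

Running the merge twice with the same block leaves the file unchanged, which is the property that lets re-installs refresh in place.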
Harness Components

Four components. Three are content (corpus, guides, sensors); one is the dynamics that keep the content current (the flywheels). Each has a distinct activation — when and how the agent picks it up.

Component: guides
  What it is: Rules. IRON LAWs (non-negotiable) and GOLDEN RULES (strong defaults). Sublayers: guides/principles/, guides/idioms/<stack>/, guides/domain/, guides/process/. The drift sensor enforces them.
  Author: Extracted from corpus + Learning flywheel
  Activation: Ambient · always loaded

Component: corpus
  What it is: Informational reference. The reasoning, the citations, the anti-patterns. Sublayers: corpus/principles/, corpus/idioms/<stack>/, corpus/domain/, corpus/state/. Each pairs with a guide via forward-link.
  Author: Literature · domain expert · agent
  Activation: On demand · forward-link from a guide

Component: sensors
  What it is: Automated checks. 12 sensors in sensors/: lint, type-check, test, build, drift, coverage, risk-fingerprint, traffic-topology, state-region, commit-message, tracker-card-fetcher, spec-adherence. Each declares trigger, inputs, exit condition, output, state writes.
  Author: Lead engineer + adapter
  Activation: Invoked · fires inside lifecycle actions at phase boundaries

Component: flywheels
  What it is: Self-update. learning/ stages candidates (rule → guides, info → corpus); synthesize promotes. archive/ receives pruned content: guides regularly, corpus rarely (only on design/strategy shifts).
  Author: Agent writes · human gates
  Activation: Invoked · learn, synthesize, audit

The corpus/guides split is the central move in 0.3.0. Rules are short and high-value-per-token, so they live always-loaded in guides/. Long-form reasoning sits in corpus/ and is reached only when an agent needs the why behind a rule — see Context Budgeting below.

Harness Loading Order

There is no enforcement mechanism — the loading order is a discipline encoded in the menu file, the lifecycle actions, and the adapter activation docs. The agent follows it because the rules tell it to, and because each step earns its way into context.

  1. Session start: menu file auto-loads.

    The agent reads its menu (CLAUDE.md, AGENTS.md, .continuerules, etc.) — a short pointer at harness/ with the four components and the iron laws inline.

  2. Guides become ambient.

    Per-agent activation projects harness/guides/ into the agent's rules surface — Cursor's .cursor/rules/*.mdc, Claude Code's directive block, Cline's custom-instructions field. The agent is now under every IRON LAW and GOLDEN RULE.

  3. Orient reads state.

    The orient action reads corpus/state/CODEBASE_STATE.md and active migrations affecting the touched region. State is the only corpus file pulled into context at task start.

  4. Stack-specific guides lazy-load by region.

    guides/idioms/<stack>/ for the touched paths comes into context. Other stacks don't.

  5. Sensors fire at phase boundaries.

    No ambient cost. check-drift at implementation exit; verify at the verification gate runs lint, type-check, test, build, drift, commit-message. Output flows back as a transient turn artifact.

  6. Corpus on demand: forward-link only.

    Each guide ends with a forward-link to its paired corpus file. The agent reads that corpus only when the rule isn't enough — to recall the reasoning, the citation, or an anti-pattern the team called out.

Steps 1, 2, and 5 are mechanical (the adapter and process layer enforce them). Steps 3, 4, and 6 are model behavior under the rules: the rule "reach for the paired corpus only when needed" is what keeps context low. See Context Budgeting next.
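Step 4, lazy-loading stack guides by region, amounts to matching touched paths against a map of regions. The sketch below assumes a hypothetical `ACTIVATION_MAP` from path prefix to stack name; the document only says that CODEBASE_STATE.md holds an activation map and does not specify its format.

```python
from pathlib import PurePosixPath

# Hypothetical activation map: path prefix -> stack name.
ACTIVATION_MAP = {
    "services/api": "go",
    "web": "typescript",
}


def stacks_for(touched_paths):
    """Return the stacks whose regions contain at least one touched path."""
    stacks = set()
    for path in touched_paths:
        parts = PurePosixPath(path).parts
        for prefix, stack in ACTIVATION_MAP.items():
            prefix_parts = PurePosixPath(prefix).parts
            # Compare whole path segments so "webhooks/" does not match "web".
            if parts[:len(prefix_parts)] == prefix_parts:
                stacks.add(stack)
    return stacks


def guide_dirs(touched_paths):
    """Guide directories to bring into context for this task."""
    return sorted(f"harness/guides/idioms/{s}/" for s in stacks_for(touched_paths))
```

A task that touches only `web/` would pull in the TypeScript idioms and leave the Go ones out of context.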

Context Budgeting

A different lens on the same dynamic. Loading order is the how; context budgeting is the why. The harness assumes context is the scarce resource and spends it accordingly.

always-on

Guides — small, dense, high signal

Each guide is short: an IRON LAW plus a few GOLDEN RULES. There are tens of files, and none runs to hundreds of lines. Loading them all ambient stays well within a working budget while keeping the agent under every constraint at once.

on-demand

Corpus — long-form, reached only when needed

Each corpus file carries the reasoning, citations, and anti-patterns that justify a guide. Most tasks never need them — the rule is enough. When the rule isn't enough, the agent follows the forward-link and pays the cost only then.

transient

Sensors — zero ambient cost

A sensor's contract document is informational; the sensor itself is shell output. It lives in context for one turn at most, then leaves with the verify cycle. Sensors carry signal density without persistent context cost.

cold

Flywheels — not loaded during normal work

learning/inbox/ and archive/ are operational. The agent writes there during learn and audit; it doesn't read them during implementation. Zero baseline cost.

The asymmetry is deliberate. Always-loading the corpus would crowd context with reasoning the situation often doesn't need. Always-loading the guides keeps the agent honest at all times without paying for that reasoning. The agent reaches for corpus when — and only when — the rule alone is not sufficient.
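The four tiers above (always-on, on-demand, transient, cold) can be read as a simple cost model. The sketch below is illustrative only: the file names and token counts are invented placeholders, not measurements of a real harness.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    name: str
    activation: str  # "always-on", "on-demand", "transient", or "cold"
    tokens: int      # illustrative sizes, not measurements


def context_cost(items, followed_links=(), sensors_this_turn=()):
    """Sum the tokens in context for one turn under the four activation tiers."""
    total = 0
    for item in items:
        if item.activation == "always-on":
            total += item.tokens
        elif item.activation == "on-demand" and item.name in followed_links:
            total += item.tokens
        elif item.activation == "transient" and item.name in sensors_this_turn:
            total += item.tokens
        # "cold" items (learning/inbox/, archive/) never count during normal work.
    return total


# Hypothetical inventory for one guide/corpus pair, one sensor, and the inbox.
ITEMS = [
    Item("guides/principles/solid.md", "always-on", 300),
    Item("corpus/principles/solid.md", "on-demand", 4000),
    Item("lint output", "transient", 500),
    Item("learning/inbox/", "cold", 2000),
]
```

With these placeholder sizes, the baseline cost is the guide alone; the corpus file is paid for only on the turn where its forward-link is followed.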

Iron Laws & Golden Rules

Every guide is built from two kinds of rule. The drift sensor enforces both — IRON LAW violations fail; GOLDEN RULE violations surface as warnings.

non-negotiable

IRON LAW

A constraint the agent must never violate. No reasoning escape hatch. Iron laws are rare, deliberate, and survive across stacks and projects.

strong default

GOLDEN RULES

Strongly preferred behavior — deviation requires explicit reasoning, recorded in the work. Most rules in the harness are golden; iron is reserved for the truly inviolable.
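The fail-versus-warn split between the two rule kinds can be sketched as a small classifier. The `Violation` type and the status strings are hypothetical; the document does not describe how the drift sensor actually reports its results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Violation:
    rule: str
    kind: str  # "iron" for an IRON LAW, "golden" for a GOLDEN RULE


def drift_outcome(violations):
    """Return (status, failures, warnings) for a list of violations."""
    failures = [v.rule for v in violations if v.kind == "iron"]
    warnings = [v.rule for v in violations if v.kind == "golden"]
    # Any IRON LAW violation fails; GOLDEN RULE violations only warn.
    status = "fail" if failures else ("warn" if warnings else "pass")
    return status, failures, warnings
```

A diff with only golden-rule deviations would come back as `"warn"`, leaving room for the explicit reasoning the rule kind requires.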

The five process iron laws apply across every phase; they appear inline in the agent's menu file, and harness/README.md gives an orientation to them.

Tool Assumptions

The harness is most useful when the surrounding tooling looks like this. None of these are hard requirements — install runs regardless. Missing one degrades the corresponding phase to "ask the user / surface the gap"; it does not break the harness.

spec phase

Issue tracker

Anywhere on the spectrum from a full tracker (Jira, Linear, GitHub Issues, Asana) to a TODO.md to a conversation. The spec phase opens on a unit of work — a tracker card ID lets the agent fetch the description automatically; without one, the spec is authored inline.

review + release

PR workflow

A change-request flow like GitHub PRs or GitLab MRs. The review phase spawns review agents on the diff and re-runs sensors against comment-driven changes; the release phase opens the PR with the tracker link in the body.

verify phase

Adequate sensors

Lint, type-check, test runner, build, optionally coverage — automated checks that do a good job of constraining behavior. Not strictly required, but as sensor quality improves, so does the harness as a whole. Commands live in harness/corpus/state/CODEBASE_STATE.md; the 12 sensor contracts live in harness/sensors/.

release phase

CI pipeline (CD recommended)

GitHub Actions, GitLab CI, CircleCI, Jenkins — anything that runs sensors on every push. The release phase assumes CI gates the merge. Continuous Delivery (every commit produces a releasable artifact; the deploy itself stays a business decision) is highly recommended and makes the harness work better — small batches, fast feedback, low-risk releases. Continuous Deployment is an extension of CD, not a prerequisite. See corpus/principles/continuous-delivery.md — opt-in via --architecture continuous-delivery.

Install

Homebrew (macOS / Linux)

$ brew install tacoda/tap/keystone

Curl bootstrap (macOS / Linux)

Downloads the binary into ~/.local/bin/keystone, then prompts for the agent and runs keystone init:

$ curl -fsSL https://raw.githubusercontent.com/tacoda/keystone/main/install.sh | sh

Inspect before running (recommended):

$ curl -fsSL https://raw.githubusercontent.com/tacoda/keystone/main/install.sh > install.sh
$ less install.sh
$ sh install.sh

Set KEYSTONE_NO_INIT=1 to install the binary only.

PowerShell (Windows)

PS> iwr -useb https://raw.githubusercontent.com/tacoda/keystone/main/install.ps1 | iex

Manual

Download a prebuilt archive from the releases page, extract keystone (or keystone.exe), and place it on your PATH.

CLI Usage
# Scaffold harness/ and the agent's menu file into <dir>.
# Prompts interactively (huh) for any unset options when stdin is a TTY;
# fails fast with a useful error when stdin is not a TTY and --agent is unset.
$ keystone init [<dir>] [flags]

# Print the catalog of allowed labels for every option flag
$ keystone options

# Print the binary version
$ keystone version

# Print usage
$ keystone help

init is interactive in a terminal, scriptable in CI. In a TTY, every unset option becomes a prompt; outside a TTY, options must be supplied as flags. Multi-select flags accept comma-separated values (e.g. --architecture hexagonal,continuous-delivery).
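The interactive-versus-scripted behavior can be sketched as one resolution step. This is an assumption-laden illustration: `resolve_options` and `MissingOptionError` are hypothetical names, and the marker-file detection that keystone applies to `--agent` is left out.

```python
import sys


class MissingOptionError(Exception):
    """Raised when options are unset and nobody can be prompted."""


def resolve_options(given, required, is_tty=None, prompt=input):
    """Fill unset options by prompting in a TTY, or fail fast outside one."""
    if is_tty is None:
        is_tty = sys.stdin.isatty()
    resolved = dict(given)
    missing = [name for name in required if name not in resolved]
    if missing and not is_tty:
        flags = ", ".join(f"--{name}" for name in missing)
        raise MissingOptionError(f"stdin is not a TTY; pass {flags} as flags")
    for name in missing:
        resolved[name] = prompt(f"{name}: ")
    return resolved
```

Injecting `is_tty` and `prompt` keeps the function testable without a real terminal.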

--agent <label>

Agent target

One of: claude-code, codex, cursor, aider, github-copilot, continue, cline, goose, pi, generic. If omitted, keystone detects the agent from marker files and falls back to an interactive prompt in a TTY.

--force

Overwrite harness/

Re-run with --force to overwrite an existing harness/. Menu files (CLAUDE.md, CONVENTIONS.md, etc.) merge additively.

Install-time Options — declared intent

These are the questions only a human can answer — declared intent, not facts a script could infer from the codebase. In a TTY the installer prompts for each; in CI you pass them as flags. Selections land in harness/corpus/state/INSTALL_PROFILE.md; the bootstrap action reads them as the starting hint when it audits the codebase.

Each --architecture and --compliance label installs a paired corpus + guide — the explanatory file under corpus/principles/ and the rule-bearing file under guides/principles/. Some labels also seed concern-specific files (e.g., --architecture mvc drops corpus/idioms/mvc/{models,controllers,views}.md and paired guides/idioms/mvc/ files).
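The pairing rule can be sketched as a path mapping. It assumes each label maps to a file named `<label>.md` in both trees; the document confirms this only for corpus/principles/continuous-delivery.md, so the guide file name and the helper names are assumptions.

```python
def paired_files(label):
    """Paths a principle-level label would install, assuming <label>.md naming."""
    return {
        "corpus": f"harness/corpus/principles/{label}.md",
        "guide": f"harness/guides/principles/{label}.md",
    }


def install_plan(architecture=(), compliance=()):
    """List the paired files for every selected label, corpus file first."""
    plan = []
    for label in (*architecture, *compliance):
        pair = paired_files(label)
        plan += [pair["corpus"], pair["guide"]]
    return plan
```

Concern-specific extras, such as the mvc idiom files, are not modeled here.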

--agent

Agent

Which AI coding agent will read this harness. Determines the menu file and adapter bundle installed.

--app-type

App type

web-application · web-api · cli-tool · library · mobile-app · desktop-app · data-pipeline · embedded

--architecture

Architecture (multi)

Opinionated patterns the project follows: hexagonal, clean-architecture, onion-architecture, layered, mvc, mvvm, event-driven, microservices, monolith, serverless, spa, continuous-delivery. Each label installs paired corpus + guide files; mvc also seeds concern-specific idioms (models, controllers, views).

--testing

Testing (multi)

Disciplines the team practices: tdd, bdd, atdd, property-based, snapshot, manual-qa, none-yet.

--compliance

Compliance scope (multi)

Regimes the system must satisfy: gdpr, hipaa, pci-dss, soc2, fedramp. Each label installs paired corpus + guide files under corpus/principles/ and guides/principles/.

bootstrap-inferred

Not asked here

Language stack, database, CI platform, deployment target, project maturity, team size — bootstrap detects these from the codebase on first run, where it has accurate context. No --language flag exists in v0.2.0+.

Run keystone options for the full catalog of allowed labels per flag.

First-run: the Bootstrap step

Immediately after keystone init, run the bootstrap heavyweight action in your agent. Bootstrap is an initial audit of a fresh install — it makes the harness specific to your codebase.

Bootstrap reads corpus/state/INSTALL_PROFILE.md for your declared intent, then detects everything else from the codebase: language stack, database, CI platform, deployment target. It does three things: seeds paired corpus/idioms/<stack>/ + guides/idioms/<stack>/ from real patterns in your code, initializes corpus/state/CODEBASE_STATE.md with sensor commands and an activation map, and confirms the sensor command set. Until bootstrap runs, the harness is generic.

Three steps after install:

  1. Read harness/README.md for orientation to the four components and the iron laws.
  2. Ask your agent to run the bootstrap action. It seeds corpus/state/CODEBASE_STATE.md, paired corpus/idioms/<your-stack>/ + guides/idioms/<your-stack>/, and confirms sensor commands.
  3. Commit harness/ and any agent-specific files the installer created.
The Six-Phase Workflow
01 spec → 02 plan → 03 implement → 04 verify → 05 review → 06 release
Every task flows through these six phases. spec anchors a unit of work (tracker card, TODO line, conversation). planning orients to the region and writes a plan. implementation works inside ambient idioms. verification gates the commit on sensors (lint, type, tests, build). review spawns review agents on the diff. release opens the PR; CI gates the merge.
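The phase sequence behaves like a gated state machine: a task moves forward only when the current phase's gate passes. The sketch below is a simplification with a hypothetical `advance` helper; in the harness the gates are enforced by the agent following its rules, not by code like this.

```python
PHASES = ["spec", "planning", "implementation", "verification", "review", "release"]


def advance(current, gate_passed):
    """Move to the next phase only when the current phase's gate has passed."""
    index = PHASES.index(current)  # raises ValueError for an unknown phase
    if not gate_passed or index == len(PHASES) - 1:
        return current
    return PHASES[index + 1]
```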
Lifecycle Actions — invoked by the agent inside phases
Task entry

spec · spec phase

Capture intent and acceptance criteria from a tracker card or inline. Gate: spec approved before planning begins.

Pre-task

orient · planning entry

Read corpus/state/CODEBASE_STATE.md, lazy-load matching guide rules for the touched region, sketch a plan. Gate: plan approved.

Post-edit

check-drift · implementation exit

Did the edit violate a loaded guide? The drift sensor compares the diff against IRON LAWs (fail) and GOLDEN RULES (warn). Cheaper to fix while the change is fresh.

Pre-commit

verify · verification gate

Run sensors from corpus/state/CODEBASE_STATE.md: lint, type-check, tests, build, drift, commit-message. No completion claims without fresh evidence — sensors must have run this turn.

Post-verify

review · review phase

Walk spec acceptance criteria + functional review + security review on the diff. Gate: no blockers, AC met.

Post-commit

learn · release / capture

Stage candidates into learning/inbox/ classified as rule (→ guides) or information (→ corpus). Nothing is promoted yet.

How each action is actually invoked in your agent (slash command, rules-file trigger, natural language) lives in harness/adapters/<your-agent>/lifecycle.md.
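The verify action's "fresh evidence" requirement can be sketched as a gate that records the turn in which each sensor ran. Everything here is hypothetical scaffolding: the stand-in commands exist only so the sketch runs anywhere, and real commands would come from corpus/state/CODEBASE_STATE.md.

```python
import subprocess
import sys


def run_sensors(commands, turn):
    """Run each sensor command and record pass/fail with the turn it ran in."""
    evidence = {}
    for name, argv in commands.items():
        result = subprocess.run(argv, capture_output=True, text=True)
        evidence[name] = {"passed": result.returncode == 0, "turn": turn}
    return evidence


def may_claim_done(evidence, required, current_turn):
    """Allow a completion claim only if every required sensor passed this turn."""
    return all(
        name in evidence
        and evidence[name]["passed"]
        and evidence[name]["turn"] == current_turn
        for name in required
    )


# Stand-in commands: one exits 0 (pass), one exits 1 (fail).
COMMANDS = {
    "lint": [sys.executable, "-c", "raise SystemExit(0)"],
    "test": [sys.executable, "-c", "raise SystemExit(1)"],
}
```

Evidence from an earlier turn is rejected even when it passed, which mirrors the rule that sensors must have run this turn.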

Heavyweight Actions — invoked by the user
Day 1

bootstrap

Initial audit of a fresh install. Detects stack, seeds paired corpus/idioms/<stack>/ + guides/idioms/<stack>/, initializes corpus/state/, confirms sensor commands.

Periodic

audit

Asymmetric dual audit. Guides are audited regularly against the codebase (rules churn as code does). Corpus is audited rarely, only when the team's design or strategy has shifted.

After learn

synthesize

Promote items from learning/inbox/. Classifies each as rule (lands in guides) or information (lands in corpus); some land in both. Human gates by confidence.

Config

mode

Set the pacing mode (paired / solo / autopilot). Controls how often the agent stops to ask.

The Two Flywheels

⟳ Learning Flywheel — additive

01 Agent encounters a situation not covered by the harness — makes a judgment call.
02 learn writes the candidate to harness/learning/inbox/ with a candidate_kind hint: rule, information, or both.
03 Human gates by confidence, with a higher bar for new rules (they narrow the agent across every project) than for new info.
04 synthesize classifies and lands the content: rules into guides/<layer>/, info into corpus/<layer>/, sometimes both.
05 Harness grows — fewer gaps, cleaner signal on remaining gaps, loop tightens.
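Step 04 of the learning loop is a routing decision on the candidate_kind hint. A minimal sketch, assuming a hypothetical `destinations` helper and that the layer name (principles, idioms/<stack>, domain, process) is already known:

```python
def destinations(candidate_kind, layer):
    """Map a candidate_kind hint to the directories where content would land."""
    routes = {
        "rule": [f"harness/guides/{layer}/"],
        "information": [f"harness/corpus/{layer}/"],
        "both": [f"harness/guides/{layer}/", f"harness/corpus/{layer}/"],
    }
    if candidate_kind not in routes:
        raise ValueError(f"unknown candidate_kind: {candidate_kind!r}")
    return routes[candidate_kind]
```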

⟲ Pruning Flywheel — subtractive · asymmetric

01 audit runs two passes: a regular sweep of guides/ against the codebase, and a rare sweep of corpus/.
02 Guide staleness — a rule names a removed API, contradicts a newer rule, or the code no longer follows it. Guide audits happen on cadence.
03 Corpus staleness — only when the team's design, strategy, direction, or ideals have moved on. Not a maintenance task; a deliberate act.
04 Content moves to archive/ with reasoning recorded — never deleted, history preserved.
05 Harness accuracy improves — higher signal-to-noise, next audit surfaces real problems faster.
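Step 04 of the pruning loop, moving content to archive/ with reasoning recorded, might look like the sketch below. The sidecar `.reason.txt` file is an invented convention for illustration; the document does not say how the reasoning is stored.

```python
import shutil
import tempfile
from pathlib import Path


def archive_rule(harness, relative, reason):
    """Move a stale file under archive/ and record why, instead of deleting it."""
    source = harness / relative
    target = harness / "archive" / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(source), str(target))
    # Hypothetical convention: reasoning lives in a sidecar file beside the archived one.
    target.with_name(target.name + ".reason.txt").write_text(reason + "\n")
    return target


# Usage against a throwaway directory standing in for harness/.
demo_root = Path(tempfile.mkdtemp())
(demo_root / "guides" / "idioms" / "go").mkdir(parents=True)
(demo_root / "guides" / "idioms" / "go" / "errors.md").write_text("old rule\n")
moved = archive_rule(demo_root, "guides/idioms/go/errors.md", "names a removed API")
```

Because the file is moved rather than removed, the original path under archive/ preserves where the rule used to live.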
Pacing Modes
Default

paired

Ask at every phase boundary; the user drives. Best for unfamiliar code or high-risk changes.

Independent

solo

Work independently through phases; stop only on hard problems. Best for well-scoped work in familiar code.

Hands-off

autopilot

Maximally autonomous; produces an assumption log at the end for review. Best for batch refactors and routine cleanup.

Set with the mode action: mode <paired|solo|autopilot>. Agents without a notion of autonomy levels collapse to a single mode; the phases still run.
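The three modes differ mainly in when the agent stops to ask. The sketch below encodes one reading of the descriptions above; the event names are hypothetical, and treating a hard problem as a pause in paired mode and autopilot as never pausing are assumptions.

```python
def should_pause(mode, event):
    """Decide whether the agent stops to ask, given a pacing mode and an event."""
    if mode == "paired":
        # Ask at every phase boundary; pausing on hard problems is assumed.
        return event in {"phase-boundary", "hard-problem"}
    if mode == "solo":
        return event == "hard-problem"
    if mode == "autopilot":
        return False  # assumptions go into the end-of-run log instead
    raise ValueError(f"unknown mode: {mode!r}")
```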

Supported Agents

Every adapter ships lifecycle.md, activation.md, and sensors.md. Adapters with capability gaps (no slash commands, no sub-agent parallelism, etc.) get an install-time warning — see below.

Agent                            Adapter       Menu file installed
Claude Code                      full          CLAUDE.md
Codex CLI                        full          AGENTS.md
pi.dev                           full          AGENTS.md + .pi/prompts/
Cursor                           full          .cursor/rules/*.mdc (7 scaffolded)
Aider                            real · gaps   CONVENTIONS.md
GitHub Copilot (VS Code + CLI)   real · gaps   .github/copilot-instructions.md
Continue                         real · gaps   .continuerules
Cline / Roo Code                 real · gaps   cline-instructions.md (paste)
Goose                            real · gaps   .goosehints
(any other)                      fallback      AGENTS.md

full — adapter covers every harness feature natively. real · gaps — adapter covers the lifecycle but the agent lacks one or more native features (slash commands, sub-agent parallelism, autonomous tracker integration, etc.). keystone init prints the specific gaps after install with actionable remedies.

Install-time Warnings

When an agent does not natively cover a harness feature, keystone init prints a ⚠ <agent> adapter — capability gaps to address section before the success message. Each gap names two paths the user can take:

configure

Configure the agent

Where the gap is closeable: enable an extension, set a config flag, paste a workflow definition. For example, Aider needs auto-commits: false; Cline needs an auto-approve whitelist for the verify cycle; Goose needs the developer extension enabled.

document

Add a harness file

Where the agent can't be configured for the feature — or you'd rather work around it explicitly: add a file like harness/adapters/<agent>/review-strategy.md describing how your team handles the gap. The agent reads that file as part of the adapter's activation contract.

# Example: keystone init --agent aider

⚠ aider adapter — capability gaps to address

  • sub-agent parallelism (review)
      Impact: review runs each concern sequentially in one chat.
      Or document: add harness/adapters/aider/review-strategy.md.

  • commit-message sensor gate
      Impact: Aider's default auto-commits fire before the sensor can run.
      Configure: set `auto-commits: false` in `.aider.conf.yml`.

  • native tracker integration
      Impact: tracker-card-fetcher falls back to shell or paste.
      Configure: use `/run gh issue view <id>` or paste card content.
      Or document: add harness/adapters/aider/tracker-workflow.md.

✓ keystone installed for aider.

Fully-supported adapters (claude-code, codex, pi, cursor) print no warning. The warning content is per-agent in the binary's warnings.go; if a gap closes upstream, the warning drops in the next keystone release.

What gets installed
your-project/
├── CLAUDE.md                  ← agent menu file (merges additively under your H1)
└── harness/                   ← four-component harness (project-owned)
    ├── README.md              ← orientation to the four components · iron laws
    ├── guides/                ← rules · always loaded · enforced by the drift sensor
    │   ├── principles/        29 rule files extracted from corpus/principles/
    │   ├── idioms/<stack>/    stack-specific rules (lazy by region · bootstrap fills in)
    │   ├── domain/            business-rule constraints
    │   └── process/           spec · planning · implementation · verification · review · release · modes
    ├── corpus/                ← informational reference · on demand · reasoning behind the rules
    │   ├── principles/        29 cited files (SOLID · TDD · BDD · concurrency · security · …)
    │   ├── idioms/<stack>/    stack-specific reasoning (bootstrap fills in)
    │   ├── domain/            product shape, invariants, vocabulary
    │   └── state/             CODEBASE_STATE.md · INSTALL_PROFILE.md · risk-fingerprints · migrations
    ├── sensors/               ← 12 automated checks · invoked at phase boundaries
    │   └── lint · type-check · test · build · drift · coverage · risk-fingerprint · …
    ├── learning/              ← Learning flywheel staging · inbox / promoted / rejected
    ├── archive/               ← Pruning flywheel destination · guides regularly, corpus rarely
    └── adapters/              ← per-agent bindings · lifecycle + activation + sensors
        └── claude-code · cursor · aider · github-copilot · codex · cline · continue · goose · pi · _generic
+ opt-in content for selected --architecture and --compliance labels lands as paired corpus + guide files. --architecture mvc also seeds concern-specific idioms (models, controllers, views).