A self-updating harness — corpus, guides, sensors, and flywheels — that scaffolds a Level 2 project harness around your AI coding agent. One install command, owned by your repo, agent-agnostic.
Prefer to skip the prompts? Pass every option as a flag and init runs non-interactively — handy for CI, dotfiles, and onboarding scripts:
$ keystone init \
--agent claude-code \
--app-type web-api \
--architecture hexagonal,continuous-delivery \
--testing tdd,property-based \
--compliance soc2
Multi-select flags take comma-separated values. Run keystone options for the full catalog of labels per flag.
Keystone generates a Level 2 project harness — codebase-level scaffolding an AI coding agent reads on every task. No language runtime, no daemon. The keystone CLI is a single Go binary with the entire harness embedded; keystone init writes the harness and an agent menu file into your repo, then gets out of the way.
After install, the binary is no longer required. The harness is plain markdown you own and version alongside your code. It has four components: guides (rules — always loaded), corpus (informational reference — loaded on demand), sensors (automated checks that fire at phase boundaries), and two flywheels — Learning adds rules and reasoning, Pruning removes stale ones (guides regularly, corpus rarely).
Your agent drives a six-phase workflow (spec → planning → implementation → verification → review → release) under the always-loaded guide rules, reaching for corpus reasoning only when the rules aren't enough.
The harness stack has five levels. Keystone generates a Level 2 (Project) harness — codebase-scoped scaffolding the agent reads on every task. It blurs into Level 3 (Organization) because the upper corpus layers — principles, process, adapters — are not codebase-specific and naturally travel across an organization's projects, even though the install target is a single repository.
~/.claude/CLAUDE.md.
CLAUDE.md files, static analysis config — the terrain a single codebase presents to its agents.
← keystone
Why the blur? Keystone's corpus/principles/, guides/principles/, guides/process/, and sensors/ are universal; adapters/ are per-agent, not per-project. Cherry-pick those into every repo in your org and you have de facto Level 3 coverage — without a separate Level 3 mechanism. Only corpus/idioms/, corpus/domain/, guides/idioms/, guides/domain/, and corpus/state/ are genuinely project-scoped.
Four components. Three are content (corpus, guides, sensors); one is the dynamics that keep the content current (the flywheels). Each has a distinct activation — when and how the agent picks it up.
guides/principles/, guides/idioms/<stack>/, guides/domain/, guides/process/. The drift sensor enforces them.corpus/principles/, corpus/idioms/<stack>/, corpus/domain/, corpus/state/. Each pairs with a guide via forward-link.sensors/: lint, type-check, test, build, drift, coverage, risk-fingerprint, traffic-topology, state-region, commit-message, tracker-card-fetcher, spec-adherence. Each declares trigger, inputs, exit condition, output, state writes.learning/ stages candidates (rule → guides, info → corpus); synthesize promotes. archive/ receives pruned content — guides regularly, corpus rarely (only on design/strategy shifts).
The corpus/guides split is the central move in 0.3.0. Rules are short and high-value-per-token, so they live always-loaded in guides/. Long-form reasoning sits in corpus/ and is reached only when an agent needs the why behind a rule — see Context Budgeting below.
There is no enforcement mechanism — the loading order is a discipline encoded in the menu file, the lifecycle actions, and the adapter activation docs. The agent follows it because the rules tell it to, and because each step earns its way into context.
The agent reads its menu (CLAUDE.md, AGENTS.md, .continuerules, etc.) — a short pointer at harness/ with the four components and the iron laws inline.
Per-agent activation projects harness/guides/ into the agent's rules surface — Cursor's .cursor/rules/*.mdc, Claude Code's directive block, Cline's custom-instructions field. The agent is now under every IRON LAW and GOLDEN RULE.
The orient action reads corpus/state/CODEBASE_STATE.md and active migrations affecting the touched region. State is the only corpus file pulled into context at task start.
guides/idioms/<stack>/ for the touched paths comes into context. Other stacks don't.
No ambient cost. check-drift at implementation exit; verify at the verification gate runs lint, type-check, test, build, drift, commit-message. Output flows back as a transient turn artifact.
Each guide ends with a forward-link to its paired corpus file. The agent reads that corpus only when the rule isn't enough — to recall the reasoning, the citation, or an anti-pattern the team called out.
Steps 1, 2, and 5 are mechanical (adapter and process layer enforce them). Steps 3, 4, 6 are model behavior under the rules — the rule "reach for the paired corpus only when needed" is what keeps context low. See Context Budgeting next.
A different lens on the same dynamic. Loading order is the how; context budgeting is the why. The harness assumes context is the scarce resource and spends it accordingly.
Each guide is short — an IRON LAW plus a few GOLDEN RULES. Tens of files, not hundreds of lines each. Loading them all ambient stays well within a working budget while keeping the agent under every constraint at once.
Each corpus file carries the reasoning, citations, and anti-patterns that justify a guide. Most tasks never need them — the rule is enough. When the rule isn't enough, the agent follows the forward-link and pays the cost only then.
A sensor's contract document is informational; the sensor itself is shell output. It lives in context for one turn at most, then leaves with the verify cycle. Sensors carry signal density without persistent context cost.
learning/inbox/ and archive/ are operational. The agent writes there during learn and audit; it doesn't read them during implementation. Zero baseline cost.
The asymmetry is deliberate. Always-loading the corpus would crowd context with reasoning the situation often doesn't need. Always-loading the guides keeps the agent honest at all times without paying for that reasoning. The agent reaches for corpus when — and only when — the rule alone is not sufficient.
Every guide is built from two kinds of rule. The drift sensor enforces both — IRON LAW violations fail; GOLDEN RULE violations surface as warnings.
A constraint the agent must never violate. No reasoning escape hatch. Iron laws are rare, deliberate, and survive across stacks and projects.
Strongly preferred behavior — deviation requires explicit reasoning, recorded in the work. Most rules in the harness are golden; iron is reserved for the truly inviolable.
The five process iron laws apply across every phase:
--no-verify.The harness is most useful when the surrounding tooling looks like this. None of these are hard requirements — install runs regardless. Missing one degrades the corresponding phase to "ask the user / surface the gap"; it does not break the harness.
Anywhere on the spectrum from a full tracker (Jira, Linear, GitHub Issues, Asana) to a TODO.md to a conversation. The spec phase opens on a unit of work — a tracker card ID lets the agent fetch the description automatically; without one, the spec is authored inline.
A change-request flow like GitHub PRs or GitLab MRs. The review phase spawns review agents on the diff and re-runs sensors against comment-driven changes; the release phase opens the PR with the tracker link in the body.
Lint, type-check, test runner, build, optionally coverage — automated checks that do a good job of constraining behavior. Not strictly required, but as sensor quality improves, so does the harness as a whole. Commands live in harness/corpus/state/CODEBASE_STATE.md; the 12 sensor contracts live in harness/sensors/.
GitHub Actions, GitLab CI, CircleCI, Jenkins — anything that runs sensors on every push. The release phase assumes CI gates the merge. Continuous Delivery (every commit produces a releasable artifact; the deploy itself stays a business decision) is highly recommended and makes the harness work better — small batches, fast feedback, low-risk releases. Continuous Deployment is an extension of CD, not a prerequisite. See corpus/principles/continuous-delivery.md — opt-in via --architecture continuous-delivery.
$ brew install tacoda/tap/keystone
Downloads the binary into ~/.local/bin/keystone, then prompts for the agent and runs keystone init:
$ curl -fsSL https://raw.githubusercontent.com/tacoda/keystone/main/install.sh | sh
Inspect before running (recommended):
$ curl -fsSL https://raw.githubusercontent.com/tacoda/keystone/main/install.sh > install.sh $ less install.sh $ sh install.sh
Set KEYSTONE_NO_INIT=1 to install the binary only.
PS> iwr -useb https://raw.githubusercontent.com/tacoda/keystone/main/install.ps1 | iex
Download a prebuilt archive from the releases page, extract keystone (or keystone.exe), and place it on your PATH.
# Scaffold harness/ and the agent's menu file into <dir>. # Prompts interactively (huh) for any unset options when stdin is a TTY; # fails fast with a useful error when stdin is not a TTY and --agent is unset. $ keystone init [<dir>] [flags] # Print the catalog of allowed labels for every option flag $ keystone options # Print the binary version $ keystone version # Print usage $ keystone help
init is interactive in a terminal, scriptable in CI. In a TTY, every unset option becomes a prompt; outside a TTY, options must be supplied as flags. Multi-select flags accept comma-separated values (e.g. --architecture hexagonal,continuous-delivery).
One of: claude-code, codex, cursor, aider, github-copilot, continue, cline, goose, pi, generic. If omitted, keystone detects from marker files; falls back to interactive prompt in a TTY.
Re-run with --force to overwrite an existing harness/. Menu files (CLAUDE.md, CONVENTIONS.md, etc.) merge additively.
These are the questions only a human can answer — declared intent, not facts a script could infer from the codebase. In a TTY the installer prompts for each; in CI you pass them as flags. Selections land in harness/corpus/state/INSTALL_PROFILE.md; the bootstrap action reads them as the starting hint when it audits the codebase.
Each --architecture and --compliance label installs a paired corpus + guide — the explanatory file under corpus/principles/ and the rule-bearing file under guides/principles/. Some labels also seed concern-specific files (e.g., --architecture mvc drops corpus/idioms/mvc/{models,controllers,views}.md and paired guides/idioms/mvc/ files).
Which AI coding agent will read this harness. Determines the menu file and adapter bundle installed.
web-application · web-api · cli-tool · library · mobile-app · desktop-app · data-pipeline · embedded
Opinionated patterns the project follows: hexagonal, clean-architecture, onion-architecture, layered, mvc, mvvm, event-driven, microservices, monolith, serverless, spa, continuous-delivery. Each label installs paired corpus + guide files; mvc also seeds concern-specific idioms (models, controllers, views).
Disciplines the team practices: tdd, bdd, atdd, property-based, snapshot, manual-qa, none-yet.
Regimes the system must satisfy: gdpr, hipaa, pci-dss, soc2, fedramp. Each label installs paired corpus + guide files under corpus/principles/ and guides/principles/.
Language stack, database, CI platform, deployment target, project maturity, team size — bootstrap detects these from the codebase on first run, where it has accurate context. No --language flag exists in v0.2.0+.
Run keystone options for the full catalog of allowed labels per flag.
Immediately after keystone init, run the bootstrap heavyweight action in your agent. Bootstrap is an initial audit of a fresh install — it makes the harness specific to your codebase.
Bootstrap reads corpus/state/INSTALL_PROFILE.md for your declared intent, then detects everything else from the codebase: language stack, database, CI platform, deployment target. It seeds three things — paired corpus/idioms/<stack>/ + guides/idioms/<stack>/ from real patterns in your code, initializes corpus/state/CODEBASE_STATE.md with sensor commands and activation map, and confirms the sensor command set. Until bootstrap runs, the harness is generic.
Three steps after install:
harness/README.md — orientation to the four components and the iron laws.
corpus/state/CODEBASE_STATE.md, paired corpus/idioms/<your-stack>/ + guides/idioms/<your-stack>/, and confirms sensor commands.
harness/ and any agent-specific files the installer created.
Capture intent and acceptance criteria from a tracker card or inline. Gate: spec approved before planning begins.
Read corpus/state/CODEBASE_STATE.md, lazy-load matching guide rules for the touched region, sketch a plan. Gate: plan approved.
Did the edit violate a loaded guide? The drift sensor compares the diff against IRON LAWs (fail) and GOLDEN RULES (warn). Cheaper to fix while the change is fresh.
Run sensors from corpus/state/CODEBASE_STATE.md: lint, type-check, tests, build, drift, commit-message. No completion claims without fresh evidence — sensors must have run this turn.
Walk spec acceptance criteria + functional review + security review on the diff. Gate: no blockers, AC met.
Stage candidates into learning/inbox/ classified as rule (→ guides) or information (→ corpus). Nothing is promoted yet.
How each action is actually invoked in your agent (slash command, rules-file trigger, natural language) lives in harness/adapters/<your-agent>/lifecycle.md.
Initial audit of a fresh install. Detects stack, seeds paired corpus/idioms/<stack>/ + guides/idioms/<stack>/, initializes corpus/state/, confirms sensor commands.
Asymmetric dual audit. Guides are audited regularly against the codebase (rules churn as code does). Corpus is audited rarely, only when the team's design or strategy has shifted.
Promote items from learning/inbox/. Classifies each as rule (lands in guides) or information (lands in corpus); some land in both. Human gates by confidence.
Set the pacing mode (paired / solo / autopilot). Controls how often the agent stops to ask.
candidate_kind hint: rule, information, or both.
guides/<layer>/, info into corpus/<layer>/, sometimes both.
guides/ against the codebase, and a rare sweep of corpus/.
Ask at every phase boundary; the user drives. Best for unfamiliar code or high-risk changes.
Work independently through phases; stop only on hard problems. Best for well-scoped work in familiar code.
Maximally autonomous; produces an assumption log at the end for review. Best for batch refactors and routine cleanup.
Set with the mode action: mode <paired|solo|autopilot>. Agents without a notion of autonomy levels collapse to a single mode; the phases still run.
Every adapter ships lifecycle.md, activation.md, and sensors.md. Adapters with capability gaps (no slash commands, no sub-agent parallelism, etc.) get an install-time warning — see below.
full — adapter covers every harness feature natively. real · gaps — adapter covers the lifecycle but the agent lacks one or more native features (slash commands, sub-agent parallelism, autonomous tracker integration, etc.). keystone init prints the specific gaps after install with actionable remedies.
When an agent does not natively cover a harness feature, keystone init prints a ⚠ <agent> adapter — capability gaps to address section before the success message. Each gap names two paths the user can take:
Where the gap is closeable: enable an extension, set a config flag, paste a workflow definition. For example, Aider needs auto-commits: false; Cline needs an auto-approve whitelist for the verify cycle; Goose needs the developer extension enabled.
Where the agent can't be configured for the feature — or you'd rather work around it explicitly: add a file like harness/adapters/<agent>/review-strategy.md describing how your team handles the gap. The agent reads that file as part of the adapter's activation contract.
# Example: keystone init --agent aider
⚠ aider adapter — capability gaps to address
• sub-agent parallelism (review)
Impact: review runs each concern sequentially in one chat.
Or document: add harness/adapters/aider/review-strategy.md.
• commit-message sensor gate
Impact: Aider's default auto-commits fire before the sensor can run.
Configure: set `auto-commits: false` in `.aider.conf.yml`.
• native tracker integration
Impact: tracker-card-fetcher falls back to shell or paste.
Configure: use `/run gh issue view <id>` or paste card content.
Or document: add harness/adapters/aider/tracker-workflow.md.
✓ keystone installed for aider.
Fully-supported adapters (claude-code, codex, pi, cursor) print no warning. The warning content is per-agent in the binary's warnings.go; if a gap closes upstream, the warning drops in the next keystone release.