Overview · keystone-mcp v0.2.0

The end-to-end harness manager for any project.

A single MCP server (keystone-mcp on PyPI) that owns the full lifecycle of a project harness: it scaffolds the template tree under .keystone/harness/, brokers rules and reasoning from any supported source (markdown, GitHub, Confluence, Notion, Jira, Linear, Slack), runs sensors at phase boundaries, resolves a cascade across external sources, and applies forward-only shipped-template patches as the manager evolves. No central service. Markdown only. Run uvx keystone-mcp and your agent has a harness.

~/your-app

$ uvx keystone-mcp
● keystone-mcp v0.2.0 ready (FastMCP · stdio transport)
$ claude
> use keystone_harness_bootstrap to scaffold the harness
▸ scaffolding → .keystone/harness/{guides,corpus,sensors,actions,playbooks,skills,adapters}/
▸ wrote .keystone/context.yaml + CLAUDE.md overlay
▸ registered 4 prompts · 14 tools · 9 resources
✓ harness ready. agent sees keystone://context/list.
> list topics

┌─ keystone_list_topics ─────────────────────────┐
│ › deploy-policy      rules · reasoning         │
│   release-checklist  skills                    │
│   repo-ownership     rules (codeowners)        │
└────────────────────────────────────────────────┘

▸ 3 topics · 2 sources
✶ Thinking… (3s · esc to interrupt)

Overview · The broker model

Agent asks for a topic. Broker fans out. One envelope returns.

Instead of cramming organizational context into every system prompt, the agent calls keystone_get_context(topic) or reads keystone://context/{topic} and the broker resolves the request across every configured source, classifies each fragment into one of four kinds, and returns a single typed envelope. The agent treats each kind differently.

constraints to obey

`rules`

Hard constraints. Three severities: must / should / may. When two sources contribute the same rule (normalized text match), highest severity wins; ties keep both citations.

background & intent

`reasoning`

Why the rule exists, what incident drove it, what tradeoffs were considered. Reasoning is additive across sources — no deduplication.

how-to playbooks

`skills`

Multi-step procedural knowledge — how to deploy, how to roll back, how to run the release checklist. Each H3 inside the configured skills heading becomes one named skill.

canned invocations

`commands`

Shell commands, scripts, named recipes. Each H3 becomes one command; the first code block under it is the invocation, the rest is documentation.

Why a broker. Organizational context lives where the people who own it already work — markdown in the repo, pages in Confluence, tickets in Jira, pinned messages in Slack. Hand-copying that context into every system prompt is brittle and goes stale fast. The broker turns each external source into a uniform envelope the agent can consume on demand, with per-topic TTLs and an optional sqlite cache.

Overview · The wire shape

Every retrieval returns the same envelope.

Whether the topic resolves against one markdown file or fans out across markdown, GitHub, and Notion, the agent sees one consistent shape — four typed lists plus provenance.

{
  "topic": "deploy-policy",
  "rules": [
    {
      "id": "rules-001",
      "text": "run full CI green before any production deploy.",
      "source": "markdown://deploy-policy.md#rules",
      "severity": "must"
    }
  ],
  "reasoning": [
    {
      "text": "The team adopted these rules after a 2025 incident.",
      "source": "markdown://deploy-policy.md#background"
    }
  ],
  "skills": [],
  "commands": [],
  "fetched_at": "2026-06-11T14:32:00+00:00",
  "cache_hit": false
}

source is a URI whose scheme names the adapter that produced the fragment: markdown://, github://, confluence://, notion://, jira://, linear://, slack://, harness://. The agent (or you) can trace any rule back to the file, page, or message it came from. cache_hit reports whether the envelope was served from cache, so the agent can judge its freshness against fetched_at and the topic's TTL.
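
As an illustration of tracing provenance, the sketch below splits a source URI into adapter, location, and section using Python's standard urllib.parse. The helper name parse_source and the returned keys are hypothetical and not part of keystone-mcp; the sketch assumes every source follows the scheme://location#fragment shape shown in the envelope above.

```python
from urllib.parse import urlsplit

# Schemes listed in the overview; treated as a closed set for illustration only.
KNOWN_SCHEMES = {
    "markdown", "github", "confluence", "notion",
    "jira", "linear", "slack", "harness",
}


def parse_source(uri: str) -> dict:
    """Split a source URI into adapter scheme, location, and section fragment."""
    parts = urlsplit(uri)
    if parts.scheme not in KNOWN_SCHEMES:
        raise ValueError(f"unknown source scheme: {parts.scheme!r}")
    return {
        "adapter": parts.scheme,
        "location": parts.netloc + parts.path,
        "section": parts.fragment or None,
    }


# A rule shaped like the one in the example envelope.
envelope_rule = {
    "id": "rules-001",
    "source": "markdown://deploy-policy.md#rules",
    "severity": "must",
}
origin = parse_source(envelope_rule["source"])
print(origin)
```

Rejecting unknown schemes is a design choice for the sketch; a client that only logs provenance could just as well pass them through.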

Getting started · Install

Install the MCP server, then wire it into your agent.

Published to PyPI as keystone-mcp. Pick the install method that matches how you already run Python tools. The server has no runtime daemon — it speaks MCP over stdio, started on demand by your agent.

uvx (zero-install one-shot)

$ uvx keystone-mcp

Best for trying it out and for .mcp.json configs that should not pin a global install.

pip

$ pip install keystone-mcp
$ pip install "keystone-mcp[tokens]"   # + tiktoken-backed budget tokenizer

Without the tokens extra, keystone://harness/budget falls back to a deterministic word-count proxy (~0.75 words / token). With the extra, the budget reports exact cl100k_base token counts.
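
The word-count fallback can be pictured with a few lines of arithmetic. This is a minimal sketch that assumes the proxy divides a whitespace word count by 0.75 and rounds up; the function name estimate_tokens and the rounding rule are assumptions, not the server's documented behavior.

```python
import math

WORDS_PER_TOKEN = 0.75  # ratio quoted above; the rounding rule is an assumption


def estimate_tokens(text: str) -> int:
    """Approximate a token count from a whitespace word count."""
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


sample = "MUST run full CI green before any production deploy."
print(estimate_tokens(sample))  # 9 words / 0.75 = 12
```

Because the estimate depends only on the text, the same file always reports the same budget, which is what makes the proxy deterministic.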

pipx (isolated install + on PATH)

$ pipx install keystone-mcp

From source

$ git clone https://github.com/tacoda/keystone-mcp.git
$ cd keystone-mcp
$ uv sync
$ uv run keystone-mcp

Wire into a Claude Code (or any MCP host) project

Add to .mcp.json:

{
  "mcpServers": {
    "keystone": {
      "command": "uvx",
      "args": ["keystone-mcp"],
      "env": {
        "KEYSTONE_CONFIG": "/path/to/your/project/.keystone/context.yaml"
      }
    }
  }
}

KEYSTONE_CONFIG defaults to .keystone/context.yaml relative to the working directory. The .keystone/ directory is team-shared and version-controlled — never put secrets there; reference env vars with the env: prefix instead.

Getting started · Your first topic

Three steps to a working topic.

A topic is the agent-facing abstraction: a named bundle of rules, reasoning, skills, and commands resolved from one or more sources. Here is the smallest end-to-end shape — one markdown source, one topic, one file.

Create .keystone/context.yaml

sources:
  docs:
    type: markdown
    root: .keystone/context/

topics:
  deploy-policy:
    description: |
      Rules and context for production deploys.
    sources:
      - source: docs
        query: { file: deploy-policy.md }
        classify:
          rules: { heading: "Rules", severity: must }
          reasoning: { heading: "Background" }
    cache: 15m

Create .keystone/context/deploy-policy.md

# Deploy Policy

## Rules

- MUST run full CI green before any production deploy.
- SHOULD prefer Tuesday/Wednesday morning deploys.

## Background

The team adopted these rules after a 2025 incident.

Start the server. The agent sees the topic.

keystone_list_topics now returns deploy-policy. The agent reads keystone://context/deploy-policy and gets the envelope: one must rule, one should rule, and one reasoning entry, all cited back to the markdown file.

The repo's own .keystone/context.yaml is a working example with topics for deploys, ownership, coding standards, and a release playbook — plus commented-out examples of every external adapter.

MCP surface · Tools

Tools — parameterized retrieval and scaffold operations.

Every tool is prefixed keystone_. Retrieval tools return envelopes; scaffold tools materialize files under .keystone/harness/ using the shipped templates.

| Tool | Purpose | Kind |
| --- | --- | --- |
| keystone_get_context(topic) | | retrieval |
| keystone_list_topics(tag?) | | retrieval |
| keystone_harness_bootstrap() | scaffold the harness skeleton at .keystone/harness/ | scaffold |
| keystone_new_guide(name, tier?) | new guide; tier ∈ iron-law / golden / rules | scaffold |
| keystone_new_sensor(name, kind?, mode?) | | scaffold |
| keystone_new_script(name, body?) | | scaffold |
| keystone_new_prompt(name, body?) | | scaffold |
| keystone_new_skill(name, description?) | skills/<name>/SKILL.md; manager-authored auto-prefixed keystone- | scaffold |
| keystone_new_action(name) | actions/<name>.md | scaffold |
| keystone_new_playbook(name) | playbooks/<name>.md | scaffold |
| keystone_new_corpus(name) | corpus/<name>.md | scaffold |
| keystone_new_adapter(agent) | | scaffold |
| keystone_target_add(agent, project_root?) | | overlay |
| keystone_apply_patches() | | patch |

Scaffold tools refuse to write files whose names look like secrets (secret, token, credential, password, api_key, private, envfile, …). The harness root is fixed at .keystone/harness/ — not configurable.

MCP surface · Prompts

Prompts — lifecycle workflows the agent walks through.

Prompts seed multi-step conversations. The agent invokes a prompt, walks its phases, and calls scaffold tools along the way. Same broker, same envelope shape, but driven by a named workflow instead of a single retrieval.

one-time

`bootstrap()`

Codebase analysis + state ledger fill under corpus/state/. Run once after keystone_harness_bootstrap to seed the harness with what's actually in the repo today.

end-to-end task

`task(description)`

Six-phase workflow: spec → orient → implement → check-drift → verify → review. Each phase has its own gate; the prompt walks them in order.

dual flywheel

`audit()`

Learning (capture surprises) plus Pruning (retire stale rules). The two flywheels balance each other so the harness grows in surface area without growing in noise.

capture

`learn(finding)`

One-shot capture into learning/inbox/. Findings batch up between audits, then get promoted to rules during audit.

MCP surface · Resources

Resources — read-only retrieval, rooted at `keystone://`.

Resources expose the same broker as keystone_get_context through MCP's resource-URI shape. Agents that prefer URIs over tool calls (or hosts that surface resources in a sidebar) get the full envelope without invoking a tool.

| URI | Purpose | Kind |
| --- | --- | --- |
| keystone://context/list | | topics |
| keystone://context/{topic} | | topics |
| keystone://source/{name}/health | | sources |
| keystone://harness/status | harness layout audit (root = .keystone/harness/) | harness |
| keystone://harness/options | | harness |
| keystone://harness/verify | | cascade |
| keystone://harness/doctor | | cascade |
| keystone://harness/patch/pending | | patches |
| keystone://harness/budget | | budget |

Configuration · Topics

Topics bind one or more sources to one envelope.

Each topic declares the sources to consult and how their output maps into the four kinds. Multi-source topics let one envelope draw rules from CODEOWNERS, branch protection, and the team's own deploy policy markdown — without the agent having to know any of that exists.

Multi-source topic

topics:
  repo-policy:
    description: Combined ownership and branch-protection rules.
    sources:
      - source: docs
        query: { file: owners.md }
        classify:
          rules: { heading: "Required reviewers" }
      - source: gh
        query: { type: codeowners }
      - source: gh
        query: { type: branch_protection, branch: main }
    cache: 5m

Shorthand for a single source

topics:
  rollback:
    description: Rollback procedure.
    source: docs
    query: { file: rollback.md }
    classify:
      rules: { heading: "Rules" }

Multi-source merge. When two sources contribute rules whose normalized text matches, the highest severity wins (must > should > may). Ties at the top severity keep both rules so each source stays cited. Reasoning, skills, and commands stay additive — no deduplication.
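
The merge rule above can be sketched in a few lines. This is an illustration under stated assumptions, not the broker's implementation: the normalize function (lowercase, collapsed whitespace, trailing period dropped) and the example source URIs are hypothetical, since the exact normalization is not spelled out here.

```python
from collections import defaultdict

SEVERITY_RANK = {"may": 0, "should": 1, "must": 2}


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, collapse whitespace, drop a trailing period."""
    return " ".join(text.lower().split()).rstrip(".")


def merge_rules(rules: list[dict]) -> list[dict]:
    """For each normalized text, keep only the rules at the highest severity."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for rule in rules:
        groups[normalize(rule["text"])].append(rule)

    merged = []
    for group in groups.values():
        top = max(SEVERITY_RANK[r["severity"]] for r in group)
        # Ties at the top severity are all kept so each source stays cited.
        merged.extend(r for r in group if SEVERITY_RANK[r["severity"]] == top)
    return merged


# Hypothetical rules contributed by three sources for the same policy.
rules = [
    {"text": "Run full CI green before deploy.", "severity": "should",
     "source": "markdown://deploy-policy.md#rules"},
    {"text": "run full CI green before deploy", "severity": "must",
     "source": "github://acme/widgets"},
    {"text": "Run  full CI green before deploy.", "severity": "must",
     "source": "slack://deploys"},
]
merged = merge_rules(rules)
for rule in merged:
    print(rule["severity"], rule["source"])
```

Here the should-level copy is dropped and both must-level copies survive, so the GitHub and Slack citations are both visible to the agent.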

Per-topic TTL. Cache uses 5s / 10m / 2h / 1d syntax. Default backend is in-memory (lost on restart); switch to sqlite for persistence across server restarts.

Configuration · Source adapters

Adapters — one per source type.

Each type: in .keystone/context.yaml picks an adapter. Markdown and folder adapters are repo-local and need no credentials; remote adapters reference environment variables via the env: prefix.

| Type | Auth | What it emits |
| --- | --- | --- |
| markdown | | one file → all four kinds |
| folder | | walks tree; include/exclude globs |
| repo | | resolves owner/repo@version or git URL; immutable cache for tag/sha refs |
| github | | CODEOWNERS, branch protection (rules); PRs, releases (reasoning) |
| confluence | | page content (all four kinds) |
| notion | | page content (all four kinds), DB rows (reasoning) |
| jira | | issues, JQL search (reasoning) |
| linear | | issues, GraphQL filter (reasoning) |
| slack | | pinned messages (rules), recent discussion (reasoning) |
| harness | | the project's own .keystone/harness/ tree |

Configuration · Classify selectors

Classify — how an adapter's output maps into the four kinds.

The markdown, confluence, and notion adapters share a heading-based vocabulary. Sections split by H2; skills and commands sub-split by H3. For github / jira / linear / slack the query.type determines the kind directly.

classify:
  rules:
    heading: "Rules"             # single or list: ["Rules", "Must"]
    severity: must               # default when bullets lack MUST/SHOULD/MAY
  reasoning:
    heading: "Background"
    # or
    all: true                    # everything not matched by another kind
  skills:
    heading: "Procedures"       # each H3 → one skill (name + body)
  commands:
    heading: "Commands"         # each H3 → one command (first code block = invocation)
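
To make the heading-based vocabulary concrete, here is a rough sketch of splitting a markdown file by H2 and turning the bullets under a rules heading into severity-tagged rules. The function names are hypothetical and the regular expressions are an assumption about how such a classifier might work; the third bullet is added only to show the default severity being applied.

```python
import re

DOC = """\
# Deploy Policy

## Rules

- MUST run full CI green before any production deploy.
- SHOULD prefer Tuesday/Wednesday morning deploys.
- Keep deploy notes in the release channel.

## Background

The team adopted these rules after a 2025 incident.
"""


def split_h2(markdown: str) -> dict[str, str]:
    """Map each H2 heading to the text beneath it (H1 and H3 lines are not split points)."""
    sections: dict[str, str] = {}
    current = None
    for line in markdown.splitlines():
        match = re.match(r"^##\s+(.*\S)\s*$", line)
        if match:
            current = match.group(1)
            sections[current] = ""
        elif current is not None:
            sections[current] += line + "\n"
    return sections


def extract_rules(body: str, default_severity: str) -> list[dict]:
    """Turn bullets into rules; a leading MUST/SHOULD/MAY overrides the default."""
    rules = []
    for line in body.splitlines():
        bullet = re.match(r"^\s*[-*]\s+(.*)$", line)
        if not bullet:
            continue
        text = bullet.group(1).strip()
        keyword = re.match(r"^(MUST|SHOULD|MAY)\s+(.*)$", text)
        if keyword:
            rules.append({"severity": keyword.group(1).lower(), "text": keyword.group(2)})
        else:
            rules.append({"severity": default_severity, "text": text})
    return rules


sections = split_h2(DOC)
rules = extract_rules(sections["Rules"], default_severity="must")
for rule in rules:
    print(rule["severity"], "|", rule["text"])
```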

Configuration · Secrets and cache

Secrets stay in the environment. Cache survives restarts on request.

Never put secrets in .keystone/context.yaml — that file is version-controlled and team-shared. Reference environment variables with the env: prefix; the loader fails fast at startup if a referenced var is unset.

Env-var references

sources:
  gh:
    type: github
    repo: acme/widgets
    auth: env:GITHUB_TOKEN
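
The fail-fast behavior for env: references might look roughly like the following. This is a sketch, not the loader's actual code: resolve_env_refs and MissingEnvVar are hypothetical names, and the config is passed as an already-parsed dictionary so the example needs no YAML library.

```python
import os


class MissingEnvVar(RuntimeError):
    """Raised when a config value points at an unset environment variable."""


def resolve_env_refs(value, environ=os.environ):
    """Recursively replace 'env:NAME' strings with the value of NAME."""
    if isinstance(value, dict):
        return {key: resolve_env_refs(item, environ) for key, item in value.items()}
    if isinstance(value, list):
        return [resolve_env_refs(item, environ) for item in value]
    if isinstance(value, str) and value.startswith("env:"):
        name = value[len("env:"):]
        if name not in environ:
            # Fail at load time rather than on the first remote call.
            raise MissingEnvVar(f"environment variable {name} is not set")
        return environ[name]
    return value


config = {
    "sources": {
        "gh": {"type": "github", "repo": "acme/widgets", "auth": "env:GITHUB_TOKEN"},
    }
}
# A stand-in environment keeps the example independent of the real shell.
resolved = resolve_env_refs(config, environ={"GITHUB_TOKEN": "example-token"})
print(resolved["sources"]["gh"]["auth"])
```

Resolving the whole tree up front means a missing variable surfaces when the server starts, not midway through a retrieval.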

Persistent cache

cache:
  backend: sqlite
  path: .keystone/cache.db

Default backend is in-memory (lost on restart). The sqlite backend persists envelopes across server restarts and survives uvx re-invocations. Per-topic TTLs use 5s / 10m / 2h / 1d syntax.
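
The TTL syntax is small enough to parse with one regular expression. The sketch below assumes a TTL is a whole number followed by a single unit letter (s, m, h, d), as in the examples above; parse_ttl is a hypothetical helper, and any richer syntax the server may accept is not covered.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_ttl(value: str) -> timedelta:
    """Convert strings such as '5s', '10m', '2h', or '1d' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smhd])", value.strip())
    if not match:
        raise ValueError(f"invalid TTL: {value!r}")
    amount, unit = match.groups()
    return timedelta(**{_UNITS[unit]: int(amount)})


for ttl in ("5s", "10m", "2h", "1d"):
    print(ttl, "->", parse_ttl(ttl).total_seconds(), "seconds")
```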

Harness · Layout

One tree. Seven primitives. Markdown all the way down.

Everything inside .keystone/harness/ is one of seven primitives. Each one has a directory, a shipped template, a keystone_new_* scaffolder, and a precise role in the lifecycle. Edit the markdown by hand or scaffold via MCP — both paths converge on byte-identical files.

| Directory | Role | Scaffold tool |
| --- | --- | --- |
| guides/ | rules the agent obeys, tiered iron-law / golden / rules | keystone_new_guide |
| corpus/ | reasoning, background, ADRs; corpus/state/ holds the codebase ledgers | keystone_new_corpus |
| sensors/ | | keystone_new_sensor |
| scripts/ · prompts/ | | keystone_new_script · keystone_new_prompt |
| actions/ | focused operations the agent walks (spec, orient, implement, …) | keystone_new_action |
| playbooks/ | | keystone_new_playbook |
| skills/ | FastMCP-native how-to entries; manager-authored auto-prefixed keystone- | keystone_new_skill |
| adapters/ | per-agent activation bindings (claude-code, codex, cursor, …) | keystone_new_adapter |
| learning/inbox/ · archive/ | | filled by playbooks |

Single source of truth. The MCP resources at keystone:// are projections of these files. The broker, sensors-runner, and cascade verifier all read from the same tree git tracks.

Iron-law constraint. Never put secrets in this tree — it's version-controlled and team-shared. Reference environment variables via env:VAR in .keystone/context.yaml.

Harness · Guides

Guides — the rules the agent obeys.

guides/ holds the project's constraints, tiered by severity. Each guide is one markdown file; the broker emits its bullets as rules in the envelope with a severity tag derived from the tier.

iron-law

Iron law

Non-negotiable invariants. must severity. Violations block the phase. Reserve for safety, security, and contract-level rules.

golden

Golden path

The default way. should severity. The agent follows these unless a guide-tier exception applies. Most architectural decisions live here.

rules

Rules

Local conventions. may severity. Style, naming, formatting — agents apply by default, humans override case-by-case.

Scaffold with keystone_new_guide(name, tier?). The tier becomes both the file's frontmatter and the severity attached to every emitted rule.
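
The tier-to-severity mapping described above amounts to a lookup table. In this sketch the function name rules_from_guide, the example bullet, and the harness:// path are hypothetical; only the three tier names and their severities come from the text.

```python
TIER_SEVERITY = {"iron-law": "must", "golden": "should", "rules": "may"}


def rules_from_guide(tier: str, bullets: list[str], source: str) -> list[dict]:
    """Attach the tier's severity to every bullet in a guide."""
    if tier not in TIER_SEVERITY:
        raise ValueError(f"unknown tier: {tier!r}")
    severity = TIER_SEVERITY[tier]
    return [{"text": text, "severity": severity, "source": source} for text in bullets]


emitted = rules_from_guide(
    "golden",
    ["Prefer small, reviewable pull requests."],
    source="harness://guides/pull-requests.md",  # hypothetical path
)
print(emitted[0]["severity"])
```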

Harness · Corpus

Corpus — reasoning the agent references.

corpus/ is background: architecture decisions, domain glossaries, "why we do it this way" notes. The broker emits corpus content as reasoning — context the agent reads but does not enforce.

| Ledger | Path | What it tracks |
| --- | --- | --- |
| Codebase state | corpus/state/CODEBASE_STATE.md | top-level map of subsystems, owners, conventions |
| Risk fingerprints | corpus/state/risk-fingerprints.md | files and patterns that have broken before |
| Quality radar | corpus/state/quality-radar.md | recurring smells, coverage gaps, hotspots |
| Traffic topology | corpus/state/traffic-topology.md | request flow, queue topology, deploy targets |
| Code debt | corpus/state/code-debt.md | acknowledged debt with rationale and exit plan |

The bootstrap playbook fills these ledgers from a one-time codebase analysis; the audit playbook keeps them current. Scaffold ad-hoc corpus notes with keystone_new_corpus(name).

Harness · Sensors

Sensors — blocking checks at phase boundaries.

A sensor is one markdown file declaring a kind (computational or inferential) and a mode (blocking or advisory). The keystone-sensor-runner skill walks them at phase boundaries; failures stop the lifecycle.

computational

Script-backed

Deterministic — runs a shell body from scripts/<name>.sh. Default sensors: lint, type, test, build, coverage, drift. Exit code is the verdict.

inferential

Prompt-backed

Agent-judged — runs a prompt body from prompts/<name>.md against the diff. Default sensors: code-review, security-review, performance-review, accessibility-review.

Scaffold with keystone_new_sensor(name, kind?, mode?) — the tool writes both the sensor and a matching script or prompt stub. Health surfaces at keystone://sensor/<name>/health; the doctor playbook rolls them up.
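
For computational sensors, "exit code is the verdict" can be sketched with the standard subprocess module. This is not the keystone-sensor-runner skill itself: run_sensor is a hypothetical helper, and treating advisory failures as report-only is an assumption about how the two modes differ.

```python
import subprocess
import sys


def run_sensor(command: list[str], mode: str = "blocking") -> dict:
    """Run a command; exit code 0 passes, anything else fails."""
    result = subprocess.run(command, capture_output=True, text=True)
    passed = result.returncode == 0
    return {
        "passed": passed,
        # Assumption: only a failing blocking sensor halts the lifecycle.
        "halt": mode == "blocking" and not passed,
        "output": result.stdout + result.stderr,
    }


# Stand-ins for scripts/<name>.sh: tiny Python commands with fixed exit codes.
passing = run_sensor([sys.executable, "-c", "raise SystemExit(0)"])
advisory_fail = run_sensor([sys.executable, "-c", "raise SystemExit(3)"], mode="advisory")
blocking_fail = run_sensor([sys.executable, "-c", "raise SystemExit(3)"])
print(passing["passed"], advisory_fail["halt"], blocking_fail["halt"])
```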

Harness · Actions

Actions — focused operations the agent walks.

An action is one markdown procedure for one job. Playbooks compose actions; the agent reads the action body verbatim, asks for missing inputs, runs the steps, and reports back.

| Action | Phase | Purpose |
| --- | --- | --- |
| spec | | clarify requirements; write acceptance criteria |
| orient | | map affected files; check state ledgers |
| implement | | make the change; keep diffs reviewable |
| verify | | run sensors; confirm acceptance criteria |
| review | | self-review against guides + state |
| learn | | capture a surprise into learning/inbox/ |
| audit | | walk inbox, classify into guides / corpus / archive |
| release | | cut a release; update changelog |

Scaffold with keystone_new_action(name). Actions are small and composable — favor adding a new action over growing an existing one.

Harness · Playbooks

Playbooks — ordered phase flows.

A playbook composes actions and sensors into a multi-phase flow. Where an action does one job, a playbook drives a lifecycle. Shipped playbooks cover bootstrap, task, audit, install, verify, doctor, patch, and release.

| Playbook | When | What it does |
| --- | --- | --- |
| bootstrap | | codebase analysis; fill corpus/state/ ledgers |
| task | | spec → orient → implement → verify → review |
| audit | | dual flywheel: learn (additive) + prune (subtractive) |
| install | | add a new external source via the source-installer skill |
| verify | | run all blocking sensors against current diff |
| doctor | | harness layout audit + sensor health rollup |
| patch | | apply pending shipped template patches |
| release | | drive the release action + changelog handoff |

Scaffold custom playbooks with keystone_new_playbook(name). Shipped playbooks are part of the surface API contract: if you want to keep receiving forward updates, change them through the template patch system rather than by free-form editing.

Harness · Skills

Skills — FastMCP-native how-to entries.

Each skill is a directory with a SKILL.md at its root plus any supporting files. FastMCP's SkillsDirectoryProvider auto-discovers them as skill://<name>/SKILL.md resources; Claude Code, Cursor, and any FastMCP-aware host surface them in the agent's skill registry.

| Skill | Purpose | Triggered by |
| --- | --- | --- |
| keystone-sensor-runner | | verify playbook |
| keystone-source-installer | add a new external source to .keystone/context.yaml | install playbook |
| keystone-archive | retire content into archive/ without losing history | audit playbook (pruning flywheel) |
| keystone-budget-reporter | | doctor playbook |
| keystone-reload-notice | | patch + scaffold tools |

Scaffold with keystone_new_skill(name, description?). Manager-authored skills are auto-prefixed keystone-; project-local skills you author keep whatever name you give them.

Harness · Scaffolding

Scaffold once. Edit the markdown. Patch forward.

The keystone_new_* tools materialize files under .keystone/harness/ from shipped templates. Once written, the files are yours to edit. When a new template version ships, keystone_apply_patches applies the forward-only diff against unmodified files and skips anything you've touched.

templates

Shipped templates

Every primitive (guide, sensor, action, playbook, skill, corpus, adapter) has a shipped template. Scaffolding writes the template verbatim; the broker doesn't care which fields you change.

overlay

Menu-file overlay

keystone_target_add(agent) installs the agent's menu file (CLAUDE.md, AGENTS.md, …) at the project root as an overlay — pre-existing user content is preserved, only the Keystone-managed block is rewritten.
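
One common way to implement an overlay like this is a pair of marker comments around the managed block. The sketch below assumes that approach; the marker strings and the apply_overlay helper are hypothetical, and keystone-mcp may delimit its block differently.

```python
BEGIN = "<!-- keystone:begin -->"  # hypothetical marker
END = "<!-- keystone:end -->"      # hypothetical marker


def apply_overlay(existing: str, managed: str) -> str:
    """Insert or replace the managed block, leaving all other content untouched."""
    block = f"{BEGIN}\n{managed.strip()}\n{END}"
    start = existing.find(BEGIN)
    end = existing.find(END)
    if start != -1 and end > start:
        # A managed block already exists: rewrite only that span.
        return existing[:start] + block + existing[end + len(END):]
    if not existing.strip():
        return block + "\n"
    # First install: append after the user's own content.
    return existing.rstrip("\n") + "\n\n" + block + "\n"


first = apply_overlay("# My notes\n", "Harness menu v1")
second = apply_overlay(first, "Harness menu v2")
print(second)
```

Running the function twice leaves a single managed block and keeps the user's heading intact, which is the property the overlay is meant to guarantee.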

forward-only patches

Apply patches

keystone_apply_patches() walks pending shipped patches and applies each one to files you haven't modified. Conflicts surface in keystone://harness/patch/pending for manual resolution.
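
The "apply only to files you haven't modified" rule can be approximated by comparing content hashes. This sketch assumes the manager records a digest of each file as originally scaffolded; plan_patches and the digest bookkeeping are hypothetical, and sending missing files to the pending list is likewise an assumption.

```python
import hashlib


def digest(text: str) -> str:
    """SHA-256 of the file content, used to detect local edits."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def plan_patches(
    files: dict[str, str],      # path -> current content on disk
    shipped: dict[str, str],    # path -> digest of the template as first written
    updates: dict[str, str],    # path -> new template content
) -> tuple[dict[str, str], list[str]]:
    """Return (content to write, paths left pending for manual resolution)."""
    applied: dict[str, str] = {}
    pending: list[str] = []
    for path, new_content in updates.items():
        current = files.get(path)
        if current is not None and digest(current) == shipped.get(path):
            applied[path] = new_content   # untouched since scaffolding
        else:
            pending.append(path)          # edited or missing: leave for a human
    return applied, pending


files = {"guides/a.md": "template v1", "guides/b.md": "template v1 plus my edits"}
shipped = {"guides/a.md": digest("template v1"), "guides/b.md": digest("template v1")}
updates = {"guides/a.md": "template v2", "guides/b.md": "template v2"}
applied, pending = plan_patches(files, shipped, updates)
print(applied, pending)
```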

Secret-name refusal. Scaffold tools refuse to write files whose names look like secrets: secret, token, credential, password, api_key, private, envfile, and the obvious variations. The .keystone/ directory is team-shared, so the scaffolder makes it harder to commit a secret by accident.
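
A name check of this kind could be as simple as a substring match over a normalized name. The sketch below is an assumption about the approach, not the scaffolder's actual rule set: looks_like_secret is a hypothetical helper, and the marker list is taken from the names quoted above.

```python
import re

SECRET_MARKERS = (
    "secret", "token", "credential", "password",
    "api_key", "private", "envfile",
)


def looks_like_secret(name: str) -> bool:
    """Return True when a file name contains any secret-like marker."""
    # Normalize case and separators so 'API-Key' and 'api_key' compare equal.
    normalized = re.sub(r"[^a-z0-9]+", "_", name.lower())
    return any(marker in normalized for marker in SECRET_MARKERS)


for candidate in ("deploy-policy", "api-key-rotation", "Private_Notes", "release-checklist"):
    verdict = "refuse" if looks_like_secret(candidate) else "ok"
    print(candidate, "->", verdict)
```

Substring matching errs on the side of refusal (a name such as tokenizer would also be rejected), which suits a guard whose job is to keep secrets out of a shared tree.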

Skills are FastMCP-native. Manager-authored skills under .keystone/harness/skills/<name>/SKILL.md are auto-prefixed keystone- and discovered by FastMCP's SkillsDirectoryProvider — Claude Code, Cursor, and any other FastMCP-aware host load them automatically.

Support · Open source

Contribute to keystone-mcp.

MIT-licensed and built in the open. The test suite uses respx to mock every external API — no live credentials required to run the full 361-test suite locally.

The broker model only works if the adapters fit the sources teams actually use.

Markdown, GitHub, Confluence, Notion, Jira, Linear, Slack, and harness-local are wired today. If your team's source of truth lives somewhere else — Coda, Linear docs, an internal wiki, a Google Doc — the adapter shape is small and the test fixtures are the bulk of the work.

Use it. Tell us where the defaults match, or clash with, what you've already built. We learn the most from people who've tried something else first.

Hit a bug. File it, even with a rough repro. The cascade, classify, and merge paths get sharper every time someone shows us a corner we missed.

Write an adapter. Each adapter is a single module under src/keystone_mcp/adapters/ plus a respx-mocked test file. New sources land in a single PR.

Author a template. The shipped template tree under .keystone/harness/ is the manager's opinion: guides, sensors, actions, playbooks, skills. Improvements ship as forward-only patches.


Support · Services

Huckleberry — adoption support for keystone-mcp.

keystone-mcp is open source. Huckleberry is the consulting firm that helps teams adopt it well — wiring it into your MCP host, authoring topics that match how your org actually documents policy, and tuning the broker so the agent sees the right context without drowning in it.

When the open-source bits aren't the whole answer.

Most teams can get keystone-mcp running in an afternoon. The question that matters six months later is "does the agent actually catch what your team would have caught at review?" Answering it takes deliberate work: authoring topics for the policies your team already enforces in PRs, calibrating severity tiers to the way your team actually ships, and closing the source gaps the defaults can't cover.

Rollout. Wire keystone-mcp into your MCP host across a multi-repo codebase with the right topic taxonomy and source mix from day one.

Topic authoring. Translate your existing review comments, deploy runbooks, and incident postmortems into topics the agent can read on demand.

Source tuning. Adapter selection, classify selectors, TTL strategy, and multi-source merge tuning: get the right shape for your team's actual sources of truth.

Custom adapters. If your source of truth lives somewhere keystone-mcp doesn't already speak, we'll write the adapter (and the respx-mocked tests) and upstream it.


Huckleberry is run by Ian Johnson, the author of keystone-mcp. Engagements are scoped per-team and per-codebase; expect a deliberate, small-batch approach.