The end-to-end harness manager for any project.
A single MCP server (keystone-mcp on PyPI) that owns the full lifecycle of a project harness — scaffold the template tree under .keystone/harness/, broker rules and reasoning from any source (markdown, GitHub, Confluence, Notion, Jira, Linear, Slack), run sensors at phase boundaries, resolve a cascade across external sources, and apply forward-only shipped-template patches as the manager evolves. No central service. Markdown only. uvx keystone-mcp and your agent has a harness.
● keystone-mcp v0.2.0 ready (FastMCP · stdio transport)
$ claude
> use keystone_harness_bootstrap to scaffold the harness
▸ scaffolding → .keystone/harness/{guides,corpus,sensors,actions,playbooks,skills,adapters}/
▸ wrote .keystone/context.yaml + CLAUDE.md overlay
▸ registered 4 prompts · 14 tools · 9 resources
✓ harness ready. agent sees keystone://context/list.
> list topics
✶ Thinking… (3s · esc to interrupt)
Agent asks for a topic. Broker fans out. One envelope returns.
Instead of cramming organizational context into every system prompt, the agent calls keystone_get_context(topic) or reads keystone://context/{topic} and the broker resolves the request across every configured source, classifies each fragment into one of four kinds, and returns a single typed envelope. The agent treats each kind differently.
rules
Hard constraints. Three severities: must / should / may. When two sources contribute the same rule (normalized text match), highest severity wins; ties keep both citations.
reasoning
Why the rule exists, what incident drove it, what tradeoffs were considered. Reasoning is additive across sources — no deduplication.
skills
Multi-step procedural knowledge — how to deploy, how to roll back, how to run the release checklist. Each H3 inside the configured skills heading becomes one named skill.
commands
Shell commands, scripts, named recipes. Each H3 becomes one command; the first code block under it is the invocation, the rest is documentation.
Why a broker. Organizational context lives where the people who own it already work — markdown in the repo, pages in Confluence, tickets in Jira, pinned messages in Slack. Hand-copying that context into every system prompt is brittle and goes stale fast. The broker turns each external source into a uniform envelope the agent can consume on demand, with per-topic TTLs and an optional sqlite cache.
Every retrieval returns the same envelope.
Whether the topic resolves against one markdown file or fans out across markdown, GitHub, and Notion, the agent sees one consistent shape — four typed lists plus provenance.
{
  "topic": "deploy-policy",
  "rules": [
    {
      "id": "rules-001",
      "text": "run full CI green before any production deploy.",
      "source": "markdown://deploy-policy.md#rules",
      "severity": "must"
    }
  ],
  "reasoning": [
    {
      "text": "The team adopted these rules after a 2025 incident.",
      "source": "markdown://deploy-policy.md#background"
    }
  ],
  "skills": [],
  "commands": [],
  "fetched_at": "2026-06-11T14:32:00+00:00",
  "cache_hit": false
}
source is a URI whose scheme names the adapter that produced the fragment: markdown://, github://, confluence://, notion://, jira://, linear://, slack://, harness://. The agent (or you) can trace any rule back to the file, page, or message it came from. cache_hit tells the agent whether the envelope was served from cache, and may therefore be as old as the topic's TTL allows, or was fetched fresh for this request.
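As a rough illustration of consuming the envelope on the agent side, the sketch below parses a trimmed copy of the sample above, groups rules by severity, and splits a source URI into its parts. The helper names describe_source and rules_by_severity are hypothetical, and reading the URI host/path as a "location" and the fragment as a "section" is an assumption based on the sample, not a documented contract.

```python
import json
from urllib.parse import urlsplit

# Trimmed copy of the sample envelope shown above.
ENVELOPE_JSON = """
{
  "topic": "deploy-policy",
  "rules": [
    {"id": "rules-001",
     "text": "run full CI green before any production deploy.",
     "source": "markdown://deploy-policy.md#rules",
     "severity": "must"}
  ],
  "reasoning": [
    {"text": "The team adopted these rules after a 2025 incident.",
     "source": "markdown://deploy-policy.md#background"}
  ],
  "skills": [],
  "commands": [],
  "fetched_at": "2026-06-11T14:32:00+00:00",
  "cache_hit": false
}
"""


def describe_source(source: str) -> dict:
    """Split a source URI into adapter scheme, location, and section.

    Assumption: the fragment names the section the fragment came from.
    """
    parts = urlsplit(source)
    return {
        "adapter": parts.scheme,
        "location": parts.netloc + parts.path,
        "section": parts.fragment,
    }


def rules_by_severity(envelope: dict) -> dict:
    """Group rule texts under the three severities named in the text."""
    grouped = {"must": [], "should": [], "may": []}
    for rule in envelope["rules"]:
        grouped[rule["severity"]].append(rule["text"])
    return grouped


envelope = json.loads(ENVELOPE_JSON)
for rule in envelope["rules"]:
    print(rule["severity"], describe_source(rule["source"]))
```

An agent could, for example, refuse to proceed while any "must" rule is unmet and merely mention "may" rules, which is one way of treating each kind differently.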
Install the MCP server, then wire it into your agent.
Published to PyPI as keystone-mcp. Pick the install method that matches how you already run Python tools. The server has no runtime daemon — it speaks MCP over stdio, started on demand by your agent.
uvx (zero-install one-shot)
$ uvx keystone-mcp
Best for trying it out and for .mcp.json configs that should not pin a global install.
pip
$ pip install keystone-mcp
$ pip install "keystone-mcp[tokens]"   # + tiktoken-backed budget tokenizer
Without the tokens extra, keystone://harness/budget falls back to a deterministic word-count proxy (~0.75 words / token). With the extra, the budget reports exact cl100k_base token counts.
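To make the fallback concrete, here is a minimal sketch of a word-count proxy at the quoted ratio of roughly 0.75 words per token. Splitting on whitespace and rounding up are assumptions; the shipped proxy may count words or round differently.

```python
import math

WORDS_PER_TOKEN = 0.75  # ratio quoted in the text above


def estimate_tokens(text: str) -> int:
    """Approximate a token count from whitespace-delimited words.

    Assumption: words are whitespace-separated and the estimate rounds up.
    """
    words = len(text.split())
    return math.ceil(words / WORDS_PER_TOKEN)


print(estimate_tokens("run full CI green before any production deploy."))
```

The estimate is deterministic for a given text, which is the property the fallback needs; it is not a substitute for real tokenizer counts.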
pipx (isolated install + on PATH)
$ pipx install keystone-mcp
From source
$ git clone https://github.com/tacoda/keystone-mcp.git
$ cd keystone-mcp
$ uv sync
$ uv run keystone-mcp
Add to .mcp.json:
{
  "mcpServers": {
    "keystone": {
      "command": "uvx",
      "args": ["keystone-mcp"],
      "env": {
        "KEYSTONE_CONFIG": "/path/to/your/project/.keystone/context.yaml"
      }
    }
  }
}
KEYSTONE_CONFIG defaults to .keystone/context.yaml relative to the working directory. The .keystone/ directory is team-shared and version-controlled — never put secrets there; reference env vars with the env: prefix instead.
Three files to a working topic.
A topic is the agent-facing abstraction: a named bundle of rules, reasoning, skills, and commands resolved from one or more sources. Here is the smallest end-to-end shape — one markdown source, one topic, one file.
- Create .keystone/context.yaml

    sources:
      docs:
        type: markdown
        root: .keystone/context/
    topics:
      deploy-policy:
        description: |
          Rules and context for production deploys.
        sources:
          - source: docs
            query: { file: deploy-policy.md }
            classify:
              rules: { heading: "Rules", severity: must }
              reasoning: { heading: "Background" }
        cache: 15m

- Create .keystone/context/deploy-policy.md

    # Deploy Policy

    ## Rules
    - MUST run full CI green before any production deploy.
    - SHOULD prefer Tuesday/Wednesday morning deploys.

    ## Background
    The team adopted these rules after a 2025 incident.

- Start the server. The agent sees the topic.
  keystone_list_topics now returns deploy-policy. The agent reads keystone://context/deploy-policy and gets the envelope: one must rule, one should rule, one reasoning bullet, all cited back to the markdown file.
The repo's own .keystone/context.yaml is a working example with topics for deploys, ownership, coding standards, and a release playbook — plus commented-out examples of every external adapter.
Tools — parameterized retrieval and scaffold operations.
Every tool is prefixed keystone_. Retrieval tools return envelopes; scaffold tools materialize files under .keystone/harness/ using the shipped templates.
keystone_get_context(topic)                  retrieval
keystone_list_topics(tag?)                   retrieval
keystone_harness_bootstrap()                 scaffold
keystone_new_guide(name, tier?)              scaffold
keystone_new_sensor(name, kind?, mode?)      scaffold
keystone_new_script(name, body?)             scaffold
keystone_new_prompt(name, body?)             scaffold
keystone_new_skill(name, description?)       scaffold
keystone_new_action(name)                    scaffold
keystone_new_playbook(name)                  scaffold
keystone_new_corpus(name)                    scaffold
keystone_new_adapter(agent)                  scaffold
keystone_target_add(agent, project_root?)    overlay
keystone_apply_patches()                     patch
Scaffold tools refuse to write files whose names look like secrets (secret, token, credential, password, api_key, private, envfile, …). The harness root is fixed at .keystone/harness/ — not configurable.
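The refusal could be approximated with a simple name check like the one below. This is a sketch only: the fragment list is the partial one quoted above (the original list is elided with "…"), and the case-insensitive substring match with separator normalization is an assumption about how "looks like a secret" is decided.

```python
import re

# Fragments quoted in the text above; the shipped list is longer.
SECRET_FRAGMENTS = (
    "secret", "token", "credential", "password",
    "api_key", "private", "envfile",
)


def looks_like_secret(name: str) -> bool:
    """Return True when a proposed file name contains a secret-like fragment.

    Assumption: matching is case-insensitive and treats '-', '.', and
    whitespace as equivalent to '_'.
    """
    normalized = re.sub(r"[\s\-.]+", "_", name.lower())
    return any(fragment in normalized for fragment in SECRET_FRAGMENTS)


for candidate in ("deploy-policy", "API-Key", "prod.password"):
    print(candidate, looks_like_secret(candidate))
```

A substring check errs on the side of refusing: a harmless name that happens to contain "token" would also be rejected, which fits the stated goal of not making accidental commits easy.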
Prompts — lifecycle workflows the agent walks through.
Prompts seed multi-step conversations. The agent invokes a prompt, walks its phases, and calls scaffold tools along the way. Same broker, same envelope shape, but driven by a named workflow instead of a single retrieval.
bootstrap()
Codebase analysis + state ledger fill under corpus/state/. Run once after keystone_harness_bootstrap to seed the harness with what's actually in the repo today.
task(description)
Six-phase workflow: spec → orient → implement → check-drift → verify → review. Each phase has its own gate; the prompt walks them in order.
audit()
Learning (capture surprises) plus Pruning (retire stale rules). The two flywheels balance each other so the harness grows in surface area without growing in noise.
learn(finding)
One-shot capture into learning/inbox/. Findings batch up between audits, then get promoted to rules during audit.
Resources — read-only retrieval, rooted at keystone://.
Resources expose the same broker as keystone_get_context through MCP's resource-URI shape. Agents that prefer URIs over tool calls (or hosts that surface resources in a sidebar) get the full envelope without invoking a tool.
keystone://context/list               topics
keystone://context/{topic}            topics
keystone://source/{name}/health       sources
keystone://harness/status             harness
keystone://harness/options            harness
keystone://harness/verify             cascade
keystone://harness/doctor             cascade
keystone://harness/patch/pending      patches
keystone://harness/budget             budget
Topics bind one or more sources to one envelope.
Each topic declares the sources to consult and how their output maps into the four kinds. Multi-source topics let one envelope draw rules from CODEOWNERS, branch protection, and the team's own deploy policy markdown — without the agent having to know any of that exists.
topics:
  repo-policy:
    description: Combined ownership and branch-protection rules.
    sources:
      - source: docs
        query: { file: owners.md }
        classify:
          rules: { heading: "Required reviewers" }
      - source: gh
        query: { type: codeowners }
      - source: gh
        query: { type: branch_protection, branch: main }
    cache: 5m

topics:
  rollback:
    description: Rollback procedure.
    source: docs
    query: { file: rollback.md }
    classify:
      rules: { heading: "Rules" }
Multi-source merge. When two sources contribute rules whose normalized text matches, the highest severity wins (must > should > may). Ties at the top severity keep both rules so each source stays cited. Reasoning, skills, and commands stay additive — no deduplication.
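The merge rule for rules can be sketched in a few lines. The normalization used here (lowercase, collapsed whitespace, trailing punctuation dropped) is an assumption, since the text only says "normalized text", and the source URIs in the usage example are illustrative.

```python
import re
from collections import defaultdict

SEVERITY_RANK = {"may": 0, "should": 1, "must": 2}


def normalize(text: str) -> str:
    """Hypothetical normalization: collapse whitespace, drop trailing
    punctuation, lowercase."""
    return re.sub(r"\s+", " ", text).strip().rstrip(".;:!").lower()


def merge_rules(rules: list) -> list:
    """Keep, per normalized text, only the rules at the highest severity.

    Ties at the top severity keep every rule, so each source stays cited.
    """
    groups = defaultdict(list)
    for rule in rules:
        groups[normalize(rule["text"])].append(rule)

    merged = []
    for candidates in groups.values():
        top = max(SEVERITY_RANK[r["severity"]] for r in candidates)
        merged.extend(r for r in candidates if SEVERITY_RANK[r["severity"]] == top)
    return merged


sample = [
    {"text": "Run full CI green before deploy.", "severity": "should",
     "source": "markdown://deploy-policy.md#rules"},
    {"text": "run full  CI green before deploy", "severity": "must",
     "source": "github://acme/widgets"},
    {"text": "Two reviewers required", "severity": "must",
     "source": "markdown://owners.md#rules"},
    {"text": "two reviewers required.", "severity": "must",
     "source": "slack://pinned"},
]
for rule in merge_rules(sample):
    print(rule["severity"], rule["source"])
```

In the sample, the "should" copy of the CI rule is dropped in favor of the "must" copy, while both "must" copies of the reviewer rule survive with their own citations.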
Per-topic TTL. Cache uses 5s / 10m / 2h / 1d syntax. Default backend is in-memory (lost on restart); switch to sqlite for persistence across server restarts.
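A parser for the TTL syntax is small enough to sketch. It assumes a TTL is a single integer followed by one of the four unit letters shown above; whether compound values such as "1h30m" are accepted is not stated, so they are rejected here.

```python
import re

_UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400}
_TTL_PATTERN = re.compile(r"(\d+)([smhd])")


def parse_ttl(value: str) -> int:
    """Convert a TTL such as '10m' into seconds.

    Assumption: exactly one integer plus one unit letter (s, m, h, d).
    """
    match = _TTL_PATTERN.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unrecognized TTL: {value!r}")
    amount, unit = match.groups()
    return int(amount) * _UNIT_SECONDS[unit]


for ttl in ("5s", "10m", "2h", "1d"):
    print(ttl, parse_ttl(ttl))
```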
Adapters — one per source type.
Each type: in .keystone/context.yaml picks an adapter. Markdown and folder adapters are repo-local and need no credentials; remote adapters reference environment variables via the env: prefix.
markdown      one file → all four kinds
folder        walks tree; include/exclude globs
repo          resolves owner/repo@version or git URL; immutable cache for tag/sha refs
github        CODEOWNERS, branch protection (rules); PRs, releases (reasoning)
confluence    page content (all four kinds)
notion        page content (all four kinds), DB rows (reasoning)
jira          issues, JQL search (reasoning)
linear        issues, GraphQL filter (reasoning)
slack         pinned messages (rules), recent discussion (reasoning)
harness       the project's own .keystone/harness/ tree
Classify — how an adapter's output maps into the four kinds.
The markdown, confluence, and notion adapters share a heading-based vocabulary. Sections split by H2; skills and commands sub-split by H3. For github / jira / linear / slack the query.type determines the kind directly.
classify:
  rules:
    heading: "Rules"          # single or list: ["Rules", "Must"]
    severity: must            # default when bullets lack MUST/SHOULD/MAY
  reasoning:
    heading: "Background"
    # or all: true            # everything not matched by another kind
  skills:
    heading: "Procedures"     # each H3 → one skill (name + body)
  commands:
    heading: "Commands"       # each H3 → one command (first code block = invocation)
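The heading-based vocabulary can be illustrated with a small parser run against the deploy-policy.md example from the quick start. This is a sketch under assumptions, not the adapter's actual code: the function names are hypothetical, and stripping the MUST/SHOULD/MAY prefix from the rule text is inferred from the sample envelope.

```python
import re

MARKDOWN = """\
# Deploy Policy

## Rules
- MUST run full CI green before any production deploy.
- SHOULD prefer Tuesday/Wednesday morning deploys.

## Background
The team adopted these rules after a 2025 incident.
"""


def split_h2_sections(text: str) -> dict:
    """Map each H2 heading to the body text beneath it."""
    sections = {}
    current = None
    for line in text.splitlines():
        heading = re.match(r"^##\s+(.+?)\s*$", line)  # H2 only, not H3
        if heading:
            current = heading.group(1)
            sections[current] = []
        elif current is not None:
            sections[current].append(line)
    return {name: "\n".join(body).strip() for name, body in sections.items()}


def extract_rules(section: str, default_severity: str = "must") -> list:
    """Turn each bullet into a rule, reading severity from its prefix."""
    rules = []
    for line in section.splitlines():
        bullet = re.match(r"^[-*]\s+(.*)$", line.strip())
        if not bullet:
            continue
        text = bullet.group(1)
        prefix = re.match(r"^(MUST|SHOULD|MAY)\b\s*", text)
        severity = prefix.group(1).lower() if prefix else default_severity
        if prefix:
            text = text[prefix.end():]
        rules.append({"text": text, "severity": severity})
    return rules


sections = split_h2_sections(MARKDOWN)
rules = extract_rules(sections["Rules"])
reasoning = sections["Background"]
print(rules)
print(reasoning)
```

Bullets without a MUST/SHOULD/MAY prefix fall back to default_severity, mirroring the severity key in the classify block.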
Secrets stay in the environment. Cache survives restarts on request.
Never put secrets in .keystone/context.yaml — that file is version-controlled and team-shared. Reference environment variables with the env: prefix; the loader fails fast at startup if a referenced var is unset.
sources:
  gh:
    type: github
    repo: acme/widgets
    auth: env:GITHUB_TOKEN
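One way a loader might resolve env: references and fail fast is sketched below. It operates on an already-parsed config (a plain dict), so no YAML library is needed; the function and exception names are hypothetical.

```python
import os

ENV_PREFIX = "env:"


class MissingEnvVar(LookupError):
    """Raised when a config value references an unset environment variable."""


def resolve_env_refs(node, environ=os.environ):
    """Return a copy of a parsed config with env:VAR strings replaced.

    Assumption: any string value that starts with 'env:' is a reference.
    """
    if isinstance(node, dict):
        return {key: resolve_env_refs(value, environ) for key, value in node.items()}
    if isinstance(node, list):
        return [resolve_env_refs(item, environ) for item in node]
    if isinstance(node, str) and node.startswith(ENV_PREFIX):
        name = node[len(ENV_PREFIX):]
        if name not in environ:
            raise MissingEnvVar(f"environment variable {name} is not set")
        return environ[name]
    return node


config = {
    "sources": {
        "gh": {"type": "github", "repo": "acme/widgets", "auth": "env:GITHUB_TOKEN"}
    }
}
# A dummy mapping stands in for the real environment in this example.
resolved = resolve_env_refs(config, {"GITHUB_TOKEN": "dummy-value"})
print(resolved["sources"]["gh"]["auth"])
```

Resolving the whole tree at startup means a missing variable surfaces immediately rather than on the first request that touches that source.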
cache:
  backend: sqlite
  path: .keystone/cache.db
Default backend is in-memory (lost on restart). The sqlite backend persists envelopes across server restarts and survives uvx re-invocations. Per-topic TTLs use 5s / 10m / 2h / 1d syntax.
One tree. Seven primitives. Markdown all the way down.
Everything inside .keystone/harness/ is one of seven primitives. Each one has a directory, a shipped template, a keystone_new_* scaffolder, and a precise role in the lifecycle. Edit the markdown by hand or scaffold via MCP — both paths converge on byte-identical files.
guides/                        keystone_new_guide
corpus/                        keystone_new_corpus
sensors/                       keystone_new_sensor
scripts/ · prompts/            keystone_new_script · keystone_new_prompt
actions/                       keystone_new_action
playbooks/                     keystone_new_playbook
skills/                        keystone_new_skill
adapters/                      keystone_new_adapter
learning/inbox/ · archive/     filled by playbooks
Single source of truth. The MCP resources at keystone:// are projections of these files. The broker, sensors-runner, and cascade verifier all read from the same tree that git tracks.
Iron-law constraint. Never put secrets in this tree — it's version-controlled and team-shared. Reference environment variables via env:VAR in .keystone/context.yaml.
Guides — the rules the agent obeys.
guides/ holds the project's constraints, tiered by severity. Each guide is one markdown file; the broker emits its bullets as rules in the envelope with a severity tag derived from the tier.
Iron law
Non-negotiable invariants. must severity. Violations block the phase. Reserve for safety, security, and contract-level rules.
Golden path
The default way. should severity. The agent follows these unless a guide-tier exception applies. Most architectural decisions live here.
Rules
Local conventions. may severity. Style, naming, formatting — agents apply by default, humans override case-by-case.
Scaffold with keystone_new_guide(name, tier?). The tier becomes both the file's frontmatter and the severity attached to every emitted rule.
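The tier-to-severity mapping described above amounts to a lookup table. In the sketch below, the tier identifiers ("iron-law", "golden-path", "rules"), the harness:// path, and the sequential id numbering are all assumptions made for illustration; only the three severities and the "rules-001" id shape come from the text.

```python
# Hypothetical tier identifiers mapped to the severities named above.
TIER_SEVERITY = {
    "iron-law": "must",
    "golden-path": "should",
    "rules": "may",
}


def rules_from_guide(tier: str, bullets: list, source: str) -> list:
    """Emit one envelope-style rule per bullet, tagged with the tier's severity."""
    severity = TIER_SEVERITY[tier]
    return [
        {
            "id": f"rules-{index:03d}",  # numbering scheme is an assumption
            "text": text,
            "source": source,
            "severity": severity,
        }
        for index, text in enumerate(bullets, start=1)
    ]


emitted = rules_from_guide(
    "golden-path",
    ["prefer small, reviewable diffs", "document every public function"],
    "harness://guides/style.md",  # illustrative path
)
print(emitted)
```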
Corpus — reasoning the agent references.
corpus/ is background: architecture decisions, domain glossaries, "why we do it this way" notes. The broker emits corpus content as reasoning — context the agent reads but does not enforce.
The bootstrap playbook fills the state ledgers under corpus/state/ from a one-time codebase analysis; the audit playbook keeps them current. Scaffold ad-hoc corpus notes with keystone_new_corpus(name).
Sensors — blocking checks at phase boundaries.
A sensor is one markdown file declaring a kind (computational or inferential) and a mode (blocking or advisory). The keystone-sensor-runner skill walks them at phase boundaries; a failing blocking sensor stops the lifecycle.
Script-backed
Deterministic — runs a shell body from scripts/<name>.sh. Default sensors: lint, type, test, build, coverage, drift. Exit code is the verdict.
Prompt-backed
Agent-judged — runs a prompt body from prompts/<name>.md against the diff. Default sensors: code-review, security-review, performance-review, accessibility-review.
Scaffold with keystone_new_sensor(name, kind?, mode?) — the tool writes both the sensor and a matching script or prompt stub. Health surfaces at keystone://sensor/<name>/health; the doctor playbook rolls them up.
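For script-backed sensors, "exit code is the verdict" can be sketched with the standard library. The Verdict type and run_script_sensor name are hypothetical, the commands are Python stand-ins so the example runs anywhere (a real sensor would run its body from scripts/<name>.sh), and treating an advisory failure as non-blocking is an assumption.

```python
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class Verdict:
    sensor: str
    passed: bool
    blocks_phase: bool


def run_script_sensor(name: str, command: list, mode: str = "blocking") -> Verdict:
    """Run a command and read the verdict from its exit code (0 = pass)."""
    completed = subprocess.run(command, capture_output=True, text=True)
    passed = completed.returncode == 0
    # Assumption: only a failed sensor in blocking mode stops the phase.
    return Verdict(name, passed, blocks_phase=(mode == "blocking" and not passed))


ok = [sys.executable, "-c", "raise SystemExit(0)"]
fail = [sys.executable, "-c", "raise SystemExit(1)"]

passing = run_script_sensor("lint", ok)
failing = run_script_sensor("test", fail)
advisory = run_script_sensor("coverage", fail, mode="advisory")
print(passing, failing, advisory, sep="\n")
```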
Actions — focused operations the agent walks.
An action is one markdown procedure for one job. Playbooks compose actions; the agent reads the action body verbatim, asks for missing inputs, runs the steps, and reports back.
spec         clarify requirements; write acceptance criteria
orient       map affected files; check state ledgers
implement    make the change; keep diffs reviewable
verify       run sensors; confirm acceptance criteria
review       self-review against guides + state
learn        capture a surprise into learning/inbox/
audit        walk inbox, classify into guides / corpus / archive
release      cut a release; update changelog
Scaffold with keystone_new_action(name). Actions are small and composable: favor adding a new action over growing an existing one.
Playbooks — ordered phase flows.
A playbook composes actions and sensors into a multi-phase flow. Where an action does one job, a playbook drives a lifecycle. Shipped playbooks cover bootstrap, task, audit, install, verify, doctor, patch, and release.
bootstrap    codebase analysis; fill corpus/state/ ledgers
task         spec → orient → implement → verify → review
audit        dual flywheel: learn (additive) + prune (subtractive)
install      add a new external source via the source-installer skill
verify       run all blocking sensors against current diff
doctor       harness layout audit + sensor health rollup
patch        apply pending shipped template patches
release      drive the release action + changelog handoff
Scaffold custom playbooks with keystone_new_playbook(name). Shipped playbooks are API contracts: if you want to keep receiving forward updates, change them through the template patch system rather than by free-form editing.
Skills — FastMCP-native how-to entries.
Each skill is a directory with a SKILL.md at its root plus any supporting files. FastMCP's SkillsDirectoryProvider auto-discovers them as skill://<name>/SKILL.md resources; Claude Code, Cursor, and any FastMCP-aware host surface them in the agent's skill registry.
keystone-sensor-runner       verify playbook
keystone-source-installer    install playbook
keystone-archive             audit playbook (pruning flywheel)
keystone-budget-reporter     doctor playbook
keystone-reload-notice       patch + scaffold tools
Scaffold with keystone_new_skill(name, description?). Manager-authored skills are auto-prefixed keystone-; project-local skills you author keep whatever name you give them.
Scaffold once. Edit the markdown. Patch forward.
The keystone_new_* tools materialize files under .keystone/harness/ from shipped templates. Once written, the files are yours to edit. When a new template version ships, keystone_apply_patches applies the forward-only diff against unmodified files and skips anything you've touched.
Shipped templates
Every primitive (guide, sensor, action, playbook, skill, corpus, adapter) has a shipped template. Scaffolding writes the template verbatim; the broker doesn't care which fields you change.
Menu-file overlay
keystone_target_add(agent) installs the agent's menu file (CLAUDE.md, AGENTS.md, …) at the project root as an overlay — pre-existing user content is preserved, only the Keystone-managed block is rewritten.
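An overlay that preserves user content and rewrites only a managed block is commonly built on a pair of marker lines. The sketch below shows that pattern; the marker strings and the function name are hypothetical, since the text does not say how the Keystone-managed block is delimited.

```python
BEGIN = "<!-- keystone:begin -->"  # hypothetical marker
END = "<!-- keystone:end -->"      # hypothetical marker


def upsert_managed_block(existing: str, managed_body: str) -> str:
    """Replace the managed block if present, otherwise append it.

    Everything outside the markers is left untouched.
    """
    block = f"{BEGIN}\n{managed_body.strip()}\n{END}"
    start = existing.find(BEGIN)
    end = existing.find(END, start + len(BEGIN)) if start != -1 else -1
    if start != -1 and end != -1:
        return existing[:start] + block + existing[end + len(END):]
    if not existing.strip():
        return block + "\n"
    return existing.rstrip("\n") + "\n\n" + block + "\n"


original = "# My notes\n\nKeep this.\n"
first = upsert_managed_block(original, "v1 content")
second = upsert_managed_block(first, "v2 content")
print(second)
```

Applying the same body twice yields the same file, so re-running the overlay is safe.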
Apply patches
keystone_apply_patches() walks pending shipped patches and applies each one to files you haven't modified. Conflicts surface in keystone://harness/patch/pending for manual resolution.
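The "apply to unmodified files, skip what you've touched" behavior could be decided by comparing content digests, as sketched below. This is an assumption about mechanism: the text does not say how modification is detected, and the names digest and plan_patches, as well as the idea of recording a digest at scaffold time, are hypothetical.

```python
import hashlib


def digest(content: str) -> str:
    """SHA-256 of a file's text content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def plan_patches(files: dict, scaffolded: dict, patches: dict):
    """Split pending patches into those safe to apply and those to skip.

    files:      path -> current content on disk
    scaffolded: path -> digest recorded when the template was written (assumed)
    patches:    path -> new template content
    """
    applied, skipped = {}, []
    for path, new_content in patches.items():
        current = files.get(path)
        if current is not None and digest(current) == scaffolded.get(path):
            applied[path] = new_content   # untouched since scaffolding
        else:
            skipped.append(path)          # edited or missing: leave for manual review
    return applied, skipped


files = {"guides/style.md": "template v1\n", "sensors/lint.md": "my edits\n"}
scaffolded = {path: digest("template v1\n") for path in files}
patches = {"guides/style.md": "template v2\n", "sensors/lint.md": "template v2\n"}

applied, skipped = plan_patches(files, scaffolded, patches)
print(applied, skipped)
```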
Secret-name refusal. Scaffold tools refuse to write files whose names look like secrets — secret, token, credential, password, api_key, private, envfile, and the obvious variations. The .keystone/ directory is team-shared; the scaffolder doesn't make it easy to accidentally commit a secret.
Skills are FastMCP-native. Manager-authored skills under .keystone/harness/skills/<name>/SKILL.md are auto-prefixed keystone- and discovered by FastMCP's SkillsDirectoryProvider — Claude Code, Cursor, and any other FastMCP-aware host load them automatically.
Contribute to keystone-mcp.
MIT-licensed and built in the open. The test suite uses respx to mock every external API — no live credentials required to run the full 361-test suite locally.
The broker model only works if the adapters fit the sources teams actually use.
Markdown, GitHub, Confluence, Notion, Jira, Linear, Slack, and harness-local are wired today. If your team's source of truth lives somewhere else — Coda, Linear docs, an internal wiki, a Google Doc — the adapter shape is small and the test fixtures are the bulk of the work.
- Use it. Tell us where the defaults match, or clash with, what you've already built. We learn the most from people who've tried something else first.
- Hit a bug. File it, even with a rough repro. The cascade, classify, and merge paths get sharper every time someone shows us a corner we missed.
- Write an adapter. Each adapter is a single module under src/keystone_mcp/adapters/ plus a respx-mocked test file. New sources land in a single PR.
- Author a template. The shipped template tree under .keystone/harness/ is the manager's opinion: guides, sensors, actions, playbooks, skills. Improvements ship as forward-only patches.
Huckleberry — adoption support for keystone-mcp.
keystone-mcp is open source. Huckleberry is the consulting firm that helps teams adopt it well — wiring it into your MCP host, authoring topics that match how your org actually documents policy, and tuning the broker so the agent sees the right context without drowning in it.
When the open-source bits aren't the whole answer.
Most teams can get keystone-mcp running in an afternoon. The question that matters six months later is whether the agent actually catches what your team would have caught at review. Getting there takes deliberate work: authoring topics for the policies your team already enforces in PRs, calibrating severity tiers to the way your team actually ships, and closing the source gaps the defaults can't cover.
- Rollout. Wire keystone-mcp into your MCP host across a multi-repo codebase with the right topic taxonomy and source mix from day one.
- Topic authoring. Translate your existing review comments, deploy runbooks, and incident postmortems into topics the agent can read on demand.
- Source tuning. Adapter selection, classify selectors, TTL strategy, multi-source merge tuning: get the right shape for your team's actual sources of truth.
- Custom adapters. If your source of truth lives somewhere keystone-mcp doesn't already speak, we'll write the adapter (and the respx-mocked tests) and upstream it.
Huckleberry is run by Ian Johnson, the author of keystone-mcp. Engagements are scoped per-team and per-codebase; expect a deliberate, small-batch approach.