// Effect.gen comes from the effect package (assumed here; the SDK may re-export it).
import { Effect } from "effect";
import { defineHook, Verdict, GitEnv } from "@ax/hooks-sdk";

export default defineHook({
  name: "main-branch-guard",
  events: ["PreToolUse"],
  matcher: { tools: ["Bash"] },
  run: (event) =>
    Effect.gen(function* () {
      // Only inspect git push / git commit invocations.
      const cmd = event.tool?.input.command ?? "";
      if (!/^git (push|commit)\b/.test(cmd)) return Verdict.allow;

      // Resolve the branch of the directory the command runs in.
      const branch = yield* (yield* GitEnv).currentBranch(event.cwd);
      if (/^(main|master|production)$/.test(branch ?? ""))
        return Verdict.block("direct write to protected branch");
      return Verdict.allow;
    }),
});
what ax actually does
Every scenario, one graph.
Concrete demos of what your local ax instance already exposes. Each one is something you can run today, on your own history.
Backtest a hook against history. Search every session you've ever had. See where your tokens go. Watch a verdict earn its place at +30 sessions. Route the intern work to cheaper models. Keep your plan budget in view. Take proposals mined from your own transcripts. Find out which sessions thrash. Publish your receipts and hand the graph to an agent.
before-you-ship · cases sample output
Ask the graph what your hook would have caught.
You write a guardrail. You don't know if it'll catch real mistakes or just become noise. ax hooks cases scores the candidate against labeled cases from your own session history - true and false positives, a real precision number - so the decision to ship is evidence, not vibes.
~/.claude $ ax hooks cases main-branch-guard --since=7
↳ replay window   2026-05-21 → 2026-05-28 (7d)
↳ sessions        14 claude_code, 3 codex (17 total)
↳ tool_calls      1,247 bash invocations indexed
replaying… ████████████████████ 1247/1247 4.2s
───────────────────────────────────────────────────────────
verdict  SHIP · HIGH-CONFIDENCE
───────────────────────────────────────────────────────────
fires  12 / 1,247 calls (0.96%)
├─ true positives   11  would have blocked actual main-branch pushes
└─ false positives   1  legitimate release → main · 2026-05-24
precision 0.917   recall 0.917   F1 0.917
prevented rollbacks  5 (traced via post-event reverts)
by repo
  ~/Projects/api    8 ▮▮▮▮▮▮▮▮
  ~/Projects/web    3 ▮▮▮
  ~/Projects/infra  1 ▮  ← false positive lives here
one to review: sess_8af3·turn-42  release/v2-cutover → allow-list?
install with: ax hooks install ~/.ax/hooks/main-branch-guard.ts --providers=claude,codex
~/.claude $
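The precision, recall, and F1 figures in the sample output follow from plain confusion counts. A minimal sketch of that arithmetic, assuming the usual definitions: `scoreHook` and its field names are hypothetical, not ax's API, and the single false negative is an assumption chosen to match the reported recall (the sample output does not list it).

```typescript
// Confusion counts for one candidate hook replayed over labeled history.
interface CaseCounts {
  truePositives: number;  // hook fired and the label says it should have
  falsePositives: number; // hook fired on a legitimate command
  falseNegatives: number; // hook stayed silent on a labeled mistake
}

interface Score {
  precision: number;
  recall: number;
  f1: number;
}

function scoreHook(c: CaseCounts): Score {
  const fired = c.truePositives + c.falsePositives;
  const labeledBad = c.truePositives + c.falseNegatives;
  const precision = fired === 0 ? 0 : c.truePositives / fired;
  const recall = labeledBad === 0 ? 0 : c.truePositives / labeledBad;
  const f1 =
    precision + recall === 0 ? 0 : (2 * precision * recall) / (precision + recall);
  return { precision, recall, f1 };
}

// 11 true positives and 1 false positive come from the sample output above;
// falseNegatives = 1 is an assumption, not something the output reports.
const sample = scoreHook({ truePositives: 11, falsePositives: 1, falseNegatives: 1 });
console.log(sample.precision.toFixed(3), sample.recall.toFixed(3), sample.f1.toFixed(3));
// prints "0.917 0.917 0.917"
```

Precision answers "when the hook fires, how often is it right"; recall answers "how many labeled mistakes did it catch". A guardrail that fires rarely but precisely is the kind worth shipping.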
search the graph
Find what you shipped last time you did this.
Every transcript ax has ever ingested is full-text searchable - Claude Code, Codex, every turn, every tool call, every reasoning text. Ranked excerpts come back with the session, the file, the commit, and whether it stuck.
Built the OAuth refresh token rotation. The middleware now checks expiry with <= not < after the bug we hit last quarter - tests cover the boundary tick and the clock-skew window.
PR #847 - OAuth refresh path. Tests cover both expiry edge cases; the middleware guards against double-refresh by holding a per-tenant lock for the duration of the rotation.
Initial OAuth wiring. Note for future me: don't reuse the access token endpoint for refresh - separate route, separate rate limit, separate audit log.
Spike on OAuth session-binding inside the middleware - rejected, returned to the PR #420 approach. Leaving the diff in scratch/ in case the threat model changes.
ax · local taste & telemetry graph · prototype
see the bleed · token-impact
Where your agent context goes.
Every agent user is bleeding money on cache misses they can't see. ax insights token-impact --since=7d joins your local claude + codex transcripts, reconciles provider metadata against transcript bytes, and shows the spend, the hit rate, and the workflows burning the budget.
By workflow epoch & expensive sessions
Bar length = share of total tokens. Color split inside each bar = cached vs. paid for the same workload. ad-hoc is half the tokens of gsd but burns more dollars - fewer rituals, lower cache hit.
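The cached-vs-paid split can be approximated from the per-turn usage counters that providers report. A rough sketch, assuming hit rate means cache reads as a share of all input tokens (ax's exact definition is not stated here); `cacheHitRate` and `TurnUsage` are hypothetical names, and the sample turns are made up.

```typescript
// Per-turn usage counters, named after the provider usage fields.
interface TurnUsage {
  input_tokens: number;                // uncached input
  cache_creation_input_tokens: number; // input written to the cache
  cache_read_input_tokens: number;     // input served from the cache
  output_tokens: number;
}

// Assumed definition: cache reads divided by all input tokens.
function cacheHitRate(turns: TurnUsage[]): number {
  let read = 0;
  let totalInput = 0;
  for (const t of turns) {
    read += t.cache_read_input_tokens;
    totalInput +=
      t.input_tokens + t.cache_creation_input_tokens + t.cache_read_input_tokens;
  }
  return totalInput === 0 ? 0 : read / totalInput;
}

// Made-up session: the first turn warms the cache, the second reads from it.
const turns: TurnUsage[] = [
  { input_tokens: 1000, cache_creation_input_tokens: 4000, cache_read_input_tokens: 0, output_tokens: 500 },
  { input_tokens: 200, cache_creation_input_tokens: 0, cache_read_input_tokens: 4800, output_tokens: 300 },
];
console.log(cacheHitRate(turns)); // prints 0.48
```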
ax insights workflow-impact for the cohort comparison
cache_creation_input_tokens, cache_read_input_tokens, input_tokens, output_tokens - and falls back to transcript-byte estimates when a turn predates cache reporting.
the compounding part
Every change earns its place by session 30.
Accepting a proposal doesn't make it true. ax turns each acceptance into an experiment with three forward-looking checkpoints — t+3, t+10, t+30 sessions — and watches the next runs to see if the change actually held. Days are the wrong unit when an agent ships eight sessions a day. The verdict at t+30 sessions is locked. Future proposals know.
ax doesn't trust the moment you accept — it earns the verdict by watching what happens across the next 30 sessions. Marker still landed? File still healthy? Pattern not recurring? Tests still green? Each checkpoint joins evidence from the same graph that generated the proposal. Sessions, not days — a weekend doesn't artificially delay; a productive afternoon doesn't artificially rush. The verdict at +30 sessions is locked and feeds the next round. Verdicts live in the improve queue — ax improve verdict confirms or overrides one from the CLI.
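Counting in sessions rather than days makes the checkpoint logic a simple comparison. A minimal sketch, assuming the only input is the number of sessions run since the proposal was accepted; `checkpointStatus` and `isLocked` are hypothetical helpers, not ax commands.

```typescript
// The three forward-looking checkpoints, measured in sessions after acceptance.
const CHECKPOINTS = [3, 10, 30] as const;

interface CheckpointState {
  checkpoint: number;        // +3, +10 or +30
  due: boolean;              // enough sessions have run to evaluate it
  sessionsRemaining: number; // sessions still needed before it is due
}

function checkpointStatus(sessionsSinceAccept: number): CheckpointState[] {
  return CHECKPOINTS.map((checkpoint) => ({
    checkpoint,
    due: sessionsSinceAccept >= checkpoint,
    sessionsRemaining: Math.max(0, checkpoint - sessionsSinceAccept),
  }));
}

// The verdict locks once the last checkpoint has been reached.
function isLocked(sessionsSinceAccept: number): boolean {
  return sessionsSinceAccept >= CHECKPOINTS[CHECKPOINTS.length - 1];
}

// Two sessions after acceptance: +3 is not due yet, one session remaining.
console.log(checkpointStatus(2)[0]);
// prints { checkpoint: 3, due: false, sessionsRemaining: 1 }
```

Calendar time never enters the function, which is the point: a quiet weekend leaves the counter where it was.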
recent experiments · 5 of 47
- post-feature-verify | +30 sess | marker landed · 0 rollbacks · 1 dependent | adopted
- main-branch-guardrail | +10 sess | marker landed · 2 of 4 callsites bypassed | partial
- skill-ts-default | +3 sess | awaiting first signal · 1 session remaining | pending
- ingest-regression | +30 sess | pattern not recurred over 30 sessions · tests green | adopted
- cache-warm-on-start | +10 sess | added 800ms cold start · reverted at session 6 | regressed
route the intern work · dispatches
Stop paying frontier rates for mechanical dispatches.
Every sub-task your agent spawns inherits your most expensive model unless something says otherwise. ax dispatches --candidates finds the dispatches that ran on fable or opus but matched a mechanical routing class - and reprices each one against the cheaper model, from the tokens it actually burned.
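Repricing is arithmetic over tokens already spent. A minimal sketch of the idea, assuming flat per-million-token prices and ignoring cache discounts; the prices and token counts below are placeholders, not real rate cards, and `repricedSavings` is a hypothetical helper.

```typescript
// USD per million tokens. Placeholder numbers only, not real pricing.
interface ModelPrice {
  inputPerMTok: number;
  outputPerMTok: number;
}

// Tokens a child session actually burned.
interface Dispatch {
  inputTokens: number;
  outputTokens: number;
}

function costUsd(d: Dispatch, p: ModelPrice): number {
  return (d.inputTokens * p.inputPerMTok + d.outputTokens * p.outputPerMTok) / 1000000;
}

// Savings = what it cost on the model it ran on, minus the same tokens on the cheaper one.
function repricedSavings(d: Dispatch, ran: ModelPrice, cheaper: ModelPrice): number {
  return costUsd(d, ran) - costUsd(d, cheaper);
}

const expensive: ModelPrice = { inputPerMTok: 10, outputPerMTok: 50 }; // hypothetical
const cheap: ModelPrice = { inputPerMTok: 2, outputPerMTok: 10 };      // hypothetical
const child: Dispatch = { inputTokens: 400000, outputTokens: 20000 };  // hypothetical

console.log(repricedSavings(child, expensive, cheap)); // prints 4
```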
Top candidates, repriced
ax dispatches --candidates
Inherited an expensive model + matched a mechanical class. Each row carries a suggested model and the dollars it would have saved.
ax routing compile
Writes the class table to ~/.ax/hooks/routing-table.json - merge-preserving, your own classes survive a regenerate.
route-dispatch hook
Suggests the cheaper model at dispatch time, in Claude Code and Codex. The next "Fix ingest run lifecycle" rides sonnet, not fable.
tune
ax routing tune mines the unmatched expensive dispatches into new classes - two-token prefix clustering, ≥3 members. Mechanical classes auto-apply; judgment-flagged ones (review / design / plan / audit) only ship via an emitted brief and an agent backtest.
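Two-token prefix clustering with a minimum cluster size can be sketched in a few lines. This is an illustration under assumptions, not ax's implementation: lowercasing, whitespace tokenization, and the names `prefixKey` and `clusterByPrefix` are all hypothetical, and every sample description except "Fix ingest run lifecycle" is made up.

```typescript
// Key a dispatch description by its first two whitespace-separated tokens.
function prefixKey(description: string): string | null {
  const tokens = description
    .trim()
    .toLowerCase()
    .split(/\s+/)
    .filter((t) => t.length > 0);
  return tokens.length >= 2 ? tokens[0] + " " + tokens[1] : null;
}

// Group descriptions by prefix and keep only groups with at least minMembers entries.
function clusterByPrefix(descriptions: string[], minMembers = 3): Record<string, string[]> {
  const groups: Record<string, string[]> = {};
  for (const d of descriptions) {
    const key = prefixKey(d);
    if (key === null) continue; // too short to cluster
    (groups[key] = groups[key] || []).push(d);
  }
  const kept: Record<string, string[]> = {};
  for (const key of Object.keys(groups)) {
    if (groups[key].length >= minMembers) kept[key] = groups[key];
  }
  return kept;
}

const unmatched = [
  "Fix ingest run lifecycle",
  "Fix ingest retry backoff", // hypothetical
  "Fix ingest cursor drift",  // hypothetical
  "Review auth design",       // hypothetical
];
console.log(Object.keys(clusterByPrefix(unmatched))); // prints [ 'fix ingest' ]
```

The size floor keeps one-off dispatches from becoming routing classes of their own.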
tool_call to the child session it spawned. Savings are repriced from the tokens the child actually burned - not a projection, a receipt.
measure + tune, live
Your bill, broken out and tunable.
ax studio's /cost view renders the same numbers the CLI prints — the main-vs-subagent spend split, per-model cost, and the dispatch candidates worth routing down — live off your local graph. And routing is regex underneath, so it ships an interactive tuner: edit a class pattern, watch which past dispatches it catches (and which it shouldn't), flag false positives into an exclude list, and save — the route-dispatch hook picks it up live.
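Because routing is regex underneath, previewing a class edit amounts to replaying the pattern over past dispatch descriptions. A small sketch of that loop, assuming a class is one pattern plus an exclude list of exact descriptions; the `RoutingClass` shape and `previewClass` are hypothetical, not the format of routing-table.json.

```typescript
// Hypothetical shape of one routing class.
interface RoutingClass {
  name: string;
  pattern: RegExp;   // avoid the g flag: test() would carry lastIndex between calls
  exclude: string[]; // descriptions flagged as false positives
}

interface Preview {
  caught: string[];   // matched and would be routed down
  excluded: string[]; // matched, but on the exclude list
  missed: string[];   // did not match
}

function previewClass(cls: RoutingClass, pastDispatches: string[]): Preview {
  const out: Preview = { caught: [], excluded: [], missed: [] };
  for (const d of pastDispatches) {
    if (!cls.pattern.test(d)) out.missed.push(d);
    else if (cls.exclude.indexOf(d) !== -1) out.excluded.push(d);
    else out.caught.push(d);
  }
  return out;
}

const bugFix: RoutingClass = {
  name: "bug-fix",
  pattern: /^fix\b/i,
  exclude: ["Fix auth design after review"], // hypothetical false positive
};
const preview = previewClass(bugFix, [
  "Fix ingest run lifecycle",
  "Fix auth design after review", // hypothetical
  "Plan cache migration",         // hypothetical
]);
console.log(preview.caught); // prints [ 'Fix ingest run lifecycle' ]
```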

/cost - main-thread routability and the interactive routing tuner
know the envelope · quota
Your plan limits, live, everywhere you look.
Claude tells you about your usage limit when you hit it. ax quota reads the same usage endpoint the Claude app does - your 5-hour and 7-day rolling windows, live, with the OAuth token you already have. No new login, no DB, nothing leaves your machine but the one call Claude already makes.
One cached read, three surfaces
ax quota
~ $ ax quota
window     used  resets
5h         64%   04:29
7d         63%   04:59
7d sonnet  5%    04:59
extra      off
(fetched 0s ago, live)
ax quota --statusline
One plain line for the statusLine command. Poll every render - it's the cache answering, not the API.
ax quota --swiftbar
A SwiftBar/xbar plugin body - the burn rate lives next to the clock. Fetch failures degrade to the stale cache, never a crash in the menubar.
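The "cache answers, stale on failure" behaviour is a read-through cache with a TTL. A minimal sketch, assuming a 60-second TTL and a synchronous fetcher for brevity (a real fetch would be async); `readQuota`, `Quota`, and `CacheEntry` are hypothetical names, and the percentages echo the sample output.

```typescript
interface Quota {
  fiveHourPct: number;
  sevenDayPct: number;
}

interface CacheEntry {
  fetchedAtMs: number;
  value: Quota;
}

type Source = "cache" | "live" | "stale";

const TTL_MS = 60 * 1000;

// Serve from cache while fresh, refetch when expired, fall back to stale data on failure.
function readQuota(
  cache: CacheEntry | null,
  nowMs: number,
  fetchFresh: () => Quota, // sync stand-in for the real network call
): { entry: CacheEntry; source: Source } {
  if (cache !== null && nowMs - cache.fetchedAtMs < TTL_MS) {
    return { entry: cache, source: "cache" };
  }
  try {
    return { entry: { fetchedAtMs: nowMs, value: fetchFresh() }, source: "live" };
  } catch (err) {
    if (cache !== null) return { entry: cache, source: "stale" };
    throw err; // nothing cached to degrade to
  }
}

const cached: CacheEntry = { fetchedAtMs: 0, value: { fiveHourPct: 64, sevenDayPct: 63 } };
const failing = (): Quota => {
  throw new Error("network down");
};

console.log(readQuota(cached, 10000, failing).source);  // prints "cache"
console.log(readQuota(cached, 120000, failing).source); // prints "stale"
```

Pollers such as a statusline only ever see one of three sources, so a failed fetch changes a label rather than crashing the surface.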
api.anthropic.com/api/oauth/usage endpoint the Claude app polls, read with your existing Claude Code OAuth token - macOS Keychain first, ~/.claude/.credentials.json fallback. ax never refreshes the token.
~/.ax/quota-cache.json (60s TTL) so statusline and menubar can poll freely without hammering the endpoint.
the graph talks back · improve from our own graph · 2026-06
Proposals mined from your own transcripts.
ax improve recommend scores improvement proposals out of your transcript graph - each one with an evidence trail and a backtested projected value. Accept one and it becomes a brief an agent acts on. Lint reconciles what actually got applied. Verdicts confirm it or retire it.
Route mechanical subagent dispatches to cheaper models
evidence 39 model-less dispatches on fable/opus matched mechanical routing classes in the last 2d; est $209.59 redirectable. Top classes: well-specified-impl ($95.27), bug-fix ($44.59), spec-review ($32.57).
apply axctl improve accept hook__17b5aaf6aade53e5
Accept is not the end - it's the experiment
recommend · scored, with evidence
accept · .ax/tasks/<id>.md brief
agent applies · like any task file
lint · reconciles guidance
verdict · confirms or retires
Agents write back too - ax improve propose / ax improve analyze let a session file its own proposal mid-run; origin badges keep agent-derived and system-derived suggestions distinguishable.
ax improve recommend for yours
ax improve lint checks it landed. The whole deck - proposals, impact, and past bets measured at +3/+10/+30 sessions - lives in the studio improve dashboard: ax serve.
who's thrashing · churn
Landed, edited, repaired - by source.
Lines of code is a vanity metric until you split it. ax sessions churn --here classifies 30 days of writes into landed vs edit vs repair LOC per provider, counts failed checks, and groups the failures into episodes - so "which sessions thrash" has a number.
Composition of added LOC · 30d
The repair sliver is the point - a tiny repair share means checks catch problems before they ship. The edit band is where the real rework hides: claude-subagent reworks a third of everything it writes.
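The landed / edit / repair split can be read as a single pass over a session's writes and checks. A rough sketch under loud assumptions: after a failed check, every file written so far counts as implicated until a check passes; writes to an implicated file are repair, writes to a file that already holds lines are edit, and everything else is landed. `classifyChurn` and the event shape are hypothetical, and episode grouping is left out.

```typescript
type SessionEvent =
  | { kind: "write"; file: string; addedLoc: number }
  | { kind: "check"; passed: boolean };

interface Churn {
  landed: number;
  edit: number;
  repair: number;
  failedChecks: number;
}

function classifyChurn(events: SessionEvent[]): Churn {
  const totals: Churn = { landed: 0, edit: 0, repair: 0, failedChecks: 0 };
  const written: Record<string, boolean> = Object.create(null); // files that already hold lines
  let underRepair: Record<string, boolean> = Object.create(null); // files implicated by an open failure
  for (const e of events) {
    if (e.kind === "check") {
      if (e.passed) {
        underRepair = Object.create(null); // failure resolved
      } else {
        totals.failedChecks += 1;
        for (const f of Object.keys(written)) underRepair[f] = true;
      }
      continue;
    }
    if (underRepair[e.file]) totals.repair += e.addedLoc;
    else if (written[e.file]) totals.edit += e.addedLoc;
    else totals.landed += e.addedLoc;
    written[e.file] = true;
  }
  return totals;
}

// Made-up session: write, fail, fix, add a new file, pass, then rework.
const churn = classifyChurn([
  { kind: "write", file: "a.ts", addedLoc: 40 },
  { kind: "check", passed: false },
  { kind: "write", file: "a.ts", addedLoc: 5 },
  { kind: "write", file: "b.ts", addedLoc: 10 },
  { kind: "check", passed: true },
  { kind: "write", file: "a.ts", addedLoc: 8 },
]);
console.log(churn);
// prints { landed: 50, edit: 8, repair: 5, failedChecks: 1 }
```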
What an episode is
tool_call that runs a check (tests, typecheck, lint, build) is classified pass/fail by family. LOC written after a failure, touching the same files, counts as repair; later rework of landed lines counts as edit.
--here, a specific --project, or one --source. 30d window by default, --since=N to change it.
receipts, public · profiles
Publish what you actually ran.
ax profile publish turns your local graph into a public gist - counts, dates, trends, the skills and hooks you really lean on. No transcripts, no code, no paths. The nightly compile ranks everyone who opted in.
/leaders

| # | user | tokens |
|---|---|---|
| 1 | @you | 1.8B |
| 2 | @abuilder | 1.2B |
| 3 | @cferreira | 940M |
Boards rebuild nightly from registered gists. Trending skills filter out personal local:* skills - a skill only trends once 2+ builders publish it. See the live boards →
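The trending rule is a filter plus a distinct-builder count. A minimal sketch, assuming each registered gist lists skills by name; `trendingSkills` and `PublishedProfile` are hypothetical, and the sample profiles are made up apart from the skill name superpowers:tdd.

```typescript
interface PublishedProfile {
  github: string;
  skills: { name: string; runs: number }[];
}

// A skill trends only if it is not personal (local:*) and at least minBuilders publish it.
function trendingSkills(profiles: PublishedProfile[], minBuilders = 2): string[] {
  const builders: Record<string, Record<string, boolean>> = Object.create(null);
  for (const p of profiles) {
    for (const s of p.skills) {
      if (s.name.indexOf("local:") === 0) continue; // personal skills never trend
      const seen = builders[s.name] || (builders[s.name] = Object.create(null));
      seen[p.github] = true; // count each builder once per skill
    }
  }
  return Object.keys(builders)
    .filter((name) => Object.keys(builders[name]).length >= minBuilders)
    .sort();
}

const profiles: PublishedProfile[] = [
  { github: "you", skills: [{ name: "superpowers:tdd", runs: 88 }, { name: "local:scratch", runs: 12 }] },
  { github: "abuilder", skills: [{ name: "superpowers:tdd", runs: 40 }, { name: "local:scratch", runs: 3 }] },
  { github: "cferreira", skills: [{ name: "other:skill", runs: 7 }] },
];
console.log(trendingSkills(profiles)); // prints [ 'superpowers:tdd' ]
```

local:scratch appears in two profiles here and still never trends, because the prefix filter runs before the count.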
ax-profile.json
{
"v": 1,
"github": "you",
"window_days": 30,
"stats": {
"sessions": 412,
"streak_days": 9,
"tokens": { "total": 1.8e9 },
"cost_usd": 605
},
"rig": {
"skills": [
{ "name": "superpowers:tdd", "runs": 88 }
],
"hooks": ["enforce-worktree"],
"routing_table": true
}
}
Aggregates only - the exact JSON is shown to you for consent before the first publish. Your profile page renders it live.
Hand the graph to an agent
ax mcp
ax mcp runs a stdio MCP server exposing ax's read-only queries as 17 tools, so an agent can interrogate your graph in-context - recall a past session, pull weighted skills, read a proposal - mid-task. Mutating ops are deliberately not exposed.
recall · sessions_around · session_show · skills_weighted · skills_by_role · skills_roles · roles · improve_recommend · improve_show · improve_list
~/.ax/profile-publish.json; ax profile unpublish deletes the gist and resets it. Nothing leaves your machine until you say yes.