ClawSweeper is interesting because it points at a real problem: issue and pull request backlogs get noisy, and maintainers need help separating stale work from things that deserve attention.

The tempting version is also the dangerous one: give an agent a GitHub token, let it read the backlog, and let it comment, close, or mutate state automatically. That may be fine for a disposable repo. It is not where I want the first useful product boundary to live.

The useful product is not an auto-closer. It is a proposal ledger that makes review cheaper without moving authority away from the operator.

The product shape

The version I want is small and boring on purpose. Hermes can snapshot issues and pull requests into a local ledger, classify them with conservative safety gates, ask a model for bounded proposals, and produce a compact report. Nothing is written back to GitHub.

That gives the operator a review surface instead of another autonomous system to supervise. The model can help read, summarize, and suggest. The operator still decides what happens.
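To make the shape concrete, here is a minimal sketch of what a single ledger record could hold. The field names and defaults are my own illustration of the description above, not Hermes' actual schema.

```python
# A minimal sketch of one ledger record; field names and defaults are
# illustrative assumptions, not Hermes' actual schema.
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LedgerRecord:
    number: int                  # issue or pull request number
    kind: str                    # "issue" or "pull_request"
    title: str
    state: str                   # e.g. "open"
    updated_at: str              # ISO timestamp as returned by the GitHub API
    labels: list[str] = field(default_factory=list)
    body: Optional[str] = None   # captured only when the operator asks for it
    comments: list[str] = field(default_factory=list)  # bounded, most recent first
    truncated: bool = False      # marks evidence that was cut short
    snapshot_hash: str = ""      # hash of the captured evidence
    safety_status: str = "needs_review"  # conservative default, never "safe_to_close"
```

Nothing in that record encodes permission to act; it only records what was seen and when.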

What made the cut

The first rule is simple: eligible does not mean safe to close. It means eligible for human review.
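A gate in that spirit might look like the sketch below. The staleness threshold and function name are assumptions; the only property that matters is that passing the gate routes a record to review, never to an action.

```python
# Hypothetical sketch of a conservative eligibility gate: passing it only queues
# a record for operator review, it never authorizes an automatic close.
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=180)  # illustrative threshold, not a real Hermes default

def eligible_for_review(state: str, updated_at_iso: str) -> bool:
    """True if an open record has been idle long enough to deserve a human look."""
    if state != "open":
        return False
    last_touched = datetime.fromisoformat(updated_at_iso.replace("Z", "+00:00"))
    return datetime.now(timezone.utc) - last_touched > STALE_AFTER
```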

The workflow

The current operator loop is:

snapshot --include-body --comment-limit N → propose-model --limit N → report

The snapshot step captures each issue or pull request's current metadata, body, and a bounded window of recent comments. The proposal step uses a constrained model command with no tools and a single turn. The report step gives the operator a compact table with safety status, context availability, proposal decisions, and confidence.
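Scripted end to end, the loop could look roughly like this. The hermes executable name and the default limits are assumptions; the subcommands and flags are the ones shown above.

```python
# The operator loop driven from a script; assumes a "hermes" executable exposing
# the subcommands above, with illustrative default limits.
import subprocess

def run(cmd: list[str]) -> None:
    print("$ " + " ".join(cmd))
    subprocess.run(cmd, check=True)

def operator_loop(comment_limit: int = 20, proposal_limit: int = 10) -> None:
    # 1. Capture metadata, bodies, and bounded recent comments into the local ledger.
    run(["hermes", "snapshot", "--include-body", "--comment-limit", str(comment_limit)])
    # 2. Ask the model for bounded proposals: no tools, one turn, nothing written back.
    run(["hermes", "propose-model", "--limit", str(proposal_limit)])
    # 3. Emit the compact review table: safety status, context, decision, confidence.
    run(["hermes", "report"])

if __name__ == "__main__":
    operator_loop()
```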

That is enough to answer the useful product question: did this help the operator make a better decision faster?

Why body context mattered

The first model-backed run behaved correctly but conservatively. With only titles and metadata, it mostly said: inspect the issue body before deciding. That was the right answer, but not a very useful product.

Adding bounded body and comment intake improved the quality of the proposals. The model could reference the actual claim in the issue, explain what would need verification, and recommend the next operator step without pretending it had enough evidence to close anything.

The bounded part matters. Long context is not a substitute for a product boundary. Capturing enough evidence to be useful, while marking truncation and preserving hashes, keeps the system reviewable.
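A bounded intake step along those lines might look like this sketch, with the character limit as an illustrative assumption: keep a fixed slice of the body, mark whether it was cut short, and hash the full original so the evidence can be re-checked later.

```python
# Sketch of bounded evidence intake: keep a fixed slice of the body, mark
# truncation explicitly, and hash the full original so it can be re-checked.
import hashlib

def capture_body(body: str, max_chars: int = 4000) -> dict:
    """Return bounded text plus a truncation flag and the hash of the full body."""
    return {
        "body": body[:max_chars],
        "truncated": len(body) > max_chars,
        "sha256": hashlib.sha256(body.encode("utf-8")).hexdigest(),
    }
```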

What is deliberately missing

There is no GitHub mutation lane yet. No comments. No closes. No merges. No background agent deciding that old work should disappear.

That is not a missing feature. It is the safety architecture doing its job. The system needs a measured baseline first: run it on a few low-risk records, review every proposal, and see whether human agreement is high enough to justify the next layer.
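That baseline can be measured with something as small as an agreement rate over the reviewed records. The helper below is a hypothetical illustration, not part of Hermes.

```python
# Hypothetical baseline metric: how often did the operator agree with the model's
# proposal on the reviewed records? Both maps are keyed by issue or PR number.
def agreement_rate(proposals: dict[int, str], decisions: dict[int, str]) -> float:
    reviewed = [n for n in proposals if n in decisions]
    if not reviewed:
        return 0.0
    return sum(proposals[n] == decisions[n] for n in reviewed) / len(reviewed)
```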

The product lesson

AI automation gets more useful when it becomes more inspectable. The first product should not be an agent that acts like a maintainer. It should be a ledger that gives the maintainer better evidence, clearer proposals, and fewer tabs to juggle.

If that loop proves reliable, an apply lane can be added later with explicit approval, snapshot-hash checks, policy-hash checks, and narrow actions. If it does not prove reliable, nothing important was delegated to it.
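If that layer ever arrives, the gate could be as blunt as the sketch below: a speculative illustration of the checks named above, not a planned API.

```python
# Speculative sketch of an apply-lane gate; names and fields are assumptions.
ALLOWED_ACTIONS = {"comment", "label", "close"}  # narrow, enumerated actions only

def may_apply(action: str,
              operator_approved: bool,
              snapshot_hash: str,
              current_hash: str,
              policy_hash: str,
              approved_policy_hash: str) -> bool:
    """Every condition must hold; any mismatch drops back to review-only."""
    return (
        operator_approved                         # explicit approval, per action
        and action in ALLOWED_ACTIONS             # narrow action set
        and snapshot_hash == current_hash         # record unchanged since review
        and policy_hash == approved_policy_hash   # policy the operator signed off on
    )
```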

That is the direction I like for Hermes: make delegation cheap, keep authority explicit, and let the operator see exactly why the machine suggested what it suggested.