What the operator stack actually does now

A few months ago, most of my AI work still felt like separate tools taped together.

Chat window. Shell scripts. Local models. Notes I half-remembered writing somewhere. Cron jobs that worked until they didn't. When something went wrong, the system usually had enough pieces to help, but not enough shape to tell me what had happened.

That has changed a bit.

Hermes is now less of a chatbot and more of the place the work passes through. I use it from Discord, but the interesting part is not Discord. The interesting part is that one surface can kick off a site check, ask Mads for a second opinion, search old session context, inspect a repo, run a browser check, write a report, and then tell me what is still unverified.

That sounds obvious when written down. It was not obvious while building it. Most of the work was not about finding a smarter model. It was about making the boring edges visible.

The control plane became the daily path

The main shift is simple: I stopped treating the assistant as a place to ask questions and started treating it as the control plane for the work around the questions.

If I ask what should change on hsynneva.com, the useful answer is not just a list of ideas. The useful path is to inspect the live site, inspect the repo, check what changed recently, ask Mads for a read-only review when the decision involves taste or risk, make the smallest patch, verify locally, deploy, verify production, and write down what actually happened.

That last part matters more than I expected. A green message from an agent is cheap. A URL, a commit, a failing check, a redirect status, or a report artifact is harder to fake.
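A minimal sketch of that distinction, assuming each step of a task is recorded together with whatever checkable artifacts it produced. The `Step` shape, the `split_by_evidence` helper, and the sample values (including the commit hash) are hypothetical placeholders, not something Hermes is known to expose:

```python
from dataclasses import dataclass, field


@dataclass
class Step:
    """One step of a task, plus whatever evidence it left behind."""
    name: str
    # Evidence is anything checkable later: a URL, a commit hash,
    # an HTTP status, a path to a report. Hypothetical shape.
    evidence: list[str] = field(default_factory=list)


def split_by_evidence(steps: list[Step]) -> tuple[list[str], list[str]]:
    """Return (verified, unverified) step names.

    A step counts as verified only if it carries at least one artifact;
    an agent saying "done" contributes nothing here.
    """
    verified = [s.name for s in steps if s.evidence]
    unverified = [s.name for s in steps if not s.evidence]
    return verified, unverified


steps = [
    Step("patch applied", ["commit abc1234"]),  # placeholder hash
    Step("local check", ["exit code 0"]),
    Step("production check"),  # agent reported success, no artifact
]
verified, unverified = split_by_evidence(steps)
print("verified:", verified)
print("still unverified:", unverified)
```

The point of the shape is that "still unverified" falls out of the data rather than out of the agent's own summary.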

This is where Hermes has become useful. Not because it knows everything, but because it can hold the path together long enough for me to see what is real.

Agent work is mostly verification work

The more I use agents, the less impressed I am by autonomous motion on its own.

An agent can produce a lot. That is not the problem anymore. The problem is knowing whether any of it should be trusted.

So the system around the agent has become more important than the model call itself. Mads on OpenClaw is useful as a review lane because it is bounded. It can challenge the plan before I touch the site. It can say “hide a half-finished project instead of deploying it just to clear a 404.” It can catch the tone drift from notebook to product page.

But Mads is not the source of truth. Hermes is still the control plane. The repo, live site, command output, browser console, and deployment response are what settle the question.

That boundary took a while to get right. If every agent can edit, deploy, publish, and declare success, the system becomes fast in the worst possible way. Fast confusion is still confusion.

The current rule is dull, which is usually a good sign: advice first, mutation second, verification last.
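One way to make that ordering mechanical rather than a habit is a small gate that refuses out-of-order phases. This is only a sketch under assumptions: the `Phase` and `OrderedTask` names are invented for illustration, and it assumes every task goes through all three phases.

```python
from enum import IntEnum


class Phase(IntEnum):
    ADVICE = 1        # read-only review, no changes
    MUTATION = 2      # edit, deploy, publish
    VERIFICATION = 3  # check the result against real output


class OrderedTask:
    """Refuses to record a phase until every earlier phase is recorded."""

    def __init__(self) -> None:
        self.completed: list[Phase] = []

    def record(self, phase: Phase) -> None:
        if len(self.completed) == len(Phase):
            raise RuntimeError("task already finished")
        # The next allowed phase is simply the next one in order.
        expected = Phase(len(self.completed) + 1)
        if phase is not expected:
            raise RuntimeError(f"expected {expected.name}, got {phase.name}")
        self.completed.append(phase)

    @property
    def done(self) -> bool:
        """True only after all three phases have been recorded, in order."""
        return len(self.completed) == len(Phase)


task = OrderedTask()
task.record(Phase.ADVICE)
try:
    task.record(Phase.VERIFICATION)  # skipping MUTATION is rejected
except RuntimeError as err:
    print(err)
task.record(Phase.MUTATION)
task.record(Phase.VERIFICATION)
print(task.done)
```

Nothing here decides whether the advice was good or the verification was thorough. It only ensures that no agent can declare success before the check has happened.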

Memory needed a split brain

Memory was another place where the naive version was wrong.

It is tempting to dump everything into long-term memory and hope the assistant gets smarter. In practice, that just creates a fuzzier source of truth. Some things should be remembered because I do not want to repeat myself. Other things should live in a notebook because they need to be inspected, edited, backed up, and treated as records.

So the memory stack split into roles.

Hindsight is useful for recurring facts: preferences, project constraints, stable quirks, things that reduce repeated steering. The Obsidian operator vault is where durable notes belong. Nextcloud backup is the boring safety layer so the vault does not exist on only one machine.
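The split can be written down as a routing table. This is an illustrative sketch only: the item kinds, the `route` and `needs_backup` helpers, and the store identifiers are assumptions made for the example, not an API that Hindsight, Obsidian, or Nextcloud provides.

```python
from enum import Enum


class Store(Enum):
    HINDSIGHT = "hindsight"   # recurring facts that reduce repeated steering
    VAULT = "obsidian-vault"  # durable, human-editable notes


# Hypothetical item kinds; the labels are illustrative, not a real schema.
ROUTES = {
    "preference": Store.HINDSIGHT,
    "project_constraint": Store.HINDSIGHT,
    "stable_quirk": Store.HINDSIGHT,
    "durable_note": Store.VAULT,
}


def route(kind: str) -> Store:
    """Pick a store for an item, failing loudly on unknown kinds."""
    try:
        return ROUTES[kind]
    except KeyError:
        raise ValueError(f"no store assigned for kind {kind!r}") from None


def needs_backup(store: Store) -> bool:
    """In this sketch, only the vault is marked for the Nextcloud backup."""
    return store is Store.VAULT


print(route("preference").value)
print(needs_backup(route("durable_note")))
```

Raising on an unknown kind is deliberate: an item with no assigned home is a decision to make, not something to drop silently into long-term memory.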

That split is not elegant. It is just clearer.

The important part is that memory is no longer treated as magic. It is plumbing. Some plumbing should be searchable. Some should be human-readable. Some should be restorable after I break something at midnight.

What is still messy

The stack still has rough edges.

Some tasks take too many steps. Some publishing paths are more manual than they should be. Some review artifacts are better than others. The distinction between durable memory, project notes, session history, and site content still needs discipline every time.

That is probably the shape of the work for a while.

Not building a perfect autonomous agent. Building enough control surfaces that the agent can be useful without pretending it is trustworthy by default.

That feels less glamorous than the usual AI demo.

Good.