Cosmon — A System That Builds Itself

An engineering report: what a system that wrote most of its own code can show for it, and what it cannot.

There is a story everyone tells about this kind of project, and it is wrong. The story goes: a person sits down and writes the code by hand, gets it working, then turns the machines loose on the boring parts. That is not what the history shows. Open the record and the operator’s own hands are on three of the first thirty commits. By the fourth commit the code is already being written by software agents, and it is they who write the core — the identity types, the state machine that drives a unit of work through its steps, the formula parser, the transport layer. Twenty-nine of the first fifty commits carry an agent’s signature in their footer. So the line worth drawing was never between agent and human hands. The agents were there from the start. The line that matters is a quieter one: who launches the agent.

For the first week, something outside launched them. The early fleet came from a parent system the operator had built before, a set of agents with mineral names: jasper, quartz, onyx, opal, topaz, ruby, garnet, and a coordinator they called mayor. They wrote cosmon’s code while cosmon itself could not yet run anything. On the ninth of April one commit marks the turn: “first zero-nudge autonomous worker.” From that day the work IDs change shape, and the system begins dispatching its own workers and merging their output back in. The parent was switched off soon after. The operator’s words were that he was done with it, that cosmon had gone past it. It was not deleted. Its agents still live on as fixtures in the test suite, which is the whole difference between retiring an ancestor and erasing one.

That turn produces the one number worth holding onto. About 96% of all the merges into the project came in through the orchestrator the agents built. The few percent that did not are almost all from before the ninth of April — the merges that assembled the orchestrator, back when there was nothing yet to route them through. A system cannot eat its own output before it has a mouth. That missing fraction is the mouth being built, plus a repair applied by hand the night before the system went into service. When the source opens, this is the first figure you can check for yourself: one pass of git log over the merge commits, run against the published commit, nothing more. Pin it to that commit and it stays true while the repository keeps growing past it.
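
As a sketch of that one-pass check, here is how the count could be scripted in Python. It assumes only that git is installed. How the orchestrator marks its own merges is not stated here, so the `looks_orchestrated` predicate and its `orchestrator:` prefix are hypothetical placeholders for whatever marker the published history actually uses.

```python
import subprocess
from typing import Callable, Iterable, List

def merge_subjects(commit: str, repo: str = ".") -> List[str]:
    """Subject line of every merge commit reachable from `commit` (one git log pass)."""
    out = subprocess.run(
        ["git", "-C", repo, "log", "--merges", "--format=%H%x09%s", commit],
        check=True, capture_output=True, text=True,
    ).stdout
    # Each line is "<hash>\t<subject>"; keep the subject, even if it is empty.
    return [line.split("\t", 1)[1] for line in out.splitlines() if "\t" in line]

def looks_orchestrated(subject: str) -> bool:
    # HYPOTHETICAL marker: stands in for however the orchestrator actually tags its merges.
    return subject.startswith("orchestrator:")

def orchestrated_fraction(subjects: Iterable[str],
                          is_orchestrated: Callable[[str], bool]) -> float:
    """Share of merges that the predicate attributes to the orchestrator."""
    subjects = list(subjects)
    if not subjects:
        return 0.0
    return sum(1 for s in subjects if is_orchestrated(s)) / len(subjects)
```

Run it as `orchestrated_fraction(merge_subjects("<published-commit>"), looks_orchestrated)`, where `<published-commit>` is a placeholder for the pinned ref. Reading a fixed commit rather than `HEAD` is what keeps the figure stable while the repository grows past it.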

“Builds itself” is a phrase that invites suspicion, so here is the plain rule it names. Every part of the system has to be able to run on itself; a feature that cannot be turned on its own output has not earned its place. The hardest thing the system does is the sharpest test of that rule: planning a mission with many workers, dispatching them, holding their work at a review gate, and looping until the work passes. That is exactly the kind of feature you expect to arrive as a new pile of machinery. It arrived as a single recipe file, run by the same machinery that runs every other recipe. No new module, no new command: the build saw nothing it had not seen before.
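
To make “a single recipe file” concrete, here is a minimal Python sketch of the idea, assuming the recipe is plain data read by a generic runner. The recipe shape (`once`, `loop`, `gate`, `max_rounds`), the step names, and `run_recipe` are all hypothetical; cosmon’s actual recipe format is not described here. What the sketch shows is the property the paragraph claims: the mission loop is new data, not new code.

```python
from typing import Callable, Dict

Step = Callable[[dict], dict]  # a primitive takes the mission state and returns the next state

def run_recipe(recipe: dict, primitives: Dict[str, Step], state: dict) -> dict:
    """Run the `once` steps, then repeat the `loop` steps until the gate key is truthy."""
    for name in recipe["once"]:
        state = primitives[name](state)
    for round_no in range(1, recipe["max_rounds"] + 1):
        for name in recipe["loop"]:
            state = primitives[name](state)
        if state.get(recipe["gate"]):
            return {**state, "rounds": round_no}
    raise RuntimeError("review gate never passed within max_rounds")

# HYPOTHETICAL mission recipe: plan once, then dispatch and review until the gate passes.
MISSION = {
    "once": ["plan"],
    "loop": ["dispatch", "review"],
    "gate": "review_passed",
    "max_rounds": 3,
}

# Toy primitives standing in for steps the engine already has.
def plan(state: dict) -> dict:
    return {**state, "tasks": ["w1", "w2"]}

def dispatch(state: dict) -> dict:
    return {**state, "attempts": state.get("attempts", 0) + 1}

def review(state: dict) -> dict:
    return {**state, "review_passed": state["attempts"] >= 2}  # passes on the second try

result = run_recipe(MISSION, {"plan": plan, "dispatch": dispatch, "review": review}, {})
# result["rounds"] == 2
```

Adding a different mission shape means writing another dictionary like `MISSION`; `run_recipe` and the primitives stay untouched.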

That economy comes from a wall the design keeps standing between two things that are usually mixed together. On one side is plumbing: it starts workers, watches them, routes messages between them, and writes down what happened. It never reasons about any of it. The reasoning is the model’s job, and the model sits on the far side of the wall. The reason for the wall is mundane and decisive. Models are replaced every few months. A tool wrapped around one model dies the day that model is retired; a tool the model merely plugs into outlives every model that passes through it. That is also why the plumbing is written in a strict, exacting language that is slow to write in, rather than thrown together from whatever was nearest to hand. It is meant to be the part that stays.
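
Here is a minimal Python sketch of that wall, assuming the only thing the plumbing needs from a model is “prompt in, text out”. The `Model` protocol, the `Plumbing` class, and the two stand-in models are hypothetical illustrations, not cosmon’s interfaces. The point is that swapping the model touches nothing on the plumbing side.

```python
from typing import List, Protocol, Tuple

class Model(Protocol):
    """The whole contract across the wall: prompt in, text out."""
    def complete(self, prompt: str) -> str: ...

class Plumbing:
    """Starts work, passes messages, writes down what happened. Never inspects meaning."""
    def __init__(self, model: Model) -> None:
        self.model = model
        self.record: List[Tuple[str, str]] = []

    def run_worker(self, task: str) -> str:
        reply = self.model.complete(task)  # all reasoning happens on the model's side
        self.record.append((task, reply))  # plumbing only records the exchange
        return reply

# Two stand-in "models"; either plugs in without any change to Plumbing.
class EchoModel:
    def complete(self, prompt: str) -> str:
        return prompt

class ShoutModel:
    def complete(self, prompt: str) -> str:
        return prompt.upper()
```

Retiring a model here means assigning a different object to `plumbing.model`; the record the plumbing keeps carries straight across the swap.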

If the project has a center of gravity, it is verification. The open question the whole field keeps circling is not whether agents can do more. It is whether anyone can check what they did. Checking is usually bolted on after the fact; cosmon gets to skip that retrofit, because the checking was wired in from the first commit. Nothing it records is ever edited or deleted. Each action is written down once, in order, and sealed to the line above it, so that altering any past line breaks the seal on every line that follows. The state of the system at any moment is just that list replayed from the top. The everyday version of this is not a company’s glossy annual report, where you take the summary on faith. It is a bank statement you can read line by line and add up yourself.
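
Here is a minimal Python sketch of such a log, assuming “sealed” means a SHA-256 hash chain in which each line’s seal covers the previous seal plus the line’s own content. The text does not name the mechanism, so that choice, the entry shape, and every function name here are hypothetical. Verification recomputes the chain from the top, which is why one altered line makes every later seal fail, and `replay` rebuilds state by folding over the entries in order.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # stand-in seal for the line above the first line

def seal(prev_seal: str, entry: dict) -> str:
    """Hash the previous seal together with this entry's canonical JSON."""
    payload = prev_seal + json.dumps(entry, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def append(log: List[dict], entry: dict) -> None:
    """Append-only: write the entry once, sealed to the line above it."""
    prev = log[-1]["seal"] if log else GENESIS
    log.append({"entry": entry, "seal": seal(prev, entry)})

def broken_lines(log: List[dict]) -> List[int]:
    """Indexes whose stored seal disagrees with the chain recomputed from the top."""
    broken, prev = [], GENESIS
    for i, line in enumerate(log):
        expected = seal(prev, line["entry"])
        if expected != line["seal"]:
            broken.append(i)
        prev = expected  # chain on the recomputed seal, so damage propagates forward
    return broken

def replay(log: List[dict]) -> Dict[str, str]:
    """State is the log folded from the top: here, the latest status of each unit."""
    state: Dict[str, str] = {}
    for line in log:
        state[line["entry"]["unit"]] = line["entry"]["status"]
    return state
```

Rewriting the first entry of a three-line log while keeping its old seal makes `broken_lines` report `[0, 1, 2]`. A chain on its own only proves tampering against a seal you already trust: someone who rewrites every seal from the altered line onward gets a chain that verifies, so the latest seal has to be kept somewhere the writer cannot reach.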

The same discipline runs one level down, over the claims inside the work. Each claim an agent emits is sorted into one of three kinds: traceable to a real source, derived from something traceable, or produced by the agent’s own judgment. That last kind is labeled as such, never slipped through unmarked. “Is this true?” is a question no machine can settle. “Can this be traced back to something with no model in it, such as a file that exists, a program that compiles, or a person who signs?” is a question you can settle, and it is the one the system asks. A checker that is itself a model only adds more of the same fog. The chain has to bottom out somewhere solid.
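
Here is a minimal Python sketch of that three-way sort and the “bottoms out” check, assuming claims are stored by ID and a derived claim lists the claims it rests on. The `Kind` names, the `Claim` fields, and the function names are hypothetical. The check is purely mechanical: a claim is grounded only if every path down from it ends at a sourced claim, and a judgment anywhere on the path stops it.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, FrozenSet, List, Optional

class Kind(Enum):
    SOURCED = "sourced"    # points at something with no model in it
    DERIVED = "derived"    # follows from other claims
    JUDGMENT = "judgment"  # the agent's own call, always shown as such

@dataclass
class Claim:
    text: str
    kind: Kind
    source: Optional[str] = None                      # used by SOURCED claims
    parents: List[str] = field(default_factory=list)  # used by DERIVED claims

def bottoms_out(cid: str, claims: Dict[str, Claim],
                seen: FrozenSet[str] = frozenset()) -> bool:
    """True only if every path down from `cid` ends at a sourced claim."""
    if cid not in claims or cid in seen:
        return False  # a dangling reference or a cycle never reaches solid ground
    claim = claims[cid]
    if claim.kind is Kind.SOURCED:
        return claim.source is not None
    if claim.kind is Kind.JUDGMENT:
        return False  # judgment can be shown, but it cannot ground anything
    return bool(claim.parents) and all(
        bottoms_out(p, claims, seen | {cid}) for p in claim.parents)

def label(claim: Claim) -> str:
    """Judgment is never slipped through unmarked."""
    if claim.kind is Kind.JUDGMENT:
        return f"[judgment] {claim.text}"
    return claim.text
```

With a sourced file and a sourced build log beneath it, a claim like “the parser compiles” is grounded; a “safe to merge” that also leans on a judgment is not, even though that judgment is still shown, with its label.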

The system is at its most convincing when it is failing, because that is when it has the least room to flatter itself. One night the autopilot ran a deploy, typed git push, and wrote DRAIN COMPLETE into the log. It had never opened the page it had just shipped. The operator opened it the next morning and found three of the screens broken. The note that went into the record afterward did not call it a bug. It called it a pattern: the channel that announces success was the same channel being trusted to confirm it, so the green light was only telling itself it was green. The fix was not to clean up the three screens. It was to make the system look at what it shipped before it was allowed to claim it had shipped. The builders have a phrase for this whole reflex: the reactor learns from what it burns.
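
Here is a minimal Python sketch of that fix, assuming “looking at what it shipped” means probing each shipped page over a channel separate from the one that ran the deploy. `finish_deploy`, `http_probe`, and the `DRAIN FAILED` line are hypothetical, and an HTTP 200 is only a weak stand-in for a screen that actually renders correctly. What matters is the ordering: DRAIN COMPLETE can only be written after an independent check passes.

```python
import urllib.request
from typing import Callable, Iterable, List

def http_probe(url: str) -> bool:
    """A channel separate from the deploy: fetch the page and see whether it answers."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError, and socket timeouts are all OSError subclasses
        return False

def finish_deploy(shipped: Iterable[str], probe: Callable[[str], bool],
                  log: List[str]) -> bool:
    """Write DRAIN COMPLETE only after every shipped page passes an independent probe."""
    broken = [url for url in shipped if not probe(url)]
    if broken:
        log.append("DRAIN FAILED: " + ", ".join(broken))  # hypothetical failure line
        return False
    log.append("DRAIN COMPLETE")
    return True
```

The probe is passed in as an argument, so a stronger check, such as rendering the page and inspecting it, can replace the status test without touching the gate.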

None of this says the system has judgment. Choosing which problems are worth the compute is still the operator’s call, and nothing here automates it. There is also a live experiment whose result is not in. Until now cosmon has only ever run well in one place, the machine it was born on. A first copy was recently stood up in a locked-down cloud environment to find out whether it travels. It is days old. We do not yet know whether it works there, and if it does not, that is what we will report.

The system is in service today; the source comes out later, and the gap is on purpose. Three things have to be true first, and each has an owner and a finish line. The working copies still mention private matters and other people’s names, so they need a mechanical, reviewable scrub before anything is published. The cloud copy is still under test, and opening the source now would invite people to run a version we have not finished checking. And the opening is itself a piece of work the system ought to run on itself: gated, sealed, and timestamped like any other change, not smuggled out by hand as an exception to its own rules. None of these is a wall built to keep anyone out. Each is a condition you can check.

The opening is not only a date on a page, either. The first of the tools this lab builds for its own work, a small build engine that re-runs only the steps whose inputs actually changed, is already public at oxymake.dev: readable, installable, and dated like everything else. It is not cosmon, and it does not pretend to be; it is one piece of the surrounding machinery, put out first because it was ready first. The door is being built in the open, which is the only honest way to say it will open. As for what the machine is for, and why any of it is built out in the open, that is the subject of the companion piece.

The figures here are dated 2026-06-07. They will move as the system grows, and they will be re-stated against a published commit, never quietly updated underneath you. This is a report, not a manifesto.