What 67 days of agent-orchestrated development measure — and the one bit that matters.
For most of software’s history, a person drove every step in the development cycle. Cosmon is an experiment in handing a growing share of its own construction to the agents it orchestrates — and then measuring, honestly, how far that actually goes.
It is not a product announcement. It is a report on a system that has been building itself, in the open, for sixty-seven days, run by one operator. Below are the numbers, with their caveats, and the single idea underneath all of them.
Cosmon rests on four founding principles. Three are consequences of the first.
The first principle is the one to watch, because it is exactly the dynamic the rest of the field is now naming out loud: a system that increasingly builds itself. Cosmon’s bet is that this is only safe if it is bounded (minimum action resists the runaway loop), non-cognitive at the seam (the self-building substrate does not itself think), and measured (no claim survives that the running system contradicts).
The point of the self-reference principle is that it is falsifiable. Here is what it measures, as of 7 June 2026:
The caveats are not a disclaimer; they are the product. The intended reader is a scientist, and for a scientist a number without its method is noise.
Cosmon borrows its vocabulary from physics, and with it the debt: a beautiful metaphor that does not survive contact with observation is decoration, not architecture. The method follows from that debt. Observe the running system; attach numbers; propose the simplest mechanism that compresses them; derive a prediction the model has not yet been tested against; if the prediction fails, update the model, not the data.
“It doesn’t matter how beautiful your theory is… If it doesn’t agree with experiment, it’s wrong.” — R. Feynman
Every claim above is subject to that test. The phase-transition language, the “self-building” language — each is a prediction with a measurable threshold, or it is struck.
There is a reason this report arrives now. The field’s conversation has converged on a single hard problem — not capability, but verification: when an agent acts and a human reviews, what is the smallest record that makes the act checkable by anyone, with no one to trust?
Cosmon answers it structurally, in the code, not in a policy: the events are the system (the state is a projection of an append-only, hash-chained event log), and time is anchored independently of the operator. The bit “what was desired” and the bit “what was observed” are kept separate, with a deliberate one-tick lag between writing and reading — which is also where the safety margin lives.
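A minimal sketch of the shape this paragraph describes, not Cosmon's actual code: an append-only log in which every entry commits to the hash of the one before it, with state computed as a fold over the events. Everything named here is an assumption made for illustration: the `EventLog` class, the `verify` and `project` functions, the `GENESIS` constant, the `kind`/`key`/`value` payload fields, and the choice of SHA-256 over canonical JSON. The independent time anchor and the one-tick lag are not modeled.

```python
import copy
import hashlib
import json

GENESIS = "0" * 64  # hypothetical predecessor hash for the first event


def _digest(prev_hash, payload):
    """Hash the previous link together with a canonical encoding of the payload."""
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


class EventLog:
    """Append-only log; the class exposes no update or delete operation."""

    def __init__(self):
        self._entries = []

    def append(self, payload):
        prev_hash = self._entries[-1]["hash"] if self._entries else GENESIS
        entry_hash = _digest(prev_hash, payload)
        self._entries.append({"prev": prev_hash, "payload": payload, "hash": entry_hash})
        return entry_hash

    def entries(self):
        # Hand out a deep copy so callers cannot mutate the log in place.
        return copy.deepcopy(self._entries)


def verify(entries):
    """Recompute every link; any edited, dropped, or reordered entry breaks the chain."""
    prev_hash = GENESIS
    for entry in entries:
        if entry["prev"] != prev_hash:
            return False
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False
        prev_hash = entry["hash"]
    return True


def project(entries):
    """Fold events into state, keeping 'desired' and 'observed' in separate maps."""
    state = {"desired": {}, "observed": {}}
    for entry in entries:
        payload = entry["payload"]
        state[payload["kind"]][payload["key"]] = payload["value"]
    return state


log = EventLog()
log.append({"kind": "desired", "key": "replicas", "value": 3})
head = log.append({"kind": "observed", "key": "replicas", "value": 2})

state = project(log.entries())   # state is derived from the log, never stored
intact = verify(log.entries())   # anyone holding the entries can run this check

tampered = log.entries()
tampered[0]["payload"]["value"] = 5   # rewrite history in a copy of the log
detected = not verify(tampered)       # the recomputed hash no longer matches

print(state, intact, detected)
```

The design point is that `verify` needs nothing but the entries themselves, so a reviewer does not have to trust whoever produced them. A chain like this shows only that a given copy is internally consistent. Ruling out a wholesale rewrite would still need the head hash to be pinned somewhere outside the operator's control, which is the role the independent time anchor plays in the description above.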
This is the quiet inversion. The hardest thing to build at the scale of the whole field — a verification regime no actor can fake — is the thing a small, self-referential system can have by construction, because it was instrumented from the first commit.
Honesty cuts both ways. Cosmon does not yet demonstrate research judgment: choosing which problems are worth the compute remains the operator's job. And there is a live, falsifiable experiment now underway. Cosmon has only ever run well in its cradle, one operator's machine. A first instance has just been deployed in a hardened cloud environment, with an encrypted core, to test whether it works as well outside that cradle. We do not yet know that it does. We will report what we find, including if it fails.
That is the whole posture: a system that builds itself, measured against the discipline of someone who would rather be proven wrong early than admired late.
The system runs in production today. The public source release comes later — and the gap is deliberate, not a delay. Four engineering reasons hold the door:
None of these is a moat. Each is a checkable precondition with an owner and a finish line. When they are met, the source opens.
Cosmon is intended as an open-source framework: the source is not yet public and will be released under AGPL. This is a report, not a manifesto; the numbers are dated 2026-06-07 and will be restated as the system grows.