A Build Tool You Can Check

A workflow engine has exactly one job that matters: deciding what not to redo. Everything else it does — ordering steps, running them, drawing the progress bar — serves that single judgment call. And if you can’t trust the engine on that one point, it’s worse than useless. You’ll re-run everything by hand to be safe, and you’re back where you started, minus the time you spent installing it.

So the real question under every computational pipeline isn’t “how fast is it?” It’s quieter than that: can I trust what the machine decided to skip? A genomics run over hundreds of samples, an alpha pipeline from ingestion to backtest, a model-training sweep — anywhere a result has to be reproducible, that question is already sitting there. Reproducibility isn’t a box you tick at the end. It’s the thing that earns the engine the right to skip work at all.

OxyMake is a workflow engine built around taking that question seriously. Here is what that buys you on an ordinary morning.

What that looks like on a Tuesday morning

You switch to an old branch to check one thing. git checkout, a few seconds, done. You re-run your pipeline and it starts rebuilding — all of it, including the hours-long job from last week whose inputs are sitting right there, untouched. Nothing about those inputs changed; the checkout just rewrote the dates on the files. You watch the progress bar knowing with complete certainty that the work is pointless. A tool that decides by timestamp can’t know that. It saw fresh dates and did the safe, useless thing.

Under OxyMake the same morning is quiet. It looks at the inputs, recognizes it has built this exact content before, and hands back the answer it already has. The phantom rebuild never starts. How it knows is the next section. That it knows is something you can check on your own pipeline in about five minutes — and you don’t even have to leave the Snakefile you already have:

cargo install --git https://github.com/noogram/oxymake ox-cli
ox translate Snakefile -o Oxymakefile.toml   # reads the Snakefile you already have
ox run                                        # builds it once
ox run                                        # second time: nothing rebuilds

The honest boundary belongs right here, because it is what makes the rest believable. OxyMake only knows about the inputs your rule declares. There is no sandbox. If a step quietly reads a file it never mentioned, that file is invisible to the fingerprint, and the cache can hand you a stale answer with a straight face — a silent false hit, which is worse than the phantom rebuild ever was. Declare what you read, and the guarantee holds. Hide an input, and you are on your own: no tool can fingerprint a dependency it was never told about.
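
A minimal sketch of that failure mode, in Python. It is not OxyMake's code: the `fingerprint` helper, the file names, and the recipe string are all made up for illustration, and SHA-256 from the standard library stands in for BLAKE3. The point is only that a hash over declared inputs cannot move when an undeclared file changes.

```python
import hashlib
import tempfile
from pathlib import Path


def fingerprint(recipe: str, declared_inputs: list[Path]) -> str:
    """Hash the recipe text plus the bytes of every *declared* input."""
    h = hashlib.sha256()
    h.update(recipe.encode())
    for path in sorted(declared_inputs):
        h.update(path.name.encode())
        h.update(path.read_bytes())
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    declared = Path(tmp) / "samples.csv"
    hidden = Path(tmp) / "thresholds.cfg"    # read by the step, never declared
    declared.write_text("a,b\n1,2\n")
    hidden.write_text("cutoff=0.5\n")

    recipe = "python filter.py samples.csv > out.csv"   # hypothetical step
    before = fingerprint(recipe, [declared])

    hidden.write_text("cutoff=0.9\n")        # the real result would change...
    after_hidden_edit = fingerprint(recipe, [declared])

    declared.write_text("a,b\n1,3\n")        # ...but only this edit is visible
    after_declared_edit = fingerprint(recipe, [declared])

stale_hit = before == after_hidden_edit      # same key: the cache would answer
detected = before != after_declared_edit     # new key: the step would rebuild
print(stale_hit, detected)
```

Adding `thresholds.cfg` to the declared list is the whole fix: once its bytes are in the hash, the edit changes the key.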

Ask the bytes, not the clock

So how does it know? Make, Snakemake, Nextflow, and the rest of the family look at when a file was last written. But a timestamp is a rumor. A git checkout resets it to now. So does a copy that doesn’t preserve timestamps (cp without -p), a backup restore, a clock drifting on a shared disk, a fresh clone in CI: all of them move the date without a single byte changing.

OxyMake asks the bytes instead. For each step it computes a fingerprint (BLAKE3) of the things that actually determine the result — the rule’s recipe, the content of its declared inputs, the parameters, the declared environment, the platform. Same content, same fingerprint, nothing to redo. The timestamp survives only as a shortcut: if the date hasn’t moved, don’t bother re-reading the file; if it has, re-read and find out whether the content really changed.
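
The shape of that check can be sketched in a few lines of Python. Everything here is an assumption made for illustration, not OxyMake's implementation: the function names, the in-memory metadata cache, the JSON layout, and the sample values are invented, and `hashlib.blake2b` stands in for BLAKE3, which is not in the Python standard library.

```python
import hashlib
import json
import os
import tempfile
from pathlib import Path

# Hypothetical metadata cache: path -> (mtime_ns, size, content digest).
_seen: dict[str, tuple[int, int, str]] = {}
reads = 0  # counts how often file bytes were actually re-read


def content_digest(path: Path) -> str:
    """Digest of a file's bytes, re-reading only when the metadata moved."""
    global reads
    st = path.stat()
    cached = _seen.get(str(path))
    if cached and cached[:2] == (st.st_mtime_ns, st.st_size):
        return cached[2]                       # date unchanged: take the shortcut
    reads += 1
    digest = hashlib.blake2b(path.read_bytes()).hexdigest()  # stand-in for BLAKE3
    _seen[str(path)] = (st.st_mtime_ns, st.st_size, digest)
    return digest


def step_fingerprint(recipe, inputs, params, env, platform) -> str:
    """Hash everything that determines the result; timestamps are not in it."""
    payload = {
        "recipe": recipe,
        "inputs": {p.name: content_digest(p) for p in inputs},
        "params": params,
        "env": env,
        "platform": platform,
    }
    return hashlib.blake2b(json.dumps(payload, sort_keys=True).encode()).hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "reads.fastq"
    src.write_text("ACGT\n")
    args = ("align {input}", [src], {"threads": 4}, "env-hash-abc123", "linux-x86_64")

    first = step_fingerprint(*args)
    again = step_fingerprint(*args)            # metadata unchanged: no re-read
    reads_after_noop = reads

    st = src.stat()
    os.utime(src, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    after_touch = step_fingerprint(*args)      # date moved: re-read, same bytes
    reads_after_touch = reads

    src.write_text("ACGTTT\n")
    after_edit = step_fingerprint(*args)       # bytes changed: new fingerprint

print(first == after_touch, first != after_edit)
```

The bumped timestamp costs one extra read and nothing else. Only the edit that changes bytes changes the fingerprint.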

That swap — the clock for the content — is the whole idea, and the project is blunt about where it ranks. Determinism here isn’t a feature sitting on top of the engine; it’s the floor the engine stands on. A cook doesn’t remake a dish because the clock struck the hour. He remakes it when the ingredients have actually changed.

A question with a long history

None of this is new, and OxyMake doesn’t pretend it is. Make taught pipelines, fifty years ago, to rebuild only what changed. Snakemake carried that discipline into research in 2012 — wildcards, a real dependency graph, rules that read like a recipe — and three thousand papers later its core ideas are still sound. People reach for Snakemake for good reasons, and OxyMake keeps almost everything they built.

What aged is the implementation, not the idea. The pipeline description is a full Python program, so you can’t always tell what it will do without running it. And change detection that trusts a file’s timestamp has exactly the weak spot the Tuesday morning showed. The ideas are fine. The plumbing under them has a soft joint.

What you keep, and what you give up

Two things about the engine are easy to miss. Then comes the part most announcements leave out.

The pipeline is a file you read, not a program you run. It is flat, declarative TOML — rules, inputs, outputs — and you can open it and see what it does without executing anything. The code lives inside the steps; the shape of the pipeline stays inspectable. The cost, stated plainly: generating a complicated config means doing that generation outside, in your own scripts. A deliberate trade, not an oversight.

There is no daemon and no server. A single SQLite file coordinates everything: nothing to install, nothing running in the background, nothing to keep alive. ox run doesn’t mean “launch these jobs”; it means “make sure these outputs exist,” which is convergent and idempotent. Run it twice and nothing extra happens. Run two sessions on disjoint parts of the graph and they cooperate instead of colliding. The cost again, said out loud: that file has to live on a local disk, not a network share.
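
To make “make sure these outputs exist” concrete, here is a small Python sketch of an idempotent run loop over a SQLite table. The schema, the `ensure` function, and the fingerprint strings are hypothetical. OxyMake's actual state layout is not described here, and the sketch says nothing about how concurrent sessions coordinate.

```python
import sqlite3


def ensure(conn, output, fingerprint, build):
    """Make sure `output` exists for `fingerprint`; return True if work was done."""
    row = conn.execute(
        "SELECT fingerprint FROM outputs WHERE name = ?", (output,)
    ).fetchone()
    if row is not None and row[0] == fingerprint:
        return False                         # already converged: nothing to do
    build(output)
    with conn:                               # one transaction: commit or roll back
        conn.execute(
            "INSERT OR REPLACE INTO outputs(name, fingerprint) VALUES(?, ?)",
            (output, fingerprint),
        )
    return True


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outputs(name TEXT PRIMARY KEY, fingerprint TEXT)")

built = []                                   # records every build that really ran
plan = {"aligned.bam": "fp-1", "calls.vcf": "fp-2"}


def run(plan):
    """Ask for every output in the plan; count how many had to be built."""
    return sum(ensure(conn, out, fp, built.append) for out, fp in plan.items())


first_run = run(plan)                        # cold: both outputs are built
second_run = run(plan)                       # same plan again: nothing to do
plan["calls.vcf"] = "fp-3"                   # one step's fingerprint changes
third_run = run(plan)                        # only that step is rebuilt
print(first_run, second_run, third_run)
```

The caller never says “run this job.” It states the outputs it wants, and the loop does only the work needed to get there.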

Now the part the launch posts skip: the price of the swap. Start with what you gain, because that is the case you hit every day. On a no-op re-run at ten thousand jobs — the second time, the tenth, the run right after a branch switch — OxyMake is 7.54× faster down the metadata path, and still 4.02× faster when it re-hashes every byte to be certain. That is the pipeline you already built, asked to prove it’s current, and answering fast.

The cost is the cold run. The first time, with nothing cached, OxyMake is slower than Snakemake, because it stops to fingerprint everything: it runs at 0.80×, 0.44×, and 0.70× of Snakemake’s speed at 100, 1,000, and 10,000 jobs, which is 1.25 to 2.3 times slower. The paper measures it end to end and doesn’t flinch; its own words are “we own this result.” You pay once, cold, to make every later run cheap and certain.
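
The two ways of stating the cold-run cost are reciprocals of each other. A quick check in Python, assuming each “×” figure is a relative speed (Snakemake's time divided by OxyMake's), which is the reading that matches the “1.25 to 2.3 times slower” summary:

```python
# Relative speed of OxyMake on a cold run, by job count (figures from the text).
ratios = {100: 0.80, 1_000: 0.44, 10_000: 0.70}

# A relative speed r means the run takes 1 / r times as long.
slowdown = {jobs: round(1 / r, 2) for jobs, r in ratios.items()}

best, worst = min(slowdown.values()), max(slowdown.values())
print(slowdown, best, worst)
```

The worst point is the middle one, 1,000 jobs, not the largest.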

Every one of these numbers comes from one reproducible benchmark, bash bench/snakemake-vs-oxymake/run.sh, measured on 2026-06-10 on a single Apple M4 Max — a Linux/x86_64 re-run is still pending, and the hundred-thousand-job point is projected, not measured. So don’t take the 7.54× on faith either: run the bench on your own box and see what your hardware says.

A different bet, and where it fits

One fact keeps the lineage honest, and a happy Snakemake user will reach for it first: under that same clock-churn test, Snakemake 7.x rebuilds zero jobs too. Since its 7.x series it records provenance in a side file next to each output and compares against that instead of the timestamp. So this isn’t OxyMake fixing a broken Snakemake — both answer the same danger; they answer it differently.

Here is the difference, concretely. Snakemake trusts a record it wrote alongside the result. OxyMake recomputes the answer from the bytes themselves and keeps no record it has to trust. When the side file and the bytes agree, you will never tell them apart — and if your pipelines live on one stable workstation, Snakemake 7.x already does this well and you should stay. The gap opens when the side file and the bytes can’t agree: a throwaway CI runner with no history, a fresh clone, a result another machine copied onto a shared disk. There the bookkeeping is missing or not yours to trust, and recomputing from content is the bet that pays. That’s the day to switch, and not before. You can find out which bet you prefer without leaving Snakemake at all — ox translate reads your Snakefile and runs the DAG you already wrote; don’t like it, and nothing has changed, the Snakefile is untouched. (The steps inside can be Python, R, or Julia; OxyMake talks to all three without marrying a version of any.)

That posture — compose, don’t compete — is the one OxyMake takes toward the rest of the stack too. Reproducibility has layers, and OxyMake is honest about which ones it owns. It handles the workflow layers: same plan, same graph; same graph, same outputs; a run you can audit afterward. The layer underneath — same input, same binary — it hands to the tool built for that: Nix, Guix, or Docker. OxyMake records the environment’s hash and runs each step inside it (environment.nix = "<flake>" wraps the step in nix develop -c). OxyMake and Nix together cover the whole stack: Nix pins the bytes of your tools, OxyMake decides which steps to re-run. Neither tries to be the other.

That same honesty is where OxyMake meets FAIR, above all the R: reusable, which here includes reproducible. The content fingerprint gives every artefact a stable identity instead of “the file from Tuesday” (Findable); the readable TOML and ox translate keep it open to the tools already in use (Interoperable); the --json event stream is the raw material of a provenance trace (Reusable). Packaging that trace in the community’s own formats, RO-Crate and W3C PROV, isn’t in the engine today; it is a named commitment for the v1.1 cycle, and the paper says so in the same breath. The alignment that ships is real; the export layer is promised, not pretended.

The guarantee, verified by a machine

You don’t need any of this to try OxyMake — the command is below, and the engine runs whether or not you read on. This section is for the reader who wants to know why the skip is safe to trust, not just that it usually is.

Picture a concurrent program as several cooks in one kitchen. Each moves at their own pace; one can stall at the worst possible moment; two can reach for the same pot. The nasty bugs aren’t in any single line of code — they live in the order the steps happen to interleave, and there are billions of possible orders where a test suite only ever tries a handful. TLA+ is a way of writing the kitchen’s rules down — here are the allowed moves, here is what must never happen — and a checker called TLC then takes a small kitchen, two or three cooks and two or three pots, and walks every possible order without leaving one out.
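
The “walk every possible order” idea fits in a few lines of Python. This is an analogy, not TLA+ and not one of OxyMake's specs: two made-up cooks each add one to a shared counter, first as a separate read and write, then as a single indivisible step, and the loop tries every schedule of that tiny kitchen.

```python
from itertools import permutations


def interleavings(n_procs: int, steps_per_proc: int):
    """Yield every schedule: a tuple of process ids in which each id appears
    steps_per_proc times. A process's own steps always run in order."""
    tokens = [p for p in range(n_procs) for _ in range(steps_per_proc)]
    yield from sorted(set(permutations(tokens)))


def run(schedule, atomic: bool) -> int:
    """Each process increments a shared counter once.
    Non-atomic: step 0 reads into a local, step 1 writes local + 1.
    Atomic: read and write happen together on step 0; step 1 does nothing."""
    counter = 0
    local = {}
    pc = {}                                   # per-process program counter
    for p in schedule:
        step = pc.get(p, 0)
        pc[p] = step + 1
        if atomic:
            if step == 0:
                counter += 1
        elif step == 0:
            local[p] = counter
        else:
            counter = local[p] + 1
    return counter


schedules = list(interleavings(n_procs=2, steps_per_proc=2))

# Invariant: after both cooks finish, the counter must be 2.
broken = [s for s in schedules if run(s, atomic=False) != 2]
fixed = [s for s in schedules if run(s, atomic=True) != 2]

print(len(schedules), len(broken), len(fixed))
print("first counterexample:", broken[0])
```

A test that happens to run one cook after the other sees only the schedules that pass. Enumerating all of them finds the lost update straight away, and shows that none remain once the step is indivisible.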

OxyMake points this at the corner that matters most: the promise that the cache never lies. CacheConsistency states it precisely enough for a machine to attack it — a cache hit can only return content that was actually produced and recorded, filed under a name that is computed, never guessed. That is the contract the Tuesday morning rests on. One assumption inside it has teeth, and it is worth knowing before you deploy: the guarantee holds only while SQLite commits each change in one indivisible step, which is true on a local disk and false on a network share like NFS, Lustre, or GPFS — exactly where you would most want several sessions at once. So the engine insists its state journal live on local disk; point it at a network share and that guarantee loses its footing.
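
One way to read that contract as an executable check, sketched in Python. The real statement is a TLA+ invariant. The `Cache` class, its fields, and the `cache_consistent` function below are hypothetical names chosen for this illustration, and SHA-256 stands in for BLAKE3.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def fingerprint(recipe: str, input_bytes: bytes) -> str:
    """The cache key is always computed from what determines the result."""
    return digest(recipe.encode() + b"\0" + input_bytes)


class Cache:
    def __init__(self) -> None:
        self.produced: set[tuple[str, str]] = set()  # (key, output digest) really built
        self.entries: dict[str, bytes] = {}          # what a hit would return

    def record(self, recipe: str, input_bytes: bytes, output: bytes) -> str:
        key = fingerprint(recipe, input_bytes)
        self.produced.add((key, digest(output)))
        self.entries[key] = output
        return key

    def lookup(self, recipe: str, input_bytes: bytes):
        return self.entries.get(fingerprint(recipe, input_bytes))


def cache_consistent(cache: Cache) -> bool:
    """Every servable entry must match something that was produced and recorded."""
    return all((k, digest(v)) in cache.produced for k, v in cache.entries.items())


cache = Cache()
cache.record("sort {input}", b"b\na\n", b"a\nb\n")

hit = cache.lookup("sort {input}", b"b\na\n")    # same recipe, same bytes
miss = cache.lookup("sort {input}", b"c\na\n")   # different bytes: different key
ok_before = cache_consistent(cache)

# Simulate the bug the invariant forbids: an entry filed under a guessed name.
cache.entries["looks-about-right"] = b"stale output"
ok_after = cache_consistent(cache)

print(hit, miss, ok_before, ok_after)
```

The sketch checks a single state. A model checker evaluates an invariant like this in every state it can reach.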

“Verified,” here, means bounded model-checking — every interleaving of a small fixed kitchen — not a proof for instances of any size. The paper states the bounds exactly, lists the handful of things the model trusts rather than checks (that rename is atomic, that fsync really persists, that the executor reports status honestly, and a few more), and ships the deliberately broken specs you can run to watch the checker catch the bug in a few steps. That last part is the tell that the specs test something real rather than decorate the repository.

Built where you can watch

There is a second place OxyMake shows its work instead of asking for trust, and it is the part that makes this a noogram release rather than one more tool on a shelf.

Start with the rule that held the entire time: a person owns the merge queue. No commit reached the main line without passing the full automated check and a human reading the diff — and the parts that matter most for correctness, the cache logic that decides what not to rebuild, were the most reviewed of all, their tests written before the code that had to pass them. Only against that backdrop is the next sentence the right kind of surprising: most of the code in this repository was written by AI agents under one human maintainer. Of about fifteen hundred commits, roughly two thirds were authored by agents.

For a tool whose whole job is deciding what work to skip, “who wrote the part that decides” is a fair question, and the project answers it. The agents are named, one by one, in the project’s making-of write-up. A caveat in the same breath: the commit history was squashed when the repository was opened, so that write-up — not the public commit log — is where the per-agent record now lives, and a write-up is something its author controls. So don’t lean your trust on it. Lean it on what you can check for yourself: the code is public and the tests are public, the cache logic most of all. You can read the part that decides what to skip, and run the tests that gate it, without taking anyone’s word for who typed which line.

The work ran in two chapters. The first months used Gas Town, Steve Yegge’s agent-orchestration rig, where worker agents carry names out of Mad Max and a patrol watches the merge queue. Then the maintainer’s own orchestrator, cosmon, took over, and runs the repository still. Two different machines for pointing a fleet of agents at one problem, one after the other — and through both, the same human-owned merge queue.

The full version

If you want the complete argument, it is in the paper: OxyMake: A Formally-Specified, Content-Addressable Workflow Engine. The detail worth the click isn’t a superlative — it’s that the paper publishes the benchmark that changed the authors’ minds. OxyMake’s own default setting was once the engine that got fooled by clock churn, rebuilding two thirds of a pipeline; the measurement condemned the shortcut, and the default was changed. A project that prints the bench that made it change its mind is one worth reading in full.

→ https://github.com/noogram/oxymake/blob/main/docs/paper/oxymake-paper.pdf

Try it

OxyMake is public: the repository, the docs, the first tagged release with ready-to-run binaries.

Docs and worked examples at oxymake.noogram.dev
Code at github.com/noogram/oxymake
Install straight from the repository:

cargo install --git https://github.com/noogram/oxymake ox-cli

Two honest notes, in the order that matters. For support, this is a best-effort project from a single maintainer with a demanding day job: answers come in weeks rather than days, and the quiet stretches are announced rather than left to guesswork. But to try it you need no one. What’s there works on its own — the tests gate every change, the entire state is one SQLite file you can delete and rebuild, and if you came in through ox translate, your Snakefile never moved, so the worst case is simply the setup you already had. The rebuild that shouldn’t happen doesn’t. That is the part you can lean on.