serbyn.io

methodology note

How DW runs and passes are counted

DW is an autonomous, spec-driven development pipeline I designed and built. This note defines the numbers quoted for it so they mean something.

101 logged runs · 0.94 pass rate, on DW, a system I designed and built

What counts as a run

A run is one specification submitted to the pipeline and executed end to end to a terminal state — planning, change, and verification — without a human taking over mid-way. Each run is logged, which is why the count is exact rather than estimated. Aborted or half-completed attempts are logged too; they simply don’t count as passes.

What counts as a pass

A pass is a run whose output satisfies the spec’s acceptance checks — it builds, the tests and evals it was asked to meet are green, and the result matches what the spec described — with no human edits to the produced change. A run that needs a person to finish or fix it is recorded as a non-pass, even if the final code eventually shipped.
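The pass rule above can be written down as a small predicate. What follows is a minimal sketch, not DW's actual code: the `RunRecord` type, its field names, and the `is_pass` and `pass_rate` helpers are all hypothetical, and the sketch assumes every logged record, including aborted attempts, counts in the denominator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """One logged run. All field names are hypothetical, for illustration only."""
    spec_id: str
    reached_terminal_state: bool  # False for aborted or half-completed attempts
    builds: bool                  # the produced change builds
    checks_green: bool            # tests and evals named in the spec are green
    matches_spec: bool            # result matches what the spec described
    human_edited: bool            # a person finished or fixed the change


def is_pass(run: RunRecord) -> bool:
    """A pass needs every acceptance check satisfied and no human edits."""
    return (
        run.reached_terminal_state
        and run.builds
        and run.checks_green
        and run.matches_spec
        and not run.human_edited
    )


def pass_rate(runs: list[RunRecord]) -> float:
    """Passes divided by all logged runs (assumption: aborted runs stay in the denominator)."""
    if not runs:
        raise ValueError("pass rate is undefined for an empty log")
    return sum(is_pass(r) for r in runs) / len(runs)


# Synthetic example log, not real DW data.
example_log = [
    RunRecord("spec-a", True, True, True, True, human_edited=False),   # pass
    RunRecord("spec-b", True, True, True, True, human_edited=True),    # rescued by a human: non-pass
    RunRecord("spec-c", False, False, False, False, human_edited=False),  # aborted: non-pass
    RunRecord("spec-d", True, True, True, True, human_edited=False),   # pass
]
print(pass_rate(example_log))
```

The point of the sketch is that `human_edited` acts as a veto: a run with every check green still fails the predicate if a person touched the change, which is what keeps rescued work out of the numerator.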

Why the honesty matters

A pass rate is only worth quoting if a “pass” can’t quietly include work a human rescued. Counting human-touched runs as non-passes is the same discipline I bring to client agents: a green has to be earned and has to point at its evidence, or it isn’t a green. This figure is measured on my own system, not a client’s.

More on false greens and evidence-binding