Insights·2026-06-23

How do you prevent coding agents from faking completion and sharing blind spots?

Coding agents fail in three recurring ways: false completion, design without consensus, and single-model blind spots. You close each with a matching harness layer: verification, consensus, and cross-review. The open-source harness oh-my-claudecode (OMC) ships ralplan, which forces a plan through a Planner-Architect-Critic consensus loop before any code is written; ralph, which declares done only after a separate reviewer verifies each user story; and ccg, which cross-reviews so that one model's blind spot is covered by another. The gap in AI results usually comes from the harness, not the model.

How coding agents fail

Put a coding agent into real work and the failures take three shapes. First, it implements half the task and declares it done, reporting completion while tests are skipped and edge cases forgotten. Second, it pours out code from an under-discussed design, only to discover the direction was wrong. Third, it trusts one model's judgment and falls into that model's blind spot along with it.

None of these is a problem of model intelligence. They are problems of how the work is run — the harness. oh-my-claudecode (OMC) offers a workflow for each of the three gaps.

ralplan forces consensus before any code

ralplan is a consensus-based planning workflow. A Planner lays out principles, decision drivers, and two or more options; an Architect raises the strongest counter-argument; a Critic checks the verification standard and testable acceptance criteria.

The point is the loop. Planner, Architect, and Critic re-run up to five times until the Critic approves, and not a single line of code is touched before the plan passes. It catches design-stage mistakes before they are translated into code.
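The loop described above can be sketched in a few lines. This is a minimal illustration, not OMC's implementation: the function names, the Review type, and the toy roles are hypothetical stand-ins for what would be model calls, and only the five-round cap and the "no code before approval" rule come from the text.

```python
from dataclasses import dataclass
from typing import Callable

MAX_ROUNDS = 5  # the loop re-runs up to five times


@dataclass
class Review:
    approved: bool
    feedback: str


def consensus_plan(
    task: str,
    planner: Callable[[str, str], str],
    architect: Callable[[str], str],
    critic: Callable[[str, str], Review],
    max_rounds: int = MAX_ROUNDS,
) -> tuple[str, int]:
    """Return (approved_plan, rounds_used); raise if the Critic never approves."""
    feedback = ""
    for round_no in range(1, max_rounds + 1):
        plan = planner(task, feedback)      # principles, drivers, options
        objection = architect(plan)         # strongest counter-argument
        review = critic(plan, objection)    # checks acceptance criteria
        if review.approved:
            return plan, round_no
        # Feed both the objection and the Critic's notes into the next round.
        feedback = f"{objection}\n{review.feedback}"
    # No approved plan means no code gets written at all.
    raise RuntimeError(f"no approved plan after {max_rounds} rounds")


# Toy stand-ins (hypothetical) so the sketch runs without any model.
def toy_planner(task: str, feedback: str) -> str:
    suffix = " with acceptance tests" if "acceptance" in feedback else ""
    return f"plan for {task}{suffix}"


def toy_architect(plan: str) -> str:
    return "what if the input is empty?"


def toy_critic(plan: str, objection: str) -> Review:
    if "acceptance tests" in plan:
        return Review(True, "")
    return Review(False, "add testable acceptance criteria")


plan, rounds = consensus_plan("login form", toy_planner, toy_architect, toy_critic)
print(plan, rounds)
```

The design choice worth noting is that failure to converge raises instead of returning the last draft, so an unapproved plan can never slip through to implementation.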

ralph treats completion as something to be verified

ralph is a persistence loop. It breaks the work into testable user stories written to prd.json and iterates on each until it passes. Progress and learnings persist across sessions, so it resumes even after stopping midway.

The key difference is the definition of done. ralph does not count work as finished on the agent's own say-so. Completion is granted only after a separate reviewer verifies the work against the acceptance criteria. This structure blocks passing off partial work as done, or deleting tests to make them pass.
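The done gate can be sketched as follows. This is an assumption-laden illustration rather than ralph's code: the Story fields, the function names, and the iteration cap are hypothetical, and the actual layout of prd.json is not shown in this article.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Story:
    id: str
    acceptance: list[str]
    passes: bool = False  # set only by the reviewer, never by the worker


def run_until_verified(
    stories: list[Story],
    implement: Callable[[Story], None],
    review: Callable[[Story], bool],
    max_iterations: int = 10,  # hypothetical safety cap
) -> bool:
    """Iterate until every story is verified by the separate reviewer."""
    for _ in range(max_iterations):
        pending = [s for s in stories if not s.passes]
        if not pending:
            return True
        for story in pending:
            implement(story)               # the worker's own "done" claim is ignored
            story.passes = review(story)   # only the reviewer can grant completion
    return all(s.passes for s in stories)


# Toy stand-ins (hypothetical): US-2 needs two attempts before it verifies.
attempts: dict[str, int] = {}


def toy_implement(story: Story) -> None:
    attempts[story.id] = attempts.get(story.id, 0) + 1


def toy_review(story: Story) -> bool:
    needed = 2 if story.id == "US-2" else 1
    return attempts[story.id] >= needed


stories = [
    Story("US-1", ["returns 200 on valid login"]),
    Story("US-2", ["rejects empty password"]),
]
done = run_until_verified(stories, toy_implement, toy_review)
print(done, attempts)
```

Because implement returns nothing, the worker has no channel through which to declare success; the only way a story's passes flag flips is through the reviewer.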

ccg is cross-review that does not lean on one model

ccg is cross-review across three models — Claude, Codex, and Gemini. It asks Codex about architecture, correctness, risks, and test strategy, and Gemini about usability, alternatives, and documentation clarity, on the same problem at once.

Claude then synthesizes the two answers. The points where the two models disagree are the most valuable, because a blind spot in one model is exposed by the other. This structure trims the confirmation bias of single-model review.
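The fan-out and the search for disagreement can be sketched like this. It is a simplified illustration under stated assumptions: the reviewers here are plain Python functions returning sets of findings, whereas the real workflow calls external CLIs and has Claude synthesize free-form answers. All names in the block are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

Reviewer = Callable[[str], set[str]]


def cross_review(problem: str, reviewers: dict[str, Reviewer]) -> dict:
    """Ask every reviewer about the same problem at once, then split findings
    into those all reviewers share and those only some of them raised."""
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, problem) for name, fn in reviewers.items()}
        findings = {name: future.result() for name, future in futures.items()}
    agreed = set.intersection(*findings.values())
    # Findings not shared by everyone are the candidates for blind spots.
    disputed = {name: found - agreed for name, found in findings.items()}
    return {"agreed": agreed, "disputed": disputed}


# Toy stand-ins (hypothetical) for the two external reviewers.
def codex_like(problem: str) -> set[str]:
    return {"missing null check", "no retry on timeout"}


def gemini_like(problem: str) -> set[str]:
    return {"missing null check", "error message is unclear"}


result = cross_review(
    "review the upload handler",
    {"codex": codex_like, "gemini": gemini_like},
)
print(result)
```

In this toy run the shared finding confirms an issue both reviewers see, while each entry under "disputed" is something the other reviewer missed, which is where the synthesis step would focus.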

Installation and the AX view

There are two install paths. The plugin route is /plugin install oh-my-claudecode; the npm route is npm i -g oh-my-claude-sisyphus@latest. After either one, run /oh-my-claudecode:omc-setup to configure. To use cross-review, also install the Codex CLI (npm install -g @openai/codex) and the Gemini CLI (npm install -g @google/gemini-cli).

In enterprise AX work, most "we tried AI and it was underwhelming" outcomes trace back to the harness, not the model. With the same model, work run without consensus spins in place, work stopped without verification piles up false completions, and work that trusts a single model inherits its blind spots. Change how the work is run before changing the model.