OPEN_SOURCE ↗
REDDIT · REDDIT// 11d agoNEWS
Codex Runs 21 Hours, Ships Feature
A Reddit user claims Codex stayed autonomous for 21 hours while building, testing, and merging a cross-domain feature under an O.M.A.R. gate. It reads less like a benchmark and more like a workflow case study: long agent runs only work when failures get bounced back into the loop instead of handed to a human.
// ANALYSIS
The real signal here is not the duration, it’s the system design. If an agent can keep going through repeated rejections, re-evaluate, and eventually land clean code, that’s a stronger story than raw speed.
- –This is anecdotal, not independently verified, but it suggests Codex can sustain unusually long unattended sessions when the task loop is engineered well.
- –The O.M.A.R. gate is the interesting part: deterministic checks turn a fragile agent into a self-correcting pipeline.
- –The post reinforces a broader pattern in agentic coding: durability comes from guardrails, not from hoping the model “just knows” the right answer.
- –For teams, the practical lesson is that autonomous coding is a systems problem as much as a model problem.
- –The Claude Code comparison is mostly rhetorical, but it highlights the current competitive axis: not just code quality, but how long an agent can operate safely without intervention.
// TAGS
codexclaude-codeagentclitestingautomation
DISCOVERED
11d ago
2026-03-31
PUBLISHED
11d ago
2026-03-31
RELEVANCE
9/ 10
AUTHOR
Adept-Minute-2663