OpenAI's latest Codex research matters because it measures something benchmark tables usually miss: how much work people are actually comfortable delegating to agents in production.

What OpenAI Reported

On June 25, 2026, OpenAI published new economic research on Codex usage. According to the report, by May 2026, 80.6% of sampled individual users had made at least one Codex request estimated to exceed 30 minutes of human work, 70.2% had made one estimated to exceed one hour, and 25.6% had made one estimated to exceed eight hours.

OpenAI also reports that by June 2026, users at the 99th percentile were regularly generating more than 60 hours of Codex agent turns per day across multiple parallel agents.
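To make the 60-hour figure concrete, here is a back-of-the-envelope sketch of the parallelism it implies. It assumes that "hours of agent turns" means wall-clock agent runtime, which the summary above does not spell out. The function names and the `window_hours` parameter are ours, used for illustration only.

```python
import math


def average_concurrency(agent_hours: float, window_hours: float = 24.0) -> float:
    """Average number of agents running at once to accumulate
    `agent_hours` of runtime within a window of `window_hours`."""
    return agent_hours / window_hours


def min_parallel_agents(agent_hours: float, window_hours: float = 24.0) -> int:
    """Smallest whole number of agents that could produce `agent_hours`
    of runtime in the window, if each ran for the entire window."""
    return math.ceil(agent_hours / window_hours)


# The reported figure is "more than 60 hours", so these are lower bounds.
# Assumption: agent-turn hours are wall-clock runtime.
REPORTED_AGENT_HOURS = 60

print(average_concurrency(REPORTED_AGENT_HOURS))                   # around the clock
print(min_parallel_agents(REPORTED_AGENT_HOURS))                   # around the clock
print(min_parallel_agents(REPORTED_AGENT_HOURS, window_hours=8))   # one 8-hour workday
```

Under that assumption, 60 agent-hours in a day works out to an average of 2.5 agents running around the clock, or at least 8 agents if the work is packed into an eight-hour workday.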

Why This Capability Signal Is Different

Most capability discussions focus on whether a model can solve a benchmark or a toy task. This paper is more operational. It asks whether people are willing to hand off work that would otherwise consume a large fraction of a human day, and whether they are willing to do so repeatedly across multiple parallel runs.

That is a much more useful lens for zero-human companies. The important question is not only “is the model smart?” It is “can the system take ownership of enough work, for long enough, across enough threads, to change the labor model?”

Why Non-Developer Adoption Matters

OpenAI says non-developer adoption grew faster than developer adoption and that legal, finance, and recruiting shifted into majority Codex use around April 2026. That matters because it suggests agentic work is escaping the engineering sandbox and moving into the operating functions that define real companies.

Once non-technical teams treat agents as their primary AI tool, the unit of work changes. Instead of “ask a model for help,” it becomes “delegate a block of labor and wait for a reviewed outcome.”

The Take

OpenAI's latest data suggests the frontier is shifting from agent demos toward agent labor markets. Multi-hour, cross-functional, parallel delegation is becoming common enough to measure. That is one of the strongest practical capability signals in the market right now.

Related: See our earlier research on GPT-5.6 Sol, subagent scaling patterns, and subagent performance optimization.