
February 21, 2026 · Podcast · 1h 3min

OpenAI's Codex Lead: Why Coding as We Know It Is Over

#AI Coding · #OpenAI Codex · #AI Agents · #Software Engineering · #SaaS Disruption

Code generation is trivial. Code review is the bottleneck. And in five years, manually managing deployment will seem as absurd as writing assembly by hand.

That’s the thesis from Alexander Embiricos, product lead for Codex at OpenAI, in this wide-ranging conversation with Harry Stebbings on 20VC. The discussion covers everything from why PMs might be obsolete to how the SaaS landscape is being restructured, but the core thread is a surprisingly concrete vision of how AI agents will reshape software engineering in phases.

The Assembly Language Analogy

Embiricos rejects the lazy framing that “coding will be automated.” His counter: when we moved from assembly to high-level languages, nobody said coding was automated. We just wrote vastly more code, which meant we needed more engineers.

The word “computer” originally referred to humans: the people doing tabulated math, like the card-punching operators at Bletchley Park. The first spreadsheet software was modeled on an office of desks arranged in a grid. Every time a specific task gets automated, demand for the output explodes.

“Now that we no longer write assembly, when that change happened and we moved to higher level languages, did we say coding is automated? Not really.”

His prediction: more builders in five years, not fewer. But the “talent stack” is compressing. The frontend/backend split is fading. On the Codex team, almost everyone is full-stack. And PMs? He half-jokes that you don’t need them. A strong engineering lead or design-minded person can cover everything a PM does, and a PM who isn’t the perfect fit “might do more harm than good.”

The Human Typing Speed Bottleneck

The most provocative idea: human typing speed and validation work are a key bottleneck to AGI, not model compute or architecture.

Harry uses AI 30+ times a day. How many times a day could it help him if using it took zero effort? Tens of thousands. The gap between actual and potential usage is enormous, and it’s created by the friction of prompting.

“I work on this stuff. I know I should be using AI for everything, but I’m too lazy to type out that many prompts and I am too uncreative to figure out all the ways that AI can help me.”

When Embiricos joined OpenAI, he expected multimodal screen-sharing agents within a year. He was “completely wrong.” Multimodal progress was slower than expected. The real path turned out to be agents working through code and text, running independently so humans are no longer the bottleneck.

Three Phases of Agent Evolution

Embiricos lays out a clear phased model:

Phase 1 (current mainstream): AI assists while you code. Tab completion, pair programming. You’re at your laptop with hands on keyboard. This is the Cursor/Copilot model.

Phase 2 (Codex’s current target): You delegate tasks; agents execute independently. With GPT-5.2 Codex in December, the team hit an inflection point. “I’m just going to fully delegate this task. I’m going to have a plan with it, make sure we like the spec, and then let it cook.” Most people on the Codex team don’t open editors anymore.

Phase 3 (future): Agents own entire microservices. Full iterative loop including user feedback, without human review. This requires solving intelligence, safety, and controls simultaneously.

The critical barrier between phases 1 and 2 isn’t technology but user habits. You can’t jump straight to workflow automation before users are fluent with the tooling. Codex learned this the hard way: its cloud agent launched first last year, built around the idea of giving agents their own cloud computers, but “it didn’t work as well.” The team pivoted to interactive products to build user fluency, and is now ready to return to cloud.

Codex App: Delegation, Not Pair Programming

The Codex app is deliberately not an IDE. No text editing. The mental model is managing a team: you assign tasks, provide context, review plans, and wait for results.

“Plan mode” is one of the most valued features. The agent proposes a detailed plan for how it will do something, then asks clarifying questions before executing, like a new hire presenting an RFC before writing code. Review of the plan is becoming more important than review of the code itself.

The team is also investing heavily in automated code review. Codex has been explicitly trained for it, optimized for few false positives so you can trust its feedback. Nearly all code at OpenAI is now automatically reviewed by Codex on push.

The agents.md Standard

A small but significant move: Quinn from Amp (out of the Sourcegraph team) tweeted asking OpenAI to buy the agents.md domain so the ecosystem could standardize on it. OpenAI did. Now agents.md is a shared configuration format across most coding agents (with one notable exception that Embiricos pointedly doesn’t name but is clearly Anthropic’s Claude, which uses its own CLAUDE.md).

Skills are being standardized too, stored in a neutral agents/ folder rather than tool-specific directories. The goal: making it easy to switch between agents. Agent tasks are “episodic,” with vendor-neutral inputs (agents.md) and outputs (git patches).
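To make the format concrete, here is a sketch of what such a file might look like. The contents, sections, and commands below are illustrative assumptions, not taken from the episode or from any real spec text:

```markdown
# AGENTS.md — repo-level instructions that any coding agent can read

## Setup
- Install dependencies with `npm install`; run the test suite with `npm test`.

## Conventions
- TypeScript strict mode; no default exports.
- Keep changes small and include a test with every bug fix.

## Skills
- Reusable skills live in the neutral `agents/` folder, not in a
  tool-specific directory, so switching agents stays cheap.
```

The point of the format is vendor neutrality: because the file is plain Markdown at a well-known path, any agent can consume it without tool-specific configuration.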

But this ease of switching is temporary. As agents connect to external systems (Sentry, Google Docs, enterprise tools), they become stickier. Enterprise trust in agent security controls becomes a moat.

The Slack Analogy: Center of Gravity Wins

From his Dropbox years, Embiricos draws a key lesson. Dropbox assumed users would want to comment directly on documents, since that’s more efficient. Instead, everyone discussed documents in Slack, because Slack was the center of gravity for communication. The less efficient path won on habit.

Applied to AI: the market will converge to a few super-assistant products. Companies don’t need 12 specialized agents. If employees have to figure out which agent handles what, they won’t achieve fluency, and without fluency, they won’t pull automation into their roles.

“Nobody wants to comment on the document. I just want to Slack you.”

The winning onboarding: “Go talk to this thing about anything you need.” Teams share best practices. Hackathons emerge around the tool. A single agent becomes the center of gravity for work.

SaaS: Who Lives, Who Dies

Harry pushes hard on the SaaS question. Embiricos offers a clean framework:

Will survive: Companies owning human relationships or systems of record. Both are more important than ever.

At risk: “Glue layer” companies that own neither.

Will be disrupted: Customer support. “I wouldn’t want to be in that category.”

Harry argues the SaaS sell-off is massively exaggerated: Monday.com users could vibe-code a to-do list, but rebuilding all the customization wouldn’t be worth the cost. Dropbox, though, gets his blunt assessment: “very difficult position.”

The investor landscape has shifted too. The “temporary anomaly” where pure product-building ability was an investment thesis is ending. Building good product is relatively easier now. Invest in founders with distribution thinking and domain expertise. Safe bets: physical infrastructure (energy) and complex relationship networks (Southeast Asian fintech with 500 bank partnerships). “Things OpenAI won’t do.”

The Competitive Landscape

On winning: start with a compute advantage and the best models, build businesses on top to generate revenue, and let that revenue create pressure to improve the models faster. A virtuous cycle.

On the “20-minute SOTA”: a competitor launched a model update 20 minutes before Codex’s GPT-5.3 update. It was briefly state of the art. Then Codex shipped and reclaimed the title.

On Claude Code: “I think the genius of when Claude Code first shipped was they had this tool that was super easy to use in whatever context you want, just in your terminal.” High praise from a direct competitor.

On pricing: Codex Cloud was effectively unlimited for a while. When rolled back to reasonable limits, a vocal minority generated outsized social backlash. “You can’t make things unlimited for too long.”

Career Advice: Agency, Taste, Quality

For CS students entering the workforce: there has never been a better time. AI tools let you ramp into complex codebases at unprecedented speed. But because building is easier, the scarce qualities are agency, taste, and quality.

“When someone writes to me with some interesting thoughts and a link to an interesting project, that gets my attention much more than a normal resume does.”

Some Thoughts

A few threads worth pulling on:

  • The “codegen is trivial, code review is the bottleneck” framing is the most important takeaway. If true, it means the competitive battleground for AI coding tools is shifting from generation quality to verification quality. The company that solves trusted autonomous code review first wins phase 3.
  • The Slack/Dropbox analogy is compelling but cuts both ways. If one super-assistant becomes the center of gravity, that’s extreme winner-take-all. Embiricos’s own logic suggests OpenAI (with ChatGPT’s distribution) is best positioned, which makes this less analysis than positioning.
  • His admission about being “completely wrong” on multimodal agents is refreshingly honest. The bet that “all agents are actually coding agents because coding is just the best way for an agent to use a computer” is a bold claim that has held up so far but could be invalidated by breakthroughs in computer vision.
  • The phased model explains a lot of Codex’s seemingly inconsistent product moves. Launching cloud first, pivoting to interactive, then returning to cloud wasn’t indecision; it was discovering that you can’t skip the fluency step.