
February 12, 2026 · Podcast · 1h 19min

OpenAI's Head of Platform on Why Engineers Are Becoming Sorcerers

#AI Agents#Future of Work#OpenAI#Software Engineering#AI Platform Strategy

Engineers don’t write code anymore. They manage fleets.

At OpenAI, 95% of engineers use Codex. 100% of PRs are reviewed by Codex. Engineers routinely manage 10 to 20 parallel AI agents working on their behalf. Sherwin Wu, head of platform engineering at OpenAI, describes a world that most engineering orgs haven’t caught up to yet: one where the engineer’s job has shifted from writing code to orchestrating agents, reviewing their output, and deciding what to build next.

Sherwin joined Lenny Rachitsky on his podcast for a wide-ranging conversation covering how AI is reshaping engineering work, why managers are becoming less relevant, the coming wave of one-person billion-dollar startups, and what OpenAI sees in the next 12-24 months of model capability. The tone is practical and grounded. Sherwin speaks from inside the machine.

Managing Agents Is the New Job

The biggest shift isn’t tools. It’s the shape of work. Engineers at OpenAI now spend most of their time as “tech leads” managing agent fleets rather than writing code line by line. Sherwin compares it to casting spells: you dispatch agents to do things, they come back with results, and you evaluate and redirect.

This creates a new kind of stress. There’s a real anxiety people feel when their agents are working and they can’t see what’s happening. One team inside OpenAI is running an experiment with a 100% Codex-written codebase, where humans never touch the code directly. They hit all the expected problems: when something breaks, the team can’t just roll up their sleeves and fix it manually. They don’t have that escape hatch.

The parallel agent workflow also changes how you think about tasks. Instead of sequential deep work, you’re doing breadth-first exploration: kicking off multiple agents on different approaches, evaluating which one is working, and doubling down. Sherwin describes running 10-20 agents simultaneously, which means you need to context-switch rapidly and maintain a mental model of what each agent is doing.
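The fan-out-and-evaluate loop described above can be sketched in a few lines. Everything below is a toy illustration, not OpenAI's actual tooling: the "agents" are stand-in functions and the scoring is a placeholder, but the shape of the loop is the point — dispatch in parallel, collect results, pick a winner, double down.

```python
# Toy sketch of the breadth-first fleet workflow: fan out several agents
# on different approaches, collect their results, and double down on the best.
# The agents here are stand-in functions, not a real coding-agent API.
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_agent(approach: str) -> dict:
    """Stand-in for dispatching one agent on one approach to a task."""
    # In practice this would kick off a long-running agent; here we derive
    # a deterministic placeholder "quality" score purely for illustration.
    score = len(approach) % 5
    return {"approach": approach, "score": score, "result": f"draft for {approach}"}

def explore(approaches: list[str], width: int = 10) -> dict:
    """Fan out up to `width` agents in parallel, then keep the strongest result."""
    with ThreadPoolExecutor(max_workers=width) as pool:
        futures = [pool.submit(run_agent, a) for a in approaches]
        results = [f.result() for f in as_completed(futures)]
    # Evaluate the fleet's output and double down on the best candidate.
    return max(results, key=lambda r: r["score"])

best = explore(["rewrite parser", "patch tokenizer", "add cache layer"])
print(best["approach"])
```

The human's job in this loop is the `max(...)` step: judging which agent's draft is worth pursuing, then kicking off the next round.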

Code Review in 2 Minutes, Not 15

One concrete win: OpenAI cut code review times from 10-15 minutes to 2-3 minutes using Codex. Every PR goes through automated review before a human sees it. The model catches style issues, potential bugs, and consistency problems, leaving human reviewers to focus on architecture and design decisions.

This isn’t just a speed improvement. It changes the review culture. When the machine handles the tedious parts, human reviewers can spend their limited attention on what actually matters: whether the approach is right, whether the abstraction makes sense, whether this is the right thing to build.
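The division of labor described above can be sketched as a two-stage gate. This is a toy illustration, not OpenAI's pipeline: a mechanical pass (a model, or even a linter) bounces style-level problems back immediately, so the human pass only ever sees diffs that have cleared the machine.

```python
# Toy two-stage review gate: a mechanical pass catches style-level issues,
# so human attention is reserved for architecture and design.
# Illustrative only — not OpenAI's actual review pipeline.

def machine_review(diff_lines: list[str]) -> list[str]:
    """Flag mechanical problems a model (or linter) would catch first."""
    findings = []
    for i, line in enumerate(diff_lines, start=1):
        if line != line.rstrip():
            findings.append(f"line {i}: trailing whitespace")
        if len(line) > 100:
            findings.append(f"line {i}: exceeds 100 columns")
        if "TODO" in line:
            findings.append(f"line {i}: unresolved TODO")
    return findings

def review(diff_lines: list[str]) -> str:
    """Route the diff: machine findings bounce back; clean diffs reach a human."""
    findings = machine_review(diff_lines)
    if findings:
        return "machine: " + "; ".join(findings)
    return "human: review architecture and design"

print(review(["def f(x):", "    return x  # TODO drop this"]))
```

The 2-3 minute review time falls out of this structure: by the time a human opens the PR, the tedious findings have already been raised and fixed.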

The Changing Role of Managers

Sherwin is blunt: the traditional engineering manager role is under pressure. When individual contributors can manage fleets of agents and multiply their output by 5-10x, the ratio of ICs to managers shifts dramatically. You need fewer managers, and the ones you keep need to be more technical.

The management skills that matter now are different. It’s less about project management and more about technical judgment: can you evaluate the output of agents? Can you spot when an agent is going down the wrong path? Can you design the right decomposition of a problem so agents can work on it effectively?

Sherwin’s advice to managers: get technical again. The managers who will thrive are the ones who can review agent output with the same rigor they’d review human code.

The One-Person Billion-Dollar Startup

Sherwin believes we’re entering the era of the one-person billion-dollar startup. Not as a hypothetical, but as an emerging reality. When one engineer can manage 20 agents, the effective team size of a solo founder is already 20+.

But the second-order effects are even more interesting. To enable that one-person billion-dollar startup, you might need a hundred small startups building bespoke software and services around it. Sherwin argues this could trigger a golden age of B2B SaaS, not the death of it. The market for specialized tools and services explodes when individuals can operate at enterprise scale.

“To enable a one-person billion-dollar startup, there might be a hundred other small startups building bespoke software.”

Models Will Eat Your Scaffolding for Breakfast

This is Sherwin’s sharpest piece of advice for builders on the OpenAI platform. Don’t over-invest in scaffolding, guardrails, and elaborate prompt chains. The models themselves are improving so fast that today’s clever workaround becomes tomorrow’s native capability.

He’s seen this pattern repeatedly: teams build complex orchestration layers to compensate for model limitations, then a new model release makes all that scaffolding unnecessary. The teams that built thin layers and stayed close to the model’s native capabilities adapted fastest.

“The models will eat your scaffolding for breakfast.”

The practical implication: build for where models are going, not where they are today. Kevin Weil, OpenAI’s VP of product, has a line Sherwin quotes often:

“This is the worst the models will ever be.”

Don’t Listen to Customers (Sometimes)

Sherwin shares a counterintuitive stance on customer feedback in AI: listening to customers is not always the right strategy. The field and the models are changing so fast that customer requests often reflect the current state of the technology, not where it’s heading. By the time you build what a customer asked for, the models may have evolved past the need entirely.

This doesn’t mean ignoring customers. It means understanding that in a rapidly improving capability landscape, the right product decision is often to wait for the model to catch up rather than building elaborate workarounds. The teams that tend to get disrupted are the ones that over-index on current customer pain points.

The Next 12-24 Months of Models

Sherwin offers a window into what OpenAI expects over the next 12-24 months:

  • Reasoning gets much better: Models will become significantly better at multi-step reasoning, planning, and executing complex tasks autonomously.
  • Agent reliability crosses a threshold: The current generation of agents works well for narrow tasks but fails on complex, multi-step workflows. That’s about to change. Agents that can reliably work for hours on open-ended tasks are coming.
  • Cost drops dramatically: The cost curve for inference is plummeting. What costs dollars today will cost pennies. This unlocks use cases that are currently economically impossible.
  • The productivity gap widens: The difference between engineers who are AI power users and those who aren’t is already large and will grow much wider. Early adopters will have a compounding advantage.

OpenAI’s Platform Stack

Sherwin lays out the layers of OpenAI’s developer platform:

  1. Responses API: The lowest-level primitive. You send text, the model works for a while, you get results back. Optimized for long-running agents. Most popular API endpoint.
  2. Agents SDK: A framework layer for building agent systems with sub-agents, guardrails, and task delegation. Handles orchestration of agent swarms.
  3. AgentKit & Widgets: UI components for building beautiful interfaces on top of agents. Standardized components for common agent interaction patterns.
  4. Evals API: Quantitative testing for models, agents, and workflows. Lets you measure whether your system is actually improving.

The philosophy is deliberate: unopinionated at the bottom, increasingly opinionated as you go up. Start at the lowest level you’re comfortable with and add abstraction only when you need it.
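Starting at the lowest layer looks roughly like the sketch below. This is a hedged illustration, not official sample code: the model ID and prompt are placeholders, and the live call to the Responses API only fires if an API key is configured — otherwise the assembled request is just printed.

```python
# Minimal sketch of starting at the lowest layer, the Responses API.
# The model ID and prompt are placeholders; consult the OpenAI docs for
# current model names and parameters.
import os

def build_request(prompt: str, model: str = "gpt-4.1") -> dict:
    """Assemble the minimal Responses API payload: a model and an input."""
    return {"model": model, "input": prompt}

req = build_request("Summarize this PR in two sentences.")

if os.environ.get("OPENAI_API_KEY"):
    # Imported lazily so the sketch still runs without the SDK installed.
    from openai import OpenAI
    client = OpenAI()
    response = client.responses.create(**req)
    print(response.output_text)
else:
    print(req)
```

Only once this single-call shape stops being enough — sub-agents, guardrails, delegation — would you reach for the Agents SDK layer above it.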

Business Process Automation Beyond Code

While most attention goes to code generation, Sherwin sees an even larger opportunity in business process automation. Every company has hundreds of internal processes that run on spreadsheets, email chains, and manual coordination. AI agents can automate these at a fraction of the cost of traditional enterprise software.

This is where the democratization story gets real. You don’t need to be a software engineer to use these tools. Connect ChatGPT to your Notion, Slack, and GitHub. See what it can and can’t do. Understand the limitations now so you can watch the trend as models improve.

A Rare Window

Sherwin entered the workforce in 2014. He describes a stretch of five to six years where tech wasn’t particularly exciting. The last three years have been the most energizing period of his career, and he expects the next two to three years to be a continuation.

His message: don’t take this window for granted. At some point this wave will play out and become incremental. In the meantime, lean in. Build things. Use the tools. You don’t need to track every new release on X. Start small with one or two tools and engage genuinely with what’s possible.

“The next two to three years are going to be some of the most fun in tech and in the startup world that we’ll have in a very long time.”

Afterthoughts

This conversation is valuable less for any single revelation and more for the texture it provides of what it’s actually like to work inside the most advanced AI engineering org in the world. A few things worth sitting with:

  • The 100% Codex codebase experiment is the most telling detail. It reveals both the ambition and the honest reckoning with failure modes. The fact that “you can’t just roll up your sleeves” when agents fail is the defining constraint of this new paradigm.
  • The “models eat scaffolding” insight is the single most actionable takeaway for builders. It’s a strong argument for staying thin, staying close to the model, and resisting the urge to over-engineer.
  • Sherwin’s framing of B2B SaaS as a beneficiary, not a victim, of the one-person startup trend is a contrarian take worth tracking. If he’s right, the current panic about AI killing SaaS is precisely backwards.
  • The widening productivity gap between AI power users and everyone else is perhaps the most consequential trend he names. This is not a technology story. It’s a labor market story.
Watch original →