January 29, 2026 · Podcast · 1h 3min
Jerry Tworek on Why Static Models Will Never Be AGI
The architect of OpenAI’s reasoning models has concluded that everything he built isn’t enough. Jerry Tworek, who led the development of o1, o3, and Codex as VP of Research at OpenAI, left the lab precisely because he believes the current paradigm has a ceiling. The models are brilliant at what they’re trained for. They just can’t get unstuck.
Episode Overview
Jerry Tworek spent seven years at OpenAI, watching it grow from a 30-person research lab into one of the most valuable companies in history. He introduced reasoning models to the world, a shift he calls “tectonic.” Now he’s walked away, citing a very specific intellectual itch: the models he helped build can solve olympiad problems and prove new mathematical theorems, but they become “hopeless” the moment they hit an unfamiliar wall. There is no mechanism for self-correction, no ability to update beliefs based on failure. For Jerry, that gap is the difference between a tool and intelligence.
This conversation covers the honest mechanics of scaling RL, why labs are converging on similar approaches (and why that might be dangerous), the pivotal decisions that made OpenAI what it is, and why focus explains 95% of competitive outcomes in AI.
You Get What You Train For
Jerry’s mental model of current AI capabilities is brutally simple: scaling works, it’s predictable, and you get exactly what you train for. Nothing more.
Pre-training builds a linguistic world model. Reinforcement learning acquires specific skills. In both cases, there are “basically no limits” to improving on your training objective. If you care about a skill, you do RL on it and the model gets good. That part is settled.
The unsettled question is generalization. How do models perform outside their training distribution? The answer, Jerry suggests, is “probably not that great.” This creates a fork in the road: either we’re still early in RL and generalization will emerge with more scale, or we fundamentally need something new.
Jerry leans toward the latter, but frames it as an economics question. The current loop of adding targeted data to fix weaknesses is “really powerful” but slow. Every quarter, every lab releases a better model, mostly by scaling compute and, more importantly, adding data targeted at what the previous model was bad at. It works. The question is whether something could give us “more results with less data.”
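A toy sketch of that loop, with every name and number invented for illustration: a “policy” reduced to a per-skill success rate stands in for model weights, a verifier provides the fast reward signal, and the point is that reward only ever flows to the skill you trained on.

```python
import random

random.seed(0)

def verifier(task, answer):
    """Fast, unambiguous reward signal: 1 if the answer checks out, else 0.
    Competition math and coding tests give exactly this kind of signal."""
    return 1.0 if answer == task["expected"] else 0.0

def sample_answer(policy, task):
    """Stand-in for model sampling: the policy here is just a success rate."""
    return task["expected"] if random.random() < policy[task["skill"]] else None

def rl_step(policy, task, lr=0.05):
    """One policy-improvement step: reward nudges the trained skill upward
    and touches nothing else."""
    reward = verifier(task, sample_answer(policy, task))
    policy[task["skill"]] += lr * reward * (1.0 - policy[task["skill"]])

# Train exclusively on competition math; never on the untargeted skill.
policy = {"competition_math": 0.2, "novel_research": 0.2}
for _ in range(2000):
    rl_step(policy, {"skill": "competition_math", "expected": 42})

print(policy)  # competition_math climbs toward 1.0; novel_research sits at 0.2
```

The trained objective improves essentially without limit; whether any of it transfers to the untouched skill is exactly the generalization question left open above.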
The Surgeon Who Breaks the Rules
When asked what RL can and can’t do today, Jerry reaches for a striking analogy. The boundary isn’t about domain but about feedback signals.
Coding and math competitions have clear, fast feedback. Writing a good book? You might need to wait years for market signals. Starting a company? Five to ten years before you know if early decisions were good or just lucky. RL needs a signal, and some of the most important human activities have signals that are slow, ambiguous, or nonexistent.
But the most interesting case is the expert surgeon who goes against established practice based on experience, does something never done before, and succeeds. Jerry believes models could eventually do this too, given enough time and ability to try. The question is how long that would take with today’s architectures.
Why Static Models Can Never Be AGI
This is Jerry’s biggest intellectual update and the reason he left OpenAI. He now believes continual learning is a necessary condition for AGI, not just a nice-to-have.
The core observation: when current models fail, they become “hopeless pretty quickly.” You can paste error messages, offer words of encouragement, try different prompts. But fundamentally, there’s no mechanism for a model to update its internal knowledge based on failure. It either solves the problem on its first approach or it doesn’t.
“Intelligence always finds a way. Intelligence works at the problem and probes it until it solves it, which the current models do not.”
This makes AGI definitions deeply personal, Jerry notes. Models solving olympiad problems and proving new theorems might qualify as AGI by some definitions. But the feeling of “hopelessness” when a model hits a wall it can’t get past disqualifies it for him.
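A concrete, if cartoonish, way to see the difference (everything here is invented for illustration): a frozen model keeps sampling from the same distribution no matter how many error messages you paste into its context, while a learner that can update its internal state works the problem until it cracks.

```python
def check(answer):
    """Toy verifier: the 'wall' is that only one unusual approach works."""
    return answer == "approach_C", f"{answer} did not work"

class StaticModel:
    """Weights frozen at deployment, as with today's models."""
    def __init__(self):
        self.belief = {"approach_A": 0.90, "approach_B": 0.09, "approach_C": 0.01}

    def generate(self, context):
        # Pasted errors and words of encouragement grow the context,
        # but the same frozen beliefs get sampled every single time.
        return max(self.belief, key=self.belief.get)

class ContinualLearner(StaticModel):
    def update_on_failure(self, failed_answer):
        # The missing mechanism: failure revises internal beliefs.
        self.belief[failed_answer] = 0.0

def attempt(model, learns, retries=5):
    context = "prove the theorem"
    for _ in range(retries):
        answer = model.generate(context)
        ok, error = check(answer)
        if ok:
            return answer
        context += f"\n{error}. Please try again."  # all a static model gets
        if learns:
            model.update_on_failure(answer)
    return None  # "hopeless": nothing internal was changed by the failures

print(attempt(StaticModel(), learns=False))      # None, stuck on approach_A
print(attempt(ContinualLearner(), learns=True))  # approach_C, it found a way
```

The static loop is just the deployed retry pattern Jerry describes, stripped to its skeleton: the context changes, the weights never do.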
The Fragility Problem
The technical challenge behind continual learning is more fundamental than it appears. Current deep learning training is inherently fragile. Keeping models “on the rails” requires enormous effort. Without that effort, training “explodes” and you don’t get a good model.
“It’s fundamentally different to how humans learn. Human learning is much more anti-fragile.”
Jerry marvels at how rarely humans “crash out and start talking gibberish” after receiving new information, while AI models do it “pretty frequently.” This robustness of the learning process itself is what he sees as necessary for continual learning to work.
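A toy picture of that fragility, with invented numbers: deep learning runs stay healthy only because of guardrails like gradient clipping. Feed the same occasional pathological batch to an unguarded run and it never recovers, which is roughly what “crashing out” looks like in weight space.

```python
import numpy as np

loss = lambda w: float(np.sum(w ** 2))  # toy objective, minimum at zero

def grad(w, step):
    """Honest gradients most of the time, plus the occasional pathological
    batch of the kind that makes an unguarded run explode."""
    g = 2 * w
    return g * 1e200 if step % 50 == 37 else g

def clipped_step(w, g, lr=0.1, max_norm=1.0):
    """Gradient clipping: one of the many rails. Remove it and the very
    same data destroys the run."""
    norm = np.linalg.norm(g)
    if norm > max_norm:
        g = g * (max_norm / norm)
    return w - lr * g

w_guarded, w_raw = np.ones(4), np.ones(4)
for t in range(200):
    w_guarded = clipped_step(w_guarded, grad(w_guarded, t))
    w_raw = w_raw - 0.1 * grad(w_raw, t)  # same updates, no rails

print(loss(w_guarded))  # small: stays on the rails despite the bad batches
print(loss(w_raw))      # inf/nan: crashed out, and no later step brings it back
```

Fragile in exactly Jerry's sense: the guarded run survives only because someone anticipated the failure mode, while a human learner shrugs off a bad day of inputs without any scaffolding at all.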
Why hasn’t continual learning been solved? Jerry suspects it requires research at scale, and only a handful of well-funded labs can operate at that scale. Those labs have been “busy doing other things,” pursuing the proven RL paradigm rather than exploring fundamentally different approaches.
The Airplane Problem
When asked about the convergence of major labs working on increasingly similar things, Jerry offers a memorable analogy: why do all commercial airplanes look the same? Because the forces of economics are fundamentally strong. If you want to compete, you need the best models at the lowest price. Customers can switch whenever they want.
This creates a prisoner’s dilemma. Exploration, trying something fundamentally different, exposes you to losing market share. There’s a natural tension between exploration and exploitation that has “no real solution” because you don’t know the landscape of what you haven’t tried.
“You have 100 researchers that think the same thing. You essentially have one researcher.”
But Jerry pushes back on the idea that it doesn’t matter who discovers the next breakthrough. OpenAI’s first-mover advantage in both pre-training transformers and large-scale RL gave it compounding advantages, much like the early leads in semiconductor manufacturing that trailing countries never managed to close. Ideas diffuse, yes, but “the lead can be a very, very powerful thing.”
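The exploration-exploitation tension Jerry describes has a textbook formalization in the multi-armed bandit. A minimal epsilon-greedy sketch, with payoffs invented purely for illustration: a “lab” that only exploits its proven direction never even measures what the risky idea was worth.

```python
import random

def run_lab(epsilon, steps=10_000, seed=0):
    """Epsilon-greedy bandit as a cartoon of lab strategy: exploit the
    proven paradigm, or spend some fraction of effort on a risky idea
    whose payoff is unknown until you actually try it."""
    rng = random.Random(seed)
    payoff = {"scale_proven_rl": 1.0, "risky_new_idea": 1.5}    # hidden truth
    estimate = {"scale_proven_rl": 1.0, "risky_new_idea": 0.0}  # prior: unproven
    counts = dict.fromkeys(payoff, 0)
    total = 0.0
    for _ in range(steps):
        if rng.random() < epsilon:
            action = rng.choice(list(payoff))         # explore
        else:
            action = max(estimate, key=estimate.get)  # exploit
        reward = payoff[action] + rng.gauss(0, 0.1)   # noisy feedback
        counts[action] += 1
        estimate[action] += (reward - estimate[action]) / counts[action]
        total += reward
    return total / steps

print(run_lab(epsilon=0.0))  # ~1.0: never discovers the better direction
print(run_lab(epsilon=0.1))  # ~1.45: exploration finds the 1.5 payoff
```

The catch the toy hides is that real labs pay for epsilon in public: exploring means visibly shipping worse models for a while, which is the prisoner's dilemma above.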
Inside OpenAI’s Pivotal Decisions
Jerry walked through OpenAI’s evolution from a 30-40 person lab to a global force, highlighting three decisions that could have gone either way:
Releasing ChatGPT: The virality was completely unexpected internally. “No one I heard about” predicted it. Combined with GPT-4 releasing soon after, it created momentum that “made OpenAI largely what it is today.”
Betting on GPT-4 timing: Pulling massive resources to train GPT-4 at a specific moment involved significant trade-offs. It “turned out to be a really good decision.”
The reasoning models pivot: Saying “we are doing reasoning models right now” when there was no product-market fit. o1 was “kind of cool with puzzles” but wasn’t practical. Only with o3 and tool use did real product-market fit emerge. “Getting to that moment was a great journey… OpenAI really passed the exam.”
Why Anthropic Won Coding
Jerry is remarkably candid about OpenAI’s competitive positioning. When asked why Anthropic has been so successful at coding, his answer is one word: focus.
“Focus can explain 95% of things.”
He knows Anthropic’s founders from their OpenAI days. They were “always, always extremely fond of coding.” That singular focus, sustained over years, is why Anthropic can credibly say that very few people at the company type code themselves these days.
The flip side: OpenAI “really lost focus on coding for quite a while” as it poured attention into the consumer product. That cost it market share it’s now “working very hard on regaining.” Jerry frames this as a general law: “Companies are very bad at doing multiple hard things successfully.”
Being at the center of “probably the biggest technological shift of our lifetime” makes it feel wasteful not to try everything. But that impulse is precisely the risk.
Data World vs. Research World
Jerry introduces a framework for how AI competition might play out. Two possible futures:
If data drives improvements: The market splinters into specialization. Training resources spent on one skill come at the cost of another. Labs naturally differentiate. Application companies can compete by focusing on specific domains.
If research is king: A single breakthrough can “improve your model in all domains at the same time.” One lab leapfrogs everyone. Application companies’ domain advantages are fleeting because the next model generation wipes them out.
He doesn’t know which world we’re in. But either way, he sees a natural evolution for successful AI application companies: start with an AI application, then post-train your own models, then pre-train, then eventually build your own data centers. “This is just how the stack works.”
What Makes a Great AI Researcher
Jerry identifies three essential traits, none of them about raw IQ:
Dual fluency in systems and theory. Understanding both how computers work at the engineering level and the theory of neural networks and optimization. Being “at least okay” in both makes you “easily 10 times more productive.”
Independent thinking. Groups naturally converge on median viewpoints, which “kills research.” Being a researcher means being slightly contrarian all the time, because you’re working on something that “by default people don’t really believe in.”
Courage. ML experiments now cost as much as Hollywood movies. Standing up and saying “let’s do something different” when that much is at stake requires more than intelligence. “You don’t know if the movie will be successful, but with a big budget you risk.”
Quickfire: Timelines and Anxieties
Robotics: A ChatGPT-like moment in 2-3 years. “Things are slightly better than most people realize.”
Biology: Longer, maybe 3-4 years. Requires more fundamental precision. “A three or four year old can manipulate things in the world, but they’re not world-class biologists.”
What we’re underestimating: Widely deployed work automation becoming reality over “the coming decades.” Jerry thinks society is not talking about this seriously enough.
Existential risk: Not very worried. No human wants humanity to go extinct, and capitalism is “fully aligned” on not killing everyone. What worries him more: a dystopia where entertainment becomes more interesting than the real world and humans retreat into VR. “That’s not an AI problem. That’s a human problem.”
Parenting: Not pushing his daughters to study hard or specialize. Given how different the job market will look, he just wants them to “have a happy childhood.”
Closing Notes
This conversation is valuable precisely because Jerry has no product to sell and no position to defend. He’s in the rare space between having done the thing and figuring out what’s next.
- The “hopelessness” framing is the most honest description of current model limitations from an insider. Not “models hallucinate” or “models lack reasoning,” but: when they’re stuck, they have no mechanism to get unstuck. That’s the gap.
- His 95%-focus theory of competitive advantage is deceptively simple but explains a lot: Anthropic on coding, OpenAI on consumer, and the risk that trying to do everything means doing nothing well enough.
- The exploration-vs-exploitation framing applied to AI labs themselves is deeply recursive. The very models that suffer from poor generalization are built by organizations suffering from the same problem at the strategic level.
- Jerry’s parenting philosophy may be the most revealing data point. When the person who built the reasoning models tells his kids not to optimize for the job market, the signal is worth listening to.