February 10, 2026 · Speech · 1h 18min
Jeff Dean: How We Got Here, What We Can Do Now, and Where AI Is Headed
Jeff Dean has spent three decades at the intersection of systems engineering and machine learning, first building the infrastructure that made Google Search work, then co-leading the Gemini project that produced Google’s frontier models. When he speaks at Princeton about “important trends in AI,” he’s narrating a story he helped write.
The Talk
Invited as part of Princeton’s CS Distinguished Colloquia series, Dean delivers a sweeping 78-minute tour of AI’s trajectory. The talk is structured as a history lesson that accelerates into a capabilities showcase and then opens into a vision of the future. The audience is academic, the tone is measured, but the underlying message is unmistakable: the pace of change is not slowing down, and the most impactful applications haven’t arrived yet.
Princeton professor Kai Li introduces Dean by noting he met him in the 1990s at DEC’s Western Research Lab, calling him “quietly exceptional for a very long time” and pointing out that Dean did his undergraduate thesis on neural networks “before it was cool.”
The 10x-Per-Year Compute Curve
Dean opens with a chart that anchors the entire talk: the computational cost of training state-of-the-art models has grown by roughly 10x per year since 2011. That’s not Moore’s Law (which gives you roughly 2x every 18 months). It’s something far more aggressive, enabled by three simultaneous trends:
- Hardware: From general-purpose CPUs to specialized ML accelerators (GPUs, then Google’s TPUs). Google started building TPUs in 2013 precisely because they realized “if everyone used speech recognition for three minutes a day, we’d need to double our data centers.”
- Data: The shift from curated datasets to internet-scale training corpora.
- Algorithmic innovation: Better architectures, training techniques, and the Transformer breakthrough in 2017.
The key insight: these three curves multiplied together. Hardware got faster, datasets got bigger, and algorithms got more efficient at exploiting both. Dean quantifies it: roughly 20x from scale improvements multiplied by 50x from algorithmic improvements equals a 1,000x capability gain over the period.
“We went from things that could do interesting pattern matching in narrow domains to systems that can genuinely help with complex, open-ended tasks.”
The Origin Story: Google Brain Was Born in a Micro Kitchen
Dean reveals the surprisingly casual origin of what became one of the most consequential AI research groups. In 2011, he ran into Andrew Ng in a Google micro kitchen. Ng mentioned his Stanford students were using neural nets for speech and vision. Dean said, “We have lots of computers. Why don’t we train really big neural networks?” That became Google Brain.
They built DistBelief, an asynchronous distributed training system that trained neural networks 50-100x larger than any previously reported. One early experiment: train on 10 million random YouTube frames with no labels. The model spontaneously learned to detect cat faces, human faces, and body outlines. Using this unsupervised pre-training for initialization yielded a 70% relative improvement on the ImageNet 22K state-of-the-art.
Dean also shares a personal twist: his 1990 undergraduate thesis was on parallel neural network training, exploring model parallelism and data parallelism. But he made a critical mistake, failing to scale the model alongside the processors, and then “ignored neural nets for 25 years.”
From Perception to Reasoning: The Capability Transitions
Dean walks through the milestones chronologically, but the narrative arc is about capability transitions:
2011-2014: Perception. Speech recognition accuracy jumps dramatically. Image classification goes from “interesting demo” to “better than most humans” on ImageNet. Google integrates these into products: voice search, photo recognition, translation.
Word2Vec and semantic space. Training word vectors to predict surrounding words revealed that cat/puma/tiger cluster together in high-dimensional space, and that directions carry semantic meaning (king minus queen corresponds to man minus woman).
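The analogy arithmetic is easy to see in code. The vectors below are hand-picked toys, not learned word2vec embeddings, but they show how "king - man + woman" lands nearest "queen" in the embedding space:

```python
import numpy as np

# Toy 4-d "embeddings" chosen by hand to illustrate the analogy geometry;
# real word2vec vectors are learned and typically 100-300 dimensions.
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.8, 0.1, 0.0]),
    "woman": np.array([0.1, 0.1, 0.8, 0.0]),
    "apple": np.array([0.0, 0.0, 0.0, 0.9]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "king - man + woman" should land nearest "queen"
target = emb["king"] - emb["man"] + emb["woman"]
nearest = max((w for w in emb if w not in {"king", "man", "woman"}),
              key=lambda w: cosine(emb[w], target))
print(nearest)  # queen
```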
Sequence-to-sequence learning. The encoder-decoder architecture by Ilya Sutskever, Oriol Vinyals, and Quoc Le enabled neural machine translation, which replaced the old phrase-based systems. Dean shows a stunning chart: Google Translate’s quality improvement from switching to neural models was “equivalent to the total progress of the previous ten years of the old system, achieved in a single step.”
The Transformer (2017). Solved two fundamental LSTM problems. First, sequential dependency prevented parallelization. Second, compressing all history into a single vector lost information. The Transformer’s key insight: save all vectors and attend to them via a learnable attention mechanism. Result: 10-100x less compute for equivalent quality. Dean notes this is the paper he gets cited for most, with over 150,000 citations.
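The mechanism Dean credits reduces to a few lines of linear algebra. A minimal scaled dot-product attention sketch (single head, no masking or learned projections):

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention: every query attends to all stored
    key/value vectors, instead of compressing history into one state."""
    scores = q @ k.T / np.sqrt(k.shape[-1])          # (n_queries, n_keys)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over keys
    return weights @ v                               # weighted mix of values

rng = np.random.default_rng(0)
q = rng.normal(size=(2, 8))   # 2 queries, dimension 8
k = rng.normal(size=(5, 8))   # 5 saved positions
v = rng.normal(size=(5, 8))
out = attention(q, k, v)
print(out.shape)  # (2, 8)
```

Because every position's vector is retained and scored, the computation over positions parallelizes trivially, which is exactly what the LSTM's sequential dependency prevented.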
Self-supervised learning at scale (2018+). Nearly unlimited text data provides rich supervision through predicting the next word or filling in blanks. This was the insight that unlocked training on the entire internet.
The TPU: Born from a Crisis
One of the talk’s most revealing stories is the origin of Google’s Tensor Processing Unit. Dean did a back-of-the-envelope calculation showing that deploying a new speech recognition model to 1 billion users (3 minutes per day each) would require doubling Google’s entire computer fleet. The model in question was just an 8-layer fully connected neural net trained on massive data, but it was equivalent to “compressing 20 years of speech research progress” into one system.
“If we wanted to deploy this to a scenario where we had a billion users talking to this model 3 minutes a day, we would need to double the number of computers Google had.”
This forced Google into building custom silicon. Two properties of neural networks made specialized chips viable: 7-8 bit precision suffices for inference (no need for 16/32/64 bit), and nearly all models are composed of a small set of linear algebra operations.
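The low-precision property is easy to demonstrate. A sketch of symmetric 8-bit quantization, the kind of reduced-precision arithmetic TPUv1 exploited for inference (the scheme here is generic, not Google's actual implementation):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric 8-bit quantization sketch: map floats onto the int8 range
    with a single per-tensor scale factor."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.normal(size=(4, 4)).astype(np.float32)
q, scale = quantize_int8(w)

# Dequantization error is bounded by half a quantization step.
err = np.abs(q.astype(np.float32) * scale - w).max()
print(err < scale)  # True
```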
The results were dramatic. TPUv1 was 15-30x faster and 30-80x more energy efficient than contemporary CPUs and GPUs, and the paper describing it became the most cited in ISCA’s 50-year history. By the seventh generation (Ironwood), per-pod performance is 3,600x that of TPUv2, with roughly 30x better energy efficiency.
Dean also describes a fascinating engineering challenge at scale: Silent Data Corruption. Across thousands of chips, some will non-deterministically produce incorrect results, sometimes correlated with temperature. A single flipped exponent bit can inflate a gradient to roughly 10^20 and poison the entire training run. Google’s countermeasures include per-layer gradient-norm monitoring, automatic deterministic replay (re-running the same batch; a different result indicates a hardware fault), hot spare pods, and transparent replacement via the Pathways system.
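The replay idea can be sketched in a few lines. Everything below is a hypothetical illustration of the pattern, not Pathways code: flag a suspicious gradient, re-run the identical batch, and treat a bitwise mismatch between the two runs as a hardware fault rather than a modeling problem:

```python
import numpy as np

GRAD_NORM_LIMIT = 1e4  # illustrative threshold; real systems tune per layer

def check_step(step_fn, params, batch, seed):
    """Hypothetical SDC guard: on an anomalous gradient, replay the exact
    same batch deterministically and compare results bitwise."""
    g1 = step_fn(params, batch, seed)
    if not np.isfinite(g1).all() or np.linalg.norm(g1) > GRAD_NORM_LIMIT:
        g2 = step_fn(params, batch, seed)   # deterministic replay
        if not np.array_equal(g1, g2):
            return "hardware-fault"         # evict chip, swap in hot spare
        return "software-anomaly"           # e.g. a genuine loss spike
    return "ok"

# Toy deterministic step: gradient of a quadratic loss, no randomness.
def toy_step(params, batch, seed):
    return 2.0 * (params - batch.mean())

params = np.ones(4) * 1e9   # deliberately huge to trip the norm check
status = check_step(toy_step, params, np.zeros(8), seed=0)
print(status)  # software-anomaly (the replay reproduces the same gradient)
```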
Sparse Models: The Brain Analogy That Actually Works
Dean introduces Mixture of Experts with an analogy he clearly enjoys: “When you’re worrying about a garbage truck hitting your car, the Shakespearean poetry region of your brain doesn’t activate.” Sparse models route different inputs to different “experts” via a learned routing mechanism. The key benefit: 8x training compute reduction at equivalent accuracy, or substantially better models at equivalent compute.
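A minimal top-k routing sketch shows the mechanism; names, shapes, and routing details here are illustrative, not Gemini's actual design:

```python
import numpy as np

def moe_forward(x, router_w, experts, k=2):
    """Sparse Mixture-of-Experts sketch: a learned router scores every
    expert per token, but only the top-k experts actually execute."""
    logits = x @ router_w                       # (n_tokens, n_experts)
    top = np.argsort(logits, axis=-1)[:, -k:]   # indices of the k best
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        chosen = logits[t, top[t]]
        gates = np.exp(chosen - chosen.max())
        gates /= gates.sum()                    # softmax over the k winners
        for gate, e in zip(gates, top[t]):
            out[t] += gate * experts[e](x[t])   # only k experts run
    return out

rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [(lambda W: (lambda v: W @ v))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
x = rng.normal(size=(3, d))
y = moe_forward(x, rng.normal(size=(d, n_experts)), experts)
print(y.shape)  # (3, 8)
```

With k=2 of 4 experts active, each token touches half the expert parameters; at production scale the ratio is far smaller, which is where the compute savings come from.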
The Pathways system abstracts tens of thousands of TPU chips into a single giant computer, handling intra-pod custom interconnects, inter-pod data center networking, cross-building and cross-region communication, and hardware fault recovery. TPUv4 introduced optical interconnects that make racks 100 meters apart appear adjacent in network topology.
Gemini: Unification Over Fragmentation
Dean reveals that Gemini originated from his observation that “multiple teams at Google separately building language and multimodal models is stupid.” They should pool compute and ideas. The project launched in February 2023, bringing together Google DeepMind, Google Research, and other Google teams. Over 1,000 collaborators work across the Bay Area, London, and global offices, with “only about three not-too-terrible overlapping hours” between California and London.
The team has generated more than 5,000 internal RFCs (Requests for Comments), ranging from one-page early ideas to full technical reports. Their experiment strategy: run many experiments at small scale, progress only promising ones to medium and large scale, incorporate successful ones into new baselines, repeat.
Key design decisions:
- Natively multimodal: Trained from the start on interleaved text, images, audio, video, plus small amounts of LiDAR and robotic control data. Recent versions added audio and video decoders.
- Five generations: Gemini 1, 1.5, 2, 2.5, and 3. The Flash model has consistently outperformed the previous generation’s Pro model for 3-4 consecutive generations.
- Distillation pipeline: Pro-scale models are distilled into Flash-scale models. Dean notes that 3% of training data plus distillation approximately matches baseline performance on 100% of data.
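The distillation step follows the classic Hinton-style objective: match the student's softened output distribution to the teacher's. A sketch of that loss (Gemini's actual Pro-to-Flash pipeline is not public):

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, T=2.0):
    """KL divergence from teacher to student at temperature T, the
    standard distillation objective (Hinton et al., 2015)."""
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.mean(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1)))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 10))
print(distill_loss(teacher.copy(), teacher))  # 0.0 (identical logits)
```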
The Million-Token Context Window
One of the most technically interesting segments covers Gemini’s expansion to million-token context. Dean draws a precise distinction between context and training data:
“Training data is trillions of tokens stirred into hundreds of billions of parameters, a bit muddled. Context is 900 pages of unmixed raw text, very crisp.”
The practical implication: you can feed an entire codebase, a full-length book, or hours of video into the model and ask questions about it. Dean demonstrates with examples including analyzing a 44-minute silent Buster Keaton film, processing an entire repository to find subtle bugs, and reading all Apollo 11 transcripts to answer specific mission questions.
His forward-looking vision: scaling from million to trillion tokens by combining learned retrieval algorithms, lightweight relevance-scoring models, and placing the most relevant content into the context window. Applications include “personalized Gemini” (processing all your email and photos with permission) and coding agents that attend to an entire codebase.
Inference: The Hidden Bottleneck
Dean devotes significant time to inference optimization, an area he clearly considers underappreciated:
Chain-of-Thought (2022): Having the model show its work essentially grants it more inference-time compute. This yielded significant accuracy improvements on math benchmarks, which Dean flags with a pointed “remember that,” foreshadowing the IMO result.
Speculative Decoding (2023): Autoregressive decoding is memory-bandwidth-bound, not compute-bound. A fast draft model generates the next ~8 tokens, the large target model verifies them in parallel, accepting the correct prefix. No retraining, no architecture changes, guaranteed identical output distribution, with significant inference speedup.
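The accept-the-matching-prefix loop can be sketched with greedy decoding, where prefix matching alone guarantees output identical to the target model (the full algorithm uses rejection sampling to preserve sampled distributions). The toy "models" below are stand-ins mapping a token sequence to the next token:

```python
def speculative_decode(target_next, draft_next, prefix, n_draft=8, n_steps=32):
    """Greedy speculative decoding sketch: a cheap draft model proposes
    n_draft tokens; the target model checks them in one (notionally
    parallel) pass and the longest agreeing prefix is accepted."""
    out = list(prefix)
    while len(out) < n_steps:
        draft = []
        for _ in range(n_draft):                 # cheap sequential drafting
            draft.append(draft_next(out + draft))
        accepted = 0
        for i in range(n_draft):                 # target verifies in parallel
            if target_next(out + draft[:i]) == draft[i]:
                accepted += 1
            else:
                break
        out += draft[:accepted]
        if accepted < n_draft:                   # target supplies the fix-up
            out.append(target_next(out))
    return out[:n_steps]

# Toy models: target counts by 1; draft agrees except after multiples of 5.
target = lambda seq: seq[-1] + 1
draft  = lambda seq: seq[-1] + (2 if seq[-1] % 5 == 0 else 1)
print(speculative_decode(target, draft, [0], n_steps=10))
# [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

The output matches pure target-model decoding exactly; the win is that long stretches of easy tokens cost one target pass instead of eight.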
Reinforcement Learning: Reward signals from three sources: human feedback (RLHF), machine feedback (reward models), and verifiable domains (math proofs, code compilation plus unit tests). Dean identifies “improving RL effectiveness in non-verifiable domains” as an important open research question.
The IMO Gold Medal
Dean builds to this as the talk’s signature achievement. In 2025, a general-purpose Gemini Pro model (not a specialized system) with a high inference-time thinking budget solved 5 out of 6 problems at the International Mathematical Olympiad for a gold medal. The prior year still required specialized geometry models and theorem provers.
The progression is striking: from GSM8K (grade school math) to IMO gold in just two years. Dean attributes this to the combination of chain-of-thought reasoning, reinforcement learning from verifiable rewards, and scale.
AI for Science: The Quietly Enormous Frontier
The most passionate segment focuses on AI’s impact on scientific research. Dean argues this is where AI will ultimately have its most transformative effect, and it’s getting less attention than consumer applications.
Weather forecasting. Google’s GenCast model produces more accurate medium-range forecasts (up to 15 days) than the European Centre for Medium-Range Weather Forecasts (ECMWF), the previous gold standard. It generates 15-day probabilistic forecasts in 8 minutes on a single TPU, versus hours on a supercomputer. Dean shows it successfully predicted Hurricane Lee’s Nova Scotia landfall 6 days before the ECMWF model did.
Protein structure. AlphaFold moved from predicting individual protein structures to predicting the structure of protein complexes and their interactions with other molecules. AlphaFold 3 can predict the 3D structure of complexes involving proteins, DNA, RNA, and small molecules together, critical for drug design.
Materials science. Google’s GNoME project used AI to discover 380,000 new stable inorganic materials, expanding the known stable materials universe by an order of magnitude. Many have potential applications in batteries, superconductors, and solar cells. Independent labs have already experimentally verified over 700 of these predictions.
Virtual biology. Dean describes a “virtual cell” concept: building simulations of cellular processes that can predict outcomes of interventions before running expensive wet-lab experiments.
“The pace of scientific discovery is fundamentally bottlenecked by human ability to process information and explore hypothesis spaces. AI can dramatically expand both.”
Agents and the Computer-Use Frontier
The forward-looking section covers what Dean sees as the next major capability frontier: AI agents that can use tools, navigate interfaces, and execute multi-step tasks autonomously.
He describes Project Mariner and its evolution into more general agent capabilities: AI that can use a web browser, fill out forms, navigate between sites, and complete complex tasks. The “Deep Research” feature gets particular attention: you give it a topic, it creates a research plan, executes multi-step web searches, reads dozens of sources, and synthesizes a comprehensive report.
Dean’s vision: moving from single-person chatbot interactions to humans coordinating dozens or hundreds of AI agents. Open question he poses: “What’s the right HCI paradigm for managing 50 virtual assistants?”
The Q&A: Safety, Hallucinations, and Advice for Researchers
On AI safety: Dean considers safety concerns “a little overblown,” believing careful engineering can enable safe deployment. He’s more worried about near-term misinformation, since extremely realistic fake video and audio are now possible. This positions him as notably more optimistic than peers like Hinton and Bengio.
On hallucinations: The key technique is multiple rollouts plus model self-evaluation, spending more inference compute to reduce hallucination rates; it’s the same approach used in the IMO competition.
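The rollout-and-select pattern is simple to sketch. `generate` and `score` below are stand-ins for model calls, not a real API:

```python
import random

def best_of_n(generate, score, prompt, n=8, seed=0):
    """Best-of-n sketch: draw several independent rollouts and keep the
    one the scorer rates highest, trading inference compute for quality."""
    rng = random.Random(seed)
    candidates = [generate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-ins: "answers" are numbers near 42; the scorer prefers closeness.
generate = lambda prompt, rng: 42 + rng.gauss(0, 5)
score = lambda ans: -abs(ans - 42)
print(best_of_n(generate, score, "what is 6*7?", n=16))
```

In practice the scorer is the model itself (or a reward model) judging its own candidate answers, which is why the approach scales with inference budget.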
On world models: Gemini models already have world model capabilities. Genie 3 can generate interactive virtual worlds from text prompts with spatial consistency. The Waymo collaboration uses it to generate long-tail test scenarios (e.g., “an elephant appears in the middle of the road”).
On research with limited compute: This may be Dean’s most valuable practical advice. Focus on trend slopes across scales rather than absolute values. Run experiments at extra-tiny/tiny/small scales and observe the curve:
“If the slope looks good but it’s below the baseline, that’s a really interesting idea. If it’s above the baseline at the smallest scale but rapidly plummeting, that’s less interesting.”
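The methodology amounts to comparing slopes on log-log axes. A sketch with invented numbers: the "idea" loses to the baseline at the smallest scale, but its steeper slope is the interesting signal:

```python
import numpy as np

def trend_slope(compute, loss):
    """Fit loss ~ a * compute^slope on log-log axes; the slope, not the
    absolute loss at tiny scale, is the quantity to judge an idea by."""
    slope, _ = np.polyfit(np.log(compute), np.log(loss), 1)
    return slope

compute  = np.array([1e15, 1e16, 1e17])   # extra-tiny / tiny / small runs
baseline = np.array([4.0, 3.4, 2.9])      # hypothetical eval losses
idea     = np.array([4.3, 3.3, 2.5])      # worse at tiny scale, better slope

print(trend_slope(compute, baseline))  # ~ -0.070
print(trend_slope(compute, idea))      # ~ -0.118, steeper: promising
```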
He urges the academic community to reward “different and interesting” work over incremental state-of-the-art improvements.
Some Thoughts
Jeff Dean’s talk is valuable less for any single revelation than for the panoramic view it offers from someone who has been inside the machine room for the entire journey.
- The 10x-per-year compute curve is the single most important chart in AI. It’s not a law of physics, but it has held for 15 years, and Dean sees no signs of it breaking. If it continues, the models of 2028 will operate with 100x the effective compute of today’s.
- AI for science is dramatically under-covered relative to its potential impact. Weather forecasting, materials discovery, protein structure prediction, and virtual biology are each individually transformative. Together, they represent a fundamental acceleration of the scientific method itself.
- The “context is crisp, training data is muddled” distinction is a genuinely useful mental model for understanding why million-token context windows enable qualitatively different capabilities, not just quantitatively more.
- Dean’s safety stance is striking for its departure from the Hinton/Bengio camp. Where his former collaborators warn of existential risk, Dean frames the challenges as engineering problems: misinformation, hallucination, workforce transition. Whether this reflects genuine conviction or institutional positioning is an exercise left to the reader.
- His advice on evaluating research by trend slopes rather than absolute performance at small scale is one of the most practically useful insights in the entire talk. It’s a methodology that democratizes AI research for labs without Google-scale compute.
- The 5,000+ internal RFCs within the Gemini team is a detail that hasn’t been widely reported. It reveals the sheer scale of knowledge management required for frontier model development, something no academic lab can replicate.