January 5, 2026 · Speech · 1h 50min
Jensen Huang's CES 2026 Keynote: The Six-Chip Revolution and the Age of Physical AI
Two simultaneous platform shifts are reshaping the entire computing industry. Not just AI as a new application layer, but a fundamental reinvention of how software is built, run, and delivered. Jensen Huang’s CES 2026 keynote is a nearly two-hour tour through NVIDIA’s full-stack ambition: from open frontier models to autonomous vehicles to a new class of supercomputer that cools with hot water.
The Year Everything Happened at Once
Huang opens with a compressed timeline of AI breakthroughs. Transformers in 2017. BERT in 2018. The ChatGPT moment in late 2022. Then the real inflection: OpenAI's o1 model in 2024, which introduced test-time scaling, a fancy way of saying "the AI thinks before it answers." Each phase demands exponentially more compute: pre-training, post-training with reinforcement learning, and now inference itself as a thinking process.
The funding math: roughly $10 trillion of legacy computing infrastructure is being modernized. Hundreds of billions in annual VC funding. And corporate R&D budgets across a $100 trillion global economy are shifting from classical methods to AI. That’s where the money comes from.
2025 brought three specific catalysts. Agentic systems proliferated, with Huang name-checking Cursor as having “revolutionized how we do software programming at NVIDIA.” DeepSeek R1 proved open-source reasoning models could reach near-frontier quality. And physical AI moved from research concept to deployable stack.
“DeepSeek R1, the first open model that’s a reasoning system. It caught the world by surprise and it activated literally this entire movement.”
NVIDIA as a Frontier AI Lab
A less obvious part of Huang’s pitch: NVIDIA now operates billions of dollars in DGX Cloud supercomputers not as a cloud business, but as its own AI research infrastructure. The output is a portfolio of open frontier models across domains most AI labs don’t touch:
- Proteina and OpenFold 3: protein synthesis and structure prediction
- EVO 2: multi-protein generation, the beginnings of cellular-level representation
- Earth 2: AI physics for weather prediction, with FourCastNet and CorrDiff
- Nemotron 3: a hybrid Transformer-SSM architecture that can think longer or faster, with more variants coming
- Cosmos: a world foundation model that understands physical laws and aligns with language
- GR00T: humanoid robot articulation and locomotion
All models are open-sourced with training data, accompanied by lifecycle management libraries (NeMo, BioNeMo, PhysicsNeMo, Clara) covering data processing through deployment. Huang notes NVIDIA’s contribution to open AI research is “bar none” and claims these models top leaderboards in intelligence benchmarks, PDF parsing, speech recognition, and semantic search.
“Not only do we open source the models, we also open source the data that we use to train those models, because only in that way can you truly trust how the models came to be.”
The Agentic Architecture
Huang outlines what he considers the canonical architecture for future applications: multi-modal (speech, images, text, video, 3D), multi-model (choosing the best model for each subtask), multi-cloud by definition, and hybrid cloud for edge deployment.
The key insight is the "intent-based model router," essentially a manager layer that routes each prompt to the right model based on the task. A live demo shows a personal assistant built on DGX Spark that uses a frontier cloud model for general tasks but routes email-related prompts to a locally running open model for privacy. The same agent controls a physical robot (Hugging Face's Reachy) through tool calls and uses ElevenLabs for voice synthesis.
Huang’s observation: he first noticed this multi-model pattern at Perplexity and “thought it was completely genius.” The implication is profound: AI applications are no longer monolithic. They’re compositions of specialized models orchestrated by reasoning.
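To make the pattern concrete, here is a minimal sketch of an intent-based router, assuming a toy keyword classifier in place of the reasoning model the keynote describes. The model names and routing table below are hypothetical illustrations, not NVIDIA's actual configuration.

```python
from dataclasses import dataclass

@dataclass
class Route:
    model: str      # which model serves this intent
    location: str   # "cloud" for frontier models, "local" for private data

# Hypothetical routing table: privacy-sensitive intents stay on-device.
ROUTES = {
    "email":   Route(model="local-open-model", location="local"),
    "robot":   Route(model="tool-calling-model", location="local"),
    "general": Route(model="frontier-model", location="cloud"),
}

def classify_intent(prompt: str) -> str:
    """Toy intent classifier; a real router would use a reasoning model."""
    text = prompt.lower()
    if "email" in text or "inbox" in text:
        return "email"
    if "robot" in text or "pick up" in text:
        return "robot"
    return "general"

def route(prompt: str) -> Route:
    """Dispatch a prompt to the model registered for its intent."""
    return ROUTES[classify_intent(prompt)]
```

So `route("Summarize my inbox")` lands on the local model, while an open-ended question goes to the cloud frontier model; the point of the pattern is that this dispatch decision, not any single model, becomes the application's backbone.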
“Not only is this the way that you develop applications now, this is going to be the user interface of your platform.”
Enterprise integrations already deploying this pattern: Palantir, ServiceNow, Snowflake, CodeRabbit, CrowdStrike, NetApp. The agentic system becomes the interface itself, replacing traditional dashboards and command lines.
Physical AI: Three Computers, One Problem
The physical AI section is the keynote’s centerpiece. Huang frames it around a fundamental challenge: how do you give AI common sense about the physical world? Object permanence, causality, friction, gravity, inertia, things obvious to a toddler but unknown to a language model.
The solution requires three computers working together:
- Training computer: DGX systems for model training
- Inference computer: edge processors (Orin, Thor) running in cars and robots
- Simulation computer: Omniverse for digital twins, where Huang says NVIDIA is “most comfortable”
The data problem is the bottleneck. Real-world video captures are never diverse enough. The breakthrough Huang highlights is converting compute into synthetic training data: feed a traffic simulator’s output into Cosmos, which generates physically plausible surround video that AI can learn from. Cosmos performs generation, reasoning, and trajectory prediction from a single image, and supports interactive closed-loop simulations where the AI acts and the world responds.
“The ChatGPT moment for physical AI is nearly here.”
Alpamayo: The Car That Explains Itself
NVIDIA's first end-to-end autonomous driving AI, trained camera-in to actuation-out. What makes Alpamayo distinct: it reasons about its actions, explaining what it's about to do and why before executing.
The training pipeline combines three data sources: human demonstration driving, Cosmos-generated synthetic data, and hundreds of thousands of carefully labeled examples. The reasoning capability specifically targets the long-tail problem. It’s impossible to collect every possible driving scenario for every country and circumstance. But any novel scenario, when decomposed into smaller sub-scenarios, becomes manageable. The AI reasons through combinations of familiar situations to handle new ones.
The safety architecture is deliberately redundant. Alpamayo (the learned stack) runs alongside a classical AV stack that took six to seven years to build and is fully traceable. A policy and safety evaluator continuously decides which stack should control the car: high-confidence scenarios go to Alpamayo; low-confidence ones fall back to the classical stack. Huang claims this is the only car in the world running both AV stacks simultaneously.
The first Alpamayo-powered Mercedes-Benz CLA ships Q1 2026 in the US, Q2 in Europe, and Q3-Q4 in Asia. NCAP rated it the world's safest car, with every line of code and every chip safety-certified. The partnership with Mercedes started five years ago, and the entire Alpamayo stack, including the model, is open-sourced.
“In the next 10 years, I’m fairly certain a very large percentage of the world’s cars will be autonomous or highly autonomous.”
Industrial AI: Returning to Origins
Huang positions NVIDIA’s technology as coming full circle to serve the industries that originally made NVIDIA possible. Three major integrations announced:
- Cadence: CUDA X integrated across simulations and solvers, physical AI for plant simulation, AI physics for EDA
- Synopsys: logic design and IP acceleration
- Siemens: CUDA X, physical AI, agentic AI, and Nemotron deeply integrated into EDA, CAE, and digital twin platforms, covering the full industrial lifecycle from design to production to operations
The vision: chips designed with AI-assisted tools, manufactured in factories that are themselves “giant robots,” all simulated end-to-end in digital twins before anything physical is built. Agentic chip designers and system designers working alongside human engineers, mirroring the pattern of agentic software engineers in code today.
Vera Rubin: Breaking the Rules to Keep the Pace
The hardware centerpiece. NVIDIA has an internal rule: no more than one or two chip changes per generation. With Vera Rubin, they broke it, redesigning all six chips simultaneously. The reason is arithmetic.
Moore’s Law delivers roughly 1.6x more transistors per generation. But AI models grow 10x annually. Token generation grows 5x per year. Token costs drop 10x per year. The gap between what semiconductors deliver and what AI demands is unbridgeable by incremental improvement. Huang calls the response “extreme co-design”: 15,000 engineer-years of investment to innovate across every chip and every layer of the stack simultaneously.
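The arithmetic behind "extreme co-design" is worth running. A back-of-envelope calculation (treating one process generation as roughly one year, a simplification) shows how fast the gap compounds:

```python
# Silicon supply grows ~1.6x per generation while AI demand grows ~10x per
# year (keynote figures). The shortfall compounds, and must be closed by
# architecture, software, and system co-design rather than process shrinks.

def compound(rate: float, periods: int) -> float:
    """Growth factor after `periods` rounds of multiplicative growth."""
    return rate ** periods

years = 3
transistor_growth = compound(1.6, years)   # ~4.1x from process scaling alone
demand_growth = compound(10.0, years)      # 1000x from model/token growth
gap = demand_growth / transistor_growth    # what co-design has to deliver

print(f"After {years} years: silicon gives {transistor_growth:.1f}x, "
      f"demand grows {demand_growth:.0f}x, a {gap:.0f}x gap remains")
```

After just three years the shortfall is over two orders of magnitude, which is why redesigning all six chips at once reads less like ambition and more like necessity.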
“It is impossible to keep up with those kind of rates unless we deploy aggressive extreme co-design, basically innovating across all of the chips across the entire stack all at the same time.”
The Six Chips
Vera CPU: 88 cores with spatial multi-threading (176 effective threads at full performance). 2x performance per watt versus the world’s most advanced CPUs. Co-designed with ConnectX-9 for a new type of data processing.
Rubin GPU: 5x Blackwell floating-point performance with only 1.6x the transistors. The key innovation is the NVFP4 tensor core: not a simple 4-bit floating-point datapath but a complete processing unit that dynamically adjusts precision across transformer layers. It increases throughput where precision can be traded away and restores maximum precision where it matters, all happening adaptively inside the processor because the decisions occur too fast for software control. Huang hints this format could become an industry standard.
ConnectX-9: 1.6 Tb/s scale-out bandwidth per GPU, co-designed with Vera CPU and never released independently.
BlueField 4 DPU: handles virtualization, security, and north-south networking. Also enables a new product category: in-rack KV cache context memory storage.
NVLink 6 Switch: four switch chips with 400 Gb/s SerDes (the industry barely reaches 200). Cross-sectional bandwidth per rack: 240 TB/s, more than double the entire global internet's estimated ~100 TB/s.
Spectrum X Photonics Switch: the world's first production chip using TSMC's COUPE co-packaged optics process, with silicon photonics integrated directly on the package. 512 ports at 200 Gb/s, with lasers coupled directly into the chip.
System-Level Numbers
A single Vera Rubin NVLink 72 rack: 144 Rubin GPUs (each a package of two GPU dies), 220 trillion transistors, roughly two tons. The chassis was redesigned: 43 cables reduced to zero, and assembly time cut from two hours to five minutes. The NVLink spine comprises 5,000 shielded copper cables, two miles in total. The system is 100% liquid-cooled at 45 degrees C inlet temperature (no chillers needed), saving approximately 6% of data center power despite drawing 2x the power of Grace Blackwell.
Performance versus Blackwell: 4x fewer systems to train a 10-trillion-parameter model in one month. ~10x factory throughput per watt. ~10x cost reduction per token.
New system capabilities: confidential computing with encryption across every bus (PCIe, NVLink, CPU-GPU, GPU-GPU), and system-wide power smoothing that eliminates the need to over-provision by 25% for all-reduce spikes.
The KV Cache Crisis
An extended section that reveals a real operational pain point. Every token generated requires the GPU to read the entire model and entire KV cache (working memory). As conversations grow longer, models grow larger, and agents maintain persistent context, HBM capacity is overwhelmed.
The progression: Grace Blackwell expanded context memory to fast CPU memory. Still not enough. The next step is going off to network storage, but the north-south network can’t handle the traffic when many AIs run simultaneously.
Vera Rubin’s answer: BlueField 4-powered in-rack KV cache storage. Behind each BlueField 4 sits 150 TB of memory. Allocated across GPUs, each gets an additional 16 TB of context memory (on top of the ~1 TB HBM per GPU), accessed at full east-west fabric speed of 200 Gb/s. Huang says cloud providers and AI labs are “really suffering” from KV cache traffic, making this a genuinely new product category rather than a marketing exercise.
“A $50 billion data center can only consume one gigawatt of power. And so if your throughput per watt is very good versus quite poor, that directly translates to your revenues.”
Some Thoughts
The most telling signal in this keynote is the breaking of NVIDIA’s own design rules. Redesigning six chips simultaneously rather than the usual one or two is not just engineering ambition. It’s an acknowledgment that the semiconductor treadmill alone can no longer sustain AI’s growth curve. NVIDIA is transitioning from “chip company” to “AI infrastructure systems company,” where the unit of innovation is the entire rack, not the individual processor.
Alpamayo's dual-stack safety architecture offers a pragmatic middle path in the industry's end-to-end versus modular debate. Rather than betting entirely on learned driving or entirely on handcrafted rules, NVIDIA runs both simultaneously with a policy evaluator choosing between them. Open-sourcing an eight-year, several-thousand-person effort is a bet that standardizing the industry on NVIDIA's AV compute platform (Orin, Thor) matters more than owning the software stack.
A few threads worth tracking:
- The NVFP4 tensor core represents a conceptual shift from fixed-precision arithmetic to adaptive, context-aware computation. If it becomes an industry standard as Huang predicts, it changes how every AI chip maker thinks about the precision-performance tradeoff.
- The KV cache infrastructure play (BlueField 4 + in-rack storage) may be the sleeper announcement. As AI agents maintain longer persistent contexts, inference memory management could become the dominant bottleneck in AI infrastructure economics.
- Huang’s praise for DeepSeek R1 (“caught the world by surprise and activated this entire movement”) is strategically generous: every open model downloaded is another customer for NVIDIA GPUs.
- The Spectrum X economic argument is striking in its simplicity. A gigawatt data center costs $50 billion. 10% better networking throughput is worth $5 billion. The networking hardware “is basically free.” This kind of value-capture logic explains why NVIDIA became the world’s largest networking company in just two years.