
January 25, 2026 · Podcast · 47min

Intelligence Is Legos: Why True AI Needs Modular, Composable Building Blocks

#AGI Definition#Energy-Based Models#Continual Learning#Scientific AI#AI Safety

Intelligence is not a binary property. It is not a threshold you cross. It is, according to computational neuroscientist Jeff Beck, more like Legos: modular pieces that connect in certain ways, capable of producing structures never before imagined. This framing, offered casually toward the end of a wide-ranging conversation on Machine Learning Street Talk, quietly redefines how we should think about building and evaluating AI systems.

The Conversation

Jeff Beck, a researcher working at the intersection of computational neuroscience and machine learning, sits down with Tim Scarfe to discuss agency, intelligence, energy-based models, and the future of AI. The conversation is technically dense but philosophically rich, moving fluidly between mathematical formalisms and big-picture questions about what it means to be intelligent. Beck brings a Bayesian perspective grounded in the Free Energy Principle, and he is refreshingly willing to say “I don’t know” while still offering precise, thought-provoking frameworks.

There Is No Structural Difference Between an Agent and a Rock

Beck opens with a provocation rooted in the Free Energy Principle: from a purely mathematical perspective, there is no structural distinction between how we model an agent and how we model an object. Both execute policies that map inputs to outputs. A rock has a policy. So does a human.

The difference is one of sophistication, not kind. An agent has internal states that represent things over very long time scales, engages in planning and counterfactual reasoning, and maintains complex internal computations. A rock does not. But the mathematical framework that describes both is identical.
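Beck's point that both are policies can be made concrete. In the minimal sketch below (the functions and situations are invented for illustration), a rock and an agent expose the same interface, a mapping from observations to actions; the only difference is whether internal state is carried and used:

```python
# Both "rock" and "agent" satisfy the same structural interface:
# a policy mapping (observation, state) -> (action, state).

def rock_policy(observation, state=None):
    # A rock's policy ignores its input and carries no internal state.
    return "stay put", state

def agent_policy(observation, state):
    # An agent's policy updates internal state and conditions on it.
    state = state + [observation]              # remember what was seen
    action = "approach" if observation == "food" else "explore"
    return action, state

action, _ = rock_policy("food")
assert action == "stay put"

beliefs = []
action, beliefs = agent_policy("food", beliefs)
assert action == "approach" and beliefs == ["food"]
```

Same mathematical object, a policy; the distinction is a matter of degree in what the internal state represents.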

This leads to what Beck calls the “black box problem of agency”: from the outside, you cannot definitively tell whether a system is truly planning or simply executing a pre-computed lookup table that happens to give correct answers. The best we can do is ask which model provides the simplest explanation of observed behavior, essentially applying Occam’s razor to intelligence attribution.

“There’s no difference between an agent and an object in a very real way, or at least there’s nothing structurally distinct between how we model an agent and how we model an object. It’s really just a question of degrees.”

Energy-Based Models: Optimizing States, Not Just Weights

Beck delivers what amounts to a masterclass on energy-based models (EBMs) and why they matter. The key insight is deceptively simple: traditional neural networks only optimize weights during training, while energy-based models optimize both weights and internal states.

In a standard neural network, you fix the input and adjust weights to minimize a loss function. In an EBM, you also optimize the internal activations themselves, treating them as free variables. This dual optimization connects directly to Bayesian inference: the internal states become posterior estimates, and the optimization process is equivalent to computing beliefs about the world.
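A toy version of this dual optimization, using a quadratic (linear-Gaussian) energy so the optimal state has a closed form to check against (all names and values here are illustrative, not Beck's own model):

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))          # weights (learned parameters)
x = rng.normal(size=4)               # a fixed observation

def energy(x, z, W):
    # Quadratic energy: reconstruction error plus a Gaussian prior on z.
    return 0.5 * np.sum((x - W @ z) ** 2) + 0.5 * np.sum(z ** 2)

# Inference: optimize the internal state z (holding W fixed) by
# gradient descent on the energy. For this energy, the minimizer is
# the posterior mean of a linear-Gaussian model -- states as beliefs.
z = np.zeros(2)
for _ in range(400):
    grad_z = -W.T @ (x - W @ z) + z
    z -= 0.05 * grad_z

# Closed-form posterior mean for comparison: (W^T W + I)^{-1} W^T x
z_star = np.linalg.solve(W.T @ W + np.eye(2), W.T @ x)
assert np.allclose(z, z_star, atol=1e-4)
```

In a full EBM, an outer loop would also descend the energy with respect to `W`; the point of the sketch is that the inner state optimization is itself Bayesian inference.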

This distinction matters for representation learning. Beck explains the “mode collapse” problem in self-supervised learning: without careful regularization, models collapse to trivial solutions where every input maps to the same representation. Different approaches (contrastive learning, JEPA, VICReg) all address this in different ways, but Beck argues the energy-based formulation provides a cleaner theoretical framework for understanding why collapse happens and how to prevent it.
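As one illustration of such a regularizer, here is a minimal sketch of a VICReg-style variance hinge (VICReg's full loss also has invariance and covariance terms; this shows only the piece that blocks the collapsed solution):

```python
import numpy as np

def variance_penalty(z, gamma=1.0, eps=1e-4):
    # VICReg-style hinge: penalize each embedding dimension whose
    # batch standard deviation falls below the target gamma. A
    # collapsed representation (zero variance) gets maximal penalty.
    std = np.sqrt(z.var(axis=0) + eps)
    return np.mean(np.maximum(0.0, gamma - std))

collapsed = np.zeros((8, 3))     # every input mapped to the same embedding
healthy = np.random.default_rng(1).normal(size=(8, 3))

# The trivial solution is heavily penalized...
assert variance_penalty(collapsed) > 0.9
# ...while a spread-out representation is penalized far less.
assert variance_penalty(healthy) < variance_penalty(collapsed)
```

Minimizing a prediction loss alone would happily drive embeddings to `collapsed`; adding this term makes that solution expensive.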

The practical implication: if you want AI systems that maintain rich, reusable representations rather than task-specific shortcuts, the energy-based perspective suggests optimizing internal states jointly with weights, not pre-processing data through fixed pipelines.

Your Brain May Have Evolved from Your Nose

One of the most surprising moments in the conversation: Beck proposes that the complex, non-smooth nature of olfactory space may have driven the evolution of our associative cortex and planning abilities.

Visual space has nice properties: translation symmetries, smoothness, spatial continuity. Olfactory space has none of these. It is deeply combinatorial and complicated, with no obvious geometric structure. The part of the brain that evolved to solve the olfactory problem, Beck argues, is the part that eventually became our frontal cortex.

He qualifies this (“Don’t quote me on that. There’s a lot of disagreement there.”), but the logic is compelling: the hardest sensory problem our ancestors faced may have been the evolutionary pressure that produced general-purpose associative reasoning. The brain region that had to handle arbitrary, non-smooth mappings between chemical combinations and meanings was pre-adapted for the kind of flexible, combinatorial thinking we now call intelligence.

JEPA and Learning in Latent Space

Beck dives into Yann LeCun’s Joint-Embedding Predictive Architecture (JEPA) and why predicting in latent space rather than pixel space might be key to robust AI representations.

The core issue with generative models that predict raw observations (every pixel, every token) is that they waste enormous capacity modeling irrelevant variation. JEPA sidesteps this by learning to predict in a compressed latent space, focusing on the abstract structure of the world rather than its surface appearance.
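A toy contrast between pixel-space and latent-space prediction error makes the point. The encoder below is a hypothetical stand-in that keeps only the "structural" dimensions and discards nuisance detail:

```python
import numpy as np

rng = np.random.default_rng(2)

def encode(x):
    # Toy encoder: keep the first two "structural" dimensions and
    # discard the remaining six as nuisance detail (texture, noise).
    return x[:2]

structure = rng.normal(size=2)
x_next = np.concatenate([structure, rng.normal(size=6)])  # true next frame
x_pred = np.concatenate([structure, rng.normal(size=6)])  # right structure,
                                                          # wrong "pixels"

pixel_loss = np.mean((x_next - x_pred) ** 2)   # penalizes irrelevant detail
latent_loss = np.mean((encode(x_next) - encode(x_pred)) ** 2)

# The structurally correct prediction is perfect in latent space,
# yet still pays a cost in pixel space for unpredictable detail.
assert latent_loss == 0.0
assert pixel_loss > 0.0
```

In JEPA the encoder is learned jointly with the predictor rather than fixed, which is exactly what raises the collapse issue discussed above.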

Beck connects this to a deeper point about pre-processing: the standard practice of running data through a VAE or PCA before analysis is pragmatically useful but theoretically unsatisfying. He admits he runs PCA on every new neural dataset as a first step, but the ideal would be jointly learning the representation and the downstream model. JEPA moves in this direction.

A cautionary note on PCA specifically: in neural data, the dimensions with the least variability are often the most important. PCA by design discards low-variance dimensions, potentially throwing away the most valuable signals. This is a concrete example of why joint optimization of representations and inference matters.
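This failure mode is easy to reproduce. In the sketch below (synthetic data, not a real neural dataset), the label-carrying signal lives entirely in a low-variance dimension, and the top principal component discards it:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
labels = rng.integers(0, 2, size=n)

# Dimension 0: large variance, unrelated to the label.
# Dimension 1: tiny variance, but perfectly separates the two classes.
X = np.column_stack([rng.normal(scale=10.0, size=n),
                     0.05 * (labels - 0.5)])
Xc = X - X.mean(axis=0)

# PCA via SVD: keep only the top principal component.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
top_pc = Vt[0]                 # direction of maximal variance
projected = Xc @ top_pc

# The top component is essentially dimension 0, so the label signal
# living in the low-variance dimension 1 is thrown away.
assert abs(top_pc[0]) > 0.99
corr = np.corrcoef(projected, labels)[0, 1]
assert abs(corr) < 0.2
```

A one-dimensional PCA reduction here retains almost all the variance and almost none of the information that matters.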

Intelligence Is Legos

Beck’s central metaphor crystallizes toward the end: intelligence is like Legos. Individual bricks connect in specific ways, but the combinatorial possibilities are vast. True intelligence is not about having one monolithic system that does everything; it is about having modular components that can be composed in novel ways to handle situations never previously encountered.

This connects to his view on brain evolution: simple specialized modules learned to communicate with each other, and through that communication acquired emergent capabilities. The olfactory cortex talked to the visual cortex, which talked to the motor cortex, and the result was something none of them could do alone.

Beck explicitly rejects the concept of AGI as a misnomer:

“I don’t believe in AGI. AGI seems like a bit of a misnomer to me. What we really want is not artificial general intelligence. We want collective specialized intelligences.”

The practical implication for AI development: rather than pursuing a single general-purpose model, the path forward may be systems of specialized modules that can be dynamically composed. GFlowNets from Yoshua Bengio’s group are one example Beck cites: a generative model of generative models, capable of instantiating new latent variables when existing ones cannot explain novel observations.

Continual Learning as the Missing Piece

Beck identifies continual learning as the most critical missing element in current AI. The ability to encounter something unexpected, recognize it as novel, turn on learning to figure it out, and incorporate that knowledge into an existing model, all without catastrophic forgetting, is what separates current systems from anything approaching true intelligence.

He illustrates this with a vivid example: a robot encountering a beach ball for the first time. You do not want it to stop and wait for instructions. You want it to do what a child would do: poke it, observe what happens, update its model. This is empirical inquiry at the most basic level, and it requires both the ability to recognize novelty and the ability to design experiments on the fly.

The object-centered physics discovery framework Beck works on has a natural mechanism for this: because it models the world in terms of discrete objects, it can instantiate entirely new objects to explain novel situations without disrupting its understanding of previously known objects.
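That mechanism can be sketched as a toy object registry (the object models, names, and threshold are invented for illustration): score a new observation under every known object model, and if nothing fits, instantiate a fresh object rather than distorting the parameters of the existing ones.

```python
import numpy as np

def gauss_logpdf(x, mean, var):
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

# Each known "object" is a simple 1-D feature model (mean, variance).
objects = {"cup": (1.0, 0.1), "plate": (5.0, 0.1)}
THRESHOLD = -5.0     # log-likelihood below this counts as novel

def observe(x):
    # Score the observation under every known object model.
    scores = {name: gauss_logpdf(x, m, v) for name, (m, v) in objects.items()}
    best = max(scores, key=scores.get)
    if scores[best] < THRESHOLD:
        # Novelty: instantiate a new object with a broad initial model,
        # leaving the parameters of known objects untouched.
        new_name = f"object_{len(objects)}"
        objects[new_name] = (x, 1.0)
        return new_name
    return best

assert observe(1.1) == "cup"       # familiar observation
novel = observe(30.0)              # a "beach ball": nothing fits
assert novel not in ("cup", "plate")
assert objects["cup"] == (1.0, 0.1)  # old knowledge not forgotten
```

Because new situations spawn new components instead of overwriting old ones, there is nothing for catastrophic forgetting to erase.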

A Safer Path to AI Alignment

Beck takes a grounded stance on AI safety. He is less worried about rogue superintelligences and more concerned about malicious human actors using AI tools. His reasoning: all current AI systems simply do what they are told. As long as humans specify the objective function and understand it, the technology itself is manageable.

But he goes further, proposing a concrete mechanism for safer AI alignment based on maximum entropy inverse reinforcement learning (which he notes is closely related to active inference). The idea:

  1. Observe the current distribution of human actions and outcomes (how many people go hungry, what resources are allocated where, etc.)
  2. Use inverse RL to estimate the implicit reward function that produces this distribution
  3. Rather than specifying a new goal from scratch (“end world hunger”), make small perturbations to the empirically estimated reward function
  4. Evaluate the consequences of each perturbation before scaling up
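These four steps can be sketched in a deliberately tiny setting: a single-step world with two outcomes, where the maximum-entropy model reduces to p(o) ∝ exp(r(o)) (all numbers are illustrative, and real inverse RL over trajectories is far more involved):

```python
import numpy as np

outcomes = ["fed", "hungry"]
observed = np.array([0.8, 0.2])   # step 1: current outcome distribution

# Step 2: in a max-entropy model of this one-shot setting,
# p(o) is proportional to exp(r(o)), so the implicit reward is
# recovered (up to an additive constant) as log p(o).
reward = np.log(observed)

def induced_distribution(r):
    # Outcome distribution produced by a reward vector.
    e = np.exp(r - r.max())
    return e / e.sum()

# Sanity check: the estimated reward reproduces the observed world.
assert np.allclose(induced_distribution(reward), observed)

# Step 3: a small perturbation toward "fed", rather than a
# hand-specified goal like "end world hunger".
perturbed = reward + np.array([0.2, -0.2])

# Step 4: evaluate the consequences before scaling up.
p_new = induced_distribution(perturbed)
assert observed[0] < p_new[0] < 0.95   # an incremental shift, not a lurch
```

The tether to reality is the empirically estimated `reward`; each perturbation is evaluated against the distribution it induces before any larger step is taken.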

The key insight is that hand-specifying reward functions is the dangerous part. The “Skynet ends world hunger by killing all humans” scenario is not a failure of AI; it is a failure of naive goal specification. By starting from empirically estimated reward functions and making incremental adjustments, you maintain a tether to reality and can catch dangerous consequences before they compound.

“You don’t say end world hunger. You perturb that distribution over outcomes a little bit, and then you evaluate the consequences.”

Some Thoughts

This conversation is technically demanding but rewards careful attention. Beck manages to connect deep mathematical formalism to intuitive, almost philosophical insights about the nature of intelligence.

  • The “intelligence is Legos” metaphor is more than a metaphor. It is a research program: build modular, composable systems rather than monolithic ones, and let intelligence emerge from composition rather than scale.
  • The olfactory cortex hypothesis, even if speculative, reframes the evolution of intelligence as driven by the hardest sensory problems, not the most obvious ones. The implication for AI: the most important capabilities may emerge from domains that seem peripheral.
  • Beck’s AI safety proposal via incremental reward function perturbation is one of the more pragmatically grounded alignment ideas in recent discourse. It does not require solving philosophical problems about human values; it just requires careful empirical estimation and small steps.
  • The rejection of AGI in favor of “collective specialized intelligences” aligns with how biological intelligence actually works, and may be a more productive framing for the field than the current race toward a single general-purpose model.