
February 24, 2026 · Podcast · 38min

The Brutal Truth About AI From the People Actually Building It

#AI Bubble · #AI Investment · #AI Coding · #Video Generation · #Data Labeling

This is Weights & Biases’ “Best of Gradient Dissent,” stitching together five conversations that each landed a punch. No single thesis holds the episode together; instead, five builders working at different layers of the AI stack each say something their PR teams probably wished they hadn’t. Mike Cannon-Brookes on why AI won’t replace humans but someone using AI will replace you. David Cahn on the $600 billion hole in AI’s business case. Cristóbal Valenzuela on why language models are fundamentally limited. Martin Shkreli on AI trading and the coming bubble. Edwin Chen on why training data went from 5-second tasks to weeks-long problems.

The force multiplier, not the replacement

Mike Cannon-Brookes (Atlassian co-CEO) lays out a practical framework for thinking about AI in enterprise software. His core claim: AI is a force multiplier for human creativity, not a replacement.

“I’m not worried about being replaced by AI. I’m worried about being replaced by somebody who’s really good at using AI.”

The technical version: you need a human-AI loop running repeatedly. At Atlassian, they demonstrated this with a concrete case. An internal service changed its API shape and URL. Over 500 repositories needed updating, sometimes configuration, sometimes actual code. They used coding agents (Rovo Dev) combined with their “teamwork graph” (a 100+ billion object graph connecting documents, pull requests, Salesforce records, and more across hundreds of SaaS apps).

The process: write a few examples of the migration in JavaScript and Java, commit them, let the graph-aware agents find every other place that needs the same treatment. But the human stays in the loop. Cannon-Brookes explicitly warns against AI code reviewing AI-written code with no human orientation: “If it’s off by a few percent and then you multiply it through a thousand loops, we’re in trouble.”
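The migration pattern Cannon-Brookes describes can be sketched in a few lines. This is a toy stand-in, not Atlassian's tooling: the endpoint URLs are hypothetical, and a real graph-aware agent would generalize from committed example diffs rather than a hard-coded rewrite rule.

```python
import re
from pathlib import Path

# Hypothetical old and new endpoints, standing in for the internal
# service whose API shape and URL changed.
OLD_URL = "https://internal.example.com/api/v1/widgets"
NEW_URL = "https://platform.example.com/api/v2/widgets"

def migrate_source(text: str) -> tuple[str, int]:
    """Rewrite references to the old endpoint; return (new_text, count)."""
    return re.subn(re.escape(OLD_URL), NEW_URL, text)

def migrate_repo(root: Path, extensions=(".js", ".java", ".yml")) -> int:
    """Walk one checkout, rewrite every matching file, return files changed.

    In the Atlassian story this step runs across 500+ repositories found
    via the teamwork graph, with a human reviewing the resulting PRs.
    """
    changed = 0
    for path in root.rglob("*"):
        if path.is_file() and path.suffix in extensions:
            new_text, n = migrate_source(path.read_text(encoding="utf-8"))
            if n:
                path.write_text(new_text, encoding="utf-8")
                changed += 1
    return changed
```

The human-in-the-loop warning maps onto the review step: the rewrite itself is mechanical, but each resulting pull request still gets human eyes before merge, so a systematic few-percent error cannot compound across hundreds of repositories.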

His metaphor: coding agents handle “gardening” (mowing the lawn, pulling weeds, fertilizing), which frees humans to return to “landscape architecture” (deciding where to put the waterfall, planting the big tree). The day-to-day maintenance is where AI delivers immediate, measurable value in existing running businesses.

Atlassian’s “teamwork graph” also powers what Cannon-Brookes calls the largest enterprise search engine in the world. Built since 2019, it connects Atlassian’s workflow apps with external SaaS tools. When LLMs arrived three years ago, the graph became an “organizational memory” because models could now understand the content of documents, not just the links between them. A popular feature: select any word in Confluence or Jira and ask “define this word.” People don’t look up “progress” or “disruption.” They look up internal code words like “fairy dust” or “alchemists.”

AI’s $600 billion question

David Cahn (Sequoia Capital) walks through his widely-read napkin math on AI’s revenue gap. The arithmetic:

  1. In 2024, Nvidia’s run-rate GPU revenue hit ~$150 billion
  2. For every dollar spent on GPUs, another dollar goes to data centers, energy, and power → $300 billion total infrastructure spend
  3. Startups using that infrastructure need ~50% gross margins → they need to generate $2 of revenue for every $1 of AI cost
  4. Total required revenue: $600 billion per year just to justify one year of investment
  5. This isn’t a one-time number. If 2025 brings another $150 billion in GPU spend, the required revenue climbs to $1.2 trillion. The debt compounds.
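The steps above compound mechanically; a few lines make both multipliers and the year-over-year stacking explicit (figures in $billions, following Cahn's stated assumptions):

```python
# Cahn's napkin math. Two 2x multipliers:
#   1. every $1 of GPUs implies ~$1 of data centers and energy
#   2. a ~50% gross margin implies $2 of revenue per $1 of cost
def required_revenue(gpu_spend: float, margin: float = 0.5) -> float:
    """Revenue needed to justify one year of AI infrastructure spend."""
    total_infra = gpu_spend * 2           # GPUs + data centers/energy
    return total_infra / (1 - margin)     # revenue at the given gross margin

print(required_revenue(150))              # 2024 alone: 600.0
print(sum(required_revenue(150) for _ in range(2)))   # two such years: 1200.0
```

The second line is why the number is not one-time: each additional $150 billion GPU year stacks another $600 billion of required revenue on top of the last.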

Where does the revenue actually stand? OpenAI is still the lion’s share. Big tech hasn’t fully unlocked AI revenue (Google has only just begun pushing paid AI offerings through products like Gmail). The gap between required and actual revenue remains roughly $500 billion.

The “prisoner’s dilemma” explains why spending continues anyway. Cloud is a ~$500 billion business. Seven companies represent 33% of the S&P 500. Microsoft spends ~$20 billion per quarter on data centers, Google ~$13 billion. Each is terrified of falling behind in the AI race and losing their position in the cloud oligopoly. The money funding AI infrastructure comes from existing cloud profits built over the last decade. Nobody can afford to stop spending, even if the revenue hasn’t materialized.

Cahn’s update: the spending is stabilizing. Microsoft and Google have plateaued. Amazon will likely settle in the low $20 billions per quarter (similar to Microsoft), Meta in the low teens (similar to Google). So “AI’s $600 billion question” probably won’t become “AI’s trillion-dollar question.” The investment side is flattening. The revenue side still needs to catch up.

Beyond language to reality

Cristóbal Valenzuela (Runway co-founder/CEO) addresses a fundamental limitation of language models:

“One of the bottlenecks of language models is that language is always constrained by what language actually is, which is a human abstraction of reality. We’ve created this mechanism for us to communicate with each other and describe the world, but it’s not an accurate representation of the real world.”

His argument: training on observational data (real, raw video rather than text descriptions of it) lets models grasp reality and how the world works in a much more consistent way. Runway’s models are becoming reasoning systems that understand spatial-temporal consistency, cause and effect. This matters beyond video generation; it’s a path toward general intelligence.

On the question of open vs. closed source, Valenzuela is blunt: closed source models will continue winning. The economics force it.

“If you ever try to build open source models, eventually you’re going to be forced to close source them.”

His reasoning: models are so expensive to train that the incentive to capture full value always wins. Even Meta has considered closing Llama. Unless someone invents a different incentive structure, the history of open source AI will be a history of gradual closing.

On the bubble: Valenzuela predicted AI would enter a bubble six months before the episode. His take remains that it could all come tumbling down tomorrow, but that AI has more immediate real-world impact than the internet did at a comparable stage. The biggest beneficiaries won’t be AI companies, they’ll be companies like Verizon and Procter & Gamble that can save a billion dollars here and there deploying AI tools.

He also names audio generation as the most underrated frontier in ML: “Nobody’s really been paying attention, but it’s going to be really transformative.”

AI enters trading

Martin Shkreli offers the episode’s spiciest predictions. On AI trading: humans still have a small edge in judging emotional states of other traders and connecting disparate information (a little clue about inflation from one company, another from a different company, building a narrative). But high-frequency trading is already computer-dominated.

The specific opportunity he’s building toward: trading the news. An LLM can read breaking news and react in seconds while the market takes minutes to digest. His example: HIMS announced a GLP-1 product; the market took several minutes to price it in. An LLM would have immediately said “buy.” He wants to build this for retail investors, not just big firms.
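The pipeline Shkreli describes is headline in, structured judgment out, order placed before the market finishes digesting. A toy sketch of that shape, with a keyword rule standing in for the LLM call (none of these names come from the episode; a real system would call a model and handle execution, risk, and compliance):

```python
from dataclasses import dataclass

@dataclass
class Signal:
    ticker: str
    action: str   # "buy", "sell", or "hold"
    reason: str

def judge_headline(ticker: str, headline: str) -> Signal:
    """Stand-in for the LLM step: map a breaking headline to a trade signal.

    In practice a model would read the full story and return a structured
    judgment in seconds; here simple keyword lists fake that decision.
    """
    bullish = ("launches", "announces", "beats", "approval")
    bearish = ("recall", "misses", "lawsuit", "halts")
    text = headline.lower()
    if any(word in text for word in bullish):
        return Signal(ticker, "buy", "bullish keyword match")
    if any(word in text for word in bearish):
        return Signal(ticker, "sell", "bearish keyword match")
    return Signal(ticker, "hold", "no clear signal")

# A bullish product announcement maps to a "buy" signal.
print(judge_headline("HIMS", "HIMS announces GLP-1 weight-loss offering"))
```

The whole edge lives in the latency gap: if the judgment step takes seconds while human consensus takes minutes, even a crude-but-correct signal is tradeable.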

On the AI bubble: Shkreli compares it to the 1990s dotcom bubble, which had false peaks before the real one.

“I think OpenAI will go public. I think it’ll be a trillion-dollar market cap. The bubble will get bigger.”

He thinks this could be “one of the bubbles to end all bubbles” but also notes that in every bubble, smart people eventually convince themselves it might be real. The distinction: AI has more immediate organizational impact than the internet did at the same stage.

From 5-second labels to weeks-long problems

Edwin Chen (Surge AI founder) traces the evolution of AI training data. He left Google to start Surge after seeing the same data quality problem at Twitter, Google, and Facebook: the labeling industry was built for commodity tasks (drawing bounding boxes around cars, something a three-year-old could do), optimized for scale, not quality.

The shift has been dramatic:

  • Modality: from text-only to multimodal (images, audio, video simultaneously). Example: filming something on a phone, then asking a model to create a program that simulates it.
  • Languages: from English-only to 50+ languages, with hyper-specialized support (coding in Argentinian Spanish, legal expertise in Bolivia). Models are still surprisingly bad at cultural and dialectal nuances.
  • Complexity: the defining change. Tasks that once took 5 seconds (label this image) now take days or even weeks. Models are winning IMO gold medals, so the training data needs serious thinking power behind it.

Chen’s company explicitly turns away customers whose goals aren’t aligned with AGI. A newspaper wanting to train a category classifier? No. A company building video generators? Yes, because that’s part of building AGI. The freedom comes from having no external board or VCs “dying to make as much money as possible.”

Afterthoughts

Five conversations, five layers of the stack, and a few threads that cut across all of them:

  • The revenue gap is the elephant in the room. $600 billion in required revenue against maybe $100 billion in actual AI revenue. The spending is stabilizing, but the gap isn’t closing fast. The prisoner’s dilemma keeps the cloud giants spending anyway.
  • “Force multiplier” is boring but right. Cannon-Brookes’ Atlassian example (500 repos updated by AI agents with human oversight) is exactly the kind of unglamorous, high-ROI work that justifies enterprise AI spend. Gardening, not landscape architecture.
  • Language is a ceiling. Valenzuela’s point that language models are trained on a human abstraction of reality, not reality itself, is the most philosophically interesting claim in the episode. Video and multimodal training may be the escape hatch.
  • Open source is losing the incentive game. When the model costs hundreds of millions to train, the pressure to capture value always wins. This isn’t a technical argument; it’s an economic one.
  • Data complexity is the hidden scaling law. Chen’s observation that training tasks went from 5-second labels to weeks-long research problems mirrors the broader shift: the easy wins are behind us, and the next frontier requires qualitatively different human input.