
January 27, 2026 · Podcast · 48min

Zach Lloyd: The Terminal Is Becoming AI's Cockpit

#Developer Tools · #AI Agents · #AI Coding · #Terminal · #Future of Work

The terminal was supposed to be a relic. Instead, it’s becoming the control center for the age of AI agents. That’s the bet Zach Lloyd, founder and CEO of Warp, is making. In a conversation with Sonya Huang at Sequoia Capital, Lloyd lays out why the terminal’s text-based, time-ordered format is uniquely suited for agentic work, how Warp is pivoting from a single-player developer tool to a team-level agent orchestration platform, and why he believes coding itself will be “solved” within a few years.

Why the Terminal Wins the Agent Era

Lloyd’s core thesis is almost counterintuitive: the terminal, one of computing’s oldest interfaces, has the right form factor for the newest paradigm.

“The general form factor of the terminal is perfect for agentic work because everything is time-based. It’s all about input of text and output of text. You get to log what you’re doing. You can multitask agents in the terminal really easily.”

Warp started as a modern terminal for professional developers. The company rewrote the terminal from scratch in Rust to make it faster, more collaborative, and more user-friendly. But the rise of coding agents transformed the trajectory. What was a nice-to-have developer tool became, in Lloyd’s telling, the natural workbench for a world where developers spend more time prompting than typing code.

The original Warp pitch was about the terminal as a single-player productivity tool. The shift to “multiplayer” happened when teams started wanting shared environments, shared context, and eventually shared agents. That evolution from solo terminal to team workspace is what Lloyd sees as the real product opportunity.

Competing Against Anthropic and OpenAI

One of the most candid segments of the conversation is about the competitive landscape. Lloyd acknowledges the brutal reality: model providers like Anthropic and OpenAI are building their own coding tools (Claude Code, ChatGPT coding agents) and subsidizing them, sometimes below cost.

Warp’s response is to differentiate on the harness layer rather than the model layer. The “harness” is everything around the model call: how you prompt, what tools you expose, how you manage context windows, when you use sub-agents, when you summarize or truncate.
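To make the harness idea concrete, here is a minimal sketch of what such a layer might look like. This is not Warp's implementation; all names (`Harness`, `build_request`, the 4-chars-per-token estimate) are hypothetical, and the summarization step is a stand-in for whatever real context-compaction strategy a product would use.

```python
from dataclasses import dataclass, field

@dataclass
class Harness:
    """Hypothetical sketch: everything wrapped around the raw model call."""
    system_prompt: str
    tools: list                      # tool schemas exposed to the model
    max_context_tokens: int = 100_000
    history: list = field(default_factory=list)

    def build_request(self, user_message: str) -> dict:
        self.history.append({"role": "user", "content": user_message})
        # Context management: collapse old turns when the window fills up,
        # rather than letting the model lose attention over a huge context.
        while self._token_estimate() > self.max_context_tokens and len(self.history) > 2:
            self._summarize_oldest_turns()
        return {
            "system": self.system_prompt,
            "messages": self.history,
            "tools": self.tools,
        }

    def _token_estimate(self) -> int:
        # Crude stand-in: roughly 4 characters per token.
        return sum(len(m["content"]) for m in self.history) // 4

    def _summarize_oldest_turns(self):
        # Placeholder: collapse the two oldest turns into one summary line.
        old = self.history[:2]
        summary = "Summary of earlier turns: " + " | ".join(m["content"][:40] for m in old)
        self.history = [{"role": "system", "content": summary}] + self.history[2:]
```

Decisions like when to spawn sub-agents or which tools to expose would live in the same layer; the point is that none of this is the model itself, which is exactly where Lloyd argues the differentiation sits.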

Lloyd describes a systematic approach to harness quality: internal evals, public benchmark performance, and user data analysis through platforms like Braintrust. Pattern-matching on failure modes, replaying them as evals, tuning the harness, measuring again.
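The replay-failures-as-evals loop can be sketched in a few lines. Again, this is a hypothetical illustration, not Warp's or Braintrust's actual API: each observed failure becomes a case with a programmatic pass/fail check, and the harness is re-scored against the whole set after every change.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalCase:
    """One captured failure mode, replayable against the current harness."""
    prompt: str
    check: Callable[[str], bool]  # returns True if the output is acceptable

def run_evals(agent: Callable[[str], str], cases: list) -> float:
    """Replay every case and return the pass rate."""
    passed = sum(1 for case in cases if case.check(agent(case.prompt)))
    return passed / len(cases)
```

A tuning cycle then reduces to: pattern-match a new failure, add an `EvalCase`, adjust the harness, and confirm the pass rate did not regress.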

“That was a big mindset shift for us, to get to doing that, but that was 100% necessary to do it all data-driven to get to something that was good.”

The model integration strategy is pragmatic. Warp currently supports Claude, GPT, and Gemini. Grok has reached out multiple times but hasn’t been added because every new model requires tuning the harness, and Lloyd wants concrete user benefit before investing that engineering effort.

On pricing, Lloyd is direct about the challenge. When your competitors give their product away for free or below cost, you either find differentiation or you lose. His bet is that the application layer (harness + orchestration + team features) is where durable value lives, not the model API layer.

From Interactive Agents to Cloud Agents

The biggest product bet Lloyd describes is the move from interactive, developer-at-keyboard agents to ambient “cloud agents” triggered by system events. This is Warp’s top product priority.

The vision: instead of a developer sitting at a terminal giving prompts, system events trigger agents autonomously. A server crash, a cluster of user reports, a security incident: each becomes context fed into an agent that runs in the cloud, not on anyone’s local machine.

This shifts Warp from a product to a platform with multiple layers:

  • Agent SDK for building custom agents
  • Agent hosting for companies that don’t want to manage infrastructure
  • API layer for status, takeover, progress tracking, and logs
  • Management layer, a cockpit view of all running agents, their states, outputs, and PRs
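The management layer's "cockpit view" is easy to picture as code. The sketch below is purely illustrative; the states, field names, and `cockpit_view` helper are assumptions about what such an API might expose, not Warp's actual SDK.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

class AgentState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    NEEDS_REVIEW = "needs_review"   # e.g. waiting on a human to take over
    DONE = "done"

@dataclass
class AgentRun:
    """One cloud agent run, as the status/progress API might report it."""
    run_id: str
    task: str
    state: AgentState
    pr_url: Optional[str] = None    # populated once the agent opens a PR

def cockpit_view(runs: list) -> dict:
    """Group runs by state: the management-layer overview of the fleet."""
    view = {}
    for run in runs:
        view.setdefault(run.state, []).append(run.run_id)
    return view
```

The integration-versus-separate-product debate described below is essentially about where this view lives: inside the terminal, next to the local environment an agent hands work off to, or in a standalone web surface.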

Lloyd notes an internal debate at Warp: should this orchestration view be a separate product or integrated into the existing terminal? The advantage of integration is seamless handoff. An agent does work in the cloud, then you pull it onto your local machine and keep iterating in the same environment. The counter-argument is that orchestration feels more web-centric and potentially serves a different user.

The practical reality today: Warp already runs agents through Slack and Linear. Someone tags a task, an agent picks it up, produces a PR, and a developer closes the loop.
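That event-to-PR flow can be sketched as a webhook handler. Everything here is hypothetical (the event shape, the `agent_tagged` field, the `launch_agent` callback); it only illustrates the shape of the loop: an event carries context in, an agent run comes out, and a PR is what the human reviews.

```python
def handle_event(event: dict, launch_agent) -> str:
    """Hypothetical webhook handler: a tagged Slack or Linear item
    becomes a cloud agent run that ends in a PR for human review."""
    if event.get("source") in ("slack", "linear") and event.get("agent_tagged"):
        context = f"{event['source']} task: {event['text']}"
        return launch_agent(context)   # e.g. returns a PR identifier
    return "ignored"
```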

Where Agents Actually Are (and Aren’t)

Lloyd puts current coding agent capability at “about a 6 out of 10.” He uses agents daily on Warp’s own codebase, a large, custom Rust project, which he considers a harder-than-average test case.

What agents can do well:

  • Medium-complexity tasks with guidance
  • Zero-to-one app creation
  • Kind-of-hard bugs
  • Medium-sized features (e.g., adding a new slash command, resulting in a ~300-line PR that’s “basically right”)

What agents can’t do:

  • Whole big projects
  • Fundamental architecture decisions
  • Sustained autonomous work beyond 20-30 minutes before “going in circles”

The biggest bottlenecks Lloyd identifies:

Context window limitations. Even with larger windows, maintaining attention over the full context remains hard. There’s no continuous learning; agents are “big stateless things” that always start from scratch, requiring expensive context repopulation.

No standards for agent use. Lloyd admits that even within Warp, there’s high variance in how engineers use agents. There are rigorous coding standards but “almost no standards” for how to use agents. No one has been taught. There aren’t agreed best practices. He considers this a significant blocker.

Verification gap. Agents now produce code that compiles almost 100% of the time (a milestone reached only 4-5 months earlier), but the code still frequently has bugs. The missing piece is agents verifying their own work from the user’s perspective, not just the code’s perspective. Lloyd sees browser-use and computer-use APIs as the solution, especially as more agent work moves to remote execution.

Coding Will Be Solved

Lloyd’s most provocative claim: coding itself will be “solved” by models within a few years. Not superintelligence, exactly, but something more specific.

“The limiting factor that we’re going to come up against is just expression of intent from humans. What do you want built? How do you express that clearly? English is ambiguous.”

The irony he identifies: we’re moving from a world where people expressed intent precisely through code to one where they express it ambiguously through English, then rely on a translation layer (the model) to produce code. It’s “an interesting step backwards” in precision, even as it’s massively more efficient.

His competitive implication: if coding gets solved, you won’t need frontier models for coding tokens. Non-frontier models will produce code just as well-matched to intent. Which means the API business for coding becomes a commodity. This, Lloyd argues, is precisely why Anthropic, OpenAI, and Google are pushing so hard into the application layer. The margin in coding-specific API calls may evaporate.

The Economics Aren’t There Yet

When it comes to enterprises viewing AI coding tools as labor replacement rather than productivity tools, Lloyd says we’re not there yet. Companies still evaluate these tools through subjective developer feedback (“do you feel like you’re getting value?”) or DORA metrics. No one is seriously pricing a $200,000 agent against a $200,000 engineer.

What would change that: companies shipping products with minimal or no engineering staff. Lloyd says this will happen but hasn’t happened much yet. When it does, the cost comparison will become unavoidable.

Ask and Adjust

Lloyd wrote a blog post shortly after ChatGPT launched arguing that productivity interfaces would shift from hand-editing (drawing in Figma, typing in VS Code, entering cells in Sheets) to “ask and adjust,” where you ask the AI to do the thing, then adjust the output.

Two years later, he thinks the thesis has largely held up. The interesting nuance: in creative domains with many acceptable solutions (like image generation), you can just reprompt until you get something good. In domains like code, where there’s one correct answer, you still need the hand-editing interface to get it perfect.

One detail Lloyd is proud of: Warp coined the term “agent mode” before it became an industry-wide label. He jokes they should have trademarked it.

Some Thoughts

This conversation is worth paying attention to not for the terminal advocacy specifically, but for what it reveals about the developer tools landscape at this inflection point.

  • The most interesting dynamic is model providers competing with their own customers. Lloyd is building on top of Claude and GPT while Claude Code and ChatGPT compete directly with Warp. His response, betting on the harness and orchestration layers, is the classic platform strategy, but whether it’s durable against subsidized, vertically integrated competitors remains the open question.
  • The “coding will be solved” framing is worth sitting with. If the bottleneck shifts entirely to intent expression, the most valuable skill becomes knowing what to build, not how to build it. Product taste and domain knowledge appreciate; raw coding ability depreciates.
  • Lloyd’s admission about having “almost no standards” for agent use within his own company is remarkably honest and probably representative of most engineering organizations right now. The gap between coding standards and agent-use standards may be one of the biggest unexploited productivity opportunities in the industry.
  • The 20-30 minute agent runtime ceiling, beyond which agents “go in circles,” is a concrete data point that cuts through the hype. It implies that for now, agents are collaborators on bounded tasks, not autonomous workers on open-ended projects.