Nvidia Just Bet $20 Billion That the AI Gold Rush Is Moving to Inference

The first few years of the AI boom were all about training: who could build the biggest model, who could burn the most compute, who could stack the most GPUs in a single cluster. That era isn't over, but on Monday at GTC 2026 in San Jose, Jensen Huang made it crystal clear that Nvidia is planting its flag in what comes next: inference. And the price tag for that bet? $20 billion.

The Groq 3 LPU: Not Just Another Chip

The star of Huang's keynote wasn't the new Vera Rubin GPU platform (though we'll get to that). It was the Groq 3 Language Processing Unit, the first chip to emerge from Nvidia's blockbuster acquisition of inference startup Groq, finalized in December 2025 for $20 billion.

Each Groq 3 LPU is packed with 500 MB of SRAM and delivers 150 TB/s of memory bandwidth. For context, Nvidia's own Rubin GPU manages 22 TB/s via HBM4 memory. That's roughly a 7x bandwidth advantage for the LPU, which matters enormously for inference workloads where the bottleneck isn't raw compute power but how fast you can move data through the system.

The chip is purpose-built for the "decode" phase of large language model inference, the token-by-token generation step where bandwidth is everything. Groq's architecture uses SRAM instead of the HBM memory that GPUs rely on, delivering deterministic, single-cycle latency that makes responses feel instantaneous.
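A back-of-envelope calculation shows why bandwidth, not compute, sets the ceiling during decode: each generated token has to stream roughly all of the model's weights through memory. The model size below is an illustrative assumption; only the two bandwidth figures come from the keynote.

```python
# Rough upper bound on decode throughput for a memory-bandwidth-bound
# workload. The 70 GB model footprint is an assumption for illustration
# (e.g. a 70B-parameter model at 8-bit weights), not a figure from Nvidia.

GB = 1e9
TB = 1e12

model_bytes = 70 * GB       # assumed weight footprint streamed per token

def max_tokens_per_sec(bandwidth_bytes_per_sec: float, bytes_per_token: float) -> float:
    """Ceiling: every token must move all weights through memory once."""
    return bandwidth_bytes_per_sec / bytes_per_token

lpu_bw = 150 * TB           # Groq 3 LPU, per the keynote
gpu_bw = 22 * TB            # Rubin GPU HBM4, per the keynote

print(f"LPU ceiling: {max_tokens_per_sec(lpu_bw, model_bytes):,.0f} tok/s")
print(f"GPU ceiling: {max_tokens_per_sec(gpu_bw, model_bytes):,.0f} tok/s")
print(f"bandwidth ratio: {lpu_bw / gpu_bw:.1f}x")
```

Under that (simplified) model, the throughput ceiling scales directly with bandwidth, which is where the roughly 6.8x advantage comes from.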

LPX Racks: Inference at Industrial Scale

Nvidia isn't shipping the Groq 3 as a standalone chip. It's packaging 256 LPUs into a new Groq LPX rack that sits alongside the Vera Rubin NVL72 GPU rack. The idea is a paired system: GPUs handle the compute-heavy "prefill" phase (processing your entire prompt at once), then hand off to the LPU rack for rapid token generation.
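The handoff Nvidia describes can be sketched in a few lines. Everything below, including the class names and the key/value cache shape, is hypothetical scaffolding to show the control flow of disaggregated prefill/decode serving; none of it is an Nvidia API.

```python
# Illustrative sketch of disaggregated inference: a compute-heavy prefill
# phase on a GPU pool produces attention state, which a bandwidth-heavy
# decode pool then extends one token at a time. All names are hypothetical.

from dataclasses import dataclass, field


@dataclass
class KVCache:
    """Attention key/value state: produced by prefill, consumed by decode."""
    tokens: list = field(default_factory=list)


class GPUPrefillPool:
    def prefill(self, prompt_tokens: list) -> KVCache:
        # Compute-bound: the whole prompt is processed in parallel.
        return KVCache(tokens=list(prompt_tokens))


class LPUDecodePool:
    def decode(self, cache: KVCache, max_new_tokens: int) -> list:
        # Bandwidth-bound: tokens are generated sequentially from SRAM.
        out = []
        for i in range(max_new_tokens):
            token = f"<tok{i}>"        # stand-in for a real model step
            cache.tokens.append(token)
            out.append(token)
        return out


def serve(prompt_tokens: list, max_new_tokens: int = 4) -> list:
    cache = GPUPrefillPool().prefill(prompt_tokens)        # phase 1: prefill
    return LPUDecodePool().decode(cache, max_new_tokens)   # phase 2: decode


print(serve(["Hello", ",", "world"]))
```

The design point is that the two phases have opposite hardware profiles, so splitting them across specialized pools lets each run on the silicon it stresses most.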

The numbers Nvidia is claiming are eye-popping: 128 GB of SRAM per rack, 40 PB/s of aggregate bandwidth, and a dedicated 640 TB/s scale-up interconnect. When paired with Vera Rubin, Nvidia says the combo delivers 35x higher throughput per megawatt and 10x more revenue potential for AI service providers.
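The rack-level figures are consistent with the per-chip specs, as a quick sanity check shows. (The power figure behind the per-megawatt claim isn't public, so only capacity and bandwidth are checked here.)

```python
# Sanity-checking Nvidia's per-rack claims against the per-chip specs
# quoted in the keynote: 256 LPUs, 500 MB SRAM and 150 TB/s each.

lpus_per_rack = 256
sram_per_lpu_gb = 0.5        # 500 MB per Groq 3 LPU
bw_per_lpu_tbs = 150         # TB/s per LPU

rack_sram_gb = lpus_per_rack * sram_per_lpu_gb
rack_bw_pbs = lpus_per_rack * bw_per_lpu_tbs / 1000   # TB/s -> PB/s

print(f"SRAM per rack: {rack_sram_gb:.0f} GB")          # matches the 128 GB claim
print(f"Aggregate bandwidth: {rack_bw_pbs:.1f} PB/s")   # ~38.4 PB/s
```

The SRAM figure matches exactly; the raw sum of per-chip bandwidth comes to about 38.4 PB/s, so the quoted 40 PB/s presumably reflects rounding or includes interconnect bandwidth.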

That last metric is telling. Nvidia isn't just selling hardware anymore; it's selling a business case. If you run AI inference as a service, the pitch is that you'll make dramatically more money per watt of power you consume.

$1 Trillion Through 2027

Huang dropped what might be the most staggering number in GTC history: he expects purchase orders across the Blackwell and Vera Rubin platforms to reach $1 trillion through 2027. Those are orders, not revenue, but they signal the scale of capital flowing into AI infrastructure right now.

The Vera Rubin platform itself is shipping in the second half of 2026, with seven chips now in production including the Groq 3 LPU. The timing matters because inference demand is exploding as companies move from experimenting with AI to deploying it at scale. Training a model is a one-time cost; serving it to millions of users is a recurring expense that dwarfs the initial investment.

NemoClaw and the Agent Play

The other major reveal was NemoClaw, Nvidia's open-source stack for building and deploying autonomous AI agents. Huang called these "claws," long-running agents that can reason, plan, write code, call tools, and continuously improve without human intervention.

NemoClaw runs on the NVIDIA DGX Spark and DGX Station systems, which can cluster up to four units into a compact "desktop data center." The open-source approach mirrors what CUDA did for GPU computing: make the software free, ensure every deployment runs on Nvidia hardware, and build an ecosystem that's nearly impossible to leave.

Huang went as far as saying that every company in the world needs an "OpenClaw strategy," comparing it to the way organizations once needed an internet strategy. It's a bold claim, but it reflects a real shift in how enterprises are thinking about AI: not as a chatbot bolted onto a website, but as autonomous systems that can handle complex, multi-step workflows.

The Mellanox Playbook, Revisited

Industry analysts are already calling the Groq acquisition Nvidia's "Mellanox moment." When Nvidia bought Mellanox in 2020 for $7 billion, skeptics questioned the price. That networking technology became the connective tissue of every major AI training cluster in the world. Now Nvidia is betting that Groq's inference technology will play a similar role as the AI industry shifts from building models to serving them.

The strategic logic is straightforward: Nvidia already dominates AI training with its GPUs. By acquiring Groq, it's locking down the inference side before competitors like AMD, Intel, or startups like Cerebras can grab it. The $20 billion price tag was 2.9x Groq's last private valuation, a premium that says Nvidia viewed this as a "must-have" rather than a "nice-to-have."

What This Means for the AI Industry

The GTC 2026 keynote crystallized something that's been building for months: the AI industry is entering its deployment phase. The race to build the biggest model isn't stopping, but the economic center of gravity is shifting to inference, where the real money gets made.

For cloud providers and AI startups, the Groq 3 LPU + Vera Rubin combo means inference costs could drop dramatically while throughput skyrockets. That's good news for anyone building products on top of large language models. For Nvidia's competitors, it's a signal that the window to challenge Nvidia's dominance in AI infrastructure is narrowing fast.

The LPX racks ship in the second half of 2026. By then, we'll know whether Huang's $1 trillion forecast was visionary or just Silicon Valley hyperbole. Based on the packed house at the SAP Center on Monday, a lot of very wealthy people are betting on the former.
