Groq: Jensen Huang's Christmas Present to Himself
Remember when Cisco bought Cerent?
August 1999 — The Cerent Purchase
Cisco announced it was paying $6.9 billion in stock—about $13 billion in today’s dollars—for Cerent Corporation, a two-year-old optical networking startup in Petaluma, California with about 200 employees. At the time, it was the third-largest telecom merger ever, behind Lucent’s $20 billion purchase of Ascend and Nortel’s $9 billion deal for Bay Networks.
The numbers were absurd even by dot-com standards. Cerent had total revenues of about $10 million over its entire existence. It had never turned a profit. It had one product—the Cerent 454, a next-generation SONET add-drop multiplexer that bridged data from slow copper wires onto high-speed fiber networks.
Just four months earlier, Cisco had made an acquisition offer that Cerent rejected. Now Cisco came back offering twenty times more. Cerent had just filed its S-1 for a $100 million IPO and was weeks away from going public. Cerent accepted the offer and the deal closed on November 1, 1999.
In hindsight? The Cerent 454, rebranded as the Cisco 15454, became one of the fastest products ever to hit $1 billion in annual sales.
Christmas Eve 2025 — The Groq Deal
Reports emerged that Nvidia was paying $20 billion in cash for Groq, a nine-year-old AI chip startup. Groq had raised $750 million just three months earlier at a $6.9 billion valuation. If the $20 billion figure holds, Jensen paid a 190% premium in 90 days. That’s urgency. Groq was targeting $500 million in revenue this year.
Except Nvidia didn’t announce an acquisition. Groq announced a “non-exclusive licensing agreement” for its inference technology—with the CEO, president, and much of the engineering team reportedly joining Nvidia. Groq will “continue to operate as an independent company” under the CFO, now promoted to CEO.
Why the deal took this shape instead of a straight acquisition probably deserves a story of its own, separate from this op-ed.
Why would Nvidia do this deal?
Groq makes a fundamentally different kind of processor than Nvidia does. Nvidia's GPUs were originally developed for gaming and now power AI. Groq's processor is a statically scheduled machine—a rare beast in the history of computing.
Most processors, whether the CPU in your laptop or a GPU, are dynamically scheduled. They figure out what to do next on the fly, juggling tasks, managing memory, making decisions at runtime.
Statically scheduled machines don't work that way. The entire sequence of operations is determined in advance, by the compiler. The hardware just executes a predetermined schedule.
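To make the distinction concrete, here is a toy sketch in Python. It is my own illustration, nothing like Groq's or Nvidia's actual designs: the "dynamic" version inspects state and decides what to run at every step, while the "static" version just walks a schedule that was fixed before execution began.

```python
# Illustrative only: a toy contrast between dynamic and static scheduling.

def run_dynamic(ops, ready):
    """Dynamic scheduling: at every step, inspect state and pick what to run."""
    done = []
    pending = list(ops)
    while pending:
        # Runtime decision-making: find an op whose inputs are ready right now.
        op = next(o for o in pending if all(dep in ready for dep in o["deps"]))
        ready.add(op["name"])          # its result becomes available
        done.append(op["name"])
        pending.remove(op)
    return done

def run_static(schedule):
    """Static scheduling: the order was decided in advance; just execute it."""
    return [op["name"] for op in schedule]   # no runtime decisions at all

ops = [
    {"name": "load_a", "deps": []},
    {"name": "load_b", "deps": []},
    {"name": "mac",    "deps": ["load_a", "load_b"]},
]

print(run_dynamic(ops, ready=set()))   # order discovered at runtime
print(run_static(ops))                 # order fixed ahead of time by the "compiler"
```

On real hardware the dynamic decision-making is done in silicon—schedulers, scoreboards, cache controllers—and that is exactly the logic a statically scheduled machine gets to throw away.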
There are very few commercial statically scheduled machines in the history of computing. The main reason: they’re brutally difficult to program. The other major example is Google’s TPU—and not coincidentally, Groq’s founder Jonathan Ross was one of the architects of Google’s first-generation TPU before leaving to start Groq. Other chip companies have incorporated elements of static scheduling, but Groq and the TPU are built around it.
For large classes of LLM workloads, you can’t beat a well-executed statically scheduled machine. It is an ASIC—application-specific integrated circuit—for modern AI. Many teams have bent the pick trying to design and program these kinds of architectures.
CPUs are all fundamentally the same architecturally. So are GPUs. Both fall into the class of dynamically scheduled machines.
Statically scheduled machines are rare, and they vary widely in architecture.
The quick reason this matters: Large Language Models have a kind of predictability that allows enormous amounts of logic to be factored out of what would normally go into a CPU or GPU. This results in huge power and area savings on the chip. LLMs are essentially just massive amounts of matrix multiplication. The fundamental operation in matrix multiplication is multiply-and-accumulate—MACs in the jargon. The more you can simplify everything else on the chip, the more room you have for MACs or on-chip memory. Silicon is a finite physical thing.
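To ground the jargon, here is a plain Python sketch (illustrative only) of matrix multiplication written so that every inner-loop step is exactly one MAC. An AI chip's job is to perform enormous numbers of these in parallel and keep them fed with data.

```python
# Matrix multiplication reduced to its fundamental operation: the MAC.
# Every pass through the inner loop is one multiply-and-accumulate.

def matmul(A, B):
    n, k = len(A), len(A[0])
    m = len(B[0])
    C = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            acc = 0.0
            for p in range(k):
                acc += A[i][p] * B[p][j]   # one MAC: multiply, then accumulate
            C[i][j] = acc
    return C

# A 2x3 times 3x2 example: 2 * 2 * 3 = 12 MACs total.
A = [[1, 2, 3], [4, 5, 6]]
B = [[7, 8], [9, 10], [11, 12]]
print(matmul(A, B))   # [[58.0, 64.0], [139.0, 154.0]]
```

An LLM forward pass is essentially this loop scaled up by many orders of magnitude; almost nothing else in the workload matters.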
MACs are the compute—the more you have, the faster you can go, assuming you can feed them data. Otherwise they sit idle. All dressed up with nowhere to go.
The goal of an efficient statically scheduled AI architecture is to reduce the chip to just MACs, wires, and on-chip memory. You don't need anything else.
If the chip can do anything other than AI, you have added something that wastes power and chip real estate.
Why does Nvidia need this?
Nvidia dominates AI training—the process of building the models in the first place. But the real volume is in inference—actually running those trained models to answer your questions, generate your images, write your code. Training happens once. Inference happens billions of times a day. And Groq is built for inference.
In a follow-on op-ed, I’ll explain the technical details of why static scheduling matters for AI inference.
Disclosure: The Cranky Old Guy was a founding member of SiMa.ai, a maker of AI accelerators, and owns stock in the company. He has considerable experience designing statically scheduled architectures and building toolchains for them, and holds many patents in this area.

