Champions aren’t forever. Last week, DeepSeek AI sent shivers down the spines of investors and tech companies alike with its high-flying performance on the cheap. Now, two computer chip startups are drafting on those vibes.
Cerebras Systems makes huge computer chips—the size of dinner plates—with a radical design. Groq, meanwhile, makes chips tailor-made for large language models. In a head-to-head test, these alt-chips have blown the competition out of the water running a version of DeepSeek’s viral AI.
Whereas answers can take minutes to complete on other hardware, Cerebras said that its version of DeepSeek knocked out some coding tasks in as little as 1.5 seconds. According to Artificial Analysis, the company’s wafer-scale chips were 57 times faster than competitors running the AI on GPUs and hands down the fastest. That was last week. Yesterday, Groq overtook Cerebras at the top with a new offering.
By the numbers, DeepSeek’s advance is more nuanced than it appears, but the trend is real. Even as labs plan to significantly scale up AI models, the algorithms themselves are getting substantially more efficient. On the hardware side, those gains are being matched by Nvidia, but also by chip startups, like Cerebras and Groq, that can outperform on inference.
Big tech is committed to buying more hardware, and Nvidia won’t be cast aside soon, but alternatives may begin nibbling at the edges, especially if they can serve AI models faster or cheaper than more traditional options.
Be Reasonable
DeepSeek’s new AI, R1, is a “reasoning” model, like OpenAI’s o1. This means that instead of spitting out the first answer generated, it chews on the problem, piecing its answer together step by step.
For a casual chat, this doesn’t make much difference, but for complex—and valuable—problems, like coding or mathematics, it’s a leap forward.
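To make that concrete, here is a minimal Python sketch of what a reasoning model's output looks like once it comes back. It assumes the response wraps its intermediate reasoning in <think>...</think> tags before the final answer, a convention R1-style models commonly use, though the exact format varies by deployment; the sample response text is invented for illustration.

```python
# Sketch: separating a reasoning model's step-by-step "thinking" from its
# final answer. Assumes the response wraps intermediate reasoning in
# <think>...</think> tags, as R1-style models commonly do; the sample text
# below is invented for illustration.
import re

sample_response = (
    "<think>The user wants the sum of the first 10 positive integers. "
    "Use the formula n(n+1)/2 with n=10, which gives 55.</think>"
    "The sum of the first 10 positive integers is 55."
)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning_trace, final_answer) from a raw model response."""
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if not match:
        return "", text.strip()  # no reasoning trace: the answer is all there is
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer

reasoning, answer = split_reasoning(sample_response)
print("Reasoning steps:", reasoning)
print("Final answer:  ", answer)
```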
DeepSeek’s R1 is already extremely efficient. That was the news last week.
Not only was R1 cheaper to train—allegedly just $6 million (though what this number means is disputed)—it’s cheap to run, and its weights and engineering details are open. This is in contrast to headlines about impending investments in proprietary AI efforts that are larger than the Apollo program.
The news gave investors pause—maybe AI won’t need as much cash and as many chips as tech leaders think. Nvidia, the likely beneficiary of those investments, took a big stock market hit.
Small, Quick—Still Smart
All that’s on the software side, where algorithms are getting cheaper and more efficient. But the chips training or running AI are improving too.
Last year, Groq, a startup founded by Jonathan Ross, the engineer who previously developed Google’s in-house AI chips, made headlines with chips tailor-made for large language models. Whereas popular chatbot responses spooled out line-by-line on GPUs, conversations on Groq’s chips approached real time.
That was then. The new crop of reasoning AI models takes much longer to provide answers, by design.
Using an approach called “test-time compute,” these models churn out multiple answers in the background, select the best one, and offer a rationale for their answer. Companies say the answers get better the longer they’re allowed to “think.” These models don’t beat older models across the board, but they’ve made strides in areas where older algorithms struggle, like math and coding.
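One rough way to picture test-time compute is best-of-N sampling: draw several candidate answers, score each, and keep the winner. The sketch below is a toy illustration only, with invented stand-ins for the generate and score steps; real reasoning models fold this kind of search and self-checking into the model itself.

```python
# Toy sketch of best-of-N test-time compute: sample several candidate
# answers, score each, and return the best. The generate() and score()
# functions are invented stand-ins; a real system would call a model and
# use a learned verifier or reward model instead.
import random

CANDIDATE_POOL = [
    "Answer A: a quick guess with no justification.",
    "Answer B: a worked solution with each step checked.",
    "Answer C: a partial solution that stops early.",
]

def generate(prompt: str) -> str:
    """Stand-in for sampling one candidate answer from a model."""
    return random.choice(CANDIDATE_POOL)

def score(prompt: str, candidate: str) -> float:
    """Stand-in for a verifier; here, longer and more 'worked' is better."""
    return len(candidate) + (10.0 if "checked" in candidate else 0.0)

def best_of_n(prompt: str, n: int = 8) -> str:
    candidates = [generate(prompt) for _ in range(n)]
    return max(candidates, key=lambda c: score(prompt, c))

print(best_of_n("What is 17 * 24?"))
```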
As reasoning models shift the focus to inference—the process where a finished AI model processes a user’s query—speed and cost matter more. People want answers fast, and they don’t want to pay more for them. Here, especially, Nvidia is facing growing competition.
In this case, Cerebras, Groq, and several other inference providers decided to host a crunched-down version of R1.
Instead of the original 671-billion-parameter model—parameters are a measure of an algorithm’s size and complexity—they’re running DeepSeek R1 Llama-70B. As the name implies, the model is smaller, with only 70 billion parameters. But even so, according to Cerebras, it can still outperform OpenAI’s o1-mini on select benchmarks.
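In practice, developers reach hosted models like this through a chat-completions API. The sketch below uses the openai Python client against a generic OpenAI-compatible endpoint; the base URL, API-key variable, and model identifier are placeholders, so check your inference provider’s documentation for the real values.

```python
# Sketch: querying a hosted DeepSeek R1 Llama-70B distill through an
# OpenAI-compatible chat endpoint. The base URL, API-key variable, and
# model identifier below are placeholders -- substitute the values from
# your inference provider's documentation.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.example-provider.com/v1",  # placeholder endpoint
    api_key=os.environ["PROVIDER_API_KEY"],          # placeholder env var
)

response = client.chat.completions.create(
    model="deepseek-r1-distill-llama-70b",           # placeholder model ID
    messages=[
        {"role": "user", "content": "Write a function that reverses a linked list."}
    ],
    temperature=0.6,
)

print(response.choices[0].message.content)
```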
Artificial Analysis, an AI analytics platform, ran head-to-head performance comparisons of several inference providers last week, and Cerebras came out on top. For a similar cost, the wafer-scale chips spit out some 1,500 tokens per second, compared to 536 and 235 for SambaNova and Groq, respectively. In a demonstration of the efficiency gains, Cerebras said its version of DeepSeek took 1.5 seconds to complete a coding task that took OpenAI’s o1-mini 22 seconds.
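As a back-of-the-envelope check, those throughput figures translate directly into wait time. The snippet below works out how long a 1,500-token answer would take at each measured rate; the response length itself is an arbitrary example.

```python
# Back-of-the-envelope: time to stream a fixed-length response at the
# throughputs Artificial Analysis measured (tokens per second). The
# 1,500-token response length is an arbitrary example.
rates = {"Cerebras": 1500, "SambaNova": 536, "Groq": 235}
response_tokens = 1500

for provider, tokens_per_second in rates.items():
    seconds = response_tokens / tokens_per_second
    print(f"{provider:<10} {seconds:5.1f} s for a {response_tokens}-token answer")
```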
Yesterday, Artificial Analysis ran an update to include a new offering from Groq that overtook Cerebras.
The smaller R1 model can’t match larger models pound for pound, but Artificial Analysis noted the results are the first time reasoning models have hit speeds comparable to non-reasoning models.
Beyond speed and cost, where inference providers host their models matters too. DeepSeek shot to the top of the charts in popularity last week, but its models are hosted on servers in China, and experts have since raised concerns about security and privacy. In its press release, Cerebras made sure to note it’s hosting DeepSeek in the US.
Less Is More
Whatever its longer-term impact, the news exemplifies a strong—and, it’s worth noting, already existing—trend toward greater efficiency in AI.
Since OpenAI previewed o1 last year, the company has moved on to its next model, o3, and last week it gave users access to a smaller version, o3-mini. Yesterday, Google released versions of its own reasoning models whose efficiency approaches R1’s. And because DeepSeek’s models are open and include a detailed paper on their development, incumbents and upstarts alike will adopt the advances.
Meanwhile, labs at the frontier remain committed to going big. Google, Microsoft, Amazon, and Meta will spend $300 billion—largely on AI data centers—this year. And OpenAI and SoftBank have agreed to a four-year, $500-billion data-center project called Stargate.
Dario Amodei, the CEO of Anthropic, describes this as a three-part flywheel. Bigger models yield leaps in capability. Companies later refine these models, a process that, among other improvements, now includes developing reasoning models. Woven throughout, hardware and software advances make the algorithms cheaper and more efficient.
The latter trend means companies can scale more for less on the frontier, while smaller, nimbler algorithms with advanced abilities open up new applications and demand down the line. Until this process exhausts itself—which is a topic of some debate—there’ll be demand for AI chips of all kinds.