The Scramble to Loosen Nvidia's AI Chip Grip

For most of the generative-AI boom, the story of compute has been a story of one company. Nvidia’s GPUs, its CUDA software moat, and its grip on the high-bandwidth memory supply chain made it the default — and the toll-collector — for anyone training or serving large models. That is still true today. But the cracks are widening, and they are appearing in interesting places.

Qualcomm is reportedly circling a RISC-V chip designer. Microsoft is pushing its own inference silicon hard enough that rival labs are reportedly testing it. AMD keeps closing the gap, and an open-standard movement is gathering momentum underneath all of it. None of this dethrones Nvidia tomorrow. But the direction of travel matters enormously for the people who actually pay the bills: the founders, marketers, and engineering teams running models in production. Here is what is happening, and why it should change how you think about your compute roadmap.

The contenders

The marquee move in the chip-competition story is Qualcomm’s reported interest in Tenstorrent. According to Crescendo AI’s news roundup (a secondary source that should be checked against primary reporting), Qualcomm is in early talks to acquire the RISC-V chip designer for roughly $8–10 billion. The logic is straightforward: Qualcomm wants a credible seat at the AI-hardware table currently dominated by Nvidia and AMD, and Tenstorrent, led by veteran chip architect Jim Keller, offers both a respected design team and a bet on RISC-V, the open instruction-set architecture that sidesteps the licensing tolls of Arm and the proprietary lock-in of x86.

That RISC-V angle is the part to watch. An open ISA means chipmakers can design custom AI accelerators without paying a gatekeeper for the underlying architecture, and it lowers the barrier for new entrants to ship competitive silicon. If Qualcomm closes a deal like this, it is not just buying a product line — it is buying into a different model of how AI hardware gets built.

Microsoft, meanwhile, is taking the in-house route. Its Maia line of custom inference silicon is the clearest sign yet that the hyperscalers no longer want to be wholly dependent on Nvidia for serving models. According to reporting attributed to Crescendo and CNBC (again, not confirmed against primary sources), Anthropic is in early talks to run inference on Microsoft’s Maia 200 chips, a 3nm part that Microsoft claims delivers more than 30% better performance-per-dollar than rival silicon. The significance is less the spec sheet and more the signal: a frontier lab diversifying its inference away from a single vendor is exactly the kind of behaviour that erodes a monopoly at the margins.

And then there is AMD, which has spent the last two years turning its Instinct accelerators into a genuine alternative for both training and inference, while investing heavily in its open ROCm software stack to chip away at CUDA’s lock-in. Around AMD sits a broader coalition of open-standard challengers — RISC-V proponents, custom-silicon startups, and the cloud providers building their own chips — all of whom benefit from the same thing: a world where the AI software layer is not permanently welded to one company’s hardware.

Why inference is the battleground

It is tempting to think of AI compute as a training problem, because training runs grab the headlines with their eye-watering budgets and cluster sizes. But over the life of a successful product, inference — the cost of actually running the model every time a user sends a query — almost always dwarfs the one-time cost of training. You train a model once; you serve it millions or billions of times. That asymmetry is why inference, not training, is where the real competitive war is being fought.
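The asymmetry is easy to put into numbers. The sketch below is a back-of-envelope calculation, not a measurement: the training cost, the cost per 1,000 queries, and the daily query volume are all hypothetical figures chosen for illustration, and the function names are ours. It shows how quickly cumulative serving spend can overtake a one-time training bill.

```python
def breakeven_queries(training_cost_usd: float, cost_per_1k_queries_usd: float) -> float:
    """Number of queries at which cumulative inference spend equals training spend."""
    if cost_per_1k_queries_usd <= 0:
        raise ValueError("cost per 1,000 queries must be positive")
    return training_cost_usd / cost_per_1k_queries_usd * 1_000


def days_to_breakeven(
    training_cost_usd: float,
    cost_per_1k_queries_usd: float,
    queries_per_day: float,
) -> float:
    """Days of steady traffic until inference spend matches training spend."""
    if queries_per_day <= 0:
        raise ValueError("queries per day must be positive")
    return breakeven_queries(training_cost_usd, cost_per_1k_queries_usd) / queries_per_day


# All three inputs are hypothetical, for illustration only.
TRAINING_COST_USD = 5_000_000.0
COST_PER_1K_QUERIES_USD = 2.0
QUERIES_PER_DAY = 10_000_000.0

queries = breakeven_queries(TRAINING_COST_USD, COST_PER_1K_QUERIES_USD)
days = days_to_breakeven(TRAINING_COST_USD, COST_PER_1K_QUERIES_USD, QUERIES_PER_DAY)
print(f"Inference spend matches training spend after {queries:,.0f} queries ({days:.0f} days)")
```

Under these assumed inputs, the serving bill passes the training bill in well under a year, and every day after that widens the gap. Plug in your own figures; the shape of the result is what matters.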

This reframes the metric that matters. For training, raw peak performance and cluster scale dominate the conversation. For inference, the number that actually shows up on your cloud bill is performance-per-dollar — how much useful throughput you get per unit of spend. It is no accident that Microsoft is marketing Maia on exactly that axis. A chip that is slightly slower in absolute terms but materially cheaper to run can win the inference market outright, because at production scale, efficiency compounds.
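To see why a slower chip can still win, it helps to work the metric through. The sketch below compares two accelerators on tokens per dollar and cost per million tokens. Both chips are hypothetical, as are their throughput and hourly prices; none of these numbers describe Maia, Nvidia, or any other real part.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Accelerator:
    """A hypothetical accelerator described by throughput and rental price."""
    name: str
    tokens_per_second: float
    dollars_per_hour: float


def tokens_per_dollar(chip: Accelerator) -> float:
    """Performance-per-dollar: tokens generated for each dollar of spend."""
    return chip.tokens_per_second * 3600 / chip.dollars_per_hour


def cost_per_million_tokens(chip: Accelerator) -> float:
    """The same metric inverted into the unit that shows up on a bill."""
    return 1_000_000 / tokens_per_dollar(chip)


# Hypothetical figures: chip_b is 20% slower but half the hourly price.
chip_a = Accelerator("chip_a", tokens_per_second=10_000, dollars_per_hour=4.00)
chip_b = Accelerator("chip_b", tokens_per_second=8_000, dollars_per_hour=2.00)

for chip in (chip_a, chip_b):
    print(
        f"{chip.name}: {tokens_per_dollar(chip):,.0f} tokens per dollar, "
        f"${cost_per_million_tokens(chip):.4f} per million tokens"
    )
```

With these made-up inputs, the slower chip delivers 60% more tokens per dollar. Real comparisons also need to account for latency targets, batch sizes, and utilisation, which this sketch deliberately ignores.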

The hidden constraint underneath all of this is memory. Large models are bottlenecked less by raw compute and more by memory bandwidth and capacity, which in practice means high-bandwidth memory (HBM): the expensive, supply-constrained stacks that feed data to the processor fast enough to keep it busy. HBM availability has been one of Nvidia’s structural advantages, because it has the scale and supplier relationships to secure allocation that smaller players struggle to match. Any serious challenger has to solve the memory problem, not just the compute one. This is precisely why architectural bets, including RISC-V designs optimised for memory efficiency rather than brute force, are interesting: they attack the bottleneck rather than trying to out-muscle it.

What it means for builders

For the teams building on top of all this, the headline is encouraging: if competition genuinely bites, inference gets cheaper. More credible alternatives mean more price pressure, and inference is the cost line that scales directly with your usage. Even a modest improvement in performance-per-dollar, applied across millions of queries, is the difference between a product with healthy unit economics and one that bleeds money on every interaction.

But cheaper compute is only half the story. The more strategic shift is portability. For years, building on Nvidia effectively meant building on CUDA, and CUDA-optimised code does not move easily to other silicon. As open software stacks mature and more vendors compete, the ability to run the same workload across different chips — Nvidia today, a Maia or an AMD part tomorrow, a RISC-V accelerator the day after — becomes a real lever. Portability is leverage: it lets you negotiate, switch, and avoid being captive to one supplier’s pricing and roadmap.

The flip side is the risk worth naming clearly: model-and-chip lock-in. If you over-optimise your stack for one vendor’s hardware, or build on a proprietary model that only runs efficiently on specific silicon, you inherit someone else’s supply constraints and pricing power. The practical advice for builders today is to treat your inference layer as something that should be portable by design — favour open formats and abstraction layers where you can, benchmark on more than one platform, and avoid architectural decisions that quietly weld you to a single chip. The competition only benefits you if you are positioned to take advantage of it.
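What "portable by design" can look like in code is a thin interface between your application and whatever hardware sits underneath. The sketch below is one possible shape, under stated assumptions: `InferenceBackend`, `StubBackend`, and `benchmark` are hypothetical names invented for this illustration, and the stub stands in for a real vendor runtime rather than calling one.

```python
import time
from typing import Protocol


class InferenceBackend(Protocol):
    """Hypothetical interface the application codes against, not a vendor SDK."""
    name: str

    def generate(self, prompt: str) -> str: ...


class StubBackend:
    """Stand-in for a real runtime; a production class would wrap a vendor stack."""

    def __init__(self, name: str, tag: str) -> None:
        self.name = name
        self.tag = tag

    def generate(self, prompt: str) -> str:
        # A real backend would run the model here; the stub just labels the prompt.
        return f"[{self.tag}] {prompt}"


def benchmark(backend: InferenceBackend, prompts: list[str]) -> dict[str, object]:
    """Run the same prompts through any backend and record wall-clock time."""
    start = time.perf_counter()
    outputs = [backend.generate(p) for p in prompts]
    elapsed = time.perf_counter() - start
    return {"backend": backend.name, "requests": len(outputs), "seconds": elapsed}


prompts = ["summarise this ticket", "draft a reply"]
backends = [StubBackend("vendor_a", "A"), StubBackend("vendor_b", "B")]

results = [benchmark(b, prompts) for b in backends]
for row in results:
    print(row)
```

The point of the design is that the application and the benchmark harness never name a vendor. Adding a second or third platform then means writing one new backend class, not rewriting the serving path.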

The India angle

For India, this reshuffling of the AI-hardware deck arrives at a useful moment. The country’s compute ambitions have been climbing the national agenda, with a public push around domestic GPU access and incentives for semiconductor design and fabrication. India is unlikely to out-spend the hyperscalers on bleeding-edge fabs in the near term, but the design layer, where RISC-V’s open architecture lowers the barrier to entry, is exactly the kind of opportunity that plays to the country’s deep bench of chip-design talent. A world where AI accelerators no longer require expensive proprietary architecture licences is a more accessible world for Indian design houses and startups.

Where can local startups plug in? Several places. First, in inference-optimisation and serving infrastructure — the software that squeezes more performance-per-dollar out of whatever silicon a company can get, which is valuable precisely because hardware remains scarce and expensive. Second, in building portable, multi-vendor stacks that help domestic enterprises avoid lock-in as they scale. Third, in the design and IP layer, where RISC-V opens a path to contribute to the architecture itself rather than just consuming finished chips. And fourth, in the application layer, where access to cheaper inference — whether from a domestic compute push or from global price competition — directly improves the economics of building India-first AI products.

The broader lesson is the same whether you are a frontier lab, a Bengaluru startup, or a marketing team running an internal model: the era of taking Nvidia as the only option is ending, slowly and unevenly. It will not collapse overnight, and CUDA’s moat is real. But every credible challenger — Qualcomm’s reported Tenstorrent bet, Microsoft’s Maia, AMD’s persistence, the RISC-V movement — adds optionality to a market that badly needed some. For the people paying the inference bills, optionality is the whole point.

Maya V

The Signal — one email, every Tuesday.