Together AI's $800M Raise Bets on Cheaper AI

The economics of running AI are quietly becoming as important as the models themselves. For every headline about a bigger, smarter system, there is a less glamorous question keeping founders and finance teams awake: what does it cost to actually serve this thing to users, at scale, every day? That question has created an opening — and a fast-growing class of infrastructure companies known as “neoclouds” is racing to fill it. Together AI just raised a very large sum to prove the bet is real.

The raise

Together AI raised $800 million at an $8.3 billion valuation to scale its open-model and neocloud infrastructure, according to coverage from The Neuron citing BusinessWire, TechCrunch and the New York Times. That marks a steep climb from a roughly $3.3 billion valuation in early 2025: about a 2.5x increase in about eighteen months, a pace that only makes sense in a market where demand for compute is running well ahead of supply.

What is striking is how the deal was framed. Rather than positioning itself as another frontier-lab wannabe, Together AI has leaned into a different pitch: cheaper AI. The company’s proposition is that teams building on open models shouldn’t have to pay hyperscaler prices — or accept hyperscaler lock-in — to run inference at scale. The raise is, in effect, a vote of confidence in that positioning. Investors are backing not a single model but a cost structure: the idea that a leaner, open-model-friendly stack can win business precisely because it is less expensive.

The capital is earmarked for the unglamorous but essential work of scaling infrastructure — securing GPUs, standing up capacity, and hardening the software layer that lets customers deploy, fine-tune and serve open models without building it all themselves. In a phase where the constraint is often raw availability of compute rather than demand for it, money buys capacity, and capacity buys market share.

Why neoclouds are winning

The neocloud thesis is disarmingly simple. Hyperscalers were built to run a sprawling menu of enterprise workloads — databases, storage, analytics, everything. Neoclouds are built for one thing: throwing GPUs at AI training and inference as efficiently as possible. Specialisation, the argument goes, translates into lower prices and more flexibility for the specific job most AI teams actually need done.

According to The Neuron, the round reflects strong demand for lower-cost, open-model-friendly compute as inference usage scales, positioning neoclouds as a price-competitive alternative to the big clouds. That framing matters because it names the real growth engine. Training a model is largely an upfront cost; inference, the work of serving the model to users, is a recurring one that grows with every new feature, every new customer, every additional query. As AI moves from demos to production, inference becomes the dominant line item. Winning on inference cost is therefore winning where the money is spent.

Open models are the other half of the story. A team building on a proprietary, closed API has limited leverage over where and how its workload runs. A team running open-weight models can, in principle, run them anywhere, which turns compute into a genuinely competitive market. Neoclouds court exactly these builders by offering hosting, fine-tuning and serving for popular open models, often with pricing and configuration options that hyperscalers, wedded to broader product portfolios, are slower to match. The value proposition is portability plus price: run the model you want, on infrastructure you can switch, at a rate you can defend to your CFO.

The risks

None of this is a free lunch, and the neocloud model carries real structural risks that a soaring valuation can obscure.

Margin pressure and capital intensity. Competing on price is a dangerous game when your core input — GPUs — is expensive and your business is fundamentally about reselling compute. Buying hardware and data-centre capacity is capital-intensive, and if pricing power erodes in a race to the bottom, thin margins can get thinner. The $800 million raise is partly a reflection of just how much cash it takes to stay in this game.
Competition from every direction. The hyperscalers are not standing still; they have deep pockets, existing enterprise relationships, and every incentive to defend AI workloads with their own optimised offerings and aggressive discounting. Meanwhile, a crowd of rival neoclouds is chasing the same open-model customers. Being the cheap option only works until someone else is cheaper — or until the incumbents decide to compete on price.
Dependence on GPU supply. A neocloud lives and dies by access to accelerators. Supply constraints, allocation politics and pricing set by a small number of chipmakers all sit upstream of the business model. A company whose entire pitch is cheaper compute is exposed if the cost or availability of that compute moves against it.

The honest read: the thesis is sound, but execution risk is high. A valuation that more than doubles in about eighteen months assumes a market that keeps expanding and a company that keeps winning share within it. Both are plausible. Neither is guaranteed.

The India read

For Indian AI builders, the rise of neoclouds is arguably more consequential than for their better-funded peers in the US. Cost discipline is not a nice-to-have here; it is often the difference between a product that ships and one that dies in the unit economics. Cheaper inference directly widens the set of AI features an Indian startup can afford to put in front of users — and, crucially, keep running as usage grows.

Open models fit this reality neatly. For teams operating on tight margins, an open-weight model served on lower-cost infrastructure can deliver most of the capability at a fraction of the recurring spend of a premium closed API. Pair that with fine-tuning on domain- or language-specific data — a genuine need in India’s multilingual market — and you get systems that are both cheaper and better suited to local use cases. The combination of open models and cost-competitive compute is close to tailor-made for the constraints Indian operators actually face.

The broader shift worth internalising is that the menu of compute options is widening. A few years ago, serving a model at scale effectively meant renting from one of a handful of hyperscalers. Now there is a growing tier of neoclouds competing on exactly the axis — inference cost — that matters most for production AI. For Indian founders and finance teams, that means leverage: the ability to benchmark providers, negotiate, and architect for portability rather than lock-in.

A few practical takeaways for teams watching this space:

Treat inference cost as a first-class metric. Model it per-query and per-user before you scale, not after.
Design for portability. Favouring open models and avoiding deep provider-specific dependencies keeps your negotiating leverage intact.
Shop the widening market. Neoclouds like Together AI are one option among a growing set; benchmark on cost, latency and reliability for your actual workload rather than the headline price.
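
For teams that want to put numbers on the first takeaway, a minimal sketch of per-query and per-user cost modelling might look like the following. It assumes per-token pricing, which is one common way inference is billed but not the only one. Every name here (Workload, Pricing, "provider_a", "provider_b") and every rate and traffic figure is a hypothetical placeholder, not a real price list from Together AI or anyone else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical traffic profile for one product feature."""
    input_tokens_per_query: int
    output_tokens_per_query: int
    queries_per_user_per_month: float


@dataclass(frozen=True)
class Pricing:
    """Placeholder per-token rates; not any real provider's price list."""
    name: str
    usd_per_million_input_tokens: float
    usd_per_million_output_tokens: float


def cost_per_query(w: Workload, p: Pricing) -> float:
    """USD cost of serving a single query under per-token pricing."""
    input_cost = w.input_tokens_per_query * p.usd_per_million_input_tokens
    output_cost = w.output_tokens_per_query * p.usd_per_million_output_tokens
    return (input_cost + output_cost) / 1_000_000


def cost_per_user_month(w: Workload, p: Pricing) -> float:
    """USD cost of serving one user for a month."""
    return cost_per_query(w, p) * w.queries_per_user_per_month


if __name__ == "__main__":
    # Illustrative numbers only: replace with your measured token counts
    # and the rates each provider actually quotes you.
    workload = Workload(
        input_tokens_per_query=1_000,
        output_tokens_per_query=500,
        queries_per_user_per_month=200,
    )
    candidates = [
        Pricing("provider_a", 0.50, 1.50),
        Pricing("provider_b", 3.00, 15.00),
    ]
    for p in candidates:
        print(
            f"{p.name}: "
            f"${cost_per_query(workload, p):.5f} per query, "
            f"${cost_per_user_month(workload, p):.2f} per user per month"
        )
```

Comparing the per-user figure against revenue per user is a quick way to see whether a feature survives its own unit economics. Cost is only one axis, though: latency and reliability still have to be measured on the actual workload.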

Together AI’s raise is a bet that cheaper, open-model-friendly compute is not a temporary arbitrage but a durable business. For teams hunting lower inference costs — in Bengaluru as much as in San Francisco — the more important signal may be simpler: the era of taking whatever the big clouds charged is over, and competition is finally arriving where it counts.

Charlotte Evans
