The AI memory shortage just rewrote the cloud cost-optimisation playbook
DRAM and NAND just had their worst quarters on record, and none of the standard cloud cost levers touch the actual cause.
Cloud cost optimisation runs on a standard playbook: rightsize instances, buy reserved capacity, autoscale to match demand, push stateful workloads onto managed caches instead of overprovisioned memory. It has worked because the underlying assumption held — the unit cost of memory stayed roughly flat, and any volatility was something procurement could smooth over with a longer contract.
That assumption broke in 2026. The AI memory shortage is no longer a forecast — it is sitting in TrendForce's quarterly contract price data, in Samsung's own statements about its product line, and in AWS's last earnings call. None of the standard cost levers touch the actual cause, because the cause is not a usage pattern. It is a global reallocation of fabrication capacity, away from the DRAM and NAND ordinary servers run on, and toward the high-bandwidth memory AI accelerators need.
The numbers, plainly
Start with what changed, because the scale matters more than the direction. TrendForce's most recent industry forecast, published 31 March 2026, projects DRAM contract prices rising 58 to 63 per cent quarter-on-quarter in Q2 2026, with NAND Flash contract prices rising 70 to 75 per cent over the same period. Those are large numbers, and they are not even the worst quarter on record. That distinction belongs to Q1 2026, when DRAM and NAND contract prices both rose by roughly 95 per cent quarter-on-quarter, according to TrendForce data reported by Tom's Hardware: the largest quarterly increase logged for either product category.
Samsung's numbers, separate from TrendForce's industry-wide figures, tell the same story from the supplier side. The company's head of global marketing, Wonjin Lee, told Network World in January that a 32GB DDR5 module rose from $149 to $239 within a single month, September 2025, a 60 per cent jump, and that DDR5 contract pricing had more than doubled across roughly the same window, from about $7 to $19.50 per unit (an increase of close to 180 per cent). On top of that, Samsung was projecting a further 30 per cent increase for Q4 2025 and another 20 per cent in early 2026.
| Period | Metric | Move | Source |
|---|---|---|---|
| Sept 2025 | 32GB DDR5 module, contract price | $149 to $239 (+60%) | Samsung, via Network World |
| Sept 2025 to Jan 2026 | DDR5 contract price per unit | ~$7 to $19.50 (~+180%) | Samsung, via Network World |
| Q1 2026 | DRAM contract price, QoQ | +~95% | TrendForce, via Tom’s Hardware |
| Q1 2026 | NAND contract price, QoQ | +~95% | TrendForce, via Tom’s Hardware |
| Q2 2026 (forecast) | DRAM contract price, QoQ | +58% to 63% | TrendForce |
| Q2 2026 (forecast) | NAND contract price, QoQ | +70% to 75% | TrendForce |
Read the table row by row rather than as one trend line: the rows come from two independent sources, a supplier (Samsung) and an industry analyst (TrendForce), they measure different things over different periods, and they all point in the same direction at a pace with no recent precedent in either DRAM or NAND. The product mix matters too. TrendForce's analysts note that high-capacity server RDIMMs are the specific item North American cloud providers are chasing, because that is the form factor AI inference workloads actually consume, and suppliers are prioritising it over consumer parts because the margins are better.
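Quarter-on-quarter percentages compound rather than add, which is easy to lose when reading the rows one at a time. A minimal Python sketch of the arithmetic follows. It assumes the Q2 2026 forecast range applies on top of the roughly 95 per cent Q1 rise, which is a simplification of how the two TrendForce figures chain together, and the function names are illustrative only.

```python
def pct_change(old: float, new: float) -> float:
    """Percentage change from old to new."""
    return (new - old) / old * 100.0


def compound(*pct_moves: float) -> float:
    """Cumulative percentage change after successive percentage moves."""
    factor = 1.0
    for move in pct_moves:
        factor *= 1.0 + move / 100.0
    return (factor - 1.0) * 100.0


# Samsung figures quoted above.
module_move = pct_change(149, 239)      # 32GB DDR5 module
contract_move = pct_change(7.0, 19.50)  # DDR5 contract price per unit

# Assumption: the Q2 2026 forecast range is applied to the price level
# reached after the ~95% Q1 2026 rise.
dram_range = (compound(95, 58), compound(95, 63))
nand_range = (compound(95, 70), compound(95, 75))

print(f"32GB module: +{module_move:.0f}%")
print(f"DDR5 contract: +{contract_move:.0f}%")
print(f"DRAM, two quarters: +{dram_range[0]:.0f}% to +{dram_range[1]:.0f}%")
print(f"NAND, two quarters: +{nand_range[0]:.0f}% to +{nand_range[1]:.0f}%")
```

Under that assumption, two consecutive quarters would leave DRAM contract prices at a little over three times their end-2025 level, and NAND slightly higher still.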
Why the AI memory shortage is not a normal pricing cycle
Memory has always been cyclical. Oversupply pushes prices down, manufacturers cut capacity, demand catches up, prices spike, manufacturers add capacity, and the cycle repeats. Anyone who bought a hard drive after the 2011 Thailand floods, or RAM after a fire at a fab, has seen this before. What is different this time is that the supply response is not really a supply response. It is a reallocation.
High-bandwidth memory, the stacked DRAM that sits next to AI accelerators, needs roughly three times the wafer capacity per gigabyte that standard DRAM does. Every wafer a fab commits to HBM is a wafer it is not using to make the DDR5 modules that go into an ordinary server. Micron has already exited the consumer memory segment entirely, redirecting that capacity towards higher-margin server and AI products. Samsung’s new fab capacity, meanwhile, is not slated to reach mass production until 2028. There is no quarter in between where a glut shows up and corrects this.
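The wafer trade-off can be put in rough numbers. The sketch below takes the three-to-one wafer cost per gigabyte from the paragraph above and assumes a fixed total wafer supply with no yield differences, both simplifications. The 30 per cent share in the example is hypothetical, not an industry figure.

```python
HBM_WAFER_RATIO = 3.0  # wafers per GB of HBM relative to standard DRAM (from the text)


def output_after_shift(hbm_share: float, ratio: float = HBM_WAFER_RATIO) -> dict:
    """Relative gigabyte output when a share of wafers moves to HBM.

    Baseline: every wafer makes standard DRAM and total output is 1.0.
    Assumes total wafer capacity stays fixed (a simplification).
    """
    if not 0.0 <= hbm_share <= 1.0:
        raise ValueError("hbm_share must be between 0 and 1")
    standard = 1.0 - hbm_share   # gigabytes of standard DRAM still produced
    hbm = hbm_share / ratio      # gigabytes of HBM from the diverted wafers
    return {"standard_dram": standard, "hbm": hbm, "total": standard + hbm}


# Hypothetical example: 30% of wafers diverted to HBM.
result = output_after_shift(0.30)
for name, value in result.items():
    print(f"{name}: {value:.2f}x of baseline gigabytes")
```

In this illustration, standard DRAM output falls by the full diverted share while the HBM produced replaces only a third of those gigabytes, so total gigabytes shipped shrink even though no fab has gone idle.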
“In 2026, there's going to be issues around semiconductor supplies, and it's going to affect everyone, not just Samsung.”
That is a notable admission from Samsung, one of the firms best placed to benefit from higher memory prices. It is not a competitor's complaint. It is a supplier conceding that the shortage is structural, not a normal demand spike that the market will absorb in a quarter or two.
The cloud-migration pitch, and the scepticism it is getting
The shortage has produced a fast-moving secondary narrative: that owning servers is now a worse deal than renting them, because hyperscalers can buy at a scale and price individual enterprises cannot match. AWS CEO Andy Jassy leaned into exactly that framing on the company's Q1 2026 earnings call, reported by The Register, saying memory costs have “skyrocketed” and that “there is just not enough capacity for the amount of demand”, while claiming AWS itself has “a lot more supply than what others have”. He described cloud-migration conversations as accelerating rapidly.
Other voices in the channel back that up, at least directionally. Peter FitzGibbon, an SVP at Insight Enterprises, told The Register that clients are accelerating data centre exits, citing the chip shortage as the trigger. Gartner VP Tony Harvey has drawn a similar line between rising server prices, longer shipment delays, and renewed enterprise interest in cloud.
Where this actually lands on a cloud bill
Strip away the vendor framing and the shortage shows up in three concrete places on an infrastructure budget, not just as a vague sense that things cost more.
- Memory-optimised instance pricing. Cloud providers’ own procurement costs for server DRAM are rising, and that eventually reaches list price for memory-heavy instance families, even where compute pricing stays flat.
- Renewals on existing commitments. Reserved Instances and Savings Plans signed in 2024 or 2025 were priced against a memory-cost assumption that no longer holds. The discount on the existing term is unaffected, but the renewal quote a year from now will reflect the new reality.
- In-memory infrastructure, managed or self-hosted. Anything backed by Redis or Memcached scales by adding RAM, not CPU. A workload's working set is a function of the data, not the request rate, so the underlying memory cost is far more exposed to this shock than a typical compute-bound service.
That third category is easy to miss because it never shows up as a line item called "memory". It shows up as the bill for the managed caching service, or the node count on a self-hosted cluster, both priced, directly or indirectly, off the same DRAM whose contract price has just posted its largest quarterly rise on record, with another steep rise forecast.
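The renewal exposure in the second bullet can be turned into a rough scenario range. The sketch below is a hedged illustration, not a pricing model: the memory share of an instance's price, the fraction of the provider's cost increase passed through to customers, and every number in the example are assumptions that would need replacing with real quotes.

```python
def renewal_quote(current_annual: float, memory_share: float,
                  memory_price_change: float, pass_through: float) -> float:
    """Estimated renewal cost for one memory price scenario.

    memory_share: fraction of the commitment's price attributable to memory (assumed).
    memory_price_change: fractional change in memory cost, e.g. 0.95 for +95%.
    pass_through: fraction of the provider's cost increase passed on (assumed).
    """
    uplift = memory_share * memory_price_change * pass_through
    return current_annual * (1.0 + uplift)


def renewal_band(current_annual: float, memory_share: float,
                 price_changes: list[float], pass_through: float) -> tuple[float, float]:
    """Lowest and highest renewal quote across a set of memory price scenarios."""
    quotes = [
        renewal_quote(current_annual, memory_share, change, pass_through)
        for change in price_changes
    ]
    return min(quotes), max(quotes)


# Hypothetical commitment: $100,000 a year, 40% of price driven by memory,
# half of any memory cost increase passed through, three price scenarios.
low, high = renewal_band(100_000, 0.40, [0.0, 0.5, 1.0], 0.5)
print(f"Renewal range: ${low:,.0f} to ${high:,.0f}")
```

The point of the band is that the renewal becomes a range to budget against rather than a single number carried over from the original term.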
Why rightsizing and reserved instances do not fix this
The standard cost-optimisation playbook was built to fix waste, not to fix a unit-price shock, and the difference matters more here than usual.
| Lever | What it actually optimises | Effective here? |
|---|---|---|
| Rightsizing | Removes over-provisioned instances | No — fixes waste, not unit price |
| Reserved Instances / Savings Plans | Locks a discount for the term of an existing commitment | Partial — protects the current term, not renewals |
| Autoscaling | Matches compute capacity to traffic | No — memory-bound workloads scale with data size |
| Spot / preemptible instances | Cuts compute cost for fault-tolerant jobs | No — rarely used for stateful, memory-bound workloads |
| Multi-cloud arbitrage | Shops pricing across providers | Limited — providers source memory from the same supply chain |
Autoscaling is the clearest case. It was designed to fix a mismatch between provisioned capacity and request volume: scale out for the traffic spike, scale back in afterwards. A Redis cluster holding a working set of cached data does not have that mismatch. It needs enough memory to hold the data regardless of how many requests are arriving, so there is nothing for an autoscaler to scale down without evicting data the application still needs.
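The difference shows up as soon as the two sizing rules are written down. This is a simplified sketch: it ignores replicas, per-node throughput limits on the cache, and memory fragmentation, and the capacities and traffic levels are hypothetical.

```python
import math


def compute_nodes(requests_per_sec: float, rps_per_node: float) -> int:
    """Nodes needed by a compute-bound service: driven by request rate."""
    return max(1, math.ceil(requests_per_sec / rps_per_node))


def cache_nodes(working_set_gib: float, usable_gib_per_node: float) -> int:
    """Nodes needed by an in-memory cache: driven by data size, not traffic."""
    return max(1, math.ceil(working_set_gib / usable_gib_per_node))


# Hypothetical figures: 1,000 req/s per compute node, a 600 GiB working set,
# and 50 GiB of usable memory per cache node.
for rps in (500, 5_000, 50_000):
    print(f"{rps:>6} req/s -> compute nodes: {compute_nodes(rps, 1_000):>2}, "
          f"cache nodes: {cache_nodes(600, 50)}")
```

The compute tier moves with traffic, so an autoscaler has something to reclaim when traffic drops. The cache tier's node count does not change across the three traffic levels, because request rate is not an input to its sizing rule.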
What an infrastructure team can actually do about it
None of this means infrastructure teams are powerless, only that the response has to target the actual cost driver rather than the usual proxies for it.
- Separate memory-bound workloads from compute-bound ones in the cost model. A CPU-bound service responds normally to rightsizing and autoscaling. A cache or in-memory database does not, and lumping the two together hides the real exposure.
- Shrink the working set before adding nodes. Tighter TTLs, more selective caching, and smaller serialised representations all reduce the memory a Redis or Memcached cluster needs to hold, which is a cheaper lever to pull than buying more RAM at 2026 prices.
- Reconsider tiered storage for data that is warm rather than hot. Data accessed occasionally, rather than on every request, can often move from memory to a fast NVMe-backed store at a fraction of the per-gigabyte cost, even allowing for the latency trade-off.
- Re-price multi-year commitments with a wider volatility band. A reserved-capacity or TCO model that treats memory as a flat input is now wrong by construction. Build the renewal scenario at current contract prices, not the prices from when the commitment was first signed.
- Weigh hardware-lifecycle extension against migration, rather than assuming migration wins by default. Meta's choice to run servers for seven years instead of six is a reminder that renting from a hyperscaler is not the only rational response to a supply shock. It depends on what is already owned and how it is depreciating.
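Two of the levers above, shrinking the working set and tiering warm data, lend themselves to a quick back-of-envelope comparison. The sketch below assumes the two reductions are independent and uses hypothetical per-GiB monthly prices for RAM and NVMe. None of the figures are real quotes, and the latency cost of tiering is not modelled.

```python
def shrunk_working_set(total_gib: float, value_size_reduction: float,
                       live_key_reduction: float) -> float:
    """Working set after smaller serialised values and fewer live keys.

    Both reductions are fractions between 0 and 1, assumed independent.
    """
    return total_gib * (1.0 - value_size_reduction) * (1.0 - live_key_reduction)


def tiered_monthly_cost(total_gib: float, warm_fraction: float,
                        ram_price_per_gib: float, nvme_price_per_gib: float) -> float:
    """Monthly cost when the warm fraction of the data moves from RAM to NVMe."""
    warm_gib = total_gib * warm_fraction
    hot_gib = total_gib - warm_gib
    return hot_gib * ram_price_per_gib + warm_gib * nvme_price_per_gib


# Hypothetical prices per GiB per month.
RAM_PRICE = 4.00
NVME_PRICE = 0.50

baseline = tiered_monthly_cost(400, 0.0, RAM_PRICE, NVME_PRICE)

# Assumed: 25% smaller serialised values, and tighter TTLs removing 20% of live keys.
smaller_gib = shrunk_working_set(400, 0.25, 0.20)

# Assumed: half of what remains is warm rather than hot.
optimised = tiered_monthly_cost(smaller_gib, 0.5, RAM_PRICE, NVME_PRICE)

print(f"Baseline: {400} GiB in RAM, ${baseline:,.0f}/month")
print(f"Optimised: {smaller_gib:.0f} GiB, ${optimised:,.0f}/month")
```

With these illustrative inputs, the working set drops from 400 GiB to about 240 GiB before any data is moved, and tiering then removes half of the remaining RAM footprint. Both steps act on the quantity of memory needed, which is the only lever that offsets a unit-price rise.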
The open question
How long this lasts is the part nobody actually knows. TrendForce's published forecast only runs through Q2 2026. Samsung's own timeline for new fab capacity points to 2028 before supply meaningfully loosens. Two years is a long time to run a cost model on the assumption that an input price will eventually behave itself.
The more useful signal is not TrendForce's number or Samsung's roadmap. It is that AWS and Meta, two of the most sophisticated infrastructure operators on the planet, are responding to the identical shortage in opposite ways: one pitching migration, the other extending the life of hardware it already owns. When the best-resourced players in the industry do not agree on the right response, that is a sign the shortage is genuinely unresolved, not that one of them has missed something obvious. Treat memory cost as a volatile, externally driven input for the next two years, and build the cost model accordingly.