Use case · LLM response cache
An LLM cache should save you money, not add a second meter.
Caching model responses is often the highest-value cache you run: every hit skips a slow, paid API call. So you want a high hit rate and a generous cache, which means lots of lookups over larger payloads. On a per-command bill that becomes a tax on the exact behaviour you are trying to encourage. For a steady, high-lookup cache, flat-rate managed Valkey is cheaper and more predictable. Here is the monthly math, computed from source-checked rates.
The shape of it
An LLM cache at 50 lookups/s over 4 GiB (~259M commands/mo) runs about $519/mo on Upstash pay-as-you-go versus $29.88/mo on Steada's flat-rate target — and that is before counting the model spend each cache hit already saved you.
Worked monthly cost (2 ops/request, 4 GiB)
| Lookup rate | Commands/mo | Upstash PAYG | Steada target | Difference |
|---|---|---|---|---|
| 10 lookups/s | 52M | $105 | $24.70 | $79.98/mo lower |
| 50 lookups/s | 259M | $519 | $29.88 | $490/mo lower |
| 200 lookups/s | 1037M | $2,075 | $49.32 | $2,025/mo lower |
Steada figures are a controlled-beta target, not a public offer. The table assumes 2 Redis ops per cache lookup (GET on read, SET on miss), 4 GiB of cached completions, and a 30-day month. Upstash: $0.20 per 100,000 commands + $0.25/GiB. Steada target: $19.00/mo + $0.03 per 1,000,000 commands + $1.10/GiB. Rates source-checked 2026-06-01. Model your own lookup rate in the calculator.
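The table's arithmetic can be reproduced with a short sketch. The function names are hypothetical, and the 30-day month is an assumption chosen because it matches the ~259M commands/mo figure. One caveat: the Steada column only reproduces with a per-command rate of $0.025 per million, which the footnote appears to show rounded to $0.03. That unrounded rate is inferred from the table, not a published number.

```python
# Assumption: a 30-day month, which yields the ~259M commands/mo shown in the table.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def monthly_commands(lookups_per_s, ops_per_lookup=2):
    """Commands per month at a steady lookup rate (GET on read, SET on miss)."""
    return lookups_per_s * ops_per_lookup * SECONDS_PER_MONTH


def upstash_payg(commands, gib):
    """Upstash pay-as-you-go: $0.20 per 100,000 commands plus $0.25 per GiB."""
    return commands / 100_000 * 0.20 + gib * 0.25


def steada_target(commands, gib, usd_per_million=0.025):
    """Steada target: $19.00 base, a per-command rate, and $1.10 per GiB.

    usd_per_million=0.025 is inferred from the table (displayed as $0.03);
    pass 0.03 to use the rounded rate from the footnote instead.
    """
    return 19.00 + commands / 1_000_000 * usd_per_million + gib * 1.10


for rate in (10, 50, 200):
    commands = monthly_commands(rate)
    upstash = upstash_payg(commands, gib=4)
    steada = steada_target(commands, gib=4)
    print(f"{rate:>3} lookups/s  {commands / 1e6:7.0f}M  "
          f"${upstash:,.2f}  ${steada:,.2f}  ${upstash - steada:,.2f} lower")
```

Changing `ops_per_lookup` or `gib` shows how sensitive each bill is to your own workload; the per-command term dominates the Upstash side at every rate in the table.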
The cache is supposed to save the expensive call
A response cache exists to avoid the model call: the slow, metered, billed-per-token part. The cache layer underneath it should be the cheap, boring part of the stack. When that layer is itself billed per command, you have re-introduced a meter on the path you built specifically to take off the meter. Worse, the usual ways of raising your hit rate (adding semantic lookups, widening the key space, caching more aggressively) all increase lookups, so the better your cache works, the more the per-command layer charges. A flat tier removes that incentive conflict: cache as hard as you like at a fixed price.
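To make the incentive conflict concrete, here is a rough sketch of how the per-command charge moves when a cache gets better. The hit rates and the number of candidate lookups per request are made-up illustrative values, the function names are hypothetical, and storage is left out. The only sourced number is the $0.20 per 100,000 commands rate.

```python
# Assumption: a 30-day month, as in the worked table.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def commands_per_request(candidate_lookups, hit_rate):
    """One GET per candidate key, plus one SET whenever the request misses."""
    return candidate_lookups + (1.0 - hit_rate)


def per_command_bill(requests_per_s, candidate_lookups, hit_rate,
                     usd_per_100k=0.20):
    """Monthly command charge only (no storage) on a per-command meter."""
    per_request = commands_per_request(candidate_lookups, hit_rate)
    commands = requests_per_s * per_request * SECONDS_PER_MONTH
    return commands / 100_000 * usd_per_100k


# Hypothetical exact-match cache: 1 lookup per request, 50% hit rate.
exact = per_command_bill(50, candidate_lookups=1, hit_rate=0.5)

# Hypothetical semantic cache: 4 candidate lookups per request, 80% hit rate.
semantic = per_command_bill(50, candidate_lookups=4, hit_rate=0.8)

print(f"exact-match: ${exact:,.2f}/mo   semantic: ${semantic:,.2f}/mo")
```

In this illustration the hit rate rises from 50% to 80%, so fewer model calls are made, yet the command charge grows by roughly 2.8x because each request now issues more GETs.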
Frequently asked questions
- Why use Redis or Valkey to cache LLM responses?
- A response cache keys on the prompt (or its hash/embedding) and returns a stored completion on a hit, so you skip a slow, paid model call entirely. Every cache hit saves real model spend and shaves seconds off latency — which is why the cache layer itself should be cheap and predictable, not a second meter that grows with traffic.
- How much does an LLM response cache cost per month?
- It scales with how often you check the cache, and cached completions are large. At 50 lookups/second and ~2 ops per request over 4 GiB, that is roughly 259M commands a month. On Upstash pay-as-you-go that runs about $519; on Steada's flat-rate target it is about $29.88 for the same workload. Model your own lookup rate in the calculator.
- Why does per-command pricing hurt an LLM cache specifically?
- A response cache is double-taxed on a per-command meter: you pay per cache lookup and for storing the larger completion payloads. Semantic caching makes it worse, because a single request can fan out to several candidate lookups. The whole point of the cache is to drive your hit rate up — but a per-command bill punishes exactly the high lookup volume you are trying to encourage.
- Can I move an LLM cache to Valkey without rewriting it?
- Usually yes. Caching libraries built on standard Redis clients (including LangChain’s Redis cache, LiteLLM’s cache backend, and plain redis-py / ioredis GET/SET) work against Valkey unchanged because it preserves the RESP wire protocol and the Redis 7.2 command set. Point the client at the new TLS endpoint and confirm your TTL and eviction settings. Code written against the Upstash REST API must first switch to a standard TCP client.
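The plain GET/SET pattern described in the answers above can be sketched with redis-py as follows. This is a minimal illustration, not a drop-in library: the `llmcache:` key prefix, the one-hour TTL, the helper names, and the endpoint URL in the usage comment are all assumptions. The `call_model` argument stands in for whatever function makes your paid model call.

```python
import hashlib


def cache_key(model, prompt):
    """Key on a hash of the model name and prompt (hypothetical key scheme)."""
    digest = hashlib.sha256(f"{model}\n{prompt}".encode("utf-8")).hexdigest()
    return f"llmcache:{digest}"


def connect(url):
    """Open a standard RESP client; a rediss:// URL asks redis-py to use TLS."""
    import redis  # imported lazily so the helpers work with any compatible client
    return redis.Redis.from_url(url, decode_responses=True)


def cached_completion(client, model, prompt, call_model, ttl_seconds=3600):
    """Return a stored completion on a hit; otherwise call the model and store it."""
    key = cache_key(model, prompt)
    hit = client.get(key)                          # one GET on every lookup
    if hit is not None:
        return hit
    completion = call_model(prompt)                # the slow, paid call
    client.set(key, completion, ex=ttl_seconds)    # one SET on a miss, with a TTL
    return completion


# Usage (placeholder endpoint; substitute your own host and credentials):
# client = connect("rediss://default:PASSWORD@your-endpoint.example:6379/0")
# text = cached_completion(client, "my-model", "Summarise RESP.", my_model_call)
```

Because the helper only relies on `get` and `set` with an `ex` expiry, moving between Redis-compatible endpoints should only require changing the URL passed to `connect`; eviction policy is a server-side setting and still needs checking separately.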
Related & sources
See the full Upstash cost crossover, the Valkey vs Redis explainer, or the other worked workloads: rate limiting · session store.
Per-command and storage rates read from Upstash pricing; client compatibility per the Valkey project. Checked 2026-06-01.