Why use Redis or Valkey to cache LLM responses?

A response cache keys on the prompt (or its hash/embedding) and returns a stored completion on a hit, so you skip a slow, paid model call entirely. Every cache hit saves real model spend and shaves seconds off latency — which is why the cache layer itself should be cheap and predictable, not a second meter that grows with traffic.
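The lookup described above can be sketched as a minimal cache-aside function in Python. It assumes a client with redis-py's `get(name)` and `set(name, value, ex=seconds)` shape. In production you would pass a `redis.Redis` (or other Valkey-compatible) client. `InMemoryClient`, the `llm:resp:` key prefix, the one-hour TTL, and the `call_model` callback are hypothetical stand-ins. Keying on a SHA-256 of model name plus prompt is only one option; semantic caches key on embeddings instead.

```python
import hashlib
import json
from typing import Callable, Optional


def cache_key(model: str, prompt: str) -> str:
    """Hash model + prompt so arbitrarily long prompts map to short, fixed-length keys."""
    payload = json.dumps([model, prompt]).encode("utf-8")
    return "llm:resp:" + hashlib.sha256(payload).hexdigest()  # hypothetical key prefix


def cached_completion(client, model: str, prompt: str,
                      call_model: Callable[[str, str], str],
                      ttl_seconds: int = 3600) -> str:
    """Cache-aside: return a stored completion on a hit, else call the model and store it."""
    key = cache_key(model, prompt)
    hit = client.get(key)  # one command per lookup
    if hit is not None:
        # redis-py returns bytes unless the client was created with decode_responses=True
        return hit.decode("utf-8") if isinstance(hit, bytes) else hit
    completion = call_model(model, prompt)  # the slow, paid call the cache exists to avoid
    client.set(key, completion.encode("utf-8"), ex=ttl_seconds)  # one more command on a miss
    return completion


class InMemoryClient:
    """Hypothetical stand-in with redis-py's get/set(ex=) shape; it ignores TTLs."""

    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, name: str):
        return self._data.get(name)

    def set(self, name: str, value, ex: Optional[int] = None) -> bool:
        self._data[name] = value
        return True


if __name__ == "__main__":
    calls = []

    def fake_model(model: str, prompt: str) -> str:
        calls.append(prompt)
        return f"answer to: {prompt}"

    client = InMemoryClient()
    cached_completion(client, "example-model", "What is Valkey?", fake_model)
    cached_completion(client, "example-model", "What is Valkey?", fake_model)
    print(len(calls))  # prints 1: the second request is served from the cache
```

A hit costs one command (the GET) and a miss costs two (GET plus SET), which is roughly where a "~2 ops per request" estimate comes from.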

How much does an LLM response cache cost per month?

Cost scales with how often you check the cache and with how much you store, and cached completions are large. At 50 lookups/second, about 2 commands per request, and 4 GiB of stored completions, that is roughly 259M commands a month (50 × 2 × 2,592,000 seconds in a 30-day month). On Upstash pay-as-you-go that runs about $519; on Steada's flat-rate target it is about $29.88 for the same workload. Model your own lookup rate in the calculator.
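The command count and the Upstash figure can be reproduced with a few lines of arithmetic. This sketch assumes a 30-day month and Upstash pay-as-you-go rates of $0.20 per 100K commands plus $0.25 per GB of storage. Those rates are an assumption about Upstash's published pricing, so check them before relying on the result. It also treats 4 GiB as 4 GB. Steada's pricing is not modeled here.

```python
# Assumptions (not from the source): 30-day month, $0.20 per 100K commands,
# $0.25 per GB stored, and 4 GiB treated as 4 GB.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000


def monthly_commands(lookups_per_sec: float, ops_per_request: float = 2) -> float:
    """Commands billed in a month: lookups/s x commands per request x seconds."""
    return lookups_per_sec * ops_per_request * SECONDS_PER_MONTH


def upstash_payg_estimate(commands: float, storage_gb: float = 4,
                          usd_per_100k_commands: float = 0.20,
                          usd_per_gb: float = 0.25) -> float:
    """Pay-as-you-go estimate: per-command charge plus a storage charge."""
    return commands / 100_000 * usd_per_100k_commands + storage_gb * usd_per_gb


if __name__ == "__main__":
    for rate in (10, 50, 200):
        cmds = monthly_commands(rate)
        cost = upstash_payg_estimate(cmds)
        print(f"{rate:>3} lookups/s: {cmds / 1e6:,.0f}M commands, about ${cost:,.0f}/mo")
```

Under these assumptions the script gives about 52M commands and $105, 259M and $519, and 1,037M and $2,075 for the three rates, matching the worked table. Storage contributes only about $1 of each total, so at these volumes the per-command charge is essentially the whole bill.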

Why does per-command pricing hurt an LLM cache specifically?

On usage-based pricing, a response cache is billed twice: once for every cache lookup, and again for storing the large completion payloads. Semantic caching makes it worse, because a single request can fan out to several candidate lookups. The point of a cache is to route as much traffic through it as possible and drive the hit rate up. A per-command bill charges for every one of those lookups, hits and misses alike, so it punishes exactly the volume you are trying to encourage.
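To see why fan-out matters on a per-command meter, here is a deliberately naive semantic lookup. It issues one GET per candidate entry and counts every command. `CountingStore`, the `sem:` candidate keys, and the 0.95 similarity threshold are hypothetical. A real semantic cache finds its candidates through some nearest-neighbour index, and the exact command count depends on how it does that.

```python
import math
from typing import Optional


class CountingStore:
    """Hypothetical in-memory stand-in for a Redis/Valkey client that counts commands."""

    def __init__(self) -> None:
        self.data: dict = {}
        self.commands = 0

    def get(self, key: str):
        self.commands += 1  # every read is one billable command
        return self.data.get(key)


def cosine(a: list, b: list) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def semantic_lookup(store: CountingStore, query_vec: list, candidate_keys: list,
                    threshold: float = 0.95) -> Optional[str]:
    """Naive semantic lookup: fetch candidates one by one until one is similar enough."""
    for key in candidate_keys:
        entry = store.get(key)  # one GET per candidate: this is the fan-out
        if entry is None:
            continue
        vec, completion = entry
        if cosine(query_vec, vec) >= threshold:
            return completion
    return None  # a miss still paid for every candidate it checked


if __name__ == "__main__":
    store = CountingStore()
    # Seed entries directly so seeding is not counted as commands.
    store.data["sem:a"] = ([1.0, 0.0], "answer A")
    store.data["sem:b"] = ([0.0, 1.0], "answer B")
    store.data["sem:c"] = ([0.7, 0.7], "answer C")
    result = semantic_lookup(store, [0.71, 0.69], ["sem:a", "sem:b", "sem:c"])
    print(result, store.commands)  # prints: answer C 3
```

Here a hit found on the third candidate cost three commands, and a miss over the same candidates would also cost three, before any SET to store a fresh completion.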

Can I move an LLM cache to Valkey without rewriting it?

Usually yes. Caching libraries built on standard Redis clients work against Valkey unchanged, because Valkey speaks the same RESP protocol and supports the Redis 7.2 command set. This includes LangChain’s Redis cache, LiteLLM’s cache backend, and plain redis-py / ioredis GET/SET code. Point the client at the new TLS endpoint and confirm your TTL and eviction settings. Code written against the Upstash REST API has to be switched to a standard TCP Redis client before it can move.
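After repointing the client, a quick smoke test can confirm that writes, reads, and TTLs behave as expected on the new endpoint. The sketch accepts any client with redis-py's method names (`set(..., ex=)`, `get`, `ttl`, `delete`, `config_get`). One example is a client created with `redis.Redis.from_url("rediss://...")`, where the `rediss://` scheme enables TLS. The `migration-check:` key prefix is hypothetical. Some managed services restrict `CONFIG`, so the eviction-policy check is best-effort and returns `None` when it is refused.

```python
import uuid


def smoke_test_cache(client, ttl_seconds: int = 60) -> dict:
    """Write a throwaway key with a TTL, read it back, and report what the server says.

    Example (placeholders, not a real endpoint):
        smoke_test_cache(redis.Redis.from_url("rediss://<user>:<password>@<host>:<port>"))
    """
    key = f"migration-check:{uuid.uuid4().hex}"  # hypothetical key prefix, unique per run
    client.set(key, b"ok", ex=ttl_seconds)
    value = client.get(key)
    remaining = client.ttl(key)  # seconds left; -1 means no expiry, -2 means key missing
    client.delete(key)
    try:
        policy = client.config_get("maxmemory-policy").get("maxmemory-policy")
    except Exception:  # e.g. a managed service that disables CONFIG
        policy = None
    return {
        "round_trip_ok": value in (b"ok", "ok"),  # handles decode_responses on or off
        "ttl_applied": 0 < remaining <= ttl_seconds,
        "maxmemory_policy": policy,
    }
```

If `ttl_applied` comes back false, or the eviction policy differs from what your cache relied on before, fix the configuration before sending production traffic to the new endpoint.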

Worked monthly cost (2 ops/request, 4 GiB)

Lookup rate	Commands/month	Upstash PAYG	Steada target	Steada vs. Upstash
10 lookups/s	52M	$105	$24.70	$79.98/mo lower
50 lookups/s	259M	$519	$29.88	$490/mo lower
200 lookups/s	1,037M	$2,075	$49.32	$2,025/mo lower

Differences are computed from unrounded Upstash figures, so they can differ slightly from subtracting the rounded columns.

An LLM cache should save you money, not add a second meter.
