Language & LLMs

What Is a KV Cache?

A KV cache stores the key and value vectors computed for previous tokens during autoregressive generation. By reusing these cached values, the model avoids recomputing attention for earlier tokens at each step. This significantly speeds up text generation but increases memory use for long contexts.

What Is a KV Cache?

Related topics

Further reading