Language & LLMs

What Is a KV Cache?

A KV cache stores the key and value vectors computed for previous tokens during autoregressive generation. By reusing these cached values, the model avoids recomputing attention for earlier tokens at each step. This significantly speeds up text generation but increases memory use for long contexts.

Further reading

Read more about KV cache — articles and blogs from around the web: