Language & LLMs

What Is Speculative Decoding?

Speculative decoding accelerates text generation by using a small, fast model to propose several candidate tokens, which a larger model then verifies in parallel. Accepted tokens are kept, and rejected ones are recomputed by the large model. This reduces latency while producing the same output distribution as the large model alone.

What Is Speculative Decoding?

Related topics

Further reading