Language & LLMs

What Is Sliding Window Attention?

Sliding window attention restricts each token to attend only to a fixed number of nearby tokens rather than the entire sequence. This reduces the computational cost of attention, making it more efficient for long inputs. Some models combine it with other mechanisms to still capture longer-range dependencies.

What Is Sliding Window Attention?

Related topics

Further reading