Language & LLMs

What Is Sliding Window Attention?

Sliding window attention restricts each token to attend only to a fixed number of nearby tokens rather than the entire sequence. This reduces the computational cost of attention, making it more efficient for long inputs. Some models combine it with other mechanisms to still capture longer-range dependencies.

Further reading

Read more about sliding window attention — articles and blogs from around the web: