Vision & Generative Media

What Is a Vision Transformer?

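A vision transformer, or ViT, splits an image into fixed-size patches, embeds each patch as a token, and processes the resulting token sequence with a transformer, where self-attention lets every patch draw on information from every other patch. It offers an alternative to convolutional networks for image tasks such as classification.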
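To make the patch-and-attend idea concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not a reference implementation: the function names `image_to_patches` and `self_attention`, the 32x32 image, the patch size of 8, and the embedding width of 64 are all hypothetical choices, and the weights are random rather than trained. Position embeddings, the class token, multiple heads, and the MLP layers of a full ViT are left out.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping square patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    # One row per patch, in row-major order over the patch grid.
    return patches.reshape(rows * cols, patch_size * patch_size * c)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row maximum before exponentiating for numerical stability.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


# Hypothetical sizes, chosen only for illustration.
rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))
embed_dim = 64

patches = image_to_patches(image, patch_size=8)        # shape (16, 192)
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], embed_dim))
tokens = patches @ w_embed                              # shape (16, 64)

w_q, w_k, w_v = (rng.normal(scale=0.02, size=(embed_dim, embed_dim)) for _ in range(3))
attended, weights = self_attention(tokens, w_q, w_k, w_v)
print(patches.shape, tokens.shape, attended.shape, weights.shape)
```

Each row of `weights` sums to 1 and says how much one patch attends to every other patch, which is how the model relates distant image regions in a single layer.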
