Vision & Generative Media

What Is a Vision Transformer?

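A vision transformer (ViT) splits an image into fixed-size patches, turns each patch into a token embedding, and processes the resulting sequence with a transformer encoder built on self-attention. Because every patch can attend to every other patch, the model can relate distant regions of the image within a single layer. It offers an alternative to convolutional networks for image tasks such as classification.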
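To make the patch-then-attend idea concrete, here is a minimal NumPy sketch, not a full ViT. It assumes a 32x32 RGB image, 8x8 patches, a 64-dimensional embedding, and random, untrained weights; the names `image_to_patches` and `self_attention` are illustrative and do not come from any library. It also leaves out position embeddings, the class token, multiple heads, and the feed-forward layers that a real model would include.

```python
import numpy as np


def image_to_patches(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened, non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    """
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    rows, cols = h // patch_size, w // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    grid = image.reshape(rows, patch_size, cols, patch_size, c)
    grid = grid.transpose(0, 2, 1, 3, 4)
    return grid.reshape(rows * cols, patch_size * patch_size * c)


def self_attention(tokens, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over a token sequence."""
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Subtract the row-wise max for a numerically stable softmax.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
image = rng.random((32, 32, 3))                  # stand-in for a real image
patches = image_to_patches(image, patch_size=8)  # 16 patches of 8*8*3 = 192 values

d_model = 64                                     # assumed embedding width
w_embed = rng.normal(scale=0.02, size=(patches.shape[1], d_model))
tokens = patches @ w_embed                       # linear patch embedding

w_q, w_k, w_v = (rng.normal(scale=0.02, size=(d_model, d_model)) for _ in range(3))
attended = self_attention(tokens, w_q, w_k, w_v)

print(patches.shape, tokens.shape, attended.shape)
```

Each row of `attended` is a weighted mix of all 16 patch tokens, which is the sense in which attention lets every patch draw on the whole image rather than only a local neighbourhood.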
