Vision & Generative Media

What Is CLIP?

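CLIP (Contrastive Language-Image Pre-training) learns a shared embedding space for images and text by training an image encoder and a text encoder on image-caption pairs, so that matching pairs land close together and mismatched pairs land far apart. Because any text can be embedded, CLIP can score an image against arbitrary descriptions and perform zero-shot classification: it picks the candidate label whose text embedding is most similar to the image embedding, with no task-specific training.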
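As a rough illustration of how zero-shot classification works in a shared embedding space, the sketch below compares one image embedding against several text embeddings using cosine similarity and turns the scores into probabilities with a softmax. It does not load a real CLIP model: the three-dimensional vectors are made-up stand-ins for encoder outputs, and the function names and the scale value of 100.0 are assumptions chosen for the example.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    """Cosine similarity: dot product of the unit-length vectors."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_embedding, text_embeddings, labels, scale=100.0):
    """Return the best-matching label and per-label probabilities.

    `scale` is an assumed temperature factor that sharpens the softmax.
    """
    sims = [cosine_similarity(image_embedding, t) for t in text_embeddings]
    probs = softmax([scale * s for s in sims])
    best = max(range(len(labels)), key=lambda i: probs[i])
    return labels[best], probs

# Toy stand-ins for encoder outputs (hypothetical values, not real CLIP embeddings).
labels = ["a photo of a cat", "a photo of a dog", "a photo of a car"]
text_embeddings = [
    [0.9, 0.1, 0.0],  # "cat" prompt
    [0.1, 0.9, 0.0],  # "dog" prompt
    [0.0, 0.1, 0.9],  # "car" prompt
]
image_embedding = [0.8, 0.2, 0.1]  # pretend this came from a cat picture

predicted, probs = zero_shot_classify(image_embedding, text_embeddings, labels)
print(predicted)  # prints: a photo of a cat
```

With a real model, the toy vectors would be replaced by the outputs of the image and text encoders. The candidate labels are just text prompts, so the label set can change without retraining anything.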
