What Is a Transformer Decoder?

A transformer decoder produces an output sequence one token at a time, using masked (causal) self-attention so that each position can attend only to itself and earlier positions, never to later ones. It is the core of autoregressive language models such as the GPT family, which are decoder-only. In sequence-to-sequence models, the decoder also uses cross-attention to read the encoder's outputs.
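As a rough illustration of the masking described above, here is a minimal single-head sketch in plain Python. The function names (`softmax`, `causal_self_attention`) are made up for this example, and it leaves out the learned query/key/value projections, multiple heads, and everything else a real decoder layer contains. It only shows how position i is restricted to positions 0 through i.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def causal_self_attention(q, k, v):
    """Single-head masked self-attention (illustrative sketch only).

    q, k, v are lists of equal length; each element is a vector (list of floats).
    Returns (outputs, weights), where weights[i][j] is how much position i
    attends to position j.
    """
    n = len(q)
    d = len(q[0])
    outputs, weights = [], []
    for i in range(n):
        # Causal mask: position i scores only positions 0..i.
        scores = [
            sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(i + 1)
        ]
        w = softmax(scores)
        # Future positions get exactly zero weight.
        weights.append(w + [0.0] * (n - i - 1))
        # Output is the weighted sum of the visible value vectors.
        outputs.append([
            sum(w[j] * v[j][c] for j in range(i + 1))
            for c in range(len(v[0]))
        ])
    return outputs, weights

# Toy input: three token vectors used directly as queries, keys, and values.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, attn = causal_self_attention(x, x, x)
```

In this toy run the first position can see only itself, so its attention weight on itself is 1 and its output equals its own value vector. Practical implementations typically get the same effect by adding a large negative value to the masked scores of a full score matrix before the softmax, which is easier to batch on accelerators.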
