
What Is Gradient Checkpointing?

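Gradient checkpointing (also called activation checkpointing) trades extra computation for lower memory use during training. Instead of storing every intermediate activation from the forward pass, it keeps only a selected subset, the checkpoints, and recomputes the discarded activations during the backward pass when they are needed to compute gradients. This lets larger models or larger batches fit on memory-limited hardware.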
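The idea can be illustrated with a small, framework-free sketch. It is not how any particular library implements checkpointing; it assumes a toy chain of scalar layers of the form tanh(w * x), and the names `layer`, `grad_full`, `grad_checkpointed`, and `segment` are hypothetical, chosen only for this example. The baseline stores every layer input, while the checkpointed version stores one input per segment and recomputes the rest during the backward pass.

```python
import math


def layer(w, x):
    """Toy layer (an assumption for illustration): y = tanh(w * x)."""
    return math.tanh(w * x)


def layer_grad(w, x):
    """Derivative of the toy layer with respect to its input x."""
    t = math.tanh(w * x)
    return w * (1.0 - t * t)


def grad_full(weights, x0):
    """Baseline: store every layer input, then backpropagate."""
    inputs = []
    x = x0
    for w in weights:
        inputs.append(x)          # one stored activation per layer
        x = layer(w, x)
    g = 1.0
    for w, x_in in zip(reversed(weights), reversed(inputs)):
        g *= layer_grad(w, x_in)  # chain rule, last layer first
    return x, g, len(inputs)


def grad_checkpointed(weights, x0, segment):
    """Store only the input of each segment; recompute the rest in backward."""
    checkpoints = {}
    x = x0
    for i, w in enumerate(weights):
        if i % segment == 0:
            checkpoints[i] = x    # keep only segment boundaries
        x = layer(w, x)
    out = x

    g = 1.0
    peak = len(checkpoints)
    for start in sorted(checkpoints, reverse=True):
        seg_weights = weights[start:start + segment]
        # Extra computation: rerun the forward pass for this segment only.
        inputs = []
        x = checkpoints[start]
        for w in seg_weights:
            inputs.append(x)
            x = layer(w, x)
        # Simple upper bound on values held at once: all checkpoints
        # plus the recomputed inputs of the current segment.
        peak = max(peak, len(checkpoints) + len(inputs))
        for w, x_in in zip(reversed(seg_weights), reversed(inputs)):
            g *= layer_grad(w, x_in)
    return out, g, peak


if __name__ == "__main__":
    weights = [0.5 + 0.1 * i for i in range(16)]
    print(grad_full(weights, 0.3))
    print(grad_checkpointed(weights, 0.3, segment=4))
```

Both functions return the same output and gradient. With 16 layers and a segment length of 4, the baseline holds 16 stored inputs, while the checkpointed version holds at most 8 by this sketch's simple count, at the cost of running each layer's forward computation a second time.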
