Core Concepts

What Is Model Quantization?

Quantization reduces the precision of the numbers used to store a model’s parameters, for example from 16 bits to 8 or 4 bits per value. This shrinks the model’s memory footprint and can speed up inference, typically at the cost of a small loss of accuracy. It is a key technique for running large models on modest hardware, including laptops and phones.
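
To make the idea concrete, here is a minimal sketch of one common scheme, symmetric 8-bit quantization with a single scale factor for the whole list of weights. The function names and the sample weights are illustrative assumptions, not taken from any particular library, and real toolkits usually add refinements such as per-channel scales.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest magnitude onto 127; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical example weights, chosen only for illustration.
weights = [0.82, -1.27, 0.003, 0.45, -0.61]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# The rounding error per weight is at most half the scale.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(quantized, scale, max_error)
```

Each stored value now fits in one byte instead of two, and the only extra data is the single scale factor. The small gap between the original and restored weights is the accuracy cost described above.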
