Core Concepts
What Is Model Quantization?
Quantization reduces the precision of the numbers used to store a model's parameters, for example from 16-bit to 8-bit or 4-bit values. Storing each parameter in fewer bits shrinks the model's memory footprint and can speed up inference, typically with only a small loss of accuracy. It is a key technique for running large models on modest hardware, including laptops and phones.
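To make the idea concrete, here is a minimal sketch of one common scheme, symmetric per-tensor quantization to signed 8-bit integers. It is an illustration under simplifying assumptions, not how any particular library implements quantization: the function names `quantize_int8` and `dequantize` and the sample weights are hypothetical, and real toolkits usually work per channel or per block and use more careful calibration.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale factor."""
    max_abs = max(abs(w) for w in weights)
    # The scale maps the largest-magnitude weight to 127; guard against all zeros.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights, chosen only for illustration.
weights = [0.42, -1.27, 0.05, 0.88, -0.31]

quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)

# Rounding error per weight is bounded by half of one quantization step.
max_error = max(abs(w, ) if False else abs(w - r) for w, r in zip(weights, restored))

print("quantized:", quantized)
print("scale:", scale)
print("max round-trip error:", max_error)
```

Each weight now fits in one byte instead of two (for a 16-bit original), at the cost of a rounding error no larger than half the scale. Dropping to 4 bits follows the same pattern with a range of [-7, 7], which halves storage again but makes each rounding step coarser.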
Further reading
Read more about model quantization in articles and blogs from around the web: