What Is AWQ?

AWQ, short for Activation-aware Weight Quantization, is a technique for compressing the weights of large language models to low-bit precision, typically 4-bit integers. It uses activation statistics from a small calibration set to identify the weights that matter most to the model's outputs, then protects those weights during quantization. This preserves accuracy while reducing memory use and speeding up inference.
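
To make the idea of protecting important weights concrete, here is a minimal NumPy sketch of activation-aware scaling for a single linear layer. It is an illustration under simplifying assumptions, not the reference implementation: the function names (quantize_rows, awq_style_quantize) are hypothetical, quantization uses one scale per output row rather than small weight groups, and the result is returned dequantized so the error can be measured directly. Input channels with large average activation magnitude are scaled up before rounding, which reduces their relative rounding error, and the scale is divided back out afterwards.

```python
import numpy as np

def quantize_rows(w, n_bits=4):
    """Symmetric round-to-nearest quantization with one scale per output row.

    Returns the dequantized weights so they can be compared with the originals.
    """
    qmax = 2 ** (n_bits - 1) - 1
    scale = np.abs(w).max(axis=1, keepdims=True) / qmax
    scale = np.where(scale == 0, 1.0, scale)  # avoid dividing by zero on all-zero rows
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def awq_style_quantize(w, x, n_bits=4, n_grid=11):
    """Quantize w (out_features, in_features) using calibration inputs x (samples, in_features).

    Tries per-input-channel scales s = mean(|x|) ** alpha for alpha in [0, 1]
    and keeps the alpha that gives the lowest output error on x.
    alpha = 0 means s = 1, which is plain round-to-nearest.
    """
    act_mag = np.maximum(np.abs(x).mean(axis=0), 1e-8)  # per-channel activation magnitude
    ref = x @ w.T                                       # full-precision layer output
    best_w, best_alpha, best_err = None, None, np.inf
    for alpha in np.linspace(0.0, 1.0, n_grid):
        s = act_mag ** alpha
        s = s / np.sqrt(s.max() * s.min())              # keep the scales centred around 1
        # Scale up salient channels, quantize, then fold the inverse scale back in.
        # At inference the 1/s factor would be applied to the activations instead.
        w_q = quantize_rows(w * s, n_bits) / s
        err = np.mean((x @ w_q.T - ref) ** 2)
        if err < best_err:
            best_w, best_alpha, best_err = w_q, alpha, err
    return best_w, best_alpha, best_err

# Example: a few input channels carry much larger activations than the rest.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 64))
x[:, :4] *= 30.0
w = rng.normal(size=(32, 64))

w_q, alpha, err = awq_style_quantize(w, x)
rtn_err = np.mean((x @ quantize_rows(w).T - x @ w.T) ** 2)
print(f"chosen alpha: {alpha:.1f}")
print(f"round-to-nearest error: {rtn_err:.4f}")
print(f"activation-aware error: {err:.4f}")
```

Because the search includes alpha = 0, the activation-aware result in this sketch is never worse than plain round-to-nearest on the calibration data. How much it improves depends on how unevenly the activation magnitudes are distributed across channels.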
