
What Is QLoRA?

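QLoRA (Quantized Low-Rank Adaptation) is a fine-tuning method that applies LoRA on top of a quantized model. The base model's weights are frozen and stored in a compressed low-bit format (4-bit in the original method), and only the small low-rank adapter matrices that LoRA adds are trained, in higher precision. Because the frozen weights take far less memory and the adapters contain comparatively few parameters, very large models can be fine-tuned on a single GPU while preserving much of the quality of full-precision fine-tuning.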
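To make the memory savings concrete, here is a rough back-of-the-envelope sketch. It assumes 4-bit storage for the base weights, and the model size, layer shape, and LoRA rank are hypothetical values chosen only for illustration. It counts weight storage alone and ignores quantization constants, activations, gradients, and optimizer state, so real usage will be higher.

```python
def weight_memory_gb(num_params: int, bits_per_param: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


def lora_param_count(d_in: int, d_out: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA learns two small matrices: A (rank x d_in) and B (d_out x rank).
    """
    return rank * (d_in + d_out)


# Hypothetical values, not taken from any specific model.
NUM_PARAMS = 7_000_000_000   # total base-model parameters
D_MODEL = 4096               # width of one square projection matrix
RANK = 16                    # LoRA rank

fp16_gb = weight_memory_gb(NUM_PARAMS, 16)  # base weights kept in 16-bit
int4_gb = weight_memory_gb(NUM_PARAMS, 4)   # base weights quantized to 4-bit

full_layer = D_MODEL * D_MODEL
adapter = lora_param_count(D_MODEL, D_MODEL, RANK)

print(f"16-bit base weights: {fp16_gb:.1f} GB")
print(f"4-bit base weights:  {int4_gb:.1f} GB")
print(f"LoRA adapter size relative to one full layer: {adapter / full_layer:.2%}")
```

Under these assumptions, quantizing the base weights cuts their footprint to a quarter of the 16-bit size, and the trainable adapter for a layer is under one percent of that layer's parameters. Those two effects together are what bring fine-tuning within reach of a single GPU.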
