Fine-tuning method selection has real cost and quality implications. Full fine-tuning updates all model parameters and produces the highest task-specific accuracy, but it requires substantial GPU memory (a 7B model needs roughly 56GB of VRAM in bfloat16 once gradients and optimizer states are counted) and risks catastrophic forgetting, where the model loses general capabilities while specializing. Use full fine-tuning only when you have a large, high-quality dataset and need maximum performance.

LoRA (Low-Rank Adaptation) injects small trainable adapter matrices into the model's attention layers while freezing the base model. Only the adapter weights update during training, cutting memory and compute requirements by 10x or more. Adapters can be swapped in and out, letting you maintain multiple specialized versions of one base model cheaply.

QLoRA goes further: quantize the frozen base model to 4-bit precision, then fine-tune LoRA adapters on top. This enables fine-tuning a 70B-parameter model on a single 48GB GPU, collapsing the hardware barrier dramatically. DoRA (Weight-Decomposed Low-Rank Adaptation) and other variants push efficiency further still.

Beyond parameter-efficient methods, modern alignment techniques such as DPO (Direct Preference Optimization) and ORPO simplify the RLHF pipeline by training directly on preference pairs, with no separate reward model.

For most production teams in 2026, the default choice is LoRA or QLoRA: they deliver roughly 95% of full fine-tuning quality at a fraction of the cost, they preserve general capabilities better, and they enable rapid experimentation. Full fine-tuning only makes sense when you have both the budget and a clear accuracy-critical use case.
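The LoRA mechanics described above can be sketched in a few lines. This is a minimal NumPy illustration, not the API of any real library (the `LoRALinear` class name and all dimensions are invented for the example): the frozen weight W is augmented with a low-rank product B·A scaled by alpha/r, B is zero-initialized so the adapter starts as a no-op, and only A and B would receive gradient updates.

```python
import numpy as np

class LoRALinear:
    """Frozen linear layer plus a trainable low-rank update (LoRA sketch).

    Effective weight: W + (alpha / r) * B @ A, where only A (r x d_in)
    and B (d_out x r) train. B starts at zero, so before training the
    layer behaves exactly like the frozen base layer.
    """
    def __init__(self, d_in, d_out, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        self.W = rng.standard_normal((d_out, d_in)) * 0.02  # frozen base weight
        self.A = rng.standard_normal((r, d_in)) * 0.01      # trainable adapter
        self.B = np.zeros((d_out, r))                       # trainable, zero-init
        self.scale = alpha / r

    def __call__(self, x):
        return x @ (self.W + self.scale * self.B @ self.A).T

    def trainable_params(self):
        # Only the adapter matrices train; W stays frozen.
        return self.A.size + self.B.size

layer = LoRALinear(d_in=4096, d_out=4096, r=8)
x = np.ones((1, 4096))
# Zero-initialized B means the adapter is initially a no-op:
assert np.allclose(layer(x), x @ layer.W.T)
# 2 * r * 4096 = 65,536 trainable params vs 16,777,216 in the full matrix,
# which is where the ~10x-plus memory and compute savings come from.
print(layer.trainable_params(), layer.W.size)
```

In real setups the same idea is applied per attention projection (q, k, v, o), and swapping specializations means swapping only the small A and B matrices while the base weights stay untouched.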

Fine-Tuning Techniques: LoRA, QLoRA, and Full Fine-Tuning Compared
Not all fine-tuning is created equal. Full fine-tuning updates every model weight and needs expensive hardware. LoRA injects small adapter matrices for 10x lower cost. QLoRA lets you fine-tune a 70B model on a single 48GB GPU. The right choice depends on budget, dataset size, and target behavior.