Fine-tuning method selection is a practical decision with significant cost and quality implications.

Full fine-tuning updates all model parameters and produces the highest task-specific accuracy, but requires substantial GPU memory (a 7B model needs ~56GB VRAM for full fine-tuning in bf16) and risks catastrophic forgetting of general capabilities. Use it only when you have a large, high-quality domain dataset and need maximum performance.

LoRA (Low-Rank Adaptation) and its variants (QLoRA, DoRA) inject small trainable adapter matrices into attention layers while freezing the base model. QLoRA enables fine-tuning a 7B model on a single consumer GPU by quantizing the frozen base to 4-bit. LoRA preserves general capabilities better than full fine-tuning and is the default choice for most production fine-tuning tasks.

Instruction tuning is a specific form of supervised fine-tuning on (instruction, response) pairs that teaches models to follow instructions reliably — this is how base models are converted into chat assistants. RLHF and its variants (DPO, ORPO) are used when the target behavior is difficult to specify with examples alone — you want the model to be helpful, harmless, and honest in complex situations where human preference comparison provides a better training signal than supervised labels.

Data quality dominates all other factors: 1,000 carefully curated, diverse fine-tuning examples consistently outperform 100,000 noisy ones. Evaluation setup — defining task-specific metrics and a held-out test set before training starts — is as important as the training configuration itself.
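The cost gap between the two approaches comes down to simple arithmetic. A minimal sketch, assuming hypothetical Llama-7B-style shapes (4096 hidden size, 32 layers, adapters on two attention projections per layer) and the common accounting of 8 bytes per parameter for bf16 full fine-tuning (weights, gradients, and two bf16 Adam moments):

```python
def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """Parameters added by one LoRA adapter pair: A is (rank x d_in), B is (d_out x rank)."""
    return rank * (d_in + d_out)

# Hypothetical Llama-7B-style shapes; the real model config may differ.
hidden, layers, rank = 4096, 32, 8
targets_per_layer = 2          # adapters on q_proj and v_proj, a common default
total_params = 7_000_000_000   # rounded model size

trainable = layers * targets_per_layer * lora_params(hidden, hidden, rank)

# Back-of-envelope behind the ~56 GB figure for bf16 full fine-tuning:
# 2 B weights + 2 B gradients + 4 B for two bf16 Adam moments = 8 B per parameter.
full_ft_gb = total_params * 8 / 1e9

print(f"LoRA trainable params: {trainable:,} ({100 * trainable / total_params:.3f}% of 7B)")
print(f"Full fine-tuning memory estimate: {full_ft_gb:.0f} GB")
```

With these assumptions, LoRA trains roughly 4.2M parameters, well under 0.1% of the model, which is why the frozen (and, in QLoRA, 4-bit quantized) base dominates the memory footprint rather than optimizer state.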

Fine-Tuning Strategy: When to Use LoRA, Full Fine-Tuning, and RLHF
Not all fine-tuning is equal. The choice between LoRA, full fine-tuning, instruction tuning, and RLHF depends on your dataset size, target behavior, compute budget, and whether you need format compliance, domain accuracy, or value alignment. Choosing the wrong technique is expensive and often produces worse results.
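The decision factors listed above can be condensed into a rough heuristic. A sketch, not a rule: the thresholds (24 GB for a consumer GPU, 56 GB for full fine-tuning of a 7B model, 100k examples for a large dataset) are illustrative assumptions, not hard cutoffs:

```python
def pick_method(n_examples: int, has_preference_pairs: bool,
                gpu_vram_gb: int, need_max_accuracy: bool) -> str:
    """Rough fine-tuning method heuristic; thresholds are illustrative assumptions."""
    if has_preference_pairs:
        # Target behavior is easier to compare than to specify with labels.
        return "DPO/RLHF"
    if need_max_accuracy and n_examples >= 100_000 and gpu_vram_gb >= 56:
        # Large, high-quality dataset plus enough memory for full fine-tuning.
        return "full fine-tuning"
    if gpu_vram_gb < 24:
        # 4-bit quantized frozen base fits a single consumer GPU.
        return "QLoRA"
    return "LoRA"

print(pick_method(n_examples=1_000, has_preference_pairs=False,
                  gpu_vram_gb=16, need_max_accuracy=False))
```

This is only a first filter; the card's later points about data quality and pre-defined evaluation still decide whether any of these choices pays off.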