Show HN: Fine-tuned Llama 3.2 3B to match 70B models for local transcripts
9 days ago
- #Local Inference
- #NLP
- #AI Fine-Tuning
- Fine-tuned Llama 3.2 3B model to clean and analyze raw voice transcripts locally, outputting structured JSON payloads.
- Training used LoRA via Unsloth and took 4 hours on a single RTX 4090 with a batch size of 16.
- Evaluation score improved from 5.35 (base model) to 8.55 (fine-tuned model), outperforming larger general models.
- Dataset creation involved generating synthetic transcripts and gold-standard JSON outputs using a teacher model (Kimi K2).
- Inference setup includes merging the LoRA adapter into the base model, quantizing to GGUF (Q4_K_M), and serving locally with LM Studio.
- Comparison tests showed the fine-tuned 3B model outperforming many larger models (12B–70B) on the specific task.
- Key benefits include local execution, tailored performance, and cost efficiency compared to API-based solutions.
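The dataset step above (synthetic transcripts plus gold-standard JSON from a teacher model) amounts to a distillation formatting pass: each (transcript, teacher JSON) pair becomes one chat-style SFT row. A minimal sketch — the function name, system prompt wording, and payload fields are illustrative assumptions, not the author's actual pipeline:

```python
import json

# Assumed instruction; the post does not publish the real system prompt.
SYSTEM_PROMPT = "Clean the raw voice transcript and return a structured JSON payload."

def build_example(raw_transcript: str, gold_payload: dict) -> dict:
    """Turn one (transcript, teacher JSON) pair into a chat-format training row."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": raw_transcript},
            # The assistant target is the teacher's gold JSON, serialized
            # deterministically so the student learns a stable output format.
            {"role": "assistant", "content": json.dumps(gold_payload, sort_keys=True)},
        ]
    }

example = build_example(
    "um so yeah remind me to uh call the dentist tomorrow at 3",
    {"intent": "reminder", "task": "call the dentist", "time": "tomorrow 15:00"},
)
```

Serializing the target with `sort_keys=True` keeps key order stable across the corpus, which makes the 3B student's JSON output more predictable.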
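The 5.35 → 8.55 numbers are the post's own evaluation metric, whose exact rubric isn't described. As one hedged illustration of how structured outputs can be scored against the teacher's gold JSON, a simple field-level accuracy averaged over the eval set works (an assumed methodology, not necessarily the author's):

```python
def field_accuracy(predicted: dict, gold: dict) -> float:
    """Fraction of gold fields that the prediction reproduces exactly."""
    if not gold:
        return 1.0
    hits = sum(1 for key, value in gold.items() if predicted.get(key) == value)
    return hits / len(gold)

def score_run(pairs: list) -> float:
    """Average field accuracy over (predicted, gold) pairs, scaled to a 0-10 score."""
    return 10 * sum(field_accuracy(p, g) for p, g in pairs) / len(pairs)
```

An exact-match rubric like this penalizes near-misses harshly; an LLM-as-judge setup (plausible given the teacher model) would score more leniently, but the aggregation step is the same.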
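The deployment bullet (merge LoRA into the base model, quantize to GGUF Q4_K_M, load in LM Studio) maps onto standard PEFT and llama.cpp tooling. A command sketch under assumed paths — the adapter and output filenames are placeholders, and the exact base checkpoint used in the post is not confirmed:

```shell
# 1. Merge the LoRA adapter into the base weights (one-off Python step via PEFT).
python - <<'EOF'
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
merged = PeftModel.from_pretrained(base, "./lora-adapter").merge_and_unload()
merged.save_pretrained("./merged-model")
EOF

# 2. Convert the merged Hugging Face checkpoint to GGUF with llama.cpp.
python llama.cpp/convert_hf_to_gguf.py ./merged-model --outfile model-f16.gguf

# 3. Quantize to Q4_K_M; the resulting .gguf loads directly in LM Studio.
llama.cpp/build/bin/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

Q4_K_M keeps a 3B model comfortably inside consumer VRAM while losing little task accuracy, which is what makes the local-inference claim practical.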
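Because the model's contract is "structured JSON out," local callers typically need a tolerant extraction step: small quantized models occasionally wrap the payload in prose or markdown fences. A minimal sketch, assuming hypothetical required fields (the post's real schema is not published):

```python
import json
import re

REQUIRED_KEYS = {"intent", "task"}  # hypothetical schema fields, for illustration

def extract_payload(model_output: str) -> dict:
    """Pull the first JSON object out of raw model output and validate it."""
    # Strip optional markdown code fences the model may emit around the JSON.
    text = re.sub(r"```(?:json)?", "", model_output)
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    payload = json.loads(text[start : end + 1])
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"payload missing keys: {sorted(missing)}")
    return payload
```

Usage: `extract_payload('```json\n{"intent": "reminder", "task": "call"}\n```')` returns the parsed dict; malformed or incomplete output raises instead of silently passing bad data downstream.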