Show HN: Fine-tuned Llama 3.2 3B to match 70B models for local transcripts
9 days ago
- #Local Inference
- #NLP
- #AI Fine-Tuning
- Fine-tuned Llama 3.2 3B model to clean and analyze raw voice transcripts locally, outputting structured JSON payloads.
- Training used LoRA via Unsloth and took 4 hours on a single RTX 4090 with a batch size of 16.
- Evaluation score improved from 5.35 (base model) to 8.55 (fine-tuned model), outperforming larger general models.
- Dataset creation involved generating synthetic transcripts and gold-standard JSON outputs using a teacher model (Kimi K2).
- Inference setup includes merging the LoRA adapter into the base model, quantizing to GGUF (Q4_K_M), and serving locally with LM Studio.
- Comparison tests showed the fine-tuned 3B model outperforming many larger models (12B–70B) on the specific task.
- Key benefits include local execution, tailored performance, and cost efficiency compared to API-based solutions.
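The dataset step above (synthetic transcripts plus gold-standard JSON from a teacher model) amounts to a distillation formatting pass: each (transcript, teacher JSON) pair becomes one chat-style SFT row. A minimal sketch — the function name, system prompt wording, and payload fields are illustrative assumptions, not the author's actual pipeline:

```python
import json

# Assumed instruction; the post does not publish the real system prompt.
SYSTEM_PROMPT = "Clean the raw voice transcript and return a structured JSON payload."

def build_example(raw_transcript: str, gold_payload: dict) -> dict:
    """Turn one (transcript, teacher JSON) pair into a chat-format training row."""
    return {
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": raw_transcript},
            # The assistant target is the teacher's gold JSON, serialized
            # deterministically so the student learns a stable output format.
            {"role": "assistant", "content": json.dumps(gold_payload, sort_keys=True)},
        ]
    }

example = build_example(
    "um so yeah remind me to uh call the dentist tomorrow at 3",
    {"intent": "reminder", "task": "call the dentist", "time": "tomorrow 15:00"},
)
```

Serializing the target with `sort_keys=True` keeps key order stable across the corpus, which makes the 3B student's JSON output more predictable.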
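The 5.35 → 8.55 numbers are the post's own evaluation metric, whose exact rubric isn't described. As one hedged illustration of how structured outputs can be scored against the teacher's gold JSON, a simple field-level accuracy averaged over the eval set works (an assumed methodology, not necessarily the author's):

```python
def field_accuracy(predicted: dict, gold: dict) -> float:
    """Fraction of gold fields that the prediction reproduces exactly."""
    if not gold:
        return 1.0
    hits = sum(1 for key, value in gold.items() if predicted.get(key) == value)
    return hits / len(gold)

def score_run(pairs: list) -> float:
    """Average field accuracy over (predicted, gold) pairs, scaled to a 0-10 score."""
    return 10 * sum(field_accuracy(p, g) for p, g in pairs) / len(pairs)
```

An exact-match rubric like this penalizes near-misses harshly; an LLM-as-judge setup (plausible given the teacher model) would score more leniently, but the aggregation step is the same.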
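The deployment bullet (merge LoRA into the base model, quantize to GGUF Q4_K_M, load in LM Studio) maps onto standard PEFT and llama.cpp tooling. A command sketch under assumed paths — the adapter and output filenames are placeholders, and the exact base checkpoint used in the post is not confirmed:

```shell
# 1. Merge the LoRA adapter into the base weights (one-off Python step via PEFT).
python - <<'EOF'
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-3.2-3B-Instruct")
merged = PeftModel.from_pretrained(base, "./lora-adapter").merge_and_unload()
merged.save_pretrained("./merged-model")
EOF

# 2. Convert the merged Hugging Face checkpoint to GGUF with llama.cpp.
python llama.cpp/convert_hf_to_gguf.py ./merged-model --outfile model-f16.gguf

# 3. Quantize to Q4_K_M; the resulting .gguf loads directly in LM Studio.
llama.cpp/build/bin/llama-quantize model-f16.gguf model-q4_k_m.gguf Q4_K_M
```

Q4_K_M keeps a 3B model comfortably inside consumer VRAM while losing little task accuracy, which is what makes the local-inference claim practical.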
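Because the model's contract is "structured JSON out," local callers typically need a tolerant extraction step: small quantized models occasionally wrap the payload in prose or markdown fences. A minimal sketch, assuming hypothetical required fields (the post's real schema is not published):

```python
import json
import re

REQUIRED_KEYS = {"intent", "task"}  # hypothetical schema fields, for illustration

def extract_payload(model_output: str) -> dict:
    """Pull the first JSON object out of raw model output and validate it."""
    # Strip optional markdown code fences the model may emit around the JSON.
    text = re.sub(r"```(?:json)?", "", model_output)
    start, end = text.find("{"), text.rfind("}")
    if start == -1 or end == -1:
        raise ValueError("no JSON object found in model output")
    payload = json.loads(text[start : end + 1])
    missing = REQUIRED_KEYS - payload.keys()
    if missing:
        raise ValueError(f"payload missing keys: {sorted(missing)}")
    return payload
```

Usage: `extract_payload('```json\n{"intent": "reminder", "task": "call"}\n```')` returns the parsed dict; malformed or incomplete output raises instead of silently passing bad data downstream.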