
Show HN: Fine-tuned Llama 3.2 3B to match 70B models for local transcripts

9 days ago
  • #Local Inference
  • #NLP
  • #AI Fine-Tuning
  • Fine-tuned a Llama 3.2 3B model to clean and analyze raw voice transcripts locally, outputting structured JSON payloads.
  • Training involved LoRA via Unsloth, taking 4 hours on a single RTX 4090 with a batch size of 16 (a minimal training sketch follows this list).
  • Evaluation score improved from 5.35 (base model) to 8.55 (fine-tuned model), outperforming larger general models.
  • Dataset creation involved generating synthetic transcripts and gold-standard JSON outputs using a teacher model (Kimi K2); the distillation step is also sketched below.
  • Inference involves merging the LoRA adapter into the base model, quantizing to GGUF (Q4_K_M), and serving the result locally in LM Studio (see the merge sketch below).
  • Comparison tests showed the fine-tuned 3B model outperforming many larger models (12B–70B) on the specific task.
  • Key benefits include local execution, tailored performance, and cost efficiency compared to API-based solutions.
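
The post describes the data pipeline only at a high level, so the following is a hedged sketch of distilling gold-standard labels from a teacher model through an OpenAI-compatible client. The endpoint URL, model identifier, prompt wording, file names, and JSON schema are assumptions for illustration, not the author's actual setup.

```python
# Sketch: generate gold-standard JSON labels for synthetic transcripts
# using a teacher model behind an OpenAI-compatible endpoint.
# The base_url, model name, prompts, file names, and schema are assumptions.
import json
from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.ai/v1",  # assumed endpoint for Kimi K2
    api_key="YOUR_API_KEY",
)

SYSTEM_PROMPT = (
    "You clean raw voice transcripts. Respond with JSON only, using the keys "
    "'cleaned_text', 'summary', and 'action_items'."  # hypothetical schema
)

def label_transcript(raw_transcript: str) -> dict:
    """Ask the teacher model for a gold-standard structured output."""
    response = client.chat.completions.create(
        model="kimi-k2-0905-preview",  # assumed model identifier
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": raw_transcript},
        ],
        temperature=0.2,
    )
    return json.loads(response.choices[0].message.content)

# One synthetic transcript per line in; one SFT row (prompt + gold JSON) out.
with open("synthetic_transcripts.txt") as src, open("transcripts_sft.jsonl", "w") as out:
    for line in src:
        raw = line.strip()
        if not raw:
            continue
        gold = label_transcript(raw)
        out.write(json.dumps({"prompt": raw, "completion": json.dumps(gold)}) + "\n")
```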
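
Likewise, a minimal sketch of what a LoRA fine-tune via Unsloth typically looks like. Only the Llama 3.2 3B base and the batch size of 16 come from the summary above; the checkpoint name, LoRA rank, sequence length, epochs, and learning rate are placeholders, and the SFTTrainer keywords follow the older trl style used in many Unsloth examples.

```python
# Minimal LoRA fine-tuning sketch with Unsloth + TRL.
# Hyperparameters marked "assumed" are illustrative, not the author's config.
from unsloth import FastLanguageModel
from datasets import load_dataset
from trl import SFTTrainer
from transformers import TrainingArguments

max_seq_length = 4096  # assumed

# Load the 4-bit base model plus tokenizer.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/Llama-3.2-3B-Instruct",  # assumed checkpoint
    max_seq_length=max_seq_length,
    load_in_4bit=True,
)

# Attach LoRA adapters to the attention and MLP projections.
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                   # assumed rank
    lora_alpha=16,                          # assumed
    lora_dropout=0,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    use_gradient_checkpointing="unsloth",
)

# Rows written by the dataset sketch above: {"prompt": ..., "completion": ...}.
dataset = load_dataset("json", data_files="transcripts_sft.jsonl", split="train")

def to_text(example):
    # Render each pair into a single training string via the chat template.
    messages = [
        {"role": "user", "content": example["prompt"]},
        {"role": "assistant", "content": example["completion"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    args=TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,      # batch size of 16; the split is an assumption
        num_train_epochs=2,                 # assumed
        learning_rate=2e-4,                 # assumed
        bf16=True,
        logging_steps=10,
        output_dir="llama32-3b-transcripts-lora",
    ),
)
trainer.train()

# Save only the adapter weights; merging happens in the next sketch.
model.save_pretrained("llama32-3b-transcripts-lora")
tokenizer.save_pretrained("llama32-3b-transcripts-lora")
```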
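
For the inference prep, one common way to fold the trained adapter into the base weights is PEFT's merge_and_unload(); the sketch below reuses the placeholder paths from the previous sketches.

```python
# Sketch: merge the LoRA adapter into the base model to get a standalone
# checkpoint ready for GGUF conversion. Paths and checkpoint names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "meta-llama/Llama-3.2-3B-Instruct"   # assumed base checkpoint
ADAPTER = "llama32-3b-transcripts-lora"     # adapter dir from the training sketch
OUT = "llama32-3b-transcripts-merged"

# Load the base in bf16, attach the adapter, and fold it into the weights.
base = AutoModelForCausalLM.from_pretrained(BASE, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, ADAPTER).merge_and_unload()

# Save a standalone checkpoint plus tokenizer for conversion.
merged.save_pretrained(OUT)
AutoTokenizer.from_pretrained(BASE).save_pretrained(OUT)
```

From there, the usual llama.cpp route is convert_hf_to_gguf.py to produce a GGUF file, then llama-quantize to Q4_K_M; the resulting file can be loaded in LM Studio for local inference.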