Good results fine-tuning a local LLM like Qwen 3:0.6B to categorize questions

4 hours ago

The experiment fine-tunes a small local LLM (Qwen 3 0.6B) to categorize household questions, with the goal of improving metadata-aware vector search in a chatbot.
Without fine-tuning, the baseline model reached only 10% accuracy; it overused broad labels and invented categories.
The first fine-tuning attempt raised accuracy to 79%, but the model still produced fragmented category outputs and confused semantically overlapping categories.
The second fine-tuning attempt used a prompt that maps each category to an opaque two-character ID, which raised accuracy to ~92% by reducing semantic overlap in the outputs.
Remaining issues include specific misclassifications caused by overlapping meanings (e.g., water heater categorized as pool), which suggests the training data needs further refinement.
The fine-tuned LLM is now usable in the chatbot for question categorization, with real-time category tags displayed during interactions.
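
The opaque-ID idea from the second attempt can be sketched roughly as follows. This is a minimal illustration, not the author's actual prompt or code: the category list (apart from "pool"), the ID scheme, the prompt wording, and all function names are assumptions made for the example. The model is asked to answer with a two-character ID, and the ID is decoded back to a category afterwards, so the model never has to emit category words that overlap in meaning.

```python
import itertools
import string

# Hypothetical category names; only "pool" is mentioned in the summary above.
CATEGORIES = ["appliances", "garden", "plumbing", "pool"]


def build_id_map(categories):
    """Assign each category a fixed two-character ID that carries no meaning."""
    codes = ("".join(pair) for pair in itertools.product(string.ascii_uppercase, repeat=2))
    # Sort so the mapping is stable between training and inference.
    return {code: cat for code, cat in zip(codes, sorted(categories))}


def build_prompt(id_map, question):
    """Build a classification prompt that lists the ID-to-category mapping."""
    mapping = "\n".join(f"{code} = {cat}" for code, cat in id_map.items())
    return (
        "Classify the question. Reply with exactly one two-character ID.\n"
        f"{mapping}\n"
        f"Question: {question}\n"
        "ID:"
    )


def decode(id_map, raw_output):
    """Map the model's raw reply back to a category, or None if the ID is unknown."""
    code = raw_output.strip()[:2].upper()
    return id_map.get(code)


def accuracy(predictions, labels):
    """Fraction of predictions that match the expected labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)
```

Returning None for an unknown ID makes invented categories easy to count as errors during evaluation instead of silently passing them through to the vector search filter.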
