Good results fine-tuning a local LLM like Qwen 3:0.6B to categorize questions
6 hours ago
- #household chatbot
- #LLM fine-tuning
- #question categorization
- The experiment fine-tunes a small local LLM (Qwen 3 0.6B) to categorize household questions, so that a chatbot's vector search can use the category as metadata.
- Without fine-tuning, the baseline model reached only 10% accuracy: it overused broad labels and invented categories.
- The first fine-tuning attempt raised accuracy to 79%, but the model still produced fragmented category outputs and confused semantically overlapping categories.
- The second fine-tuning attempt used a prompt that maps each category to an opaque two-character ID, raising accuracy to ~92% by reducing semantic overlap in the outputs.
- Some misclassifications remain (e.g., a water heater question labeled as pool) because category meanings overlap, which suggests the training data needs further refinement.
- The fine-tuned LLM is now usable in the chatbot for question categorization, and the chatbot displays category tags in real time during interactions.
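
The opaque-ID idea and the accuracy figures above can be illustrated with a minimal sketch. This is not the author's code: the category names, the ID alphabet, the prompt wording, and every function name below are assumptions made for illustration. It shows one way to assign two-character IDs, build a classification prompt, map the model's output back to a category (rejecting invented labels), and score predictions.

```python
import itertools
import random
import string
from collections import Counter

# Hypothetical category names; the source does not list the real ones.
CATEGORIES = ["plumbing", "pool", "hvac", "electrical", "appliances"]


def make_opaque_ids(categories, seed=0):
    """Give each category a unique two-character ID that carries no meaning."""
    alphabet = string.ascii_uppercase + string.digits
    pairs = ["".join(p) for p in itertools.product(alphabet, repeat=2)]
    chosen = random.Random(seed).sample(pairs, len(categories))
    return dict(zip(categories, chosen))


def build_prompt(question, id_by_category):
    """List the ID-to-category table, then ask for exactly one ID.

    The wording is an assumption, not the prompt used in the experiment.
    """
    table = "\n".join(f"{cid} = {name}" for name, cid in id_by_category.items())
    return (
        "Classify the question into one category.\n"
        f"{table}\n"
        "Answer with the two-character ID only.\n"
        f"Question: {question}\nID:"
    )


def parse_category(raw_output, id_by_category):
    """Map model output back to a category name, or None if the ID is unknown."""
    category_by_id = {cid: name for name, cid in id_by_category.items()}
    candidate = raw_output.strip()[:2].upper()
    return category_by_id.get(candidate)


def accuracy(predicted, gold):
    """Fraction of predictions that exactly match the gold labels."""
    if not gold:
        return 0.0
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


def confusions(predicted, gold):
    """Count (gold, predicted) pairs for every misclassified example."""
    return Counter((g, p) for p, g in zip(predicted, gold) if p != g)


if __name__ == "__main__":
    ids = make_opaque_ids(CATEGORIES)
    print(build_prompt("Why is the water heater leaking?", ids))
    # Made-up labels, only to show how the scoring helpers are called.
    gold = ["plumbing", "hvac"]
    predicted = ["pool", "hvac"]
    print(accuracy(predicted, gold), confusions(predicted, gold))
```

Returning `None` for an unknown ID makes invented categories easy to count as errors, and the confusion counter surfaces recurring pairs that may point to training examples worth refining.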