Training Qwen 4B to Beat Large Models on Work Tasks

3 months ago

Neurometric focuses on auto-generating Small Language Models (SLMs) for specific tasks.
The CRMArena benchmark tests models on realistic Salesforce CRM tasks such as lead qualification and activity prioritization.
The team fine-tuned a 4B-parameter Qwen model that outperforms larger models on CRM tasks, reaching 95% accuracy.
Initial attempts to teach SLMs to generate SQL queries performed poorly, but results improved once the training data was expanded.
Phase II switched to direct answer generation using the BANT (Budget, Authority, Need, Timeline) framework and achieved an evaluation score of 0.825.
Key takeaways: SLMs can outperform larger models with task-specific fine-tuning, synthetic data has quality challenges, and constrained answer spaces improve results.
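
To illustrate why a constrained answer space is easier to score and train against, here is a minimal sketch, not the article's actual evaluation code. It assumes the model must answer with exactly one of the four BANT factors; the label set, the `parse_label` helper, and the exact-match metric are all hypothetical choices made for illustration.

```python
from typing import Optional

# Hypothetical closed label set: the four BANT factors.
LABELS = ("budget", "authority", "need", "timeline")


def parse_label(raw: str) -> Optional[str]:
    """Map free-form model output to one allowed label.

    Returns None when the output names zero labels or more than one,
    so ambiguous answers are counted as wrong.
    """
    text = raw.lower()
    hits = [label for label in LABELS if label in text]
    return hits[0] if len(hits) == 1 else None


def exact_match(predictions: list[str], gold: list[str]) -> float:
    """Fraction of predictions whose parsed label equals the gold label."""
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        return 0.0
    correct = sum(parse_label(p) == g for p, g in zip(predictions, gold))
    return correct / len(gold)


if __name__ == "__main__":
    preds = ["Budget", "The lead lacks authority.", "need and timeline", "Timeline"]
    gold = ["budget", "authority", "need", "timeline"]
    print(exact_match(preds, gold))
```

Because every valid answer belongs to a small fixed set, scoring reduces to a string comparison, and ambiguous outputs such as the third prediction are simply marked incorrect rather than needing a judge model.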
