What even is a small language model now?
a year ago
- #AI
- #Machine Learning
- #Small Models
- The definition of a 'small model' has shifted from a few million parameters in 2018 to 30B-70B parameters today; the working criterion is now whether the model runs on a single GPU.
- Small models fall into two broad categories: those optimized for mobile/edge devices and those that fit on a single GPU.
- Small models are often narrow and optimized for specific tasks, offering advantages like lower costs, faster inference, and better privacy.
- The bar for what counts as 'small' keeps shifting, with quantization and inference engineering bringing even 70B models within reach of consumer GPUs.
- Some long-standing small models, like Google Translate and AWS Textract, continue to be widely used despite not being cutting-edge.
- Small models are gaining importance due to their efficiency, cost-effectiveness, and ability to compete with larger models like GPT-3.5 in benchmarks.
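The "runs on a single GPU" criterion above comes down to simple arithmetic: weight memory is roughly parameter count times bits per parameter. A minimal sketch (the function name is illustrative, and this counts weights only, ignoring KV cache and activation overhead):

```python
def est_weight_gb(params_billion: float, bits_per_param: int) -> float:
    """Back-of-envelope weight memory in GB: params * bits / 8.

    Weights only -- real inference also needs room for the KV cache
    and activations, so treat these numbers as a lower bound.
    """
    return params_billion * 1e9 * bits_per_param / 8 / 1e9


for bits in (16, 8, 4):
    print(f"70B @ {bits}-bit ~= {est_weight_gb(70, bits):.0f} GB")
```

At 16-bit a 70B model needs ~140 GB just for weights; 4-bit quantization cuts that to ~35 GB, which is why such models sit right at the edge of high-end consumer hardware (e.g., two 24 GB cards) while a 7B-13B model fits comfortably on one.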