Gemma 4 12B: A unified, encoder-free multimodal model

22 days ago

Gemma 4 12B is a new multimodal model with 12 billion parameters, designed to run on laptops.
It uses a unified, encoder-free architecture that feeds vision and audio inputs directly into the LLM backbone rather than through separate encoders.
The model offers advanced reasoning performance close to that of a larger 26B model, yet runs in just 16GB of VRAM or unified memory.
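As a rough sanity check on the 16GB figure, the sketch below estimates the memory needed for the weights alone at a few common precisions. It assumes exactly 12e9 parameters (the nominal "12B") and ignores the KV cache, activations, and runtime overhead, so real usage would be higher. The formats listed are illustrative; the brief does not say which precision the 16GB figure assumes.

```python
# Rough weight-memory estimate for a model of a given parameter count.
# Assumption: 12e9 parameters (the nominal "12B"). Ignores KV cache,
# activations, and runtime overhead, so actual memory use is higher.

PARAMS = 12e9  # nominal parameter count taken from the model name

# Bits per parameter for a few common storage formats (illustrative only).
FORMATS = {
    "bf16": 16,
    "int8": 8,
    "4-bit": 4,
}


def weight_gb(params: float, bits: int) -> float:
    """Return weight storage in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits / 8 / 1e9


def fits(params: float, bits: int, budget_gb: float = 16.0) -> bool:
    """Return True if the weights alone fit within budget_gb."""
    return weight_gb(params, bits) <= budget_gb


if __name__ == "__main__":
    for name, bits in FORMATS.items():
        print(f"{name:>5}: {weight_gb(PARAMS, bits):5.1f} GB, "
              f"fits in 16 GB: {fits(PARAMS, bits)}")
```

Under these assumptions, bf16 weights alone come to 24 GB, which exceeds a 16GB budget, while 8-bit (12 GB) and 4-bit (6 GB) weights fit. That suggests the quoted figure relies on quantized weights or another memory-saving technique, but the brief does not say which.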
It is released as open source under the Apache 2.0 license and ships with Multi-Token Prediction (MTP) drafters to reduce inference latency.
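Drafters typically reduce latency through a draft-and-verify loop (speculative decoding): a cheap drafter proposes several tokens and the full model checks them, keeping the ones it agrees with. The sketch below shows a generic greedy version of that loop. It is not Gemma's implementation. It assumes the MTP drafters are used this way, and the `target` and `draft` functions are hypothetical toy stand-ins that return a single greedy next token rather than probability distributions.

```python
from typing import Callable, List

# A "model" here is any function mapping a token sequence to its greedy next token.
# Hypothetical stand-in: real drafters and target models output probability distributions.
NextToken = Callable[[List[int]], int]


def speculative_step(target: NextToken, draft: NextToken,
                     prefix: List[int], k: int) -> List[int]:
    """One round of greedy draft-then-verify decoding.

    The drafter proposes k tokens; the target keeps the longest run it agrees
    with, substitutes its own token at the first mismatch, or adds one bonus
    token if every draft was accepted. The result matches plain greedy decoding
    with the target model.
    """
    # 1. The drafter proposes k tokens autoregressively (cheap).
    proposal: List[int] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target verifies each proposed position. In a real system these
    #    checks run as one batched forward pass, which saves latency.
    accepted: List[int] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # the target's token replaces the first wrong draft
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. All drafts were accepted, so the target contributes one extra token.
    accepted.append(target(ctx))
    return accepted


# Toy models (hypothetical): the target counts up modulo 10; the drafter agrees
# except that it wrongly predicts 0 after a 4.
def target(seq: List[int]) -> int:
    return (seq[-1] + 1) % 10


def draft(seq: List[int]) -> int:
    return 0 if seq[-1] == 4 else (seq[-1] + 1) % 10


if __name__ == "__main__":
    print(speculative_step(target, draft, [1], 4))  # drafter misses at the 4th token
    print(speculative_step(target, draft, [6], 3))  # all drafts accepted, plus a bonus token
```

The speedup depends on how often the drafter agrees with the full model: each accepted draft token is one fewer sequential step of the large model.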
Gemma 4 models have surpassed 150 million downloads and are used in applications ranging from robotic arms to AI security.
Developers can access the model via platforms like LM Studio, Hugging Face, and Google Cloud, with tools for integration and fine-tuning.
