Gemma 4 12B: A unified, encoder-free multimodal model
4 hours ago
- #Multimodal AI
- #AI Model
- #Open Source
- Gemma 4 12B is a new 12B-parameter multimodal model designed to run on laptops.
- It features a unified, encoder-free architecture that feeds vision and audio inputs directly into the LLM backbone.
- The model offers reasoning performance close to that of a larger 26B model, yet runs on just 16GB of VRAM or unified memory.
- It is open source under the Apache 2.0 license and includes Multi-Token Prediction drafters to reduce latency.
- Gemma 4 models have surpassed 150 million downloads and are used in applications ranging from robotic arms to AI security.
- Developers can access the model through platforms such as LM Studio, Hugging Face, and Google Cloud, along with tools for integration and fine-tuning.
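
To put the 16GB figure above in context, here is a rough back-of-envelope sketch of how much memory the weights of a 12B-parameter model would occupy at a few common storage precisions. The summary does not say which precision the 16GB figure assumes, so the formats listed here (`bf16`, `int8`, `int4`) are illustrative assumptions, and the estimate counts weights only, ignoring the KV cache, activations, and runtime overhead.

```python
# Back-of-envelope estimate of weight memory for a 12B-parameter model.
# Assumption: only weights are counted; KV cache, activations, and runtime
# overhead are ignored, so real usage will be higher.

PARAMS = 12e9   # parameter count taken from the model name
BUDGET_GB = 16  # memory figure quoted in the summary

# Bytes per parameter for common storage formats (illustrative choices,
# not a statement about how the model is actually distributed).
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


estimates = {name: weight_gb(PARAMS, b) for name, b in BYTES_PER_PARAM.items()}

for name, gb in estimates.items():
    verdict = "fits" if gb <= BUDGET_GB else "does not fit"
    print(f"{name}: {gb:.0f} GB of weights, {verdict} in {BUDGET_GB} GB")
```

Under these assumptions, 16-bit weights alone (about 24 GB) would exceed a 16GB budget, while 8-bit (about 12 GB) or 4-bit (about 6 GB) weights would leave headroom for the cache and activations.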