On-device small language models with multimodality, RAG, and Function Calling
a year ago
- #AI
- #On-Device
- #Gemma
- Google AI Edge expands support to over a dozen models, including Gemma 3 and Gemma 3n, hosted on the new LiteRT Hugging Face community.
- Gemma 3n is the first multimodal on-device small language model supporting text, image, video, and audio inputs.
- New Retrieval Augmented Generation (RAG) and Function Calling libraries enhance on-device AI capabilities.
- Models are optimized for mobile and web, with easy on-device execution via a few lines of code.
- New quantization tools offer higher quality int4 post-training quantization, reducing model size by 2.5-4X.
- Gemma 3 1B reaches up to 2,585 tokens per second on prefill on a mobile GPU, processing a page of content in under a second.
- Gemma 3n's text, image, video, and audio inputs support enterprise use cases, and the model is available on Hugging Face.
- An on-device RAG library allows models to be augmented with application-specific data without fine-tuning.
- The AI Edge Function Calling library enables language models to call predefined functions or APIs, powering interactive applications.
- A Python tool simulation library aids in creating custom language models for specific functions.
- Google AI Edge will continue supporting new models and modalities, with updates posted to the LiteRT Hugging Face Community.
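To see where a 2.5-4X size reduction from int4 post-training quantization can come from, here is a back-of-envelope estimate. It assumes 16-bit baseline weights and one 16-bit scale per group of quantized weights; the group size and scale format are illustrative assumptions, not the quantization tools' documented scheme.

```python
# Rough size estimate for int4 post-training quantization: weights go
# from 16 bits to 4 bits, plus one 16-bit scale per group of weights.
# All numbers here are illustrative assumptions.
params = 1_000_000_000          # e.g. a 1B-parameter model such as Gemma 3 1B
group_size = 128                # assumed quantization group size

fp16_bytes = params * 2                      # 2 bytes per 16-bit weight
int4_bytes = params // 2                     # 4 bits = 0.5 bytes per weight
scale_bytes = (params // group_size) * 2     # one 16-bit scale per group
ratio = fp16_bytes / (int4_bytes + scale_bytes)
print(f"~{ratio:.1f}X smaller")              # lands inside the quoted 2.5-4X range
```

In practice the ratio drops below the ideal 4X because of the per-group scales and any layers kept at higher precision, which is consistent with the quoted 2.5-4X range.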
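The on-device RAG idea above can be sketched generically: chunk app data, embed it, retrieve the chunks most similar to the query, and prepend them to the prompt. The AI Edge RAG library's real API differs; `embed`, `retrieve`, `build_prompt`, and the toy bag-of-words embedding below are all illustrative assumptions (a real pipeline would use a proper on-device embedding model).

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" for illustration only; a real RAG
    # stack would use a learned text-embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Rank application-data chunks by similarity to the query.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    # Augment the model's prompt with retrieved context, no fine-tuning needed.
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Use this context to answer:\n{context}\n\nQuestion: {query}"

chunks = [
    "Gemma 3 1B runs on mobile GPUs.",
    "The cafeteria opens at 8 am.",
    "RAG augments prompts with app-specific data.",
]
print(build_prompt("When does the cafeteria open?", chunks))
```

The augmented prompt is then passed to the on-device model as-is, which is what lets RAG inject app-specific knowledge without touching the model weights.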
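Function calling follows a simple pattern that can be sketched independently of the AI Edge SDK's actual API: the model emits a structured call instead of free text, and the app dispatches it to one of its predefined functions. Everything here (`get_weather`, `set_alarm`, `dispatch`, the JSON call format) is a hypothetical illustration, not the library's interface.

```python
import json

# Hypothetical predefined functions the app exposes to the model.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub; a real app would call a weather API

def set_alarm(time: str) -> str:
    return f"Alarm set for {time}"

REGISTRY = {"get_weather": get_weather, "set_alarm": set_alarm}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and invoke the
    matching predefined function."""
    call = json.loads(model_output)
    fn = REGISTRY.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown function: {call['name']}")
    return fn(**call.get("args", {}))

# Suppose the model responds with a structured call instead of free text:
result = dispatch('{"name": "get_weather", "args": {"city": "Tokyo"}}')
print(result)  # Sunny in Tokyo
```

In an interactive app, the function's return value would typically be fed back to the model so it can phrase a natural-language answer for the user.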