On-device small language models with multimodality, RAG, and Function Calling
a year ago
- #AI
- #On-Device
- #Gemma
- Google AI Edge expands support to over a dozen models, including Gemma 3 and Gemma 3n, hosted on the new LiteRT Hugging Face community.
- Gemma 3n is the first multimodal on-device small language model supporting text, image, video, and audio inputs.
- New Retrieval Augmented Generation (RAG) and Function Calling libraries enhance on-device AI capabilities.
- Models are optimized for mobile and web, with easy on-device execution via a few lines of code.
- New quantization tools offer higher quality int4 post-training quantization, reducing model size by 2.5-4X.
- Gemma 3 1B reaches up to 2,585 tokens per second on prefill on a mobile GPU, processing a page of content in under a second.
- Gemma 3n's text, image, video, and audio inputs support enterprise use cases, and the model is available on Hugging Face.
- An on-device RAG library allows models to be augmented with application-specific data without fine-tuning.
- The AI Edge Function Calling library enables language models to call predefined functions or APIs, powering interactive applications.
- A Python tool simulation library aids in creating custom language models for specific functions.
- Google AI Edge will continue supporting new models and modalities, with updates posted to the LiteRT Hugging Face Community.
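To see where a 2.5-4X size reduction from int4 post-training quantization can come from, here is a back-of-envelope estimate. It assumes 16-bit baseline weights and one 16-bit scale per group of quantized weights; the group size and scale format are illustrative assumptions, not the quantization tools' documented scheme.

```python
# Rough size estimate for int4 post-training quantization: weights go
# from 16 bits to 4 bits, plus one 16-bit scale per group of weights.
# All numbers here are illustrative assumptions.
params = 1_000_000_000          # e.g. a 1B-parameter model such as Gemma 3 1B
group_size = 128                # assumed quantization group size

fp16_bytes = params * 2                      # 2 bytes per 16-bit weight
int4_bytes = params // 2                     # 4 bits = 0.5 bytes per weight
scale_bytes = (params // group_size) * 2     # one 16-bit scale per group
ratio = fp16_bytes / (int4_bytes + scale_bytes)
print(f"~{ratio:.1f}X smaller")              # lands inside the quoted 2.5-4X range
```

In practice the ratio drops below the ideal 4X because of the per-group scales and any layers kept at higher precision, which is consistent with the quoted 2.5-4X range.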
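The on-device RAG idea above can be sketched generically: chunk app data, embed it, retrieve the chunks most similar to the query, and prepend them to the prompt. The AI Edge RAG library's real API differs; `embed`, `retrieve`, `build_prompt`, and the toy bag-of-words embedding below are all illustrative assumptions (a real pipeline would use a proper on-device embedding model).

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding" for illustration only; a real RAG
    # stack would use a learned text-embedding model here.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, chunks, k=2):
    # Rank application-data chunks by similarity to the query.
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]

def build_prompt(query, chunks):
    # Augment the model's prompt with retrieved context, no fine-tuning needed.
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks))
    return f"Use this context to answer:\n{context}\n\nQuestion: {query}"

chunks = [
    "Gemma 3 1B runs on mobile GPUs.",
    "The cafeteria opens at 8 am.",
    "RAG augments prompts with app-specific data.",
]
print(build_prompt("When does the cafeteria open?", chunks))
```

The augmented prompt is then passed to the on-device model as-is, which is what lets RAG inject app-specific knowledge without touching the model weights.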
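Function calling follows a simple pattern that can be sketched independently of the AI Edge SDK's actual API: the model emits a structured call instead of free text, and the app dispatches it to one of its predefined functions. Everything here (`get_weather`, `set_alarm`, `dispatch`, the JSON call format) is a hypothetical illustration, not the library's interface.

```python
import json

# Hypothetical predefined functions the app exposes to the model.
def get_weather(city: str) -> str:
    return f"Sunny in {city}"  # stub; a real app would call a weather API

def set_alarm(time: str) -> str:
    return f"Alarm set for {time}"

REGISTRY = {"get_weather": get_weather, "set_alarm": set_alarm}

def dispatch(model_output: str) -> str:
    """Parse a JSON tool call emitted by the model and invoke the
    matching predefined function."""
    call = json.loads(model_output)
    fn = REGISTRY.get(call["name"])
    if fn is None:
        raise ValueError(f"unknown function: {call['name']}")
    return fn(**call.get("args", {}))

# Suppose the model responds with a structured call instead of free text:
result = dispatch('{"name": "get_weather", "args": {"city": "Tokyo"}}')
print(result)  # Sunny in Tokyo
```

In an interactive app, the function's return value would typically be fed back to the model so it can phrase a natural-language answer for the user.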