Hasty Briefs (beta)

On-device small language models with multimodality, RAG, and Function Calling

a year ago
  • #AI
  • #On-Device
  • #Gemma
  • Google AI Edge expands support to over a dozen models, including Gemma 3 and Gemma 3n, hosted on the new LiteRT Hugging Face community.
  • Gemma 3n is the first multimodal on-device small language model supporting text, image, video, and audio inputs.
  • New Retrieval Augmented Generation (RAG) and Function Calling libraries enhance on-device AI capabilities.
  • Models are optimized for mobile and web and can be run on-device with just a few lines of code.
  • New quantization tools offer higher quality int4 post-training quantization, reducing model size by 2.5-4X.
  • Gemma 3 1B runs up to 2,585 tokens per second on mobile GPU, processing a page of content in under a second.
  • Gemma 3n, with its text, image, video, and audio inputs, is available on Hugging Face and targets enterprise use cases.
  • On-device RAG allows augmentation with application-specific data without fine-tuning.
  • AI Edge Function Calling library enables interactive language models to call predefined functions or APIs.
  • A Python tool-simulation library helps tailor language models to call custom functions.
  • Google AI Edge will continue supporting new models and modalities, with updates on LiteRT Hugging Face Community.
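The int4 post-training quantization mentioned above can be illustrated with a minimal symmetric-quantization sketch. This is not the actual LiteRT quantization tooling; the function names and the per-tensor scaling scheme are assumptions for illustration only.

```python
import numpy as np

def quantize_int4(weights: np.ndarray):
    """Symmetric per-tensor int4 quantization (illustrative sketch only;
    the real LiteRT tools use more sophisticated schemes)."""
    # int4 values span [-8, 7]; map the largest magnitude to 7.
    scale = float(np.max(np.abs(weights))) / 7.0
    q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
    return q, scale  # in practice two int4 values are packed per byte

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

weights = np.array([0.7, -0.35, 0.05, -0.7, 0.21], dtype=np.float32)
q, scale = quantize_int4(weights)
recovered = dequantize(q, scale)

# fp32 -> int4 is an 8x cut in raw weight bits, fp16 -> int4 is 4x; with
# scale metadata and mixed-precision layers the practical saving is ~2.5-4x.
```

The rounding error of this scheme is bounded by half the scale, which is why higher-quality int4 methods use finer-grained (per-channel or per-block) scales.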
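The on-device RAG idea — augmenting a prompt with application-specific data instead of fine-tuning — can be sketched as retrieve-then-prompt. The toy bag-of-words similarity below stands in for the neural text embedder a real RAG library would use; all names here are hypothetical.

```python
import math
import re
from collections import Counter

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real RAG pipeline uses a neural embedder.
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: list[str], k: int = 1) -> list[str]:
    qv = embed(query)
    return sorted(chunks, key=lambda c: cosine(qv, embed(c)), reverse=True)[:k]

def build_prompt(query: str, chunks: list[str]) -> str:
    # Prepend the most relevant app-specific chunks to the user's question.
    context = "\n".join(retrieve(query, chunks))
    return f"Context:\n{context}\n\nQuestion: {query}"

docs = [
    "The warranty covers battery replacement for two years.",
    "The device supports USB-C fast charging at 45 W.",
]
prompt = build_prompt("How long is the battery warranty?", docs)
```

Because only the prompt changes, the base model's weights stay untouched — which is exactly why RAG avoids the cost of fine-tuning on device.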
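Function calling follows a parse-and-dispatch pattern: the model emits a structured call naming one of the app's predefined functions, and a dispatcher executes it. The sketch below shows that pattern; the JSON wire format and the stub functions are assumptions, not the AI Edge Function Calling library's actual API.

```python
import json

# Predefined app functions the model is allowed to call (hypothetical stubs).
def get_battery_level() -> int:
    return 87  # stubbed device reading

def set_brightness(percent: int) -> str:
    return f"brightness set to {percent}%"

TOOLS = {"get_battery_level": get_battery_level, "set_brightness": set_brightness}

def dispatch(model_output: str):
    """Parse a structured function call emitted by the model and run it.
    Illustrates the pattern such a library automates; the JSON schema here
    is an assumption for the sketch."""
    call = json.loads(model_output)
    fn = TOOLS[call["name"]]        # only registered functions are callable
    return fn(**call.get("args", {}))

result = dispatch('{"name": "set_brightness", "args": {"percent": 40}}')
```

Restricting dispatch to an explicit `TOOLS` registry is the key safety property: the model can only trigger functions the app has deliberately exposed.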