Hasty Briefs

Gemini Embedding 2: natively multimodal embedding model

3 days ago
  • #AI
  • #embedding
  • #multimodal
  • Gemini Embedding 2 is Google's first natively multimodal embedding model.
  • It supports text, images, videos, audio, and documents in a unified embedding space.
  • The model can process interleaved inputs (e.g., image + text) in a single request.
  • It supports up to 8192 input tokens for text, 6 images, 120 seconds of video, and 6-page PDFs.
  • Gemini Embedding 2 uses Matryoshka Representation Learning (MRL) for flexible output dimensions.
  • It outperforms leading embedding models on text, image, and video tasks.
  • Early-access partners are using it for high-value multimodal applications such as retrieval-augmented generation (RAG) and semantic search.
  • Available via Gemini API and Vertex AI, with support for LangChain, LlamaIndex, and other tools.
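The MRL bullet above means embeddings are trained so that a leading prefix of the vector remains a useful representation: you can truncate to a smaller dimension and renormalize, trading quality for storage and speed. A minimal sketch in plain Python (the vector values and dimensions are illustrative, not real model output):

```python
import math

def truncate_embedding(vec, dim):
    """Truncate an MRL-style embedding to `dim` dimensions and
    L2-renormalize, so the shorter vector still works with
    cosine-similarity search."""
    if dim > len(vec):
        raise ValueError("requested dimension exceeds embedding length")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    return [x / norm for x in head]

# Illustrative 8-dim "embedding" truncated to 4 dims.
full = [0.12, -0.34, 0.56, 0.08, -0.21, 0.44, -0.05, 0.19]
short = truncate_embedding(full, 4)
```

In the Gemini API's Python SDK, the analogous control is the `output_dimensionality` field of `EmbedContentConfig` passed to `embed_content`; whether this model exposes the same parameter is an assumption based on the existing embedding endpoints.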