Gemini Embedding 2: natively multimodal embedding model
4 days ago
- #AI
- #embedding
- #multimodal
- Gemini Embedding 2 is Google's first natively multimodal embedding model.
- It represents text, images, video, audio, and documents in a single unified embedding space.
- The model can embed interleaved inputs (e.g., an image plus its caption) in a single request (see the API sketch after this list).
- It supports up to 8192 input tokens for text, 6 images, 120 seconds of video, and 6-page PDFs.
- Gemini Embedding 2 uses Matryoshka Representation Learning (MRL), so its embeddings can be truncated to smaller output dimensions (see the truncation sketch after this list).
- Google reports that it outperforms leading embedding models on text, image, and video tasks.
- Early-access partners are using it for high-value multimodal applications such as retrieval-augmented generation (RAG) and semantic search.
- Available via the Gemini API and Vertex AI, with support for LangChain, LlamaIndex, and other tools.
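
To make the interleaved-input point concrete, here is a minimal sketch using the google-genai Python SDK. The model id `gemini-embedding-2` is a placeholder, and passing an image part to `embed_content` is an assumption based on the announcement rather than documented SDK behavior.

```python
# Minimal sketch, not a confirmed recipe: assumes the google-genai Python SDK,
# a placeholder model id ("gemini-embedding-2"), and that embed_content accepts
# image Parts for this model, which the announcement implies but does not document.
from google import genai
from google.genai import types

client = genai.Client()  # picks up the API key from the environment

with open("product_photo.jpg", "rb") as f:  # any local image
    image_bytes = f.read()

# One request containing an interleaved image + text pair.
result = client.models.embed_content(
    model="gemini-embedding-2",  # hypothetical model id
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        types.Part.from_text(text="red trail-running shoe, size 42"),
    ],
    config=types.EmbedContentConfig(output_dimensionality=768),
)

vector = result.embeddings[0].values  # one 768-dim vector for the whole pair
print(len(vector))
```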
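
The practical payoff of MRL is that a stored embedding can be shortened by keeping only its leading dimensions and re-normalizing, trading a little quality for much cheaper storage and vector search. A minimal sketch of that truncation, with made-up vector sizes rather than published specifications:

```python
# Minimal sketch of Matryoshka-style truncation: keep the first k dimensions
# and L2-normalize. The 3072-dim vectors below are random stand-ins, not
# outputs of the model or a published dimension.
import numpy as np

def truncate_embedding(vec, k: int) -> np.ndarray:
    """Keep the first k dimensions and L2-normalize the result."""
    v = np.asarray(vec, dtype=np.float32)[:k]
    norm = np.linalg.norm(v)
    return v / norm if norm > 0 else v

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rng = np.random.default_rng(0)
query_full = rng.normal(size=3072)
doc_full = query_full + 0.1 * rng.normal(size=3072)  # a "similar" document

# With an MRL-trained model, similarity computed on short prefixes
# approximates the similarity computed on the full-size vectors.
print(cosine(query_full, doc_full))
print(cosine(truncate_embedding(query_full, 256), truncate_embedding(doc_full, 256)))
```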