OPEN_SOURCE
PH · PRODUCT_HUNT // 31d ago · MODEL RELEASE
Gemini Embedding 2 unifies multimodal retrieval
Google has launched Gemini Embedding 2, its first natively multimodal embedding model, mapping text, images, video, audio, and PDFs into one shared vector space for cross-modal search, classification, and clustering. It is now in public preview through the Gemini API and Vertex AI, giving developers a single retrieval stack for mixed-media AI applications.
// ANALYSIS
This is the kind of model release that matters more than a flashy chatbot demo: Google is collapsing multimodal retrieval from a pile of separate pipelines into one API primitive.
- A single embedding space for text, images, audio, video, and documents removes a lot of glue code from RAG, search, and classification systems
- The developer docs position it for production workloads, with support for 100+ languages, flexible output dimensions, and batch pricing at half the standard embedding cost
- Cross-modal search is the real unlock: developers can retrieve a video, image, or PDF page from a text query without stitching together separate modality-specific models
- It is not a drop-in upgrade for existing Gemini embedding users: Google says the new embedding space is incompatible with gemini-embedding-001, so old data must be re-embedded
- Public preview availability in both the Gemini API and Vertex AI makes it easy to test now, but preview status means teams should treat it as promising infrastructure rather than a settled foundation
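The cross-modal retrieval pattern described above reduces, at query time, to nearest-neighbor search over one shared vector space. A minimal sketch, using toy hand-written vectors in place of real API embeddings (the item names and vector values are illustrative, not from the release):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy vectors standing in for embeddings of mixed-media items.
# In a real system each video, image, or PDF page would be embedded
# once by the multimodal model into the same space as text queries.
index = {
    "demo.mp4":  [0.9, 0.1, 0.0],
    "chart.png": [0.2, 0.8, 0.1],
    "spec.pdf":  [0.1, 0.2, 0.9],
}

def search(query_vec, index, top_k=2):
    """Rank indexed items by cosine similarity to the query vector."""
    ranked = sorted(index.items(),
                    key=lambda kv: cosine(query_vec, kv[1]),
                    reverse=True)
    return [name for name, _ in ranked[:top_k]]

# A text query whose embedding lands near the video's region of the space
# retrieves the video first, even though the query itself is text.
print(search([0.85, 0.15, 0.05], index))
```

Because query and items share one space, the same index serves text-to-video, text-to-image, and text-to-document lookups; the incompatibility caveat above means this index would have to be rebuilt when migrating from gemini-embedding-001.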
// TAGS
gemini-embedding-2 · embedding · multimodal · rag · search · api
DISCOVERED
31d ago
2026-03-11
PUBLISHED
32d ago
2026-03-11
RELEVANCE
9/10
AUTHOR
[REDACTED]