Embedding model
An embedding model converts text, images, or other inputs into fixed-length numeric vectors that capture semantic meaning. Embeddings power retrieval, classification, and similarity search.
What are embedding models?
An embedding model maps input data (typically text, but increasingly images, code, and other modalities) to fixed-length numeric vectors in a high-dimensional space. The geometry of that space is meaningful: semantically similar inputs land near each other. Vectors are the common substrate for retrieval, classification, clustering, deduplication, and any similarity-based search task.
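To make the geometry concrete, here is a minimal sketch using NumPy and three hypothetical hand-written vectors (a real model would emit hundreds or thousands of dimensions): cosine similarity scores related concepts higher than unrelated ones.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine of the angle between two vectors; 1.0 means identical direction."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical toy embeddings; a real model would produce these from text.
cat     = np.array([0.9, 0.1, 0.2])
kitten  = np.array([0.8, 0.2, 0.3])
invoice = np.array([0.1, 0.9, 0.7])

print(cosine_similarity(cat, kitten))   # ~0.98: related concepts sit close together
print(cosine_similarity(cat, invoice))  # ~0.30: unrelated concepts sit far apart
```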
Modern production embedding models (OpenAI text-embedding-3-large, Cohere Embed v3, Voyage AI, and open-source variants like BGE and E5) emit vectors of roughly 256 to 3,072 dimensions. Different models trade off retrieval quality, multilingual support, vector dimensionality (which drives storage cost), and price.
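As one concrete illustration of the dimensionality trade-off, the text-embedding-3 family accepts a dimensions parameter that truncates vectors at request time, cutting storage at some quality cost. A minimal sketch, assuming the openai Python SDK and an API key in the environment:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Ask for truncated 256-dimension vectors instead of the model's full 3,072.
response = client.embeddings.create(
    model="text-embedding-3-large",
    input=["What is an embedding model?"],
    dimensions=256,
)
vector = response.data[0].embedding
print(len(vector))  # 256
```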
For RAG and search
Embedding models are the retrieval half of RAG. Quality matters: a weak embedding model retrieves irrelevant documents, which a downstream LLM then politely tries to reason over. Standardized benchmarks (MTEB on Hugging Face, BEIR) compare embedding models on common retrieval tasks. For a given language and domain, the gap between best-in-class and median models can exceed 20 percentage points on recall@10.
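In practice the retrieval step is a nearest-neighbor lookup over pre-computed document vectors. A brute-force sketch with a hypothetical random corpus (production systems would swap in an approximate index such as HNSW at scale):

```python
import numpy as np

# Hypothetical corpus: 1,000 documents embedded as 1,024-dim unit vectors.
doc_vectors = np.random.rand(1000, 1024).astype(np.float32)
doc_vectors /= np.linalg.norm(doc_vectors, axis=1, keepdims=True)

def top_k(query_vec: np.ndarray, k: int = 10) -> np.ndarray:
    """Indices of the k most similar documents by cosine similarity."""
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = doc_vectors @ query_vec   # dot product equals cosine on unit vectors
    return np.argsort(-scores)[:k]     # best matches first

hits = top_k(np.random.rand(1024).astype(np.float32))  # stand-in for an embedded query
```

The recall@10 figure above measures how often a relevant document appears among these ten results.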
Procurement
If you're embedding sensitive data, ask four questions. Is the embedding API in scope of the vendor's compliance posture? (Most are, but verify.) Can the embedding model be self-hosted for deployments with strict data-residency requirements? What does the API cost at your expected volume? And what is the migration path if you later need to change embedding models? Vector dimensionality differs across models, so a switch typically requires re-embedding everything.
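That last point deserves a sketch: vectors from different models live in unrelated spaces, so a model switch is a full re-index, not a format conversion. Here store and embed_new are hypothetical stand-ins for your vector database and the replacement model:

```python
def migrate(store, embed_new, batch_size=128):
    """Re-embed every document with the new model and overwrite its vector.

    `store` (with .iter_documents() and .upsert()) and `embed_new` are
    hypothetical. Even if old and new dimensionality happened to match,
    the old vectors would still be meaningless in the new model's space.
    """
    batch = []
    for doc in store.iter_documents():
        batch.append(doc)
        if len(batch) == batch_size:
            vectors = embed_new([d.text for d in batch])
            store.upsert([(d.id, v) for d, v in zip(batch, vectors)])
            batch = []
    if batch:  # flush the final partial batch
        vectors = embed_new([d.text for d in batch])
        store.upsert([(d.id, v) for d, v in zip(batch, vectors)])
```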