Search by meaning, recommendations, deduplication, and the retrieval behind every AI assistant all rest on one quiet idea: embeddings. They are the foundation beneath vector databases and retrieval-augmented generation, and once they click, a lot of modern AI stops feeling like magic and starts making sense.

This primer explains what embeddings are, how they are made, and how to use them well.

What is an embedding?

An embedding is a list of numbers — a vector — that represents the meaning of a piece of content. A sentence, an image, or an audio clip goes in; a fixed-length array of numbers comes out, often with hundreds or thousands of entries.

The crucial property is this: content with similar meaning produces vectors that sit close together in that numerical space, and unrelated content lands far apart. "How do I reset my password?" and "I forgot my login" end up near each other even though they share almost no words. That is what makes searching by meaning, rather than by keyword, possible.

How embeddings are created

Embeddings come from a neural network — an embedding model — trained so that related inputs map to nearby points. During training the model sees enormous numbers of examples and gradually arranges its internal space so that meaning corresponds to geometry.

You rarely need to train this yourself. You call an embedding model from a provider like OpenAI or Cohere, or run an open model from Hugging Face. Each input you send comes back as a vector you can store and compare.

Measuring closeness

Once content is embedded, "similar" becomes a math problem. The common ways to measure how close two vectors are:

  • Cosine similarity — compares the angle between vectors, ignoring length. The most widely used for text.
  • Dot product — factors in both direction and magnitude.
  • Euclidean distance — straight-line distance between the two points.

The metric you use must match how the embedding model was trained — using the wrong one quietly degrades your results.
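As a rough illustration of the three measures above, here is a minimal pure-Python sketch. The vectors are tiny made-up examples, not output from any real embedding model, and the function names are our own. In practice you would likely use a numerical library or your vector database's built-in metric.

```python
import math

def dot(a, b):
    """Dot product: sensitive to both direction and magnitude."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def cosine_similarity(a, b):
    """Cosine of the angle between a and b; vector length is ignored."""
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    """Straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# Toy 3-dimensional vectors, invented for illustration only.
a = [1.0, 2.0, 2.0]
b = [2.0, 4.0, 4.0]   # same direction as a, twice the length
c = [2.0, -1.0, 0.0]  # perpendicular to a

print(cosine_similarity(a, b))   # 1.0: same direction, length ignored
print(cosine_similarity(a, c))   # 0.0: perpendicular
print(dot(a, b))                 # 18.0: grows with magnitude
print(euclidean_distance(a, b))  # 3.0: the points are not in the same place
```

Note how a and b are identical under cosine similarity yet 3.0 apart under Euclidean distance. That gap is why the choice of metric matters.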

What you can build with embeddings

The same representation powers a surprising range of features:

  • Semantic search — find documents by meaning instead of exact keywords.
  • Retrieval for RAG — fetch the right context to ground an AI answer. See our RAG guide.
  • Recommendations — surface items similar to what a user liked.
  • Clustering — group related content automatically, with no predefined categories.
  • Classification — label content by its position in the space.
  • Deduplication — catch near-duplicates that are worded differently.

At any real scale, you store these vectors in a vector database so similarity search stays fast across millions or billions of items.
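For a small collection, semantic search can be sketched as a brute-force comparison of the query vector against every stored vector. The example below assumes cosine similarity and uses hand-made 3-dimensional vectors with hypothetical document IDs. Real vectors would come from an embedding model and have far more dimensions.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    len_a = math.sqrt(sum(x * x for x in a))
    len_b = math.sqrt(sum(y * y for y in b))
    return dot / (len_a * len_b)

def top_k(query_vec, index, k=2):
    """Return the k (score, doc_id) pairs most similar to query_vec."""
    scored = [(cosine_similarity(query_vec, vec), doc_id)
              for doc_id, vec in index.items()]
    return sorted(scored, reverse=True)[:k]

# Hypothetical stored vectors, invented for illustration only.
index = {
    "reset-password": [0.9, 0.1, 0.0],
    "forgot-login":   [0.8, 0.2, 0.1],
    "pricing-plans":  [0.0, 0.1, 0.9],
}

query = [0.85, 0.15, 0.05]  # stands in for an embedded user question
results = top_k(query, index, k=2)
for score, doc_id in results:
    print(f"{doc_id}: {score:.3f}")
```

The two account-related documents rank first and the pricing document is left out. This linear scan touches every vector on every query, which is the cost a vector database's index is designed to avoid.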

Choosing an embedding model

Not all embedding models are interchangeable. Weigh:

  • Dimensions. More dimensions can capture more nuance but cost more to store and search. Many modern models let you shorten their vectors, giving up some precision in exchange for lower storage and search cost.
  • Max input length. How much text fits in a single embedding before you must split it.
  • Domain fit. A model trained on general web text may underperform on legal, medical, or code content. Specialized models exist.
  • Multilingual support. If you work across languages, pick a model built for it.
  • Cost and hosting. Proprietary APIs are simple but metered; open models can be self-hosted for control and scale economics.
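To make the dimensions trade-off concrete, here is a back-of-the-envelope storage calculation. It assumes each value is stored as a 32-bit float (4 bytes) and counts only the raw vectors, with no index or metadata overhead. The dimension counts are illustrative sizes, not a recommendation for any particular model.

```python
def index_size_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw size of the vectors alone, assuming float32 (4 bytes) per value."""
    return num_vectors * dimensions * bytes_per_value

# Illustrative dimension counts; check your model's documentation for its real size.
for dims in (384, 768, 1536, 3072):
    gb = index_size_bytes(1_000_000, dims) / 1e9
    print(f"{dims:>5} dims -> {gb:.2f} GB per million vectors")
```

Under these assumptions, one million 1536-dimensional vectors take about 6.14 GB, and doubling the dimensions doubles that figure.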

Practical tips that prevent pain

  • Use the same model for queries and documents. Vectors from different models are not comparable. Mixing them silently breaks search.
  • Chunk thoughtfully. Long documents must be split before embedding. Too large and meaning blurs; too small and context is lost.
  • Re-embed when you switch models. Upgrading your embedding model means regenerating every stored vector — plan for it.
  • Normalize when required. Some setups expect unit-length vectors; check what your metric and database assume.
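Two of the tips above, chunking and normalizing, can be sketched in a few lines. This is one simple approach under stated assumptions, not the only way to do either. The chunker splits on whitespace and uses word counts, whereas real models measure input length in tokens, and the default sizes are arbitrary placeholders. The function names are our own.

```python
import math

def chunk_words(text, chunk_size=200, overlap=40):
    """Split text into chunks of up to chunk_size words.

    Consecutive chunks share `overlap` words so that context
    at a boundary appears in both neighbors.
    """
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # this chunk already reaches the end of the text
    return chunks

def normalize(vec):
    """Scale a vector to unit length; a zero vector is returned unchanged."""
    length = math.sqrt(sum(x * x for x in vec))
    if length == 0:
        return list(vec)
    return [x / length for x in vec]

sample = " ".join(str(i) for i in range(10))
print(chunk_words(sample, chunk_size=4, overlap=1))
# ['0 1 2 3', '3 4 5 6', '6 7 8 9']
```

Once vectors are unit length, cosine similarity and dot product produce the same ranking, which is why some databases ask for normalized input.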

Frequently asked questions

Are embeddings the same as the AI model itself? No. An embedding model produces vectors that represent meaning. A generative model produces text. RAG uses both: an embedding model to retrieve, a generative model to answer.

How many dimensions should an embedding have? It depends on the model, but more is not automatically better: every extra dimension raises storage and search cost. A mid-range size is often a reasonable starting point; measure retrieval quality on your own data before paying for more.

Can I embed images and audio too? Yes. Multimodal embedding models map images, audio, and text into compatible spaces, enabling things like searching images with a text query.

Do I need a vector database to use embeddings? For a handful of items, no — you can compare them directly. Beyond that, a vector database is what keeps similarity search fast and scalable.

Go deeper

Embedding providers and the infrastructure around them — OpenAI, Cohere, Hugging Face, Pinecone — are all in the ProductListo directory. Next, see embeddings in action in our guides to vector databases and retrieval-augmented generation.

Building an embeddings or search tool we should feature? Submit it to ProductListo.