Embeddings: The Numbers Behind Meaning

How does an AI know that 'king' and 'queen' are related, or that 'Paris' is to 'France' as 'Tokyo' is to 'Japan'? It converts words into numbers — and the math is beautiful.

Everything in AI starts with numbers. Text, images, audio — it all gets converted into vectors (lists of numbers) before a model can process it. Embeddings are how we make that conversion meaningful.

An embedding is a high-dimensional vector representation in which **similar things end up close together**, with closeness usually measured by cosine similarity or Euclidean distance. 'Dog' and 'puppy' will have similar vectors; 'dog' and 'spreadsheet' will be far apart.
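To make "close together" concrete, here is a minimal sketch using cosine similarity, one common way to compare embeddings. The four-dimensional vectors in `toy_embeddings` are hand-picked assumptions for illustration; they do not come from any real model.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical vectors, invented for this example only.
toy_embeddings = {
    "dog":         [0.90, 0.80, 0.10, 0.05],
    "puppy":       [0.85, 0.90, 0.15, 0.05],
    "spreadsheet": [0.05, 0.10, 0.90, 0.80],
}

sim_dog_puppy = cosine_similarity(toy_embeddings["dog"], toy_embeddings["puppy"])
sim_dog_sheet = cosine_similarity(toy_embeddings["dog"], toy_embeddings["spreadsheet"])

print(f"dog vs puppy:       {sim_dog_puppy:.3f}")
print(f"dog vs spreadsheet: {sim_dog_sheet:.3f}")
```

With these made-up numbers, the dog/puppy score comes out much higher than the dog/spreadsheet score, which is the pattern a trained model is expected to produce for real words.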

The magic: relationships are preserved in the math. The famous example:

`king − man + woman ≈ queen`

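This works, approximately, because training on billions of words of text pushes the model to encode concepts like gender and royalty as roughly consistent *directions* in its vector space. These directions are spread across many coordinates rather than stored in single dimensions, and the result is the nearest word to the computed vector (ignoring the three input words), not an exact match.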
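The arithmetic can be sketched with a toy vocabulary. The vectors in `toy_vocab` are assumptions built by hand so that the "gender" and "royalty" offsets are consistent; real embeddings only show this pattern approximately. The helper `analogy` is a hypothetical name, not a library function.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical 3-D vectors: woman = man + gender offset, king = man + royalty
# offset, queen = woman + royalty offset plus a little noise.
toy_vocab = {
    "man":    [0.50, 0.20, 0.10],
    "woman":  [0.20, 0.60, 0.30],
    "king":   [0.80, 0.50, 0.70],
    "queen":  [0.52, 0.87, 0.91],
    "prince": [0.75, 0.45, 0.80],
    "banana": [0.90, -0.40, 0.10],
}

def analogy(a, b, c, vocab):
    """Return the word nearest to vec(a) - vec(b) + vec(c), excluding a, b, c."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = (w for w in vocab if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine_similarity(vocab[w], target))

result = analogy("king", "man", "woman", toy_vocab)
print(result)
```

Excluding the three input words from the candidates matters: without that step, the nearest vector to the result is often one of the inputs.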

**Where embeddings are used:**

- Semantic search: find documents by meaning, not just keywords

- RAG systems: convert documents to vectors for retrieval

- Recommendation engines: find similar items

- Clustering: automatically group related content

- Anomaly detection: outliers in vector space are unusual
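As one illustration of the list above, anomaly detection can be sketched as "find the item least similar to the average vector". This is only one simple approach, and the document titles and vectors in `docs` are invented for the example.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

# Hypothetical embeddings: four pet-related documents and one unrelated one.
docs = {
    "dog care tips":    [0.90, 0.80, 0.10],
    "puppy training":   [0.85, 0.90, 0.15],
    "cat grooming":     [0.80, 0.70, 0.20],
    "kitten feeding":   [0.75, 0.85, 0.10],
    "quarterly budget": [0.10, 0.05, 0.95],
}

center = centroid(list(docs.values()))
scores = {title: cosine_similarity(vec, center) for title, vec in docs.items()}

# The document least similar to the centroid is the candidate outlier.
outlier = min(scores, key=scores.get)
print(outlier)
```

Semantic search and recommendations follow the same idea in reverse: instead of the least similar item, they return the most similar ones to a query or to an item the user liked.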

Modern embedding models produce vectors with hundreds to a few thousand dimensions: OpenAI's text-embedding-3-large outputs 3,072 by default, while Google's Gecko outputs 768. Meaning is spread across many dimensions at once, so an individual dimension is generally not human-interpretable.

Vector databases (Pinecone, Weaviate, Qdrant) store millions of these vectors and run **approximate nearest-neighbor search** in milliseconds, trading a small amount of accuracy for speed. A typical query finds the 10 most semantically similar items in a library of 10 million.
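For contrast, here is a sketch of the exact, brute-force search that approximate methods try to match. It scores every stored vector against the query, which is fine for a thousand items but too slow for millions; vector databases avoid the full scan with index structures such as HNSW graphs. The random `library` and the sizes `DIM` and `N` are assumptions chosen to keep the example small.

```python
import heapq
import math
import random

def normalize(v):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def top_k(query, vectors, k=10):
    """Exact search: score every stored vector and keep the k best."""
    q = normalize(query)
    scored = ((sum(a * b for a, b in zip(q, v)), idx) for idx, v in enumerate(vectors))
    return heapq.nlargest(k, scored)  # list of (score, index), best first

rng = random.Random(0)  # fixed seed so the run is repeatable
DIM, N = 64, 1000       # toy sizes, far smaller than a production index

# Hypothetical library of random unit vectors standing in for real embeddings.
library = [normalize([rng.gauss(0, 1) for _ in range(DIM)]) for _ in range(N)]

# Query: a slightly perturbed copy of item 42, so the expected top hit is known.
query = [x + rng.gauss(0, 0.01) for x in library[42]]

hits = top_k(query, library, k=10)
best_score, best_index = hits[0]
print(best_index, round(best_score, 3))
```

An approximate index would return most of the same ten results while comparing the query against only a small fraction of the library.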

**Key takeaway:** Embeddings turn text into numbers that preserve meaning — the foundation of semantic search, RAG, and recommendation systems.
