How Do Embeddings Work?

Sid Arora

Mar 18, 2026

14

min read

Product Management Concepts

H

Hello hello,

I am back with yet another edition of “Simplify AI.” Today, I want to talk about one technology that has become extremely popular and is sitting underneath almost every useful AI feature today.

I am talking about Embeddings. Unfortunately, a lot of people still don’t still understand what embeddings are, how they work, and how AI PMs should think about them.

Here is my effort to simplify and answer all of the above questions.

What Are Embeddings

Think of embeddings as GPS coordinates for meaning.

A physical location can be represented by two numbers: latitude and longitude.

An embedding does the same thing for text, except instead of two numbers, it uses 1,536 (or more). Each sentence, paragraph, or document gets converted into a list of numbers that captures what it means, the actual concept behind the words. These two sentences produce nearly identical number patterns (embeddings):

“What’s the tuition fee?”
“How much does it cost?”

Different words. Same meaning. Same neighborhood in “meaning-space.” Meanwhile, “When does it start?” produces a completely different set of numbers because the intent is different, even though it shares common words like “does” and “it.” Traditional software compares strings of characters (or keywords).

Embeddings compare ideas and intent. That single difference changes how you build search, recommendations, and every AI-powered feature in your product.

💡 Pro Tip: When someone on your team says, “we’ll just use keyword search,” ask them what happens when a user types “cheap flights” but your database says “budget airfare.” If the answer is “nothing,” you need embeddings.

Why Embeddings Are the Foundation

Most AI concepts matter for one thing. Embeddings matter for almost everything.

Search, recommendations, chatbots that answer questions, spotting duplicate content, and sorting support tickets into categories. They all start the same way: converting something into an embedding (a list of numbers representing its meaning), then comparing it to other embeddings to find what’s most similar.

Spotify uses embeddings to recommend songs. Airbnb uses them to recommend listings.

Customer support bots use them to find the right help article. Different products, different industries, same core idea underneath. That’s what makes embeddings worth understanding more deeply than other AI concepts. They’re the shared foundation.

When your team builds an embedding system for search, that same system (and often the same embeddings) can power recommendations, grouping similar items together, and personalizing experiences later. One investment, multiple features.

The flip side matters too. If your embeddings are low quality, every feature built on top inherits the problem. Your search returns irrelevant results. Your chatbot gives wrong answers because it pulled the wrong reference material.

Your recommendations feel random.

Three different teams file three different bug reports, but the root cause is the same: the embeddings.

For PMs, that makes the embedding system one of the most important technical decisions on an AI product. You don’t need to pick the model yourself, but you need to understand the trade-offs well enough to ask the right questions when your engineering team proposes one. (More on those trade-offs later.)

How Semantic Search Outperforms Keyword Matching

Keyword search works when your users guess the exact words your system expects. Semantic search (meaning-based search powered by embeddings) works when they don’t. A user emails a course platform asking, “What’s the tuition fee?”

The knowledge base contains: “The price is $3,499.” Keyword search returns nothing because “tuition” and “price” share zero characters. Embedding-based search returns the right answer immediately, with a similarity score of 0.94 out of 1.0 (where 1.0 means a perfect match in meaning).

Same question, same knowledge base, completely different result depending on how you built the search. Amazon Prime Video ran into this with sports content.

Someone searching for “soccer live” got documentaries like “This is Football: Season 1” and “Ronaldo vs Messi: Face Off!” instead of actual live matches.

The words overlapped enough for the keyword search to feel confident, but the intent was completely wrong. After deploying an embedding-based semantic search with a custom text-embedding model, Prime Video saw measurable improvements in search conversion and a drop in irrelevant results. Users started searching more often because the system finally understood what they meant.

Juicebox, an AI recruiting platform, saw similar gains across 800 million candidate profiles. A recruiter searching for “machine learning engineer” would miss profiles describing “ML practitioner” or “deep learning specialist.”

After switching to embedding-based search, the system found 35% more relevant candidates for complex queries. Same profiles in the database. The only change was how the search understood the question.

If your product uses keyword search for anything user-facing, embedding-based search will improve results. Your users are already frustrated by missed matches. They just aren’t telling you. They’re bouncing.

How Embeddings Power RAG (The System Behind Most AI Chatbots)

RAG stands for Retrieval Augmented Generation. The name is a mouthful, but the idea is simple: before an AI writes a response, it first looks up the relevant information, then uses what it found to write an answer.

This is the system behind most useful AI features today, from customer support chatbots to internal knowledge assistants. Embeddings are the engine that makes the “look up” step work. The system has three steps.

Step 1: Retrieve.

When a user asks a question, the system converts that question into an embedding (a list of numbers representing its meaning). Then it compares those numbers against a library of pre-converted document embeddings to find the closest matches. Think of it like asking, “which documents mean the most similar thing to this question?”

The most similar documents rise to the top.

Step 2: Augment.

The top-matching documents get attached to the question before it’s sent to the AI. Instead of hoping the AI already knows the answer from its training, you’re handing it the specific, verified information it needs right now.

Step 3: Generate.

The AI writes a response based on those retrieved documents.

The quality of your embeddings directly determines whether step one finds the right documents. Get that right, and the AI has what it needs to give a great answer.

Get it wrong, and you get a confident, well-written, completely incorrect response because the AI was given the wrong reference material to work with. Think of it like handing a brilliant new hire a filing cabinet.

If the files are organized by meaning (embeddings), they’ll pull the right document every time. If the files are organized alphabetically by filename (keywords), they’ll miss half the relevant information.

💡 Pro Tip: When your AI chatbot gives bad answers, check what documents it was given before blaming the AI itself. In most underperforming systems, the AI generates perfectly fine responses. It’s just working with the wrong information. Run ten real user queries, look at which documents the system retrieves, and see if the right information even makes it to the AI.

‍

Real Companies, Real Results

Airbnb: Embedding Every Listing on the Platform

Airbnb’s engineering team published a paper at KDD 2018 (a top data science conference) describing how they built embeddings for their listings.

They trained on over 800 million search sessions across 4.5 million active listings. Each listing got its own set of numbers that captured its behavioral meaning: which travelers click on it, what they book afterward, and what they skip.

The Airbnb Engineering blog details the full implementation. Their “Similar Listings” carousel (the row of recommendations you see on a listing page) saw a significant increase in click-through rate during online testing, and more guests discovered the listing they ultimately booked through that carousel.

For a company processing millions of searches daily, that’s a meaningful revenue gain from a single change in how they organize information.

One clever detail: when a new listing was created with no interaction data, Airbnb generated a “cold start” embedding by averaging the numbers from the three closest existing listings with a similar type and price range. No user data needed. The system worked from day one.

Spotify: 16 Billion Discoveries Per Month

Spotify creates embeddings for everything: users, tracks, artists, playlists, and podcasts. Oskar Stål, VP of Personalization at Spotify, described these embeddings at Scale’s TransformX conference as “fingerprints for everything we have.”

Two teenage fans of Taylor Swift in the US will have similar user embeddings (similar lists of numbers). A 62-year-old jazz listener in the UK will have very different numbers. Same system, same math, completely different recommendations.

These embeddings power the recommendation engine behind 16 billion artist discoveries per month, meaning 16 billion times a month, a listener hears an artist they’ve never listened to before.

The system goes beyond matching songs by genre labels. It understands behavioral patterns, like the difference between what you listen to at 11 pm versus 7 am on a Monday, and uses those patterns to recommend the right music for the moment.

The Product Decisions Embeddings Force You to Make

Embeddings come with real trade-offs. Here are the ones you’ll face.

Which Embedding Model?

The market has exploded. OpenAI’s text-embedding-3-small is the most common starting point at $0.02 per million tokens (tokens are the chunks of text the model processes, roughly ¾ of a word each).

Open-source alternatives like E5-Mistral and BGE-M3 now rival commercial options in quality, with the added benefit of running on your own servers, where no data leaves your control. The core trade-off is speed-to-market versus data control.

Calling an API (sending your text to someone else’s server and getting back embeddings) gets you started in minutes. Running an open-source model on your own servers keeps everything private but adds setup work.

Your compliance team will have opinions here.

How Many Dimensions?

Remember the GPS analogy? A physical location needs 2 numbers. An embedding uses hundreds or thousands, and each number is called a “dimension.” More dimensions capture more nuance but cost more to store and search.

At 384 dimensions, you get fast, cheap results that miss subtle differences. At 1,536, you hit the sweet spot for most use cases. At 3,072, accuracy improves slightly, but storage and processing costs roughly double.

A customer support bot can probably work with fewer dimensions. A medical search tool, where pulling the wrong document could cause real harm, probably needs more.

Pre-Compute or Compute on the Fly?

Airbnb converts all its listing embeddings once a day in a scheduled batch job. Only the search query (what the user types) gets converted into an embedding in real-time. This is the standard approach, and for good reason.

Converting your entire knowledge base into embeddings once is cheap (pennies). Converting everything fresh on every single search query is expensive and slow.

Most content doesn’t change by the minute. A daily update works. Real-time conversion only makes sense if your data changes constantly, like a live stock trading dashboard or a breaking news feed.

What Similarity Threshold?

When the system compares two embeddings, it produces a score between 0 and 1.

A score of 0.9 means the meaning is nearly identical. A score of 0.3 means they’re unrelated. Most systems return the top 3 to 5 results regardless of score, but well-tuned systems add a minimum cutoff (typically 0.6 to 0.7) to filter out weak matches. This is a user experience decision disguised as a technical one.

Would you rather show users fewer, more accurate results, or more results with some irrelevant ones mixed in? There’s no universal right answer. It depends on what’s worse for your users: seeing nothing or seeing the wrong thing.

Key Takeaways

Embeddings convert meaning into numbers, letting AI systems find matches that keyword search misses entirely.
RAG (the system behind most AI chatbots) depends on retrieval quality. If your embeddings pull the wrong documents, even the best AI will generate wrong answers.
Real companies get real results with embeddings. Airbnb improved its Similar Listings carousel, Spotify powers 16 billion monthly artist discoveries, and Juicebox surfaced 35% more relevant candidates.
The model choice is a product decision. Data privacy, cost, and accuracy all trade off against each other.
Pre-compute everything you can. Converting your knowledge base into embeddings once saves money and keeps your product fast.

That is it for today.

Until next time,
Sid

P.S. Don’t forget to let me know if you enjoyed this post.

How I can help you:

Fundamentals of Product Management - learn the fundamentals that will set you apart from the crowd and accelerate your PM career.
Improve your communication: get access to 20 templates that will improve your written communication as a product manager by at least 10x.

Sid Arora

View Posts

View All

Mixture of Experts (MoE): Reason Behind Cheapest AI Models

12

min read

AI And LLM Observability: What is It?

10

min read

Your AI Agent Always Forgets. Here's Why

11

min read

How Do Embeddings Work?

14

min read

View All

Mixture of Experts (MoE): Reason Behind Cheapest AI Models

Mixture of Experts (MoE) is the architecture behind GPT-4, Gemini 1.5, and Mixtral. Here's a PM-level explanation of how MoE works and why it matters for your API budget.

Apr 9, 2026

12

min read

Spotify Is Using 6 AI Agents For Building Ad Campaigns. Here's why

Spotify's ad planning took 30 minutes and 20+ form fields. Here's how six AI agents cut it to 10 seconds, and what the architecture actually looks like.

Apr 8, 2026

12

min read

Grab Personalised Its Platform For Millions In Under 15 Seconds

Grab's personalisation ran on one-day-old data. Here's how they rebuilt it to react in under 15 seconds with no engineering required for each new use case.

Apr 7, 2026

10

min read

AI And LLM Observability: What is It?

How do you know if your AI product is quietly giving users wrong answers? Learn how LLM observability works: traces, spans, LLM-as-judge, and why a 200 OK status code tells you nothing about quality. (Remember to click on"show pictures")

Apr 6, 2026

10

min read

This is How Claude Changed the Vibe Coding Game

Claude launched "Channels" allowing vibe coders to continue coding without being on their systems. This is how it works.

Apr 5, 2026

10

min read

This is How Notion Won the Productivity Battle

Notion reached a $10 billion valuation on just $344 million raised. Here's how a free personal tier and an accidental template ecosystem became its growth engine.

Apr 4, 2026

11

min read

Twitter

Instagram

Newsletter

Join the Waitlist!

How Do Embeddings Work?

What Are Embeddings

Why Embeddings Are the Foundation

How Semantic Search Outperforms Keyword Matching

How Embeddings Power RAG (The System Behind Most AI Chatbots)

Liking this post? Get the next one in your inbox!

Real Companies, Real Results

Airbnb: Embedding Every Listing on the Platform

Spotify: 16 Billion Discoveries Per Month

The Product Decisions Embeddings Force You to Make

Which Embedding Model?

How Many Dimensions?

Pre-Compute or Compute on the Fly?

What Similarity Threshold?

Key Takeaways

How I can help you:

Sid Arora

More from

Product Management Concepts

Mixture of Experts (MoE): Reason Behind Cheapest AI Models

AI And LLM Observability: What is It?

Your AI Agent Always Forgets. Here's Why

How Do Embeddings Work?

Join Our Newsletter and Get the LatestPosts to Your Inbox

Featured

How Slack Went From A Failed Game To A $28B Behemoth (And What Can Product Managers Learn From It)

How to Build a Personal Brand as a Product Manager

From $4 Billion to Forgotten: The Rise and Fall of Clubhouse

Tags

Newsletter

Recent Posts

Mixture of Experts (MoE): Reason Behind Cheapest AI Models

Spotify Is Using 6 AI Agents For Building Ad Campaigns. Here's why

Grab Personalised Its Platform For Millions In Under 15 Seconds

AI And LLM Observability: What is It?

This is How Claude Changed the Vibe Coding Game

This is How Notion Won the Productivity Battle

JustAnotherPM

Categories

Featured Posts

How Slack Went From A Failed Game To A $28B Behemoth (And What Can Product Managers Learn From It)

How to Build a Personal Brand as a Product Manager

Newsletter

Join Our Newsletter and Get the Latest
Posts to Your Inbox