HyDE RAG
Improve retrieval with Hypothetical Document Embeddings for better semantic matching
TL;DR
Queries are short questions; documents are detailed paragraphs. This semantic gap hurts retrieval. HyDE (Hypothetical Document Embeddings) fixes this by first generating what a good answer would look like, then searching for documents similar to that hypothetical answer. It's like asking "find documents that look like this" instead of "find documents about this question."
| Property | Value |
|---|---|
| Difficulty | Intermediate |
| Time | ~4 hours |
| Code Size | ~300 LOC |
| Prerequisites | Intelligent Document Q&A |
Tech Stack
| Technology | Purpose |
|---|---|
| LangChain | RAG orchestration |
| OpenAI | GPT-4 + Embeddings |
| ChromaDB | Vector database |
| FastAPI | REST API |
Prerequisites
- Completed Intelligent Document Q&A tutorial
- Python 3.10+
- OpenAI API key
What You'll Learn
- Understand the HyDE (Hypothetical Document Embeddings) technique
- Generate hypothetical answers to improve retrieval
- Compare HyDE vs traditional query embedding
- Implement multi-hypothesis generation for robustness
- Measure retrieval improvements with HyDE
The Insight Behind HyDE
Traditional RAG embeds the query and searches for similar documents. But queries and documents are fundamentally different:
- Queries: Short, question-form, incomplete
- Documents: Long, declarative, information-rich
┌────────────────────────────────────────────────────────────────────────┐
│ TRADITIONAL RETRIEVAL │
│ │
│ "What causes rain?" ──► Embed Query ──► Search ──► May miss relevant │
│ (short) (question (mismatch) documents ⚠️ │
│ space) │
│ │
│ Problem: Query "What causes rain?" may not match │
│ document "Precipitation occurs when water vapor..." │
└────────────────────────────────────────────────────────────────────────┘

HyDE's insight: Instead of searching with the query, generate what a good answer would look like, then search for documents similar to that hypothetical answer.
┌────────────────────────────────────────────────────────────────────────┐
│ HYDE RETRIEVAL │
│ │
│ "What causes rain?" ──► LLM ──► "Rain forms when water ──► Embed │
│ (query) vapor condenses in the Hypothesis│
│ atmosphere..." │
│ (hypothetical document) │ │
│ ▼ │
│ Search │
│ │ │
│ ▼ │
│ Finds: "Precipitation occurs when water │
│ vapor condenses..." ✓ │
│ │
│ Solution: Hypothesis is semantically closer to real documents! │
└────────────────────────────────────────────────────────────────────────┘

The hypothetical document is semantically closer to real documents than the query is.
Project Structure
hyde-rag/
├── config.py # Configuration
├── hypothesis_generator.py # Generate hypothetical docs
├── retriever.py # HyDE retrieval
├── rag_pipeline.py # Full pipeline
├── app.py # FastAPI application
├── evaluate.py # Compare HyDE vs traditional
└── requirements.txt

Step 1: Configuration
# config.py
from pydantic_settings import BaseSettings
from functools import lru_cache
class Settings(BaseSettings):
"""Application configuration."""
openai_api_key: str
# Model settings
embedding_model: str = "text-embedding-3-small"
llm_model: str = "gpt-4o-mini"
# HyDE settings
num_hypotheses: int = 1 # Generate multiple for robustness
hypothesis_max_tokens: int = 256
# Retrieval settings
top_k: int = 5
# ChromaDB
chroma_persist_dir: str = "./chroma_db"
collection_name: str = "hyde_docs"
class Config:
env_file = ".env"
@lru_cache
def get_settings() -> Settings:
return Settings()

Step 2: Hypothesis Generator
The core of HyDE: generating hypothetical documents that would answer the query.
# hypothesis_generator.py
from openai import OpenAI
from pydantic import BaseModel
from config import get_settings
class Hypothesis(BaseModel):
"""A generated hypothetical document."""
content: str
query: str
class HypothesisGenerator:
"""Generates hypothetical documents for queries."""
def __init__(self):
settings = get_settings()
self.client = OpenAI(api_key=settings.openai_api_key)
self.model = settings.llm_model
self.max_tokens = settings.hypothesis_max_tokens
def generate(
self,
query: str,
num_hypotheses: int = 1,
domain: str | None = None
) -> list[Hypothesis]:
"""
Generate hypothetical documents that would answer the query.
Args:
query: The user's question
num_hypotheses: Number of hypotheses to generate
domain: Optional domain context (e.g., "medical", "legal")
Returns:
List of hypothetical documents
"""
system_prompt = self._build_system_prompt(domain)
hypotheses = []
for i in range(num_hypotheses):
# Add variation prompt for multiple hypotheses
user_prompt = f"Question: {query}"
if num_hypotheses > 1:
user_prompt += f"\n\nGenerate hypothesis #{i+1} with a different perspective or focus."
response = self.client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
],
max_tokens=self.max_tokens,
temperature=0.7 if num_hypotheses > 1 else 0.3
)
hypotheses.append(Hypothesis(
content=response.choices[0].message.content,
query=query
))
return hypotheses
def _build_system_prompt(self, domain: str | None) -> str:
"""Build system prompt for hypothesis generation."""
base_prompt = """You are a document generator. Given a question, write a
short passage that would directly answer that question.
Guidelines:
1. Write in a declarative, informative style (like a textbook or encyclopedia)
2. Include specific facts, terms, and concepts related to the question
3. Be detailed but concise (1-2 paragraphs)
4. Do NOT write "The answer is..." or similar phrases
5. Write as if this is an excerpt from an authoritative document
Your output should read like a passage someone might find when researching this topic."""
if domain:
domain_contexts = {
"medical": "Write in the style of a medical textbook or clinical guideline. Use proper medical terminology.",
"legal": "Write in the style of a legal document or case law. Use proper legal terminology and cite relevant principles.",
"technical": "Write in the style of technical documentation. Be precise and include implementation details.",
"academic": "Write in the style of an academic paper. Be rigorous and cite theoretical foundations.",
"business": "Write in the style of a business report. Focus on practical implications and metrics."
}
base_prompt += f"\n\n{domain_contexts.get(domain, '')}"
return base_prompt
def generate_with_perspectives(
self,
query: str,
perspectives: list[str]
) -> list[Hypothesis]:
"""Generate hypotheses from multiple explicit perspectives."""
hypotheses = []
for perspective in perspectives:
system_prompt = f"""You are a document generator writing from a {perspective} perspective.
Given a question, write a short passage answering it from this viewpoint.
Write in a declarative, informative style (like a textbook excerpt)."""
response = self.client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": f"Question: {query}"}
],
max_tokens=self.max_tokens
)
hypotheses.append(Hypothesis(
content=response.choices[0].message.content,
query=query
))
return hypotheses

Understanding the Hypothesis Generation Process:
Query: "What causes rain?"
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LLM generates a hypothetical answer (NOT the final answer!) │
│ │
│ "Precipitation occurs when water vapor in the atmosphere │
│ condenses around particles of dust or pollen. As these │
│ droplets combine and grow heavy, gravity pulls them down │
│ as rain. Temperature and humidity play crucial roles..." │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ This hypothesis is ONLY used for retrieval, not as answer. │
│ It contains key terms: "precipitation", "water vapor", │
│ "condenses", "atmosphere" - words likely in real documents. │
└─────────────────────────────────────────────────────────────┘

Why Multiple Hypotheses?
| Setting | Use Case | Trade-off |
|---|---|---|
| `num_hypotheses=1` | Simple queries, low latency | May miss diverse docs |
| `num_hypotheses=3` | Complex topics | Better coverage, 3x LLM calls |
| Higher temperature (0.7) | Multiple hypotheses | Creates diverse perspectives |
| Lower temperature (0.3) | Single hypothesis | More focused, consistent |
Domain-Specific Prompts:
The domain parameter tailors the hypothesis style:
- Medical: Uses clinical terminology, structured like a textbook
- Legal: References principles and precedents
- Technical: Includes implementation details
This helps the hypothesis match the vocabulary of your actual documents.
Step 3: HyDE Retriever
# retriever.py
import chromadb
from chromadb.utils import embedding_functions
from openai import OpenAI
import numpy as np
from pydantic import BaseModel
from config import get_settings
from hypothesis_generator import HypothesisGenerator, Hypothesis
class RetrievedDocument(BaseModel):
"""A retrieved document with metadata."""
content: str
source: str
distance: float
retrieval_method: str # "hyde" or "traditional"
class HyDERetriever:
"""Retriever using Hypothetical Document Embeddings."""
def __init__(self):
settings = get_settings()
self.client = OpenAI(api_key=settings.openai_api_key)
self.embedding_model = settings.embedding_model
# ChromaDB setup
self.chroma_client = chromadb.PersistentClient(
path=settings.chroma_persist_dir
)
self.embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
api_key=settings.openai_api_key,
model_name=settings.embedding_model
)
self.collection = self.chroma_client.get_or_create_collection(
name=settings.collection_name,
embedding_function=self.embedding_fn,
metadata={"hnsw:space": "cosine"}
)
self.hypothesis_generator = HypothesisGenerator()
self.settings = settings
def add_documents(
self,
documents: list[str],
sources: list[str],
ids: list[str] | None = None
):
"""Add documents to the collection."""
if ids is None:
ids = [f"doc_{i}" for i in range(len(documents))]
self.collection.add(
documents=documents,
ids=ids,
metadatas=[{"source": src} for src in sources]
)
def retrieve_hyde(
self,
query: str,
k: int | None = None,
num_hypotheses: int = 1,
domain: str | None = None
) -> tuple[list[RetrievedDocument], list[Hypothesis]]:
"""
Retrieve documents using HyDE.
Args:
query: User query
k: Number of documents to retrieve
num_hypotheses: Number of hypothetical docs to generate
domain: Optional domain for hypothesis generation
Returns:
Tuple of (retrieved documents, generated hypotheses)
"""
k = k or self.settings.top_k
# Step 1: Generate hypothetical documents
hypotheses = self.hypothesis_generator.generate(
query=query,
num_hypotheses=num_hypotheses,
domain=domain
)
# Step 2: Get embeddings for hypotheses
hypothesis_texts = [h.content for h in hypotheses]
if num_hypotheses == 1:
# Single hypothesis: use its embedding directly
search_text = hypothesis_texts[0]
results = self.collection.query(
query_texts=[search_text],
n_results=k,
include=["documents", "metadatas", "distances"]
)
else:
# Multiple hypotheses: average their embeddings
response = self.client.embeddings.create(
model=self.embedding_model,
input=hypothesis_texts
)
embeddings = [e.embedding for e in response.data]
avg_embedding = np.mean(embeddings, axis=0).tolist()
results = self.collection.query(
query_embeddings=[avg_embedding],
n_results=k,
include=["documents", "metadatas", "distances"]
)
# Step 3: Format results
documents = []
for i in range(len(results["documents"][0])):
doc = RetrievedDocument(
content=results["documents"][0][i],
source=results["metadatas"][0][i].get("source", "unknown"),
distance=results["distances"][0][i],
retrieval_method="hyde"
)
documents.append(doc)
return documents, hypotheses
def retrieve_traditional(
self,
query: str,
k: int | None = None
) -> list[RetrievedDocument]:
"""Traditional retrieval using query embedding."""
k = k or self.settings.top_k
results = self.collection.query(
query_texts=[query],
n_results=k,
include=["documents", "metadatas", "distances"]
)
documents = []
for i in range(len(results["documents"][0])):
doc = RetrievedDocument(
content=results["documents"][0][i],
source=results["metadatas"][0][i].get("source", "unknown"),
distance=results["distances"][0][i],
retrieval_method="traditional"
)
documents.append(doc)
return documents
def retrieve_hybrid(
self,
query: str,
k: int | None = None,
hyde_weight: float = 0.7
) -> list[RetrievedDocument]:
"""
Hybrid retrieval combining HyDE and traditional.
Uses reciprocal rank fusion to combine results.
"""
k = k or self.settings.top_k
# Get both result sets
hyde_docs, _ = self.retrieve_hyde(query, k=k*2)
trad_docs = self.retrieve_traditional(query, k=k*2)
# Reciprocal Rank Fusion
scores: dict[str, float] = {}
for rank, doc in enumerate(hyde_docs):
key = doc.content[:100] # Use content prefix as key
scores[key] = scores.get(key, 0) + hyde_weight / (rank + 60)
for rank, doc in enumerate(trad_docs):
key = doc.content[:100]
scores[key] = scores.get(key, 0) + (1 - hyde_weight) / (rank + 60)
# Create combined document list
all_docs = {d.content[:100]: d for d in hyde_docs + trad_docs}
# Sort by fused score
sorted_keys = sorted(scores.keys(), key=lambda x: scores[x], reverse=True)
result = []
for key in sorted_keys[:k]:
doc = all_docs[key]
doc.retrieval_method = "hybrid"
result.append(doc)
return result

Understanding the Three Retrieval Strategies:
┌─────────────────────────────────────────────────────────────┐
│ METHOD 1: Traditional │
│ │
│ Query: "What causes rain?" ──► Embed ──► Search ──► Docs │
│ │
│ Problem: Query embedding is far from document embeddings │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ METHOD 2: HyDE (Single Hypothesis) │
│ │
│ Query ──► LLM ──► Hypothesis ──► Embed ──► Search ──► Docs │
│ │
│ Benefit: Hypothesis embedding is close to document style │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ METHOD 3: HyDE (Multiple Hypotheses) │
│ │
│ Query ──► LLM ──► [Hyp1, Hyp2, Hyp3] │
│ │ │
│ ▼ │
│ Average Embeddings ──► Search ──► Docs │
│ │
│ Benefit: Averaging reduces bias from any single hypothesis │
└─────────────────────────────────────────────────────────────┘

Why Average Multiple Hypothesis Embeddings?
| Hypothesis 1 | Hypothesis 2 | Hypothesis 3 |
|---|---|---|
| Focus: mechanism | Focus: geography | Focus: seasons |
| "Water evaporates..." | "Tropical regions receive..." | "Monsoon patterns..." |
Averaging creates a more robust query embedding that captures multiple aspects of the question, reducing the risk of a biased or narrow hypothesis.
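The averaging step from `retrieve_hyde` can be seen in isolation with mock vectors (the 4-dimensional embeddings below are made up for illustration; real `text-embedding-3-small` vectors have 1536 dimensions):

```python
import numpy as np

# Mock embeddings for three hypotheses about the same query.
hyp_embeddings = np.array([
    [0.9, 0.1, 0.0, 0.1],   # hypothesis 1: mechanism-focused
    [0.7, 0.3, 0.1, 0.0],   # hypothesis 2: geography-focused
    [0.8, 0.0, 0.2, 0.1],   # hypothesis 3: season-focused
])

# Element-wise mean, exactly what retrieve_hyde computes before
# querying ChromaDB with query_embeddings=[avg_embedding].
avg_embedding = hyp_embeddings.mean(axis=0)
print(avg_embedding)
```

Because the collection uses cosine space (`hnsw:space: cosine`), vector magnitude does not affect ranking, so the mean needs no re-normalization before search.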
The Hybrid Approach:
hyde_weight = 0.7 # Trust HyDE more
# RRF combines both:
# - HyDE finds semantically similar docs
# - Traditional catches exact keyword matches
# - Fusion gives best of both worlds

Step 4: RAG Pipeline
# rag_pipeline.py
from openai import OpenAI
from pydantic import BaseModel
from config import get_settings
from retriever import HyDERetriever, RetrievedDocument
from hypothesis_generator import Hypothesis
class RAGResponse(BaseModel):
"""Response from the RAG pipeline."""
answer: str
sources: list[str]
hypotheses: list[str]
retrieval_method: str
documents_used: int
class HyDERAG:
"""RAG pipeline with HyDE retrieval."""
def __init__(self):
settings = get_settings()
self.client = OpenAI(api_key=settings.openai_api_key)
self.model = settings.llm_model
self.retriever = HyDERetriever()
def query(
self,
question: str,
method: str = "hyde",
num_hypotheses: int = 1,
domain: str | None = None
) -> RAGResponse:
"""
Answer a question using HyDE RAG.
Args:
question: User question
method: "hyde", "traditional", or "hybrid"
num_hypotheses: Number of hypotheses for HyDE
domain: Optional domain context
"""
hypotheses: list[Hypothesis] = []
# Retrieve documents
if method == "hyde":
documents, hypotheses = self.retriever.retrieve_hyde(
query=question,
num_hypotheses=num_hypotheses,
domain=domain
)
elif method == "traditional":
documents = self.retriever.retrieve_traditional(question)
elif method == "hybrid":
documents = self.retriever.retrieve_hybrid(question)
else:
raise ValueError(f"Unknown method: {method}")
# Generate answer
answer = self._generate_answer(question, documents)
return RAGResponse(
answer=answer,
sources=list(set(d.source for d in documents)),
hypotheses=[h.content for h in hypotheses],
retrieval_method=method,
documents_used=len(documents)
)
def _generate_answer(
self,
question: str,
documents: list[RetrievedDocument]
) -> str:
"""Generate answer from retrieved documents."""
context = "\n\n---\n\n".join([
f"Source: {d.source}\n{d.content}"
for d in documents
])
system_prompt = """Answer the question based only on the provided context.
If the context doesn't contain enough information, say so.
Be concise and cite sources when possible."""
response = self.client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
]
)
return response.choices[0].message.content
def add_documents(self, documents: list[str], sources: list[str]):
"""Add documents to the knowledge base."""
self.retriever.add_documents(documents, sources)

Step 5: Evaluation
Compare HyDE vs traditional retrieval to measure improvements.
# evaluate.py
from pydantic import BaseModel
from retriever import HyDERetriever
class RetrievalMetrics(BaseModel):
"""Metrics for retrieval evaluation."""
method: str
query: str
avg_distance: float
min_distance: float
relevant_in_top_k: int # Based on ground truth
class HyDEEvaluator:
"""Evaluate HyDE vs traditional retrieval."""
def __init__(self):
self.retriever = HyDERetriever()
def compare_methods(
self,
query: str,
relevant_doc_ids: list[str] | None = None,
k: int = 5
) -> dict:
"""Compare HyDE and traditional retrieval for a query."""
# HyDE retrieval
hyde_docs, hypotheses = self.retriever.retrieve_hyde(query, k=k)
# Traditional retrieval
trad_docs = self.retriever.retrieve_traditional(query, k=k)
results = {
"query": query,
"hypothesis": hypotheses[0].content if hypotheses else None,
"hyde": {
"avg_distance": sum(d.distance for d in hyde_docs) / len(hyde_docs),
"min_distance": min(d.distance for d in hyde_docs),
"documents": [d.content[:100] + "..." for d in hyde_docs]
},
"traditional": {
"avg_distance": sum(d.distance for d in trad_docs) / len(trad_docs),
"min_distance": min(d.distance for d in trad_docs),
"documents": [d.content[:100] + "..." for d in trad_docs]
},
"hyde_improvement": {
"avg_distance_reduction": (
(sum(d.distance for d in trad_docs) / len(trad_docs)) -
(sum(d.distance for d in hyde_docs) / len(hyde_docs))
),
"min_distance_reduction": (
min(d.distance for d in trad_docs) -
min(d.distance for d in hyde_docs)
)
}
}
return results
def batch_evaluate(
self,
queries: list[str],
k: int = 5
) -> dict:
"""Evaluate multiple queries and aggregate results."""
all_results = []
hyde_wins = 0
trad_wins = 0
for query in queries:
result = self.compare_methods(query, k=k)
all_results.append(result)
if result["hyde"]["avg_distance"] < result["traditional"]["avg_distance"]:
hyde_wins += 1
else:
trad_wins += 1
return {
"individual_results": all_results,
"summary": {
"total_queries": len(queries),
"hyde_wins": hyde_wins,
"traditional_wins": trad_wins,
"hyde_win_rate": hyde_wins / len(queries),
"avg_distance_improvement": sum(
r["hyde_improvement"]["avg_distance_reduction"]
for r in all_results
) / len(all_results)
}
}

Step 6: FastAPI Application
# app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from rag_pipeline import HyDERAG, RAGResponse
from evaluate import HyDEEvaluator
from contextlib import asynccontextmanager
from typing import Literal
# Globals
hyde_rag: HyDERAG | None = None
evaluator: HyDEEvaluator | None = None
@asynccontextmanager
async def lifespan(app: FastAPI):
global hyde_rag, evaluator
hyde_rag = HyDERAG()
evaluator = HyDEEvaluator()
# Add sample documents
sample_docs = [
"The water cycle, also known as the hydrological cycle, describes the continuous movement of water on Earth. Water evaporates from oceans and lakes, rises as vapor, condenses into clouds, and falls as precipitation (rain, snow, sleet). This cycle is driven by solar energy and gravity.",
"Photosynthesis is the process by which plants convert light energy into chemical energy. Using chlorophyll in their leaves, plants absorb sunlight and use it to transform carbon dioxide and water into glucose and oxygen. This process is essential for life on Earth.",
"Machine learning algorithms learn patterns from data to make predictions or decisions. Supervised learning uses labeled examples, unsupervised learning finds hidden patterns, and reinforcement learning optimizes through trial and error. Deep learning uses neural networks with many layers.",
"The immune system protects the body from pathogens including bacteria, viruses, and parasites. It consists of two main parts: innate immunity (immediate, non-specific response) and adaptive immunity (slower but targeted, with memory). White blood cells play a crucial role in both.",
"Quantum computing uses quantum bits (qubits) that can exist in superposition, representing 0 and 1 simultaneously. This enables quantum computers to solve certain problems exponentially faster than classical computers, particularly in cryptography, optimization, and simulation."
]
sources = ["water_cycle", "photosynthesis", "ml_basics", "immune_system", "quantum_computing"]
hyde_rag.add_documents(sample_docs, sources)
yield
hyde_rag = None
evaluator = None
app = FastAPI(
title="HyDE RAG API",
description="Retrieval-Augmented Generation with Hypothetical Document Embeddings",
lifespan=lifespan
)
class QueryRequest(BaseModel):
query: str
method: Literal["hyde", "traditional", "hybrid"] = "hyde"
num_hypotheses: int = 1
domain: str | None = None
class DocumentsRequest(BaseModel):
documents: list[str]
sources: list[str]
class CompareRequest(BaseModel):
query: str
k: int = 5
@app.post("/query", response_model=RAGResponse)
async def query(request: QueryRequest):
"""Query the HyDE RAG system."""
if not hyde_rag:
raise HTTPException(status_code=503, detail="Service not initialized")
result = hyde_rag.query(
question=request.query,
method=request.method,
num_hypotheses=request.num_hypotheses,
domain=request.domain
)
return result
@app.post("/compare")
async def compare_methods(request: CompareRequest):
"""Compare HyDE vs traditional retrieval for a query."""
if not evaluator:
raise HTTPException(status_code=503, detail="Service not initialized")
result = evaluator.compare_methods(
query=request.query,
k=request.k
)
return result
@app.post("/documents")
async def add_documents(request: DocumentsRequest):
"""Add documents to the knowledge base."""
if not hyde_rag:
raise HTTPException(status_code=503, detail="Service not initialized")
if len(request.documents) != len(request.sources):
raise HTTPException(
status_code=400,
detail="Documents and sources must have same length"
)
hyde_rag.add_documents(request.documents, request.sources)
return {"status": "success", "documents_added": len(request.documents)}
@app.get("/health")
async def health():
return {"status": "healthy", "service": "hyde-rag"}

Step 7: Requirements
# requirements.txt
openai>=1.12.0
chromadb>=0.4.22
numpy>=1.24.0
pydantic>=2.0.0
pydantic-settings>=2.0.0
fastapi>=0.109.0
uvicorn>=0.27.0
python-dotenv>=1.0.0

Usage Examples
Basic HyDE Query
from rag_pipeline import HyDERAG
# Initialize
rag = HyDERAG()
# Add documents
rag.add_documents(
documents=["Your document content here..."],
sources=["source_name"]
)
# Query with HyDE
result = rag.query(
question="How does rain form?",
method="hyde"
)
print(f"Answer: {result.answer}")
print(f"Hypothesis used: {result.hypotheses[0]}")

Compare Methods
from evaluate import HyDEEvaluator
evaluator = HyDEEvaluator()
# Compare for a single query
comparison = evaluator.compare_methods(
query="What is machine learning?",
k=5
)
print(f"HyDE avg distance: {comparison['hyde']['avg_distance']:.4f}")
print(f"Traditional avg distance: {comparison['traditional']['avg_distance']:.4f}")
print(f"Improvement: {comparison['hyde_improvement']['avg_distance_reduction']:.4f}")

Multi-Hypothesis HyDE
# Generate multiple hypotheses for robustness
result = rag.query(
question="Explain quantum computing",
method="hyde",
num_hypotheses=3 # Average embeddings from 3 hypotheses
)

API Usage
# Start server
uvicorn app:app --reload
# HyDE query
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"query": "How do plants make food?", "method": "hyde"}'
# Compare methods
curl -X POST http://localhost:8000/compare \
-H "Content-Type: application/json" \
-d '{"query": "What is the immune system?", "k": 5}'

How HyDE Works
┌─────────────────────────────────────────────────────────────────┐
│ HYDE PIPELINE │
│ │
│ User Query │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ LLM Generation │ ← Generate what a good answer looks like │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Hypothetical Document │ ← Key insight: document-like text │
│ │ "The answer would be..."│ │
│ └────────┬────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Embed Hypothesis│ ← Now in document embedding space │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────┐ │
│ │Vector DB│ │
│ └────┬────┘ │
│ │ │
│ ▼ │
│ Retrieved Documents ──► Generate Final Answer │
└─────────────────────────────────────────────────────────────────┘

Why It Works
| Aspect | Query Embedding | HyDE Embedding |
|---|---|---|
| Format | Question | Document-like |
| Length | Short | Paragraph |
| Style | Interrogative | Declarative |
| Semantics | What user wants | What answer looks like |
The hypothesis bridges the semantic gap between questions and documents.
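The effect can be made concrete with a toy example that uses word-count vectors in place of learned embeddings (a deliberate simplification; real embeddings capture far more than vocabulary overlap, but the direction of the effect is the same):

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm

doc = Counter("precipitation occurs when water vapor condenses in the atmosphere and falls as rain".split())
query = Counter("what causes rain".split())
hypothesis = Counter("rain forms when water vapor condenses in the atmosphere and falls as precipitation".split())

print(f"query      vs doc: {cosine(query, doc):.2f}")       # → 0.16
print(f"hypothesis vs doc: {cosine(hypothesis, doc):.2f}")  # → 0.92
```

The query shares only one token with the document ("rain"), while the hypothesis shares nearly its entire vocabulary, so it lands much closer to the document in vector space.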
When to Use HyDE
| Scenario | HyDE Benefit |
|---|---|
| Short queries | Expands to full context |
| Domain-specific | Generates domain terminology |
| Abstract questions | Creates concrete examples |
| Keyword mismatch | Bridges vocabulary gap |
When Traditional May Be Better
- Exact keyword matching needed
- Very specific technical queries
- Latency-critical applications (HyDE adds LLM call)
Key Concepts Recap
| Concept | What It Is | Why It Matters |
|---|---|---|
| Semantic Gap | Queries ≠ documents in style and length | Embeddings aren't optimally similar |
| Hypothetical Document | LLM-generated answer used for retrieval | Bridges the query-document gap |
| Embedding Averaging | Mean of multiple hypothesis embeddings | Reduces bias, improves robustness |
| Domain Prompting | Tailored generation for specific fields | Matches your document vocabulary |
| Hybrid Retrieval | Combine HyDE + traditional with RRF | Get semantic + keyword matching |
| Latency Trade-off | HyDE adds one LLM call before search | Better results, ~500ms extra |
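The hybrid-retrieval row above relies on weighted reciprocal rank fusion; stripped of ChromaDB details, the scoring in `retrieve_hybrid` reduces to the following sketch (document IDs are illustrative):

```python
# Standalone sketch of the weighted RRF used in retrieve_hybrid.
def rrf_fuse(hyde_ranked, trad_ranked, hyde_weight=0.7, k=60):
    """Fuse two ranked lists of doc IDs; higher fused score ranks first."""
    scores: dict[str, float] = {}
    for rank, doc in enumerate(hyde_ranked):
        scores[doc] = scores.get(doc, 0.0) + hyde_weight / (rank + k)
    for rank, doc in enumerate(trad_ranked):
        scores[doc] = scores.get(doc, 0.0) + (1 - hyde_weight) / (rank + k)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(
    hyde_ranked=["doc_a", "doc_b", "doc_c"],
    trad_ranked=["doc_c", "doc_a", "doc_d"],
)
print(fused)  # → ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

`doc_a` wins because it appears near the top of both lists; the constant `k=60` dampens the influence of any single ranking, which is the standard RRF choice.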
Next Steps
- Explore Self-RAG for self-correction
- Try Hybrid Search for keyword + semantic
- Learn RAG with Reranking for better ranking
- Build Agentic RAG for multi-step retrieval