HyDE RAG
Improve retrieval with Hypothetical Document Embeddings for better semantic matching
TL;DR
Queries are short questions; documents are detailed paragraphs. This semantic gap hurts retrieval. HyDE (Hypothetical Document Embeddings) fixes this by first generating what a good answer would look like, then searching for documents similar to that hypothetical answer. It's like asking "find documents that look like this" instead of "find documents about this question."
| Property | Value |
|---|---|
| Difficulty | Intermediate |
| Time | ~4 hours |
| Code Size | ~300 LOC |
| Prerequisites | Intelligent Document Q&A |
Tech Stack
| Technology | Purpose |
|---|---|
| LangChain | RAG orchestration |
| OpenAI | GPT-4 + Embeddings |
| ChromaDB | Vector database |
| FastAPI | REST API |
Prerequisites
- Completed Intelligent Document Q&A tutorial
- Python 3.10+
- OpenAI API key
What You'll Learn
- Understand the HyDE (Hypothetical Document Embeddings) technique
- Generate hypothetical answers to improve retrieval
- Compare HyDE vs traditional query embedding
- Implement multi-hypothesis generation for robustness
- Measure retrieval improvements with HyDE
The Insight Behind HyDE
Traditional RAG embeds the query and searches for similar documents. But queries and documents are fundamentally different:
- Queries: Short, question-form, incomplete
- Documents: Long, declarative, information-rich
┌────────────────────────────────────────────────────────────────────────┐
│ TRADITIONAL RETRIEVAL │
│ │
│ "What causes rain?" ──► Embed Query ──► Search ──► May miss relevant │
│ (short) (question (mismatch) documents ⚠️ │
│ space) │
│ │
│ Problem: Query "What causes rain?" may not match │
│ document "Precipitation occurs when water vapor..." │
└────────────────────────────────────────────────────────────────────────┘

HyDE's insight: Instead of searching with the query, generate what a good answer would look like, then search for documents similar to that hypothetical answer.
┌────────────────────────────────────────────────────────────────────────┐
│ HYDE RETRIEVAL │
│ │
│ "What causes rain?" ──► LLM ──► "Rain forms when water ──► Embed │
│ (query) vapor condenses in the Hypothesis│
│ atmosphere..." │
│ (hypothetical document) │ │
│ ▼ │
│ Search │
│ │ │
│ ▼ │
│ Finds: "Precipitation occurs when water │
│ vapor condenses..." ✓ │
│ │
│ Solution: Hypothesis is semantically closer to real documents! │
└────────────────────────────────────────────────────────────────────────┘

The hypothetical document is semantically closer to real documents than the query is.
Project Structure
hyde-rag/
├── config.py # Configuration
├── hypothesis_generator.py # Generate hypothetical docs
├── retriever.py # HyDE retrieval
├── rag_pipeline.py # Full pipeline
├── app.py # FastAPI application
├── evaluate.py # Compare HyDE vs traditional
└── requirements.txt

Step 1: Configuration
# config.py
from pydantic_settings import BaseSettings
from functools import lru_cache
class Settings(BaseSettings):
"""Application configuration."""
openai_api_key: str
# Model settings
embedding_model: str = "text-embedding-3-small"
llm_model: str = "gpt-4o-mini"
# HyDE settings
num_hypotheses: int = 1 # Generate multiple for robustness
hypothesis_max_tokens: int = 256
# Retrieval settings
top_k: int = 5
# ChromaDB
chroma_persist_dir: str = "./chroma_db"
collection_name: str = "hyde_docs"
class Config:
env_file = ".env"
@lru_cache
def get_settings() -> Settings:
return Settings()

Step 2: Hypothesis Generator
The core of HyDE: generating hypothetical documents that would answer the query.
# hypothesis_generator.py
from openai import OpenAI
from pydantic import BaseModel
from config import get_settings
class Hypothesis(BaseModel):
"""A generated hypothetical document."""
content: str
query: str
class HypothesisGenerator:
"""Generates hypothetical documents for queries."""
def __init__(self):
settings = get_settings()
self.client = OpenAI(api_key=settings.openai_api_key)
self.model = settings.llm_model
self.max_tokens = settings.hypothesis_max_tokens
def generate(
self,
query: str,
num_hypotheses: int = 1,
domain: str | None = None
) -> list[Hypothesis]:
"""
Generate hypothetical documents that would answer the query.
Args:
query: The user's question
num_hypotheses: Number of hypotheses to generate
domain: Optional domain context (e.g., "medical", "legal")
Returns:
List of hypothetical documents
"""
system_prompt = self._build_system_prompt(domain)
hypotheses = []
for i in range(num_hypotheses):
# Add variation prompt for multiple hypotheses
user_prompt = f"Question: {query}"
if num_hypotheses > 1:
user_prompt += f"\n\nGenerate hypothesis #{i+1} with a different perspective or focus."
response = self.client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": user_prompt}
],
max_tokens=self.max_tokens,
temperature=0.7 if num_hypotheses > 1 else 0.3
)
hypotheses.append(Hypothesis(
content=response.choices[0].message.content,
query=query
))
return hypotheses
def _build_system_prompt(self, domain: str | None) -> str:
"""Build system prompt for hypothesis generation."""
base_prompt = """You are a document generator. Given a question, write a
short passage that would directly answer that question.
Guidelines:
1. Write in a declarative, informative style (like a textbook or encyclopedia)
2. Include specific facts, terms, and concepts related to the question
3. Be detailed but concise (1-2 paragraphs)
4. Do NOT write "The answer is..." or similar phrases
5. Write as if this is an excerpt from an authoritative document
Your output should read like a passage someone might find when researching this topic."""
if domain:
domain_contexts = {
"medical": "Write in the style of a medical textbook or clinical guideline. Use proper medical terminology.",
"legal": "Write in the style of a legal document or case law. Use proper legal terminology and cite relevant principles.",
"technical": "Write in the style of technical documentation. Be precise and include implementation details.",
"academic": "Write in the style of an academic paper. Be rigorous and cite theoretical foundations.",
"business": "Write in the style of a business report. Focus on practical implications and metrics."
}
base_prompt += f"\n\n{domain_contexts.get(domain, '')}"
return base_prompt
def generate_with_perspectives(
self,
query: str,
perspectives: list[str]
) -> list[Hypothesis]:
"""Generate hypotheses from multiple explicit perspectives."""
hypotheses = []
for perspective in perspectives:
system_prompt = f"""You are a document generator writing from a {perspective} perspective.
Given a question, write a short passage answering it from this viewpoint.
Write in a declarative, informative style (like a textbook excerpt)."""
response = self.client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": f"Question: {query}"}
],
max_tokens=self.max_tokens
)
hypotheses.append(Hypothesis(
content=response.choices[0].message.content,
query=query
))
return hypotheses

Understanding the Hypothesis Generation Process:
Query: "What causes rain?"
│
▼
┌─────────────────────────────────────────────────────────────┐
│ LLM generates a hypothetical answer (NOT the final answer!) │
│ │
│ "Precipitation occurs when water vapor in the atmosphere │
│ condenses around particles of dust or pollen. As these │
│ droplets combine and grow heavy, gravity pulls them down │
│ as rain. Temperature and humidity play crucial roles..." │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ This hypothesis is ONLY used for retrieval, not as answer. │
│ It contains key terms: "precipitation", "water vapor", │
│ "condenses", "atmosphere" - words likely in real documents. │
└─────────────────────────────────────────────────────────────┘

Why Multiple Hypotheses?
| Setting | Use Case | Trade-off |
|---|---|---|
| `num_hypotheses=1` | Simple queries, low latency | May miss diverse docs |
| `num_hypotheses=3` | Complex topics | Better coverage, 3x LLM calls |
| Higher temperature (0.7) | Multiple hypotheses | Creates diverse perspectives |
| Lower temperature (0.3) | Single hypothesis | More focused, consistent |
Domain-Specific Prompts:
The domain parameter tailors the hypothesis style:
- Medical: Uses clinical terminology, structured like a textbook
- Legal: References principles and precedents
- Technical: Includes implementation details
This helps the hypothesis match the vocabulary of your actual documents.
Step 3: HyDE Retriever
# retriever.py
import chromadb
from chromadb.utils import embedding_functions
from openai import OpenAI
import numpy as np
from pydantic import BaseModel
from config import get_settings
from hypothesis_generator import HypothesisGenerator, Hypothesis
class RetrievedDocument(BaseModel):
"""A retrieved document with metadata."""
content: str
source: str
distance: float
retrieval_method: str # "hyde" or "traditional"
class HyDERetriever:
"""Retriever using Hypothetical Document Embeddings."""
def __init__(self):
settings = get_settings()
self.client = OpenAI(api_key=settings.openai_api_key)
self.embedding_model = settings.embedding_model
# ChromaDB setup
self.chroma_client = chromadb.PersistentClient(
path=settings.chroma_persist_dir
)
self.embedding_fn = embedding_functions.OpenAIEmbeddingFunction(
api_key=settings.openai_api_key,
model_name=settings.embedding_model
)
self.collection = self.chroma_client.get_or_create_collection(
name=settings.collection_name,
embedding_function=self.embedding_fn,
metadata={"hnsw:space": "cosine"}
)
self.hypothesis_generator = HypothesisGenerator()
self.settings = settings
def add_documents(
self,
documents: list[str],
sources: list[str],
ids: list[str] | None = None
):
"""Add documents to the collection."""
if ids is None:
ids = [f"doc_{i}" for i in range(len(documents))]
self.collection.add(
documents=documents,
ids=ids,
metadatas=[{"source": src} for src in sources]
)
def retrieve_hyde(
self,
query: str,
k: int | None = None,
num_hypotheses: int = 1,
domain: str | None = None
) -> tuple[list[RetrievedDocument], list[Hypothesis]]:
"""
Retrieve documents using HyDE.
Args:
query: User query
k: Number of documents to retrieve
num_hypotheses: Number of hypothetical docs to generate
domain: Optional domain for hypothesis generation
Returns:
Tuple of (retrieved documents, generated hypotheses)
"""
k = k or self.settings.top_k
# Step 1: Generate hypothetical documents
hypotheses = self.hypothesis_generator.generate(
query=query,
num_hypotheses=num_hypotheses,
domain=domain
)
# Step 2: Get embeddings for hypotheses
hypothesis_texts = [h.content for h in hypotheses]
if num_hypotheses == 1:
# Single hypothesis: use its embedding directly
search_text = hypothesis_texts[0]
results = self.collection.query(
query_texts=[search_text],
n_results=k,
include=["documents", "metadatas", "distances"]
)
else:
# Multiple hypotheses: average their embeddings
response = self.client.embeddings.create(
model=self.embedding_model,
input=hypothesis_texts
)
embeddings = [e.embedding for e in response.data]
avg_embedding = np.mean(embeddings, axis=0).tolist()
results = self.collection.query(
query_embeddings=[avg_embedding],
n_results=k,
include=["documents", "metadatas", "distances"]
)
# Step 3: Format results
documents = []
for i in range(len(results["documents"][0])):
doc = RetrievedDocument(
content=results["documents"][0][i],
source=results["metadatas"][0][i].get("source", "unknown"),
distance=results["distances"][0][i],
retrieval_method="hyde"
)
documents.append(doc)
return documents, hypotheses
def retrieve_traditional(
self,
query: str,
k: int | None = None
) -> list[RetrievedDocument]:
"""Traditional retrieval using query embedding."""
k = k or self.settings.top_k
results = self.collection.query(
query_texts=[query],
n_results=k,
include=["documents", "metadatas", "distances"]
)
documents = []
for i in range(len(results["documents"][0])):
doc = RetrievedDocument(
content=results["documents"][0][i],
source=results["metadatas"][0][i].get("source", "unknown"),
distance=results["distances"][0][i],
retrieval_method="traditional"
)
documents.append(doc)
return documents
def retrieve_hybrid(
self,
query: str,
k: int | None = None,
hyde_weight: float = 0.7
) -> list[RetrievedDocument]:
"""
Hybrid retrieval combining HyDE and traditional.
Uses reciprocal rank fusion to combine results.
"""
k = k or self.settings.top_k
# Get both result sets
hyde_docs, _ = self.retrieve_hyde(query, k=k*2)
trad_docs = self.retrieve_traditional(query, k=k*2)
# Reciprocal Rank Fusion
scores: dict[str, float] = {}
for rank, doc in enumerate(hyde_docs):
key = doc.content[:100] # Use content prefix as key
scores[key] = scores.get(key, 0) + hyde_weight / (rank + 60)
for rank, doc in enumerate(trad_docs):
key = doc.content[:100]
scores[key] = scores.get(key, 0) + (1 - hyde_weight) / (rank + 60)
# Create combined document list
all_docs = {d.content[:100]: d for d in hyde_docs + trad_docs}
# Sort by fused score
sorted_keys = sorted(scores.keys(), key=lambda x: scores[x], reverse=True)
result = []
for key in sorted_keys[:k]:
doc = all_docs[key]
doc.retrieval_method = "hybrid"
result.append(doc)
return result

Understanding the Three Retrieval Strategies:
┌─────────────────────────────────────────────────────────────┐
│ METHOD 1: Traditional │
│ │
│ Query: "What causes rain?" ──► Embed ──► Search ──► Docs │
│ │
│ Problem: Query embedding is far from document embeddings │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ METHOD 2: HyDE (Single Hypothesis) │
│ │
│ Query ──► LLM ──► Hypothesis ──► Embed ──► Search ──► Docs │
│ │
│ Benefit: Hypothesis embedding is close to document style │
└─────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────┐
│ METHOD 3: HyDE (Multiple Hypotheses) │
│ │
│ Query ──► LLM ──► [Hyp1, Hyp2, Hyp3] │
│ │ │
│ ▼ │
│ Average Embeddings ──► Search ──► Docs │
│ │
│ Benefit: Averaging reduces bias from any single hypothesis │
└─────────────────────────────────────────────────────────────┘

Why Average Multiple Hypothesis Embeddings?
| Hypothesis 1 | Hypothesis 2 | Hypothesis 3 |
|---|---|---|
| Focus: mechanism | Focus: geography | Focus: seasons |
| "Water evaporates..." | "Tropical regions receive..." | "Monsoon patterns..." |
Averaging creates a more robust query embedding that captures multiple aspects of the question, reducing the risk of a biased or narrow hypothesis.
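The averaging step from `retrieve_hyde` can be seen in isolation with mock vectors (the 4-dimensional embeddings below are made up for illustration; real `text-embedding-3-small` vectors have 1536 dimensions):

```python
import numpy as np

# Mock embeddings for three hypotheses about the same query.
hyp_embeddings = np.array([
    [0.9, 0.1, 0.0, 0.1],   # hypothesis 1: mechanism-focused
    [0.7, 0.3, 0.1, 0.0],   # hypothesis 2: geography-focused
    [0.8, 0.0, 0.2, 0.1],   # hypothesis 3: season-focused
])

# Element-wise mean, exactly what retrieve_hyde computes before
# querying ChromaDB with query_embeddings=[avg_embedding].
avg_embedding = hyp_embeddings.mean(axis=0)
print(avg_embedding)
```

Because the collection uses cosine space (`hnsw:space: cosine`), vector magnitude does not affect ranking, so the mean needs no re-normalization before search.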
The Hybrid Approach:
hyde_weight = 0.7 # Trust HyDE more
# RRF combines both:
# - HyDE finds semantically similar docs
# - Traditional catches exact keyword matches
# - Fusion gives best of both worlds

Step 4: RAG Pipeline
# rag_pipeline.py
from openai import OpenAI
from pydantic import BaseModel
from config import get_settings
from retriever import HyDERetriever, RetrievedDocument
from hypothesis_generator import Hypothesis
class RAGResponse(BaseModel):
"""Response from the RAG pipeline."""
answer: str
sources: list[str]
hypotheses: list[str]
retrieval_method: str
documents_used: int
class HyDERAG:
"""RAG pipeline with HyDE retrieval."""
def __init__(self):
settings = get_settings()
self.client = OpenAI(api_key=settings.openai_api_key)
self.model = settings.llm_model
self.retriever = HyDERetriever()
def query(
self,
question: str,
method: str = "hyde",
num_hypotheses: int = 1,
domain: str | None = None
) -> RAGResponse:
"""
Answer a question using HyDE RAG.
Args:
question: User question
method: "hyde", "traditional", or "hybrid"
num_hypotheses: Number of hypotheses for HyDE
domain: Optional domain context
"""
hypotheses: list[Hypothesis] = []
# Retrieve documents
if method == "hyde":
documents, hypotheses = self.retriever.retrieve_hyde(
query=question,
num_hypotheses=num_hypotheses,
domain=domain
)
elif method == "traditional":
documents = self.retriever.retrieve_traditional(question)
elif method == "hybrid":
documents = self.retriever.retrieve_hybrid(question)
else:
raise ValueError(f"Unknown method: {method}")
# Generate answer
answer = self._generate_answer(question, documents)
return RAGResponse(
answer=answer,
sources=list(set(d.source for d in documents)),
hypotheses=[h.content for h in hypotheses],
retrieval_method=method,
documents_used=len(documents)
)
def _generate_answer(
self,
question: str,
documents: list[RetrievedDocument]
) -> str:
"""Generate answer from retrieved documents."""
context = "\n\n---\n\n".join([
f"Source: {d.source}\n{d.content}"
for d in documents
])
system_prompt = """Answer the question based only on the provided context.
If the context doesn't contain enough information, say so.
Be concise and cite sources when possible."""
response = self.client.chat.completions.create(
model=self.model,
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": f"Context:\n{context}\n\nQuestion: {question}"}
]
)
return response.choices[0].message.content
def add_documents(self, documents: list[str], sources: list[str]):
"""Add documents to the knowledge base."""
self.retriever.add_documents(documents, sources)

Step 5: Evaluation
Compare HyDE vs traditional retrieval to measure improvements.
# evaluate.py
from pydantic import BaseModel
from retriever import HyDERetriever
class RetrievalMetrics(BaseModel):
"""Metrics for retrieval evaluation."""
method: str
query: str
avg_distance: float
min_distance: float
relevant_in_top_k: int # Based on ground truth
class HyDEEvaluator:
"""Evaluate HyDE vs traditional retrieval."""
def __init__(self):
self.retriever = HyDERetriever()
def compare_methods(
self,
query: str,
relevant_doc_ids: list[str] | None = None,
k: int = 5
) -> dict:
"""Compare HyDE and traditional retrieval for a query."""
# HyDE retrieval
hyde_docs, hypotheses = self.retriever.retrieve_hyde(query, k=k)
# Traditional retrieval
trad_docs = self.retriever.retrieve_traditional(query, k=k)
results = {
"query": query,
"hypothesis": hypotheses[0].content if hypotheses else None,
"hyde": {
"avg_distance": sum(d.distance for d in hyde_docs) / len(hyde_docs),
"min_distance": min(d.distance for d in hyde_docs),
"documents": [d.content[:100] + "..." for d in hyde_docs]
},
"traditional": {
"avg_distance": sum(d.distance for d in trad_docs) / len(trad_docs),
"min_distance": min(d.distance for d in trad_docs),
"documents": [d.content[:100] + "..." for d in trad_docs]
},
"hyde_improvement": {
"avg_distance_reduction": (
(sum(d.distance for d in trad_docs) / len(trad_docs)) -
(sum(d.distance for d in hyde_docs) / len(hyde_docs))
),
"min_distance_reduction": (
min(d.distance for d in trad_docs) -
min(d.distance for d in hyde_docs)
)
}
}
return results
def batch_evaluate(
self,
queries: list[str],
k: int = 5
) -> dict:
"""Evaluate multiple queries and aggregate results."""
all_results = []
hyde_wins = 0
trad_wins = 0
for query in queries:
result = self.compare_methods(query, k=k)
all_results.append(result)
if result["hyde"]["avg_distance"] < result["traditional"]["avg_distance"]:
hyde_wins += 1
else:
trad_wins += 1
return {
"individual_results": all_results,
"summary": {
"total_queries": len(queries),
"hyde_wins": hyde_wins,
"traditional_wins": trad_wins,
"hyde_win_rate": hyde_wins / len(queries),
"avg_distance_improvement": sum(
r["hyde_improvement"]["avg_distance_reduction"]
for r in all_results
) / len(all_results)
}
}

Step 6: FastAPI Application
# app.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from rag_pipeline import HyDERAG, RAGResponse
from evaluate import HyDEEvaluator
from contextlib import asynccontextmanager
from typing import Literal
# Globals
hyde_rag: HyDERAG | None = None
evaluator: HyDEEvaluator | None = None
@asynccontextmanager
async def lifespan(app: FastAPI):
global hyde_rag, evaluator
hyde_rag = HyDERAG()
evaluator = HyDEEvaluator()
# Add sample documents
sample_docs = [
"The water cycle, also known as the hydrological cycle, describes the continuous movement of water on Earth. Water evaporates from oceans and lakes, rises as vapor, condenses into clouds, and falls as precipitation (rain, snow, sleet). This cycle is driven by solar energy and gravity.",
"Photosynthesis is the process by which plants convert light energy into chemical energy. Using chlorophyll in their leaves, plants absorb sunlight and use it to transform carbon dioxide and water into glucose and oxygen. This process is essential for life on Earth.",
"Machine learning algorithms learn patterns from data to make predictions or decisions. Supervised learning uses labeled examples, unsupervised learning finds hidden patterns, and reinforcement learning optimizes through trial and error. Deep learning uses neural networks with many layers.",
"The immune system protects the body from pathogens including bacteria, viruses, and parasites. It consists of two main parts: innate immunity (immediate, non-specific response) and adaptive immunity (slower but targeted, with memory). White blood cells play a crucial role in both.",
"Quantum computing uses quantum bits (qubits) that can exist in superposition, representing 0 and 1 simultaneously. This enables quantum computers to solve certain problems exponentially faster than classical computers, particularly in cryptography, optimization, and simulation."
]
sources = ["water_cycle", "photosynthesis", "ml_basics", "immune_system", "quantum_computing"]
hyde_rag.add_documents(sample_docs, sources)
yield
hyde_rag = None
evaluator = None
app = FastAPI(
title="HyDE RAG API",
description="Retrieval-Augmented Generation with Hypothetical Document Embeddings",
lifespan=lifespan
)
class QueryRequest(BaseModel):
query: str
method: Literal["hyde", "traditional", "hybrid"] = "hyde"
num_hypotheses: int = 1
domain: str | None = None
class DocumentsRequest(BaseModel):
documents: list[str]
sources: list[str]
class CompareRequest(BaseModel):
query: str
k: int = 5
@app.post("/query", response_model=RAGResponse)
async def query(request: QueryRequest):
"""Query the HyDE RAG system."""
if not hyde_rag:
raise HTTPException(status_code=503, detail="Service not initialized")
result = hyde_rag.query(
question=request.query,
method=request.method,
num_hypotheses=request.num_hypotheses,
domain=request.domain
)
return result
@app.post("/compare")
async def compare_methods(request: CompareRequest):
"""Compare HyDE vs traditional retrieval for a query."""
if not evaluator:
raise HTTPException(status_code=503, detail="Service not initialized")
result = evaluator.compare_methods(
query=request.query,
k=request.k
)
return result
@app.post("/documents")
async def add_documents(request: DocumentsRequest):
"""Add documents to the knowledge base."""
if not hyde_rag:
raise HTTPException(status_code=503, detail="Service not initialized")
if len(request.documents) != len(request.sources):
raise HTTPException(
status_code=400,
detail="Documents and sources must have same length"
)
hyde_rag.add_documents(request.documents, request.sources)
return {"status": "success", "documents_added": len(request.documents)}
@app.get("/health")
async def health():
return {"status": "healthy", "service": "hyde-rag"}

Step 7: Requirements
# requirements.txt
openai>=1.12.0
chromadb>=0.4.22
numpy>=1.24.0
pydantic>=2.0.0
pydantic-settings>=2.0.0
fastapi>=0.109.0
uvicorn>=0.27.0
python-dotenv>=1.0.0

Usage Examples
Basic HyDE Query
from rag_pipeline import HyDERAG
# Initialize
rag = HyDERAG()
# Add documents
rag.add_documents(
documents=["Your document content here..."],
sources=["source_name"]
)
# Query with HyDE
result = rag.query(
question="How does rain form?",
method="hyde"
)
print(f"Answer: {result.answer}")
print(f"Hypothesis used: {result.hypotheses[0]}")

Compare Methods
from evaluate import HyDEEvaluator
evaluator = HyDEEvaluator()
# Compare for a single query
comparison = evaluator.compare_methods(
query="What is machine learning?",
k=5
)
print(f"HyDE avg distance: {comparison['hyde']['avg_distance']:.4f}")
print(f"Traditional avg distance: {comparison['traditional']['avg_distance']:.4f}")
print(f"Improvement: {comparison['hyde_improvement']['avg_distance_reduction']:.4f}")

Multi-Hypothesis HyDE
# Generate multiple hypotheses for robustness
result = rag.query(
question="Explain quantum computing",
method="hyde",
num_hypotheses=3 # Average embeddings from 3 hypotheses
)

API Usage
# Start server
uvicorn app:app --reload
# HyDE query
curl -X POST http://localhost:8000/query \
-H "Content-Type: application/json" \
-d '{"query": "How do plants make food?", "method": "hyde"}'
# Compare methods
curl -X POST http://localhost:8000/compare \
-H "Content-Type: application/json" \
-d '{"query": "What is the immune system?", "k": 5}'

How HyDE Works
┌─────────────────────────────────────────────────────────────────┐
│ HYDE PIPELINE │
│ │
│ User Query │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ LLM Generation │ ← Generate what a good answer looks like │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────────────┐ │
│ │ Hypothetical Document │ ← Key insight: document-like text │
│ │ "The answer would be..."│ │
│ └────────┬────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ Embed Hypothesis│ ← Now in document embedding space │
│ └────────┬────────┘ │
│ │ │
│ ▼ │
│ ┌─────────┐ │
│ │Vector DB│ │
│ └────┬────┘ │
│ │ │
│ ▼ │
│ Retrieved Documents ──► Generate Final Answer │
└─────────────────────────────────────────────────────────────────┘

Why It Works
| Aspect | Query Embedding | HyDE Embedding |
|---|---|---|
| Format | Question | Document-like |
| Length | Short | Paragraph |
| Style | Interrogative | Declarative |
| Semantics | What user wants | What answer looks like |
The hypothesis bridges the semantic gap between questions and documents.
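The effect can be made concrete with a toy example that uses word-count vectors in place of learned embeddings (a deliberate simplification; real embeddings capture far more than vocabulary overlap, but the direction of the effect is the same):

```python
from collections import Counter
from math import sqrt

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm

doc = Counter("precipitation occurs when water vapor condenses in the atmosphere and falls as rain".split())
query = Counter("what causes rain".split())
hypothesis = Counter("rain forms when water vapor condenses in the atmosphere and falls as precipitation".split())

print(f"query      vs doc: {cosine(query, doc):.2f}")       # → 0.16
print(f"hypothesis vs doc: {cosine(hypothesis, doc):.2f}")  # → 0.92
```

The query shares only one token with the document ("rain"), while the hypothesis shares nearly its entire vocabulary, so it lands much closer to the document in vector space.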
When to Use HyDE
| Scenario | HyDE Benefit |
|---|---|
| Short queries | Expands to full context |
| Domain-specific | Generates domain terminology |
| Abstract questions | Creates concrete examples |
| Keyword mismatch | Bridges vocabulary gap |
When Traditional May Be Better
- Exact keyword matching needed
- Very specific technical queries
- Latency-critical applications (HyDE adds LLM call)
Key Concepts Recap
| Concept | What It Is | Why It Matters |
|---|---|---|
| Semantic Gap | Queries ≠ documents in style and length | Embeddings aren't optimally similar |
| Hypothetical Document | LLM-generated answer used for retrieval | Bridges the query-document gap |
| Embedding Averaging | Mean of multiple hypothesis embeddings | Reduces bias, improves robustness |
| Domain Prompting | Tailored generation for specific fields | Matches your document vocabulary |
| Hybrid Retrieval | Combine HyDE + traditional with RRF | Get semantic + keyword matching |
| Latency Trade-off | HyDE adds one LLM call before search | Better results, ~500ms extra |
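The hybrid-retrieval row above relies on weighted reciprocal rank fusion; stripped of ChromaDB details, the scoring in `retrieve_hybrid` reduces to the following sketch (document IDs are illustrative):

```python
# Standalone sketch of the weighted RRF used in retrieve_hybrid.
def rrf_fuse(hyde_ranked, trad_ranked, hyde_weight=0.7, k=60):
    """Fuse two ranked lists of doc IDs; higher fused score ranks first."""
    scores: dict[str, float] = {}
    for rank, doc in enumerate(hyde_ranked):
        scores[doc] = scores.get(doc, 0.0) + hyde_weight / (rank + k)
    for rank, doc in enumerate(trad_ranked):
        scores[doc] = scores.get(doc, 0.0) + (1 - hyde_weight) / (rank + k)
    return sorted(scores, key=scores.get, reverse=True)

fused = rrf_fuse(
    hyde_ranked=["doc_a", "doc_b", "doc_c"],
    trad_ranked=["doc_c", "doc_a", "doc_d"],
)
print(fused)  # → ['doc_a', 'doc_c', 'doc_b', 'doc_d']
```

`doc_a` wins because it appears near the top of both lists; the constant `k=60` dampens the influence of any single ranking, which is the standard RRF choice.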
Next Steps
- Explore Self-RAG for self-correction
- Try Hybrid Search for keyword + semantic
- Learn RAG with Reranking for better ranking
- Build Agentic RAG for multi-step retrieval