Build a complete semantic search system using sentence-transformers and vector similarity

Semantic Search Engine

Build a semantic search system that understands meaning, not just keywords

TL;DR

Text embeddings convert words into vectors where similar meanings are close together. "Cat" and "feline" become nearby points in vector space, enabling search that understands semantics, not just keyword matching.

What You'll Learn

How text embeddings capture semantic meaning
Using sentence-transformers for high-quality embeddings
Implementing cosine similarity for relevance ranking
Building a complete search API with FastAPI

Tech Stack

Component	Technology
Embeddings	sentence-transformers
Vector Storage	NumPy / ChromaDB
API	FastAPI
Frontend	Streamlit

Prerequisites

Python 3.9+
Basic understanding of vectors
Familiarity with REST APIs

Understanding Embeddings

┌─────────────────────────────────────────────────────────────────────────────┐
│                        HOW EMBEDDINGS WORK                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  INPUT TEXT                    MODEL              OUTPUT VECTORS            │
│                                                                             │
│  "The cat sat on the mat"  ─┐                ┌─► [0.2, 0.8, -0.1, ...]      │
│                              │  ┌──────────┐ │                              │
│  "A feline rested on a rug" ─┼─►│Embedding │─┼─► [0.19, 0.82, -0.08, ...]   │
│                              │  │  Model   │ │           ▲                  │
│  "Dogs love playing fetch"  ─┘  └──────────┘ └─► [-0.5, 0.1, 0.9, ...]      │
│                                                          │                  │
│                                                          │                  │
│  SIMILARITY IN VECTOR SPACE:                             │                  │
│                                                          │                  │
│  Cat sentence ●───────● Feline sentence                  │                  │
│               (CLOSE - similar meaning!)                 │                  │
│                                                          │                  │
│  Cat sentence ●                    ● Dog sentence        │                  │
│               (FAR APART - different topics)             │                  │
│                                                                             │
│  Key insight: Similar meanings → Similar vectors → Easy to find matches!    │
└─────────────────────────────────────────────────────────────────────────────┘

Embeddings transform text into dense vectors where similar meanings are close together in vector space. This enables semantic search that understands "cat" and "feline" are related, even without shared keywords.

Project Structure

semantic-search/
├── src/
│   ├── __init__.py
│   ├── embeddings.py      # Embedding generation
│   ├── search.py          # Search engine logic
│   ├── indexer.py         # Document indexing
│   └── api.py             # FastAPI application
├── data/
│   └── documents.json     # Sample documents
├── tests/
│   └── test_search.py
├── app.py                 # Streamlit frontend
├── requirements.txt
└── README.md

Implementation

Step 1: Project Setup

Create your project and install dependencies:

mkdir semantic-search && cd semantic-search
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

requirements.txt

sentence-transformers>=2.2.0
numpy>=1.24.0
fastapi>=0.100.0
uvicorn>=0.23.0
streamlit>=1.28.0
chromadb>=0.4.0
pydantic>=2.0.0

pip install -r requirements.txt

Step 2: Embedding Generator

Create the core embedding functionality:

src/embeddings.py

"""
Embedding generation using sentence-transformers.

Sentence-transformers provides state-of-the-art embeddings
optimized for semantic similarity tasks.
"""

from sentence_transformers import SentenceTransformer
import numpy as np
from typing import Union


class EmbeddingGenerator:
    """Generate embeddings for text using sentence-transformers."""
    
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        """
        Initialize the embedding generator.
        
        Args:
            model_name: The sentence-transformer model to use.
                - all-MiniLM-L6-v2: Fast, good quality (384 dims)
                - all-mpnet-base-v2: Best quality (768 dims)
                - paraphrase-multilingual-MiniLM-L12-v2: Multilingual
        """
        self.model = SentenceTransformer(model_name)
        self.embedding_dim = self.model.get_sentence_embedding_dimension()
    
    def embed(self, texts: Union[str, list[str]]) -> np.ndarray:
        """
        Generate embeddings for one or more texts.
        
        Args:
            texts: Single text or list of texts to embed
            
        Returns:
            Numpy array of shape (n_texts, embedding_dim)
        """
        if isinstance(texts, str):
            texts = [texts]
        
        embeddings = self.model.encode(
            texts,
            convert_to_numpy=True,
            normalize_embeddings=True,  # For cosine similarity
            show_progress_bar=len(texts) > 10
        )
        
        return embeddings
    
    def embed_batch(
        self, 
        texts: list[str], 
        batch_size: int = 32
    ) -> np.ndarray:
        """
        Generate embeddings in batches for large datasets.
        
        Args:
            texts: List of texts to embed
            batch_size: Number of texts per batch
            
        Returns:
            Numpy array of embeddings
        """
        return self.model.encode(
            texts,
            convert_to_numpy=True,
            normalize_embeddings=True,
            batch_size=batch_size,
            show_progress_bar=True
        )


# Quick test
if __name__ == "__main__":
    generator = EmbeddingGenerator()
    
    texts = [
        "The quick brown fox jumps over the lazy dog",
        "A fast auburn fox leaps above a sleepy canine",
        "Python is a programming language"
    ]
    
    embeddings = generator.embed(texts)
    print(f"Embedding shape: {embeddings.shape}")
    print(f"Embedding dimension: {generator.embedding_dim}")
    
    # Check similarity
    from numpy.linalg import norm
    
    def cosine_similarity(a, b):
        return np.dot(a, b) / (norm(a) * norm(b))
    
    print(f"\nSimilarity (text 0 vs 1): {cosine_similarity(embeddings[0], embeddings[1]):.4f}")
    print(f"Similarity (text 0 vs 2): {cosine_similarity(embeddings[0], embeddings[2]):.4f}")

Understanding the Embedding Generator:

┌─────────────────────────────────────────────────────────────────────────────┐
│ HOW SENTENCE-TRANSFORMERS WORKS                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Input Text: "The cat sat on the mat"                                      │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Tokenizer                                   │                           │
│  │ Split into tokens: ["The", "cat", "sat"...] │                           │
│  └─────────────────────────────────────────────┘                           │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Transformer Encoder (BERT-based)            │                           │
│  │ Processes all tokens, captures context      │                           │
│  └─────────────────────────────────────────────┘                           │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Pooling Layer                               │                           │
│  │ Average all token embeddings into one       │                           │
│  └─────────────────────────────────────────────┘                           │
│         │                                                                   │
│         ▼                                                                   │
│  Output: [0.23, -0.45, 0.12, ...] (384 dimensions)                         │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Parameter	Why We Use It
`normalize_embeddings=True`	Converts to unit vectors so dot product = cosine similarity (faster)
`convert_to_numpy=True`	NumPy arrays are faster for math operations than PyTorch tensors
`show_progress_bar=len(texts) > 10`	Only show progress for large batches to avoid noise
`batch_size=32`	Process 32 texts at once for GPU efficiency

Model Choice Explained:

all-MiniLM-L6-v2 is the default because it's 6x faster than larger models while maintaining 90%+ of the quality
For maximum accuracy, use all-mpnet-base-v2 (2x the dimensions, 2x the compute)

Step 3: Search Engine

Build the search functionality:

src/search.py

"""
Semantic search engine using cosine similarity.
"""

import numpy as np
from dataclasses import dataclass
from typing import Optional

from .embeddings import EmbeddingGenerator


@dataclass
class SearchResult:
    """A single search result."""
    id: str
    text: str
    score: float
    metadata: dict


class SemanticSearchEngine:
    """
    A semantic search engine using dense embeddings.
    
    This implementation stores vectors in memory using NumPy.
    For production, consider using a vector database like
    ChromaDB, Pinecone, or Weaviate.
    """
    
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        """Initialize the search engine."""
        self.embedder = EmbeddingGenerator(model_name)
        self.documents: list[dict] = []
        self.embeddings: Optional[np.ndarray] = None
    
    def add_documents(self, documents: list[dict]) -> None:
        """
        Add documents to the search index.
        
        Args:
            documents: List of dicts with 'id', 'text', and optional 'metadata'
        """
        texts = [doc["text"] for doc in documents]
        new_embeddings = self.embedder.embed(texts)
        
        if self.embeddings is None:
            self.embeddings = new_embeddings
        else:
            self.embeddings = np.vstack([self.embeddings, new_embeddings])
        
        self.documents.extend(documents)
        print(f"Indexed {len(documents)} documents. Total: {len(self.documents)}")
    
    def search(
        self, 
        query: str, 
        top_k: int = 5,
        threshold: float = 0.0
    ) -> list[SearchResult]:
        """
        Search for documents similar to the query.
        
        Args:
            query: The search query
            top_k: Number of results to return
            threshold: Minimum similarity score (0-1)
            
        Returns:
            List of SearchResult objects sorted by relevance
        """
        if self.embeddings is None or len(self.documents) == 0:
            return []
        
        # Embed the query
        query_embedding = self.embedder.embed(query)[0]
        
        # Compute cosine similarity with all documents
        # Since embeddings are normalized, dot product = cosine similarity
        similarities = np.dot(self.embeddings, query_embedding)
        
        # Get top-k indices
        top_indices = np.argsort(similarities)[::-1][:top_k]
        
        # Build results
        results = []
        for idx in top_indices:
            score = float(similarities[idx])
            if score >= threshold:
                doc = self.documents[idx]
                results.append(SearchResult(
                    id=doc.get("id", str(idx)),
                    text=doc["text"],
                    score=score,
                    metadata=doc.get("metadata", {})
                ))
        
        return results
    
    def search_with_filter(
        self,
        query: str,
        filter_fn: callable,
        top_k: int = 5
    ) -> list[SearchResult]:
        """
        Search with a metadata filter.
        
        Args:
            query: The search query
            filter_fn: Function that takes a document and returns True/False
            top_k: Number of results to return
        """
        # Filter documents
        filtered_indices = [
            i for i, doc in enumerate(self.documents)
            if filter_fn(doc)
        ]
        
        if not filtered_indices:
            return []
        
        # Embed query
        query_embedding = self.embedder.embed(query)[0]
        
        # Compute similarities only for filtered documents
        filtered_embeddings = self.embeddings[filtered_indices]
        similarities = np.dot(filtered_embeddings, query_embedding)
        
        # Get top-k from filtered
        top_k = min(top_k, len(filtered_indices))
        top_local_indices = np.argsort(similarities)[::-1][:top_k]
        
        results = []
        for local_idx in top_local_indices:
            global_idx = filtered_indices[local_idx]
            doc = self.documents[global_idx]
            results.append(SearchResult(
                id=doc.get("id", str(global_idx)),
                text=doc["text"],
                score=float(similarities[local_idx]),
                metadata=doc.get("metadata", {})
            ))
        
        return results
    
    def save(self, path: str) -> None:
        """Save the index to disk."""
        import json
        
        np.save(f"{path}_embeddings.npy", self.embeddings)
        with open(f"{path}_documents.json", "w") as f:
            json.dump(self.documents, f)
    
    def load(self, path: str) -> None:
        """Load the index from disk."""
        import json
        
        self.embeddings = np.load(f"{path}_embeddings.npy")
        with open(f"{path}_documents.json", "r") as f:
            self.documents = json.load(f)


# Example usage
if __name__ == "__main__":
    engine = SemanticSearchEngine()
    
    # Add sample documents
    documents = [
        {"id": "1", "text": "Python is a versatile programming language", "metadata": {"category": "tech"}},
        {"id": "2", "text": "Machine learning enables computers to learn from data", "metadata": {"category": "ai"}},
        {"id": "3", "text": "Natural language processing helps computers understand text", "metadata": {"category": "ai"}},
        {"id": "4", "text": "Web development involves creating websites and applications", "metadata": {"category": "tech"}},
        {"id": "5", "text": "Deep learning uses neural networks with many layers", "metadata": {"category": "ai"}},
    ]
    
    engine.add_documents(documents)
    
    # Search
    results = engine.search("How do computers understand human language?", top_k=3)
    
    print("\nSearch Results:")
    for result in results:
        print(f"  [{result.score:.4f}] {result.text}")

Understanding the Search Engine:

┌─────────────────────────────────────────────────────────────────────────────┐
│ SEARCH FLOW: How search() Works                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Query: "How do computers understand language?"                            │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Step 1: Embed the Query                     │                           │
│  │ query_embedding = [0.15, -0.32, 0.78, ...]  │                           │
│  └─────────────────────────────────────────────┘                           │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Step 2: Compute Similarity with ALL Docs    │                           │
│  │                                             │                           │
│  │   Doc 1: "Python is..."     → sim = 0.23   │                           │
│  │   Doc 2: "Machine learning" → sim = 0.67   │                           │
│  │   Doc 3: "NLP helps..."     → sim = 0.89   │  ← Highest!               │
│  │   Doc 4: "Web development"  → sim = 0.12   │                           │
│  │   Doc 5: "Deep learning"    → sim = 0.71   │                           │
│  └─────────────────────────────────────────────┘                           │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Step 3: Sort by Score, Return Top K         │                           │
│  │                                             │                           │
│  │   1. Doc 3 (0.89) - NLP                     │                           │
│  │   2. Doc 5 (0.71) - Deep learning           │                           │
│  │   3. Doc 2 (0.67) - Machine learning        │                           │
│  └─────────────────────────────────────────────┘                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Why Dot Product = Cosine Similarity (When Normalized):

┌─────────────────────────────────────────────────────────────────────────────┐
│ THE NORMALIZATION TRICK                                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Cosine Similarity Formula:                                                │
│                                                                             │
│            A · B           dot product                                     │
│  cos(θ) = ─────────── = ─────────────────                                  │
│           ‖A‖ × ‖B‖     length A × length B                                │
│                                                                             │
│  When vectors are NORMALIZED (length = 1):                                 │
│                                                                             │
│            A · B         A · B                                             │
│  cos(θ) = ─────────── = ─────── = A · B   ← Just dot product!             │
│            1 × 1           1                                               │
│                                                                             │
│  PERFORMANCE IMPACT:                                                       │
│  • Without normalization: 3 operations (dot + 2 norms)                     │
│  • With normalization: 1 operation (dot only)                              │
│  • 3x faster for similarity computation!                                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Method	What It Does	When to Use
`search()`	Find similar docs from entire index	General search
`search_with_filter()`	Search only docs matching filter	Category/metadata filtering
`save()` / `load()`	Persist index to disk	Production use

Step 4: Document Indexer

Create a utility to index documents from files:

src/indexer.py

"""
Document indexer for loading and processing documents.
"""

import json
from pathlib import Path
from typing import Iterator
import hashlib


def generate_doc_id(text: str) -> str:
    """Generate a unique ID from text content."""
    return hashlib.md5(text.encode()).hexdigest()[:12]


def load_json_documents(path: str) -> list[dict]:
    """
    Load documents from a JSON file.
    
    Expected format:
    [
        {"text": "...", "metadata": {...}},
        ...
    ]
    """
    with open(path, "r") as f:
        data = json.load(f)
    
    documents = []
    for i, item in enumerate(data):
        doc = {
            "id": item.get("id", generate_doc_id(item["text"])),
            "text": item["text"],
            "metadata": item.get("metadata", {})
        }
        documents.append(doc)
    
    return documents


def load_text_files(directory: str, pattern: str = "*.txt") -> list[dict]:
    """
    Load documents from text files in a directory.
    
    Each file becomes one document.
    """
    documents = []
    path = Path(directory)
    
    for file_path in path.glob(pattern):
        text = file_path.read_text()
        documents.append({
            "id": generate_doc_id(text),
            "text": text,
            "metadata": {
                "filename": file_path.name,
                "path": str(file_path)
            }
        })
    
    return documents


def chunk_text(
    text: str,
    chunk_size: int = 500,
    overlap: int = 50
) -> Iterator[str]:
    """
    Split text into overlapping chunks.
    
    Args:
        text: The text to split
        chunk_size: Maximum characters per chunk
        overlap: Number of characters to overlap
        
    Yields:
        Text chunks
    """
    start = 0
    text_length = len(text)
    
    while start < text_length:
        end = start + chunk_size
        
        # Try to break at a sentence boundary
        if end < text_length:
            # Look for sentence endings
            for sep in [". ", "! ", "? ", "\n\n", "\n"]:
                last_sep = text[start:end].rfind(sep)
                if last_sep != -1:
                    end = start + last_sep + len(sep)
                    break
        
        yield text[start:end].strip()
        start = end - overlap


def load_and_chunk_documents(
    path: str,
    chunk_size: int = 500,
    overlap: int = 50
) -> list[dict]:
    """
    Load documents and split into chunks.
    
    Useful for long documents where you want to
    search within specific sections.
    """
    documents = load_json_documents(path)
    chunked = []
    
    for doc in documents:
        chunks = list(chunk_text(doc["text"], chunk_size, overlap))
        
        for i, chunk in enumerate(chunks):
            chunked.append({
                "id": f"{doc['id']}_chunk_{i}",
                "text": chunk,
                "metadata": {
                    **doc.get("metadata", {}),
                    "parent_id": doc["id"],
                    "chunk_index": i,
                    "total_chunks": len(chunks)
                }
            })

    return chunked

Understanding Document Chunking:

┌─────────────────────────────────────────────────────────────────────────────┐
│ WHY CHUNK DOCUMENTS?                                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Problem: Long documents dilute the embedding                              │
│                                                                             │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ Long Document (10 pages)                                             │  │
│  │ ┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────┐ │  │
│  │ │ Intro   │ Topic A │ Topic B │ Topic C │ Topic D │ Topic E │ ... │ │  │
│  │ └─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────┘ │  │
│  │                              │                                       │  │
│  │                              ▼                                       │  │
│  │              Single Embedding = AVERAGE of everything                │  │
│  │              (Too generic, loses specific details)                   │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
│  Solution: Split into chunks, embed each separately                        │
│                                                                             │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐                          │
│  │ Chunk 1 │ │ Chunk 2 │ │ Chunk 3 │ │ Chunk 4 │                          │
│  │ Intro   │ │ Topic A │ │ Topic B │ │ Topic C │                          │
│  └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘                          │
│       ▼           ▼           ▼           ▼                               │
│    Embed 1     Embed 2     Embed 3     Embed 4                            │
│   (specific)  (specific)  (specific)  (specific)                          │
│                                                                             │
│  Now queries match SPECIFIC sections, not the whole doc!                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Chunk Overlap Explained:

┌─────────────────────────────────────────────────────────────────────────────┐
│ WHY OVERLAP CHUNKS?                                                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Without Overlap (chunk_size=500, overlap=0):                              │
│                                                                             │
│  "The cat sat on the | mat and looked at | the window."                    │
│        Chunk 1       |      Chunk 2      |   Chunk 3                       │
│                      ↑                                                      │
│            Sentence cut in half! Context lost.                             │
│                                                                             │
│  With Overlap (chunk_size=500, overlap=50):                                │
│                                                                             │
│  "The cat sat on the mat | mat and looked at the | the window and..."      │
│        Chunk 1           |      Chunk 2          |   Chunk 3               │
│                     └─────┘                 └────┘                          │
│                     Overlap                 Overlap                         │
│                                                                             │
│  Benefits:                                                                 │
│  • Sentences aren't cut mid-thought                                        │
│  • Context preserved at boundaries                                         │
│  • Trade-off: Slightly more storage (10-20% more chunks)                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Parameter	Recommended Value	Why
`chunk_size`	500-1000 chars	Large enough for context, small enough for specificity
`overlap`	10-20% of chunk_size	Preserve sentence boundaries
Sentence-aware splitting	Yes	Try to break at `.` , `\n`, etc.

Step 5: FastAPI Application

Build the REST API:

src/api.py

"""
FastAPI application for semantic search.
"""

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from typing import Optional

from .search import SemanticSearchEngine, SearchResult
from .indexer import load_json_documents


# Request/Response models
class SearchRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=1000)
    top_k: int = Field(default=5, ge=1, le=100)
    threshold: float = Field(default=0.0, ge=0.0, le=1.0)
    category: Optional[str] = None


class SearchResultResponse(BaseModel):
    id: str
    text: str
    score: float
    metadata: dict


class SearchResponse(BaseModel):
    query: str
    results: list[SearchResultResponse]
    total: int


class DocumentRequest(BaseModel):
    id: Optional[str] = None
    text: str
    metadata: dict = {}


class IndexResponse(BaseModel):
    message: str
    document_count: int


# Initialize app and search engine
app = FastAPI(
    title="Semantic Search API",
    description="Search documents using semantic similarity",
    version="1.0.0"
)

# Global search engine instance
engine = SemanticSearchEngine()


@app.on_event("startup")
async def startup():
    """Load initial documents on startup."""
    try:
        engine.load("data/index")
        print(f"Loaded {len(engine.documents)} documents from disk")
    except FileNotFoundError:
        print("No existing index found. Starting fresh.")


@app.get("/")
async def root():
    """Health check endpoint."""
    return {
        "status": "healthy",
        "documents_indexed": len(engine.documents)
    }


@app.post("/search", response_model=SearchResponse)
async def search(request: SearchRequest):
    """
    Search for documents similar to the query.
    
    - **query**: The search query (required)
    - **top_k**: Number of results to return (default: 5)
    - **threshold**: Minimum similarity score 0-1 (default: 0)
    - **category**: Filter by metadata category (optional)
    """
    if request.category:
        results = engine.search_with_filter(
            query=request.query,
            filter_fn=lambda doc: doc.get("metadata", {}).get("category") == request.category,
            top_k=request.top_k
        )
    else:
        results = engine.search(
            query=request.query,
            top_k=request.top_k,
            threshold=request.threshold
        )
    
    return SearchResponse(
        query=request.query,
        results=[
            SearchResultResponse(
                id=r.id,
                text=r.text,
                score=r.score,
                metadata=r.metadata
            )
            for r in results
        ],
        total=len(results)
    )


@app.post("/index", response_model=IndexResponse)
async def index_documents(documents: list[DocumentRequest]):
    """
    Add documents to the search index.
    
    Each document should have:
    - **text**: The document content (required)
    - **id**: Unique identifier (optional, auto-generated if not provided)
    - **metadata**: Additional metadata dict (optional)
    """
    docs = [
        {
            "id": doc.id or f"doc_{len(engine.documents) + i}",
            "text": doc.text,
            "metadata": doc.metadata
        }
        for i, doc in enumerate(documents)
    ]
    
    engine.add_documents(docs)
    
    return IndexResponse(
        message=f"Indexed {len(documents)} documents",
        document_count=len(engine.documents)
    )


@app.post("/index/file")
async def index_from_file(path: str):
    """Index documents from a JSON file."""
    try:
        documents = load_json_documents(path)
        engine.add_documents(documents)
        return IndexResponse(
            message=f"Indexed {len(documents)} documents from {path}",
            document_count=len(engine.documents)
        )
    except FileNotFoundError:
        raise HTTPException(status_code=404, detail=f"File not found: {path}")


@app.post("/save")
async def save_index(path: str = "data/index"):
    """Save the current index to disk."""
    engine.save(path)
    return {"message": f"Index saved to {path}"}


@app.get("/stats")
async def get_stats():
    """Get index statistics."""
    return {
        "total_documents": len(engine.documents),
        "embedding_dimension": engine.embedder.embedding_dim,
        "model": engine.embedder.model.get_config_dict().get("name_or_path", "unknown")
    }

Understanding the API Architecture:

┌─────────────────────────────────────────────────────────────────────────────┐
│ API ENDPOINTS OVERVIEW                                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Client (Browser/curl/Frontend)                                            │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ FastAPI Application                                                  │   │
│  │                                                                      │   │
│  │  GET  /           → Health check (is server running?)               │   │
│  │  POST /search     → Find similar documents                          │   │
│  │  POST /index      → Add new documents                               │   │
│  │  POST /index/file → Bulk load from JSON file                        │   │
│  │  POST /save       → Persist index to disk                           │   │
│  │  GET  /stats      → Get index statistics                            │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ SemanticSearchEngine (In-Memory)                                    │   │
│  │                                                                      │   │
│  │  documents: [...]        embeddings: numpy array                    │   │
│  │  embedder: SentenceTransformer model                                │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Pydantic Models Explained:

┌─────────────────────────────────────────────────────────────────────────────┐
│ WHY PYDANTIC FOR REQUEST/RESPONSE MODELS?                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. AUTOMATIC VALIDATION                                                   │
│     SearchRequest(query="", top_k=-5)                                      │
│     → Error: "query must be at least 1 character"                          │
│     → Error: "top_k must be >= 1"                                          │
│                                                                             │
│  2. AUTOMATIC DOCUMENTATION                                                │
│     FastAPI generates OpenAPI docs at /docs                                │
│     All fields, types, and constraints are documented                      │
│                                                                             │
│  3. TYPE SAFETY                                                            │
│     IDE autocomplete and type checking work                                │
│     Catch errors before runtime                                            │
│                                                                             │
│  4. SERIALIZATION                                                          │
│     Python objects ↔ JSON automatically                                    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Endpoint	Method	Purpose	Key Parameters
`/search`	POST	Find similar docs	query, top_k, threshold, category
`/index`	POST	Add documents	list of text and metadata objects
`/stats`	GET	Index info	None

Step 6: Streamlit Frontend

Create an interactive UI:

app.py

"""
Streamlit frontend for semantic search.
"""

import streamlit as st
import requests
from typing import Optional

# Configuration
API_URL = "http://localhost:8000"

st.set_page_config(
    page_title="Semantic Search",
    page_icon="🔍",
    layout="wide"
)


def search(query: str, top_k: int = 5, threshold: float = 0.0) -> dict:
    """Call the search API."""
    response = requests.post(
        f"{API_URL}/search",
        json={
            "query": query,
            "top_k": top_k,
            "threshold": threshold
        }
    )
    return response.json()


def get_stats() -> dict:
    """Get index statistics."""
    try:
        response = requests.get(f"{API_URL}/stats")
        return response.json()
    except:
        return {"error": "Could not connect to API"}


def main():
    st.title("🔍 Semantic Search Engine")
    st.markdown("Search documents using AI-powered semantic similarity")
    
    # Sidebar
    with st.sidebar:
        st.header("Settings")
        top_k = st.slider("Number of results", 1, 20, 5)
        threshold = st.slider("Minimum score", 0.0, 1.0, 0.0, 0.05)
        
        st.divider()
        
        st.header("Index Stats")
        stats = get_stats()
        if "error" not in stats:
            st.metric("Documents Indexed", stats.get("total_documents", 0))
            st.metric("Embedding Dimension", stats.get("embedding_dimension", 0))
            st.caption(f"Model: {stats.get('model', 'unknown')}")
        else:
            st.error("API not available")
    
    # Main search interface
    query = st.text_input(
        "Enter your search query",
        placeholder="What would you like to find?"
    )
    
    col1, col2 = st.columns([1, 5])
    with col1:
        search_button = st.button("🔍 Search", type="primary")
    
    if search_button and query:
        with st.spinner("Searching..."):
            results = search(query, top_k, threshold)
        
        if results.get("results"):
            st.success(f"Found {results['total']} results")
            
            for i, result in enumerate(results["results"], 1):
                score = result["score"]
                
                # Color code by score
                if score >= 0.7:
                    score_color = "🟢"
                elif score >= 0.4:
                    score_color = "🟡"
                else:
                    score_color = "🔴"
                
                with st.expander(
                    f"{score_color} Result {i} (Score: {score:.4f})",
                    expanded=i <= 3
                ):
                    st.markdown(f"**Text:** {result['text']}")
                    st.markdown(f"**ID:** `{result['id']}`")
                    if result.get("metadata"):
                        st.json(result["metadata"])
        else:
            st.warning("No results found. Try a different query.")
    
    # Document upload section
    st.divider()
    st.header("Add Documents")
    
    with st.form("add_document"):
        doc_text = st.text_area(
            "Document text",
            placeholder="Enter the document content..."
        )
        doc_metadata = st.text_input(
            "Metadata (JSON)",
            placeholder='{"category": "tech"}'
        )
        
        if st.form_submit_button("Add Document"):
            if doc_text:
                import json
                metadata = {}
                if doc_metadata:
                    try:
                        metadata = json.loads(doc_metadata)
                    except:
                        st.error("Invalid JSON in metadata")
                        return
                
                response = requests.post(
                    f"{API_URL}/index",
                    json=[{"text": doc_text, "metadata": metadata}]
                )
                
                if response.status_code == 200:
                    st.success("Document added successfully!")
                    st.rerun()
                else:
                    st.error("Failed to add document")


if __name__ == "__main__":
    main()

Understanding the Streamlit UI:

┌─────────────────────────────────────────────────────────────────────────────┐
│ STREAMLIT ARCHITECTURE                                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ Browser                                                              │   │
│  │  ┌─────────────┐  ┌──────────────────────────────────────────────┐  │   │
│  │  │  Sidebar    │  │  Main Content                                │  │   │
│  │  │             │  │                                              │  │   │
│  │  │  Settings:  │  │  🔍 Search Box                               │  │   │
│  │  │  • top_k    │  │  ┌──────────────────────────────────────┐   │  │   │
│  │  │  • threshold│  │  │ Enter your search query...           │   │  │   │
│  │  │             │  │  └──────────────────────────────────────┘   │  │   │
│  │  │  Stats:     │  │                                              │  │   │
│  │  │  • 10 docs  │  │  Results:                                   │  │   │
│  │  │  • 384 dims │  │  🟢 Result 1 (0.89) - NLP helps...          │  │   │
│  │  │             │  │  🟡 Result 2 (0.67) - Machine learning...   │  │   │
│  │  └─────────────┘  │  🔴 Result 3 (0.23) - Web development...    │  │   │
│  │                   └──────────────────────────────────────────────┘  │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│         │                                                                   │
│         │ HTTP requests                                                     │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ FastAPI Backend (localhost:8000)                                    │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Score Color Coding Logic:

Score	Color	Meaning
0.7+	🟢 Green	Highly relevant - confident match
0.4-0.7	🟡 Yellow	Somewhat relevant - may be useful
Under 0.4	🔴 Red	Low relevance - likely not what user wants

Why Streamlit?

Rapid prototyping (full UI in ~100 lines)
No JavaScript required
Automatic reactivity (UI updates when data changes)
Built-in widgets (sliders, text inputs, expanders)

Step 7: Sample Documents

Create sample data to test with:

data/documents.json

[
  {
    "text": "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.",
    "metadata": {"category": "ai", "topic": "machine-learning"}
  },
  {
    "text": "Natural language processing (NLP) is a field of AI focused on enabling computers to understand, interpret, and generate human language.",
    "metadata": {"category": "ai", "topic": "nlp"}
  },
  {
    "text": "Deep learning uses artificial neural networks with multiple layers to progressively extract higher-level features from raw input.",
    "metadata": {"category": "ai", "topic": "deep-learning"}
  },
  {
    "text": "Python is a high-level, interpreted programming language known for its readability and versatility in web development, data science, and automation.",
    "metadata": {"category": "programming", "topic": "python"}
  },
  {
    "text": "Vector databases are specialized database systems designed to store and query high-dimensional vector data efficiently using similarity search.",
    "metadata": {"category": "databases", "topic": "vector-db"}
  },
  {
    "text": "Transformers are a neural network architecture that uses self-attention mechanisms to process sequential data in parallel.",
    "metadata": {"category": "ai", "topic": "transformers"}
  },
  {
    "text": "Embeddings are dense vector representations of data that capture semantic meaning, enabling similarity comparisons between items.",
    "metadata": {"category": "ai", "topic": "embeddings"}
  },
  {
    "text": "FastAPI is a modern Python web framework for building APIs with automatic documentation and type validation.",
    "metadata": {"category": "programming", "topic": "fastapi"}
  },
  {
    "text": "Cosine similarity measures the cosine of the angle between two vectors, commonly used to determine how similar two documents are.",
    "metadata": {"category": "math", "topic": "similarity"}
  },
  {
    "text": "Retrieval-Augmented Generation (RAG) combines information retrieval with language models to generate more accurate and grounded responses.",
    "metadata": {"category": "ai", "topic": "rag"}
  }
]

Running the Application

Start the API Server

# From project root
uvicorn src.api:app --reload --host 0.0.0.0 --port 8000

Index Sample Documents

# Using curl
curl -X POST "http://localhost:8000/index/file?path=data/documents.json"

Start the Frontend

# In a new terminal
streamlit run app.py

Test with curl

# Search for documents
curl -X POST "http://localhost:8000/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "How do computers understand text?", "top_k": 3}'

# Add a document
curl -X POST "http://localhost:8000/index" \
  -H "Content-Type: application/json" \
  -d '[{"text": "Docker containers package applications with their dependencies", "metadata": {"category": "devops"}}]'

Testing

tests/test_search.py

"""Tests for semantic search engine."""

import pytest
from src.search import SemanticSearchEngine


@pytest.fixture
def engine():
    """Create a search engine with test documents."""
    engine = SemanticSearchEngine()
    engine.add_documents([
        {"id": "1", "text": "Python programming language", "metadata": {"category": "tech"}},
        {"id": "2", "text": "Machine learning algorithms", "metadata": {"category": "ai"}},
        {"id": "3", "text": "Data science and analytics", "metadata": {"category": "data"}},
    ])
    return engine


def test_search_returns_results(engine):
    """Test that search returns relevant results."""
    results = engine.search("programming in Python")
    
    assert len(results) > 0
    assert results[0].id == "1"  # Python doc should be most relevant


def test_search_with_threshold(engine):
    """Test that threshold filters low-scoring results."""
    results = engine.search("unrelated query about cooking", threshold=0.5)
    
    # Should filter out low-scoring results
    assert all(r.score >= 0.5 for r in results)


def test_search_with_filter(engine):
    """Test category filtering."""
    results = engine.search_with_filter(
        query="learning",
        filter_fn=lambda doc: doc.get("metadata", {}).get("category") == "ai"
    )
    
    assert len(results) > 0
    assert all(r.metadata.get("category") == "ai" for r in results)


def test_empty_index_returns_empty():
    """Test searching empty index."""
    engine = SemanticSearchEngine()
    results = engine.search("test query")
    
    assert results == []


def test_similar_texts_have_high_scores(engine):
    """Test that semantically similar texts score highly."""
    results = engine.search("ML models and AI")
    
    # Machine learning doc should have high score
    ml_result = next((r for r in results if r.id == "2"), None)
    assert ml_result is not None
    assert ml_result.score > 0.5


def test_save_and_load(engine, tmp_path):
    """Test persistence."""
    path = str(tmp_path / "test_index")
    engine.save(path)
    
    new_engine = SemanticSearchEngine()
    new_engine.load(path)
    
    assert len(new_engine.documents) == len(engine.documents)

Run tests:

pytest tests/ -v

Key Concepts Explained

Why Normalize Embeddings?

When embeddings are normalized (unit length), cosine similarity simplifies to a dot product:

# Without normalization
cosine_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# With normalization (||a|| = ||b|| = 1)
cosine_sim = np.dot(a, b)  # Much faster!

Choosing the Right Model

Model	Dimensions	Speed	Quality	Use Case
all-MiniLM-L6-v2	384	Fast	Good	General purpose
all-mpnet-base-v2	768	Medium	Best	High accuracy needs
paraphrase-multilingual-MiniLM-L12-v2	384	Fast	Good	Multi-language

Similarity Score Interpretation

Score Range	Interpretation
0.8 - 1.0	Very similar / Near duplicate
0.6 - 0.8	Highly relevant
0.4 - 0.6	Somewhat relevant
0.2 - 0.4	Loosely related
0.0 - 0.2	Not relevant

Next Steps

Now that you've built a basic semantic search engine, continue to:

Text Clustering - Group similar documents automatically
Similarity Recommendations - Build a recommendation system
Production Pipeline - Scale to millions of documents

Key Concepts Recap

Concept	What It Is	Why It Matters
Text Embeddings	Dense vectors representing text meaning	Capture semantics, not just keywords
sentence-transformers	Library for generating embeddings	Pre-trained models, easy to use
Cosine Similarity	Measure of vector angle (0 to 1)	Compare meaning similarity
Vector Normalization	Scale vectors to unit length	Faster similarity computation
Embedding Dimensions	Size of vector (384, 768, etc.)	Trade-off: quality vs speed/storage
ChromaDB	Vector database for storage	Efficient similarity search at scale
Relevance Threshold	Minimum similarity score	Filter out low-quality matches

Summary

You've built a complete semantic search system that:

Transforms text into dense vector embeddings
Uses cosine similarity for relevance ranking
Provides a REST API for search operations
Includes a user-friendly Streamlit interface
Supports filtering by metadata

This foundation enables powerful AI applications like document search, FAQ matching, and content recommendations.

Semantic Search Engine

Build a semantic search system that understands meaning, not just keywords

TL;DR

What You'll Learn

How text embeddings capture semantic meaning
Using sentence-transformers for high-quality embeddings
Implementing cosine similarity for relevance ranking
Building a complete search API with FastAPI

Tech Stack

Component	Technology
Embeddings	sentence-transformers
Vector Storage	NumPy / ChromaDB
API	FastAPI
Frontend	Streamlit

Prerequisites

Python 3.9+
Basic understanding of vectors
Familiarity with REST APIs

Understanding Embeddings

┌─────────────────────────────────────────────────────────────────────────────┐
│                        HOW EMBEDDINGS WORK                                  │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  INPUT TEXT                    MODEL              OUTPUT VECTORS            │
│                                                                             │
│  "The cat sat on the mat"  ─┐                ┌─► [0.2, 0.8, -0.1, ...]      │
│                              │  ┌──────────┐ │                              │
│  "A feline rested on a rug" ─┼─►│Embedding │─┼─► [0.19, 0.82, -0.08, ...]   │
│                              │  │  Model   │ │           ▲                  │
│  "Dogs love playing fetch"  ─┘  └──────────┘ └─► [-0.5, 0.1, 0.9, ...]      │
│                                                          │                  │
│                                                          │                  │
│  SIMILARITY IN VECTOR SPACE:                             │                  │
│                                                          │                  │
│  Cat sentence ●───────● Feline sentence                  │                  │
│               (CLOSE - similar meaning!)                 │                  │
│                                                          │                  │
│  Cat sentence ●                    ● Dog sentence        │                  │
│               (FAR APART - different topics)             │                  │
│                                                                             │
│  Key insight: Similar meanings → Similar vectors → Easy to find matches!    │
└─────────────────────────────────────────────────────────────────────────────┘

Project Structure

semantic-search/
├── src/
│   ├── __init__.py
│   ├── embeddings.py      # Embedding generation
│   ├── search.py          # Search engine logic
│   ├── indexer.py         # Document indexing
│   └── api.py             # FastAPI application
├── data/
│   └── documents.json     # Sample documents
├── tests/
│   └── test_search.py
├── app.py                 # Streamlit frontend
├── requirements.txt
└── README.md

Implementation

Step 1: Project Setup

Create your project and install dependencies:

mkdir semantic-search && cd semantic-search
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

requirements.txt

sentence-transformers>=2.2.0
numpy>=1.24.0
fastapi>=0.100.0
uvicorn>=0.23.0
streamlit>=1.28.0
chromadb>=0.4.0
pydantic>=2.0.0

pip install -r requirements.txt

Step 2: Embedding Generator

Create the core embedding functionality:

src/embeddings.py

"""
Embedding generation using sentence-transformers.

Sentence-transformers provides state-of-the-art embeddings
optimized for semantic similarity tasks.
"""

from sentence_transformers import SentenceTransformer
import numpy as np
from typing import Union


class EmbeddingGenerator:
    """Generate embeddings for text using sentence-transformers."""
    
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        """
        Initialize the embedding generator.
        
        Args:
            model_name: The sentence-transformer model to use.
                - all-MiniLM-L6-v2: Fast, good quality (384 dims)
                - all-mpnet-base-v2: Best quality (768 dims)
                - paraphrase-multilingual-MiniLM-L12-v2: Multilingual
        """
        self.model = SentenceTransformer(model_name)
        self.embedding_dim = self.model.get_sentence_embedding_dimension()
    
    def embed(self, texts: Union[str, list[str]]) -> np.ndarray:
        """
        Generate embeddings for one or more texts.
        
        Args:
            texts: Single text or list of texts to embed
            
        Returns:
            Numpy array of shape (n_texts, embedding_dim)
        """
        if isinstance(texts, str):
            texts = [texts]
        
        embeddings = self.model.encode(
            texts,
            convert_to_numpy=True,
            normalize_embeddings=True,  # For cosine similarity
            show_progress_bar=len(texts) > 10
        )
        
        return embeddings
    
    def embed_batch(
        self, 
        texts: list[str], 
        batch_size: int = 32
    ) -> np.ndarray:
        """
        Generate embeddings in batches for large datasets.
        
        Args:
            texts: List of texts to embed
            batch_size: Number of texts per batch
            
        Returns:
            Numpy array of embeddings
        """
        return self.model.encode(
            texts,
            convert_to_numpy=True,
            normalize_embeddings=True,
            batch_size=batch_size,
            show_progress_bar=True
        )


# Quick test
if __name__ == "__main__":
    generator = EmbeddingGenerator()
    
    texts = [
        "The quick brown fox jumps over the lazy dog",
        "A fast auburn fox leaps above a sleepy canine",
        "Python is a programming language"
    ]
    
    embeddings = generator.embed(texts)
    print(f"Embedding shape: {embeddings.shape}")
    print(f"Embedding dimension: {generator.embedding_dim}")
    
    # Check similarity
    from numpy.linalg import norm
    
    def cosine_similarity(a, b):
        return np.dot(a, b) / (norm(a) * norm(b))
    
    print(f"\nSimilarity (text 0 vs 1): {cosine_similarity(embeddings[0], embeddings[1]):.4f}")
    print(f"Similarity (text 0 vs 2): {cosine_similarity(embeddings[0], embeddings[2]):.4f}")

Understanding the Embedding Generator:

┌─────────────────────────────────────────────────────────────────────────────┐
│ HOW SENTENCE-TRANSFORMERS WORKS                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Input Text: "The cat sat on the mat"                                      │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Tokenizer                                   │                           │
│  │ Split into tokens: ["The", "cat", "sat"...] │                           │
│  └─────────────────────────────────────────────┘                           │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Transformer Encoder (BERT-based)            │                           │
│  │ Processes all tokens, captures context      │                           │
│  └─────────────────────────────────────────────┘                           │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Pooling Layer                               │                           │
│  │ Average all token embeddings into one       │                           │
│  └─────────────────────────────────────────────┘                           │
│         │                                                                   │
│         ▼                                                                   │
│  Output: [0.23, -0.45, 0.12, ...] (384 dimensions)                         │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Parameter	Why We Use It
`normalize_embeddings=True`	Converts to unit vectors so dot product = cosine similarity (faster)
`convert_to_numpy=True`	NumPy arrays are faster for math operations than PyTorch tensors
`show_progress_bar=len(texts) > 10`	Only show progress for large batches to avoid noise
`batch_size=32`	Process 32 texts at once for GPU efficiency

Model Choice Explained:

all-MiniLM-L6-v2 is the default because it's 6x faster than larger models while maintaining 90%+ of the quality
For maximum accuracy, use all-mpnet-base-v2 (2x the dimensions, 2x the compute)

Step 3: Search Engine

Build the search functionality:

src/search.py

"""
Semantic search engine using cosine similarity.
"""

import numpy as np
from dataclasses import dataclass
from typing import Optional

from .embeddings import EmbeddingGenerator


@dataclass
class SearchResult:
    """A single search result."""
    id: str
    text: str
    score: float
    metadata: dict


class SemanticSearchEngine:
    """
    A semantic search engine using dense embeddings.
    
    This implementation stores vectors in memory using NumPy.
    For production, consider using a vector database like
    ChromaDB, Pinecone, or Weaviate.
    """
    
    def __init__(self, model_name: str = "all-MiniLM-L6-v2"):
        """Initialize the search engine."""
        self.embedder = EmbeddingGenerator(model_name)
        self.documents: list[dict] = []
        self.embeddings: Optional[np.ndarray] = None
    
    def add_documents(self, documents: list[dict]) -> None:
        """
        Add documents to the search index.
        
        Args:
            documents: List of dicts with 'id', 'text', and optional 'metadata'
        """
        texts = [doc["text"] for doc in documents]
        new_embeddings = self.embedder.embed(texts)
        
        if self.embeddings is None:
            self.embeddings = new_embeddings
        else:
            self.embeddings = np.vstack([self.embeddings, new_embeddings])
        
        self.documents.extend(documents)
        print(f"Indexed {len(documents)} documents. Total: {len(self.documents)}")
    
    def search(
        self, 
        query: str, 
        top_k: int = 5,
        threshold: float = 0.0
    ) -> list[SearchResult]:
        """
        Search for documents similar to the query.
        
        Args:
            query: The search query
            top_k: Number of results to return
            threshold: Minimum similarity score (0-1)
            
        Returns:
            List of SearchResult objects sorted by relevance
        """
        if self.embeddings is None or len(self.documents) == 0:
            return []
        
        # Embed the query
        query_embedding = self.embedder.embed(query)[0]
        
        # Compute cosine similarity with all documents
        # Since embeddings are normalized, dot product = cosine similarity
        similarities = np.dot(self.embeddings, query_embedding)
        
        # Get top-k indices
        top_indices = np.argsort(similarities)[::-1][:top_k]
        
        # Build results
        results = []
        for idx in top_indices:
            score = float(similarities[idx])
            if score >= threshold:
                doc = self.documents[idx]
                results.append(SearchResult(
                    id=doc.get("id", str(idx)),
                    text=doc["text"],
                    score=score,
                    metadata=doc.get("metadata", {})
                ))
        
        return results
    
    def search_with_filter(
        self,
        query: str,
        filter_fn: callable,
        top_k: int = 5
    ) -> list[SearchResult]:
        """
        Search with a metadata filter.
        
        Args:
            query: The search query
            filter_fn: Function that takes a document and returns True/False
            top_k: Number of results to return
        """
        # Filter documents
        filtered_indices = [
            i for i, doc in enumerate(self.documents)
            if filter_fn(doc)
        ]
        
        if not filtered_indices:
            return []
        
        # Embed query
        query_embedding = self.embedder.embed(query)[0]
        
        # Compute similarities only for filtered documents
        filtered_embeddings = self.embeddings[filtered_indices]
        similarities = np.dot(filtered_embeddings, query_embedding)
        
        # Get top-k from filtered
        top_k = min(top_k, len(filtered_indices))
        top_local_indices = np.argsort(similarities)[::-1][:top_k]
        
        results = []
        for local_idx in top_local_indices:
            global_idx = filtered_indices[local_idx]
            doc = self.documents[global_idx]
            results.append(SearchResult(
                id=doc.get("id", str(global_idx)),
                text=doc["text"],
                score=float(similarities[local_idx]),
                metadata=doc.get("metadata", {})
            ))
        
        return results
    
    def save(self, path: str) -> None:
        """Save the index to disk."""
        import json
        
        np.save(f"{path}_embeddings.npy", self.embeddings)
        with open(f"{path}_documents.json", "w") as f:
            json.dump(self.documents, f)
    
    def load(self, path: str) -> None:
        """Load the index from disk."""
        import json
        
        self.embeddings = np.load(f"{path}_embeddings.npy")
        with open(f"{path}_documents.json", "r") as f:
            self.documents = json.load(f)


# Example usage
if __name__ == "__main__":
    engine = SemanticSearchEngine()
    
    # Add sample documents
    documents = [
        {"id": "1", "text": "Python is a versatile programming language", "metadata": {"category": "tech"}},
        {"id": "2", "text": "Machine learning enables computers to learn from data", "metadata": {"category": "ai"}},
        {"id": "3", "text": "Natural language processing helps computers understand text", "metadata": {"category": "ai"}},
        {"id": "4", "text": "Web development involves creating websites and applications", "metadata": {"category": "tech"}},
        {"id": "5", "text": "Deep learning uses neural networks with many layers", "metadata": {"category": "ai"}},
    ]
    
    engine.add_documents(documents)
    
    # Search
    results = engine.search("How do computers understand human language?", top_k=3)
    
    print("\nSearch Results:")
    for result in results:
        print(f"  [{result.score:.4f}] {result.text}")

Understanding the Search Engine:

┌─────────────────────────────────────────────────────────────────────────────┐
│ SEARCH FLOW: How search() Works                                             │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Query: "How do computers understand language?"                            │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Step 1: Embed the Query                     │                           │
│  │ query_embedding = [0.15, -0.32, 0.78, ...]  │                           │
│  └─────────────────────────────────────────────┘                           │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Step 2: Compute Similarity with ALL Docs    │                           │
│  │                                             │                           │
│  │   Doc 1: "Python is..."     → sim = 0.23   │                           │
│  │   Doc 2: "Machine learning" → sim = 0.67   │                           │
│  │   Doc 3: "NLP helps..."     → sim = 0.89   │  ← Highest!               │
│  │   Doc 4: "Web development"  → sim = 0.12   │                           │
│  │   Doc 5: "Deep learning"    → sim = 0.71   │                           │
│  └─────────────────────────────────────────────┘                           │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────┐                           │
│  │ Step 3: Sort by Score, Return Top K         │                           │
│  │                                             │                           │
│  │   1. Doc 3 (0.89) - NLP                     │                           │
│  │   2. Doc 5 (0.71) - Deep learning           │                           │
│  │   3. Doc 2 (0.67) - Machine learning        │                           │
│  └─────────────────────────────────────────────┘                           │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Why Dot Product = Cosine Similarity (When Normalized):

┌─────────────────────────────────────────────────────────────────────────────┐
│ THE NORMALIZATION TRICK                                                     │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Cosine Similarity Formula:                                                │
│                                                                             │
│            A · B           dot product                                     │
│  cos(θ) = ─────────── = ─────────────────                                  │
│           ‖A‖ × ‖B‖     length A × length B                                │
│                                                                             │
│  When vectors are NORMALIZED (length = 1):                                 │
│                                                                             │
│            A · B         A · B                                             │
│  cos(θ) = ─────────── = ─────── = A · B   ← Just dot product!             │
│            1 × 1           1                                               │
│                                                                             │
│  PERFORMANCE IMPACT:                                                       │
│  • Without normalization: 3 operations (dot + 2 norms)                     │
│  • With normalization: 1 operation (dot only)                              │
│  • 3x faster for similarity computation!                                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Method	What It Does	When to Use
`search()`	Find similar docs from entire index	General search
`search_with_filter()`	Search only docs matching filter	Category/metadata filtering
`save()` / `load()`	Persist index to disk	Production use

Step 4: Document Indexer

Create a utility to index documents from files:

src/indexer.py

"""
Document indexer for loading and processing documents.
"""

import json
from pathlib import Path
from typing import Iterator
import hashlib


def generate_doc_id(text: str) -> str:
    """Generate a unique ID from text content."""
    return hashlib.md5(text.encode()).hexdigest()[:12]


def load_json_documents(path: str) -> list[dict]:
    """
    Load documents from a JSON file.
    
    Expected format:
    [
        {"text": "...", "metadata": {...}},
        ...
    ]
    """
    with open(path, "r") as f:
        data = json.load(f)
    
    documents = []
    for i, item in enumerate(data):
        doc = {
            "id": item.get("id", generate_doc_id(item["text"])),
            "text": item["text"],
            "metadata": item.get("metadata", {})
        }
        documents.append(doc)
    
    return documents


def load_text_files(directory: str, pattern: str = "*.txt") -> list[dict]:
    """
    Load documents from text files in a directory.
    
    Each file becomes one document.
    """
    documents = []
    path = Path(directory)
    
    for file_path in path.glob(pattern):
        text = file_path.read_text()
        documents.append({
            "id": generate_doc_id(text),
            "text": text,
            "metadata": {
                "filename": file_path.name,
                "path": str(file_path)
            }
        })
    
    return documents


def chunk_text(
    text: str,
    chunk_size: int = 500,
    overlap: int = 50
) -> Iterator[str]:
    """
    Split text into overlapping chunks.
    
    Args:
        text: The text to split
        chunk_size: Maximum characters per chunk
        overlap: Number of characters to overlap
        
    Yields:
        Text chunks
    """
    start = 0
    text_length = len(text)
    
    while start < text_length:
        end = start + chunk_size
        
        # Try to break at a sentence boundary
        if end < text_length:
            # Look for sentence endings
            for sep in [". ", "! ", "? ", "\n\n", "\n"]:
                last_sep = text[start:end].rfind(sep)
                if last_sep != -1:
                    end = start + last_sep + len(sep)
                    break
        
        yield text[start:end].strip()
        start = end - overlap


def load_and_chunk_documents(
    path: str,
    chunk_size: int = 500,
    overlap: int = 50
) -> list[dict]:
    """
    Load documents and split into chunks.
    
    Useful for long documents where you want to
    search within specific sections.
    """
    documents = load_json_documents(path)
    chunked = []
    
    for doc in documents:
        chunks = list(chunk_text(doc["text"], chunk_size, overlap))
        
        for i, chunk in enumerate(chunks):
            chunked.append({
                "id": f"{doc['id']}_chunk_{i}",
                "text": chunk,
                "metadata": {
                    **doc.get("metadata", {}),
                    "parent_id": doc["id"],
                    "chunk_index": i,
                    "total_chunks": len(chunks)
                }
            })

    return chunked

Understanding Document Chunking:

┌─────────────────────────────────────────────────────────────────────────────┐
│ WHY CHUNK DOCUMENTS?                                                        │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Problem: Long documents dilute the embedding                              │
│                                                                             │
│  ┌──────────────────────────────────────────────────────────────────────┐  │
│  │ Long Document (10 pages)                                             │  │
│  │ ┌─────────┬─────────┬─────────┬─────────┬─────────┬─────────┬─────┐ │  │
│  │ │ Intro   │ Topic A │ Topic B │ Topic C │ Topic D │ Topic E │ ... │ │  │
│  │ └─────────┴─────────┴─────────┴─────────┴─────────┴─────────┴─────┘ │  │
│  │                              │                                       │  │
│  │                              ▼                                       │  │
│  │              Single Embedding = AVERAGE of everything                │  │
│  │              (Too generic, loses specific details)                   │  │
│  └──────────────────────────────────────────────────────────────────────┘  │
│                                                                             │
│  Solution: Split into chunks, embed each separately                        │
│                                                                             │
│  ┌─────────┐ ┌─────────┐ ┌─────────┐ ┌─────────┐                          │
│  │ Chunk 1 │ │ Chunk 2 │ │ Chunk 3 │ │ Chunk 4 │                          │
│  │ Intro   │ │ Topic A │ │ Topic B │ │ Topic C │                          │
│  └────┬────┘ └────┬────┘ └────┬────┘ └────┬────┘                          │
│       ▼           ▼           ▼           ▼                               │
│    Embed 1     Embed 2     Embed 3     Embed 4                            │
│   (specific)  (specific)  (specific)  (specific)                          │
│                                                                             │
│  Now queries match SPECIFIC sections, not the whole doc!                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Chunk Overlap Explained:

┌─────────────────────────────────────────────────────────────────────────────┐
│ WHY OVERLAP CHUNKS?                                                         │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Without Overlap (chunk_size=500, overlap=0):                              │
│                                                                             │
│  "The cat sat on the | mat and looked at | the window."                    │
│        Chunk 1       |      Chunk 2      |   Chunk 3                       │
│                      ↑                                                      │
│            Sentence cut in half! Context lost.                             │
│                                                                             │
│  With Overlap (chunk_size=500, overlap=50):                                │
│                                                                             │
│  "The cat sat on the mat | mat and looked at the | the window and..."      │
│        Chunk 1           |      Chunk 2          |   Chunk 3               │
│                     └─────┘                 └────┘                          │
│                     Overlap                 Overlap                         │
│                                                                             │
│  Benefits:                                                                 │
│  • Sentences aren't cut mid-thought                                        │
│  • Context preserved at boundaries                                         │
│  • Trade-off: Slightly more storage (10-20% more chunks)                   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Parameter	Recommended Value	Why
`chunk_size`	500-1000 chars	Large enough for context, small enough for specificity
`overlap`	10-20% of chunk_size	Preserve sentence boundaries
Sentence-aware splitting	Yes	Try to break at `.` , `\n`, etc.

Step 5: FastAPI Application

Build the REST API:

src/api.py

"""
FastAPI application for semantic search.
"""

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel, Field
from typing import Optional

from .search import SemanticSearchEngine, SearchResult
from .indexer import load_json_documents


# Request/Response models
class SearchRequest(BaseModel):
    query: str = Field(..., min_length=1, max_length=1000)
    top_k: int = Field(default=5, ge=1, le=100)
    threshold: float = Field(default=0.0, ge=0.0, le=1.0)
    category: Optional[str] = None


class SearchResultResponse(BaseModel):
    id: str
    text: str
    score: float
    metadata: dict


class SearchResponse(BaseModel):
    query: str
    results: list[SearchResultResponse]
    total: int


class DocumentRequest(BaseModel):
    id: Optional[str] = None
    text: str
    metadata: dict = {}


class IndexResponse(BaseModel):
    message: str
    document_count: int


# Initialize app and search engine
app = FastAPI(
    title="Semantic Search API",
    description="Search documents using semantic similarity",
    version="1.0.0"
)

# Global search engine instance
engine = SemanticSearchEngine()


@app.on_event("startup")
async def startup():
    """Load initial documents on startup."""
    try:
        engine.load("data/index")
        print(f"Loaded {len(engine.documents)} documents from disk")
    except FileNotFoundError:
        print("No existing index found. Starting fresh.")


@app.get("/")
async def root():
    """Health check endpoint."""
    return {
        "status": "healthy",
        "documents_indexed": len(engine.documents)
    }


@app.post("/search", response_model=SearchResponse)
async def search(request: SearchRequest):
    """
    Search for documents similar to the query.
    
    - **query**: The search query (required)
    - **top_k**: Number of results to return (default: 5)
    - **threshold**: Minimum similarity score 0-1 (default: 0)
    - **category**: Filter by metadata category (optional)
    """
    if request.category:
        results = engine.search_with_filter(
            query=request.query,
            filter_fn=lambda doc: doc.get("metadata", {}).get("category") == request.category,
            top_k=request.top_k
        )
    else:
        results = engine.search(
            query=request.query,
            top_k=request.top_k,
            threshold=request.threshold
        )
    
    return SearchResponse(
        query=request.query,
        results=[
            SearchResultResponse(
                id=r.id,
                text=r.text,
                score=r.score,
                metadata=r.metadata
            )
            for r in results
        ],
        total=len(results)
    )


@app.post("/index", response_model=IndexResponse)
async def index_documents(documents: list[DocumentRequest]):
    """
    Add documents to the search index.
    
    Each document should have:
    - **text**: The document content (required)
    - **id**: Unique identifier (optional, auto-generated if not provided)
    - **metadata**: Additional metadata dict (optional)
    """
    docs = [
        {
            "id": doc.id or f"doc_{len(engine.documents) + i}",
            "text": doc.text,
            "metadata": doc.metadata
        }
        for i, doc in enumerate(documents)
    ]
    
    engine.add_documents(docs)
    
    return IndexResponse(
        message=f"Indexed {len(documents)} documents",
        document_count=len(engine.documents)
    )


@app.post("/index/file")
async def index_from_file(path: str):
    """Index documents from a JSON file."""
    try:
        documents = load_json_documents(path)
        engine.add_documents(documents)
        return IndexResponse(
            message=f"Indexed {len(documents)} documents from {path}",
            document_count=len(engine.documents)
        )
    except FileNotFoundError:
        raise HTTPException(status_code=404, detail=f"File not found: {path}")


@app.post("/save")
async def save_index(path: str = "data/index"):
    """Save the current index to disk."""
    engine.save(path)
    return {"message": f"Index saved to {path}"}


@app.get("/stats")
async def get_stats():
    """Get index statistics."""
    return {
        "total_documents": len(engine.documents),
        "embedding_dimension": engine.embedder.embedding_dim,
        "model": engine.embedder.model.get_config_dict().get("name_or_path", "unknown")
    }

Understanding the API Architecture:

┌─────────────────────────────────────────────────────────────────────────────┐
│ API ENDPOINTS OVERVIEW                                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  Client (Browser/curl/Frontend)                                            │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ FastAPI Application                                                  │   │
│  │                                                                      │   │
│  │  GET  /           → Health check (is server running?)               │   │
│  │  POST /search     → Find similar documents                          │   │
│  │  POST /index      → Add new documents                               │   │
│  │  POST /index/file → Bulk load from JSON file                        │   │
│  │  POST /save       → Persist index to disk                           │   │
│  │  GET  /stats      → Get index statistics                            │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│         │                                                                   │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ SemanticSearchEngine (In-Memory)                                    │   │
│  │                                                                      │   │
│  │  documents: [...]        embeddings: numpy array                    │   │
│  │  embedder: SentenceTransformer model                                │   │
│  │                                                                      │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Pydantic Models Explained:

┌─────────────────────────────────────────────────────────────────────────────┐
│ WHY PYDANTIC FOR REQUEST/RESPONSE MODELS?                                   │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  1. AUTOMATIC VALIDATION                                                   │
│     SearchRequest(query="", top_k=-5)                                      │
│     → Error: "query must be at least 1 character"                          │
│     → Error: "top_k must be >= 1"                                          │
│                                                                             │
│  2. AUTOMATIC DOCUMENTATION                                                │
│     FastAPI generates OpenAPI docs at /docs                                │
│     All fields, types, and constraints are documented                      │
│                                                                             │
│  3. TYPE SAFETY                                                            │
│     IDE autocomplete and type checking work                                │
│     Catch errors before runtime                                            │
│                                                                             │
│  4. SERIALIZATION                                                          │
│     Python objects ↔ JSON automatically                                    │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Endpoint	Method	Purpose	Key Parameters
`/search`	POST	Find similar docs	query, top_k, threshold, category
`/index`	POST	Add documents	list of text and metadata objects
`/stats`	GET	Index info	None

Step 6: Streamlit Frontend

Create an interactive UI:

app.py

"""
Streamlit frontend for semantic search.
"""

import streamlit as st
import requests
from typing import Optional

# Configuration
API_URL = "http://localhost:8000"

st.set_page_config(
    page_title="Semantic Search",
    page_icon="🔍",
    layout="wide"
)


def search(query: str, top_k: int = 5, threshold: float = 0.0) -> dict:
    """Call the search API."""
    response = requests.post(
        f"{API_URL}/search",
        json={
            "query": query,
            "top_k": top_k,
            "threshold": threshold
        }
    )
    return response.json()


def get_stats() -> dict:
    """Get index statistics."""
    try:
        response = requests.get(f"{API_URL}/stats")
        return response.json()
    except:
        return {"error": "Could not connect to API"}


def main():
    st.title("🔍 Semantic Search Engine")
    st.markdown("Search documents using AI-powered semantic similarity")
    
    # Sidebar
    with st.sidebar:
        st.header("Settings")
        top_k = st.slider("Number of results", 1, 20, 5)
        threshold = st.slider("Minimum score", 0.0, 1.0, 0.0, 0.05)
        
        st.divider()
        
        st.header("Index Stats")
        stats = get_stats()
        if "error" not in stats:
            st.metric("Documents Indexed", stats.get("total_documents", 0))
            st.metric("Embedding Dimension", stats.get("embedding_dimension", 0))
            st.caption(f"Model: {stats.get('model', 'unknown')}")
        else:
            st.error("API not available")
    
    # Main search interface
    query = st.text_input(
        "Enter your search query",
        placeholder="What would you like to find?"
    )
    
    col1, col2 = st.columns([1, 5])
    with col1:
        search_button = st.button("🔍 Search", type="primary")
    
    if search_button and query:
        with st.spinner("Searching..."):
            results = search(query, top_k, threshold)
        
        if results.get("results"):
            st.success(f"Found {results['total']} results")
            
            for i, result in enumerate(results["results"], 1):
                score = result["score"]
                
                # Color code by score
                if score >= 0.7:
                    score_color = "🟢"
                elif score >= 0.4:
                    score_color = "🟡"
                else:
                    score_color = "🔴"
                
                with st.expander(
                    f"{score_color} Result {i} (Score: {score:.4f})",
                    expanded=i <= 3
                ):
                    st.markdown(f"**Text:** {result['text']}")
                    st.markdown(f"**ID:** `{result['id']}`")
                    if result.get("metadata"):
                        st.json(result["metadata"])
        else:
            st.warning("No results found. Try a different query.")
    
    # Document upload section
    st.divider()
    st.header("Add Documents")
    
    with st.form("add_document"):
        doc_text = st.text_area(
            "Document text",
            placeholder="Enter the document content..."
        )
        doc_metadata = st.text_input(
            "Metadata (JSON)",
            placeholder='{"category": "tech"}'
        )
        
        if st.form_submit_button("Add Document"):
            if doc_text:
                import json
                metadata = {}
                if doc_metadata:
                    try:
                        metadata = json.loads(doc_metadata)
                    except:
                        st.error("Invalid JSON in metadata")
                        return
                
                response = requests.post(
                    f"{API_URL}/index",
                    json=[{"text": doc_text, "metadata": metadata}]
                )
                
                if response.status_code == 200:
                    st.success("Document added successfully!")
                    st.rerun()
                else:
                    st.error("Failed to add document")


if __name__ == "__main__":
    main()

Understanding the Streamlit UI:

┌─────────────────────────────────────────────────────────────────────────────┐
│ STREAMLIT ARCHITECTURE                                                      │
├─────────────────────────────────────────────────────────────────────────────┤
│                                                                             │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ Browser                                                              │   │
│  │  ┌─────────────┐  ┌──────────────────────────────────────────────┐  │   │
│  │  │  Sidebar    │  │  Main Content                                │  │   │
│  │  │             │  │                                              │  │   │
│  │  │  Settings:  │  │  🔍 Search Box                               │  │   │
│  │  │  • top_k    │  │  ┌──────────────────────────────────────┐   │  │   │
│  │  │  • threshold│  │  │ Enter your search query...           │   │  │   │
│  │  │             │  │  └──────────────────────────────────────┘   │  │   │
│  │  │  Stats:     │  │                                              │  │   │
│  │  │  • 10 docs  │  │  Results:                                   │  │   │
│  │  │  • 384 dims │  │  🟢 Result 1 (0.89) - NLP helps...          │  │   │
│  │  │             │  │  🟡 Result 2 (0.67) - Machine learning...   │  │   │
│  │  └─────────────┘  │  🔴 Result 3 (0.23) - Web development...    │  │   │
│  │                   └──────────────────────────────────────────────┘  │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│         │                                                                   │
│         │ HTTP requests                                                     │
│         ▼                                                                   │
│  ┌─────────────────────────────────────────────────────────────────────┐   │
│  │ FastAPI Backend (localhost:8000)                                    │   │
│  └─────────────────────────────────────────────────────────────────────┘   │
│                                                                             │
└─────────────────────────────────────────────────────────────────────────────┘

Score Color Coding Logic:

Score	Color	Meaning
0.7+	🟢 Green	Highly relevant - confident match
0.4-0.7	🟡 Yellow	Somewhat relevant - may be useful
Under 0.4	🔴 Red	Low relevance - likely not what user wants

Why Streamlit?

Rapid prototyping (full UI in ~100 lines)
No JavaScript required
Automatic reactivity (UI updates when data changes)
Built-in widgets (sliders, text inputs, expanders)

Step 7: Sample Documents

Create sample data to test with:

data/documents.json

[
  {
    "text": "Machine learning is a subset of artificial intelligence that enables systems to learn and improve from experience without being explicitly programmed.",
    "metadata": {"category": "ai", "topic": "machine-learning"}
  },
  {
    "text": "Natural language processing (NLP) is a field of AI focused on enabling computers to understand, interpret, and generate human language.",
    "metadata": {"category": "ai", "topic": "nlp"}
  },
  {
    "text": "Deep learning uses artificial neural networks with multiple layers to progressively extract higher-level features from raw input.",
    "metadata": {"category": "ai", "topic": "deep-learning"}
  },
  {
    "text": "Python is a high-level, interpreted programming language known for its readability and versatility in web development, data science, and automation.",
    "metadata": {"category": "programming", "topic": "python"}
  },
  {
    "text": "Vector databases are specialized database systems designed to store and query high-dimensional vector data efficiently using similarity search.",
    "metadata": {"category": "databases", "topic": "vector-db"}
  },
  {
    "text": "Transformers are a neural network architecture that uses self-attention mechanisms to process sequential data in parallel.",
    "metadata": {"category": "ai", "topic": "transformers"}
  },
  {
    "text": "Embeddings are dense vector representations of data that capture semantic meaning, enabling similarity comparisons between items.",
    "metadata": {"category": "ai", "topic": "embeddings"}
  },
  {
    "text": "FastAPI is a modern Python web framework for building APIs with automatic documentation and type validation.",
    "metadata": {"category": "programming", "topic": "fastapi"}
  },
  {
    "text": "Cosine similarity measures the cosine of the angle between two vectors, commonly used to determine how similar two documents are.",
    "metadata": {"category": "math", "topic": "similarity"}
  },
  {
    "text": "Retrieval-Augmented Generation (RAG) combines information retrieval with language models to generate more accurate and grounded responses.",
    "metadata": {"category": "ai", "topic": "rag"}
  }
]

Running the Application

Start the API Server

# From project root
uvicorn src.api:app --reload --host 0.0.0.0 --port 8000

Index Sample Documents

# Using curl
curl -X POST "http://localhost:8000/index/file?path=data/documents.json"

Start the Frontend

# In a new terminal
streamlit run app.py

Test with curl

# Search for documents
curl -X POST "http://localhost:8000/search" \
  -H "Content-Type: application/json" \
  -d '{"query": "How do computers understand text?", "top_k": 3}'

# Add a document
curl -X POST "http://localhost:8000/index" \
  -H "Content-Type: application/json" \
  -d '[{"text": "Docker containers package applications with their dependencies", "metadata": {"category": "devops"}}]'

Testing

tests/test_search.py

"""Tests for semantic search engine."""

import pytest
from src.search import SemanticSearchEngine


@pytest.fixture
def engine():
    """Create a search engine with test documents."""
    engine = SemanticSearchEngine()
    engine.add_documents([
        {"id": "1", "text": "Python programming language", "metadata": {"category": "tech"}},
        {"id": "2", "text": "Machine learning algorithms", "metadata": {"category": "ai"}},
        {"id": "3", "text": "Data science and analytics", "metadata": {"category": "data"}},
    ])
    return engine


def test_search_returns_results(engine):
    """Test that search returns relevant results."""
    results = engine.search("programming in Python")
    
    assert len(results) > 0
    assert results[0].id == "1"  # Python doc should be most relevant


def test_search_with_threshold(engine):
    """Test that threshold filters low-scoring results."""
    results = engine.search("unrelated query about cooking", threshold=0.5)
    
    # Should filter out low-scoring results
    assert all(r.score >= 0.5 for r in results)


def test_search_with_filter(engine):
    """Test category filtering."""
    results = engine.search_with_filter(
        query="learning",
        filter_fn=lambda doc: doc.get("metadata", {}).get("category") == "ai"
    )
    
    assert len(results) > 0
    assert all(r.metadata.get("category") == "ai" for r in results)


def test_empty_index_returns_empty():
    """Test searching empty index."""
    engine = SemanticSearchEngine()
    results = engine.search("test query")
    
    assert results == []


def test_similar_texts_have_high_scores(engine):
    """Test that semantically similar texts score highly."""
    results = engine.search("ML models and AI")
    
    # Machine learning doc should have high score
    ml_result = next((r for r in results if r.id == "2"), None)
    assert ml_result is not None
    assert ml_result.score > 0.5


def test_save_and_load(engine, tmp_path):
    """Test persistence."""
    path = str(tmp_path / "test_index")
    engine.save(path)
    
    new_engine = SemanticSearchEngine()
    new_engine.load(path)
    
    assert len(new_engine.documents) == len(engine.documents)

Run tests:

pytest tests/ -v

Key Concepts Explained

Why Normalize Embeddings?

When embeddings are normalized (unit length), cosine similarity simplifies to a dot product:

# Without normalization
cosine_sim = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# With normalization (||a|| = ||b|| = 1)
cosine_sim = np.dot(a, b)  # Much faster!

Choosing the Right Model

Model	Dimensions	Speed	Quality	Use Case
all-MiniLM-L6-v2	384	Fast	Good	General purpose
all-mpnet-base-v2	768	Medium	Best	High accuracy needs
paraphrase-multilingual-MiniLM-L12-v2	384	Fast	Good	Multi-language

Similarity Score Interpretation

Score Range	Interpretation
0.8 - 1.0	Very similar / Near duplicate
0.6 - 0.8	Highly relevant
0.4 - 0.6	Somewhat relevant
0.2 - 0.4	Loosely related
0.0 - 0.2	Not relevant

Next Steps

Now that you've built a basic semantic search engine, continue to:

Text Clustering - Group similar documents automatically
Similarity Recommendations - Build a recommendation system
Production Pipeline - Scale to millions of documents

Key Concepts Recap

Concept	What It Is	Why It Matters
Text Embeddings	Dense vectors representing text meaning	Capture semantics, not just keywords
sentence-transformers	Library for generating embeddings	Pre-trained models, easy to use
Cosine Similarity	Measure of vector angle (0 to 1)	Compare meaning similarity
Vector Normalization	Scale vectors to unit length	Faster similarity computation
Embedding Dimensions	Size of vector (384, 768, etc.)	Trade-off: quality vs speed/storage
ChromaDB	Vector database for storage	Efficient similarity search at scale
Relevance Threshold	Minimum similarity score	Filter out low-quality matches

Summary

You've built a complete semantic search system that:

Transforms text into dense vector embeddings
Uses cosine similarity for relevance ranking
Provides a REST API for search operations
Includes a user-friendly Streamlit interface
Supports filtering by metadata

This foundation enables powerful AI applications like document search, FAQ matching, and content recommendations.

Semantic Search Engine

On this page

Semantic Search Engine

On this page