HuggingFace Ecosystem · Intermediate
Text Embeddings & Semantic Search
Build a semantic search engine with sentence-transformers and FAISS
TL;DR
Build a semantic search engine using sentence-transformers for embedding generation and FAISS for fast similarity search. Learn model selection from the MTEB leaderboard, embedding fine-tuning, and evaluation with NDCG/MAP/MRR metrics.
What You'll Learn
- Generating embeddings with sentence-transformers
- Model selection from the MTEB leaderboard
- FAISS index types and configuration
- Fine-tuning embeddings for your domain
- Evaluation metrics (NDCG, MAP, MRR)
- FastAPI search API with batched inference
Tech Stack
| Component | Technology |
|---|---|
| Embeddings | sentence-transformers |
| Vector Index | faiss-cpu |
| Evaluation | MTEB metrics |
| API | FastAPI |
| Python | 3.10+ |
Architecture
┌──────────────────────────────────────────────────────────────────────────────┐
│ SEMANTIC SEARCH ENGINE │
├──────────────────────────────────────────────────────────────────────────────┤
│ │
│ INDEXING PIPELINE │
│ ┌───────────┐ ┌──────────────────┐ ┌────────────────┐ │
│ │ Documents │──▶│ sentence- │──▶│ FAISS Index │ │
│ │ (corpus) │ │ transformers │ │ (IVF + PQ) │ │
│ └───────────┘ │ .encode() │ └────────────────┘ │
│ └──────────────────┘ │
│ │
│ QUERY PIPELINE │
│ ┌───────────┐ ┌──────────────────┐ ┌────────────────┐ ┌──────────┐ │
│ │ Query │──▶│ sentence- │──▶│ FAISS Search │──▶│ Results │ │
│ │ │ │ transformers │ │ (k nearest) │ │ + Scores │ │
│ └───────────┘ │ .encode() │ └────────────────┘ └──────────┘ │
│ └──────────────────┘ │
│ │
│ FINE-TUNING PIPELINE │
│ ┌───────────────┐ ┌─────────────────┐ ┌─────────────────────────────┐ │
│ │ Training Pairs │──▶│ Contrastive │──▶│ Domain-adapted Embeddings │ │
│ │ (query, pos, │ │ Loss Function │ │ (better for your data) │ │
│ │ neg) │ └─────────────────┘ └─────────────────────────────┘ │
│ └───────────────┘ │
│ │
└──────────────────────────────────────────────────────────────────────────────┘

Project Structure
embeddings-search/
├── src/
│ ├── __init__.py
│ ├── embeddings.py # Embedding generation and model loading
│ ├── index.py # FAISS index management
│ ├── search.py # Search engine combining embeddings + index
│ ├── finetune.py # Fine-tune embeddings for your domain
│ └── evaluate.py # NDCG, MAP, MRR evaluation
├── api/
│ └── main.py # FastAPI search application
├── data/
│ └── corpus.jsonl
├── requirements.txt
└── README.md

Implementation
Step 1: Dependencies
sentence-transformers>=3.0.0
faiss-cpu>=1.8.0
transformers>=4.40.0
datasets>=2.19.0
fastapi>=0.111.0
uvicorn>=0.30.0
numpy>=1.26.0

Step 2: Embedding Generation
"""Embedding generation with sentence-transformers."""
from sentence_transformers import SentenceTransformer
import numpy as np
class EmbeddingModel:
"""
Wrapper around sentence-transformers for embedding generation.
sentence-transformers differs from raw transformers by:
1. Adding mean pooling over token embeddings by default
2. Normalizing embeddings to unit length (for cosine similarity)
3. Optimized batch encoding with progress bars
"""
# Top models from the MTEB leaderboard (as of 2025)
RECOMMENDED_MODELS = {
"fast": "all-MiniLM-L6-v2", # 384-dim, 80MB, fast
"balanced": "all-mpnet-base-v2", # 768-dim, 420MB, good quality
"quality": "BAAI/bge-large-en-v1.5", # 1024-dim, 1.3GB, best quality
"multilingual": "intfloat/multilingual-e5-large", # 1024-dim, multi-lang
}
def __init__(
self,
model_name: str = "all-MiniLM-L6-v2",
device: str | None = None,
):
self.model = SentenceTransformer(model_name, device=device)
self.dimension = self.model.get_sentence_embedding_dimension()
self.model_name = model_name
def encode(
self,
texts: list[str],
batch_size: int = 64,
normalize: bool = True,
show_progress: bool = True,
) -> np.ndarray:
"""
Encode texts into embeddings.
Args:
texts: List of texts to encode
batch_size: Encoding batch size
normalize: L2 normalize embeddings (required for cosine similarity)
show_progress: Show encoding progress bar
Returns:
numpy array of shape [len(texts), dimension]
"""
embeddings = self.model.encode(
texts,
batch_size=batch_size,
normalize_embeddings=normalize,
show_progress_bar=show_progress,
)
return embeddings
def similarity(
self,
texts_a: list[str],
texts_b: list[str],
) -> np.ndarray:
"""Compute pairwise cosine similarity between two text lists."""
emb_a = self.encode(texts_a, show_progress=False)
emb_b = self.encode(texts_b, show_progress=False)
return np.dot(emb_a, emb_b.T)Model Selection Guide:
| Model | Dimensions | Size | Speed | Quality | Use Case |
|---|---|---|---|---|---|
| all-MiniLM-L6-v2 | 384 | 80MB | Fast | Good | Prototyping, low-resource |
| all-mpnet-base-v2 | 768 | 420MB | Medium | Better | Production general-purpose |
| BAAI/bge-large-en-v1.5 | 1024 | 1.3GB | Slow | Best | Quality-critical applications |
| intfloat/multilingual-e5-large | 1024 | 1.3GB | Slow | Best | Multi-language support |
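Whichever model you pick, keep `normalize=True`: for unit-length vectors the inner product equals cosine similarity, which is what lets the inner-product FAISS index below double as a cosine index. A quick numpy check of that identity (random vectors standing in for embeddings):

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = rng.normal(size=384), rng.normal(size=384)  # stand-ins for two raw embeddings

# Cosine similarity of the raw vectors
cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# After L2 normalization, a plain inner product gives the same number
a_unit = a / np.linalg.norm(a)
b_unit = b / np.linalg.norm(b)
assert np.isclose(a_unit @ b_unit, cos)
```

This is also why the index must be rebuilt if you switch between normalized and unnormalized embeddings: the scores stop being comparable.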
Step 3: FAISS Index
"""FAISS index management for fast similarity search."""
import faiss
import numpy as np
from pathlib import Path
class FAISSIndex:
"""
FAISS index for fast nearest-neighbor search.
Index types:
- Flat: Exact search (brute-force). Best for <100K vectors.
- IVF: Inverted file index. Partitions space into clusters.
- HNSW: Hierarchical navigable small world graph. Fast, good recall.
- PQ: Product quantization. Compresses vectors for memory savings.
"""
def __init__(self, dimension: int, index_type: str = "flat"):
self.dimension = dimension
self.index_type = index_type
self.index = self._create_index(dimension, index_type)
self.documents: list[dict] = []
def _create_index(self, dim: int, index_type: str) -> faiss.Index:
"""Create a FAISS index of the specified type."""
if index_type == "flat":
# Exact search — best quality, O(n) per query
return faiss.IndexFlatIP(dim) # Inner product (cosine for normalized vecs)
elif index_type == "ivf":
# Approximate search — partition into 100 clusters
nlist = 100
quantizer = faiss.IndexFlatIP(dim)
index = faiss.IndexIVFFlat(quantizer, dim, nlist, faiss.METRIC_INNER_PRODUCT)
return index
elif index_type == "hnsw":
# Graph-based approximate search
index = faiss.IndexHNSWFlat(dim, 32) # 32 neighbors per node
index.hnsw.efConstruction = 200 # Build-time quality
index.hnsw.efSearch = 64 # Search-time quality
return index
elif index_type == "ivf_pq":
# Memory-efficient approximate search
nlist = 100
m = 8 # Number of sub-quantizers
quantizer = faiss.IndexFlatIP(dim)
index = faiss.IndexIVFPQ(
quantizer, dim, nlist, m, 8, faiss.METRIC_INNER_PRODUCT,
)
return index
else:
raise ValueError(f"Unknown index type: {index_type}")
def add(
self,
embeddings: np.ndarray,
documents: list[dict],
):
"""Add embeddings and their associated documents to the index."""
if self.index_type in ("ivf", "ivf_pq") and not self.index.is_trained:
self.index.train(embeddings)
self.index.add(embeddings.astype("float32"))
self.documents.extend(documents)
def search(
self,
query_embedding: np.ndarray,
k: int = 10,
) -> list[dict]:
"""Search for k nearest neighbors."""
if query_embedding.ndim == 1:
query_embedding = query_embedding.reshape(1, -1)
scores, indices = self.index.search(query_embedding.astype("float32"), k)
results = []
for score, idx in zip(scores[0], indices[0]):
if idx == -1: # FAISS returns -1 for missing results
continue
result = {**self.documents[idx], "score": float(score)}
results.append(result)
return results
def save(self, path: str):
"""Save index to disk."""
faiss.write_index(self.index, f"{path}.index")
import json
with open(f"{path}.meta.json", "w") as f:
json.dump(self.documents, f)
def load(self, path: str):
"""Load index from disk."""
self.index = faiss.read_index(f"{path}.index")
import json
with open(f"{path}.meta.json") as f:
self.documents = json.load(f)FAISS Index Type Comparison:
┌─────────────────────────────────────────────────────────────────┐
│ FAISS INDEX TYPES │
├─────────────────────────────────────────────────────────────────┤
│ │
│ Flat (exact) IVF (clusters) HNSW (graph) │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ • • • • • • │ │ C1: • • • │ │ •─•─• │ │
│ │ • • • • • • │ │ C2: • • │ │ │╲│╲│ │ │
│ │ • • • • • • │ │ C3: • • • •│ │ •─•─• │ │
│ │ • • • • • • │ │ C4: • • │ │ ╲│╱ │ │
│ └─────────────┘ └─────────────┘ │ • │ │
│ Compare to ALL Search only └─────────────┘ │
│ 100% recall nearby clusters Navigate graph │
│ Slow for >100K Fast, tunable Fast, high recall │
│ │
│ Speed: Flat < IVF < HNSW │
│ Recall: Flat > HNSW > IVF │
│ Memory: Flat = HNSW > IVF_PQ │
│ │
└─────────────────────────────────────────────────────────────────┘

Step 4: Search Engine
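Before composing embeddings and index into a `SearchEngine`, it helps to see what the flat index actually computes: exhaustive inner products followed by a top-k selection. A numpy equivalent using random unit vectors (illustrative only, not the FAISS API):

```python
import numpy as np

rng = np.random.default_rng(42)
corpus = rng.normal(size=(1000, 64)).astype("float32")
corpus /= np.linalg.norm(corpus, axis=1, keepdims=True)  # unit length, as encode() would produce

# A query vector deliberately close to document 123
query = corpus[123] + 0.01 * rng.normal(size=64)
query /= np.linalg.norm(query)

scores = corpus @ query            # inner products == cosine similarities here
top5 = np.argsort(-scores)[:5]     # indices of the 5 nearest documents
assert top5[0] == 123
```

`IndexFlatIP` does exactly this, just with optimized SIMD kernels; the approximate index types trade some of this exactness for speed by searching only a subset of the corpus.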
"""Search engine combining embeddings and FAISS index."""
from .embeddings import EmbeddingModel
from .index import FAISSIndex
class SearchEngine:
"""Semantic search engine with indexing and querying."""
def __init__(
self,
model_name: str = "all-MiniLM-L6-v2",
index_type: str = "flat",
):
self.embed_model = EmbeddingModel(model_name)
self.index = FAISSIndex(
dimension=self.embed_model.dimension,
index_type=index_type,
)
def index_documents(self, documents: list[dict], text_field: str = "text"):
"""Index a list of documents."""
texts = [doc[text_field] for doc in documents]
embeddings = self.embed_model.encode(texts)
self.index.add(embeddings, documents)
print(f"Indexed {len(documents)} documents")
def search(self, query: str, k: int = 10) -> list[dict]:
"""Search for documents similar to the query."""
query_emb = self.embed_model.encode([query], show_progress=False)
return self.index.search(query_emb[0], k=k)
def batch_search(
self,
queries: list[str],
k: int = 10,
) -> list[list[dict]]:
"""Search for multiple queries at once."""
query_embs = self.embed_model.encode(queries, show_progress=False)
return [
self.index.search(emb, k=k)
for emb in query_embs
]Step 5: Fine-tuning Embeddings
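Both fine-tuning helpers below consume plain Python tuples. A toy dataset showing the two formats (made-up strings, purely illustrative — real supervision would come from click logs or labeled query-document judgments):

```python
# (text_a, text_b, similarity_score) pairs for CosineSimilarityLoss
train_pairs = [
    ("how to reset my password", "Visit settings and choose 'Reset password'.", 0.9),
    ("how to reset my password", "Our offices are closed on public holidays.", 0.1),
]

# (anchor, positive, negative) triplets for TripletLoss
train_triplets = [
    (
        "how to reset my password",                     # anchor (the query)
        "Visit settings and choose 'Reset password'.",  # positive (relevant doc)
        "Our offices are closed on public holidays.",   # negative (irrelevant doc)
    ),
]
```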
"""Fine-tune sentence-transformers for domain-specific search."""
from sentence_transformers import (
SentenceTransformer,
InputExample,
losses,
evaluation,
)
from torch.utils.data import DataLoader
def finetune_with_pairs(
model_name: str = "all-MiniLM-L6-v2",
train_pairs: list[tuple[str, str, float]],
val_pairs: list[tuple[str, str, float]] | None = None,
epochs: int = 3,
batch_size: int = 16,
output_path: str = "models/finetuned-embeddings",
):
"""
Fine-tune an embedding model with query-document pairs.
Args:
model_name: Base model to fine-tune
train_pairs: List of (text_a, text_b, similarity_score) tuples
val_pairs: Validation pairs for evaluation
epochs: Training epochs
batch_size: Training batch size
output_path: Where to save the fine-tuned model
"""
model = SentenceTransformer(model_name)
# Convert to InputExamples
train_examples = [
InputExample(texts=[a, b], label=score)
for a, b, score in train_pairs
]
train_loader = DataLoader(
train_examples,
shuffle=True,
batch_size=batch_size,
)
# CosineSimilarityLoss: learns to match predicted cosine similarity to labels
train_loss = losses.CosineSimilarityLoss(model)
# Optional: evaluation during training
evaluator = None
if val_pairs:
sentences1 = [p[0] for p in val_pairs]
sentences2 = [p[1] for p in val_pairs]
scores = [p[2] for p in val_pairs]
evaluator = evaluation.EmbeddingSimilarityEvaluator(
sentences1, sentences2, scores,
)
# Train
model.fit(
train_objectives=[(train_loader, train_loss)],
epochs=epochs,
evaluator=evaluator,
evaluation_steps=100,
output_path=output_path,
show_progress_bar=True,
)
return model
def finetune_with_triplets(
model_name: str = "all-MiniLM-L6-v2",
triplets: list[tuple[str, str, str]],
epochs: int = 3,
output_path: str = "models/finetuned-triplet",
):
"""
Fine-tune with triplet loss: (anchor, positive, negative).
The model learns to push anchor closer to positive
and farther from negative.
"""
model = SentenceTransformer(model_name)
train_examples = [
InputExample(texts=[anchor, pos, neg])
for anchor, pos, neg in triplets
]
train_loader = DataLoader(
train_examples,
shuffle=True,
batch_size=16,
)
# TripletLoss: distance(anchor, positive) < distance(anchor, negative) + margin
train_loss = losses.TripletLoss(model)
model.fit(
train_objectives=[(train_loader, train_loss)],
epochs=epochs,
output_path=output_path,
)
return modelFine-tuning Loss Functions:
| Loss | Input Format | What It Learns |
|---|---|---|
| CosineSimilarityLoss | (text_a, text_b, score) | Match cosine similarity to target score |
| TripletLoss | (anchor, positive, negative) | Push positive close, negative far |
| MultipleNegativesRankingLoss | (query, positive) | Treats other batch items as negatives |
| ContrastiveLoss | (text_a, text_b, label) | Binary: similar (1) or dissimilar (0) |
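MultipleNegativesRankingLoss deserves special mention because it needs no negative labels: within each batch of (query, positive) pairs, every other positive serves as a negative, and each row of the in-batch similarity matrix becomes a softmax classification problem whose correct answer is the diagonal. A numpy sketch of that objective (random vectors standing in for embeddings; the library additionally scales similarities before the softmax):

```python
import numpy as np

rng = np.random.default_rng(1)
queries = rng.normal(size=(4, 32))
positives = queries + 0.05 * rng.normal(size=(4, 32))  # row i is the match for query i
queries /= np.linalg.norm(queries, axis=1, keepdims=True)
positives /= np.linalg.norm(positives, axis=1, keepdims=True)

# In-batch similarity matrix: diagonal = true pairs, off-diagonal = implicit negatives
sim = queries @ positives.T        # shape [4, 4]
assert sim.argmax(axis=1).tolist() == [0, 1, 2, 3]

# Cross-entropy over each row — this is what the loss drives toward zero
probs = np.exp(sim) / np.exp(sim).sum(axis=1, keepdims=True)
loss = -np.mean(np.log(np.diag(probs)))
```

Because the negatives come for free, this loss is the usual choice when you only have (query, relevant document) pairs, e.g. from search logs.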
Step 6: Evaluation
"""Evaluate search quality with standard IR metrics."""
import numpy as np
def compute_ndcg(relevance_scores: list[int], k: int = 10) -> float:
"""
Normalized Discounted Cumulative Gain (NDCG@k).
Measures ranking quality — rewards relevant results at higher positions.
"""
relevance = np.array(relevance_scores[:k])
dcg = np.sum(relevance / np.log2(np.arange(2, len(relevance) + 2)))
# Ideal DCG (perfect ranking)
ideal = np.sort(relevance)[::-1]
idcg = np.sum(ideal / np.log2(np.arange(2, len(ideal) + 2)))
return float(dcg / idcg) if idcg > 0 else 0.0
def compute_mrr(results_relevant: list[bool]) -> float:
"""
Mean Reciprocal Rank.
1/position of the first relevant result. Higher = better.
"""
for i, is_relevant in enumerate(results_relevant):
if is_relevant:
return 1.0 / (i + 1)
return 0.0
def compute_map(relevance_scores: list[bool], k: int = 10) -> float:
"""
Mean Average Precision (MAP@k).
Average of precision at each relevant result position.
"""
relevant = np.array(relevance_scores[:k])
if relevant.sum() == 0:
return 0.0
precisions = []
relevant_count = 0
for i, is_rel in enumerate(relevant):
if is_rel:
relevant_count += 1
precisions.append(relevant_count / (i + 1))
return float(np.mean(precisions))
def evaluate_search(
search_engine,
queries: list[str],
ground_truth: list[list[str]],
k: int = 10,
) -> dict:
"""
Evaluate a search engine against ground truth.
Args:
search_engine: SearchEngine instance
queries: List of query strings
ground_truth: For each query, list of relevant document IDs
k: Number of results to evaluate
"""
ndcg_scores = []
mrr_scores = []
map_scores = []
for query, relevant_ids in zip(queries, ground_truth):
results = search_engine.search(query, k=k)
result_ids = [r.get("id", "") for r in results]
is_relevant = [rid in relevant_ids for rid in result_ids]
relevance = [1 if rel else 0 for rel in is_relevant]
ndcg_scores.append(compute_ndcg(relevance, k))
mrr_scores.append(compute_mrr(is_relevant))
map_scores.append(compute_map(is_relevant, k))
return {
f"NDCG@{k}": float(np.mean(ndcg_scores)),
f"MAP@{k}": float(np.mean(map_scores)),
"MRR": float(np.mean(mrr_scores)),
}Step 7: FastAPI Application
"""FastAPI semantic search application."""
from fastapi import FastAPI
from pydantic import BaseModel, Field
from src.search import SearchEngine
app = FastAPI(title="Semantic Search API")
engine = SearchEngine(model_name="all-MiniLM-L6-v2", index_type="flat")
class IndexRequest(BaseModel):
documents: list[dict]
text_field: str = "text"
class SearchRequest(BaseModel):
query: str
k: int = Field(default=10, ge=1, le=100)
@app.post("/index")
async def index_documents(req: IndexRequest):
engine.index_documents(req.documents, req.text_field)
return {"indexed": len(req.documents)}
@app.post("/search")
async def search(req: SearchRequest):
results = engine.search(req.query, k=req.k)
return {"results": results}
@app.get("/health")
async def health():
return {"status": "healthy", "model": engine.embed_model.model_name}Running the Project
# Install dependencies
pip install -r requirements.txt
# Start the API
uvicorn api.main:app --reload
# Index documents
curl -X POST http://localhost:8000/index \
-H "Content-Type: application/json" \
-d '{"documents": [
{"id": "1", "text": "Python is a programming language"},
{"id": "2", "text": "Machine learning uses statistical methods"},
{"id": "3", "text": "Neural networks are inspired by the brain"}
]}'
# Search
curl -X POST http://localhost:8000/search \
-H "Content-Type: application/json" \
-d '{"query": "deep learning algorithms", "k": 3}'

Key Concepts Recap
| Concept | What It Is | Why It Matters |
|---|---|---|
| sentence-transformers | Library for computing text embeddings | Produces semantic vectors optimized for similarity |
| FAISS | Facebook's similarity search library | Sub-millisecond search over millions of vectors |
| MTEB | Massive Text Embedding Benchmark | Standardized leaderboard for model selection |
| Cosine Similarity | Angle between two vectors | Standard metric for semantic similarity |
| NDCG | Normalized Discounted Cumulative Gain | Measures ranking quality with graded relevance |
| MRR | Mean Reciprocal Rank | How quickly you find the first relevant result |
| Triplet Loss | (anchor, positive, negative) training | Fine-tunes embeddings for domain-specific similarity |
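To make the ranking metrics concrete, here is a hand-worked single-query example for results judged [relevant, not relevant, relevant], mirroring the Step 6 formulas in pure Python:

```python
import math

relevance = [1, 0, 1]  # relevance of the top-3 results, in rank order

# DCG: each result's relevance discounted by log2(rank + 1)
dcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(relevance))  # 1/1 + 0 + 1/2 = 1.5
ideal = sorted(relevance, reverse=True)                               # [1, 1, 0]
idcg = sum(rel / math.log2(i + 2) for i, rel in enumerate(ideal))     # 1 + 1/log2(3) ≈ 1.631
ndcg = dcg / idcg                                                     # ≈ 0.920

# Reciprocal rank: first relevant result sits at rank 1
rr = next(1.0 / (i + 1) for i, rel in enumerate(relevance) if rel)    # 1.0

# Average precision: precision at each relevant rank, averaged
precisions = []
hits = 0
for i, rel in enumerate(relevance):
    if rel:
        hits += 1
        precisions.append(hits / (i + 1))                             # [1.0, 2/3]
ap = sum(precisions) / len(precisions)                                # ≈ 0.833
```

NDCG stays near 1 here because the only misplacement is one irrelevant result at rank 2; swapping ranks 1 and 2 would drop both NDCG and AP noticeably, which is exactly the position sensitivity these metrics exist to capture.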
Next Steps
- Image Generation with Diffusers — Generate images with Stable Diffusion
- Fine-Tuning with PEFT — Fine-tune full models with LoRA