LiveKit AI Phone Receptionist
Build a production AI phone receptionist with LiveKit Agents, SIP trunking, function tools, and warm handoff to human agents
| Property | Value |
|---|---|
| Difficulty | Advanced |
| Time | ~4-5 days |
| Code Size | ~1,200 LOC |
| Prerequisites | Production Voice Agent Platform, Tool Calling Agent |
TL;DR
Build a real AI phone receptionist that answers actual phone calls via SIP trunking, books appointments, answers FAQs from a knowledge base, and transfers callers to human agents — all using LiveKit's open-source Agents framework. Unlike the Production Voice Agent project where you built everything from scratch, here you use a production framework that handles audio transport, turn detection, and noise cancellation for you.
Core Terms
Before diving in, let's clarify the acronyms and protocols that power phone-based AI agents. These come from decades of telecom engineering — understanding them is essential for building systems that connect to real phone networks.
| Term | Full Name | Plain English |
|---|---|---|
| SIP | Session Initiation Protocol | The "HTTP of phone calls." A signaling protocol that sets up, modifies, and tears down voice calls over the internet. When you dial a number, SIP messages negotiate the connection before any audio flows. |
| PSTN | Public Switched Telephone Network | The global network of copper wires, fiber, and switches that carries traditional phone calls. When someone dials your business from a landline or mobile, it travels through the PSTN. |
| SIP Trunk | SIP Trunking Service | A bridge between the internet and the PSTN. Providers like Twilio or Telnyx give you a phone number and route calls between the PSTN and your SIP-based application. Think of it as a "phone line as a service." |
| WebRTC | Web Real-Time Communication | A browser-native protocol for real-time audio/video. Unlike raw WebSockets (which just move bytes), WebRTC handles echo cancellation, jitter buffers, codec negotiation, and NAT traversal automatically. |
| NAT | Network Address Translation | Your router shares one public IP among many devices. NAT makes direct peer-to-peer connections difficult because external callers can't reach your internal IP. WebRTC solves this with ICE/TURN/STUN. |
| TURN | Traversal Using Relays around NAT | A relay server that forwards media when direct connections fail. About 10-30% of WebRTC calls need a TURN server. LiveKit Cloud includes this; self-hosting means running your own. |
| STUN | Session Traversal Utilities for NAT | A lightweight server that tells a device its public IP address. Used during connection setup so peers know how to reach each other. Unlike TURN, STUN doesn't relay media. |
| ICE | Interactive Connectivity Establishment | The process of finding the best connection path between two peers. ICE tries direct connections first, then STUN-assisted connections, then TURN relays as a last resort. |
| SFU | Selective Forwarding Unit | A media server that receives audio/video streams and forwards them to other participants without mixing or transcoding. LiveKit Server is an SFU — more scalable than mixing servers. |
| VAD | Voice Activity Detection | Detects when someone is speaking vs. silence. Critical for knowing when the caller has finished talking so the AI can respond. LiveKit uses Silero VAD (a small neural network). |
| DTMF | Dual-Tone Multi-Frequency | The tones generated when you press phone keypad buttons. Each button produces two simultaneous tones. Used for "Press 1 for sales" menus and PIN entry. |
| IVR | Interactive Voice Response | The automated phone menus you hear when calling a business — "Press 1 for billing, Press 2 for support." Traditional IVRs use pre-recorded audio and DTMF input. This project replaces IVRs with conversational AI. |
| E.164 | ITU-T E.164 Standard | The international phone number format: + followed by country code and number. Example: +14155551234. SIP trunks require E.164 format for routing calls correctly. |
| RTC | Real-Time Communication | Umbrella term for any technology that enables live, low-latency audio/video communication between participants. |
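The E.164 row matters in practice: SIP trunks reject numbers that aren't in this exact format. Here is a minimal normalization sketch (US-only; the `to_e164_us` helper name is ours, and production code should use a library like `phonenumbers`, which validates numbers for every country):

```python
import re

def to_e164_us(raw: str) -> str:
    """Naively normalize a US phone number to E.164 (+1XXXXXXXXXX).

    Illustrative helper only; real systems should use the
    `phonenumbers` library for country-aware validation.
    """
    digits = re.sub(r"\D", "", raw)            # strip spaces, dashes, parens
    if len(digits) == 10:                      # local format: 4155551234
        digits = "1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    raise ValueError(f"cannot normalize {raw!r} to E.164")

print(to_e164_us("(415) 555-1234"))  # +14155551234
```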
How a phone call reaches your AI agent:
Call Flow: PSTN → AI Agent
Why This Project Matters
The Production Voice Agent project taught you how voice agents work at the lowest level — raw PCM audio, async queues, manual VAD. That knowledge is essential. But in production, teams use frameworks like LiveKit to ship faster and handle the hard infrastructure problems (NAT traversal, echo cancellation, scaling) automatically.
This project bridges that gap:
| What You Built Before | What LiveKit Handles For You |
|---|---|
| Custom BargeInDetector with RMS energy calculation | Silero VAD neural network + turn detection model |
| Raw WebSocket audio streaming | WebRTC with ICE/TURN/STUN, jitter buffers, echo cancellation |
| Manual asyncio.Queue for audio routing | LiveKit Room with Track publish/subscribe |
| No real phone integration | SIP trunking — actual phone calls from any phone |
| Single session, single server | Multi-room, horizontally scalable SFU |
Business case: A human receptionist costs $3,000-4,000/month and handles ~40 calls/day. This system handles hundreds of concurrent calls at ~$0.05-0.10/minute, 24/7, in multiple languages.
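A back-of-envelope check of those numbers (the call volume, average call length, and business-day count below are illustrative assumptions, not billing data):

```python
# Illustrative monthly cost estimate for the AI receptionist.
CALLS_PER_DAY = 40        # the human-receptionist volume cited above
MINUTES_PER_CALL = 4      # assumed average call length
DAYS_PER_MONTH = 22       # business days

minutes = CALLS_PER_DAY * MINUTES_PER_CALL * DAYS_PER_MONTH
low, high = minutes * 0.05, minutes * 0.10   # $/minute range from above
print(f"{minutes} min/month -> ${low:.0f}-${high:.0f} vs $3,000-4,000 for a human")
```

Even at the high end of the per-minute range, the AI handles the same volume for roughly a tenth of the cost, before counting 24/7 availability.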
What You'll Learn
- LiveKit Agents framework — AgentSession, Agent, @function_tool
- SIP trunking — connecting AI agents to real phone networks via Twilio
- WebRTC fundamentals — how LiveKit manages real-time audio transport
- Warm handoff — transferring callers to human agents in the same room
- Multi-agent patterns — receptionist → scheduling specialist handoff
- Function calling in voice — tools that execute while the caller waits
- Telephony-specific audio — noise cancellation, DTMF handling, E.164 routing
Tech Stack
| Library | Version | Purpose |
|---|---|---|
| livekit-agents | 1.x | Agent framework — session, lifecycle, tools |
| livekit-plugins-deepgram | 1.x | STT (speech-to-text) via Deepgram Nova-3 |
| livekit-plugins-openai | 1.x | LLM reasoning via GPT-4.1-mini |
| livekit-plugins-cartesia | 1.x | TTS (text-to-speech) via Cartesia Sonic-3 |
| livekit-plugins-silero | 1.x | VAD (voice activity detection) |
| livekit-plugins-noise-cancellation | 1.x | Background voice cancellation for telephony |
| livekit-plugins-turn-detector | 1.x | Multilingual turn detection model |
| livekit-api | 1.x | LiveKit server API (SIP, rooms, participants) |
| ChromaDB | 0.5.0 | Vector store for FAQ knowledge base |
| sentence-transformers | 3.3.0 | Embedding model for FAQ retrieval |
| FastAPI | 0.115.0 | Dashboard API and webhook receiver |
| Pydantic Settings | 2.6.0 | Typed configuration |
LiveKit vs DIY: What Changes?
Architecture Comparison
DIY (Production Voice Agent project)
You wrote: StreamingGateway, BargeInDetector, ASRClient, TTSClient, asyncio.Queue routing, WebSocket handler. Total: ~1,400 LOC of infrastructure code + business logic.
LiveKit Agents (this project)
Recommended
LiveKit handles: audio transport, VAD, turn detection, noise cancellation, barge-in, echo cancellation, NAT traversal, scaling. You write: ~1,200 LOC of pure business logic — agent instructions, tools, handoff logic.
What you no longer write manually:
| Component | DIY Project | LiveKit Project |
|---|---|---|
| Audio transport | StreamingGateway + asyncio.Queue (120 LOC) | LiveKit Room (0 LOC) |
| Barge-in detection | BargeInDetector with RMS energy (80 LOC) | Silero VAD plugin (0 LOC) |
| ASR client | DeepgramASRClient with WebSocket (90 LOC) | stt="deepgram/nova-3:multi" (1 line) |
| TTS client | OpenAITTSClient with streaming (60 LOC) | tts="cartesia/sonic-3:..." (1 line) |
| Turn detection | Manual speech_final + utterance_end_ms | MultilingualModel() (1 line) |
| Phone connectivity | Not supported | SIP trunk configuration |
What you focus on instead: agent personality, function tools, business logic, handoff flows, knowledge base, appointment scheduling.
High-Level Architecture
AI Phone Receptionist System
Phone Network (PSTN)
SIP Trunk (Twilio)
LiveKit Server (SFU)
AI Agent (Python)
Backend Services
Call Lifecycle — from dial to hangup:
Complete Call Lifecycle
Project Structure
livekit-receptionist/
├── agents/
│ ├── receptionist.py # Main receptionist agent with tools
│ ├── scheduling.py # Scheduling specialist agent
│ └── handoff.py # Human handoff logic
├── services/
│ ├── appointments.py # Appointment CRUD (SQLite)
│ ├── knowledge_base.py # FAQ retrieval (ChromaDB)
│ └── call_logger.py # Call record storage
├── server.py # LiveKit AgentServer entry point
├── config.py # Pydantic Settings
├── models.py # Shared data models
├── dashboard.py # FastAPI dashboard + webhooks
├── tests/
│ ├── test_receptionist.py
│ ├── test_appointments.py
│ └── test_knowledge_base.py
├── data/
│ └── faq_documents.json # FAQ seed data
├── .env.local
├── Dockerfile
├── docker-compose.yml
└── requirements.txt
Implementation
Step 0: Setup and Dependencies
livekit-agents[codecs]~=1.0
livekit-plugins-deepgram~=1.0
livekit-plugins-openai~=1.0
livekit-plugins-cartesia~=1.0
livekit-plugins-silero~=1.0
livekit-plugins-noise-cancellation~=1.0
livekit-plugins-turn-detector~=1.0
livekit-api~=1.0
chromadb==0.5.0
sentence-transformers==3.3.0
fastapi==0.115.0
uvicorn==0.32.0
pydantic-settings==2.6.0
python-dotenv==1.0.0
pytest==8.3.0
pytest-asyncio==0.24.0
# LiveKit
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
# SIP Trunk (Twilio)
SIP_TRUNK_ID=ST_xxxx
# AI Providers
DEEPGRAM_API_KEY=your-deepgram-key
OPENAI_API_KEY=sk-your-openai-key
CARTESIA_API_KEY=your-cartesia-key
# Agent Settings
HUMAN_AGENT_PHONE=+14155559999
BUSINESS_NAME=Sunrise Medical Clinic
BUSINESS_HOURS=Monday-Friday 9AM-5PM
# Knowledge Base
CHROMA_COLLECTION=receptionist_faq
EMBEDDING_MODEL=all-MiniLM-L6-v2
CONFIDENCE_THRESHOLD=0.6
Setting up the LiveKit Server and SIP Trunk:
LiveKit requires three pieces of infrastructure: the LiveKit server, a SIP trunk provider, and SIP dispatch rules. Here is how they connect:
Infrastructure Setup
Create an inbound SIP trunk using the LiveKit CLI:
# Install LiveKit CLI
brew install livekit-cli
# Create inbound SIP trunk (receives calls from Twilio)
lk sip inbound create \
--request '{
"trunk": {
"name": "Twilio Inbound",
"numbers": ["+14155551234"],
"krisp_enabled": true
}
}'
Create a SIP dispatch rule that routes inbound calls to your agent:
lk sip dispatch create \
--request '{
"dispatch_rule": {
"rule": {
"dispatchRuleIndividual": {
"roomPrefix": "call-"
}
},
"roomConfig": {
"agents": [{
"agentName": "receptionist"
}]
}
}
}'
Beginner Breakdown — Infrastructure Setup:
| Concept | What It Means |
|---|---|
| livekit/livekit-server Docker image | The open-source SFU server. Manages rooms, participants, and audio tracks. Runs on a single port (7880). |
| SIP trunk numbers | The phone numbers that Twilio will route to LiveKit. When someone calls +14155551234, Twilio sends a SIP INVITE to LiveKit. |
| krisp_enabled: true | Enables Krisp AI noise cancellation on the SIP trunk. Filters out background noise from the caller's environment before your agent hears it. |
| dispatchRuleIndividual | Each inbound call gets its own Room (named call-{random}). This means 100 concurrent calls = 100 separate rooms, each with their own agent instance. |
| agentName: "receptionist" | When a call arrives, LiveKit looks for a running agent registered with this name and dispatches it into the new room. |
Step 1: Configuration
from functools import lru_cache
from pydantic_settings import BaseSettings
from pydantic import Field
class Settings(BaseSettings):
"""Receptionist agent configuration from environment."""
# LiveKit
livekit_url: str = Field(..., alias="LIVEKIT_URL")
livekit_api_key: str = Field(..., alias="LIVEKIT_API_KEY")
livekit_api_secret: str = Field(..., alias="LIVEKIT_API_SECRET")
# SIP
sip_trunk_id: str = Field("", alias="SIP_TRUNK_ID")
human_agent_phone: str = Field("", alias="HUMAN_AGENT_PHONE")
# AI Providers
openai_api_key: str = Field(..., alias="OPENAI_API_KEY")
deepgram_api_key: str = Field(..., alias="DEEPGRAM_API_KEY")
cartesia_api_key: str = Field("", alias="CARTESIA_API_KEY")
# Business
business_name: str = Field("Sunrise Medical Clinic", alias="BUSINESS_NAME")
business_hours: str = Field("Monday-Friday 9AM-5PM", alias="BUSINESS_HOURS")
# Knowledge Base
chroma_collection: str = Field("receptionist_faq", alias="CHROMA_COLLECTION")
embedding_model: str = Field("all-MiniLM-L6-v2", alias="EMBEDDING_MODEL")
confidence_threshold: float = Field(0.6, alias="CONFIDENCE_THRESHOLD")
model_config = {"env_file": ".env.local", "extra": "ignore"}
@lru_cache
def get_settings() -> Settings:
return Settings()
Step 2: Data Models
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional
class CallOutcome(str, Enum):
"""How the call ended."""
COMPLETED = "completed" # Agent handled it fully
TRANSFERRED = "transferred" # Handed off to human
ABANDONED = "abandoned" # Caller hung up early
ERROR = "error" # System failure
class AppointmentStatus(str, Enum):
"""Appointment lifecycle states."""
CONFIRMED = "confirmed"
CANCELLED = "cancelled"
RESCHEDULED = "rescheduled"
@dataclass
class Appointment:
"""A scheduled appointment."""
id: str
patient_name: str
phone: str
date: str # YYYY-MM-DD
time: str # HH:MM
reason: str
provider: str = "" # Doctor/specialist name
status: AppointmentStatus = AppointmentStatus.CONFIRMED
created_at: str = ""
def __post_init__(self):
if not self.created_at:
self.created_at = datetime.now().isoformat()
@dataclass
class CallRecord:
"""Record of a completed call."""
call_id: str
room_name: str
caller_phone: str
duration_seconds: int = 0
outcome: CallOutcome = CallOutcome.COMPLETED
tools_used: list[str] = field(default_factory=list)
appointment_id: Optional[str] = None
transcript_summary: str = ""
started_at: str = ""
ended_at: str = ""
Beginner Breakdown — Data Models:
| Python Concept | What It Means |
|---|---|
| class CallOutcome(str, Enum) | A fixed set of values. A call can ONLY end as completed, transferred, abandoned, or error. Using an Enum prevents typos — accessing CallOutcome.TRANSFERED raises an AttributeError instead of silently storing a misspelled string. |
| @dataclass | Auto-generates __init__, __repr__, and __eq__ from the field declarations. Less boilerplate than writing constructors manually. |
| field(default_factory=list) | Creates a new empty list for each instance. Never write tools_used: list = [] — all instances would share the same list object (a classic Python gotcha). |
| __post_init__ | Runs after __init__. Here it sets created_at to the current time if not provided. Useful for computed defaults. |
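The mutable-default gotcha from the table is easy to demonstrate in a standalone sketch: plain classes silently share one list, dataclasses reject bare mutable defaults outright, and default_factory gives each instance its own list.

```python
from dataclasses import dataclass, field

# Plain classes silently share one list across all instances:
class Shared:
    items = []

x, y = Shared(), Shared()
x.items.append("lookup_faq")
assert y.items == ["lookup_faq"]   # the classic gotcha: y sees x's append

# dataclasses refuse bare mutable defaults at class-definition time...
try:
    @dataclass
    class Broken:
        items: list = []
except ValueError:
    pass  # "mutable default <class 'list'> for field items is not allowed"

# ...and default_factory creates a fresh list per instance:
@dataclass
class Good:
    items: list = field(default_factory=list)

a, b = Good(), Good()
a.items.append("lookup_faq")
assert b.items == []               # b is unaffected by a's append
```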
Step 3: FAQ Knowledge Base
The knowledge base answers common caller questions — office hours, insurance, directions, policies — so the LLM doesn't hallucinate answers to factual questions.
import json
import logging
from pathlib import Path
import chromadb
from chromadb.utils.embedding_functions import (
SentenceTransformerEmbeddingFunction,
)
from config import get_settings
logger = logging.getLogger(__name__)
class KnowledgeBase:
"""ChromaDB-backed FAQ retrieval with confidence scoring."""
def __init__(self):
settings = get_settings()
self._client = chromadb.PersistentClient(path="./chroma_data")
self._embedding_fn = SentenceTransformerEmbeddingFunction(
model_name=settings.embedding_model
)
self._collection = self._client.get_or_create_collection(
name=settings.chroma_collection,
embedding_function=self._embedding_fn,
metadata={"hnsw:space": "cosine"},
)
self._threshold = settings.confidence_threshold
def load_faqs(self, path: str = "data/faq_documents.json") -> int:
"""Seed the knowledge base from a JSON file.
Expected format: [{"id": "...", "question": "...", "answer": "..."}]
"""
data = json.loads(Path(path).read_text())
if not data:
return 0
self._collection.upsert(
ids=[d["id"] for d in data],
documents=[
f"Q: {d['question']}\nA: {d['answer']}" for d in data
],
metadatas=[{"question": d["question"]} for d in data],
)
logger.info("Loaded %d FAQ documents", len(data))
return len(data)
def search(self, query: str, n_results: int = 3) -> dict:
"""Search FAQs and return the best answer with confidence.
Returns:
{"answer": str, "confidence": float, "found": bool}
"""
results = self._collection.query(
query_texts=[query],
n_results=n_results,
)
if not results["documents"] or not results["documents"][0]:
return {"answer": "", "confidence": 0.0, "found": False}
documents = results["documents"][0]
distances = results["distances"][0] if results["distances"] else []
# Cosine distance → similarity (1 = identical; lower = less related)
similarities = [1 - d for d in distances] if distances else []
best_score = max(similarities) if similarities else 0.0
if best_score < self._threshold:
return {
"answer": "",
"confidence": best_score,
"found": False,
}
# Return the highest-scoring document
best_idx = similarities.index(best_score)
return {
"answer": documents[best_idx],
"confidence": best_score,
"found": True,
}
[
{
"id": "hours",
"question": "What are your office hours?",
"answer": "We are open Monday through Friday, 9 AM to 5 PM. We are closed on weekends and major holidays."
},
{
"id": "insurance",
"question": "What insurance do you accept?",
"answer": "We accept most major insurance plans including Blue Cross Blue Shield, Aetna, Cigna, UnitedHealthcare, and Medicare. Please call ahead to verify your specific plan."
},
{
"id": "location",
"question": "Where are you located?",
"answer": "We are located at 456 Oak Avenue, Suite 200, San Francisco, CA 94102. Free parking is available in the building garage."
},
{
"id": "new-patient",
"question": "How do I become a new patient?",
"answer": "New patients can schedule an initial consultation by calling us or booking online. Please bring your insurance card, photo ID, and any relevant medical records to your first visit."
},
{
"id": "cancellation",
"question": "What is your cancellation policy?",
"answer": "We require 24 hours notice for cancellations. Late cancellations or no-shows may incur a 50 dollar fee. We understand emergencies happen and handle those on a case-by-case basis."
},
{
"id": "urgent",
"question": "What should I do in an emergency?",
"answer": "If you are experiencing a medical emergency, please call 911 immediately. For urgent but non-emergency concerns during office hours, call us and we will try to see you the same day."
},
{
"id": "telehealth",
"question": "Do you offer telehealth appointments?",
"answer": "Yes, we offer telehealth appointments for follow-up visits and certain types of consultations. Ask when scheduling if your visit qualifies for telehealth."
},
{
"id": "referral",
"question": "Do I need a referral?",
"answer": "Some insurance plans require a referral from your primary care physician. Check with your insurance provider before scheduling. We can help verify if you are unsure."
}
]
Understanding the FAQ Search:
FAQ Lookup Flow
Why use a knowledge base instead of putting FAQs in the system prompt?
FAQ in Prompt vs Knowledge Base
FAQs in system prompt
Works for 5-10 FAQs. But 50+ FAQs consume thousands of tokens per turn, increasing latency and cost. Every LLM call pays for all FAQs even when the question is about hours.
Knowledge base (ChromaDB)
Recommended
Retrieves only the 1-3 relevant FAQs per question. Scales to thousands of documents. Costs nothing when not queried. Returns a confidence score so the agent knows when to say "I don't know."
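The distance-to-similarity conversion and confidence gate inside KnowledgeBase.search can be seen in isolation, no ChromaDB required (the distances and documents below are made up for illustration):

```python
CONFIDENCE_THRESHOLD = 0.6

def best_match(distances: list[float], documents: list[str]) -> dict:
    """Mirror of the gating logic in KnowledgeBase.search (illustrative)."""
    similarities = [1 - d for d in distances]   # cosine distance -> similarity
    best = max(similarities)
    if best < CONFIDENCE_THRESHOLD:
        # Below threshold: report "not found" so the agent can say "I don't know"
        return {"answer": "", "confidence": best, "found": False}
    return {"answer": documents[similarities.index(best)],
            "confidence": best, "found": True}

docs = ["Q: What are your office hours?\nA: ...", "Q: Do I need a referral?\nA: ..."]
print(best_match([0.15, 0.55], docs)["found"])   # True  (best similarity 0.85)
print(best_match([0.70, 0.65], docs)["found"])   # False (best similarity 0.35)
```

The threshold is what lets the agent refuse to answer rather than hallucinate: a vague or off-topic question produces low similarity across all documents, and the tool tells the LLM to offer a transfer instead.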
Step 4: Appointment Service
import logging
import sqlite3
import uuid
from contextlib import contextmanager
from datetime import datetime, timedelta
from models import Appointment, AppointmentStatus
logger = logging.getLogger(__name__)
DB_PATH = "data/appointments.db"
# Available time slots: 30-minute intervals from 9 AM; last slot starts at 4:30 PM
SLOT_START_HOUR = 9
SLOT_END_HOUR = 17
SLOT_DURATION_MINUTES = 30
def _init_db():
"""Create the appointments table if it does not exist."""
with _get_conn() as conn:
conn.execute("""
CREATE TABLE IF NOT EXISTS appointments (
id TEXT PRIMARY KEY,
patient_name TEXT NOT NULL,
phone TEXT NOT NULL,
date TEXT NOT NULL,
time TEXT NOT NULL,
reason TEXT NOT NULL,
provider TEXT DEFAULT '',
status TEXT DEFAULT 'confirmed',
created_at TEXT NOT NULL
)
""")
@contextmanager
def _get_conn():
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
try:
yield conn
conn.commit()
finally:
conn.close()
def get_available_slots(date: str) -> list[str]:
"""Return available 30-minute slots for a given date.
Args:
date: Date in YYYY-MM-DD format.
Returns:
List of available time strings like ["09:00", "09:30", "10:00", ...]
"""
# Generate all possible slots
all_slots = []
current = datetime.strptime(f"{date} {SLOT_START_HOUR:02d}:00", "%Y-%m-%d %H:%M")
end = datetime.strptime(f"{date} {SLOT_END_HOUR:02d}:00", "%Y-%m-%d %H:%M")
while current < end:
all_slots.append(current.strftime("%H:%M"))
current += timedelta(minutes=SLOT_DURATION_MINUTES)
# Remove booked slots
with _get_conn() as conn:
rows = conn.execute(
"SELECT time FROM appointments WHERE date = ? AND status = ?",
(date, AppointmentStatus.CONFIRMED.value),
).fetchall()
booked = {row["time"] for row in rows}
available = [s for s in all_slots if s not in booked]
return available
def book_appointment(
patient_name: str,
phone: str,
date: str,
time: str,
reason: str,
provider: str = "",
) -> Appointment:
"""Book a new appointment.
Returns:
The created Appointment object.
Raises:
ValueError: If the slot is already booked.
"""
available = get_available_slots(date)
if time not in available:
raise ValueError(f"Time slot {time} on {date} is not available")
appointment = Appointment(
id=str(uuid.uuid4())[:8],
patient_name=patient_name,
phone=phone,
date=date,
time=time,
reason=reason,
provider=provider,
)
with _get_conn() as conn:
conn.execute(
"""INSERT INTO appointments
(id, patient_name, phone, date, time, reason, provider, status, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
appointment.id,
appointment.patient_name,
appointment.phone,
appointment.date,
appointment.time,
appointment.reason,
appointment.provider,
appointment.status.value,
appointment.created_at,
),
)
logger.info("Booked appointment %s for %s on %s at %s",
appointment.id, patient_name, date, time)
return appointment
def cancel_appointment(appointment_id: str) -> bool:
"""Cancel an existing appointment."""
with _get_conn() as conn:
result = conn.execute(
"UPDATE appointments SET status = ? WHERE id = ? AND status = ?",
(AppointmentStatus.CANCELLED.value, appointment_id,
AppointmentStatus.CONFIRMED.value),
)
return result.rowcount > 0
# Initialize database on import
_init_db()
Beginner Breakdown — Appointment Service:
| Python Concept | What It Means |
|---|---|
| @contextmanager | Turns a generator function into a with statement. Code before yield = setup (open connection), code after yield = cleanup (close connection). Ensures the database connection always closes, even if an error occurs. |
| conn.row_factory = sqlite3.Row | Makes query results accessible by column name (row["time"]) instead of index (row[0]). Much more readable. |
| str(uuid.uuid4())[:8] | Generates a random 8-character ID like "a1b2c3d4". Short enough to read over the phone — "Your confirmation number is alpha-one-bravo-two." |
| timedelta(minutes=30) | A duration of 30 minutes. Adding it to a datetime gives you the next time slot. 9:00 + 30min = 9:30. |
| {row["time"] for row in rows} | A set comprehension — creates a set of booked times for O(1) lookup. "09:30" in booked is instant vs scanning a list. |
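The slot arithmetic from get_available_slots, extracted into a standalone sketch (the generate_slots helper name is ours, but the loop mirrors the service code):

```python
from datetime import datetime, timedelta

def generate_slots(date: str, start_hour: int = 9, end_hour: int = 17,
                   minutes: int = 30) -> list[str]:
    """All possible slot start times for a date, as in get_available_slots."""
    current = datetime.strptime(f"{date} {start_hour:02d}:00", "%Y-%m-%d %H:%M")
    end = datetime.strptime(f"{date} {end_hour:02d}:00", "%Y-%m-%d %H:%M")
    slots = []
    while current < end:                     # strictly before 17:00, so 16:30 is last
        slots.append(current.strftime("%H:%M"))
        current += timedelta(minutes=minutes)
    return slots

slots = generate_slots("2025-06-02")
print(len(slots), slots[0], slots[-1])       # 16 09:00 16:30

# Removing booked slots via a set keeps each membership check O(1):
booked = {"09:30", "14:00"}
available = [s for s in slots if s not in booked]
print(len(available))                        # 14
```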
Step 5: Call Logger
import json
import logging
import sqlite3
from contextlib import contextmanager
from models import CallRecord
logger = logging.getLogger(__name__)
DB_PATH = "data/calls.db"
def _init_db():
with _get_conn() as conn:
conn.execute("""
CREATE TABLE IF NOT EXISTS call_records (
call_id TEXT PRIMARY KEY,
room_name TEXT NOT NULL,
caller_phone TEXT DEFAULT '',
duration_seconds INTEGER DEFAULT 0,
outcome TEXT DEFAULT 'completed',
tools_used TEXT DEFAULT '[]',
appointment_id TEXT DEFAULT '',
transcript_summary TEXT DEFAULT '',
started_at TEXT NOT NULL,
ended_at TEXT DEFAULT ''
)
""")
@contextmanager
def _get_conn():
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
try:
yield conn
conn.commit()
finally:
conn.close()
def save_call(record: CallRecord) -> None:
"""Save a completed call record."""
with _get_conn() as conn:
conn.execute(
"""INSERT OR REPLACE INTO call_records
(call_id, room_name, caller_phone, duration_seconds,
outcome, tools_used, appointment_id,
transcript_summary, started_at, ended_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
record.call_id,
record.room_name,
record.caller_phone,
record.duration_seconds,
record.outcome.value,
json.dumps(record.tools_used),
record.appointment_id or "",
record.transcript_summary,
record.started_at,
record.ended_at,
),
)
logger.info("Saved call record: %s (outcome=%s)", record.call_id, record.outcome.value)
def get_recent_calls(limit: int = 20) -> list[dict]:
"""Retrieve recent call records for the dashboard."""
with _get_conn() as conn:
rows = conn.execute(
"SELECT * FROM call_records ORDER BY started_at DESC LIMIT ?",
(limit,),
).fetchall()
return [dict(row) for row in rows]
_init_db()
Step 6: Receptionist Agent
This is the core of the project. The ReceptionistAgent inherits from LiveKit's Agent class and defines the AI personality, function tools, and lifecycle hooks. LiveKit handles all the audio plumbing — you write pure business logic.
import logging
from typing import Any
from livekit.agents import Agent, RunContext, function_tool
from config import get_settings
from services.appointments import get_available_slots, book_appointment
from services.knowledge_base import KnowledgeBase
logger = logging.getLogger(__name__)
# Initialize knowledge base once
_kb = KnowledgeBase()
_kb.load_faqs()
class ReceptionistAgent(Agent):
"""AI phone receptionist for a medical clinic.
Handles: greetings, FAQ answers, appointment scheduling,
and transfers to human agents when needed.
"""
def __init__(self, job_context=None) -> None:
settings = get_settings()
self.job_context = job_context
super().__init__(
instructions=f"""You are the phone receptionist for {settings.business_name}.
You answer incoming phone calls with warmth and professionalism.
RULES:
1. Be concise. Callers are LISTENING, not reading. Keep responses to 1-2 sentences.
2. Never spell out URLs, emails, or long numbers. Say "I can text you that information."
3. For factual questions (hours, insurance, location), ALWAYS use the lookup_faq tool.
Never guess — if the tool returns no result, say "I don't have that information handy,
let me transfer you to someone who can help."
4. For appointment scheduling, use check_availability first, then confirm with the caller
before calling book_appointment.
5. If the caller asks for a human, a doctor, or says "transfer me," use transfer_to_human
immediately. Do not try to convince them to stay.
6. Speak naturally. Avoid bullet points, markdown, or any text formatting.
7. If the caller sounds upset or frustrated, acknowledge their feelings before solving
the problem.
BUSINESS INFO:
- Name: {settings.business_name}
- Hours: {settings.business_hours}
""",
)
async def on_enter(self) -> None:
"""Called when this agent becomes active. Greet the caller."""
settings = get_settings()
await self.session.generate_reply(
instructions=f"Greet the caller warmly. Say: 'Thank you for calling "
f"{settings.business_name}, how can I help you today?'"
)
@function_tool()
async def lookup_faq(
self,
context: RunContext,
question: str,
) -> dict[str, Any]:
"""Search the knowledge base for answers to common questions
about office hours, insurance, location, policies, and services.
Args:
question: The caller's question to look up.
"""
result = _kb.search(question)
if result["found"]:
logger.info("FAQ hit: %.2f confidence for '%s'",
result["confidence"], question[:50])
return {
"answer": result["answer"],
"confidence": result["confidence"],
}
logger.info("FAQ miss: %.2f confidence for '%s'",
result["confidence"], question[:50])
return {
"answer": "No matching FAQ found.",
"confidence": result["confidence"],
"suggestion": "Offer to transfer to a staff member who can help.",
}
@function_tool()
async def check_availability(
self,
context: RunContext,
date: str,
) -> dict[str, Any]:
"""Check available appointment slots for a specific date.
Args:
date: The date to check in YYYY-MM-DD format.
"""
slots = get_available_slots(date)
if not slots:
return {
"available": False,
"message": f"No slots available on {date}.",
"suggestion": "Try the next business day.",
}
# Group slots for easier reading over the phone
morning = [s for s in slots if int(s.split(":")[0]) < 12]
afternoon = [s for s in slots if int(s.split(":")[0]) >= 12]
return {
"available": True,
"date": date,
"morning_slots": morning,
"afternoon_slots": afternoon,
"total": len(slots),
}
@function_tool()
async def book_appointment(
self,
context: RunContext,
patient_name: str,
phone: str,
date: str,
time: str,
reason: str,
) -> dict[str, Any]:
"""Book an appointment after confirming details with the caller.
Args:
patient_name: The patient's full name.
phone: The patient's phone number for confirmation.
date: Appointment date in YYYY-MM-DD format.
time: Appointment time in HH:MM format.
reason: Brief reason for the visit.
"""
try:
# Calls the imported services.appointments.book_appointment function.
# The method name does not shadow it: class attributes are not in
# scope inside method bodies, so the module-level import is used.
appointment = book_appointment(
patient_name=patient_name,
phone=phone,
date=date,
time=time,
reason=reason,
)
return {
"success": True,
"confirmation_id": appointment.id,
"date": date,
"time": time,
"message": f"Appointment confirmed for {patient_name}.",
}
except ValueError as exc:
return {
"success": False,
"error": str(exc),
"suggestion": "Check availability for a different time.",
}
@function_tool()
async def transfer_to_human(
self,
context: RunContext,
reason: str,
) -> dict[str, Any]:
"""Transfer the caller to a human staff member.
Use this when the caller explicitly asks for a human,
when you cannot answer their question, or when the situation
requires human judgment.
Args:
reason: Brief reason for the transfer.
"""
import uuid
from livekit import api
settings = get_settings()
if not self.job_context:
await self.session.say(
"I'm sorry, I'm unable to transfer the call right now. "
"Please try calling back."
)
return {"success": False, "error": "No job context available"}
phone = settings.human_agent_phone
if not phone:
await self.session.say(
"I'm sorry, no staff members are available for transfer. "
"Can I take a message instead?"
)
return {"success": False, "error": "No human agent phone configured"}
sip_trunk_id = settings.sip_trunk_id
room_name = self.job_context.room.name
try:
# Add human agent to the same Room via SIP
await self.job_context.api.sip.create_sip_participant(
api.CreateSIPParticipantRequest(
sip_trunk_id=sip_trunk_id,
sip_call_to=phone,
room_name=room_name,
participant_identity=f"human_{uuid.uuid4().hex[:8]}",
participant_name="Staff Member",
krisp_enabled=True,
)
)
await self.session.say(
"I'm transferring you to a staff member now. "
"Please hold for just a moment."
)
return {"success": True, "reason": reason}
except Exception as exc:
logger.error("Transfer failed: %s", exc)
await self.session.say(
"I'm sorry, I couldn't reach a staff member right now. "
"Can I take your name and number so someone can call you back?"
)
return {"success": False, "error": str(exc)}
★ Insight ─────────────────────────────────────
1. LiveKit's Agent class vs your DIY DialogueManager: In the previous project, you manually built a DialogueManager that assembled OpenAI messages, parsed tool calls, and made follow-up LLM calls. Here, LiveKit's Agent base class handles all of that — you just define instructions and @function_tool methods. The framework manages the OpenAI function-calling protocol automatically.
2. @function_tool uses docstrings as schemas: The docstring and type hints on each tool method are automatically converted to the JSON schema the LLM sees. The Args: section in the docstring becomes parameter descriptions. This is why the docstrings are written for the LLM, not for Python developers.
3. Warm handoff via SIP: create_sip_participant() adds a human to the same Room as the caller. Both hear each other through LiveKit's SFU. The AI agent can stay in the room (listening, taking notes) or leave — unlike a cold transfer where the caller is disconnected and reconnected.
─────────────────────────────────────────────────
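To make the docstring-as-schema idea concrete, here is a simplified, self-contained sketch of what a decorator like @function_tool derives from a method's type hints and docstring. This is an illustration of the concept, not LiveKit's actual implementation:

```python
import inspect
import typing

def tool_schema(fn):
    """Build a minimal OpenAI-style function schema from type hints + docstring.

    Illustrative only: LiveKit's @function_tool does this (and more) for you.
    """
    hints = typing.get_type_hints(fn)
    hints.pop("return", None)
    doc = inspect.getdoc(fn) or ""
    # The first docstring line becomes the tool description the LLM sees
    description = doc.splitlines()[0] if doc else fn.__name__
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    properties = {
        name: {"type": type_map.get(tp, "string")} for name, tp in hints.items()
    }
    return {
        "name": fn.__name__,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }

def book_appointment(date: str, time: str) -> dict:
    """Book an appointment on the given date and time."""
    return {"date": date, "time": time}

schema = tool_schema(book_appointment)
print(schema["name"])        # book_appointment
print(schema["description"]) # Book an appointment on the given date and time.
```

This is why the docstrings in the agent are written for the LLM: whatever you put there is exactly what the model reads when deciding whether and how to call the tool.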
Understanding the Transfer Flow: Caller → Human Agent (warm handoff)
Why warm handoff beats cold transfer:
Cold Transfer vs Warm Handoff
Cold transfer (traditional IVR)
Caller is disconnected → reconnected to human → must re-explain their problem from scratch. "I already told the robot my name and appointment details!"
Warm handoff (LiveKit Room) — recommended
The human joins the existing conversation. The AI can brief the human: "This caller needs to reschedule their Thursday appointment." No information lost. Caller feels respected.
Beginner Breakdown — Receptionist Agent:
| Python Concept | What It Means |
|---|---|
class ReceptionistAgent(Agent) | Inherits from LiveKit's Agent base class. You get session management, LLM integration, and tool execution for free. |
super().__init__(instructions=...) | Passes the system prompt to the base class. LiveKit sends this as the system message in every LLM call. |
async def on_enter(self) | Lifecycle hook — called when this agent becomes active in the session. Perfect for the initial greeting. |
@function_tool() | Decorator that registers a method as a callable tool for the LLM. The LLM sees the function name, docstring, and parameter types. |
RunContext | Passed to every tool call. Contains the current session, room, and agent state. Useful for accessing conversation context. |
self.session.say("...") | Speaks text immediately to the caller via TTS. Unlike generate_reply(), this doesn't involve the LLM — it's a direct TTS utterance. |
self.session.generate_reply(instructions=...) | Asks the LLM to generate a response with additional instructions. The LLM considers conversation history + these instructions. |
Step 7: Agent Server Entry Point
This is where LiveKit, SIP, and your agent come together. The AgentServer listens for dispatched sessions and creates the voice pipeline for each call.
import logging
from dotenv import load_dotenv
from livekit import agents, rtc
from livekit.agents import AgentServer, AgentSession, JobContext, JobProcess, room_io
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel
from agents.receptionist import ReceptionistAgent
load_dotenv(".env.local")
logger = logging.getLogger(__name__)
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
server = AgentServer()
def prewarm(proc: JobProcess):
"""Pre-load heavy models once per worker process.
VAD (Voice Activity Detection) loads a neural network from disk.
Loading it once here and reusing across sessions saves ~2s per call.
"""
proc.userdata["vad"] = silero.VAD.load()
server.setup_fnc = prewarm
@server.rtc_session(agent_name="receptionist")
async def handle_call(ctx: JobContext):
"""Handle one inbound phone call.
LiveKit dispatches this function for each incoming SIP call
that matches the dispatch rule with agentName="receptionist".
"""
# Build the voice pipeline
session = AgentSession(
stt="deepgram/nova-3:multi",
llm="openai/gpt-4.1-mini",
tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
vad=ctx.proc.userdata["vad"],
turn_detection=MultilingualModel(),
)
# Create the receptionist agent with job context (needed for SIP transfers)
agent = ReceptionistAgent(job_context=ctx)
# Start the session with telephony-optimized audio
await session.start(
room=ctx.room,
agent=agent,
room_options=room_io.RoomOptions(
audio_input=room_io.AudioInputOptions(
noise_cancellation=_get_noise_cancellation,
),
),
)
# Connect to the room (makes the agent a participant)
await ctx.connect()
logger.info("Receptionist agent started in room %s", ctx.room.name)
def _get_noise_cancellation(params):
"""Select noise cancellation mode based on caller type.
SIP callers (phone calls) get telephony-optimized cancellation
that handles PSTN background noise and echo. Browser callers
get standard background voice cancellation.
"""
if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP:
return noise_cancellation.BVCTelephony()
return noise_cancellation.BVC()
if __name__ == "__main__":
agents.cli.run_app(server)
★ Insight ─────────────────────────────────────
1. Prewarm pattern: Loading Silero VAD takes ~2 seconds (it's a PyTorch model). The prewarm function runs once per worker process, not per call. 100 calls share the same loaded VAD model. This is why ctx.proc.userdata["vad"] works — it's stored at the process level.
2. String-based STT/LLM/TTS: "deepgram/nova-3:multi" is LiveKit's provider string format: provider/model:variant. This is a recent API simplification — previously you had to instantiate plugin classes manually. The string format lets LiveKit handle provider initialization and configuration.
3. One session per call, one agent per session: Each SIP call gets its own Room, its own AgentSession, and its own ReceptionistAgent instance. No shared state between calls — this is how LiveKit achieves horizontal scaling.
─────────────────────────────────────────────────
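The provider string format is simple enough to parse yourself. A sketch of the idea (the exact parsing LiveKit performs internally may differ):

```python
def parse_provider_string(s: str) -> dict:
    """Split "provider/model:variant" into its parts; the variant is optional."""
    provider, _, rest = s.partition("/")
    model, _, variant = rest.partition(":")
    return {"provider": provider, "model": model, "variant": variant or None}

print(parse_provider_string("deepgram/nova-3:multi"))
# {'provider': 'deepgram', 'model': 'nova-3', 'variant': 'multi'}
print(parse_provider_string("openai/gpt-4.1-mini"))
# {'provider': 'openai', 'model': 'gpt-4.1-mini', 'variant': None}
```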
Beginner Breakdown — Agent Server:
What Happens When a Call Arrives: SIP Call Arrives → LiveKit Dispatch → AgentSession Created → Agent Active
| Python Concept | What It Means |
|---|---|
AgentServer() | LiveKit's application container. Listens for dispatched jobs from the LiveKit server. Similar to FastAPI's app = FastAPI(). |
@server.rtc_session(agent_name="receptionist") | Registers this function to handle sessions dispatched to agent name "receptionist". The dispatch rule in Step 0 routes SIP calls here. |
JobContext | Contains: ctx.room (the LiveKit Room), ctx.proc (the worker process with shared userdata), ctx.api (LiveKit server API for SIP operations). |
AgentSession(stt=..., llm=..., tts=...) | The voice pipeline. LiveKit connects these in sequence: audio → STT → LLM → TTS → audio. All streaming, all real-time. |
session.start(room=ctx.room, agent=agent) | Connects the pipeline to the room and activates the agent. From this point, the agent can hear the caller and speak. |
ctx.connect() | Makes the agent a visible participant in the room. Required for audio to flow. |
agents.cli.run_app(server) | Starts the agent worker process. Connects to the LiveKit server and waits for dispatched jobs. |
How the voice pipeline processes each turn:
Single Turn in AgentSession
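Conceptually, each turn flows audio → STT → LLM → TTS → audio. A toy, fully mocked sketch of that loop (the real components are streaming and asynchronous; all names here are illustrative stand-ins, not LiveKit APIs):

```python
def fake_stt(audio: bytes) -> str:
    """Stand-in for speech-to-text; the real STT streams partial transcripts."""
    return "what time do you open"

def fake_llm(transcript: str, history: list) -> str:
    """Stand-in for the LLM; the real one may also emit tool calls."""
    history.append({"role": "user", "content": transcript})
    reply = "We open at 9 AM, Monday through Friday."
    history.append({"role": "assistant", "content": reply})
    return reply

def fake_tts(text: str) -> bytes:
    """Stand-in for text-to-speech; returns audio bytes for playback."""
    return text.encode("utf-8")

def run_turn(audio_in: bytes, history: list) -> bytes:
    transcript = fake_stt(audio_in)        # 1. caller audio -> text
    reply = fake_llm(transcript, history)  # 2. text -> response (+ tool calls)
    return fake_tts(reply)                 # 3. response -> audio for the caller

history = []
audio_out = run_turn(b"...caller audio...", history)
print(len(history))  # 2 (one user turn, one assistant turn)
```

AgentSession runs this loop for you, with interruption handling and turn detection layered on top; your only job is the business logic inside the tools.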
Step 8: Dashboard API
A simple FastAPI server for viewing call records and managing the system. In production, this would power an admin dashboard.
import logging
from datetime import datetime
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from services.appointments import get_available_slots
from services.call_logger import get_recent_calls
logger = logging.getLogger(__name__)
app = FastAPI(title="Receptionist Dashboard", version="1.0.0")
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_methods=["*"],
allow_headers=["*"],
)
@app.get("/api/calls")
async def list_calls(limit: int = 20):
"""List recent call records."""
return {"calls": get_recent_calls(limit)}
@app.get("/api/availability/{date}")
async def check_date(date: str):
"""Check available appointment slots for a date."""
slots = get_available_slots(date)
return {"date": date, "slots": slots, "total": len(slots)}
@app.get("/api/stats")
async def stats():
"""Basic call statistics."""
calls = get_recent_calls(100)
total = len(calls)
transferred = sum(1 for c in calls if c.get("outcome") == "transferred")
completed = sum(1 for c in calls if c.get("outcome") == "completed")
return {
"total_calls": total,
"completed": completed,
"transferred": transferred,
"transfer_rate": transferred / total if total > 0 else 0,
"generated_at": datetime.now().isoformat(),
}
@app.get("/health")
async def health():
return {"status": "ok"}
Step 9: Tests
import pytest
from services.knowledge_base import KnowledgeBase
class TestKnowledgeBase:
def setup_method(self):
self.kb = KnowledgeBase()
self.kb.load_faqs()
def test_finds_hours_question(self):
result = self.kb.search("What time do you open?")
assert result["found"] is True
assert result["confidence"] > 0.6
assert "9 AM" in result["answer"] or "Monday" in result["answer"]
def test_finds_insurance_question(self):
result = self.kb.search("Do you take Blue Cross?")
assert result["found"] is True
assert "Blue Cross" in result["answer"]
def test_returns_not_found_for_irrelevant_query(self):
result = self.kb.search("What is the capital of France?")
# Should not match any FAQ with high confidence
assert result["found"] is False or result["confidence"] < 0.6
def test_finds_location(self):
result = self.kb.search("Where is your office?")
assert result["found"] is True
assert "Oak Avenue" in result["answer"] or "San Francisco" in result["answer"]
def test_finds_cancellation_policy(self):
result = self.kb.search("What if I need to cancel?")
assert result["found"] is True
assert "24 hours" in result["answer"]
import os
import pytest
# Use a test database
os.environ.setdefault("DB_PATH", ":memory:")
from services.appointments import (
get_available_slots,
book_appointment,
cancel_appointment,
)
class TestAppointmentSlots:
def test_all_slots_available_on_empty_day(self):
slots = get_available_slots("2099-01-15")
assert len(slots) > 0
assert "09:00" in slots
assert "16:30" in slots
def test_slot_format(self):
slots = get_available_slots("2099-01-15")
for slot in slots:
hour, minute = slot.split(":")
assert 0 <= int(hour) <= 23
assert int(minute) in (0, 30)
class TestBookAppointment:
def test_book_and_confirm(self):
apt = book_appointment(
patient_name="John Doe",
phone="+14155551111",
date="2099-02-01",
time="10:00",
reason="Annual checkup",
)
assert apt.id is not None
assert apt.patient_name == "John Doe"
assert apt.date == "2099-02-01"
assert apt.time == "10:00"
def test_double_booking_raises(self):
book_appointment(
patient_name="Jane Smith",
phone="+14155552222",
date="2099-03-01",
time="11:00",
reason="Follow-up",
)
with pytest.raises(ValueError, match="not available"):
book_appointment(
patient_name="Bob Wilson",
phone="+14155553333",
date="2099-03-01",
time="11:00",
reason="Consultation",
)
def test_slot_removed_after_booking(self):
book_appointment(
patient_name="Alice Brown",
phone="+14155554444",
date="2099-04-01",
time="14:00",
reason="Lab results",
)
slots = get_available_slots("2099-04-01")
assert "14:00" not in slots
class TestCancelAppointment:
def test_cancel_existing(self):
apt = book_appointment(
patient_name="To Cancel",
phone="+14155555555",
date="2099-05-01",
time="09:00",
reason="Test",
)
assert cancel_appointment(apt.id) is True
def test_cancel_nonexistent(self):
assert cancel_appointment("nonexistent-id") is False
import pytest
from services.knowledge_base import KnowledgeBase
class TestKnowledgeBaseEdgeCases:
def setup_method(self):
self.kb = KnowledgeBase()
self.kb.load_faqs()
def test_empty_query_returns_low_confidence(self):
result = self.kb.search("")
assert result["confidence"] < 0.6 or not result["found"]
def test_returns_dict_format(self):
result = self.kb.search("office hours")
assert "answer" in result
assert "confidence" in result
assert "found" in result
assert isinstance(result["confidence"], float)
def test_confidence_between_0_and_1(self):
result = self.kb.search("Do you accept insurance?")
assert 0.0 <= result["confidence"] <= 1.0
def test_multiple_searches_consistent(self):
"""Same query should return the same result."""
r1 = self.kb.search("What are your hours?")
r2 = self.kb.search("What are your hours?")
assert r1["answer"] == r2["answer"]
assert abs(r1["confidence"] - r2["confidence"]) < 0.01
Step 10: Docker Deployment
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
COPY agents/ ./agents/
COPY services/ ./services/
COPY data/ ./data/
COPY server.py config.py models.py dashboard.py ./
# Create data directory for SQLite databases
RUN mkdir -p /app/data
EXPOSE 8000
CMD ["python", "server.py", "start"]
services:
# LiveKit Server (SFU)
livekit-server:
image: livekit/livekit-server:latest
ports:
- "7880:7880" # WebSocket + HTTP
- "7881:7881" # WebRTC (TCP)
- "50000-50100:50000-50100/udp" # WebRTC (UDP)
environment:
- LIVEKIT_KEYS=devkey:secret
command: --dev --bind 0.0.0.0
# AI Receptionist Agent
receptionist-agent:
build: .
env_file: .env.local
depends_on:
- livekit-server
volumes:
- agent_data:/app/data
restart: unless-stopped
# Dashboard API
dashboard:
build: .
command: uvicorn dashboard:app --host 0.0.0.0 --port 8000
ports:
- "8000:8000"
volumes:
- agent_data:/app/data
restart: unless-stopped
volumes:
agent_data:
Beginner Breakdown — Docker Compose Services:
Docker Compose Services: livekit-server (port 7880) · receptionist-agent · dashboard (port 8000)
| Docker Concept | What It Means |
|---|---|
--dev | LiveKit dev mode — auto-generates API keys, enables test features. Never use in production. |
50000-50100/udp | WebRTC media ports. Audio travels over UDP for lowest latency. The range allows up to 100 concurrent connections. |
volumes: agent_data | Shared volume between agent and dashboard. Both read/write the same SQLite database files. |
depends_on: livekit-server | Agent starts after LiveKit server. Without this, the agent would fail to connect. |
Running the Application
Start everything with Docker Compose:
docker-compose up -d livekit-server
docker-compose up receptionist-agent dashboard
Or run locally for development:
# Terminal 1: Start LiveKit server
docker run --rm -p 7880:7880 -p 7881:7881 \
-p 50000-50100:50000-50100/udp \
-e LIVEKIT_KEYS=devkey:secret \
livekit/livekit-server --dev --bind 0.0.0.0
# Terminal 2: Start the agent
python server.py start
# Terminal 3: Start the dashboard
uvicorn dashboard:app --reload --port 8000
Test with a real phone call (requires Twilio SIP trunk):
- Configure Twilio SIP trunk to point to your LiveKit server
- Call your Twilio phone number
- The agent should greet you and respond to questions
Test without a phone (LiveKit Playground):
# Open LiveKit's web playground to test via browser
# Visit: https://agents-playground.livekit.io
# Enter your LiveKit server URL and API credentials
# Click "Connect" — your agent will activate via WebRTC instead of SIP
Check the dashboard:
# View recent calls
curl http://localhost:8000/api/calls
# Check appointment availability
curl http://localhost:8000/api/availability/2026-04-15
# View call statistics
curl http://localhost:8000/api/stats
Run the test suite:
pytest tests/ -v
Telephony Configuration Guide
Setting up SIP trunking is the most infrastructure-heavy part of this project. Here is a complete walkthrough:
SIP Trunk Setup Flow
| Provider | Phone Number Cost | Per-Minute Cost | Notes |
|---|---|---|---|
| Twilio | ~$1/month | ~$0.008/min inbound | Most popular, excellent docs |
| Telnyx | ~$1/month | ~$0.005/min inbound | Lower cost, good quality |
| Vonage | ~$1/month | ~$0.007/min inbound | Global coverage |
Total cost per conversation minute (all inclusive):
| Component | Cost/min |
|---|---|
| SIP trunk (Twilio) | ~$0.008 |
| LiveKit Cloud (or $0 self-hosted) | ~$0.003 |
| STT (Deepgram Nova-3) | ~$0.005 |
| LLM (GPT-4.1-mini) | ~$0.01-0.03 |
| TTS (Cartesia Sonic-3) | ~$0.01-0.02 |
| Total | ~$0.04-0.07 |
Compare this to a human receptionist at ~$25/hour (or ~$0.42/minute). The AI receptionist is approximately 6-10x cheaper per minute while handling unlimited concurrent calls.
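A quick sanity check of the table's arithmetic (the rates below are approximate midpoints from the table above, not provider quotes):

```python
def cost_per_minute(sip=0.008, livekit=0.003, stt=0.005, llm=0.02, tts=0.015):
    """Sum per-minute component costs; defaults are midpoints from the table."""
    return sip + livekit + stt + llm + tts

ai = cost_per_minute()   # 0.008 + 0.003 + 0.005 + 0.02 + 0.015 = $0.051/min
human = 25 / 60          # $25/hour receptionist is about $0.417/min
print(round(ai, 3), round(human / ai, 1))  # 0.051 8.2
```

At the midpoint rates the AI comes out roughly 8x cheaper per minute; with the table's low and high ends ($0.04 to $0.07) the ratio lands in the 6-10x range quoted above.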
Debugging Tips
| Problem | Likely Cause | Fix |
|---|---|---|
| Agent doesn't start | Agent name mismatch | Verify agent_name="receptionist" in server.py matches the dispatch rule |
| No audio from caller | SIP trunk misconfigured | Check Twilio trunk Termination URI points to correct LiveKit address |
| Agent speaks but caller can't hear | Firewall blocking UDP | Open ports 50000-50100/udp for WebRTC media |
| High latency (>3s per turn) | LLM or TTS slow | Check which stage is slow — STT, LLM, or TTS. Try GPT-4.1-mini instead of GPT-4.1 |
| FAQ tool returns wrong answers | Low similarity threshold | Adjust CONFIDENCE_THRESHOLD in .env — higher means stricter matching |
| Transfer fails | SIP trunk ID wrong | Verify SIP_TRUNK_ID matches your outbound trunk (not inbound) |
| Agent talks over caller | Turn detection too aggressive | Adjust VAD sensitivity or try different turn detection model |
| Echo on phone calls | Wrong noise cancellation | Ensure BVCTelephony() is used for SIP participants, not BVC() |
| Agent keeps greeting after transfer | Agent still active in room | After transfer, consider having the agent leave the room or go silent |
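When diagnosing the high-latency row above, the first step is finding which stage is slow. A generic, framework-agnostic timing sketch (in production, LiveKit's own metrics events are the better tool; the stage calls here are simulated with sleeps):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(stage: str, results: dict):
    """Record the wall-clock duration of a pipeline stage in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[stage] = time.perf_counter() - start

results = {}
with timed("stt", results):
    time.sleep(0.01)   # stand-in for an STT call
with timed("llm", results):
    time.sleep(0.02)   # stand-in for an LLM call
with timed("tts", results):
    time.sleep(0.01)   # stand-in for a TTS call

slowest = max(results, key=results.get)
print(slowest)  # llm
```

Wrap each real stage call the same way and log `results` per turn; a pattern across many calls tells you whether to swap the LLM, the TTS voice, or the STT model.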
Extensions
| Difficulty | Extension | Description |
|---|---|---|
| Easy | Appointment reminders | Send SMS via Twilio 24 hours before appointments |
| Easy | Call recording | Enable LiveKit Egress to record calls for quality review |
| Medium | Multi-language receptionist | Detect caller language and switch STT/TTS locale dynamically |
| Medium | DTMF menu fallback | Handle "Press 1 for appointments" for callers who prefer traditional IVR |
| Medium | CRM integration | Look up caller by phone number in a CRM to personalize greetings |
| Hard | AI sales outreach agent | Outbound SIP calls to leads with CRM integration and objection handling |
| Hard | Multilingual support hotline | Language detection + dynamic provider switching + language-matched human agents |
| Hard | Voicemail with transcription | Detect voicemail, leave a message, transcribe incoming voicemails |
Future Case Studies — The "AI sales outreach agent" and "Multilingual support hotline" extensions above are planned as full case studies in the AI Agents category, demonstrating production deployments of LiveKit voice agents in sales and international customer support.
Key Concepts Recap
| Concept | What It Is | Why It Matters |
|---|---|---|
| SIP Trunking | Bridge between internet and phone network (PSTN) | Lets your AI agent answer real phone calls, not just browser connections |
| LiveKit Room | Virtual space where participants exchange audio/video | Each call gets its own room — isolated, scalable, multi-participant |
| AgentSession | The STT → LLM → TTS pipeline | Handles the entire voice AI loop automatically — you write business logic, not plumbing |
| @function_tool | Decorator that exposes a method as an LLM tool | The LLM can call your Python functions mid-conversation to look up data or take actions |
| Warm Handoff | Adding a human to the same Room as the caller | No disconnection, no re-explanation — the human joins the existing conversation |
| VAD (Silero) | Neural network that detects speech vs. silence | Knows when the caller has finished talking so the AI doesn't interrupt |
| Turn Detection | Model that predicts conversational turn boundaries | More sophisticated than VAD alone — handles pauses, thinking, and filler words |
| Noise Cancellation | AI-powered audio filtering | BVCTelephony() removes PSTN noise, echo, and background voices from phone calls |
| SFU | Selective Forwarding Unit (LiveKit Server) | Routes audio between participants without mixing — scales to hundreds of concurrent calls |
| WebRTC | Real-time audio/video protocol with built-in NAT traversal | Handles the hard networking problems (firewalls, echo, jitter) that raw WebSockets cannot |
Resources
- LiveKit Agents Documentation
- LiveKit Telephony Guide
- LiveKit Agents Python Examples
- Twilio SIP Trunking Docs
- Deepgram Nova-3 API
- Cartesia Sonic TTS
- LiveKit Agents Playground
- Production Voice Agent (DIY prerequisite)
Beginner Glossary
| Term | Plain English |
|---|---|
| SIP | The signaling protocol that sets up phone calls over the internet. Like HTTP is for web pages, SIP is for voice calls. |
| PSTN | The traditional phone network — the physical infrastructure that carries calls from cell towers and landlines. |
| SIP Trunk | A service (Twilio, Telnyx) that gives you a phone number and bridges between the internet and the PSTN. |
| WebRTC | Browser technology for real-time audio/video with built-in echo cancellation, encryption, and firewall traversal. |
| SFU | A server that forwards audio streams between participants without mixing them. LiveKit Server is an SFU. |
| NAT Traversal | The process of establishing direct connections between devices behind routers/firewalls. WebRTC uses ICE/STUN/TURN for this. |
| VAD | Voice Activity Detection — a neural network that detects when someone is speaking vs. silence. |
| DTMF | The beep tones when you press phone buttons. Each button makes two tones at once (dual-tone). |
| IVR | The automated phone menus: "Press 1 for billing." This project replaces IVRs with conversational AI. |
| E.164 | International phone number format: +14155551234. The + and country code ensure global routing. |
| PCM | Raw audio as a list of numbers representing sound wave samples. The simplest audio format. |
| Krisp | AI noise cancellation technology. Filters background noise in real-time before your agent processes audio. |
| Warm Handoff | Transferring a caller to a human without disconnecting — the human joins the existing conversation. |
| Cold Transfer | Traditional transfer where the caller is disconnected and reconnected to someone new, losing context. |
| Participant | Anyone in a LiveKit Room — the caller, the AI agent, or a human agent. Each publishes and subscribes to audio tracks. |
| Room | A LiveKit virtual space where participants communicate. One room per phone call in this project. |
| Dispatch Rule | A LiveKit configuration that decides which agent to assign when a new call arrives. |
| Prewarm | Loading heavy resources (like the VAD model) once at startup instead of per-call, saving ~2 seconds per call. |
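The E.164 format in the glossary is worth validating before you hand a number to a SIP trunk. A minimal sketch (a plus sign, a non-zero leading digit, and at most 15 digits total):

```python
import re

# E.164: "+", country code starting 1-9, up to 15 digits in total
E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")

def is_e164(number: str) -> bool:
    """True if `number` looks like a valid E.164 phone number."""
    return bool(E164_RE.match(number))

print(is_e164("+14155551234"))  # True
print(is_e164("4155551234"))    # False: missing "+" and country code
```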