LiveKit AI Phone Receptionist
Build a production AI phone receptionist with LiveKit Agents, SIP trunking, function tools, and warm handoff to human agents
| Property | Value |
|---|---|
| Difficulty | Advanced |
| Time | ~4-5 days |
| Code Size | ~1,200 LOC |
| Prerequisites | Production Voice Agent Platform, Tool Calling Agent |
TL;DR
Build a real AI phone receptionist that answers actual phone calls via SIP trunking, books appointments, answers FAQs from a knowledge base, and transfers callers to human agents — all using LiveKit's open-source Agents framework. Unlike the Production Voice Agent project where you built everything from scratch, here you use a production framework that handles audio transport, turn detection, and noise cancellation for you.
Core Terms
Before diving in, let's clarify the acronyms and protocols that power phone-based AI agents. These come from decades of telecom engineering — understanding them is essential for building systems that connect to real phone networks.
| Term | Full Name | Plain English |
|---|---|---|
| SIP | Session Initiation Protocol | The "HTTP of phone calls." A signaling protocol that sets up, modifies, and tears down voice calls over the internet. When you dial a number, SIP messages negotiate the connection before any audio flows. |
| PSTN | Public Switched Telephone Network | The global network of copper wires, fiber, and switches that carries traditional phone calls. When someone dials your business from a landline or mobile, it travels through the PSTN. |
| SIP Trunk | SIP Trunking Service | A bridge between the internet and the PSTN. Providers like Twilio or Telnyx give you a phone number and route calls between the PSTN and your SIP-based application. Think of it as a "phone line as a service." |
| WebRTC | Web Real-Time Communication | A browser-native protocol for real-time audio/video. Unlike raw WebSockets (which just move bytes), WebRTC handles echo cancellation, jitter buffers, codec negotiation, and NAT traversal automatically. |
| NAT | Network Address Translation | Your router shares one public IP among many devices. NAT makes direct peer-to-peer connections difficult because external callers can't reach your internal IP. WebRTC solves this with ICE/TURN/STUN. |
| TURN | Traversal Using Relays around NAT | A relay server that forwards media when direct connections fail. About 10-30% of WebRTC calls need a TURN server. LiveKit Cloud includes this; self-hosting means running your own. |
| STUN | Session Traversal Utilities for NAT | A lightweight server that tells a device its public IP address. Used during connection setup so peers know how to reach each other. Unlike TURN, STUN doesn't relay media. |
| ICE | Interactive Connectivity Establishment | The process of finding the best connection path between two peers. ICE tries direct connections first, then STUN-assisted connections, then TURN relays as a last resort. |
| SFU | Selective Forwarding Unit | A media server that receives audio/video streams and forwards them to other participants without mixing or transcoding. LiveKit Server is an SFU — more scalable than mixing servers. |
| VAD | Voice Activity Detection | Detects when someone is speaking vs. silence. Critical for knowing when the caller has finished talking so the AI can respond. LiveKit uses Silero VAD (a small neural network). |
| DTMF | Dual-Tone Multi-Frequency | The tones generated when you press phone keypad buttons. Each button produces two simultaneous tones. Used for "Press 1 for sales" menus and PIN entry. |
| IVR | Interactive Voice Response | The automated phone menus you hear when calling a business — "Press 1 for billing, Press 2 for support." Traditional IVRs use pre-recorded audio and DTMF input. This project replaces IVRs with conversational AI. |
| E.164 | ITU-T E.164 Standard | The international phone number format: + followed by country code and number. Example: +14155551234. SIP trunks require E.164 format for routing calls correctly. |
| RTC | Real-Time Communication | Umbrella term for any technology that enables live, low-latency audio/video communication between participants. |
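The E.164 row matters in practice: SIP trunks reject numbers that aren't in this exact format. Here is a minimal normalization sketch (US-only; the `to_e164_us` helper name is ours, and production code should use a library like `phonenumbers`, which validates numbers for every country):

```python
import re

def to_e164_us(raw: str) -> str:
    """Naively normalize a US phone number to E.164 (+1XXXXXXXXXX).

    Illustrative helper only; real systems should use the
    `phonenumbers` library for country-aware validation.
    """
    digits = re.sub(r"\D", "", raw)            # strip spaces, dashes, parens
    if len(digits) == 10:                      # local format: 4155551234
        digits = "1" + digits
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits
    raise ValueError(f"cannot normalize {raw!r} to E.164")

print(to_e164_us("(415) 555-1234"))  # +14155551234
```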
How a phone call reaches your AI agent:
Call Flow: PSTN → AI Agent
Why This Project Matters
The Production Voice Agent project taught you how voice agents work at the lowest level — raw PCM audio, async queues, manual VAD. That knowledge is essential. But in production, teams use frameworks like LiveKit to ship faster and handle the hard infrastructure problems (NAT traversal, echo cancellation, scaling) automatically.
This project bridges that gap:
| What You Built Before | What LiveKit Handles For You |
|---|---|
| Custom BargeInDetector with RMS energy calculation | Silero VAD neural network + turn detection model |
| Raw WebSocket audio streaming | WebRTC with ICE/TURN/STUN, jitter buffers, echo cancellation |
| Manual asyncio.Queue for audio routing | LiveKit Room with Track publish/subscribe |
| No real phone integration | SIP trunking — actual phone calls from any phone |
| Single session, single server | Multi-room, horizontally scalable SFU |
Business case: A human receptionist costs $3,000-4,000/month and handles ~40 calls/day. This system handles hundreds of concurrent calls at ~$0.05-0.10/minute, 24/7, in multiple languages.
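A back-of-envelope check of those numbers (the call volume, average call length, and business-day count below are illustrative assumptions, not billing data):

```python
# Illustrative monthly cost estimate for the AI receptionist.
CALLS_PER_DAY = 40        # the human-receptionist volume cited above
MINUTES_PER_CALL = 4      # assumed average call length
DAYS_PER_MONTH = 22       # business days

minutes = CALLS_PER_DAY * MINUTES_PER_CALL * DAYS_PER_MONTH
low, high = minutes * 0.05, minutes * 0.10   # $/minute range from above
print(f"{minutes} min/month -> ${low:.0f}-${high:.0f} vs $3,000-4,000 for a human")
```

Even at the high end of the per-minute range, the AI handles the same volume for roughly a tenth of the cost, before counting 24/7 availability.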
What You'll Learn
- LiveKit Agents framework — AgentSession, Agent, @function_tool
- SIP trunking — connecting AI agents to real phone networks via Twilio
- WebRTC fundamentals — how LiveKit manages real-time audio transport
- Warm handoff — transferring callers to human agents in the same room
- Multi-agent patterns — receptionist → scheduling specialist handoff
- Function calling in voice — tools that execute while the caller waits
- Telephony-specific audio — noise cancellation, DTMF handling, E.164 routing
Tech Stack
| Library | Version | Purpose |
|---|---|---|
| livekit-agents | 1.x | Agent framework — session, lifecycle, tools |
| livekit-plugins-deepgram | 1.x | STT (speech-to-text) via Deepgram Nova-3 |
| livekit-plugins-openai | 1.x | LLM reasoning via GPT-4.1-mini |
| livekit-plugins-cartesia | 1.x | TTS (text-to-speech) via Cartesia Sonic-3 |
| livekit-plugins-silero | 1.x | VAD (voice activity detection) |
| livekit-plugins-noise-cancellation | 1.x | Background voice cancellation for telephony |
| livekit-plugins-turn-detector | 1.x | Multilingual turn detection model |
| livekit-api | 1.x | LiveKit server API (SIP, rooms, participants) |
| ChromaDB | 0.5.0 | Vector store for FAQ knowledge base |
| sentence-transformers | 3.3.0 | Embedding model for FAQ retrieval |
| FastAPI | 0.115.0 | Dashboard API and webhook receiver |
| Pydantic Settings | 2.6.0 | Typed configuration |
LiveKit vs DIY: What Changes?
Architecture Comparison
DIY (Production Voice Agent project)
You wrote: StreamingGateway, BargeInDetector, ASRClient, TTSClient, asyncio.Queue routing, WebSocket handler. Total: ~1,400 LOC of infrastructure code + business logic.
LiveKit Agents (this project)
Recommended
LiveKit handles: audio transport, VAD, turn detection, noise cancellation, barge-in, echo cancellation, NAT traversal, scaling. You write: ~1,200 LOC of pure business logic — agent instructions, tools, handoff logic.
What you no longer write manually:
| Component | DIY Project | LiveKit Project |
|---|---|---|
| Audio transport | StreamingGateway + asyncio.Queue (120 LOC) | LiveKit Room (0 LOC) |
| Barge-in detection | BargeInDetector with RMS energy (80 LOC) | Silero VAD plugin (0 LOC) |
| ASR client | DeepgramASRClient with WebSocket (90 LOC) | stt="deepgram/nova-3:multi" (1 line) |
| TTS client | OpenAITTSClient with streaming (60 LOC) | tts="cartesia/sonic-3:..." (1 line) |
| Turn detection | Manual speech_final + utterance_end_ms | MultilingualModel() (1 line) |
| Phone connectivity | Not supported | SIP trunk configuration |
What you focus on instead: agent personality, function tools, business logic, handoff flows, knowledge base, appointment scheduling.
High-Level Architecture
AI Phone Receptionist System
Phone Network (PSTN)
SIP Trunk (Twilio)
LiveKit Server (SFU)
AI Agent (Python)
Backend Services
Call Lifecycle — from dial to hangup:
Complete Call Lifecycle
Project Structure
livekit-receptionist/
├── agents/
│ ├── receptionist.py # Main receptionist agent with tools
│ ├── scheduling.py # Scheduling specialist agent
│ └── handoff.py # Human handoff logic
├── services/
│ ├── appointments.py # Appointment CRUD (SQLite)
│ ├── knowledge_base.py # FAQ retrieval (ChromaDB)
│ └── call_logger.py # Call record storage
├── server.py # LiveKit AgentServer entry point
├── config.py # Pydantic Settings
├── models.py # Shared data models
├── dashboard.py # FastAPI dashboard + webhooks
├── tests/
│ ├── test_receptionist.py
│ ├── test_appointments.py
│ └── test_knowledge_base.py
├── data/
│ └── faq_documents.json # FAQ seed data
├── .env.local
├── Dockerfile
├── docker-compose.yml
└── requirements.txt
Implementation
Step 0: Setup and Dependencies
livekit-agents[codecs]~=1.0
livekit-plugins-deepgram~=1.0
livekit-plugins-openai~=1.0
livekit-plugins-cartesia~=1.0
livekit-plugins-silero~=1.0
livekit-plugins-noise-cancellation~=1.0
livekit-plugins-turn-detector~=1.0
livekit-api~=1.0
chromadb==0.5.0
sentence-transformers==3.3.0
fastapi==0.115.0
uvicorn==0.32.0
pydantic-settings==2.6.0
python-dotenv==1.0.0
pytest==8.3.0
pytest-asyncio==0.24.0
# LiveKit
LIVEKIT_URL=ws://localhost:7880
LIVEKIT_API_KEY=your-api-key
LIVEKIT_API_SECRET=your-api-secret
# SIP Trunk (Twilio)
SIP_TRUNK_ID=ST_xxxx
# AI Providers
DEEPGRAM_API_KEY=your-deepgram-key
OPENAI_API_KEY=sk-your-openai-key
CARTESIA_API_KEY=your-cartesia-key
# Agent Settings
HUMAN_AGENT_PHONE=+14155559999
BUSINESS_NAME=Sunrise Medical Clinic
BUSINESS_HOURS=Monday-Friday 9AM-5PM
# Knowledge Base
CHROMA_COLLECTION=receptionist_faq
EMBEDDING_MODEL=all-MiniLM-L6-v2
CONFIDENCE_THRESHOLD=0.6
Setting up the LiveKit Server and SIP Trunk:
LiveKit requires three pieces of infrastructure: the LiveKit server, a SIP trunk provider, and SIP dispatch rules. Here is how they connect:
Infrastructure Setup
Create an inbound SIP trunk using the LiveKit CLI:
# Install LiveKit CLI
brew install livekit-cli
# Create inbound SIP trunk (receives calls from Twilio)
lk sip inbound create \
--request '{
"trunk": {
"name": "Twilio Inbound",
"numbers": ["+14155551234"],
"krisp_enabled": true
}
}'
Create a SIP dispatch rule that routes inbound calls to your agent:
lk sip dispatch create \
--request '{
"dispatch_rule": {
"rule": {
"dispatchRuleIndividual": {
"roomPrefix": "call-"
}
},
"roomConfig": {
"agents": [{
"agentName": "receptionist"
}]
}
}
}'
Beginner Breakdown — Infrastructure Setup:
| Concept | What It Means |
|---|---|
| livekit/livekit-server Docker image | The open-source SFU server. Manages rooms, participants, and audio tracks. Runs on a single port (7880). |
| SIP trunk numbers | The phone numbers that Twilio will route to LiveKit. When someone calls +14155551234, Twilio sends a SIP INVITE to LiveKit. |
| krisp_enabled: true | Enables Krisp AI noise cancellation on the SIP trunk. Filters out background noise from the caller's environment before your agent hears it. |
| dispatchRuleIndividual | Each inbound call gets its own Room (named call-{random}). This means 100 concurrent calls = 100 separate rooms, each with their own agent instance. |
| agentName: "receptionist" | When a call arrives, LiveKit looks for a running agent registered with this name and dispatches it into the new room. |
Step 1: Configuration
from functools import lru_cache
from pydantic_settings import BaseSettings
from pydantic import Field
class Settings(BaseSettings):
"""Receptionist agent configuration from environment."""
# LiveKit
livekit_url: str = Field(..., alias="LIVEKIT_URL")
livekit_api_key: str = Field(..., alias="LIVEKIT_API_KEY")
livekit_api_secret: str = Field(..., alias="LIVEKIT_API_SECRET")
# SIP
sip_trunk_id: str = Field("", alias="SIP_TRUNK_ID")
human_agent_phone: str = Field("", alias="HUMAN_AGENT_PHONE")
# AI Providers
openai_api_key: str = Field(..., alias="OPENAI_API_KEY")
deepgram_api_key: str = Field(..., alias="DEEPGRAM_API_KEY")
cartesia_api_key: str = Field("", alias="CARTESIA_API_KEY")
# Business
business_name: str = Field("Sunrise Medical Clinic", alias="BUSINESS_NAME")
business_hours: str = Field("Monday-Friday 9AM-5PM", alias="BUSINESS_HOURS")
# Knowledge Base
chroma_collection: str = Field("receptionist_faq", alias="CHROMA_COLLECTION")
embedding_model: str = Field("all-MiniLM-L6-v2", alias="EMBEDDING_MODEL")
confidence_threshold: float = Field(0.6, alias="CONFIDENCE_THRESHOLD")
model_config = {"env_file": ".env.local", "extra": "ignore"}
@lru_cache
def get_settings() -> Settings:
return Settings()
Step 2: Data Models
from dataclasses import dataclass, field
from datetime import datetime
from enum import Enum
from typing import Optional
class CallOutcome(str, Enum):
"""How the call ended."""
COMPLETED = "completed" # Agent handled it fully
TRANSFERRED = "transferred" # Handed off to human
ABANDONED = "abandoned" # Caller hung up early
ERROR = "error" # System failure
class AppointmentStatus(str, Enum):
"""Appointment lifecycle states."""
CONFIRMED = "confirmed"
CANCELLED = "cancelled"
RESCHEDULED = "rescheduled"
@dataclass
class Appointment:
"""A scheduled appointment."""
id: str
patient_name: str
phone: str
date: str # YYYY-MM-DD
time: str # HH:MM
reason: str
provider: str = "" # Doctor/specialist name
status: AppointmentStatus = AppointmentStatus.CONFIRMED
created_at: str = ""
def __post_init__(self):
if not self.created_at:
self.created_at = datetime.now().isoformat()
@dataclass
class CallRecord:
"""Record of a completed call."""
call_id: str
room_name: str
caller_phone: str
duration_seconds: int = 0
outcome: CallOutcome = CallOutcome.COMPLETED
tools_used: list[str] = field(default_factory=list)
appointment_id: Optional[str] = None
transcript_summary: str = ""
started_at: str = ""
ended_at: str = ""
Beginner Breakdown — Data Models:
| Python Concept | What It Means |
|---|---|
| class CallOutcome(str, Enum) | A fixed set of values. A call can ONLY end as completed, transferred, abandoned, or error. Using an Enum prevents typos — accessing CallOutcome.TRANSFERED raises an AttributeError instead of silently storing a misspelled string. |
| @dataclass | Auto-generates __init__, __repr__, and __eq__ from the field declarations. Less boilerplate than writing constructors manually. |
| field(default_factory=list) | Creates a new empty list for each instance. Never write tools_used: list = [] — all instances would share the same list object (a classic Python gotcha). |
| __post_init__ | Runs after __init__. Here it sets created_at to the current time if not provided. Useful for computed defaults. |
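The mutable-default gotcha from the table is easy to demonstrate in a standalone sketch: plain classes silently share one list, dataclasses reject bare mutable defaults outright, and default_factory gives each instance its own list.

```python
from dataclasses import dataclass, field

# Plain classes silently share one list across all instances:
class Shared:
    items = []

x, y = Shared(), Shared()
x.items.append("lookup_faq")
assert y.items == ["lookup_faq"]   # the classic gotcha: y sees x's append

# dataclasses refuse bare mutable defaults at class-definition time...
try:
    @dataclass
    class Broken:
        items: list = []
except ValueError:
    pass  # "mutable default <class 'list'> for field items is not allowed"

# ...and default_factory creates a fresh list per instance:
@dataclass
class Good:
    items: list = field(default_factory=list)

a, b = Good(), Good()
a.items.append("lookup_faq")
assert b.items == []               # b is unaffected by a's append
```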
Step 3: FAQ Knowledge Base
The knowledge base answers common caller questions — office hours, insurance, directions, policies — so the LLM doesn't hallucinate answers to factual questions.
import json
import logging
from pathlib import Path
import chromadb
from chromadb.utils.embedding_functions import (
SentenceTransformerEmbeddingFunction,
)
from config import get_settings
logger = logging.getLogger(__name__)
class KnowledgeBase:
"""ChromaDB-backed FAQ retrieval with confidence scoring."""
def __init__(self):
settings = get_settings()
self._client = chromadb.PersistentClient(path="./chroma_data")
self._embedding_fn = SentenceTransformerEmbeddingFunction(
model_name=settings.embedding_model
)
self._collection = self._client.get_or_create_collection(
name=settings.chroma_collection,
embedding_function=self._embedding_fn,
metadata={"hnsw:space": "cosine"},
)
self._threshold = settings.confidence_threshold
def load_faqs(self, path: str = "data/faq_documents.json") -> int:
"""Seed the knowledge base from a JSON file.
Expected format: [{"id": "...", "question": "...", "answer": "..."}]
"""
data = json.loads(Path(path).read_text())
if not data:
return 0
self._collection.upsert(
ids=[d["id"] for d in data],
documents=[
f"Q: {d['question']}\nA: {d['answer']}" for d in data
],
metadatas=[{"question": d["question"]} for d in data],
)
logger.info("Loaded %d FAQ documents", len(data))
return len(data)
def search(self, query: str, n_results: int = 3) -> dict:
"""Search FAQs and return the best answer with confidence.
Returns:
{"answer": str, "confidence": float, "found": bool}
"""
results = self._collection.query(
query_texts=[query],
n_results=n_results,
)
if not results["documents"] or not results["documents"][0]:
return {"answer": "", "confidence": 0.0, "found": False}
documents = results["documents"][0]
distances = results["distances"][0] if results["distances"] else []
# Cosine distance → similarity (1 = identical; lower = less related)
similarities = [1 - d for d in distances] if distances else []
best_score = max(similarities) if similarities else 0.0
if best_score < self._threshold:
return {
"answer": "",
"confidence": best_score,
"found": False,
}
# Return the highest-scoring document
best_idx = similarities.index(best_score)
return {
"answer": documents[best_idx],
"confidence": best_score,
"found": True,
}
[
{
"id": "hours",
"question": "What are your office hours?",
"answer": "We are open Monday through Friday, 9 AM to 5 PM. We are closed on weekends and major holidays."
},
{
"id": "insurance",
"question": "What insurance do you accept?",
"answer": "We accept most major insurance plans including Blue Cross Blue Shield, Aetna, Cigna, UnitedHealthcare, and Medicare. Please call ahead to verify your specific plan."
},
{
"id": "location",
"question": "Where are you located?",
"answer": "We are located at 456 Oak Avenue, Suite 200, San Francisco, CA 94102. Free parking is available in the building garage."
},
{
"id": "new-patient",
"question": "How do I become a new patient?",
"answer": "New patients can schedule an initial consultation by calling us or booking online. Please bring your insurance card, photo ID, and any relevant medical records to your first visit."
},
{
"id": "cancellation",
"question": "What is your cancellation policy?",
"answer": "We require 24 hours notice for cancellations. Late cancellations or no-shows may incur a 50 dollar fee. We understand emergencies happen and handle those on a case-by-case basis."
},
{
"id": "urgent",
"question": "What should I do in an emergency?",
"answer": "If you are experiencing a medical emergency, please call 911 immediately. For urgent but non-emergency concerns during office hours, call us and we will try to see you the same day."
},
{
"id": "telehealth",
"question": "Do you offer telehealth appointments?",
"answer": "Yes, we offer telehealth appointments for follow-up visits and certain types of consultations. Ask when scheduling if your visit qualifies for telehealth."
},
{
"id": "referral",
"question": "Do I need a referral?",
"answer": "Some insurance plans require a referral from your primary care physician. Check with your insurance provider before scheduling. We can help verify if you are unsure."
}
]
Understanding the FAQ Search:
FAQ Lookup Flow
Why use a knowledge base instead of putting FAQs in the system prompt?
FAQ in Prompt vs Knowledge Base
FAQs in system prompt
Works for 5-10 FAQs. But 50+ FAQs consume thousands of tokens per turn, increasing latency and cost. Every LLM call pays for all FAQs even when the question is about hours.
Knowledge base (ChromaDB)
Recommended
Retrieves only the 1-3 relevant FAQs per question. Scales to thousands of documents. Costs nothing when not queried. Returns a confidence score so the agent knows when to say "I don't know."
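The distance-to-similarity conversion and confidence gate inside KnowledgeBase.search can be seen in isolation, no ChromaDB required (the distances and documents below are made up for illustration):

```python
CONFIDENCE_THRESHOLD = 0.6

def best_match(distances: list[float], documents: list[str]) -> dict:
    """Mirror of the gating logic in KnowledgeBase.search (illustrative)."""
    similarities = [1 - d for d in distances]   # cosine distance -> similarity
    best = max(similarities)
    if best < CONFIDENCE_THRESHOLD:
        # Below threshold: report "not found" so the agent can say "I don't know"
        return {"answer": "", "confidence": best, "found": False}
    return {"answer": documents[similarities.index(best)],
            "confidence": best, "found": True}

docs = ["Q: What are your office hours?\nA: ...", "Q: Do I need a referral?\nA: ..."]
print(best_match([0.15, 0.55], docs)["found"])   # True  (best similarity 0.85)
print(best_match([0.70, 0.65], docs)["found"])   # False (best similarity 0.35)
```

The threshold is what lets the agent refuse to answer rather than hallucinate: a vague or off-topic question produces low similarity across all documents, and the tool tells the LLM to offer a transfer instead.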
Step 4: Appointment Service
import logging
import sqlite3
import uuid
from contextlib import contextmanager
from datetime import datetime, timedelta
from models import Appointment, AppointmentStatus
logger = logging.getLogger(__name__)
DB_PATH = "data/appointments.db"
# Available time slots: 30-minute intervals from 9 AM; last slot starts at 4:30 PM
SLOT_START_HOUR = 9
SLOT_END_HOUR = 17
SLOT_DURATION_MINUTES = 30
def _init_db():
"""Create the appointments table if it does not exist."""
with _get_conn() as conn:
conn.execute("""
CREATE TABLE IF NOT EXISTS appointments (
id TEXT PRIMARY KEY,
patient_name TEXT NOT NULL,
phone TEXT NOT NULL,
date TEXT NOT NULL,
time TEXT NOT NULL,
reason TEXT NOT NULL,
provider TEXT DEFAULT '',
status TEXT DEFAULT 'confirmed',
created_at TEXT NOT NULL
)
""")
@contextmanager
def _get_conn():
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
try:
yield conn
conn.commit()
finally:
conn.close()
def get_available_slots(date: str) -> list[str]:
"""Return available 30-minute slots for a given date.
Args:
date: Date in YYYY-MM-DD format.
Returns:
List of available time strings like ["09:00", "09:30", "10:00", ...]
"""
# Generate all possible slots
all_slots = []
current = datetime.strptime(f"{date} {SLOT_START_HOUR:02d}:00", "%Y-%m-%d %H:%M")
end = datetime.strptime(f"{date} {SLOT_END_HOUR:02d}:00", "%Y-%m-%d %H:%M")
while current < end:
all_slots.append(current.strftime("%H:%M"))
current += timedelta(minutes=SLOT_DURATION_MINUTES)
# Remove booked slots
with _get_conn() as conn:
rows = conn.execute(
"SELECT time FROM appointments WHERE date = ? AND status = ?",
(date, AppointmentStatus.CONFIRMED.value),
).fetchall()
booked = {row["time"] for row in rows}
available = [s for s in all_slots if s not in booked]
return available
def book_appointment(
patient_name: str,
phone: str,
date: str,
time: str,
reason: str,
provider: str = "",
) -> Appointment:
"""Book a new appointment.
Returns:
The created Appointment object.
Raises:
ValueError: If the slot is already booked.
"""
available = get_available_slots(date)
if time not in available:
raise ValueError(f"Time slot {time} on {date} is not available")
appointment = Appointment(
id=str(uuid.uuid4())[:8],
patient_name=patient_name,
phone=phone,
date=date,
time=time,
reason=reason,
provider=provider,
)
with _get_conn() as conn:
conn.execute(
"""INSERT INTO appointments
(id, patient_name, phone, date, time, reason, provider, status, created_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
appointment.id,
appointment.patient_name,
appointment.phone,
appointment.date,
appointment.time,
appointment.reason,
appointment.provider,
appointment.status.value,
appointment.created_at,
),
)
logger.info("Booked appointment %s for %s on %s at %s",
appointment.id, patient_name, date, time)
return appointment
def cancel_appointment(appointment_id: str) -> bool:
"""Cancel an existing appointment."""
with _get_conn() as conn:
result = conn.execute(
"UPDATE appointments SET status = ? WHERE id = ? AND status = ?",
(AppointmentStatus.CANCELLED.value, appointment_id,
AppointmentStatus.CONFIRMED.value),
)
return result.rowcount > 0
# Initialize database on import
_init_db()
Beginner Breakdown — Appointment Service:
| Python Concept | What It Means |
|---|---|
| @contextmanager | Turns a generator function into a with statement. Code before yield = setup (open connection), code after yield = cleanup (close connection). Ensures the database connection always closes, even if an error occurs. |
| conn.row_factory = sqlite3.Row | Makes query results accessible by column name (row["time"]) instead of index (row[0]). Much more readable. |
| str(uuid.uuid4())[:8] | Generates a random 8-character ID like "a1b2c3d4". Short enough to read over the phone — "Your confirmation number is alpha-one-bravo-two." |
| timedelta(minutes=30) | A duration of 30 minutes. Adding it to a datetime gives you the next time slot. 9:00 + 30min = 9:30. |
| {row["time"] for row in rows} | A set comprehension — creates a set of booked times for O(1) lookup. "09:30" in booked is instant vs scanning a list. |
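The slot arithmetic from get_available_slots, extracted into a standalone sketch (the generate_slots helper name is ours, but the loop mirrors the service code):

```python
from datetime import datetime, timedelta

def generate_slots(date: str, start_hour: int = 9, end_hour: int = 17,
                   minutes: int = 30) -> list[str]:
    """All possible slot start times for a date, as in get_available_slots."""
    current = datetime.strptime(f"{date} {start_hour:02d}:00", "%Y-%m-%d %H:%M")
    end = datetime.strptime(f"{date} {end_hour:02d}:00", "%Y-%m-%d %H:%M")
    slots = []
    while current < end:                     # strictly before 17:00, so 16:30 is last
        slots.append(current.strftime("%H:%M"))
        current += timedelta(minutes=minutes)
    return slots

slots = generate_slots("2025-06-02")
print(len(slots), slots[0], slots[-1])       # 16 09:00 16:30

# Removing booked slots via a set keeps each membership check O(1):
booked = {"09:30", "14:00"}
available = [s for s in slots if s not in booked]
print(len(available))                        # 14
```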
Step 5: Call Logger
import json
import logging
import sqlite3
from contextlib import contextmanager
from models import CallRecord
logger = logging.getLogger(__name__)
DB_PATH = "data/calls.db"
def _init_db():
with _get_conn() as conn:
conn.execute("""
CREATE TABLE IF NOT EXISTS call_records (
call_id TEXT PRIMARY KEY,
room_name TEXT NOT NULL,
caller_phone TEXT DEFAULT '',
duration_seconds INTEGER DEFAULT 0,
outcome TEXT DEFAULT 'completed',
tools_used TEXT DEFAULT '[]',
appointment_id TEXT DEFAULT '',
transcript_summary TEXT DEFAULT '',
started_at TEXT NOT NULL,
ended_at TEXT DEFAULT ''
)
""")
@contextmanager
def _get_conn():
conn = sqlite3.connect(DB_PATH)
conn.row_factory = sqlite3.Row
try:
yield conn
conn.commit()
finally:
conn.close()
def save_call(record: CallRecord) -> None:
"""Save a completed call record."""
with _get_conn() as conn:
conn.execute(
"""INSERT OR REPLACE INTO call_records
(call_id, room_name, caller_phone, duration_seconds,
outcome, tools_used, appointment_id,
transcript_summary, started_at, ended_at)
VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?, ?)""",
(
record.call_id,
record.room_name,
record.caller_phone,
record.duration_seconds,
record.outcome.value,
json.dumps(record.tools_used),
record.appointment_id or "",
record.transcript_summary,
record.started_at,
record.ended_at,
),
)
logger.info("Saved call record: %s (outcome=%s)", record.call_id, record.outcome.value)
def get_recent_calls(limit: int = 20) -> list[dict]:
"""Retrieve recent call records for the dashboard."""
with _get_conn() as conn:
rows = conn.execute(
"SELECT * FROM call_records ORDER BY started_at DESC LIMIT ?",
(limit,),
).fetchall()
return [dict(row) for row in rows]
_init_db()
Step 6: Receptionist Agent
This is the core of the project. The ReceptionistAgent inherits from LiveKit's Agent class and defines the AI personality, function tools, and lifecycle hooks. LiveKit handles all the audio plumbing — you write pure business logic.
import logging
from typing import Any
from livekit.agents import Agent, RunContext, function_tool
from config import get_settings
from services.appointments import get_available_slots, book_appointment
from services.knowledge_base import KnowledgeBase
logger = logging.getLogger(__name__)
# Initialize knowledge base once
_kb = KnowledgeBase()
_kb.load_faqs()
class ReceptionistAgent(Agent):
"""AI phone receptionist for a medical clinic.
Handles: greetings, FAQ answers, appointment scheduling,
and transfers to human agents when needed.
"""
def __init__(self, job_context=None) -> None:
settings = get_settings()
self.job_context = job_context
super().__init__(
instructions=f"""You are the phone receptionist for {settings.business_name}.
You answer incoming phone calls with warmth and professionalism.
RULES:
1. Be concise. Callers are LISTENING, not reading. Keep responses to 1-2 sentences.
2. Never spell out URLs, emails, or long numbers. Say "I can text you that information."
3. For factual questions (hours, insurance, location), ALWAYS use the lookup_faq tool.
Never guess — if the tool returns no result, say "I don't have that information handy,
let me transfer you to someone who can help."
4. For appointment scheduling, use check_availability first, then confirm with the caller
before calling book_appointment.
5. If the caller asks for a human, a doctor, or says "transfer me," use transfer_to_human
immediately. Do not try to convince them to stay.
6. Speak naturally. Avoid bullet points, markdown, or any text formatting.
7. If the caller sounds upset or frustrated, acknowledge their feelings before solving
the problem.
BUSINESS INFO:
- Name: {settings.business_name}
- Hours: {settings.business_hours}
""",
)
async def on_enter(self) -> None:
"""Called when this agent becomes active. Greet the caller."""
settings = get_settings()
await self.session.generate_reply(
instructions=f"Greet the caller warmly. Say: 'Thank you for calling "
f"{settings.business_name}, how can I help you today?'"
)
@function_tool()
async def lookup_faq(
self,
context: RunContext,
question: str,
) -> dict[str, Any]:
"""Search the knowledge base for answers to common questions
about office hours, insurance, location, policies, and services.
Args:
question: The caller's question to look up.
"""
result = _kb.search(question)
if result["found"]:
logger.info("FAQ hit: %.2f confidence for '%s'",
result["confidence"], question[:50])
return {
"answer": result["answer"],
"confidence": result["confidence"],
}
logger.info("FAQ miss: %.2f confidence for '%s'",
result["confidence"], question[:50])
return {
"answer": "No matching FAQ found.",
"confidence": result["confidence"],
"suggestion": "Offer to transfer to a staff member who can help.",
}
@function_tool()
async def check_availability(
self,
context: RunContext,
date: str,
) -> dict[str, Any]:
"""Check available appointment slots for a specific date.
Args:
date: The date to check in YYYY-MM-DD format.
"""
slots = get_available_slots(date)
if not slots:
return {
"available": False,
"message": f"No slots available on {date}.",
"suggestion": "Try the next business day.",
}
# Group slots for easier reading over the phone
morning = [s for s in slots if int(s.split(":")[0]) < 12]
afternoon = [s for s in slots if int(s.split(":")[0]) >= 12]
return {
"available": True,
"date": date,
"morning_slots": morning,
"afternoon_slots": afternoon,
"total": len(slots),
}
@function_tool()
async def book_appointment(
self,
context: RunContext,
patient_name: str,
phone: str,
date: str,
time: str,
reason: str,
) -> dict[str, Any]:
"""Book an appointment after confirming details with the caller.
Args:
patient_name: The patient's full name.
phone: The patient's phone number for confirmation.
date: Appointment date in YYYY-MM-DD format.
time: Appointment time in HH:MM format.
reason: Brief reason for the visit.
"""
try:
# Calls the imported services.appointments.book_appointment function.
# The method name does not shadow it: class attributes are not in
# scope inside method bodies, so the module-level import is used.
appointment = book_appointment(
patient_name=patient_name,
phone=phone,
date=date,
time=time,
reason=reason,
)
return {
"success": True,
"confirmation_id": appointment.id,
"date": date,
"time": time,
"message": f"Appointment confirmed for {patient_name}.",
}
except ValueError as exc:
return {
"success": False,
"error": str(exc),
"suggestion": "Check availability for a different time.",
}
@function_tool()
async def transfer_to_human(
self,
context: RunContext,
reason: str,
) -> dict[str, Any]:
"""Transfer the caller to a human staff member.
Use this when the caller explicitly asks for a human,
when you cannot answer their question, or when the situation
requires human judgment.
Args:
reason: Brief reason for the transfer.
"""
import uuid
from livekit import api
settings = get_settings()
if not self.job_context:
await self.session.say(
"I'm sorry, I'm unable to transfer the call right now. "
"Please try calling back."
)
return {"success": False, "error": "No job context available"}
phone = settings.human_agent_phone
if not phone:
await self.session.say(
"I'm sorry, no staff members are available for transfer. "
"Can I take a message instead?"
)
return {"success": False, "error": "No human agent phone configured"}
sip_trunk_id = settings.sip_trunk_id
room_name = self.job_context.room.name
try:
# Add human agent to the same Room via SIP
await self.job_context.api.sip.create_sip_participant(
api.CreateSIPParticipantRequest(
sip_trunk_id=sip_trunk_id,
sip_call_to=phone,
room_name=room_name,
participant_identity=f"human_{uuid.uuid4().hex[:8]}",
participant_name="Staff Member",
krisp_enabled=True,
)
)
await self.session.say(
"I'm transferring you to a staff member now. "
"Please hold for just a moment."
)
return {"success": True, "reason": reason}
except Exception as exc:
logger.error("Transfer failed: %s", exc)
await self.session.say(
"I'm sorry, I couldn't reach a staff member right now. "
"Can I take your name and number so someone can call you back?"
)
return {"success": False, "error": str(exc)}
★ Insight ─────────────────────────────────────
1. LiveKit's Agent class vs your DIY DialogueManager: In the previous project, you manually built a DialogueManager that assembled OpenAI messages, parsed tool calls, and made follow-up LLM calls. Here, LiveKit's Agent base class handles all of that — you just define instructions and @function_tool methods. The framework manages the OpenAI function-calling protocol automatically.
2. @function_tool uses docstrings as schemas: The docstring and type hints on each tool method are automatically converted to the JSON schema the LLM sees. The Args: section in the docstring becomes parameter descriptions. This is why the docstrings are written for the LLM, not for Python developers.
3. Warm handoff via SIP: create_sip_participant() adds a human to the same Room as the caller. Both hear each other through LiveKit's SFU. The AI agent can stay in the room (listening, taking notes) or leave — unlike a cold transfer where the caller is disconnected and reconnected.
─────────────────────────────────────────────────
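To make the docstring-as-schema idea concrete, here is a simplified, self-contained sketch of what a decorator like @function_tool derives from a method's type hints and docstring. This is an illustration of the concept, not LiveKit's actual implementation:

```python
import inspect
import typing

def tool_schema(fn):
    """Build a minimal OpenAI-style function schema from type hints + docstring.

    Illustrative only: LiveKit's @function_tool does this (and more) for you.
    """
    hints = typing.get_type_hints(fn)
    hints.pop("return", None)
    doc = inspect.getdoc(fn) or ""
    # The first docstring line becomes the tool description the LLM sees
    description = doc.splitlines()[0] if doc else fn.__name__
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    properties = {
        name: {"type": type_map.get(tp, "string")} for name, tp in hints.items()
    }
    return {
        "name": fn.__name__,
        "description": description,
        "parameters": {
            "type": "object",
            "properties": properties,
            "required": list(properties),
        },
    }

def book_appointment(date: str, time: str) -> dict:
    """Book an appointment on the given date and time."""
    return {"date": date, "time": time}

schema = tool_schema(book_appointment)
print(schema["name"])        # book_appointment
print(schema["description"]) # Book an appointment on the given date and time.
```

This is why the docstrings in the agent are written for the LLM: whatever you put there is exactly what the model reads when deciding whether and how to call the tool.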
Understanding the Transfer Flow: Caller → Human Agent (warm handoff)
Why warm handoff beats cold transfer:
Cold Transfer vs Warm Handoff
Cold transfer (traditional IVR)
Caller is disconnected → reconnected to human → must re-explain their problem from scratch. "I already told the robot my name and appointment details!"
Warm handoff (LiveKit Room) — recommended
The human joins the existing conversation. The AI can brief the human: "This caller needs to reschedule their Thursday appointment." No information lost. Caller feels respected.
Beginner Breakdown — Receptionist Agent:
| Python Concept | What It Means |
|---|---|
class ReceptionistAgent(Agent) | Inherits from LiveKit's Agent base class. You get session management, LLM integration, and tool execution for free. |
super().__init__(instructions=...) | Passes the system prompt to the base class. LiveKit sends this as the system message in every LLM call. |
async def on_enter(self) | Lifecycle hook — called when this agent becomes active in the session. Perfect for the initial greeting. |
@function_tool() | Decorator that registers a method as a callable tool for the LLM. The LLM sees the function name, docstring, and parameter types. |
RunContext | Passed to every tool call. Contains the current session, room, and agent state. Useful for accessing conversation context. |
self.session.say("...") | Speaks text immediately to the caller via TTS. Unlike generate_reply(), this doesn't involve the LLM — it's a direct TTS utterance. |
self.session.generate_reply(instructions=...) | Asks the LLM to generate a response with additional instructions. The LLM considers conversation history + these instructions. |
Step 7: Agent Server Entry Point
This is where LiveKit, SIP, and your agent come together. The AgentServer listens for dispatched sessions and creates the voice pipeline for each call.
import logging
from dotenv import load_dotenv
from livekit import agents, rtc
from livekit.agents import AgentServer, AgentSession, JobContext, JobProcess, room_io
from livekit.plugins import noise_cancellation, silero
from livekit.plugins.turn_detector.multilingual import MultilingualModel
from agents.receptionist import ReceptionistAgent
load_dotenv(".env.local")
logger = logging.getLogger(__name__)
logging.basicConfig(
level=logging.INFO,
format="%(asctime)s %(name)s %(levelname)s %(message)s",
)
server = AgentServer()
def prewarm(proc: JobProcess):
"""Pre-load heavy models once per worker process.
VAD (Voice Activity Detection) loads a neural network from disk.
Loading it once here and reusing across sessions saves ~2s per call.
"""
proc.userdata["vad"] = silero.VAD.load()
server.setup_fnc = prewarm
@server.rtc_session(agent_name="receptionist")
async def handle_call(ctx: JobContext):
"""Handle one inbound phone call.
LiveKit dispatches this function for each incoming SIP call
that matches the dispatch rule with agentName="receptionist".
"""
# Build the voice pipeline
session = AgentSession(
stt="deepgram/nova-3:multi",
llm="openai/gpt-4.1-mini",
tts="cartesia/sonic-3:9626c31c-bec5-4cca-baa8-f8ba9e84c8bc",
vad=ctx.proc.userdata["vad"],
turn_detection=MultilingualModel(),
)
# Create the receptionist agent with job context (needed for SIP transfers)
agent = ReceptionistAgent(job_context=ctx)
# Start the session with telephony-optimized audio
await session.start(
room=ctx.room,
agent=agent,
room_options=room_io.RoomOptions(
audio_input=room_io.AudioInputOptions(
noise_cancellation=_get_noise_cancellation,
),
),
)
# Connect to the room (makes the agent a participant)
await ctx.connect()
logger.info("Receptionist agent started in room %s", ctx.room.name)
def _get_noise_cancellation(params):
"""Select noise cancellation mode based on caller type.
SIP callers (phone calls) get telephony-optimized cancellation
that handles PSTN background noise and echo. Browser callers
get standard background voice cancellation.
"""
if params.participant.kind == rtc.ParticipantKind.PARTICIPANT_KIND_SIP:
return noise_cancellation.BVCTelephony()
return noise_cancellation.BVC()
if __name__ == "__main__":
agents.cli.run_app(server)
★ Insight ─────────────────────────────────────
1. Prewarm pattern: Loading Silero VAD takes ~2 seconds (it's a PyTorch model). The prewarm function runs once per worker process, not per call. 100 calls share the same loaded VAD model. This is why ctx.proc.userdata["vad"] works — it's stored at the process level.
2. String-based STT/LLM/TTS: "deepgram/nova-3:multi" is LiveKit's provider string format: provider/model:variant. This is a recent API simplification — previously you had to instantiate plugin classes manually. The string format lets LiveKit handle provider initialization and configuration.
3. One session per call, one agent per session: Each SIP call gets its own Room, its own AgentSession, and its own ReceptionistAgent instance. No shared state between calls — this is how LiveKit achieves horizontal scaling.
─────────────────────────────────────────────────
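The provider string format is simple enough to parse yourself. A sketch of the idea (the exact parsing LiveKit performs internally may differ):

```python
def parse_provider_string(s: str) -> dict:
    """Split "provider/model:variant" into its parts; the variant is optional."""
    provider, _, rest = s.partition("/")
    model, _, variant = rest.partition(":")
    return {"provider": provider, "model": model, "variant": variant or None}

print(parse_provider_string("deepgram/nova-3:multi"))
# {'provider': 'deepgram', 'model': 'nova-3', 'variant': 'multi'}
print(parse_provider_string("openai/gpt-4.1-mini"))
# {'provider': 'openai', 'model': 'gpt-4.1-mini', 'variant': None}
```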
Beginner Breakdown — Agent Server:
What Happens When a Call Arrives: SIP Call Arrives → LiveKit Dispatch → AgentSession Created → Agent Active
| Python Concept | What It Means |
|---|---|
AgentServer() | LiveKit's application container. Listens for dispatched jobs from the LiveKit server. Similar to FastAPI's app = FastAPI(). |
@server.rtc_session(agent_name="receptionist") | Registers this function to handle sessions dispatched to agent name "receptionist". The dispatch rule in Step 0 routes SIP calls here. |
JobContext | Contains: ctx.room (the LiveKit Room), ctx.proc (the worker process with shared userdata), ctx.api (LiveKit server API for SIP operations). |
AgentSession(stt=..., llm=..., tts=...) | The voice pipeline. LiveKit connects these in sequence: audio → STT → LLM → TTS → audio. All streaming, all real-time. |
session.start(room=ctx.room, agent=agent) | Connects the pipeline to the room and activates the agent. From this point, the agent can hear the caller and speak. |
ctx.connect() | Makes the agent a visible participant in the room. Required for audio to flow. |
agents.cli.run_app(server) | Starts the agent worker process. Connects to the LiveKit server and waits for dispatched jobs. |
How the voice pipeline processes each turn:
Single Turn in AgentSession
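Conceptually, each turn flows audio → STT → LLM → TTS → audio. A toy, fully mocked sketch of that loop (the real components are streaming and asynchronous; all names here are illustrative stand-ins, not LiveKit APIs):

```python
def fake_stt(audio: bytes) -> str:
    """Stand-in for speech-to-text; the real STT streams partial transcripts."""
    return "what time do you open"

def fake_llm(transcript: str, history: list) -> str:
    """Stand-in for the LLM; the real one may also emit tool calls."""
    history.append({"role": "user", "content": transcript})
    reply = "We open at 9 AM, Monday through Friday."
    history.append({"role": "assistant", "content": reply})
    return reply

def fake_tts(text: str) -> bytes:
    """Stand-in for text-to-speech; returns audio bytes for playback."""
    return text.encode("utf-8")

def run_turn(audio_in: bytes, history: list) -> bytes:
    transcript = fake_stt(audio_in)        # 1. caller audio -> text
    reply = fake_llm(transcript, history)  # 2. text -> response (+ tool calls)
    return fake_tts(reply)                 # 3. response -> audio for the caller

history = []
audio_out = run_turn(b"...caller audio...", history)
print(len(history))  # 2 (one user turn, one assistant turn)
```

AgentSession runs this loop for you, with interruption handling and turn detection layered on top; your only job is the business logic inside the tools.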
Step 8: Dashboard API
A simple FastAPI server for viewing call records and managing the system. In production, this would power an admin dashboard.
import logging
from datetime import datetime
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from services.appointments import get_available_slots
from services.call_logger import get_recent_calls
logger = logging.getLogger(__name__)
app = FastAPI(title="Receptionist Dashboard", version="1.0.0")
app.add_middleware(
CORSMiddleware,
allow_origins=["*"],
allow_methods=["*"],
allow_headers=["*"],
)
@app.get("/api/calls")
async def list_calls(limit: int = 20):
"""List recent call records."""
return {"calls": get_recent_calls(limit)}
@app.get("/api/availability/{date}")
async def check_date(date: str):
"""Check available appointment slots for a date."""
slots = get_available_slots(date)
return {"date": date, "slots": slots, "total": len(slots)}
@app.get("/api/stats")
async def stats():
"""Basic call statistics."""
calls = get_recent_calls(100)
total = len(calls)
transferred = sum(1 for c in calls if c.get("outcome") == "transferred")
completed = sum(1 for c in calls if c.get("outcome") == "completed")
return {
"total_calls": total,
"completed": completed,
"transferred": transferred,
"transfer_rate": transferred / total if total > 0 else 0,
"generated_at": datetime.now().isoformat(),
}
@app.get("/health")
async def health():
return {"status": "ok"}
Step 9: Tests
import pytest
from services.knowledge_base import KnowledgeBase
class TestKnowledgeBase:
def setup_method(self):
self.kb = KnowledgeBase()
self.kb.load_faqs()
def test_finds_hours_question(self):
result = self.kb.search("What time do you open?")
assert result["found"] is True
assert result["confidence"] > 0.6
assert "9 AM" in result["answer"] or "Monday" in result["answer"]
def test_finds_insurance_question(self):
result = self.kb.search("Do you take Blue Cross?")
assert result["found"] is True
assert "Blue Cross" in result["answer"]
def test_returns_not_found_for_irrelevant_query(self):
result = self.kb.search("What is the capital of France?")
# Should not match any FAQ with high confidence
assert result["found"] is False or result["confidence"] < 0.6
def test_finds_location(self):
result = self.kb.search("Where is your office?")
assert result["found"] is True
assert "Oak Avenue" in result["answer"] or "San Francisco" in result["answer"]
def test_finds_cancellation_policy(self):
result = self.kb.search("What if I need to cancel?")
assert result["found"] is True
assert "24 hours" in result["answer"]
import os
import pytest
# Use a test database
os.environ.setdefault("DB_PATH", ":memory:")
from services.appointments import (
get_available_slots,
book_appointment,
cancel_appointment,
)
class TestAppointmentSlots:
def test_all_slots_available_on_empty_day(self):
slots = get_available_slots("2099-01-15")
assert len(slots) > 0
assert "09:00" in slots
assert "16:30" in slots
def test_slot_format(self):
slots = get_available_slots("2099-01-15")
for slot in slots:
hour, minute = slot.split(":")
assert 0 <= int(hour) <= 23
assert int(minute) in (0, 30)
class TestBookAppointment:
def test_book_and_confirm(self):
apt = book_appointment(
patient_name="John Doe",
phone="+14155551111",
date="2099-02-01",
time="10:00",
reason="Annual checkup",
)
assert apt.id is not None
assert apt.patient_name == "John Doe"
assert apt.date == "2099-02-01"
assert apt.time == "10:00"
def test_double_booking_raises(self):
book_appointment(
patient_name="Jane Smith",
phone="+14155552222",
date="2099-03-01",
time="11:00",
reason="Follow-up",
)
with pytest.raises(ValueError, match="not available"):
book_appointment(
patient_name="Bob Wilson",
phone="+14155553333",
date="2099-03-01",
time="11:00",
reason="Consultation",
)
def test_slot_removed_after_booking(self):
book_appointment(
patient_name="Alice Brown",
phone="+14155554444",
date="2099-04-01",
time="14:00",
reason="Lab results",
)
slots = get_available_slots("2099-04-01")
assert "14:00" not in slots
class TestCancelAppointment:
def test_cancel_existing(self):
apt = book_appointment(
patient_name="To Cancel",
phone="+14155555555",
date="2099-05-01",
time="09:00",
reason="Test",
)
assert cancel_appointment(apt.id) is True
def test_cancel_nonexistent(self):
assert cancel_appointment("nonexistent-id") is False
import pytest
from services.knowledge_base import KnowledgeBase
class TestKnowledgeBaseEdgeCases:
def setup_method(self):
self.kb = KnowledgeBase()
self.kb.load_faqs()
def test_empty_query_returns_low_confidence(self):
result = self.kb.search("")
assert result["confidence"] < 0.6 or not result["found"]
def test_returns_dict_format(self):
result = self.kb.search("office hours")
assert "answer" in result
assert "confidence" in result
assert "found" in result
assert isinstance(result["confidence"], float)
def test_confidence_between_0_and_1(self):
result = self.kb.search("Do you accept insurance?")
assert 0.0 <= result["confidence"] <= 1.0
def test_multiple_searches_consistent(self):
"""Same query should return the same result."""
r1 = self.kb.search("What are your hours?")
r2 = self.kb.search("What are your hours?")
assert r1["answer"] == r2["answer"]
assert abs(r1["confidence"] - r2["confidence"]) < 0.01
Step 10: Docker Deployment
FROM python:3.11-slim AS builder
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
FROM python:3.11-slim
WORKDIR /app
COPY --from=builder /usr/local/lib/python3.11/site-packages /usr/local/lib/python3.11/site-packages
COPY --from=builder /usr/local/bin /usr/local/bin
COPY agents/ ./agents/
COPY services/ ./services/
COPY data/ ./data/
COPY server.py config.py models.py dashboard.py ./
# Create data directory for SQLite databases
RUN mkdir -p /app/data
EXPOSE 8000
CMD ["python", "server.py", "start"]
services:
# LiveKit Server (SFU)
livekit-server:
image: livekit/livekit-server:latest
ports:
- "7880:7880" # WebSocket + HTTP
- "7881:7881" # WebRTC (TCP)
- "50000-50100:50000-50100/udp" # WebRTC (UDP)
environment:
- LIVEKIT_KEYS=devkey:secret
command: --dev --bind 0.0.0.0
# AI Receptionist Agent
receptionist-agent:
build: .
env_file: .env.local
depends_on:
- livekit-server
volumes:
- agent_data:/app/data
restart: unless-stopped
# Dashboard API
dashboard:
build: .
command: uvicorn dashboard:app --host 0.0.0.0 --port 8000
ports:
- "8000:8000"
volumes:
- agent_data:/app/data
restart: unless-stopped
volumes:
agent_data:
Beginner Breakdown — Docker Compose Services:
Docker Compose Services: livekit-server (port 7880) · receptionist-agent · dashboard (port 8000)
| Docker Concept | What It Means |
|---|---|
--dev | LiveKit dev mode — auto-generates API keys, enables test features. Never use in production. |
50000-50100/udp | WebRTC media ports. Audio travels over UDP for lowest latency. The range allows up to 100 concurrent connections. |
volumes: agent_data | Shared volume between agent and dashboard. Both read/write the same SQLite database files. |
depends_on: livekit-server | Agent starts after LiveKit server. Without this, the agent would fail to connect. |
Running the Application
Start everything with Docker Compose:
docker-compose up -d livekit-server
docker-compose up receptionist-agent dashboard
Or run locally for development:
# Terminal 1: Start LiveKit server
docker run --rm -p 7880:7880 -p 7881:7881 \
-p 50000-50100:50000-50100/udp \
-e LIVEKIT_KEYS=devkey:secret \
livekit/livekit-server --dev --bind 0.0.0.0
# Terminal 2: Start the agent
python server.py start
# Terminal 3: Start the dashboard
uvicorn dashboard:app --reload --port 8000
Test with a real phone call (requires Twilio SIP trunk):
- Configure Twilio SIP trunk to point to your LiveKit server
- Call your Twilio phone number
- The agent should greet you and respond to questions
Test without a phone (LiveKit Playground):
# Open LiveKit's web playground to test via browser
# Visit: https://agents-playground.livekit.io
# Enter your LiveKit server URL and API credentials
# Click "Connect" — your agent will activate via WebRTC instead of SIP
Check the dashboard:
# View recent calls
curl http://localhost:8000/api/calls
# Check appointment availability
curl http://localhost:8000/api/availability/2026-04-15
# View call statistics
curl http://localhost:8000/api/stats
Run the test suite:
pytest tests/ -v
Telephony Configuration Guide
Setting up SIP trunking is the most infrastructure-heavy part of this project. Here is a complete walkthrough:
SIP Trunk Setup Flow
| Provider | Phone Number Cost | Per-Minute Cost | Notes |
|---|---|---|---|
| Twilio | ~$1/month | ~$0.008/min inbound | Most popular, excellent docs |
| Telnyx | ~$1/month | ~$0.005/min inbound | Lower cost, good quality |
| Vonage | ~$1/month | ~$0.007/min inbound | Global coverage |
Total cost per conversation minute (all inclusive):
| Component | Cost/min |
|---|---|
| SIP trunk (Twilio) | ~$0.008 |
| LiveKit Cloud (or $0 self-hosted) | ~$0.003 |
| STT (Deepgram Nova-3) | ~$0.005 |
| LLM (GPT-4.1-mini) | ~$0.01-0.03 |
| TTS (Cartesia Sonic-3) | ~$0.01-0.02 |
| Total | ~$0.04-0.07 |
Compare this to a human receptionist at ~$25/hour (or ~$0.42/minute). The AI receptionist is approximately 6-10x cheaper per minute while handling unlimited concurrent calls.
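A quick sanity check of the table's arithmetic (the rates below are approximate midpoints from the table above, not provider quotes):

```python
def cost_per_minute(sip=0.008, livekit=0.003, stt=0.005, llm=0.02, tts=0.015):
    """Sum per-minute component costs; defaults are midpoints from the table."""
    return sip + livekit + stt + llm + tts

ai = cost_per_minute()   # 0.008 + 0.003 + 0.005 + 0.02 + 0.015 = $0.051/min
human = 25 / 60          # $25/hour receptionist is about $0.417/min
print(round(ai, 3), round(human / ai, 1))  # 0.051 8.2
```

At the midpoint rates the AI comes out roughly 8x cheaper per minute; with the table's low and high ends ($0.04 to $0.07) the ratio lands in the 6-10x range quoted above.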
Debugging Tips
| Problem | Likely Cause | Fix |
|---|---|---|
| Agent doesn't start | Agent name mismatch | Verify agent_name="receptionist" in server.py matches the dispatch rule |
| No audio from caller | SIP trunk misconfigured | Check Twilio trunk Termination URI points to correct LiveKit address |
| Agent speaks but caller can't hear | Firewall blocking UDP | Open ports 50000-50100/udp for WebRTC media |
| High latency (>3s per turn) | LLM or TTS slow | Check which stage is slow — STT, LLM, or TTS. Try GPT-4.1-mini instead of GPT-4.1 |
| FAQ tool returns wrong answers | Low similarity threshold | Adjust CONFIDENCE_THRESHOLD in .env — higher means stricter matching |
| Transfer fails | SIP trunk ID wrong | Verify SIP_TRUNK_ID matches your outbound trunk (not inbound) |
| Agent talks over caller | Turn detection too aggressive | Adjust VAD sensitivity or try different turn detection model |
| Echo on phone calls | Wrong noise cancellation | Ensure BVCTelephony() is used for SIP participants, not BVC() |
| Agent keeps greeting after transfer | Agent still active in room | After transfer, consider having the agent leave the room or go silent |
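When diagnosing the high-latency row above, the first step is finding which stage is slow. A generic, framework-agnostic timing sketch (in production, LiveKit's own metrics events are the better tool; the stage calls here are simulated with sleeps):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(stage: str, results: dict):
    """Record the wall-clock duration of a pipeline stage in `results`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[stage] = time.perf_counter() - start

results = {}
with timed("stt", results):
    time.sleep(0.01)   # stand-in for an STT call
with timed("llm", results):
    time.sleep(0.02)   # stand-in for an LLM call
with timed("tts", results):
    time.sleep(0.01)   # stand-in for a TTS call

slowest = max(results, key=results.get)
print(slowest)  # llm
```

Wrap each real stage call the same way and log `results` per turn; a pattern across many calls tells you whether to swap the LLM, the TTS voice, or the STT model.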
Extensions
| Difficulty | Extension | Description |
|---|---|---|
| Easy | Appointment reminders | Send SMS via Twilio 24 hours before appointments |
| Easy | Call recording | Enable LiveKit Egress to record calls for quality review |
| Medium | Multi-language receptionist | Detect caller language and switch STT/TTS locale dynamically |
| Medium | DTMF menu fallback | Handle "Press 1 for appointments" for callers who prefer traditional IVR |
| Medium | CRM integration | Look up caller by phone number in a CRM to personalize greetings |
| Hard | AI sales outreach agent | Outbound SIP calls to leads with CRM integration and objection handling |
| Hard | Multilingual support hotline | Language detection + dynamic provider switching + language-matched human agents |
| Hard | Voicemail with transcription | Detect voicemail, leave a message, transcribe incoming voicemails |
Future Case Studies — The "AI sales outreach agent" and "Multilingual support hotline" extensions above are planned as full case studies in the AI Agents category, demonstrating production deployments of LiveKit voice agents in sales and international customer support.
Key Concepts Recap
| Concept | What It Is | Why It Matters |
|---|---|---|
| SIP Trunking | Bridge between internet and phone network (PSTN) | Lets your AI agent answer real phone calls, not just browser connections |
| LiveKit Room | Virtual space where participants exchange audio/video | Each call gets its own room — isolated, scalable, multi-participant |
| AgentSession | The STT → LLM → TTS pipeline | Handles the entire voice AI loop automatically — you write business logic, not plumbing |
| @function_tool | Decorator that exposes a method as an LLM tool | The LLM can call your Python functions mid-conversation to look up data or take actions |
| Warm Handoff | Adding a human to the same Room as the caller | No disconnection, no re-explanation — the human joins the existing conversation |
| VAD (Silero) | Neural network that detects speech vs. silence | Knows when the caller has finished talking so the AI doesn't interrupt |
| Turn Detection | Model that predicts conversational turn boundaries | More sophisticated than VAD alone — handles pauses, thinking, and filler words |
| Noise Cancellation | AI-powered audio filtering | BVCTelephony() removes PSTN noise, echo, and background voices from phone calls |
| SFU | Selective Forwarding Unit (LiveKit Server) | Routes audio between participants without mixing — scales to hundreds of concurrent calls |
| WebRTC | Real-time audio/video protocol with built-in NAT traversal | Handles the hard networking problems (firewalls, echo, jitter) that raw WebSockets cannot |
Resources
- LiveKit Agents Documentation
- LiveKit Telephony Guide
- LiveKit Agents Python Examples
- Twilio SIP Trunking Docs
- Deepgram Nova-3 API
- Cartesia Sonic TTS
- LiveKit Agents Playground
- Production Voice Agent (DIY prerequisite)
Beginner Glossary
| Term | Plain English |
|---|---|
| SIP | The signaling protocol that sets up phone calls over the internet. Like HTTP is for web pages, SIP is for voice calls. |
| PSTN | The traditional phone network — the physical infrastructure that carries calls from cell towers and landlines. |
| SIP Trunk | A service (Twilio, Telnyx) that gives you a phone number and bridges between the internet and the PSTN. |
| WebRTC | Browser technology for real-time audio/video with built-in echo cancellation, encryption, and firewall traversal. |
| SFU | A server that forwards audio streams between participants without mixing them. LiveKit Server is an SFU. |
| NAT Traversal | The process of establishing direct connections between devices behind routers/firewalls. WebRTC uses ICE/STUN/TURN for this. |
| VAD | Voice Activity Detection — a neural network that detects when someone is speaking vs. silence. |
| DTMF | The beep tones when you press phone buttons. Each button makes two tones at once (dual-tone). |
| IVR | The automated phone menus: "Press 1 for billing." This project replaces IVRs with conversational AI. |
| E.164 | International phone number format: +14155551234. The + and country code ensure global routing. |
| PCM | Raw audio as a list of numbers representing sound wave samples. The simplest audio format. |
| Krisp | AI noise cancellation technology. Filters background noise in real-time before your agent processes audio. |
| Warm Handoff | Transferring a caller to a human without disconnecting — the human joins the existing conversation. |
| Cold Transfer | Traditional transfer where the caller is disconnected and reconnected to someone new, losing context. |
| Participant | Anyone in a LiveKit Room — the caller, the AI agent, or a human agent. Each publishes and subscribes to audio tracks. |
| Room | A LiveKit virtual space where participants communicate. One room per phone call in this project. |
| Dispatch Rule | A LiveKit configuration that decides which agent to assign when a new call arrives. |
| Prewarm | Loading heavy resources (like the VAD model) once at startup instead of per-call, saving ~2 seconds per call. |
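The E.164 format in the glossary is worth validating before you hand a number to a SIP trunk. A minimal sketch (a plus sign, a non-zero leading digit, and at most 15 digits total):

```python
import re

# E.164: "+", country code starting 1-9, up to 15 digits in total
E164_RE = re.compile(r"^\+[1-9]\d{1,14}$")

def is_e164(number: str) -> bool:
    """True if `number` looks like a valid E.164 phone number."""
    return bool(E164_RE.match(number))

print(is_e164("+14155551234"))  # True
print(is_e164("4155551234"))    # False: missing "+" and country code
```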