LLM Router & Orchestration Controller
Build a router that sends each request to the right model — or a human — to save cost and protect quality
TL;DR
A router is a cheap Large Language Model (LLM) that reads each incoming request and decides who should answer it: a fast model, an expert model, or a human. This one decision saves money on easy questions and protects quality on hard or risky ones. This project teaches the idea from scratch, with four orchestration patterns and a clinic example.
| Difficulty | Intermediate |
| Time | ~3 hours |
| Code Size | ~150 LOC |
| Prerequisites | Tool Calling Agent |
The Problem: Not Every Question Needs a Genius
Imagine you run a clinic's AI assistant. Patients ask very different things:
- "What time does the clinic open?" — easy, factual.
- "Can I take ibuprofen with my warfarin?" — complex and clinical.
- "I feel unsafe and want to hurt myself." — sensitive; a human must step in.
If you send every question to your biggest, most expensive model, you waste money on "what time do you open?". If you send everything to the cheapest model, you get shallow — or unsafe — answers to the questions that matter most.
A router fixes this. It sits in front of your models, reads each message, and decides where it should go.
What Is a Router?
A router is just an LLM with a small, focused job. It does not answer the question. It reads the message and returns a single decision — for example fast, expert, or human — and your code sends the request to the right place.
Think of the router like the front desk at a clinic. The receptionist does not treat patients. They quickly decide: "this is a quick admin question," "this needs a doctor," or "this is an emergency." Routing is that same triage, done by a cheap model.
Diagram 1 — The Big Picture
[Figure: LLM Router Overview. Request → Router (cheap LLM) → Destinations]
The Three Routes
The router's decision comes down to one route name. Your code maps that name to a destination.
| Route | Goes to | When to use | Clinic example |
|---|---|---|---|
| fast | A small, cheap model | Simple, factual, low-risk | "What are your opening hours?" |
| expert | A large, capable model | Needs multi-step or clinical reasoning | "Explain my lab results" |
| human | A real person | Sensitive, legal, or emotional | "I feel unsafe" |
Pick models by tier, not brand: a fast tier model is cheap and quick; an expert tier model costs more but reasons better. Most providers offer both. The router logic stays the same whichever models you choose.
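One way to keep the routing logic independent of any provider is to look models up by tier name. The sketch below is only an illustration: the `Tier` class, the `TIERS` table, and the model names `your-fast-model` and `your-expert-model` are hypothetical placeholders, and the per-question costs are the illustrative figures used in the cost section of this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tier:
    """One model tier: which model to call and roughly what a question costs."""
    model: str
    cost_per_question: float  # dollars, illustrative only


# Hypothetical placeholders: swap in whichever models your provider offers.
TIERS = {
    "fast": Tier(model="your-fast-model", cost_per_question=0.001),
    "expert": Tier(model="your-expert-model", cost_per_question=0.015),
}


def model_for(route: str) -> str:
    """Return the model name for a route; raises KeyError for non-model routes."""
    return TIERS[route].model
```

The human route is deliberately absent from the table, since it is a handoff rather than a model call.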
Try It: See the Router Decide
Type a patient question into the demo. It uses simple keyword rules, not a real model call, so it runs instantly in your browser and shows how a router would label the message. A real router uses an LLM call (shown below), which understands meaning far better than keyword matching.
Diagram 2 — Inside the Router
The router runs a few checks in order. The first check that says "yes" wins — safety always comes before cost.
[Figure: Router Decision Flow]
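The ordered checks can be sketched with the same kind of keyword rules the demo uses. This is a learning aid, not a real router: the keyword lists and the `keyword_route` function are hypothetical, chosen only to show that the safety check runs before the complexity check.

```python
# Hypothetical keyword lists, chosen only for illustration.
HUMAN_KEYWORDS = ("hurt myself", "unsafe", "emergency", "lawyer", "complaint")
EXPERT_KEYWORDS = ("ibuprofen", "warfarin", "symptom", "medication", "lab results")


def keyword_route(message: str) -> str:
    """Return "human", "expert", or "fast"; the first matching check wins."""
    text = message.lower()
    # Check 1: safety. It runs first so a risky message never reaches a cheap model.
    if any(keyword in text for keyword in HUMAN_KEYWORDS):
        return "human"
    # Check 2: complexity. Clinical topics go to the expert tier.
    if any(keyword in text for keyword in EXPERT_KEYWORDS):
        return "expert"
    # Default: nothing was flagged, so the cheap tier is good enough.
    return "fast"
```

Because the checks run in order, a message that mentions both medication and feeling unsafe is routed to a human, never to the expert model.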
Four Orchestration Patterns
Routing is the entry point to orchestration — coordinating several models or agents to answer one request. These four patterns are the most useful to know.
- Simple router: One router picks a model. That model answers. Easy to build and surprisingly powerful, so start here.
- Pipeline router: Router → model A → model B → answer. Each step passes its result forward. Good for "first extract, then explain."
- Parallel router: Send the same question to 2–3 models at once, then a judge picks the best answer. Higher quality, higher cost.
- Fallback router: Try the cheap model first. If its confidence is low, retry with the expert model. Budget-smart escalation.
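The pipeline, parallel, and fallback patterns can each be sketched in a few lines. The code below assumes a model is any function from a prompt to text; the `Model` alias, the `judge` callback, the `(answer, confidence)` return shape for the cheap model, and the `0.7` threshold are all hypothetical choices for illustration, not part of any particular library.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, Tuple

# Assumption: a model is any function that maps a prompt to an answer.
Model = Callable[[str], str]


def pipeline(message: str, steps: Sequence[Model]) -> str:
    """Pipeline router: each step receives the previous step's output."""
    result = message
    for step in steps:
        result = step(result)
    return result


def parallel(message: str, models: Sequence[Model],
             judge: Callable[[List[str]], str]) -> str:
    """Parallel router: ask every model at once, then let a judge pick one answer.

    Requires at least one model, because the thread pool needs a worker.
    """
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        # pool.map keeps the answers in the same order as `models`.
        answers = list(pool.map(lambda model: model(message), models))
    return judge(answers)


def fallback(message: str, cheap: Callable[[str], Tuple[str, float]],
             expert: Model, threshold: float = 0.7) -> str:
    """Fallback router: keep the cheap answer only if its confidence is high enough.

    Assumption: the cheap model reports a confidence between 0 and 1.
    """
    answer, confidence = cheap(message)
    if confidence >= threshold:
        return answer
    return expert(message)
```

In practice the judge in the parallel pattern is usually another model call, and how the cheap model reports confidence depends on your provider, so treat both as slots to fill in.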
This is the same idea behind Adaptive RAG (routing by query complexity) and the Production SLM System (routing between small and large models with fallback).
Why It Matters: The Cost
Routing's biggest payoff is cost. Most real traffic is easy questions, and easy questions do not need your expensive model. The cost savings calculator shows the difference as the share of simple questions changes.
Cost savings calculator
Assumes 1,000 questions/day · expert $0.015/q · fast $0.001/q. The figures below correspond to 70% of questions being simple.
| Setup | Cost per day |
|---|---|
| No router (all expert) | $15.00 |
| With router | $5.20 |
You save $9.80/day, about $3,577/year.
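The arithmetic behind these figures is easy to reproduce. The sketch below uses the stated assumptions (1,000 questions per day, $0.015 per expert question, $0.001 per fast question); the function names are hypothetical, and like the stated assumptions it leaves out the cost of the router call itself. A simple-question share of 70% is the value that reproduces the $5.20 figure.

```python
QUESTIONS_PER_DAY = 1_000
EXPERT_COST = 0.015  # dollars per question
FAST_COST = 0.001    # dollars per question


def daily_cost(simple_share: float) -> float:
    """Daily cost when `simple_share` of questions go to the fast tier."""
    if not 0.0 <= simple_share <= 1.0:
        raise ValueError("simple_share must be between 0 and 1")
    fast_questions = QUESTIONS_PER_DAY * simple_share
    expert_questions = QUESTIONS_PER_DAY - fast_questions
    return fast_questions * FAST_COST + expert_questions * EXPERT_COST


def daily_saving(simple_share: float) -> float:
    """Saving compared with sending every question to the expert tier."""
    return daily_cost(0.0) - daily_cost(simple_share)


def yearly_saving(simple_share: float) -> float:
    """Daily saving scaled to a 365-day year."""
    return daily_saving(simple_share) * 365
```

Calling `daily_cost(0.7)` gives about $5.20 and `yearly_saving(0.7)` about $3,577, which matches the numbers above.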
How to Build It
The router is a prompt that returns a small JSON decision. Here is the core in Python.
```python
import json

# `llm`, `fast_model`, `expert_model`, and `escalate_to_human` are placeholders
# for your own model client and human-handoff code.

# 1. The router prompt: its only job is to classify, not to answer
ROUTER_PROMPT = """You are a triage router for a clinic assistant.
Read the user's message and reply with JSON only:
{"route": "fast" | "expert" | "human", "reason": "<short reason>"}
Rules:
fast   -> simple facts: hours, address, booking, greetings
expert -> clinical or multi-step reasoning: symptoms, medications, results
human  -> anything sensitive or risky: self-harm, emergencies, legal, complaints
"""


# 2. Route with a cheap, fast-tier model (the router itself must be cheap)
def route(message: str) -> dict:
    response = llm.chat(
        model="fast-tier-model",
        system=ROUTER_PROMPT,
        user=message,
    )
    return json.loads(response)  # {"route": "expert", "reason": "..."}


# 3. Dispatch to the chosen destination
def handle(message: str) -> str:
    decision = route(message)
    if decision["route"] == "fast":
        return fast_model.answer(message)
    elif decision["route"] == "expert":
        return expert_model.answer(message)
    else:
        # Any other value, including "human", goes to a person
        return escalate_to_human(message)
```

Why return JSON instead of a single word? A bare word is easy to mis-parse. JSON with a reason field also gives you a built-in audit log: you can review why the router made each call and tune the rules over time.
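Models do not always return clean JSON, so it helps to validate the reply before dispatching. The sketch below is one possible approach, not part of the core code above: `parse_decision` and `VALID_ROUTES` are hypothetical names, and defaulting to the human route on any doubt is a design assumption that follows the "safety before cost" rule.

```python
import json

# A tuple rather than a set, so an unhashable "route" value cannot raise TypeError.
VALID_ROUTES = ("fast", "expert", "human")


def parse_decision(raw: str) -> dict:
    """Turn the router's raw reply into a decision, defaulting to "human" on any doubt."""
    try:
        decision = json.loads(raw)
    except json.JSONDecodeError:
        return {"route": "human", "reason": "router reply was not valid JSON"}
    # The reply must be a JSON object whose route is one of the three known values.
    if not isinstance(decision, dict) or decision.get("route") not in VALID_ROUTES:
        return {"route": "human", "reason": "router reply had no valid route"}
    return {"route": decision["route"], "reason": str(decision.get("reason", ""))}
```

With this in place, `route` could return `parse_decision(response)` instead of calling `json.loads` directly, so a malformed reply escalates to a person rather than crashing the request.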
A Note on Safety
The keyword demo above is fine for learning, but never ship keyword matching as your safety net. Real safety routing needs careful prompts, a tested set of examples, and human review for edge cases. Build it properly with the techniques in Agent Security & Safe Deployment.
What You'll Build
A small clinic question router:
- Write the router prompt and test it on ~20 sample questions.
- Build a dispatcher that calls the right destination for each route.
- Log every decision (route + reason) so you can review and improve accuracy over time.
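The testing and logging steps above can be sketched as a small harness. It assumes your router is a function that takes a question and returns a dict with `route` and `reason` keys; the names `evaluate` and `write_log`, the record fields, and the JSON Lines log format are hypothetical choices for illustration.

```python
import json
from typing import Callable, Dict, Iterable, List, Tuple


def evaluate(route_fn: Callable[[str], Dict[str, str]],
             samples: Iterable[Tuple[str, str]]) -> Tuple[float, List[dict]]:
    """Run the router over (question, expected_route) pairs.

    Returns the accuracy and one log record per question.
    """
    records = []
    correct = 0
    for question, expected in samples:
        decision = route_fn(question)
        hit = decision["route"] == expected
        correct += hit
        records.append({
            "question": question,
            "expected": expected,
            "route": decision["route"],
            "reason": decision.get("reason", ""),
            "correct": hit,
        })
    accuracy = correct / len(records) if records else 0.0
    return accuracy, records


def write_log(records: Iterable[dict], path: str) -> None:
    """Append one JSON object per line so the log is easy to review later."""
    with open(path, "a", encoding="utf-8") as log_file:
        for record in records:
            log_file.write(json.dumps(record) + "\n")
```

Reviewing the records where `correct` is false, together with the router's stated reason, is a quick way to see which rules in the prompt need tuning.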
Key Concepts Recap
| Concept | What it is | Why it matters |
|---|---|---|
| Router | A cheap LLM that classifies each request | One small decision saves cost and protects quality |
| Route tiers | fast / expert / human destinations | Match effort to the question's difficulty and risk |
| Safety first | Risk checks run before cost checks | Sensitive cases reach a human, not a cheap model |
| Orchestration patterns | Simple, pipeline, parallel, fallback | Different ways to combine models for one answer |
| JSON decisions | Router returns {route, reason} | Reliable parsing plus a built-in audit log |
Next Steps
- Tool Calling Agent and ReAct Agent: build the destinations your router dispatches to.
- Agent Security & Safe Deployment: make the human / safety route production-grade.
- Adaptive RAG: the same routing idea applied to retrieval.
- Production SLM System: routing and fallback between small and large models at scale.