Build a router that sends each request to the right model — or a human — to save cost and protect quality

LLM Router & Orchestration Controller

TL;DR

A router is a cheap Large Language Model (LLM) that reads each incoming request and decides who should answer it: a fast model, an expert model, or a human. This one decision saves money on easy questions and protects quality on hard or risky ones. This project teaches the idea from scratch, with four orchestration patterns and a clinic example.


Difficulty	Intermediate
Time	~3 hours
Code Size	~150 LOC
Prerequisites	Tool Calling Agent

The Problem: Not Every Question Needs a Genius

Imagine you run a clinic's AI assistant. Patients ask very different things:

"What time does the clinic open?" — easy, factual.
"Can I take ibuprofen with my warfarin?" — complex and clinical.
"I feel unsafe and want to hurt myself." — sensitive; a human must step in.

If you send every question to your biggest, most expensive model, you waste money on "what time do you open?". If you send everything to the cheapest model, you get shallow — or unsafe — answers to the questions that matter most.

A router fixes this. It sits in front of your models, reads each message, and decides where it should go.

What Is a Router?

A router is just an LLM with a small, focused job. It does not answer the question. It reads the message and returns a single decision — for example fast, expert, or human — and your code sends the request to the right place.

Think of the router like the front desk at a clinic. The receptionist does not treat patients. They quickly decide: "this is a quick admin question," "this needs a doctor," or "this is an emergency." Routing is that same triage, done by a cheap model.

Diagram 1 — The Big Picture

LLM Router Overview

Request: patient question
Router (cheap LLM): reads the message, returns one decision
Destinations: fast model (easy), expert model (complex), human agent (risky)

The Three Routes

The router returns one route label. Your code maps that label to a destination.

Route	Goes to	When to use	Clinic example
`fast`	A small, cheap model	Simple, factual, low-risk	"What are your opening hours?"
`expert`	A large, capable model	Needs multi-step or clinical reasoning	"Explain my lab results"
`human`	A real person	Sensitive, legal, or emotional	"I feel unsafe"

Pick models by tier, not brand: a fast tier model is cheap and quick; an expert tier model costs more but reasons better. Most providers offer both. The router logic stays the same whichever models you choose.
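One way to keep the router logic independent of any particular model is a small lookup table from route label to tier. The sketch below is illustrative only: the `Tier` class, the `TIERS` table, the model names, and `pick_destination` are hypothetical placeholders, not part of any provider's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str             # route label returned by the router
    model: Optional[str]  # hypothetical model identifier; None means a person answers


# Hypothetical tier table: swap the model names for whatever your provider offers.
TIERS = {
    "fast": Tier("fast", "small-cheap-model"),
    "expert": Tier("expert", "large-capable-model"),
    "human": Tier("human", None),
}


def pick_destination(route: str) -> Tier:
    """Map a route label to a tier; unknown labels fall back to a human."""
    return TIERS.get(route, TIERS["human"])
```

Changing models then means editing the table, while the routing and dispatch code stays untouched.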

Try It: See the Router Decide

This demo uses simple keyword rules so it runs instantly in your browser. A real router uses an LLM call (shown below), which understands meaning far better than keyword matching.

Diagram 2 — Inside the Router

The router runs a few checks in order. The first check that says "yes" wins — safety always comes before cost.

Router Decision Flow

Message received: read the patient's question
Safety check: harmful or urgent? → send to a human
Complexity check: needs reasoning? → send to the expert model
Default: otherwise → send to the cheap, quick fast model
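The ordered checks can be sketched with plain keyword rules, in the spirit of the simulated demo. This is a teaching sketch only: the keyword lists and the `simulate_route` function are made up for illustration, and keyword matching is far too crude for real safety routing.

```python
# Hypothetical keyword lists, chosen only to illustrate the order of the checks.
SAFETY_KEYWORDS = ("hurt myself", "unsafe", "emergency")
COMPLEXITY_KEYWORDS = ("medication", "warfarin", "ibuprofen", "symptom", "lab results")


def simulate_route(message: str) -> str:
    """Return 'human', 'expert', or 'fast'. The first check that matches wins."""
    text = message.lower()
    # 1. Safety check runs first, so a risky message never reaches a model.
    if any(keyword in text for keyword in SAFETY_KEYWORDS):
        return "human"
    # 2. Complexity check: clinical or multi-step questions go to the expert tier.
    if any(keyword in text for keyword in COMPLEXITY_KEYWORDS):
        return "expert"
    # 3. Default: everything else goes to the cheap, quick model.
    return "fast"


if __name__ == "__main__":
    for question in (
        "What time does the clinic open?",
        "Can I take ibuprofen with my warfarin?",
        "I feel unsafe and want to hurt myself.",
    ):
        print(simulate_route(question), "<-", question)
```

Because the safety check comes first, a message that matches both lists (a medication question from someone who also says they feel unsafe) is still routed to a human.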

Four Orchestration Patterns

Routing is the entry point to orchestration — coordinating several models or agents to answer one request. These four patterns are the most useful to know.

Orchestration Patterns

Simple router

One router picks a model. That model answers. Easy to build and surprisingly powerful — start here.

Pipeline router

Router → model A → model B → answer. Each step passes its result forward. Good for "first extract, then explain."

Parallel router

Send the same question to 2–3 models at once, then a judge picks the best answer. Higher quality, higher cost.

Fallback router

Try the cheap model first. If its confidence is low (for example, a low self-reported score or a failed output check), retry with the expert model. Budget-smart escalation.

This is the same idea behind Adaptive RAG (routing by query complexity) and the Production SLM System (routing between small and large models with fallback).
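The pipeline, parallel, and fallback patterns can each be sketched in a few lines. Everything here is a stand-in: a `Model` is assumed to be any function from text to text, the `judge` is a caller-supplied function, and the `confidence` callback and its `threshold` are hypothetical hooks, since how confidence is measured depends on your setup.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence

Model = Callable[[str], str]  # assumption: a model is any function text -> text


def pipeline(message: str, steps: Sequence[Model]) -> str:
    """Pipeline: each step receives the previous step's output."""
    result = message
    for step in steps:
        result = step(result)
    return result


def parallel(message: str, models: Sequence[Model],
             judge: Callable[[str, List[str]], str]) -> str:
    """Parallel: ask every model at once, then let a judge pick one answer."""
    # Requires at least one model; answers come back in the same order as `models`.
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        answers = list(pool.map(lambda model: model(message), models))
    return judge(message, answers)


def fallback(message: str, cheap: Model, expert: Model,
             confidence: Callable[[str, str], float],
             threshold: float = 0.7) -> str:
    """Fallback: keep the cheap answer unless its confidence is below the threshold."""
    answer = cheap(message)
    if confidence(message, answer) >= threshold:
        return answer
    return expert(message)
```

The simple router needs no helper of its own: it is one routing call followed by one model call. Note that the parallel pattern pays for every model plus the judge on each request, which is why it is the most expensive of the four.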

Why It Matters: The Cost

Routing's biggest payoff is cost. Most real traffic is easy questions, and easy questions do not need your expensive model. The example below shows the difference when 70% of questions are simple.

Cost savings calculator

Assumes 1,000 questions/day · simple questions: 70% · expert $0.015/q · fast $0.001/q

Scenario	Cost per day
No router (all expert)	$15.00
With router	$5.20

You save $9.80/day, about $3,577/year. These figures count only the answering model; the router call itself adds a small per-question cost on top.
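The arithmetic behind those numbers is short enough to check in code. The sketch below reproduces them under the stated assumptions; the `daily_cost` function and its optional `router_cost` parameter are additions for illustration, with `router_cost` defaulting to zero because the quoted figures count only the answering model.

```python
def daily_cost(questions: int, simple_share: float,
               fast_cost: float, expert_cost: float,
               router_cost: float = 0.0) -> float:
    """Daily cost when `simple_share` of questions go to the fast model."""
    simple = questions * simple_share
    hard = questions - simple
    # Every question optionally pays for one router call as well.
    return simple * fast_cost + hard * expert_cost + questions * router_cost


no_router = daily_cost(1000, 0.0, fast_cost=0.001, expert_cost=0.015)
with_router = daily_cost(1000, 0.70, fast_cost=0.001, expert_cost=0.015)
saving = no_router - with_router

print(f"${no_router:.2f} ${with_router:.2f} ${saving:.2f}/day ${saving * 365:,.0f}/year")
# prints: $15.00 $5.20 $9.80/day $3,577/year
```

Passing `router_cost=0.001`, for example, would add $1.00/day at this volume, so the saving shrinks slightly but remains large.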

How to Build It

The router is a prompt that returns a small JSON decision. Here is the core in Python.

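import json

# NOTE: llm, fast_model, expert_model and escalate_to_human are placeholders
# for your own client and handlers; they are not defined in this snippet.

# 1. The router prompt: its only job is to classify, not to answer
ROUTER_PROMPT = """You are a triage router for a clinic assistant.
Read the user's message and reply with JSON only:
{"route": "fast" | "expert" | "human", "reason": "<short reason>"}

Rules:
  fast   -> simple facts: hours, address, booking, greetings
  expert -> clinical or multi-step reasoning: symptoms, medications, results
  human  -> anything sensitive or risky: self-harm, emergencies, legal, complaints
"""

# 2. Route with a cheap, fast-tier model (the router itself must be cheap)
def route(message: str) -> dict:
    response = llm.chat(  # assumed to return the model's reply as a string
        model="fast-tier-model",
        system=ROUTER_PROMPT,
        user=message,
    )
    try:
        decision = json.loads(response)  # {"route": "expert", "reason": "..."}
    except (json.JSONDecodeError, TypeError):
        decision = None
    if not isinstance(decision, dict):
        # Unparseable output: fail safe and hand off to a human
        return {"route": "human", "reason": "router output was not a JSON object"}
    return decision

# 3. Dispatch to the chosen destination
def handle(message: str) -> str:
    decision = route(message)
    if decision.get("route") == "fast":
        return fast_model.answer(message)
    elif decision.get("route") == "expert":
        return expert_model.answer(message)
    else:
        # "human", a missing route, or any unexpected label goes to a person
        return escalate_to_human(message)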

Why return JSON instead of a single word? A bare word is easy to mis-parse. JSON with a reason field also gives you a built-in audit log — you can review why the router made each call and tune the rules over time.
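A minimal sketch of that audit log, assuming decisions are appended as JSON Lines (one JSON object per line). The `log_decision` and `read_log` helpers, the field names, and the file name are illustrative choices, not something the project prescribes.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def log_decision(path: Path, message: str, decision: dict) -> dict:
    """Append one routing decision to a JSON Lines file and return the record."""
    record = {
        "time": datetime.now(timezone.utc).isoformat(),
        "message": message,
        "route": decision.get("route"),
        "reason": decision.get("reason"),
    }
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record


def read_log(path: Path) -> list:
    """Load every logged decision back for review."""
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


if __name__ == "__main__":
    log_path = Path("router_decisions.jsonl")  # hypothetical file name
    log_decision(log_path, "What time do you open?",
                 {"route": "fast", "reason": "opening hours"})
    print(read_log(log_path)[-1]["route"])
```

An append-only file keeps the sketch simple; the same record shape would work with a database table if you need to query decisions by route or date.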

A Note on Safety

The keyword demo above is fine for learning, but never ship keyword matching as your safety net. Real safety routing needs careful prompts, a tested set of examples, and human review for edge cases. Build it properly with the techniques in Agent Security & Safe Deployment.

What You'll Build

A small clinic question router:

Write the router prompt and test it on ~20 sample questions.
Build a dispatcher that calls the right destination for each route.
Log every decision (route + reason) so you can review and improve accuracy over time.
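The testing step can be sketched as a small evaluation loop: run the router over labelled questions, then report overall accuracy and any sensitive questions that did not reach a human. This is a sketch under assumptions: `evaluate`, the four-item `SAMPLES` list, and the `stub_router` standing in for the real LLM call are all hypothetical.

```python
from typing import Callable, Iterable, Tuple


def evaluate(router: Callable[[str], dict],
             samples: Iterable[Tuple[str, str]]) -> dict:
    """Compare router decisions with expected labels."""
    total = 0
    correct = 0
    missed_human = []  # sensitive questions that were not sent to a person
    for message, expected in samples:
        got = router(message).get("route")
        total += 1
        if got == expected:
            correct += 1
        if expected == "human" and got != "human":
            missed_human.append(message)
    accuracy = correct / total if total else 0.0
    return {"total": total, "accuracy": accuracy, "missed_human": missed_human}


def stub_router(message: str) -> dict:
    """Stand-in for the real LLM router, deliberately too simple."""
    text = message.lower()
    if "hurt myself" in text:
        return {"route": "human", "reason": "self-harm"}
    if "warfarin" in text:
        return {"route": "expert", "reason": "medication"}
    return {"route": "fast", "reason": "default"}


SAMPLES = [
    ("What time does the clinic open?", "fast"),
    ("Can I take ibuprofen with my warfarin?", "expert"),
    ("I feel unsafe and want to hurt myself.", "human"),
    ("I want to make a complaint.", "human"),  # the stub gets this one wrong
]

if __name__ == "__main__":
    print(evaluate(stub_router, SAMPLES))
```

Tracking `missed_human` separately from accuracy reflects the safety-first rule: a router can score well overall and still send a sensitive message to a cheap model, and those are the mistakes worth reviewing first.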

Key Concepts Recap

Concept	What it is	Why it matters
Router	A cheap LLM that classifies each request	One small decision saves cost and protects quality
Route tiers	`fast` / `expert` / `human` destinations	Match effort to the question's difficulty and risk
Safety first	Risk checks run before cost checks	Sensitive cases reach a human, not a cheap model
Orchestration patterns	Simple, pipeline, parallel, fallback	Different ways to combine models for one answer
JSON decisions	Router returns `{route, reason}`	Reliable parsing plus a built-in audit log

Next Steps

Tool Calling Agent and ReAct Agent — build the destinations your router dispatches to.
Agent Security & Safe Deployment — make the human / safety route production-grade.
Adaptive RAG — the same routing idea applied to retrieval.
Production SLM System — routing and fallback between small and large models at scale.

