Build a multi-agent clinical orchestrator — planner, parallel specialists, a reflection loop, and a human gate — and watch it run step by step

Live Clinical Orchestration Simulator

TL;DR

Routing picks one model. Orchestration coordinates many agents to solve one problem together. This project builds a clinical orchestrator that plans the work, runs specialists in parallel on a shared blackboard, uses a critic to catch what no single agent saw, and pauses for a human before acting — and you can watch a real run replay step by step in your browser.


Difficulty	Advanced
Time	~4–5 days
Code Size	~700 LOC
Prerequisites	Multi-Agent System, LLM Router

Why Routing Isn't Enough

In the LLM Router project, one model read a request and chose where to send it. That is perfect for triage, but it falls apart when a single question needs several kinds of expertise at once.

Think about reviewing an elderly patient's medications. You need a drug-interaction check, a kidney-dosing check, and a falls-risk check — and crucially, you need someone to look at all of those findings together, because the most dangerous problems hide between specialties, not inside one. No single model call does this well.

That is what orchestration is for: a coordinator runs many agents, shares their work, reviews it, and only then acts.

The Case

Our orchestrator reviews one realistic patient:

78-year-old woman. eGFR 38 (chronic kidney disease, stage 3). Recurrent falls. Taking 9 medications: ramipril, furosemide, ibuprofen (as needed), metformin, atorvastatin, amlodipine, omeprazole, zopiclone, amitriptyline.

The goal: produce a safe deprescribing plan — but never act on it without a clinician's sign-off.

Five Ideas That Make It Orchestration

These five concepts are the whole project. Each maps to a concrete LangGraph feature you will build below.

Concept	What it does	How we build it
Dynamic planner	Reads the case and decides which specialists are needed	A planner node
Parallel fan-out	Runs specialists at the same time, not one by one	`Send()` to many nodes
Blackboard	One shared memory all agents read from and write to	The graph `State` with a reducer
Reflection (critic)	Reviews all findings together; sends work back if something's wrong	A conditional edge that loops
Human-in-the-loop	Pauses for a person before anything is acted on	`interrupt()` + checkpointer

Clinical Orchestrator Architecture

Coordinate: Planner (decides the specialists), then parallel specialists on the blackboard (Pharmacology, Renal, Geriatrics)
Reflect: Critic (reviews all findings, loops back if needed)
Gate: Human approval (pauses before acting)
Output: Final deprescribing plan + monitoring

Watch It Run

Press Play (or Step through it). Click any agent to read its full reasoning. Watch what happens at step 5 — that is the moment orchestration earns its keep.

Orchestration replay: Polypharmacy review (interactive, 7 steps)

Step 1, Intake: the Planner reads the case: 78 y/o, 9 medications, CKD stage 3, recurrent falls.

Agent status legend: Queued, Running, Done, Sent back.

This replay is deterministic — it steps through a real run that the LangGraph code below produced. Building the simulator into the page (instead of calling a live model) keeps it free, offline, and identical every time, which is what you want for teaching.

The "Triple Whammy" — Why the Critic Matters

At step 5 the Critic caught something none of the specialists flagged on their own: the patient is on an ACE inhibitor (ramipril) + a diuretic (furosemide) + an NSAID (ibuprofen) at the same time. This combination is so well known for causing acute kidney injury (AKI) that it has a name — the "triple whammy" — and it is especially dangerous in older patients with reduced kidney function, exactly like ours.

The reason no single agent caught it is instructive:

Pharmacology looked at drug–drug interactions, but in isolation the NSAID looks like a minor issue.
Renal confirmed the ACE inhibitor was fine on its own.
Geriatrics was focused on falls.

The risk only appears when you look at all three findings together — which is precisely the Critic's job. This is the core lesson: a reflection step that reviews the combined output catches cross-cutting errors that no individual specialist can.
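The cross-cutting check itself can be sketched as a plain rule, with no LLM involved. This is a minimal illustration rather than the project's actual critic: `DRUG_CLASS` and `triple_whammy` are hypothetical names, and the lookup table covers only the three drugs whose classes are stated above.

```python
# Hypothetical drug-class lookup. It covers only the three drugs whose
# classes are named in the text; a real system would use a maintained formulary.
DRUG_CLASS = {
    "ramipril": "ace_inhibitor",
    "furosemide": "diuretic",
    "ibuprofen": "nsaid",
}

TRIPLE_WHAMMY = {"ace_inhibitor", "diuretic", "nsaid"}


def triple_whammy(medications: list[str]) -> list[str]:
    """Return the drugs forming the combination, or [] if it is incomplete."""
    hits = {}
    for med in medications:
        drug_class = DRUG_CLASS.get(med.lower())
        if drug_class in TRIPLE_WHAMMY:
            hits.setdefault(drug_class, med)
    # Flag only when all three classes are present at the same time.
    return sorted(hits.values()) if set(hits) == TRIPLE_WHAMMY else []


meds = ["ramipril", "furosemide", "ibuprofen", "metformin", "atorvastatin",
        "amlodipine", "omeprazole", "zopiclone", "amitriptyline"]
print(triple_whammy(meds))                       # all three classes present
print(triple_whammy(["ramipril", "furosemide"]))  # NSAID missing, no flag
```

The point of the sketch is the shape of the check: each drug is unremarkable when inspected alone, and the flag only fires on the set as a whole.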

Build It: Step by Step

We use LangGraph because it gives us the four features we need out of the box: shared state, parallel fan-out, conditional loops, and human-in-the-loop pauses.

1. The Blackboard (shared state)

Every agent reads from and writes to one shared state object. The `findings` list uses a reducer (`operator.add`) so parallel specialists can append to it without overwriting each other.

import operator
from typing import Annotated, Literal
from typing_extensions import TypedDict

class CaseState(TypedDict):
    case: str                                       # the patient case
    specialists: list[str]                          # planner fills this in
    findings: Annotated[list[dict], operator.add]   # blackboard — append-only
    critic_feedback: str
    revision_round: int
    plan: str
    approved: bool

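The `Annotated[..., operator.add]` part is the key detail. Without it, three specialists writing `findings` in the same step would conflict: a plain key accepts only one value per step, so LangGraph rejects the concurrent writes with an invalid-update error instead of merging them. With the reducer, their results are appended into one list.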
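The difference a reducer makes can be imitated in plain Python. This is only an analogy: `merge_updates` is a hypothetical helper, not LangGraph's internal merge logic, and the finding texts are made up for the example.

```python
import operator

# Three partial updates, shaped like what three parallel specialists return.
updates = [
    {"findings": [{"role": "pharmacology", "finding": "NSAID noted"}]},
    {"findings": [{"role": "renal", "finding": "eGFR 38"}]},
    {"findings": [{"role": "geriatrics", "finding": "recurrent falls"}]},
]


def merge_updates(updates, reducer=None):
    """Fold updates into one value: replace by default, combine with a reducer."""
    merged = []
    for update in updates:
        new = update["findings"]
        merged = reducer(merged, new) if reducer else new
    return merged


overwritten = merge_updates(updates)             # naive fold: last writer wins
appended = merge_updates(updates, operator.add)  # lists are concatenated

print(len(overwritten), len(appended))
```

With no reducer the naive fold keeps a single finding; with `operator.add` all three survive, in the order the updates were folded.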

2. The Planner

The planner reads the case and decides which specialists are needed. Here it is dynamic — the LLM could choose different specialists for a different patient.

def planner(state: CaseState) -> dict:
    specialists = decide_specialists(state["case"])  # LLM → ["pharmacology", "renal", "geriatrics"]
    # Initialise the revision counter on the first pass and carry it forward afterwards; the critic uses it to cap the loop.
    return {"specialists": specialists, "revision_round": state.get("revision_round", 0)}
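`decide_specialists` is left undefined above because it wraps a model call. For offline runs and tests, a keyword-based stand-in can take its place. Everything in this sketch (the `RULES` table, the keywords, the shortened `CASE` text) is an assumption for illustration, not the project's planner prompt.

```python
# Hypothetical keyword rules standing in for the LLM planner.
RULES = {
    "pharmacology": ("medication", "medications"),
    "renal": ("egfr", "ckd", "kidney"),
    "geriatrics": ("falls", "frail"),
}


def decide_specialists(case: str) -> list[str]:
    """Pick every specialist whose keywords appear in the case text."""
    text = case.lower()
    return [role for role, keywords in RULES.items()
            if any(word in text for word in keywords)]


CASE = ("78-year-old woman. eGFR 38 (chronic kidney disease, stage 3). "
        "Recurrent falls. Taking 9 medications.")
print(decide_specialists(CASE))
print(decide_specialists("45-year-old man taking 2 medications."))
```

A stub like this keeps the rest of the graph testable without a model, while still showing that the specialist list depends on the case.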

3. Parallel Fan-Out

This is the heart of orchestration. Instead of a normal edge, we use a conditional edge that returns a list of `Send()` objects, one per specialist. LangGraph runs them in parallel.

from langgraph.types import Send

def assign_specialists(state: CaseState):
    # One parallel task per specialist (a "map" step).
    return [
        Send("specialist", {
            "case": state["case"],
            "role": role,
            "feedback": state.get("critic_feedback", ""),
        })
        for role in state["specialists"]
    ]

A single, reusable specialist node handles whichever role it is given:

def specialist(payload: dict) -> dict:
    role = payload["role"]
    finding = run_specialist(role, payload["case"], payload["feedback"])
    # Appends to the blackboard thanks to the operator.add reducer.
    return {"findings": [{"role": role, "finding": finding}]}

4. The Critic (reflection loop)

The critic reviews all findings together. If it spots a problem (like the triple whammy), it writes feedback and asks for a revision; otherwise it drafts the plan.

def critic(state: CaseState) -> dict:
    issues = review_for_conflicts(state["findings"])  # looks across ALL specialists
    if issues and state["revision_round"] < 1:        # allow one revision round
        return {
            "critic_feedback": issues,
            "revision_round": state["revision_round"] + 1,
        }
    return {"critic_feedback": "", "plan": draft_plan(state["findings"])}

A routing function decides whether to loop back or move on:

def after_critic(state: CaseState) -> Literal["planner", "human_gate"]:
    return "planner" if state["critic_feedback"] else "human_gate"

One detail to know: because `findings` is append-only (the `operator.add` reducer), a revision round adds new findings on top of the old ones rather than replacing them. That keeps the demo simple, but in production you would tag each finding with its round number (or clear `findings` before re-running) so the critic always compares like with like.
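Tagging by round can be sketched in a few lines. `tag_round` and `latest_findings` are hypothetical helpers, the finding texts are invented for the example, and the sketch assumes the round number is passed to each specialist alongside its role.

```python
def tag_round(role: str, finding: str, revision_round: int) -> dict:
    """Build a blackboard entry that records which round produced it."""
    return {"role": role, "finding": finding, "round": revision_round}


def latest_findings(findings: list[dict]) -> list[dict]:
    """Keep only the entries from the most recent round on the blackboard."""
    if not findings:
        return []
    newest = max(f["round"] for f in findings)
    return [f for f in findings if f["round"] == newest]


blackboard = [
    tag_round("pharmacology", "NSAID looks like a minor issue", 0),
    tag_round("renal", "ACE inhibitor fine on its own", 0),
    tag_round("pharmacology", "stop ibuprofen", 1),
    tag_round("renal", "AKI risk from the combination", 1),
]
print([f["finding"] for f in latest_findings(blackboard)])
```

The older entries stay on the blackboard for the audit trail; the critic simply filters to the newest round before reviewing.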

The `revision_round` counter is what stops this loop from running forever: the critic asks for at most one revision (`revision_round < 1`), then drafts the plan.
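The termination argument can be checked without LangGraph by walking the same branching logic in a plain loop. This is a simulation under assumptions: the review result is passed in as a string instead of coming from `review_for_conflicts`, and the specialist step is skipped entirely.

```python
def critic(state: dict, issues: str) -> dict:
    """Same branching as the critic node, with the review result passed in."""
    if issues and state["revision_round"] < 1:
        return {"critic_feedback": issues,
                "revision_round": state["revision_round"] + 1}
    return {"critic_feedback": "", "plan": "draft plan"}


def after_critic(state: dict) -> str:
    return "planner" if state["critic_feedback"] else "human_gate"


state = {"revision_round": 0, "critic_feedback": ""}
visited = []
node = "planner"
while node != "human_gate":
    visited.append(node)
    # Worst case: the stubbed review complains on every single pass.
    state.update(critic(state, issues="triple whammy not addressed"))
    node = after_critic(state)
visited.append(node)

print(visited, state["revision_round"])
```

Even when the review never stops complaining, the planner is visited twice and the run then reaches the human gate.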

5. The Human Gate

Before anything is acted on, execution pauses for a clinician. `interrupt()` pauses the graph, the checkpointer saves the state, and control returns to your application.

from langgraph.types import interrupt, Command

def human_gate(state: CaseState) -> dict:
    decision = interrupt({
        "question": "Approve this deprescribing plan?",
        "plan": state["plan"],
    })
    # Only an explicit True approves: a truthiness test would also accept "no".
    return {"approved": decision is True}
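The value that comes back from `interrupt()` is whatever your application passes in on resume. If that may be a string from a form rather than a boolean, Python truthiness is a trap, because any non-empty string (including "no") counts as true. A small normaliser avoids this; `is_approved` and the `APPROVE_WORDS` vocabulary are hypothetical and would need to match your own UI.

```python
APPROVE_WORDS = {"approve", "approved", "yes"}  # hypothetical UI vocabulary


def is_approved(decision: object) -> bool:
    """Treat only an explicit approval as approval; anything else is a refusal."""
    if isinstance(decision, bool):
        return decision
    if isinstance(decision, str):
        return decision.strip().lower() in APPROVE_WORDS
    return False


for value in (True, False, "approve", "no", "", None, 1):
    print(repr(value), is_approved(value))
```

Defaulting to refusal for anything unrecognised keeps the gate fail-safe: an unexpected payload blocks the plan instead of approving it.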

6. Wire the Graph Together

from langgraph.graph import StateGraph, START, END
from langgraph.checkpoint.memory import InMemorySaver

builder = StateGraph(CaseState)
builder.add_node("planner", planner)
builder.add_node("specialist", specialist)
builder.add_node("critic", critic)
builder.add_node("human_gate", human_gate)

builder.add_edge(START, "planner")
builder.add_conditional_edges("planner", assign_specialists, ["specialist"])
builder.add_edge("specialist", "critic")
builder.add_conditional_edges("critic", after_critic, ["planner", "human_gate"])
builder.add_edge("human_gate", END)

# A checkpointer is required for interrupt()/resume to work.
graph = builder.compile(checkpointer=InMemorySaver())

7. Run It (with the human pause)

config = {"configurable": {"thread_id": "patient-001"}}

# Runs planner → specialists (parallel) → critic (loops once) → pauses at human_gate.
result = graph.invoke({"case": CASE}, config)
print(result["__interrupt__"])   # the approval request + drafted plan

# Clinician approves → resume from exactly where it paused.
final = graph.invoke(Command(resume=True), config)
print(final["plan"], final["approved"])

Observability: The Replay Is Your Audit Log

The simulator above is really a trace of a run. In production you want the same thing: a record of every agent's input, output, and the order things happened. That trace is what lets you debug, explain a decision to a clinician, and improve the system over time. Tools like LangSmith capture it automatically; at minimum, log every node's (role, input, output) to your own store.
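A do-it-yourself trace can be as small as a decorator around each node. The sketch below is one possible shape, not a LangSmith integration: `TRACE`, `traced`, and `renal_specialist` are hypothetical names, and the in-memory list stands in for a real store.

```python
import functools
import json
import time

TRACE: list[dict] = []  # stand-in for a real store (file, database, tracing service)


def traced(role: str):
    """Wrap a node function so each call appends (role, input, output) to TRACE."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(payload: dict) -> dict:
            started = time.time()
            output = fn(payload)
            TRACE.append({
                "step": len(TRACE) + 1,
                "role": role,
                "input": payload,
                "output": output,
                "seconds": round(time.time() - started, 3),
            })
            return output
        return wrapper
    return decorator


@traced("renal")
def renal_specialist(payload: dict) -> dict:
    # Stub in place of a model call.
    return {"findings": [{"role": "renal", "finding": "eGFR 38: review dosing"}]}


renal_specialist({"case": "eGFR 38", "feedback": ""})
print(json.dumps(TRACE[0], indent=2))
```

Note that with parallel specialists the `step` numbers taken from `len(TRACE)` can collide, so a production version would use a lock or let the store assign sequence numbers.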

When to Use Orchestration — and When Not To

Orchestration is powerful, but it is not free. Every extra agent adds tokens, latency, and a new way for things to go wrong. The research is blunt: most multi-agent failures come from poor design, not weak models. So the rule is: start simple, and add structure only when a simpler design clearly can't cope.

Pick the simplest design that works

One LLM call

The task fits in a single prompt with no branching. Always try this first. Cheapest, fastest, easiest to debug.

A workflow (fixed path)

The steps are known in advance — e.g. classify, then route, then answer. Use a router or a fixed chain. Predictable and still easy to trace.

Multi-agent orchestration

The task genuinely needs several kinds of expertise at once, the sub-tasks aren't known up front, or one context window can't hold it all — like our polypharmacy review. Worth the overhead only here.

A quick gut-check before reaching for orchestration: would a human need more than one specialist for this? If not, a single agent is almost always the better engineering choice.

Failure Modes & How This Design Avoids Them

Multi-agent systems fail in predictable ways. A 2025 study — the Multi-Agent System Failure Taxonomy (MAST) — catalogued 14 failure modes across three buckets: system design, inter-agent misalignment, and task verification. The good news: the patterns in this project are direct countermeasures to the most common ones.

Common failure (MAST category)	What goes wrong	How this design prevents it
No / weak verification (task verification)	Nobody checks the combined result, so cross-cutting errors slip through	The Critic reviews all findings together — that's how it caught the triple whammy
Step repetition / infinite loops (system design)	Agents redo work and never finish	The `revision_round` cap allows at most one revision, then forces a decision
Role / task ambiguity (inter-agent misalignment)	An agent drifts outside its job	Each specialist gets a typed `Send` payload with one clear role
Information loss between agents (inter-agent misalignment)	Context gets dropped on handoff	The blackboard (shared state) keeps every finding in one place
Acting without confirmation (system design)	The system takes an unsafe action on its own	The human-in-the-loop gate pauses before anything is acted on

Production hardening (not shown in the demo). A real run must survive a specialist that times out, errors, or returns garbage. Add a per-agent timeout, a small number of retries, and a partial-failure policy (proceed with the specialists that succeeded, and tell the Critic which ones are missing) so one slow agent can't stall the whole review.
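The hardening policy above can be sketched with the standard library alone. This is a plain-Python illustration that does not hook into LangGraph: `run_with_retries`, `run_all`, and `fake_specialist` are hypothetical names, the deadline is shared across all specialists rather than per agent, and `cancel_futures` needs Python 3.9 or later.

```python
import concurrent.futures as cf
import time


def run_with_retries(fn, role, retries=2):
    """Call fn(role), retrying on any exception; re-raise the last error."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return fn(role)
        except Exception as exc:
            last_error = exc
    raise last_error


def run_all(fn, roles, timeout_s=5.0, retries=2):
    """Run specialists in parallel; return (findings, missing_roles)."""
    pool = cf.ThreadPoolExecutor(max_workers=len(roles))
    futures = {role: pool.submit(run_with_retries, fn, role, retries)
               for role in roles}
    done, _ = cf.wait(futures.values(), timeout=timeout_s)  # one shared deadline
    findings, missing = [], []
    for role, future in futures.items():
        if future in done and future.exception() is None:
            findings.append({"role": role, "finding": future.result()})
        else:
            missing.append(role)  # timed out or failed every attempt
    pool.shutdown(wait=False, cancel_futures=True)  # don't block on stragglers
    return findings, missing


calls = {"renal": 0}


def fake_specialist(role):
    if role == "renal":           # fails once, then succeeds on the retry
        calls["renal"] += 1
        if calls["renal"] == 1:
            raise RuntimeError("transient error")
    if role == "geriatrics":      # too slow for the deadline
        time.sleep(0.5)
    return f"{role} finding"


findings, missing = run_all(fake_specialist,
                            ["pharmacology", "renal", "geriatrics"],
                            timeout_s=0.2)
print([f["role"] for f in findings], missing)
```

The `missing` list is what you would hand to the Critic so it knows which perspectives are absent. Threads cannot be killed, so a specialist that overruns keeps running in the background; its result is simply ignored.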

A Note on Safety

This orchestrator proposes a plan — it never changes a medication on its own. The human gate is not optional decoration; it is the safety boundary. For real clinical use you also need rule-based red-flag checks, input sanitization, and audit logging. Build those properly with the techniques in Agent Security & Safe Deployment.

Key Concepts Recap

Concept	What it is	Why it matters
Orchestration	Coordinating many agents to solve one problem	Handles tasks that need several kinds of expertise at once
Blackboard	One shared state with a reducer	Parallel agents merge their work instead of overwriting it
Parallel fan-out	`Send()` to many nodes	Specialists run at the same time — faster than one by one
Reflection / critic	Reviews combined findings, can loop back	Catches cross-cutting errors no single agent sees
Human-in-the-loop	`interrupt()` pause for approval	A person signs off before anything is acted on

Next Steps

Orchestration Patterns — the big-picture map of all orchestration patterns and when to use each.
Tumor Board Simulator — multi-specialist debate (adversarial), a different orchestration shape.
Clinical Decision Support — a full case study with safety guardrails.
Adverse Event Surveillance — supervisor-worker orchestration over live patient data.
Agent Security & Safe Deployment — make the human gate and red-flag checks production-grade.
