Differential Diagnosis Debate
Build an adversarial diagnostic agent where two physician personas argue for competing diagnoses, with an attending physician judging and synthesizing a final differential
Differential Diagnosis Debate
Build an adversarial multi-agent system where two AI physician personas debate competing diagnoses for the same patient presentation, forcing comprehensive consideration of diagnostic alternatives.
| Difficulty | Advanced |
| Time | 3-4 days |
| Code | ~900 lines |
| Pattern | Adversarial Debate (Medical Domain) |
TL;DR
Apply the adversarial debate pattern to medical diagnosis using two physician agents arguing for different diagnoses, structured clinical evidence (supporting/contradicting findings), attending physician as judge, and synthesized differential ranking diagnoses by probability. Reduces diagnostic errors by forcing consideration of alternative diagnoses.
Medical Disclaimer
This system is for educational purposes only. It is designed as a clinical decision support tool to assist licensed healthcare professionals in considering diagnostic alternatives. It does not provide medical diagnoses and must never replace clinical judgment. All outputs must be reviewed by qualified clinicians before any patient care decisions.
The Problem: Diagnostic Errors
┌─────────────────────────────────────────────────────────────────────┐
│ WHY DIAGNOSTIC ERRORS HAPPEN │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Diagnostic errors cause 40,000-80,000 deaths/year in the US │
│ │
│ Common cognitive biases: │
│ ┌─────────────────────────────────────────────────────────────┐ │
│ │ Anchoring │ Lock onto first diagnosis too early │ │
│ │ Availability │ Recall recent cases, miss rare ones │ │
│ │ Confirmation │ Seek evidence supporting initial dx │ │
│ │ Premature closure │ Stop considering alternatives too soon │ │
│ └─────────────────────────────────────────────────────────────┘ │
│ │
│ The adversarial debate FORCES consideration of alternatives │
│ by having a second physician argue for a DIFFERENT diagnosis │
│ │
│ Single physician: Adversarial debate: │
│ "Looks like MI" ───────► Physician A: "This is MI" │
│ (anchors, stops) Physician B: "This is PE" │
│ Attending: "Consider both, order..." │
│ │
└─────────────────────────────────────────────────────────────────────┘What You'll Build
A diagnostic debate agent that:
- Parses clinical presentations - Extracts symptoms, vitals, history into structured format
- Generates Diagnosis A - First physician argues for their leading diagnosis
- Generates Diagnosis B - Second physician argues for an alternative diagnosis
- Runs critique rounds - Each physician attacks the other's diagnostic reasoning
- Runs rebuttal rounds - Each defends their diagnosis, conceding valid points
- Attending judges - Evaluates diagnostic reasoning quality
- Synthesizes differential - Produces ranked differential with workup recommendations
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ DIFFERENTIAL DIAGNOSIS DEBATE ARCHITECTURE │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Input: "55yo M, acute chest pain, diaphoresis, normal ECG" │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ CLINICAL PARSER │ Structure: symptoms, vitals, PMH, meds │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ ╔══════════════════════════════════════════════════════════════╗ │
│ ║ ROUND 1: Diagnostic Hypotheses ║ │
│ ║ ┌─────────────────┐ ┌─────────────────┐ ║ │
│ ║ │ PHYSICIAN A │ (parallel) │ PHYSICIAN B │ ║ │
│ ║ │ "This is ACS" │ │ "This is PE" │ ║ │
│ ║ │ + evidence │ │ + evidence │ ║ │
│ ║ └─────────────────┘ └─────────────────┘ ║ │
│ ╚══════════════════════════════════════════════════════════════╝ │
│ │ │
│ ▼ │
│ ╔══════════════════════════════════════════════════════════════╗ │
│ ║ ROUND 2: Diagnostic Critiques ║ │
│ ║ ┌─────────────────┐ ┌─────────────────┐ ║ │
│ ║ │ PHYSICIAN A │ (parallel) │ PHYSICIAN B │ ║ │
│ ║ │ "PE unlikely │ │ "ACS less │ ║ │
│ ║ │ because..." │ │ likely bc..." │ ║ │
│ ║ └─────────────────┘ └─────────────────┘ ║ │
│ ╚══════════════════════════════════════════════════════════════╝ │
│ │ │
│ ▼ │
│ ╔══════════════════════════════════════════════════════════════╗ │
│ ║ ROUND 3: Rebuttals & Concessions ║ │
│ ║ ┌─────────────────┐ ┌─────────────────┐ ║ │
│ ║ │ PHYSICIAN A │ (parallel) │ PHYSICIAN B │ ║ │
│ ║ │ defends ACS │ │ defends PE │ ║ │
│ ║ │ concedes: "PE │ │ concedes: "ACS │ ║ │
│ ║ │ should be r/o" │ │ is possible" │ ║ │
│ ║ └─────────────────┘ └─────────────────┘ ║ │
│ ╚══════════════════════════════════════════════════════════════╝ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ ATTENDING │ Evaluates clinical reasoning quality │
│ └────────┬─────────┘ │
│ │ │
│ ▼ │
│ ┌──────────────────┐ │
│ │ SYNTHESIZER │ Final differential with workup plan │
│ └──────────────────┘ │
│ │
│ Output: Ranked DDx with "can't miss" diagnoses and workup plan │
│ │
└─────────────────────────────────────────────────────────────────────┘Project Structure
debate-diagnosis/
├── src/
│ ├── __init__.py
│ ├── config.py
│ ├── models/
│ │ ├── __init__.py
│ │ ├── clinical.py # Clinical data models
│ │ ├── arguments.py # Diagnostic argument models
│ │ ├── scoring.py # Attending scoring models
│ │ └── state.py # DebateState for LangGraph
│ ├── agents/
│ │ ├── __init__.py
│ │ ├── parser.py # Clinical presentation parser
│ │ ├── physician_a.py # First physician agent
│ │ ├── physician_b.py # Second physician agent
│ │ ├── attending.py # Attending physician (judge)
│ │ └── synthesizer.py # Differential synthesizer
│ ├── workflow/
│ │ ├── __init__.py
│ │ └── debate.py # LangGraph debate workflow
│ └── api/
│ ├── __init__.py
│ └── main.py # FastAPI endpoints
├── tests/
├── docker-compose.yml
└── requirements.txtTech Stack
| Technology | Purpose |
|---|---|
| LangGraph | Round-based diagnostic debate workflow |
| OpenAI GPT-4o | Physician personas with medical reasoning |
| Pydantic | Clinical data and diagnostic argument models |
| FastAPI | API for submitting cases and retrieving differentials |
Implementation
Configuration
# src/config.py
from pydantic_settings import BaseSettings
from typing import List
class Settings(BaseSettings):
# LLM Settings
openai_api_key: str
openai_model: str = "gpt-4o"
temperature_physicians: float = 0.4 # Some creativity, but clinical accuracy
temperature_attending: float = 0.2 # Consistent evaluation
# Debate Settings
num_supporting_findings: int = 4
num_contradicting_findings: int = 2
num_critique_points: int = 3
# Scoring Weights
weight_clinical_reasoning: float = 0.35
weight_evidence_quality: float = 0.35
weight_differential_breadth: float = 0.15
weight_safety_awareness: float = 0.15 # "Can't miss" diagnoses
# Safety Settings
always_include_cant_miss: bool = True # Always flag life-threatening DDx
class Config:
env_file = ".env"
settings = Settings()Why These Weights:
┌─────────────────────────────────────────────────────────────────────┐
│ SCORING WEIGHTS FOR CLINICAL DIAGNOSIS │
├─────────────────────────────────────────────────────────────────────┤
│ │
│ Clinical Reasoning (35%) │
│ └── Does the pathophysiology make sense? │
│ Is the mechanism of disease explained? │
│ │
│ Evidence Quality (35%) │
│ └── Are findings specific to this diagnosis? │
│ How well do symptoms match classic presentation? │
│ │
│ Differential Breadth (15%) │
│ └── Did they consider alternatives? │
│ Are they aware of diagnostic mimics? │
│ │
│ Safety Awareness (15%) │
│ └── Did they mention "can't miss" diagnoses? │
│ Are life-threatening conditions addressed? │
│ │
│ Safety gets 15% because MISSING a dangerous diagnosis │
│ is worse than an incomplete differential. │
│ │
└─────────────────────────────────────────────────────────────────────┘Clinical Models
# src/models/clinical.py
from pydantic import BaseModel, Field
from typing import List, Optional, Dict
from enum import Enum
class Severity(str, Enum):
MILD = "mild"
MODERATE = "moderate"
SEVERE = "severe"
CRITICAL = "critical"
class Acuity(str, Enum):
EMERGENT = "emergent" # Immediate threat to life
URGENT = "urgent" # Needs attention within hours
SEMI_URGENT = "semi_urgent" # Within days
ROUTINE = "routine" # Scheduled care
class Symptom(BaseModel):
"""A clinical symptom."""
name: str
duration: Optional[str] = None
severity: Severity = Severity.MODERATE
onset: Optional[str] = None # sudden, gradual
character: Optional[str] = None # sharp, dull, burning
location: Optional[str] = None
radiation: Optional[str] = None
aggravating_factors: List[str] = Field(default_factory=list)
alleviating_factors: List[str] = Field(default_factory=list)
class VitalSigns(BaseModel):
"""Patient vital signs."""
blood_pressure: Optional[str] = None # "120/80"
heart_rate: Optional[int] = None
respiratory_rate: Optional[int] = None
temperature: Optional[float] = None # Celsius
oxygen_saturation: Optional[int] = None # Percentage
class ClinicalPresentation(BaseModel):
"""Structured clinical presentation."""
# Demographics
age: int
sex: str # M, F
# Chief complaint
chief_complaint: str
# History of present illness
symptoms: List[Symptom]
symptom_timeline: Optional[str] = None
# Vital signs
vitals: Optional[VitalSigns] = None
# Background
past_medical_history: List[str] = Field(default_factory=list)
medications: List[str] = Field(default_factory=list)
allergies: List[str] = Field(default_factory=list)
family_history: List[str] = Field(default_factory=list)
social_history: Optional[str] = None
# Physical exam findings (if available)
exam_findings: List[str] = Field(default_factory=list)
# Initial assessment
acuity: Acuity = Acuity.URGENT
class DiagnosticTest(BaseModel):
"""A recommended diagnostic test."""
test_name: str
rationale: str
urgency: str = "routine" # stat, urgent, routine
what_it_rules_out: List[str] = Field(default_factory=list)
what_it_confirms: List[str] = Field(default_factory=list)Diagnostic Argument Models
# src/models/arguments.py
from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum
class PhysicianRole(str, Enum):
PHYSICIAN_A = "physician_a"
PHYSICIAN_B = "physician_b"
class DiagnosticHypothesis(BaseModel):
"""A diagnostic hypothesis with supporting evidence."""
diagnosis: str = Field(description="The proposed diagnosis")
icd10_code: Optional[str] = Field(None, description="ICD-10 code if known")
probability: str = Field(description="high, moderate, low")
# Evidence
supporting_findings: List[str] = Field(
description="Clinical findings that support this diagnosis"
)
contradicting_findings: List[str] = Field(
default_factory=list,
description="Findings that argue against this diagnosis"
)
# Reasoning
pathophysiology: str = Field(
description="How the disease mechanism explains the presentation"
)
classic_presentation_match: str = Field(
description="How well this matches the textbook presentation"
)
# Risk assessment
is_cant_miss: bool = Field(
default=False,
description="Is this a life-threatening diagnosis that must be ruled out?"
)
miss_consequences: Optional[str] = Field(
None,
description="What happens if this diagnosis is missed?"
)
confidence: float = Field(ge=0.0, le=1.0)
class DiagnosticArgument(BaseModel):
"""Complete diagnostic argument from one physician."""
role: PhysicianRole
primary_hypothesis: DiagnosticHypothesis
alternative_considerations: List[str] = Field(
default_factory=list,
description="Other diagnoses briefly considered"
)
recommended_workup: List[str] = Field(
description="Tests to confirm or rule out"
)
round_number: int = 1
class DiagnosticCritique(BaseModel):
"""Critique of opponent's diagnostic reasoning."""
target_diagnosis: str
weakness: str = Field(description="Flaw in the diagnostic reasoning")
missed_findings: List[str] = Field(
default_factory=list,
description="Clinical findings the opponent ignored"
)
alternative_explanation: str = Field(
description="How these findings better fit YOUR diagnosis"
)
severity: str = Field(description="minor, moderate, major")
class CritiqueSet(BaseModel):
"""Collection of critiques from one physician."""
role: PhysicianRole
critiques: List[DiagnosticCritique]
round_number: int = 2
class DiagnosticRebuttal(BaseModel):
"""Defense against a diagnostic critique."""
critique_addressed: str
defense: str
concession: Optional[str] = Field(
None,
description="What valid points the critic raised"
)
updated_confidence: float = Field(ge=0.0, le=1.0)
additional_workup_suggested: List[str] = Field(
default_factory=list,
description="Additional tests to address the critique"
)
class RebuttalSet(BaseModel):
"""Collection of rebuttals from one physician."""
role: PhysicianRole
rebuttals: List[DiagnosticRebuttal]
round_number: int = 3Scoring and Synthesis Models
# src/models/scoring.py
from pydantic import BaseModel, Field
from typing import List, Optional
from .arguments import PhysicianRole
class DiagnosticScore(BaseModel):
"""Score for a physician's diagnostic reasoning."""
role: PhysicianRole
clinical_reasoning_score: float = Field(ge=0.0, le=10.0)
evidence_quality_score: float = Field(ge=0.0, le=10.0)
differential_breadth_score: float = Field(ge=0.0, le=10.0)
safety_awareness_score: float = Field(ge=0.0, le=10.0)
total_score: float = Field(ge=0.0, le=100.0)
strengths: List[str]
weaknesses: List[str]
class AttendingVerdict(BaseModel):
"""Attending physician's evaluation."""
physician_a_score: DiagnosticScore
physician_b_score: DiagnosticScore
stronger_case: PhysicianRole
margin: str # narrow, moderate, decisive
teaching_points: List[str] = Field(
description="Key learning points from this case"
)
missed_diagnoses: List[str] = Field(
default_factory=list,
description="Important diagnoses neither physician considered"
)
class DifferentialItem(BaseModel):
"""A single item in the final differential."""
rank: int
diagnosis: str
icd10_code: Optional[str] = None
probability: str
key_supporting_findings: List[str]
key_contradicting_findings: List[str]
is_cant_miss: bool = False
recommended_tests: List[str]
class SynthesizedDifferential(BaseModel):
"""Final synthesized differential diagnosis."""
patient_summary: str
differential: List[DifferentialItem]
cant_miss_diagnoses: List[str] = Field(
description="Life-threatening diagnoses that must be ruled out"
)
immediate_workup: List[str] = Field(
description="Tests to order now"
)
admission_recommendation: str = Field(
description="Admit, observe, discharge with follow-up"
)
clinical_pearls: List[str] = Field(
description="Key teaching points from this case"
)
uncertainty_acknowledgment: str = Field(
description="What remains uncertain and how to address it"
)Debate State
# src/models/state.py
from typing import TypedDict, List, Optional, Annotated
from enum import Enum
import operator
from .clinical import ClinicalPresentation
from .arguments import DiagnosticArgument, CritiqueSet, RebuttalSet
from .scoring import AttendingVerdict, SynthesizedDifferential
class DiagnosticPhase(str, Enum):
PARSING = "parsing"
HYPOTHESIS = "hypothesis"
CRITIQUE = "critique"
REBUTTAL = "rebuttal"
ATTENDING_REVIEW = "attending_review"
SYNTHESIS = "synthesis"
COMPLETE = "complete"
class DiagnosticDebateState(TypedDict):
"""State for the diagnostic debate workflow."""
# Input
raw_presentation: str
# Parsed clinical data
clinical: Optional[ClinicalPresentation]
# Workflow tracking
phase: DiagnosticPhase
current_round: int
# Diagnostic arguments
physician_a_argument: Optional[DiagnosticArgument]
physician_b_argument: Optional[DiagnosticArgument]
# Critiques
physician_a_critiques: Optional[CritiqueSet]
physician_b_critiques: Optional[CritiqueSet]
# Rebuttals
physician_a_rebuttals: Optional[RebuttalSet]
physician_b_rebuttals: Optional[RebuttalSet]
# Evaluation
attending_verdict: Optional[AttendingVerdict]
# Output
final_differential: Optional[SynthesizedDifferential]
# Audit
reasoning_trace: Annotated[List[str], operator.add]Clinical Parser Agent
# src/agents/parser.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation, Symptom, VitalSigns, Acuity
from ..config import settings
class ClinicalParser:
"""Parses raw clinical presentations into structured format."""
def __init__(self):
self.llm = ChatOpenAI(
model=settings.openai_model,
api_key=settings.openai_api_key,
temperature=0.1
).with_structured_output(ClinicalPresentation)
self.prompt = ChatPromptTemplate.from_messages([
("system", """You are an experienced emergency physician extracting
structured clinical data from a case presentation.
Extract:
1. Demographics (age, sex)
2. Chief complaint
3. Individual symptoms with OPQRST characteristics:
- Onset (sudden vs gradual)
- Provocation/Palliation (what makes it better/worse)
- Quality (sharp, dull, burning, pressure)
- Region/Radiation
- Severity
- Timing/Duration
4. Vital signs if mentioned
5. Past medical history, medications, allergies
6. Physical exam findings if mentioned
7. Acuity assessment:
- Emergent: immediate life threat (chest pain + diaphoresis, stroke symptoms)
- Urgent: needs attention within hours
- Semi-urgent: can wait days
- Routine: scheduled care
If information is not provided, leave it as null/empty.
Make reasonable clinical inferences but note them."""),
("human", "{presentation}")
])
async def parse(self, presentation: str) -> ClinicalPresentation:
"""Parse a raw clinical presentation."""
chain = self.prompt | self.llm
result = await chain.ainvoke({"presentation": presentation})
return resultPhysician A Agent (Primary Diagnosis)
# src/agents/physician_a.py
from typing import List
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import (
DiagnosticArgument, DiagnosticHypothesis, PhysicianRole,
DiagnosticCritique, CritiqueSet, DiagnosticRebuttal, RebuttalSet
)
from ..config import settings
class PhysicianAAgent:
"""First physician - argues for primary/most likely diagnosis."""
def __init__(self):
self.llm = ChatOpenAI(
model=settings.openai_model,
api_key=settings.openai_api_key,
temperature=settings.temperature_physicians
)
self.role = PhysicianRole.PHYSICIAN_A
async def generate_hypothesis(
self,
clinical: ClinicalPresentation
) -> DiagnosticArgument:
"""Generate primary diagnostic hypothesis."""
llm = self.llm.with_structured_output(DiagnosticArgument)
prompt = ChatPromptTemplate.from_messages([
("system", """You are an experienced physician generating a diagnostic
hypothesis. You are PHYSICIAN A - argue for what you believe is the
MOST LIKELY diagnosis given this presentation.
Your role is to make the strongest case for your leading diagnosis.
Include:
1. The specific diagnosis with ICD-10 code if known
2. {num_supporting} supporting clinical findings with explanations
3. Acknowledge {num_contradicting} findings that don't fit perfectly
4. Pathophysiological reasoning - WHY does this disease cause these symptoms?
5. How well this matches the classic/textbook presentation
6. Whether this is a "can't miss" diagnosis (life-threatening if missed)
7. Recommended workup to confirm
Think like a clinician: What diagnosis would you bet on?
IMPORTANT: You will face critique from another physician arguing for a
DIFFERENT diagnosis. Make your strongest case."""),
("human", """Clinical Presentation:
Age: {age}yo {sex}
Chief Complaint: {chief_complaint}
Symptoms:
{symptoms}
Vitals: {vitals}
PMH: {pmh}
Medications: {meds}
Allergies: {allergies}
Exam Findings: {exam}
Acuity: {acuity}
Generate your primary diagnostic hypothesis.""")
])
symptoms_text = "\n".join([
f"- {s.name}: onset={s.onset}, duration={s.duration}, "
f"severity={s.severity.value}, character={s.character}"
for s in clinical.symptoms
])
vitals_text = "Not recorded"
if clinical.vitals:
vitals_parts = []
if clinical.vitals.blood_pressure:
vitals_parts.append(f"BP: {clinical.vitals.blood_pressure}")
if clinical.vitals.heart_rate:
vitals_parts.append(f"HR: {clinical.vitals.heart_rate}")
if clinical.vitals.respiratory_rate:
vitals_parts.append(f"RR: {clinical.vitals.respiratory_rate}")
if clinical.vitals.oxygen_saturation:
vitals_parts.append(f"SpO2: {clinical.vitals.oxygen_saturation}%")
if clinical.vitals.temperature:
vitals_parts.append(f"Temp: {clinical.vitals.temperature}°C")
vitals_text = ", ".join(vitals_parts) if vitals_parts else "Not recorded"
chain = prompt | llm
result = await chain.ainvoke({
"num_supporting": settings.num_supporting_findings,
"num_contradicting": settings.num_contradicting_findings,
"age": clinical.age,
"sex": clinical.sex,
"chief_complaint": clinical.chief_complaint,
"symptoms": symptoms_text or "See chief complaint",
"vitals": vitals_text,
"pmh": ", ".join(clinical.past_medical_history) or "None",
"meds": ", ".join(clinical.medications) or "None",
"allergies": ", ".join(clinical.allergies) or "NKDA",
"exam": ", ".join(clinical.exam_findings) or "Not documented",
"acuity": clinical.acuity.value
})
result.role = self.role
result.round_number = 1
return result
async def generate_critiques(
self,
clinical: ClinicalPresentation,
opponent_argument: DiagnosticArgument
) -> CritiqueSet:
"""Critique Physician B's diagnosis."""
llm = self.llm.with_structured_output(CritiqueSet)
prompt = ChatPromptTemplate.from_messages([
("system", """You are PHYSICIAN A critiquing PHYSICIAN B's diagnosis.
Physician B argued for: {opponent_dx}
Your diagnosis: {my_dx}
Generate {num_critiques} critiques attacking their diagnostic reasoning.
For each critique:
1. Identify a specific weakness in their reasoning
2. Point out clinical findings they ignored or misinterpreted
3. Explain how those findings better support YOUR diagnosis
4. Rate severity: minor, moderate, major
Focus on:
- Findings that DON'T fit their diagnosis
- Classic features of their diagnosis that are MISSING
- Risk factors that point to your diagnosis instead
- Pathophysiological inconsistencies"""),
("human", """Opponent's argument:
Diagnosis: {opponent_dx}
Supporting findings: {opponent_support}
Pathophysiology: {opponent_patho}
Your diagnosis: {my_dx}
Critique their diagnostic reasoning.""")
])
chain = prompt | llm
result = await chain.ainvoke({
"opponent_dx": opponent_argument.primary_hypothesis.diagnosis,
"opponent_support": ", ".join(opponent_argument.primary_hypothesis.supporting_findings),
"opponent_patho": opponent_argument.primary_hypothesis.pathophysiology,
"my_dx": "Your primary diagnosis", # Will be filled from state
"num_critiques": settings.num_critique_points
})
result.role = self.role
result.round_number = 2
return result
async def generate_rebuttals(
self,
clinical: ClinicalPresentation,
my_argument: DiagnosticArgument,
opponent_critiques: CritiqueSet
) -> RebuttalSet:
"""Defend against Physician B's critiques."""
llm = self.llm.with_structured_output(RebuttalSet)
prompt = ChatPromptTemplate.from_messages([
("system", """You are PHYSICIAN A defending your diagnosis against critiques.
Your diagnosis: {my_dx}
Physician B critiqued your reasoning:
{critiques}
Generate rebuttals for each critique.
For each rebuttal:
1. Defend your position where the critique is unfair
2. CONCEDE valid points - intellectual honesty is crucial in medicine
3. Update your confidence based on the critique
4. Suggest additional workup if the critique raised valid uncertainty
IMPORTANT: Good clinicians acknowledge uncertainty. If a critique
reveals a genuine diagnostic dilemma, acknowledge it and suggest
how to resolve it (specific tests, consults, observation)."""),
("human", "Defend your diagnosis and address the critiques.")
])
critiques_text = "\n".join([
f"- {c.weakness}\n Missed findings: {', '.join(c.missed_findings)}"
for c in opponent_critiques.critiques
])
chain = prompt | llm
result = await chain.ainvoke({
"my_dx": my_argument.primary_hypothesis.diagnosis,
"critiques": critiques_text
})
result.role = self.role
result.round_number = 3
return resultPhysician B Agent (Alternative Diagnosis)
# src/agents/physician_b.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import (
DiagnosticArgument, PhysicianRole,
CritiqueSet, RebuttalSet
)
from ..config import settings
class PhysicianBAgent:
"""Second physician - argues for alternative/competing diagnosis."""
def __init__(self):
self.llm = ChatOpenAI(
model=settings.openai_model,
api_key=settings.openai_api_key,
temperature=settings.temperature_physicians
)
self.role = PhysicianRole.PHYSICIAN_B
async def generate_hypothesis(
self,
clinical: ClinicalPresentation
) -> DiagnosticArgument:
"""Generate alternative diagnostic hypothesis."""
llm = self.llm.with_structured_output(DiagnosticArgument)
prompt = ChatPromptTemplate.from_messages([
("system", """You are an experienced physician generating a diagnostic
hypothesis. You are PHYSICIAN B - your role is to argue for an
ALTERNATIVE diagnosis that another physician might miss.
Think about:
- What diagnosis would be easy to miss but dangerous?
- What "zebra" (rare but serious) fits this presentation?
- What common diagnosis might masquerade as something else?
- What would a specialist in a different field consider?
Include:
1. A DIFFERENT diagnosis than the obvious one
2. {num_supporting} supporting clinical findings
3. {num_contradicting} findings that don't fit perfectly
4. Pathophysiological reasoning
5. Whether this is a "can't miss" diagnosis
6. Recommended workup
IMPORTANT: Do NOT argue for the most obvious diagnosis.
Your role is to force consideration of alternatives.
Think: "What if it's NOT the obvious thing?"
Examples of alternative thinking:
- Chest pain: Instead of ACS, consider PE, aortic dissection, esophageal rupture
- Headache: Instead of migraine, consider SAH, meningitis, temporal arteritis
- Abdominal pain: Instead of appendicitis, consider ectopic pregnancy, AAA"""),
("human", """Clinical Presentation:
Age: {age}yo {sex}
Chief Complaint: {chief_complaint}
Symptoms:
{symptoms}
Vitals: {vitals}
PMH: {pmh}
Medications: {meds}
Acuity: {acuity}
Generate an ALTERNATIVE diagnostic hypothesis.""")
])
symptoms_text = "\n".join([
f"- {s.name}: onset={s.onset}, duration={s.duration}, "
f"severity={s.severity.value}, character={s.character}"
for s in clinical.symptoms
])
vitals_text = "Not recorded"
if clinical.vitals:
vitals_parts = []
if clinical.vitals.blood_pressure:
vitals_parts.append(f"BP: {clinical.vitals.blood_pressure}")
if clinical.vitals.heart_rate:
vitals_parts.append(f"HR: {clinical.vitals.heart_rate}")
if clinical.vitals.oxygen_saturation:
vitals_parts.append(f"SpO2: {clinical.vitals.oxygen_saturation}%")
vitals_text = ", ".join(vitals_parts) if vitals_parts else "Not recorded"
chain = prompt | llm
result = await chain.ainvoke({
"num_supporting": settings.num_supporting_findings,
"num_contradicting": settings.num_contradicting_findings,
"age": clinical.age,
"sex": clinical.sex,
"chief_complaint": clinical.chief_complaint,
"symptoms": symptoms_text or "See chief complaint",
"vitals": vitals_text,
"pmh": ", ".join(clinical.past_medical_history) or "None",
"meds": ", ".join(clinical.medications) or "None",
"acuity": clinical.acuity.value
})
result.role = self.role
result.round_number = 1
return result
async def generate_critiques(
self,
clinical: ClinicalPresentation,
opponent_argument: DiagnosticArgument
) -> CritiqueSet:
"""Critique Physician A's diagnosis."""
llm = self.llm.with_structured_output(CritiqueSet)
prompt = ChatPromptTemplate.from_messages([
("system", """You are PHYSICIAN B critiquing PHYSICIAN A's diagnosis.
Physician A went with the obvious diagnosis: {opponent_dx}
Your alternative diagnosis: {my_dx}
Generate {num_critiques} critiques showing why the obvious
diagnosis might be WRONG or incomplete.
Focus on:
- Red flags they might have dismissed
- Atypical features that don't fit their diagnosis
- Why jumping to the obvious diagnosis is dangerous
- Cases where the "obvious" diagnosis was wrong"""),
("human", """Opponent's argument:
Diagnosis: {opponent_dx}
Supporting findings: {opponent_support}
Pathophysiology: {opponent_patho}
Critique their diagnostic reasoning.""")
])
chain = prompt | llm
result = await chain.ainvoke({
"opponent_dx": opponent_argument.primary_hypothesis.diagnosis,
"opponent_support": ", ".join(opponent_argument.primary_hypothesis.supporting_findings),
"opponent_patho": opponent_argument.primary_hypothesis.pathophysiology,
"my_dx": "Alternative diagnosis",
"num_critiques": settings.num_critique_points
})
result.role = self.role
result.round_number = 2
return result
async def generate_rebuttals(
self,
clinical: ClinicalPresentation,
my_argument: DiagnosticArgument,
opponent_critiques: CritiqueSet
) -> RebuttalSet:
"""Defend against Physician A's critiques."""
llm = self.llm.with_structured_output(RebuttalSet)
prompt = ChatPromptTemplate.from_messages([
("system", """You are PHYSICIAN B defending your alternative diagnosis.
Your diagnosis: {my_dx}
Physician A critiqued your reasoning:
{critiques}
Generate rebuttals for each critique.
Remember: Your role is to ensure the alternative diagnosis
isn't dismissed too quickly. Even if less likely, dangerous
diagnoses need to be ruled out.
Concede where appropriate, but emphasize:
- The cost of missing your diagnosis (if it's serious)
- Simple tests that could rule it out
- Why it's worth considering even if less probable"""),
("human", "Defend your alternative diagnosis.")
])
critiques_text = "\n".join([
f"- {c.weakness}\n Missed findings: {', '.join(c.missed_findings)}"
for c in opponent_critiques.critiques
])
chain = prompt | llm
result = await chain.ainvoke({
"my_dx": my_argument.primary_hypothesis.diagnosis,
"critiques": critiques_text
})
result.role = self.role
result.round_number = 3
return resultAttending Physician (Judge)
# src/agents/attending.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import DiagnosticArgument, CritiqueSet, RebuttalSet
from ..models.scoring import AttendingVerdict, DiagnosticScore
from ..config import settings
class AttendingPhysician:
"""Senior physician who evaluates the diagnostic debate."""
def __init__(self):
self.llm = ChatOpenAI(
model=settings.openai_model,
api_key=settings.openai_api_key,
temperature=settings.temperature_attending
).with_structured_output(AttendingVerdict)
self.prompt = ChatPromptTemplate.from_messages([
("system", """You are a senior attending physician evaluating a
diagnostic debate between two residents/fellows.
Evaluate each physician on:
1. CLINICAL REASONING ({weight_reasoning}%)
- Is the pathophysiology sound?
- Does the mechanism explain the symptoms?
- Are they thinking systematically?
2. EVIDENCE QUALITY ({weight_evidence}%)
- Are findings specific to their diagnosis?
- Did they acknowledge contradicting findings?
- Is the evidence strong enough to support confidence?
3. DIFFERENTIAL BREADTH ({weight_breadth}%)
- Did they consider alternatives?
- Are they aware of diagnostic mimics?
4. SAFETY AWARENESS ({weight_safety}%)
- Did they mention can't-miss diagnoses?
- Would their approach catch life-threatening conditions?
Also identify:
- Teaching points from this case
- Diagnoses NEITHER physician considered
- Who made the stronger case overall
Score each physician 0-10 on each criterion."""),
("human", """CASE:
{case_summary}
=== PHYSICIAN A (Primary Diagnosis) ===
Diagnosis: {dx_a}
Supporting: {support_a}
Pathophysiology: {patho_a}
Critiques of B: {crit_a}
Rebuttals: {reb_a}
=== PHYSICIAN B (Alternative Diagnosis) ===
Diagnosis: {dx_b}
Supporting: {support_b}
Pathophysiology: {patho_b}
Critiques of A: {crit_b}
Rebuttals: {reb_b}
Evaluate this diagnostic debate.""")
])
async def evaluate(
self,
clinical: ClinicalPresentation,
arg_a: DiagnosticArgument,
arg_b: DiagnosticArgument,
crit_a: CritiqueSet,
crit_b: CritiqueSet,
reb_a: RebuttalSet,
reb_b: RebuttalSet
) -> AttendingVerdict:
"""Evaluate the diagnostic debate."""
case_summary = (
f"{clinical.age}yo {clinical.sex} with {clinical.chief_complaint}. "
f"PMH: {', '.join(clinical.past_medical_history) or 'None'}. "
f"Acuity: {clinical.acuity.value}."
)
chain = self.prompt | self.llm
result = await chain.ainvoke({
"weight_reasoning": int(settings.weight_clinical_reasoning * 100),
"weight_evidence": int(settings.weight_evidence_quality * 100),
"weight_breadth": int(settings.weight_differential_breadth * 100),
"weight_safety": int(settings.weight_safety_awareness * 100),
"case_summary": case_summary,
"dx_a": arg_a.primary_hypothesis.diagnosis,
"support_a": ", ".join(arg_a.primary_hypothesis.supporting_findings),
"patho_a": arg_a.primary_hypothesis.pathophysiology,
"crit_a": "\n".join([c.weakness for c in crit_a.critiques]),
"reb_a": "\n".join([r.defense for r in reb_a.rebuttals]),
"dx_b": arg_b.primary_hypothesis.diagnosis,
"support_b": ", ".join(arg_b.primary_hypothesis.supporting_findings),
"patho_b": arg_b.primary_hypothesis.pathophysiology,
"crit_b": "\n".join([c.weakness for c in crit_b.critiques]),
"reb_b": "\n".join([r.defense for r in reb_b.rebuttals])
})
return resultDifferential Synthesizer
# src/agents/synthesizer.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import DiagnosticArgument
from ..models.scoring import AttendingVerdict, SynthesizedDifferential
from ..config import settings
class DifferentialSynthesizer:
"""Produces final synthesized differential from the debate."""
def __init__(self):
self.llm = ChatOpenAI(
model=settings.openai_model,
api_key=settings.openai_api_key,
temperature=0.2
).with_structured_output(SynthesizedDifferential)
self.prompt = ChatPromptTemplate.from_messages([
("system", """You are synthesizing a final differential diagnosis
from a diagnostic debate.
Produce a ranked differential that:
1. Lists diagnoses by probability (high, moderate, low)
2. Highlights ALL "can't miss" diagnoses regardless of probability
3. Notes key supporting and contradicting findings for each
4. Recommends specific workup for each diagnosis
5. Provides admission/disposition recommendation
6. Acknowledges remaining uncertainty
The differential should reflect BOTH physicians' arguments,
not just the winner. Even if one diagnosis is more likely,
the alternative may need to be ruled out.
Format as a teaching case with clinical pearls."""),
("human", """Case: {case_summary}
Debate outcome: {outcome}
Physician A argued: {dx_a}
Physician B argued: {dx_b}
Attending noted:
- Teaching points: {teaching}
- Missed diagnoses: {missed}
Synthesize the final differential and workup plan.""")
])
async def synthesize(
self,
clinical: ClinicalPresentation,
arg_a: DiagnosticArgument,
arg_b: DiagnosticArgument,
verdict: AttendingVerdict
) -> SynthesizedDifferential:
"""Produce final differential diagnosis."""
case_summary = (
f"{clinical.age}yo {clinical.sex} presenting with "
f"{clinical.chief_complaint}. Acuity: {clinical.acuity.value}."
)
outcome = (
f"{verdict.stronger_case.value} made stronger case "
f"({verdict.margin} margin)"
)
chain = self.prompt | self.llm
result = await chain.ainvoke({
"case_summary": case_summary,
"outcome": outcome,
"dx_a": f"{arg_a.primary_hypothesis.diagnosis} "
f"(confidence: {arg_a.primary_hypothesis.confidence})",
"dx_b": f"{arg_b.primary_hypothesis.diagnosis} "
f"(confidence: {arg_b.primary_hypothesis.confidence})",
"teaching": ", ".join(verdict.teaching_points),
"missed": ", ".join(verdict.missed_diagnoses) or "None noted"
})
return resultLangGraph Workflow
# src/workflow/debate.py
from langgraph.graph import StateGraph, END
from ..models.state import DiagnosticDebateState, DiagnosticPhase
from ..agents.parser import ClinicalParser
from ..agents.physician_a import PhysicianAAgent
from ..agents.physician_b import PhysicianBAgent
from ..agents.attending import AttendingPhysician
from ..agents.synthesizer import DifferentialSynthesizer
# Initialize agents
parser = ClinicalParser()
physician_a = PhysicianAAgent()
physician_b = PhysicianBAgent()
attending = AttendingPhysician()
synthesizer = DifferentialSynthesizer()
async def parse_clinical_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
"""Parse the clinical presentation."""
clinical = await parser.parse(state["raw_presentation"])
return {
**state,
"clinical": clinical,
"phase": DiagnosticPhase.HYPOTHESIS,
"current_round": 1,
"reasoning_trace": [f"Parsed: {clinical.age}yo {clinical.sex}, {clinical.chief_complaint}"]
}
async def physician_a_hypothesis_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
"""Physician A generates primary hypothesis."""
argument = await physician_a.generate_hypothesis(state["clinical"])
return {
"physician_a_argument": argument,
"reasoning_trace": [f"Physician A: {argument.primary_hypothesis.diagnosis}"]
}
async def physician_b_hypothesis_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
"""Physician B generates alternative hypothesis."""
argument = await physician_b.generate_hypothesis(state["clinical"])
return {
"physician_b_argument": argument,
"reasoning_trace": [f"Physician B: {argument.primary_hypothesis.diagnosis}"]
}
async def advance_to_critique_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
"""Advance to critique phase."""
return {
**state,
"phase": DiagnosticPhase.CRITIQUE,
"current_round": 2,
"reasoning_trace": ["Advancing to critique round"]
}
async def physician_a_critique_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
"""Physician A critiques Physician B."""
critiques = await physician_a.generate_critiques(
state["clinical"],
state["physician_b_argument"]
)
return {
"physician_a_critiques": critiques,
"reasoning_trace": [f"Physician A critiqued B with {len(critiques.critiques)} points"]
}
async def physician_b_critique_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
"""Physician B critiques Physician A."""
critiques = await physician_b.generate_critiques(
state["clinical"],
state["physician_a_argument"]
)
return {
"physician_b_critiques": critiques,
"reasoning_trace": [f"Physician B critiqued A with {len(critiques.critiques)} points"]
}
async def advance_to_rebuttal_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
"""Advance to rebuttal phase."""
return {
**state,
"phase": DiagnosticPhase.REBUTTAL,
"current_round": 3,
"reasoning_trace": ["Advancing to rebuttal round"]
}
async def physician_a_rebuttal_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
"""Physician A rebuts B's critiques."""
rebuttals = await physician_a.generate_rebuttals(
state["clinical"],
state["physician_a_argument"],
state["physician_b_critiques"]
)
return {
"physician_a_rebuttals": rebuttals,
"reasoning_trace": ["Physician A defended diagnosis"]
}
async def physician_b_rebuttal_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
"""Physician B rebuts A's critiques."""
rebuttals = await physician_b.generate_rebuttals(
state["clinical"],
state["physician_b_argument"],
state["physician_a_critiques"]
)
return {
"physician_b_rebuttals": rebuttals,
"reasoning_trace": ["Physician B defended diagnosis"]
}
async def attending_review_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
"""Attending evaluates the debate."""
verdict = await attending.evaluate(
state["clinical"],
state["physician_a_argument"],
state["physician_b_argument"],
state["physician_a_critiques"],
state["physician_b_critiques"],
state["physician_a_rebuttals"],
state["physician_b_rebuttals"]
)
return {
**state,
"attending_verdict": verdict,
"phase": DiagnosticPhase.SYNTHESIS,
"reasoning_trace": [f"Attending: {verdict.stronger_case.value} stronger ({verdict.margin})"]
}
async def synthesize_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
"""Synthesize final differential."""
differential = await synthesizer.synthesize(
state["clinical"],
state["physician_a_argument"],
state["physician_b_argument"],
state["attending_verdict"]
)
return {
**state,
"final_differential": differential,
"phase": DiagnosticPhase.COMPLETE,
"reasoning_trace": [f"Final DDx: {len(differential.differential)} diagnoses"]
}
def create_diagnostic_debate_workflow() -> StateGraph:
"""Create the diagnostic debate workflow."""
workflow = StateGraph(DiagnosticDebateState)
# Add nodes
workflow.add_node("parse_clinical", parse_clinical_node)
workflow.add_node("physician_a_hypothesis", physician_a_hypothesis_node)
workflow.add_node("physician_b_hypothesis", physician_b_hypothesis_node)
workflow.add_node("advance_critique", advance_to_critique_node)
workflow.add_node("physician_a_critique", physician_a_critique_node)
workflow.add_node("physician_b_critique", physician_b_critique_node)
workflow.add_node("advance_rebuttal", advance_to_rebuttal_node)
workflow.add_node("physician_a_rebuttal", physician_a_rebuttal_node)
workflow.add_node("physician_b_rebuttal", physician_b_rebuttal_node)
workflow.add_node("attending_review", attending_review_node)
workflow.add_node("synthesize", synthesize_node)
# Entry point
workflow.set_entry_point("parse_clinical")
# Round 1: Hypotheses (parallel)
workflow.add_edge("parse_clinical", "physician_a_hypothesis")
workflow.add_edge("parse_clinical", "physician_b_hypothesis")
workflow.add_edge("physician_a_hypothesis", "advance_critique")
workflow.add_edge("physician_b_hypothesis", "advance_critique")
# Round 2: Critiques (parallel)
workflow.add_edge("advance_critique", "physician_a_critique")
workflow.add_edge("advance_critique", "physician_b_critique")
workflow.add_edge("physician_a_critique", "advance_rebuttal")
workflow.add_edge("physician_b_critique", "advance_rebuttal")
# Round 3: Rebuttals (parallel)
workflow.add_edge("advance_rebuttal", "physician_a_rebuttal")
workflow.add_edge("advance_rebuttal", "physician_b_rebuttal")
workflow.add_edge("physician_a_rebuttal", "attending_review")
workflow.add_edge("physician_b_rebuttal", "attending_review")
# Evaluation and synthesis
workflow.add_edge("attending_review", "synthesize")
workflow.add_edge("synthesize", END)
return workflow.compile()
diagnostic_debate_agent = create_diagnostic_debate_workflow()FastAPI Application
# src/api/main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional
from ..workflow.debate import diagnostic_debate_agent, DiagnosticDebateState
from ..models.state import DiagnosticPhase
app = FastAPI(
title="Differential Diagnosis Debate",
description="Adversarial diagnostic reasoning for clinical decision support",
version="1.0.0"
)
class DiagnosticRequest(BaseModel):
presentation: str
class DifferentialItem(BaseModel):
rank: int
diagnosis: str
probability: str
supporting_findings: List[str]
contradicting_findings: List[str]
is_cant_miss: bool
recommended_tests: List[str]
class DiagnosticResponse(BaseModel):
patient_summary: str
physician_a_diagnosis: str
physician_b_diagnosis: str
winner: str
margin: str
differential: List[DifferentialItem]
cant_miss_diagnoses: List[str]
immediate_workup: List[str]
admission_recommendation: str
clinical_pearls: List[str]
teaching_points: List[str]
reasoning_trace: List[str]
@app.post("/diagnose", response_model=DiagnosticResponse)
async def run_diagnostic_debate(request: DiagnosticRequest):
"""Run a diagnostic debate on a clinical presentation."""
initial_state: DiagnosticDebateState = {
"raw_presentation": request.presentation,
"clinical": None,
"phase": DiagnosticPhase.PARSING,
"current_round": 0,
"physician_a_argument": None,
"physician_b_argument": None,
"physician_a_critiques": None,
"physician_b_critiques": None,
"physician_a_rebuttals": None,
"physician_b_rebuttals": None,
"attending_verdict": None,
"final_differential": None,
"reasoning_trace": []
}
try:
result = await diagnostic_debate_agent.ainvoke(initial_state)
if not result.get("final_differential"):
raise HTTPException(
status_code=500,
detail="Diagnostic debate did not produce a differential"
)
diff = result["final_differential"]
verdict = result["attending_verdict"]
arg_a = result["physician_a_argument"]
arg_b = result["physician_b_argument"]
return DiagnosticResponse(
patient_summary=diff.patient_summary,
physician_a_diagnosis=arg_a.primary_hypothesis.diagnosis,
physician_b_diagnosis=arg_b.primary_hypothesis.diagnosis,
winner=verdict.stronger_case.value,
margin=verdict.margin,
differential=[
DifferentialItem(
rank=item.rank,
diagnosis=item.diagnosis,
probability=item.probability,
supporting_findings=item.key_supporting_findings,
contradicting_findings=item.key_contradicting_findings,
is_cant_miss=item.is_cant_miss,
recommended_tests=item.recommended_tests
)
for item in diff.differential
],
cant_miss_diagnoses=diff.cant_miss_diagnoses,
immediate_workup=diff.immediate_workup,
admission_recommendation=diff.admission_recommendation,
clinical_pearls=diff.clinical_pearls,
teaching_points=verdict.teaching_points,
reasoning_trace=result.get("reasoning_trace", [])
)
except Exception as e:
raise HTTPException(
status_code=500,
detail=f"Diagnostic debate failed: {str(e)}"
)
@app.get("/health")
async def health():
return {"status": "healthy", "service": "diagnostic-debate"}Example Usage
curl -X POST http://localhost:8000/diagnose \
-H "Content-Type: application/json" \
-d '{
"presentation": "55 year old male with sudden onset chest pain radiating to the left arm, diaphoresis, and nausea. History of hypertension and diabetes. Vitals: BP 160/100, HR 95, SpO2 97%. ECG shows no acute ST changes."
}'Key Learnings
-
Alternative diagnoses prevent anchoring - Requiring a second physician to argue for a DIFFERENT diagnosis forces consideration of alternatives that might be missed.
-
Can't-miss framing improves safety - Explicitly asking whether a diagnosis is "can't miss" (life-threatening if missed) ensures dangerous conditions are considered even if less likely.
-
Concessions build trust - When physicians acknowledge valid points from the other side, the final differential is more calibrated and trustworthy.
-
Teaching points emerge naturally - The attending's evaluation produces learning points that wouldn't surface from a single-perspective analysis.
Key Concepts Recap
| Concept | What It Is | Why It Matters |
|---|---|---|
| Adversarial Diagnosis | Two physicians argue for different diagnoses | Prevents anchoring and premature closure |
| Can't-Miss Diagnoses | Life-threatening conditions to rule out | Safety-first even if low probability |
| OPQRST | Symptom characterization framework | Structured clinical data extraction |
| Diagnostic Scoring | Reasoning, evidence, breadth, safety | Multi-dimensional evaluation |
| Teaching Points | Learning insights from the case | Educational value from every case |
Next Steps
Continue with:
- Drug Interaction Arbitrator - Debate pattern for pharmacy
- Tumor Board Simulator - Multi-expert debate (4+ specialists)
Bull vs Bear Market Analyst
Build an adversarial debate agent where Bull and Bear personas argue opposing investment theses, with a Judge scoring arguments and a Synthesizer providing balanced risk-aware recommendations
Drug Interaction Arbitrator
Build an adversarial debate agent where one pharmacist argues an interaction is clinically significant while another argues it's manageable, helping reduce alert fatigue