Build an adversarial diagnostic agent where two physician personas argue for competing diagnoses, with an attending physician judging and synthesizing a final differential

Differential Diagnosis Debate

Build an adversarial multi-agent system where two AI physician personas debate competing diagnoses for the same patient presentation, forcing comprehensive consideration of diagnostic alternatives.


Difficulty	Advanced
Time	3-4 days
Code	~900 lines
Pattern	Adversarial Debate (Medical Domain)

TL;DR

Apply the adversarial debate pattern to medical diagnosis using two physician agents arguing for different diagnoses, structured clinical evidence (supporting/contradicting findings), attending physician as judge, and synthesized differential ranking diagnoses by probability. Reduces diagnostic errors by forcing consideration of alternative diagnoses.

Medical Disclaimer

This system is for educational purposes only. It is designed as a clinical decision support tool to assist licensed healthcare professionals in considering diagnostic alternatives. It does not provide medical diagnoses and must never replace clinical judgment. All outputs must be reviewed by qualified clinicians before any patient care decisions.

The Problem: Diagnostic Errors

┌─────────────────────────────────────────────────────────────────────┐
│ WHY DIAGNOSTIC ERRORS HAPPEN                                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Diagnostic errors cause 40,000-80,000 deaths/year in the US        │
│                                                                     │
│  Common cognitive biases:                                           │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │ Anchoring          │ Lock onto first diagnosis too early    │   │
│  │ Availability       │ Recall recent cases, miss rare ones    │   │
│  │ Confirmation       │ Seek evidence supporting initial dx    │   │
│  │ Premature closure  │ Stop considering alternatives too soon │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  The adversarial debate FORCES consideration of alternatives        │
│  by having a second physician argue for a DIFFERENT diagnosis       │
│                                                                     │
│  Single physician:           Adversarial debate:                    │
│  "Looks like MI" ───────►    Physician A: "This is MI"              │
│  (anchors, stops)            Physician B: "This is PE"              │
│                              Attending: "Consider both, order..."   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

What You'll Build

A diagnostic debate agent that:

Parses clinical presentations - Extracts symptoms, vitals, history into structured format
Generates Diagnosis A - First physician argues for their leading diagnosis
Generates Diagnosis B - Second physician argues for an alternative diagnosis
Runs critique rounds - Each physician attacks the other's diagnostic reasoning
Runs rebuttal rounds - Each defends their diagnosis, conceding valid points
Attending judges - Evaluates diagnostic reasoning quality
Synthesizes differential - Produces ranked differential with workup recommendations

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│              DIFFERENTIAL DIAGNOSIS DEBATE ARCHITECTURE              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Input: "55yo M, acute chest pain, diaphoresis, normal ECG"         │
│       │                                                             │
│       ▼                                                             │
│  ┌──────────────────┐                                               │
│  │ CLINICAL PARSER  │  Structure: symptoms, vitals, PMH, meds       │
│  └────────┬─────────┘                                               │
│           │                                                         │
│           ▼                                                         │
│  ╔══════════════════════════════════════════════════════════════╗  │
│  ║ ROUND 1: Diagnostic Hypotheses                                ║  │
│  ║  ┌─────────────────┐              ┌─────────────────┐        ║  │
│  ║  │  PHYSICIAN A    │  (parallel)  │  PHYSICIAN B    │        ║  │
│  ║  │  "This is ACS"  │              │  "This is PE"   │        ║  │
│  ║  │  + evidence     │              │  + evidence     │        ║  │
│  ║  └─────────────────┘              └─────────────────┘        ║  │
│  ╚══════════════════════════════════════════════════════════════╝  │
│           │                                                         │
│           ▼                                                         │
│  ╔══════════════════════════════════════════════════════════════╗  │
│  ║ ROUND 2: Diagnostic Critiques                                 ║  │
│  ║  ┌─────────────────┐              ┌─────────────────┐        ║  │
│  ║  │  PHYSICIAN A    │  (parallel)  │  PHYSICIAN B    │        ║  │
│  ║  │  "PE unlikely   │              │  "ACS less      │        ║  │
│  ║  │   because..."   │              │   likely bc..." │        ║  │
│  ║  └─────────────────┘              └─────────────────┘        ║  │
│  ╚══════════════════════════════════════════════════════════════╝  │
│           │                                                         │
│           ▼                                                         │
│  ╔══════════════════════════════════════════════════════════════╗  │
│  ║ ROUND 3: Rebuttals & Concessions                              ║  │
│  ║  ┌─────────────────┐              ┌─────────────────┐        ║  │
│  ║  │  PHYSICIAN A    │  (parallel)  │  PHYSICIAN B    │        ║  │
│  ║  │  defends ACS    │              │  defends PE     │        ║  │
│  ║  │  concedes: "PE  │              │  concedes: "ACS │        ║  │
│  ║  │  should be r/o" │              │  is possible"   │        ║  │
│  ║  └─────────────────┘              └─────────────────┘        ║  │
│  ╚══════════════════════════════════════════════════════════════╝  │
│           │                                                         │
│           ▼                                                         │
│  ┌──────────────────┐                                               │
│  │    ATTENDING     │  Evaluates clinical reasoning quality         │
│  └────────┬─────────┘                                               │
│           │                                                         │
│           ▼                                                         │
│  ┌──────────────────┐                                               │
│  │   SYNTHESIZER    │  Final differential with workup plan          │
│  └──────────────────┘                                               │
│                                                                     │
│  Output: Ranked DDx with "can't miss" diagnoses and workup plan     │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Project Structure

debate-diagnosis/
├── src/
│   ├── __init__.py
│   ├── config.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── clinical.py         # Clinical data models
│   │   ├── arguments.py        # Diagnostic argument models
│   │   ├── scoring.py          # Attending scoring models
│   │   └── state.py            # DebateState for LangGraph
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── parser.py           # Clinical presentation parser
│   │   ├── physician_a.py      # First physician agent
│   │   ├── physician_b.py      # Second physician agent
│   │   ├── attending.py        # Attending physician (judge)
│   │   └── synthesizer.py      # Differential synthesizer
│   ├── workflow/
│   │   ├── __init__.py
│   │   └── debate.py           # LangGraph debate workflow
│   └── api/
│       ├── __init__.py
│       └── main.py             # FastAPI endpoints
├── tests/
├── docker-compose.yml
└── requirements.txt

Tech Stack

Technology	Purpose
LangGraph	Round-based diagnostic debate workflow
OpenAI GPT-4o	Physician personas with medical reasoning
Pydantic	Clinical data and diagnostic argument models
FastAPI	API for submitting cases and retrieving differentials

Implementation

Configuration

# src/config.py
from pydantic_settings import BaseSettings
from typing import List

class Settings(BaseSettings):
    # LLM Settings
    openai_api_key: str
    openai_model: str = "gpt-4o"
    temperature_physicians: float = 0.4  # Some creativity, but clinical accuracy
    temperature_attending: float = 0.2   # Consistent evaluation

    # Debate Settings
    num_supporting_findings: int = 4
    num_contradicting_findings: int = 2
    num_critique_points: int = 3

    # Scoring Weights
    weight_clinical_reasoning: float = 0.35
    weight_evidence_quality: float = 0.35
    weight_differential_breadth: float = 0.15
    weight_safety_awareness: float = 0.15  # "Can't miss" diagnoses

    # Safety Settings
    always_include_cant_miss: bool = True  # Always flag life-threatening DDx

    class Config:
        env_file = ".env"

settings = Settings()

Why These Weights:

┌─────────────────────────────────────────────────────────────────────┐
│ SCORING WEIGHTS FOR CLINICAL DIAGNOSIS                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Clinical Reasoning (35%)                                           │
│  └── Does the pathophysiology make sense?                           │
│      Is the mechanism of disease explained?                         │
│                                                                     │
│  Evidence Quality (35%)                                             │
│  └── Are findings specific to this diagnosis?                       │
│      How well do symptoms match classic presentation?               │
│                                                                     │
│  Differential Breadth (15%)                                         │
│  └── Did they consider alternatives?                                │
│      Are they aware of diagnostic mimics?                           │
│                                                                     │
│  Safety Awareness (15%)                                             │
│  └── Did they mention "can't miss" diagnoses?                       │
│      Are life-threatening conditions addressed?                      │
│                                                                     │
│  Safety gets 15% because MISSING a dangerous diagnosis              │
│  is worse than an incomplete differential.                          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Clinical Models

# src/models/clinical.py
from pydantic import BaseModel, Field
from typing import List, Optional, Dict
from enum import Enum

class Severity(str, Enum):
    MILD = "mild"
    MODERATE = "moderate"
    SEVERE = "severe"
    CRITICAL = "critical"

class Acuity(str, Enum):
    EMERGENT = "emergent"      # Immediate threat to life
    URGENT = "urgent"          # Needs attention within hours
    SEMI_URGENT = "semi_urgent"  # Within days
    ROUTINE = "routine"        # Scheduled care

class Symptom(BaseModel):
    """A clinical symptom."""
    name: str
    duration: Optional[str] = None
    severity: Severity = Severity.MODERATE
    onset: Optional[str] = None  # sudden, gradual
    character: Optional[str] = None  # sharp, dull, burning
    location: Optional[str] = None
    radiation: Optional[str] = None
    aggravating_factors: List[str] = Field(default_factory=list)
    alleviating_factors: List[str] = Field(default_factory=list)

class VitalSigns(BaseModel):
    """Patient vital signs."""
    blood_pressure: Optional[str] = None  # "120/80"
    heart_rate: Optional[int] = None
    respiratory_rate: Optional[int] = None
    temperature: Optional[float] = None  # Celsius
    oxygen_saturation: Optional[int] = None  # Percentage

class ClinicalPresentation(BaseModel):
    """Structured clinical presentation."""
    # Demographics
    age: int
    sex: str  # M, F

    # Chief complaint
    chief_complaint: str

    # History of present illness
    symptoms: List[Symptom]
    symptom_timeline: Optional[str] = None

    # Vital signs
    vitals: Optional[VitalSigns] = None

    # Background
    past_medical_history: List[str] = Field(default_factory=list)
    medications: List[str] = Field(default_factory=list)
    allergies: List[str] = Field(default_factory=list)
    family_history: List[str] = Field(default_factory=list)
    social_history: Optional[str] = None

    # Physical exam findings (if available)
    exam_findings: List[str] = Field(default_factory=list)

    # Initial assessment
    acuity: Acuity = Acuity.URGENT

class DiagnosticTest(BaseModel):
    """A recommended diagnostic test."""
    test_name: str
    rationale: str
    urgency: str = "routine"  # stat, urgent, routine
    what_it_rules_out: List[str] = Field(default_factory=list)
    what_it_confirms: List[str] = Field(default_factory=list)

Diagnostic Argument Models

# src/models/arguments.py
from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum

class PhysicianRole(str, Enum):
    PHYSICIAN_A = "physician_a"
    PHYSICIAN_B = "physician_b"

class DiagnosticHypothesis(BaseModel):
    """A diagnostic hypothesis with supporting evidence."""
    diagnosis: str = Field(description="The proposed diagnosis")
    icd10_code: Optional[str] = Field(None, description="ICD-10 code if known")
    probability: str = Field(description="high, moderate, low")

    # Evidence
    supporting_findings: List[str] = Field(
        description="Clinical findings that support this diagnosis"
    )
    contradicting_findings: List[str] = Field(
        default_factory=list,
        description="Findings that argue against this diagnosis"
    )

    # Reasoning
    pathophysiology: str = Field(
        description="How the disease mechanism explains the presentation"
    )
    classic_presentation_match: str = Field(
        description="How well this matches the textbook presentation"
    )

    # Risk assessment
    is_cant_miss: bool = Field(
        default=False,
        description="Is this a life-threatening diagnosis that must be ruled out?"
    )
    miss_consequences: Optional[str] = Field(
        None,
        description="What happens if this diagnosis is missed?"
    )

    confidence: float = Field(ge=0.0, le=1.0)

class DiagnosticArgument(BaseModel):
    """Complete diagnostic argument from one physician."""
    role: PhysicianRole
    primary_hypothesis: DiagnosticHypothesis
    alternative_considerations: List[str] = Field(
        default_factory=list,
        description="Other diagnoses briefly considered"
    )
    recommended_workup: List[str] = Field(
        description="Tests to confirm or rule out"
    )
    round_number: int = 1

class DiagnosticCritique(BaseModel):
    """Critique of opponent's diagnostic reasoning."""
    target_diagnosis: str
    weakness: str = Field(description="Flaw in the diagnostic reasoning")
    missed_findings: List[str] = Field(
        default_factory=list,
        description="Clinical findings the opponent ignored"
    )
    alternative_explanation: str = Field(
        description="How these findings better fit YOUR diagnosis"
    )
    severity: str = Field(description="minor, moderate, major")

class CritiqueSet(BaseModel):
    """Collection of critiques from one physician."""
    role: PhysicianRole
    critiques: List[DiagnosticCritique]
    round_number: int = 2

class DiagnosticRebuttal(BaseModel):
    """Defense against a diagnostic critique."""
    critique_addressed: str
    defense: str
    concession: Optional[str] = Field(
        None,
        description="What valid points the critic raised"
    )
    updated_confidence: float = Field(ge=0.0, le=1.0)
    additional_workup_suggested: List[str] = Field(
        default_factory=list,
        description="Additional tests to address the critique"
    )

class RebuttalSet(BaseModel):
    """Collection of rebuttals from one physician."""
    role: PhysicianRole
    rebuttals: List[DiagnosticRebuttal]
    round_number: int = 3

Scoring and Synthesis Models

# src/models/scoring.py
from pydantic import BaseModel, Field
from typing import List, Optional
from .arguments import PhysicianRole

class DiagnosticScore(BaseModel):
    """Score for a physician's diagnostic reasoning."""
    role: PhysicianRole
    clinical_reasoning_score: float = Field(ge=0.0, le=10.0)
    evidence_quality_score: float = Field(ge=0.0, le=10.0)
    differential_breadth_score: float = Field(ge=0.0, le=10.0)
    safety_awareness_score: float = Field(ge=0.0, le=10.0)
    total_score: float = Field(ge=0.0, le=100.0)
    strengths: List[str]
    weaknesses: List[str]

class AttendingVerdict(BaseModel):
    """Attending physician's evaluation."""
    physician_a_score: DiagnosticScore
    physician_b_score: DiagnosticScore
    stronger_case: PhysicianRole
    margin: str  # narrow, moderate, decisive
    teaching_points: List[str] = Field(
        description="Key learning points from this case"
    )
    missed_diagnoses: List[str] = Field(
        default_factory=list,
        description="Important diagnoses neither physician considered"
    )

class DifferentialItem(BaseModel):
    """A single item in the final differential."""
    rank: int
    diagnosis: str
    icd10_code: Optional[str] = None
    probability: str
    key_supporting_findings: List[str]
    key_contradicting_findings: List[str]
    is_cant_miss: bool = False
    recommended_tests: List[str]

class SynthesizedDifferential(BaseModel):
    """Final synthesized differential diagnosis."""
    patient_summary: str
    differential: List[DifferentialItem]
    cant_miss_diagnoses: List[str] = Field(
        description="Life-threatening diagnoses that must be ruled out"
    )
    immediate_workup: List[str] = Field(
        description="Tests to order now"
    )
    admission_recommendation: str = Field(
        description="Admit, observe, discharge with follow-up"
    )
    clinical_pearls: List[str] = Field(
        description="Key teaching points from this case"
    )
    uncertainty_acknowledgment: str = Field(
        description="What remains uncertain and how to address it"
    )

Debate State

# src/models/state.py
from typing import TypedDict, List, Optional, Annotated
from enum import Enum
import operator
from .clinical import ClinicalPresentation
from .arguments import DiagnosticArgument, CritiqueSet, RebuttalSet
from .scoring import AttendingVerdict, SynthesizedDifferential

class DiagnosticPhase(str, Enum):
    PARSING = "parsing"
    HYPOTHESIS = "hypothesis"
    CRITIQUE = "critique"
    REBUTTAL = "rebuttal"
    ATTENDING_REVIEW = "attending_review"
    SYNTHESIS = "synthesis"
    COMPLETE = "complete"

class DiagnosticDebateState(TypedDict):
    """State for the diagnostic debate workflow."""
    # Input
    raw_presentation: str

    # Parsed clinical data
    clinical: Optional[ClinicalPresentation]

    # Workflow tracking
    phase: DiagnosticPhase
    current_round: int

    # Diagnostic arguments
    physician_a_argument: Optional[DiagnosticArgument]
    physician_b_argument: Optional[DiagnosticArgument]

    # Critiques
    physician_a_critiques: Optional[CritiqueSet]
    physician_b_critiques: Optional[CritiqueSet]

    # Rebuttals
    physician_a_rebuttals: Optional[RebuttalSet]
    physician_b_rebuttals: Optional[RebuttalSet]

    # Evaluation
    attending_verdict: Optional[AttendingVerdict]

    # Output
    final_differential: Optional[SynthesizedDifferential]

    # Audit
    reasoning_trace: Annotated[List[str], operator.add]

Clinical Parser Agent

# src/agents/parser.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation, Symptom, VitalSigns, Acuity
from ..config import settings

class ClinicalParser:
    """Parses raw clinical presentations into structured format."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=0.1
        ).with_structured_output(ClinicalPresentation)

        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are an experienced emergency physician extracting
structured clinical data from a case presentation.

Extract:
1. Demographics (age, sex)
2. Chief complaint
3. Individual symptoms with OPQRST characteristics:
   - Onset (sudden vs gradual)
   - Provocation/Palliation (what makes it better/worse)
   - Quality (sharp, dull, burning, pressure)
   - Region/Radiation
   - Severity
   - Timing/Duration
4. Vital signs if mentioned
5. Past medical history, medications, allergies
6. Physical exam findings if mentioned
7. Acuity assessment:
   - Emergent: immediate life threat (chest pain + diaphoresis, stroke symptoms)
   - Urgent: needs attention within hours
   - Semi-urgent: can wait days
   - Routine: scheduled care

If information is not provided, leave it as null/empty.
Make reasonable clinical inferences but note them."""),
            ("human", "{presentation}")
        ])

    async def parse(self, presentation: str) -> ClinicalPresentation:
        """Parse a raw clinical presentation."""
        chain = self.prompt | self.llm
        result = await chain.ainvoke({"presentation": presentation})
        return result

Physician A Agent (Primary Diagnosis)

# src/agents/physician_a.py
from typing import List
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import (
    DiagnosticArgument, DiagnosticHypothesis, PhysicianRole,
    DiagnosticCritique, CritiqueSet, DiagnosticRebuttal, RebuttalSet
)
from ..config import settings

class PhysicianAAgent:
    """First physician - argues for primary/most likely diagnosis."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=settings.temperature_physicians
        )
        self.role = PhysicianRole.PHYSICIAN_A

    async def generate_hypothesis(
        self,
        clinical: ClinicalPresentation
    ) -> DiagnosticArgument:
        """Generate primary diagnostic hypothesis."""
        llm = self.llm.with_structured_output(DiagnosticArgument)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are an experienced physician generating a diagnostic
hypothesis. You are PHYSICIAN A - argue for what you believe is the
MOST LIKELY diagnosis given this presentation.

Your role is to make the strongest case for your leading diagnosis.

Include:
1. The specific diagnosis with ICD-10 code if known
2. {num_supporting} supporting clinical findings with explanations
3. Acknowledge {num_contradicting} findings that don't fit perfectly
4. Pathophysiological reasoning - WHY does this disease cause these symptoms?
5. How well this matches the classic/textbook presentation
6. Whether this is a "can't miss" diagnosis (life-threatening if missed)
7. Recommended workup to confirm

Think like a clinician: What diagnosis would you bet on?

IMPORTANT: You will face critique from another physician arguing for a
DIFFERENT diagnosis. Make your strongest case."""),
            ("human", """Clinical Presentation:
Age: {age}yo {sex}
Chief Complaint: {chief_complaint}

Symptoms:
{symptoms}

Vitals: {vitals}

PMH: {pmh}
Medications: {meds}
Allergies: {allergies}

Exam Findings: {exam}

Acuity: {acuity}

Generate your primary diagnostic hypothesis.""")
        ])

        symptoms_text = "\n".join([
            f"- {s.name}: onset={s.onset}, duration={s.duration}, "
            f"severity={s.severity.value}, character={s.character}"
            for s in clinical.symptoms
        ])

        vitals_text = "Not recorded"
        if clinical.vitals:
            vitals_parts = []
            if clinical.vitals.blood_pressure:
                vitals_parts.append(f"BP: {clinical.vitals.blood_pressure}")
            if clinical.vitals.heart_rate:
                vitals_parts.append(f"HR: {clinical.vitals.heart_rate}")
            if clinical.vitals.respiratory_rate:
                vitals_parts.append(f"RR: {clinical.vitals.respiratory_rate}")
            if clinical.vitals.oxygen_saturation:
                vitals_parts.append(f"SpO2: {clinical.vitals.oxygen_saturation}%")
            if clinical.vitals.temperature:
                vitals_parts.append(f"Temp: {clinical.vitals.temperature}°C")
            vitals_text = ", ".join(vitals_parts) if vitals_parts else "Not recorded"

        chain = prompt | llm
        result = await chain.ainvoke({
            "num_supporting": settings.num_supporting_findings,
            "num_contradicting": settings.num_contradicting_findings,
            "age": clinical.age,
            "sex": clinical.sex,
            "chief_complaint": clinical.chief_complaint,
            "symptoms": symptoms_text or "See chief complaint",
            "vitals": vitals_text,
            "pmh": ", ".join(clinical.past_medical_history) or "None",
            "meds": ", ".join(clinical.medications) or "None",
            "allergies": ", ".join(clinical.allergies) or "NKDA",
            "exam": ", ".join(clinical.exam_findings) or "Not documented",
            "acuity": clinical.acuity.value
        })

        result.role = self.role
        result.round_number = 1
        return result

    async def generate_critiques(
        self,
        clinical: ClinicalPresentation,
        opponent_argument: DiagnosticArgument
    ) -> CritiqueSet:
        """Critique Physician B's diagnosis."""
        llm = self.llm.with_structured_output(CritiqueSet)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are PHYSICIAN A critiquing PHYSICIAN B's diagnosis.

Physician B argued for: {opponent_dx}

Your diagnosis: {my_dx}

Generate {num_critiques} critiques attacking their diagnostic reasoning.

For each critique:
1. Identify a specific weakness in their reasoning
2. Point out clinical findings they ignored or misinterpreted
3. Explain how those findings better support YOUR diagnosis
4. Rate severity: minor, moderate, major

Focus on:
- Findings that DON'T fit their diagnosis
- Classic features of their diagnosis that are MISSING
- Risk factors that point to your diagnosis instead
- Pathophysiological inconsistencies"""),
            ("human", """Opponent's argument:
Diagnosis: {opponent_dx}
Supporting findings: {opponent_support}
Pathophysiology: {opponent_patho}

Your diagnosis: {my_dx}

Critique their diagnostic reasoning.""")
        ])

        chain = prompt | llm
        result = await chain.ainvoke({
            "opponent_dx": opponent_argument.primary_hypothesis.diagnosis,
            "opponent_support": ", ".join(opponent_argument.primary_hypothesis.supporting_findings),
            "opponent_patho": opponent_argument.primary_hypothesis.pathophysiology,
            "my_dx": "Your primary diagnosis",  # Will be filled from state
            "num_critiques": settings.num_critique_points
        })

        result.role = self.role
        result.round_number = 2
        return result

    async def generate_rebuttals(
        self,
        clinical: ClinicalPresentation,
        my_argument: DiagnosticArgument,
        opponent_critiques: CritiqueSet
    ) -> RebuttalSet:
        """Defend against Physician B's critiques."""
        llm = self.llm.with_structured_output(RebuttalSet)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are PHYSICIAN A defending your diagnosis against critiques.

Your diagnosis: {my_dx}

Physician B critiqued your reasoning:
{critiques}

Generate rebuttals for each critique.

For each rebuttal:
1. Defend your position where the critique is unfair
2. CONCEDE valid points - intellectual honesty is crucial in medicine
3. Update your confidence based on the critique
4. Suggest additional workup if the critique raised valid uncertainty

IMPORTANT: Good clinicians acknowledge uncertainty. If a critique
reveals a genuine diagnostic dilemma, acknowledge it and suggest
how to resolve it (specific tests, consults, observation)."""),
            ("human", "Defend your diagnosis and address the critiques.")
        ])

        critiques_text = "\n".join([
            f"- {c.weakness}\n  Missed findings: {', '.join(c.missed_findings)}"
            for c in opponent_critiques.critiques
        ])

        chain = prompt | llm
        result = await chain.ainvoke({
            "my_dx": my_argument.primary_hypothesis.diagnosis,
            "critiques": critiques_text
        })

        result.role = self.role
        result.round_number = 3
        return result

Physician B Agent (Alternative Diagnosis)

# src/agents/physician_b.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import (
    DiagnosticArgument, PhysicianRole,
    CritiqueSet, RebuttalSet
)
from ..config import settings

class PhysicianBAgent:
    """Second physician - argues for alternative/competing diagnosis."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=settings.temperature_physicians
        )
        self.role = PhysicianRole.PHYSICIAN_B

    async def generate_hypothesis(
        self,
        clinical: ClinicalPresentation
    ) -> DiagnosticArgument:
        """Generate alternative diagnostic hypothesis."""
        llm = self.llm.with_structured_output(DiagnosticArgument)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are an experienced physician generating a diagnostic
hypothesis. You are PHYSICIAN B - your role is to argue for an
ALTERNATIVE diagnosis that another physician might miss.

Think about:
- What diagnosis would be easy to miss but dangerous?
- What "zebra" (rare but serious) fits this presentation?
- What common diagnosis might masquerade as something else?
- What would a specialist in a different field consider?

Include:
1. A DIFFERENT diagnosis than the obvious one
2. {num_supporting} supporting clinical findings
3. {num_contradicting} findings that don't fit perfectly
4. Pathophysiological reasoning
5. Whether this is a "can't miss" diagnosis
6. Recommended workup

IMPORTANT: Do NOT argue for the most obvious diagnosis.
Your role is to force consideration of alternatives.
Think: "What if it's NOT the obvious thing?"

Examples of alternative thinking:
- Chest pain: Instead of ACS, consider PE, aortic dissection, esophageal rupture
- Headache: Instead of migraine, consider SAH, meningitis, temporal arteritis
- Abdominal pain: Instead of appendicitis, consider ectopic pregnancy, AAA"""),
            ("human", """Clinical Presentation:
Age: {age}yo {sex}
Chief Complaint: {chief_complaint}

Symptoms:
{symptoms}

Vitals: {vitals}

PMH: {pmh}
Medications: {meds}

Acuity: {acuity}

Generate an ALTERNATIVE diagnostic hypothesis.""")
        ])

        symptoms_text = "\n".join([
            f"- {s.name}: onset={s.onset}, duration={s.duration}, "
            f"severity={s.severity.value}, character={s.character}"
            for s in clinical.symptoms
        ])

        vitals_text = "Not recorded"
        if clinical.vitals:
            vitals_parts = []
            if clinical.vitals.blood_pressure:
                vitals_parts.append(f"BP: {clinical.vitals.blood_pressure}")
            if clinical.vitals.heart_rate:
                vitals_parts.append(f"HR: {clinical.vitals.heart_rate}")
            if clinical.vitals.oxygen_saturation:
                vitals_parts.append(f"SpO2: {clinical.vitals.oxygen_saturation}%")
            vitals_text = ", ".join(vitals_parts) if vitals_parts else "Not recorded"

        chain = prompt | llm
        result = await chain.ainvoke({
            "num_supporting": settings.num_supporting_findings,
            "num_contradicting": settings.num_contradicting_findings,
            "age": clinical.age,
            "sex": clinical.sex,
            "chief_complaint": clinical.chief_complaint,
            "symptoms": symptoms_text or "See chief complaint",
            "vitals": vitals_text,
            "pmh": ", ".join(clinical.past_medical_history) or "None",
            "meds": ", ".join(clinical.medications) or "None",
            "acuity": clinical.acuity.value
        })

        result.role = self.role
        result.round_number = 1
        return result

    async def generate_critiques(
        self,
        clinical: ClinicalPresentation,
        opponent_argument: DiagnosticArgument
    ) -> CritiqueSet:
        """Critique Physician A's diagnosis."""
        llm = self.llm.with_structured_output(CritiqueSet)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are PHYSICIAN B critiquing PHYSICIAN A's diagnosis.

Physician A went with the obvious diagnosis: {opponent_dx}

Your alternative diagnosis: {my_dx}

Generate {num_critiques} critiques showing why the obvious
diagnosis might be WRONG or incomplete.

Focus on:
- Red flags they might have dismissed
- Atypical features that don't fit their diagnosis
- Why jumping to the obvious diagnosis is dangerous
- Cases where the "obvious" diagnosis was wrong"""),
            ("human", """Opponent's argument:
Diagnosis: {opponent_dx}
Supporting findings: {opponent_support}
Pathophysiology: {opponent_patho}

Critique their diagnostic reasoning.""")
        ])

        chain = prompt | llm
        result = await chain.ainvoke({
            "opponent_dx": opponent_argument.primary_hypothesis.diagnosis,
            "opponent_support": ", ".join(opponent_argument.primary_hypothesis.supporting_findings),
            "opponent_patho": opponent_argument.primary_hypothesis.pathophysiology,
            "my_dx": "Alternative diagnosis",
            "num_critiques": settings.num_critique_points
        })

        result.role = self.role
        result.round_number = 2
        return result

    async def generate_rebuttals(
        self,
        clinical: ClinicalPresentation,
        my_argument: DiagnosticArgument,
        opponent_critiques: CritiqueSet
    ) -> RebuttalSet:
        """Defend against Physician A's critiques."""
        llm = self.llm.with_structured_output(RebuttalSet)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are PHYSICIAN B defending your alternative diagnosis.

Your diagnosis: {my_dx}

Physician A critiqued your reasoning:
{critiques}

Generate rebuttals for each critique.

Remember: Your role is to ensure the alternative diagnosis
isn't dismissed too quickly. Even if less likely, dangerous
diagnoses need to be ruled out.

Concede where appropriate, but emphasize:
- The cost of missing your diagnosis (if it's serious)
- Simple tests that could rule it out
- Why it's worth considering even if less probable"""),
            ("human", "Defend your alternative diagnosis.")
        ])

        critiques_text = "\n".join([
            f"- {c.weakness}\n  Missed findings: {', '.join(c.missed_findings)}"
            for c in opponent_critiques.critiques
        ])

        chain = prompt | llm
        result = await chain.ainvoke({
            "my_dx": my_argument.primary_hypothesis.diagnosis,
            "critiques": critiques_text
        })

        result.role = self.role
        result.round_number = 3
        return result

Attending Physician (Judge)

# src/agents/attending.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import DiagnosticArgument, CritiqueSet, RebuttalSet
from ..models.scoring import AttendingVerdict, DiagnosticScore
from ..config import settings

class AttendingPhysician:
    """Senior physician who evaluates the diagnostic debate."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=settings.temperature_attending
        ).with_structured_output(AttendingVerdict)

        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a senior attending physician evaluating a
diagnostic debate between two residents/fellows.

Evaluate each physician on:

1. CLINICAL REASONING ({weight_reasoning}%)
   - Is the pathophysiology sound?
   - Does the mechanism explain the symptoms?
   - Are they thinking systematically?

2. EVIDENCE QUALITY ({weight_evidence}%)
   - Are findings specific to their diagnosis?
   - Did they acknowledge contradicting findings?
   - Is the evidence strong enough to support confidence?

3. DIFFERENTIAL BREADTH ({weight_breadth}%)
   - Did they consider alternatives?
   - Are they aware of diagnostic mimics?

4. SAFETY AWARENESS ({weight_safety}%)
   - Did they mention can't-miss diagnoses?
   - Would their approach catch life-threatening conditions?

Also identify:
- Teaching points from this case
- Diagnoses NEITHER physician considered
- Who made the stronger case overall

Score each physician 0-10 on each criterion."""),
            ("human", """CASE:
{case_summary}

=== PHYSICIAN A (Primary Diagnosis) ===
Diagnosis: {dx_a}
Supporting: {support_a}
Pathophysiology: {patho_a}
Critiques of B: {crit_a}
Rebuttals: {reb_a}

=== PHYSICIAN B (Alternative Diagnosis) ===
Diagnosis: {dx_b}
Supporting: {support_b}
Pathophysiology: {patho_b}
Critiques of A: {crit_b}
Rebuttals: {reb_b}

Evaluate this diagnostic debate.""")
        ])

    async def evaluate(
        self,
        clinical: ClinicalPresentation,
        arg_a: DiagnosticArgument,
        arg_b: DiagnosticArgument,
        crit_a: CritiqueSet,
        crit_b: CritiqueSet,
        reb_a: RebuttalSet,
        reb_b: RebuttalSet
    ) -> AttendingVerdict:
        """Evaluate the diagnostic debate."""

        case_summary = (
            f"{clinical.age}yo {clinical.sex} with {clinical.chief_complaint}. "
            f"PMH: {', '.join(clinical.past_medical_history) or 'None'}. "
            f"Acuity: {clinical.acuity.value}."
        )

        chain = self.prompt | self.llm
        result = await chain.ainvoke({
            "weight_reasoning": int(settings.weight_clinical_reasoning * 100),
            "weight_evidence": int(settings.weight_evidence_quality * 100),
            "weight_breadth": int(settings.weight_differential_breadth * 100),
            "weight_safety": int(settings.weight_safety_awareness * 100),
            "case_summary": case_summary,
            "dx_a": arg_a.primary_hypothesis.diagnosis,
            "support_a": ", ".join(arg_a.primary_hypothesis.supporting_findings),
            "patho_a": arg_a.primary_hypothesis.pathophysiology,
            "crit_a": "\n".join([c.weakness for c in crit_a.critiques]),
            "reb_a": "\n".join([r.defense for r in reb_a.rebuttals]),
            "dx_b": arg_b.primary_hypothesis.diagnosis,
            "support_b": ", ".join(arg_b.primary_hypothesis.supporting_findings),
            "patho_b": arg_b.primary_hypothesis.pathophysiology,
            "crit_b": "\n".join([c.weakness for c in crit_b.critiques]),
            "reb_b": "\n".join([r.defense for r in reb_b.rebuttals])
        })

        return result

Differential Synthesizer

# src/agents/synthesizer.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import DiagnosticArgument
from ..models.scoring import AttendingVerdict, SynthesizedDifferential
from ..config import settings

class DifferentialSynthesizer:
    """Produces final synthesized differential from the debate."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=0.2
        ).with_structured_output(SynthesizedDifferential)

        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are synthesizing a final differential diagnosis
from a diagnostic debate.

Produce a ranked differential that:
1. Lists diagnoses by probability (high, moderate, low)
2. Highlights ALL "can't miss" diagnoses regardless of probability
3. Notes key supporting and contradicting findings for each
4. Recommends specific workup for each diagnosis
5. Provides admission/disposition recommendation
6. Acknowledges remaining uncertainty

The differential should reflect BOTH physicians' arguments,
not just the winner. Even if one diagnosis is more likely,
the alternative may need to be ruled out.

Format as a teaching case with clinical pearls."""),
            ("human", """Case: {case_summary}

Debate outcome: {outcome}

Physician A argued: {dx_a}
Physician B argued: {dx_b}

Attending noted:
- Teaching points: {teaching}
- Missed diagnoses: {missed}

Synthesize the final differential and workup plan.""")
        ])

    async def synthesize(
        self,
        clinical: ClinicalPresentation,
        arg_a: DiagnosticArgument,
        arg_b: DiagnosticArgument,
        verdict: AttendingVerdict
    ) -> SynthesizedDifferential:
        """Produce final differential diagnosis."""

        case_summary = (
            f"{clinical.age}yo {clinical.sex} presenting with "
            f"{clinical.chief_complaint}. Acuity: {clinical.acuity.value}."
        )

        outcome = (
            f"{verdict.stronger_case.value} made stronger case "
            f"({verdict.margin} margin)"
        )

        chain = self.prompt | self.llm
        result = await chain.ainvoke({
            "case_summary": case_summary,
            "outcome": outcome,
            "dx_a": f"{arg_a.primary_hypothesis.diagnosis} "
                    f"(confidence: {arg_a.primary_hypothesis.confidence})",
            "dx_b": f"{arg_b.primary_hypothesis.diagnosis} "
                    f"(confidence: {arg_b.primary_hypothesis.confidence})",
            "teaching": ", ".join(verdict.teaching_points),
            "missed": ", ".join(verdict.missed_diagnoses) or "None noted"
        })

        return result

LangGraph Workflow

# src/workflow/debate.py
from langgraph.graph import StateGraph, END
from ..models.state import DiagnosticDebateState, DiagnosticPhase
from ..agents.parser import ClinicalParser
from ..agents.physician_a import PhysicianAAgent
from ..agents.physician_b import PhysicianBAgent
from ..agents.attending import AttendingPhysician
from ..agents.synthesizer import DifferentialSynthesizer

# Initialize agents
parser = ClinicalParser()
physician_a = PhysicianAAgent()
physician_b = PhysicianBAgent()
attending = AttendingPhysician()
synthesizer = DifferentialSynthesizer()


async def parse_clinical_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Parse the clinical presentation."""
    clinical = await parser.parse(state["raw_presentation"])
    return {
        **state,
        "clinical": clinical,
        "phase": DiagnosticPhase.HYPOTHESIS,
        "current_round": 1,
        "reasoning_trace": [f"Parsed: {clinical.age}yo {clinical.sex}, {clinical.chief_complaint}"]
    }


async def physician_a_hypothesis_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician A generates primary hypothesis."""
    argument = await physician_a.generate_hypothesis(state["clinical"])
    return {
        "physician_a_argument": argument,
        "reasoning_trace": [f"Physician A: {argument.primary_hypothesis.diagnosis}"]
    }


async def physician_b_hypothesis_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician B generates alternative hypothesis."""
    argument = await physician_b.generate_hypothesis(state["clinical"])
    return {
        "physician_b_argument": argument,
        "reasoning_trace": [f"Physician B: {argument.primary_hypothesis.diagnosis}"]
    }


async def advance_to_critique_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Advance to critique phase."""
    return {
        **state,
        "phase": DiagnosticPhase.CRITIQUE,
        "current_round": 2,
        "reasoning_trace": ["Advancing to critique round"]
    }


async def physician_a_critique_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician A critiques Physician B."""
    critiques = await physician_a.generate_critiques(
        state["clinical"],
        state["physician_b_argument"]
    )
    return {
        "physician_a_critiques": critiques,
        "reasoning_trace": [f"Physician A critiqued B with {len(critiques.critiques)} points"]
    }


async def physician_b_critique_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician B critiques Physician A."""
    critiques = await physician_b.generate_critiques(
        state["clinical"],
        state["physician_a_argument"]
    )
    return {
        "physician_b_critiques": critiques,
        "reasoning_trace": [f"Physician B critiqued A with {len(critiques.critiques)} points"]
    }


async def advance_to_rebuttal_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Advance to rebuttal phase."""
    return {
        **state,
        "phase": DiagnosticPhase.REBUTTAL,
        "current_round": 3,
        "reasoning_trace": ["Advancing to rebuttal round"]
    }


async def physician_a_rebuttal_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician A rebuts B's critiques."""
    rebuttals = await physician_a.generate_rebuttals(
        state["clinical"],
        state["physician_a_argument"],
        state["physician_b_critiques"]
    )
    return {
        "physician_a_rebuttals": rebuttals,
        "reasoning_trace": ["Physician A defended diagnosis"]
    }


async def physician_b_rebuttal_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician B rebuts A's critiques."""
    rebuttals = await physician_b.generate_rebuttals(
        state["clinical"],
        state["physician_b_argument"],
        state["physician_a_critiques"]
    )
    return {
        "physician_b_rebuttals": rebuttals,
        "reasoning_trace": ["Physician B defended diagnosis"]
    }


async def attending_review_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Attending evaluates the debate."""
    verdict = await attending.evaluate(
        state["clinical"],
        state["physician_a_argument"],
        state["physician_b_argument"],
        state["physician_a_critiques"],
        state["physician_b_critiques"],
        state["physician_a_rebuttals"],
        state["physician_b_rebuttals"]
    )
    return {
        **state,
        "attending_verdict": verdict,
        "phase": DiagnosticPhase.SYNTHESIS,
        "reasoning_trace": [f"Attending: {verdict.stronger_case.value} stronger ({verdict.margin})"]
    }


async def synthesize_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Synthesize final differential."""
    differential = await synthesizer.synthesize(
        state["clinical"],
        state["physician_a_argument"],
        state["physician_b_argument"],
        state["attending_verdict"]
    )
    return {
        **state,
        "final_differential": differential,
        "phase": DiagnosticPhase.COMPLETE,
        "reasoning_trace": [f"Final DDx: {len(differential.differential)} diagnoses"]
    }


def create_diagnostic_debate_workflow() -> StateGraph:
    """Create the diagnostic debate workflow."""
    workflow = StateGraph(DiagnosticDebateState)

    # Add nodes
    workflow.add_node("parse_clinical", parse_clinical_node)
    workflow.add_node("physician_a_hypothesis", physician_a_hypothesis_node)
    workflow.add_node("physician_b_hypothesis", physician_b_hypothesis_node)
    workflow.add_node("advance_critique", advance_to_critique_node)
    workflow.add_node("physician_a_critique", physician_a_critique_node)
    workflow.add_node("physician_b_critique", physician_b_critique_node)
    workflow.add_node("advance_rebuttal", advance_to_rebuttal_node)
    workflow.add_node("physician_a_rebuttal", physician_a_rebuttal_node)
    workflow.add_node("physician_b_rebuttal", physician_b_rebuttal_node)
    workflow.add_node("attending_review", attending_review_node)
    workflow.add_node("synthesize", synthesize_node)

    # Entry point
    workflow.set_entry_point("parse_clinical")

    # Round 1: Hypotheses (parallel)
    workflow.add_edge("parse_clinical", "physician_a_hypothesis")
    workflow.add_edge("parse_clinical", "physician_b_hypothesis")

    workflow.add_edge("physician_a_hypothesis", "advance_critique")
    workflow.add_edge("physician_b_hypothesis", "advance_critique")

    # Round 2: Critiques (parallel)
    workflow.add_edge("advance_critique", "physician_a_critique")
    workflow.add_edge("advance_critique", "physician_b_critique")

    workflow.add_edge("physician_a_critique", "advance_rebuttal")
    workflow.add_edge("physician_b_critique", "advance_rebuttal")

    # Round 3: Rebuttals (parallel)
    workflow.add_edge("advance_rebuttal", "physician_a_rebuttal")
    workflow.add_edge("advance_rebuttal", "physician_b_rebuttal")

    workflow.add_edge("physician_a_rebuttal", "attending_review")
    workflow.add_edge("physician_b_rebuttal", "attending_review")

    # Evaluation and synthesis
    workflow.add_edge("attending_review", "synthesize")
    workflow.add_edge("synthesize", END)

    return workflow.compile()


diagnostic_debate_agent = create_diagnostic_debate_workflow()

FastAPI Application

# src/api/main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional

from ..workflow.debate import diagnostic_debate_agent, DiagnosticDebateState
from ..models.state import DiagnosticPhase

app = FastAPI(
    title="Differential Diagnosis Debate",
    description="Adversarial diagnostic reasoning for clinical decision support",
    version="1.0.0"
)


class DiagnosticRequest(BaseModel):
    presentation: str


class DifferentialItem(BaseModel):
    rank: int
    diagnosis: str
    probability: str
    supporting_findings: List[str]
    contradicting_findings: List[str]
    is_cant_miss: bool
    recommended_tests: List[str]


class DiagnosticResponse(BaseModel):
    patient_summary: str
    physician_a_diagnosis: str
    physician_b_diagnosis: str
    winner: str
    margin: str
    differential: List[DifferentialItem]
    cant_miss_diagnoses: List[str]
    immediate_workup: List[str]
    admission_recommendation: str
    clinical_pearls: List[str]
    teaching_points: List[str]
    reasoning_trace: List[str]


@app.post("/diagnose", response_model=DiagnosticResponse)
async def run_diagnostic_debate(request: DiagnosticRequest):
    """Run a diagnostic debate on a clinical presentation."""
    initial_state: DiagnosticDebateState = {
        "raw_presentation": request.presentation,
        "clinical": None,
        "phase": DiagnosticPhase.PARSING,
        "current_round": 0,
        "physician_a_argument": None,
        "physician_b_argument": None,
        "physician_a_critiques": None,
        "physician_b_critiques": None,
        "physician_a_rebuttals": None,
        "physician_b_rebuttals": None,
        "attending_verdict": None,
        "final_differential": None,
        "reasoning_trace": []
    }

    try:
        result = await diagnostic_debate_agent.ainvoke(initial_state)

        if not result.get("final_differential"):
            raise HTTPException(
                status_code=500,
                detail="Diagnostic debate did not produce a differential"
            )

        diff = result["final_differential"]
        verdict = result["attending_verdict"]
        arg_a = result["physician_a_argument"]
        arg_b = result["physician_b_argument"]

        return DiagnosticResponse(
            patient_summary=diff.patient_summary,
            physician_a_diagnosis=arg_a.primary_hypothesis.diagnosis,
            physician_b_diagnosis=arg_b.primary_hypothesis.diagnosis,
            winner=verdict.stronger_case.value,
            margin=verdict.margin,
            differential=[
                DifferentialItem(
                    rank=item.rank,
                    diagnosis=item.diagnosis,
                    probability=item.probability,
                    supporting_findings=item.key_supporting_findings,
                    contradicting_findings=item.key_contradicting_findings,
                    is_cant_miss=item.is_cant_miss,
                    recommended_tests=item.recommended_tests
                )
                for item in diff.differential
            ],
            cant_miss_diagnoses=diff.cant_miss_diagnoses,
            immediate_workup=diff.immediate_workup,
            admission_recommendation=diff.admission_recommendation,
            clinical_pearls=diff.clinical_pearls,
            teaching_points=verdict.teaching_points,
            reasoning_trace=result.get("reasoning_trace", [])
        )
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Diagnostic debate failed: {str(e)}"
        )


@app.get("/health")
async def health():
    return {"status": "healthy", "service": "diagnostic-debate"}

Example Usage

curl -X POST http://localhost:8000/diagnose \
  -H "Content-Type: application/json" \
  -d '{
    "presentation": "55 year old male with sudden onset chest pain radiating to the left arm, diaphoresis, and nausea. History of hypertension and diabetes. Vitals: BP 160/100, HR 95, SpO2 97%. ECG shows no acute ST changes."
  }'

Key Learnings

Alternative diagnoses prevent anchoring - Requiring a second physician to argue for a DIFFERENT diagnosis forces consideration of alternatives that might be missed.
Can't-miss framing improves safety - Explicitly asking whether a diagnosis is "can't miss" (life-threatening if missed) ensures dangerous conditions are considered even if less likely.
Concessions build trust - When physicians acknowledge valid points from the other side, the final differential is more calibrated and trustworthy.
Teaching points emerge naturally - The attending's evaluation produces learning points that wouldn't surface from a single-perspective analysis.

Key Concepts Recap

Concept	What It Is	Why It Matters
Adversarial Diagnosis	Two physicians argue for different diagnoses	Prevents anchoring and premature closure
Can't-Miss Diagnoses	Life-threatening conditions to rule out	Safety-first even if low probability
OPQRST	Symptom characterization framework	Structured clinical data extraction
Diagnostic Scoring	Reasoning, evidence, breadth, safety	Multi-dimensional evaluation
Teaching Points	Learning insights from the case	Educational value from every case

Next Steps

Continue with:

Drug Interaction Arbitrator - Debate pattern for pharmacy
Tumor Board Simulator - Multi-expert debate (4+ specialists)

Differential Diagnosis Debate

Build an adversarial multi-agent system where two AI physician personas debate competing diagnoses for the same patient presentation, forcing comprehensive consideration of diagnostic alternatives.


Difficulty	Advanced
Time	3-4 days
Code	~900 lines
Pattern	Adversarial Debate (Medical Domain)

TL;DR

Medical Disclaimer

The Problem: Diagnostic Errors

┌─────────────────────────────────────────────────────────────────────┐
│ WHY DIAGNOSTIC ERRORS HAPPEN                                         │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Diagnostic errors cause 40,000-80,000 deaths/year in the US        │
│                                                                     │
│  Common cognitive biases:                                           │
│  ┌─────────────────────────────────────────────────────────────┐   │
│  │ Anchoring          │ Lock onto first diagnosis too early    │   │
│  │ Availability       │ Recall recent cases, miss rare ones    │   │
│  │ Confirmation       │ Seek evidence supporting initial dx    │   │
│  │ Premature closure  │ Stop considering alternatives too soon │   │
│  └─────────────────────────────────────────────────────────────┘   │
│                                                                     │
│  The adversarial debate FORCES consideration of alternatives        │
│  by having a second physician argue for a DIFFERENT diagnosis       │
│                                                                     │
│  Single physician:           Adversarial debate:                    │
│  "Looks like MI" ───────►    Physician A: "This is MI"              │
│  (anchors, stops)            Physician B: "This is PE"              │
│                              Attending: "Consider both, order..."   │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

What You'll Build

A diagnostic debate agent that:

Parses clinical presentations - Extracts symptoms, vitals, history into structured format
Generates Diagnosis A - First physician argues for their leading diagnosis
Generates Diagnosis B - Second physician argues for an alternative diagnosis
Runs critique rounds - Each physician attacks the other's diagnostic reasoning
Runs rebuttal rounds - Each defends their diagnosis, conceding valid points
Attending judges - Evaluates diagnostic reasoning quality
Synthesizes differential - Produces ranked differential with workup recommendations

Architecture

┌─────────────────────────────────────────────────────────────────────┐
│              DIFFERENTIAL DIAGNOSIS DEBATE ARCHITECTURE              │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Input: "55yo M, acute chest pain, diaphoresis, normal ECG"         │
│       │                                                             │
│       ▼                                                             │
│  ┌──────────────────┐                                               │
│  │ CLINICAL PARSER  │  Structure: symptoms, vitals, PMH, meds       │
│  └────────┬─────────┘                                               │
│           │                                                         │
│           ▼                                                         │
│  ╔══════════════════════════════════════════════════════════════╗  │
│  ║ ROUND 1: Diagnostic Hypotheses                                ║  │
│  ║  ┌─────────────────┐              ┌─────────────────┐        ║  │
│  ║  │  PHYSICIAN A    │  (parallel)  │  PHYSICIAN B    │        ║  │
│  ║  │  "This is ACS"  │              │  "This is PE"   │        ║  │
│  ║  │  + evidence     │              │  + evidence     │        ║  │
│  ║  └─────────────────┘              └─────────────────┘        ║  │
│  ╚══════════════════════════════════════════════════════════════╝  │
│           │                                                         │
│           ▼                                                         │
│  ╔══════════════════════════════════════════════════════════════╗  │
│  ║ ROUND 2: Diagnostic Critiques                                 ║  │
│  ║  ┌─────────────────┐              ┌─────────────────┐        ║  │
│  ║  │  PHYSICIAN A    │  (parallel)  │  PHYSICIAN B    │        ║  │
│  ║  │  "PE unlikely   │              │  "ACS less      │        ║  │
│  ║  │   because..."   │              │   likely bc..." │        ║  │
│  ║  └─────────────────┘              └─────────────────┘        ║  │
│  ╚══════════════════════════════════════════════════════════════╝  │
│           │                                                         │
│           ▼                                                         │
│  ╔══════════════════════════════════════════════════════════════╗  │
│  ║ ROUND 3: Rebuttals & Concessions                              ║  │
│  ║  ┌─────────────────┐              ┌─────────────────┐        ║  │
│  ║  │  PHYSICIAN A    │  (parallel)  │  PHYSICIAN B    │        ║  │
│  ║  │  defends ACS    │              │  defends PE     │        ║  │
│  ║  │  concedes: "PE  │              │  concedes: "ACS │        ║  │
│  ║  │  should be r/o" │              │  is possible"   │        ║  │
│  ║  └─────────────────┘              └─────────────────┘        ║  │
│  ╚══════════════════════════════════════════════════════════════╝  │
│           │                                                         │
│           ▼                                                         │
│  ┌──────────────────┐                                               │
│  │    ATTENDING     │  Evaluates clinical reasoning quality         │
│  └────────┬─────────┘                                               │
│           │                                                         │
│           ▼                                                         │
│  ┌──────────────────┐                                               │
│  │   SYNTHESIZER    │  Final differential with workup plan          │
│  └──────────────────┘                                               │
│                                                                     │
│  Output: Ranked DDx with "can't miss" diagnoses and workup plan     │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Project Structure

debate-diagnosis/
├── src/
│   ├── __init__.py
│   ├── config.py
│   ├── models/
│   │   ├── __init__.py
│   │   ├── clinical.py         # Clinical data models
│   │   ├── arguments.py        # Diagnostic argument models
│   │   ├── scoring.py          # Attending scoring models
│   │   └── state.py            # DebateState for LangGraph
│   ├── agents/
│   │   ├── __init__.py
│   │   ├── parser.py           # Clinical presentation parser
│   │   ├── physician_a.py      # First physician agent
│   │   ├── physician_b.py      # Second physician agent
│   │   ├── attending.py        # Attending physician (judge)
│   │   └── synthesizer.py      # Differential synthesizer
│   ├── workflow/
│   │   ├── __init__.py
│   │   └── debate.py           # LangGraph debate workflow
│   └── api/
│       ├── __init__.py
│       └── main.py             # FastAPI endpoints
├── tests/
├── docker-compose.yml
└── requirements.txt

Tech Stack

Technology	Purpose
LangGraph	Round-based diagnostic debate workflow
OpenAI GPT-4o	Physician personas with medical reasoning
Pydantic	Clinical data and diagnostic argument models
FastAPI	API for submitting cases and retrieving differentials

Implementation

Configuration

# src/config.py
from pydantic_settings import BaseSettings
from typing import List

class Settings(BaseSettings):
    # LLM Settings
    openai_api_key: str
    openai_model: str = "gpt-4o"
    temperature_physicians: float = 0.4  # Some creativity, but clinical accuracy
    temperature_attending: float = 0.2   # Consistent evaluation

    # Debate Settings
    num_supporting_findings: int = 4
    num_contradicting_findings: int = 2
    num_critique_points: int = 3

    # Scoring Weights
    weight_clinical_reasoning: float = 0.35
    weight_evidence_quality: float = 0.35
    weight_differential_breadth: float = 0.15
    weight_safety_awareness: float = 0.15  # "Can't miss" diagnoses

    # Safety Settings
    always_include_cant_miss: bool = True  # Always flag life-threatening DDx

    class Config:
        env_file = ".env"

settings = Settings()

Why These Weights:

┌─────────────────────────────────────────────────────────────────────┐
│ SCORING WEIGHTS FOR CLINICAL DIAGNOSIS                               │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  Clinical Reasoning (35%)                                           │
│  └── Does the pathophysiology make sense?                           │
│      Is the mechanism of disease explained?                         │
│                                                                     │
│  Evidence Quality (35%)                                             │
│  └── Are findings specific to this diagnosis?                       │
│      How well do symptoms match classic presentation?               │
│                                                                     │
│  Differential Breadth (15%)                                         │
│  └── Did they consider alternatives?                                │
│      Are they aware of diagnostic mimics?                           │
│                                                                     │
│  Safety Awareness (15%)                                             │
│  └── Did they mention "can't miss" diagnoses?                       │
│      Are life-threatening conditions addressed?                      │
│                                                                     │
│  Safety gets 15% because MISSING a dangerous diagnosis              │
│  is worse than an incomplete differential.                          │
│                                                                     │
└─────────────────────────────────────────────────────────────────────┘

Clinical Models

# src/models/clinical.py
from pydantic import BaseModel, Field
from typing import List, Optional, Dict
from enum import Enum

class Severity(str, Enum):
    MILD = "mild"
    MODERATE = "moderate"
    SEVERE = "severe"
    CRITICAL = "critical"

class Acuity(str, Enum):
    EMERGENT = "emergent"      # Immediate threat to life
    URGENT = "urgent"          # Needs attention within hours
    SEMI_URGENT = "semi_urgent"  # Within days
    ROUTINE = "routine"        # Scheduled care

class Symptom(BaseModel):
    """A clinical symptom."""
    name: str
    duration: Optional[str] = None
    severity: Severity = Severity.MODERATE
    onset: Optional[str] = None  # sudden, gradual
    character: Optional[str] = None  # sharp, dull, burning
    location: Optional[str] = None
    radiation: Optional[str] = None
    aggravating_factors: List[str] = Field(default_factory=list)
    alleviating_factors: List[str] = Field(default_factory=list)

class VitalSigns(BaseModel):
    """Patient vital signs."""
    blood_pressure: Optional[str] = None  # "120/80"
    heart_rate: Optional[int] = None
    respiratory_rate: Optional[int] = None
    temperature: Optional[float] = None  # Celsius
    oxygen_saturation: Optional[int] = None  # Percentage

class ClinicalPresentation(BaseModel):
    """Structured clinical presentation."""
    # Demographics
    age: int
    sex: str  # M, F

    # Chief complaint
    chief_complaint: str

    # History of present illness
    symptoms: List[Symptom]
    symptom_timeline: Optional[str] = None

    # Vital signs
    vitals: Optional[VitalSigns] = None

    # Background
    past_medical_history: List[str] = Field(default_factory=list)
    medications: List[str] = Field(default_factory=list)
    allergies: List[str] = Field(default_factory=list)
    family_history: List[str] = Field(default_factory=list)
    social_history: Optional[str] = None

    # Physical exam findings (if available)
    exam_findings: List[str] = Field(default_factory=list)

    # Initial assessment
    acuity: Acuity = Acuity.URGENT

class DiagnosticTest(BaseModel):
    """A recommended diagnostic test."""
    test_name: str
    rationale: str
    urgency: str = "routine"  # stat, urgent, routine
    what_it_rules_out: List[str] = Field(default_factory=list)
    what_it_confirms: List[str] = Field(default_factory=list)

Diagnostic Argument Models

# src/models/arguments.py
from pydantic import BaseModel, Field
from typing import List, Optional
from enum import Enum

class PhysicianRole(str, Enum):
    PHYSICIAN_A = "physician_a"
    PHYSICIAN_B = "physician_b"

class DiagnosticHypothesis(BaseModel):
    """A diagnostic hypothesis with supporting evidence."""
    diagnosis: str = Field(description="The proposed diagnosis")
    icd10_code: Optional[str] = Field(None, description="ICD-10 code if known")
    probability: str = Field(description="high, moderate, low")

    # Evidence
    supporting_findings: List[str] = Field(
        description="Clinical findings that support this diagnosis"
    )
    contradicting_findings: List[str] = Field(
        default_factory=list,
        description="Findings that argue against this diagnosis"
    )

    # Reasoning
    pathophysiology: str = Field(
        description="How the disease mechanism explains the presentation"
    )
    classic_presentation_match: str = Field(
        description="How well this matches the textbook presentation"
    )

    # Risk assessment
    is_cant_miss: bool = Field(
        default=False,
        description="Is this a life-threatening diagnosis that must be ruled out?"
    )
    miss_consequences: Optional[str] = Field(
        None,
        description="What happens if this diagnosis is missed?"
    )

    confidence: float = Field(ge=0.0, le=1.0)

class DiagnosticArgument(BaseModel):
    """Complete diagnostic argument from one physician."""
    role: PhysicianRole
    primary_hypothesis: DiagnosticHypothesis
    alternative_considerations: List[str] = Field(
        default_factory=list,
        description="Other diagnoses briefly considered"
    )
    recommended_workup: List[str] = Field(
        description="Tests to confirm or rule out"
    )
    round_number: int = 1

class DiagnosticCritique(BaseModel):
    """Critique of opponent's diagnostic reasoning."""
    target_diagnosis: str
    weakness: str = Field(description="Flaw in the diagnostic reasoning")
    missed_findings: List[str] = Field(
        default_factory=list,
        description="Clinical findings the opponent ignored"
    )
    alternative_explanation: str = Field(
        description="How these findings better fit YOUR diagnosis"
    )
    severity: str = Field(description="minor, moderate, major")

class CritiqueSet(BaseModel):
    """Collection of critiques from one physician."""
    role: PhysicianRole
    critiques: List[DiagnosticCritique]
    round_number: int = 2

class DiagnosticRebuttal(BaseModel):
    """Defense against a diagnostic critique."""
    critique_addressed: str
    defense: str
    concession: Optional[str] = Field(
        None,
        description="What valid points the critic raised"
    )
    updated_confidence: float = Field(ge=0.0, le=1.0)
    additional_workup_suggested: List[str] = Field(
        default_factory=list,
        description="Additional tests to address the critique"
    )

class RebuttalSet(BaseModel):
    """Collection of rebuttals from one physician."""
    role: PhysicianRole
    rebuttals: List[DiagnosticRebuttal]
    round_number: int = 3

Scoring and Synthesis Models

# src/models/scoring.py
from pydantic import BaseModel, Field
from typing import List, Optional
from .arguments import PhysicianRole

class DiagnosticScore(BaseModel):
    """Score for a physician's diagnostic reasoning."""
    role: PhysicianRole
    clinical_reasoning_score: float = Field(ge=0.0, le=10.0)
    evidence_quality_score: float = Field(ge=0.0, le=10.0)
    differential_breadth_score: float = Field(ge=0.0, le=10.0)
    safety_awareness_score: float = Field(ge=0.0, le=10.0)
    total_score: float = Field(ge=0.0, le=100.0)
    strengths: List[str]
    weaknesses: List[str]

class AttendingVerdict(BaseModel):
    """Attending physician's evaluation."""
    physician_a_score: DiagnosticScore
    physician_b_score: DiagnosticScore
    stronger_case: PhysicianRole
    margin: str  # narrow, moderate, decisive
    teaching_points: List[str] = Field(
        description="Key learning points from this case"
    )
    missed_diagnoses: List[str] = Field(
        default_factory=list,
        description="Important diagnoses neither physician considered"
    )

class DifferentialItem(BaseModel):
    """A single item in the final differential."""
    rank: int
    diagnosis: str
    icd10_code: Optional[str] = None
    probability: str
    key_supporting_findings: List[str]
    key_contradicting_findings: List[str]
    is_cant_miss: bool = False
    recommended_tests: List[str]

class SynthesizedDifferential(BaseModel):
    """Final synthesized differential diagnosis."""
    patient_summary: str
    differential: List[DifferentialItem]
    cant_miss_diagnoses: List[str] = Field(
        description="Life-threatening diagnoses that must be ruled out"
    )
    immediate_workup: List[str] = Field(
        description="Tests to order now"
    )
    admission_recommendation: str = Field(
        description="Admit, observe, discharge with follow-up"
    )
    clinical_pearls: List[str] = Field(
        description="Key teaching points from this case"
    )
    uncertainty_acknowledgment: str = Field(
        description="What remains uncertain and how to address it"
    )

Debate State

# src/models/state.py
from typing import TypedDict, List, Optional, Annotated
from enum import Enum
import operator
from .clinical import ClinicalPresentation
from .arguments import DiagnosticArgument, CritiqueSet, RebuttalSet
from .scoring import AttendingVerdict, SynthesizedDifferential

class DiagnosticPhase(str, Enum):
    PARSING = "parsing"
    HYPOTHESIS = "hypothesis"
    CRITIQUE = "critique"
    REBUTTAL = "rebuttal"
    ATTENDING_REVIEW = "attending_review"
    SYNTHESIS = "synthesis"
    COMPLETE = "complete"

class DiagnosticDebateState(TypedDict):
    """State for the diagnostic debate workflow."""
    # Input
    raw_presentation: str

    # Parsed clinical data
    clinical: Optional[ClinicalPresentation]

    # Workflow tracking
    phase: DiagnosticPhase
    current_round: int

    # Diagnostic arguments
    physician_a_argument: Optional[DiagnosticArgument]
    physician_b_argument: Optional[DiagnosticArgument]

    # Critiques
    physician_a_critiques: Optional[CritiqueSet]
    physician_b_critiques: Optional[CritiqueSet]

    # Rebuttals
    physician_a_rebuttals: Optional[RebuttalSet]
    physician_b_rebuttals: Optional[RebuttalSet]

    # Evaluation
    attending_verdict: Optional[AttendingVerdict]

    # Output
    final_differential: Optional[SynthesizedDifferential]

    # Audit
    reasoning_trace: Annotated[List[str], operator.add]

Clinical Parser Agent

# src/agents/parser.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation, Symptom, VitalSigns, Acuity
from ..config import settings

class ClinicalParser:
    """Parses raw clinical presentations into structured format."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=0.1
        ).with_structured_output(ClinicalPresentation)

        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are an experienced emergency physician extracting
structured clinical data from a case presentation.

Extract:
1. Demographics (age, sex)
2. Chief complaint
3. Individual symptoms with OPQRST characteristics:
   - Onset (sudden vs gradual)
   - Provocation/Palliation (what makes it better/worse)
   - Quality (sharp, dull, burning, pressure)
   - Region/Radiation
   - Severity
   - Timing/Duration
4. Vital signs if mentioned
5. Past medical history, medications, allergies
6. Physical exam findings if mentioned
7. Acuity assessment:
   - Emergent: immediate life threat (chest pain + diaphoresis, stroke symptoms)
   - Urgent: needs attention within hours
   - Semi-urgent: can wait days
   - Routine: scheduled care

If information is not provided, leave it as null/empty.
Make reasonable clinical inferences but note them."""),
            ("human", "{presentation}")
        ])

    async def parse(self, presentation: str) -> ClinicalPresentation:
        """Parse a raw clinical presentation."""
        chain = self.prompt | self.llm
        result = await chain.ainvoke({"presentation": presentation})
        return result

Physician A Agent (Primary Diagnosis)

# src/agents/physician_a.py
from typing import List
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import (
    DiagnosticArgument, DiagnosticHypothesis, PhysicianRole,
    DiagnosticCritique, CritiqueSet, DiagnosticRebuttal, RebuttalSet
)
from ..config import settings

class PhysicianAAgent:
    """First physician - argues for primary/most likely diagnosis."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=settings.temperature_physicians
        )
        self.role = PhysicianRole.PHYSICIAN_A

    async def generate_hypothesis(
        self,
        clinical: ClinicalPresentation
    ) -> DiagnosticArgument:
        """Generate primary diagnostic hypothesis."""
        llm = self.llm.with_structured_output(DiagnosticArgument)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are an experienced physician generating a diagnostic
hypothesis. You are PHYSICIAN A - argue for what you believe is the
MOST LIKELY diagnosis given this presentation.

Your role is to make the strongest case for your leading diagnosis.

Include:
1. The specific diagnosis with ICD-10 code if known
2. {num_supporting} supporting clinical findings with explanations
3. Acknowledge {num_contradicting} findings that don't fit perfectly
4. Pathophysiological reasoning - WHY does this disease cause these symptoms?
5. How well this matches the classic/textbook presentation
6. Whether this is a "can't miss" diagnosis (life-threatening if missed)
7. Recommended workup to confirm

Think like a clinician: What diagnosis would you bet on?

IMPORTANT: You will face critique from another physician arguing for a
DIFFERENT diagnosis. Make your strongest case."""),
            ("human", """Clinical Presentation:
Age: {age}yo {sex}
Chief Complaint: {chief_complaint}

Symptoms:
{symptoms}

Vitals: {vitals}

PMH: {pmh}
Medications: {meds}
Allergies: {allergies}

Exam Findings: {exam}

Acuity: {acuity}

Generate your primary diagnostic hypothesis.""")
        ])

        symptoms_text = "\n".join([
            f"- {s.name}: onset={s.onset}, duration={s.duration}, "
            f"severity={s.severity.value}, character={s.character}"
            for s in clinical.symptoms
        ])

        vitals_text = "Not recorded"
        if clinical.vitals:
            vitals_parts = []
            if clinical.vitals.blood_pressure:
                vitals_parts.append(f"BP: {clinical.vitals.blood_pressure}")
            if clinical.vitals.heart_rate:
                vitals_parts.append(f"HR: {clinical.vitals.heart_rate}")
            if clinical.vitals.respiratory_rate:
                vitals_parts.append(f"RR: {clinical.vitals.respiratory_rate}")
            if clinical.vitals.oxygen_saturation:
                vitals_parts.append(f"SpO2: {clinical.vitals.oxygen_saturation}%")
            if clinical.vitals.temperature:
                vitals_parts.append(f"Temp: {clinical.vitals.temperature}°C")
            vitals_text = ", ".join(vitals_parts) if vitals_parts else "Not recorded"

        chain = prompt | llm
        result = await chain.ainvoke({
            "num_supporting": settings.num_supporting_findings,
            "num_contradicting": settings.num_contradicting_findings,
            "age": clinical.age,
            "sex": clinical.sex,
            "chief_complaint": clinical.chief_complaint,
            "symptoms": symptoms_text or "See chief complaint",
            "vitals": vitals_text,
            "pmh": ", ".join(clinical.past_medical_history) or "None",
            "meds": ", ".join(clinical.medications) or "None",
            "allergies": ", ".join(clinical.allergies) or "NKDA",
            "exam": ", ".join(clinical.exam_findings) or "Not documented",
            "acuity": clinical.acuity.value
        })

        result.role = self.role
        result.round_number = 1
        return result

    async def generate_critiques(
        self,
        clinical: ClinicalPresentation,
        opponent_argument: DiagnosticArgument
    ) -> CritiqueSet:
        """Critique Physician B's diagnosis."""
        llm = self.llm.with_structured_output(CritiqueSet)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are PHYSICIAN A critiquing PHYSICIAN B's diagnosis.

Physician B argued for: {opponent_dx}

Your diagnosis: {my_dx}

Generate {num_critiques} critiques attacking their diagnostic reasoning.

For each critique:
1. Identify a specific weakness in their reasoning
2. Point out clinical findings they ignored or misinterpreted
3. Explain how those findings better support YOUR diagnosis
4. Rate severity: minor, moderate, major

Focus on:
- Findings that DON'T fit their diagnosis
- Classic features of their diagnosis that are MISSING
- Risk factors that point to your diagnosis instead
- Pathophysiological inconsistencies"""),
            ("human", """Opponent's argument:
Diagnosis: {opponent_dx}
Supporting findings: {opponent_support}
Pathophysiology: {opponent_patho}

Your diagnosis: {my_dx}

Critique their diagnostic reasoning.""")
        ])

        chain = prompt | llm
        result = await chain.ainvoke({
            "opponent_dx": opponent_argument.primary_hypothesis.diagnosis,
            "opponent_support": ", ".join(opponent_argument.primary_hypothesis.supporting_findings),
            "opponent_patho": opponent_argument.primary_hypothesis.pathophysiology,
            "my_dx": "Your primary diagnosis",  # Will be filled from state
            "num_critiques": settings.num_critique_points
        })

        result.role = self.role
        result.round_number = 2
        return result

    async def generate_rebuttals(
        self,
        clinical: ClinicalPresentation,
        my_argument: DiagnosticArgument,
        opponent_critiques: CritiqueSet
    ) -> RebuttalSet:
        """Defend against Physician B's critiques."""
        llm = self.llm.with_structured_output(RebuttalSet)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are PHYSICIAN A defending your diagnosis against critiques.

Your diagnosis: {my_dx}

Physician B critiqued your reasoning:
{critiques}

Generate rebuttals for each critique.

For each rebuttal:
1. Defend your position where the critique is unfair
2. CONCEDE valid points - intellectual honesty is crucial in medicine
3. Update your confidence based on the critique
4. Suggest additional workup if the critique raised valid uncertainty

IMPORTANT: Good clinicians acknowledge uncertainty. If a critique
reveals a genuine diagnostic dilemma, acknowledge it and suggest
how to resolve it (specific tests, consults, observation)."""),
            ("human", "Defend your diagnosis and address the critiques.")
        ])

        critiques_text = "\n".join([
            f"- {c.weakness}\n  Missed findings: {', '.join(c.missed_findings)}"
            for c in opponent_critiques.critiques
        ])

        chain = prompt | llm
        result = await chain.ainvoke({
            "my_dx": my_argument.primary_hypothesis.diagnosis,
            "critiques": critiques_text
        })

        result.role = self.role
        result.round_number = 3
        return result

Physician B Agent (Alternative Diagnosis)

# src/agents/physician_b.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import (
    DiagnosticArgument, PhysicianRole,
    CritiqueSet, RebuttalSet
)
from ..config import settings

class PhysicianBAgent:
    """Second physician - argues for alternative/competing diagnosis."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=settings.temperature_physicians
        )
        self.role = PhysicianRole.PHYSICIAN_B

    async def generate_hypothesis(
        self,
        clinical: ClinicalPresentation
    ) -> DiagnosticArgument:
        """Generate alternative diagnostic hypothesis."""
        llm = self.llm.with_structured_output(DiagnosticArgument)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are an experienced physician generating a diagnostic
hypothesis. You are PHYSICIAN B - your role is to argue for an
ALTERNATIVE diagnosis that another physician might miss.

Think about:
- What diagnosis would be easy to miss but dangerous?
- What "zebra" (rare but serious) fits this presentation?
- What common diagnosis might masquerade as something else?
- What would a specialist in a different field consider?

Include:
1. A DIFFERENT diagnosis than the obvious one
2. {num_supporting} supporting clinical findings
3. {num_contradicting} findings that don't fit perfectly
4. Pathophysiological reasoning
5. Whether this is a "can't miss" diagnosis
6. Recommended workup

IMPORTANT: Do NOT argue for the most obvious diagnosis.
Your role is to force consideration of alternatives.
Think: "What if it's NOT the obvious thing?"

Examples of alternative thinking:
- Chest pain: Instead of ACS, consider PE, aortic dissection, esophageal rupture
- Headache: Instead of migraine, consider SAH, meningitis, temporal arteritis
- Abdominal pain: Instead of appendicitis, consider ectopic pregnancy, AAA"""),
            ("human", """Clinical Presentation:
Age: {age}yo {sex}
Chief Complaint: {chief_complaint}

Symptoms:
{symptoms}

Vitals: {vitals}

PMH: {pmh}
Medications: {meds}

Acuity: {acuity}

Generate an ALTERNATIVE diagnostic hypothesis.""")
        ])

        symptoms_text = "\n".join([
            f"- {s.name}: onset={s.onset}, duration={s.duration}, "
            f"severity={s.severity.value}, character={s.character}"
            for s in clinical.symptoms
        ])

        vitals_text = "Not recorded"
        if clinical.vitals:
            vitals_parts = []
            if clinical.vitals.blood_pressure:
                vitals_parts.append(f"BP: {clinical.vitals.blood_pressure}")
            if clinical.vitals.heart_rate:
                vitals_parts.append(f"HR: {clinical.vitals.heart_rate}")
            if clinical.vitals.oxygen_saturation:
                vitals_parts.append(f"SpO2: {clinical.vitals.oxygen_saturation}%")
            vitals_text = ", ".join(vitals_parts) if vitals_parts else "Not recorded"

        chain = prompt | llm
        result = await chain.ainvoke({
            "num_supporting": settings.num_supporting_findings,
            "num_contradicting": settings.num_contradicting_findings,
            "age": clinical.age,
            "sex": clinical.sex,
            "chief_complaint": clinical.chief_complaint,
            "symptoms": symptoms_text or "See chief complaint",
            "vitals": vitals_text,
            "pmh": ", ".join(clinical.past_medical_history) or "None",
            "meds": ", ".join(clinical.medications) or "None",
            "acuity": clinical.acuity.value
        })

        result.role = self.role
        result.round_number = 1
        return result

    async def generate_critiques(
        self,
        clinical: ClinicalPresentation,
        opponent_argument: DiagnosticArgument
    ) -> CritiqueSet:
        """Critique Physician A's diagnosis."""
        llm = self.llm.with_structured_output(CritiqueSet)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are PHYSICIAN B critiquing PHYSICIAN A's diagnosis.

Physician A went with the obvious diagnosis: {opponent_dx}

Your alternative diagnosis: {my_dx}

Generate {num_critiques} critiques showing why the obvious
diagnosis might be WRONG or incomplete.

Focus on:
- Red flags they might have dismissed
- Atypical features that don't fit their diagnosis
- Why jumping to the obvious diagnosis is dangerous
- Cases where the "obvious" diagnosis was wrong"""),
            ("human", """Opponent's argument:
Diagnosis: {opponent_dx}
Supporting findings: {opponent_support}
Pathophysiology: {opponent_patho}

Critique their diagnostic reasoning.""")
        ])

        chain = prompt | llm
        result = await chain.ainvoke({
            "opponent_dx": opponent_argument.primary_hypothesis.diagnosis,
            "opponent_support": ", ".join(opponent_argument.primary_hypothesis.supporting_findings),
            "opponent_patho": opponent_argument.primary_hypothesis.pathophysiology,
            "my_dx": "Alternative diagnosis",
            "num_critiques": settings.num_critique_points
        })

        result.role = self.role
        result.round_number = 2
        return result

    async def generate_rebuttals(
        self,
        clinical: ClinicalPresentation,
        my_argument: DiagnosticArgument,
        opponent_critiques: CritiqueSet
    ) -> RebuttalSet:
        """Defend against Physician A's critiques."""
        llm = self.llm.with_structured_output(RebuttalSet)

        prompt = ChatPromptTemplate.from_messages([
            ("system", """You are PHYSICIAN B defending your alternative diagnosis.

Your diagnosis: {my_dx}

Physician A critiqued your reasoning:
{critiques}

Generate rebuttals for each critique.

Remember: Your role is to ensure the alternative diagnosis
isn't dismissed too quickly. Even if less likely, dangerous
diagnoses need to be ruled out.

Concede where appropriate, but emphasize:
- The cost of missing your diagnosis (if it's serious)
- Simple tests that could rule it out
- Why it's worth considering even if less probable"""),
            ("human", "Defend your alternative diagnosis.")
        ])

        critiques_text = "\n".join([
            f"- {c.weakness}\n  Missed findings: {', '.join(c.missed_findings)}"
            for c in opponent_critiques.critiques
        ])

        chain = prompt | llm
        result = await chain.ainvoke({
            "my_dx": my_argument.primary_hypothesis.diagnosis,
            "critiques": critiques_text
        })

        result.role = self.role
        result.round_number = 3
        return result

Attending Physician (Judge)

# src/agents/attending.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import DiagnosticArgument, CritiqueSet, RebuttalSet
from ..models.scoring import AttendingVerdict, DiagnosticScore
from ..config import settings

class AttendingPhysician:
    """Senior physician who evaluates the diagnostic debate."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=settings.temperature_attending
        ).with_structured_output(AttendingVerdict)

        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are a senior attending physician evaluating a
diagnostic debate between two residents/fellows.

Evaluate each physician on:

1. CLINICAL REASONING ({weight_reasoning}%)
   - Is the pathophysiology sound?
   - Does the mechanism explain the symptoms?
   - Are they thinking systematically?

2. EVIDENCE QUALITY ({weight_evidence}%)
   - Are findings specific to their diagnosis?
   - Did they acknowledge contradicting findings?
   - Is the evidence strong enough to support confidence?

3. DIFFERENTIAL BREADTH ({weight_breadth}%)
   - Did they consider alternatives?
   - Are they aware of diagnostic mimics?

4. SAFETY AWARENESS ({weight_safety}%)
   - Did they mention can't-miss diagnoses?
   - Would their approach catch life-threatening conditions?

Also identify:
- Teaching points from this case
- Diagnoses NEITHER physician considered
- Who made the stronger case overall

Score each physician 0-10 on each criterion."""),
            ("human", """CASE:
{case_summary}

=== PHYSICIAN A (Primary Diagnosis) ===
Diagnosis: {dx_a}
Supporting: {support_a}
Pathophysiology: {patho_a}
Critiques of B: {crit_a}
Rebuttals: {reb_a}

=== PHYSICIAN B (Alternative Diagnosis) ===
Diagnosis: {dx_b}
Supporting: {support_b}
Pathophysiology: {patho_b}
Critiques of A: {crit_b}
Rebuttals: {reb_b}

Evaluate this diagnostic debate.""")
        ])

    async def evaluate(
        self,
        clinical: ClinicalPresentation,
        arg_a: DiagnosticArgument,
        arg_b: DiagnosticArgument,
        crit_a: CritiqueSet,
        crit_b: CritiqueSet,
        reb_a: RebuttalSet,
        reb_b: RebuttalSet
    ) -> AttendingVerdict:
        """Evaluate the diagnostic debate."""

        case_summary = (
            f"{clinical.age}yo {clinical.sex} with {clinical.chief_complaint}. "
            f"PMH: {', '.join(clinical.past_medical_history) or 'None'}. "
            f"Acuity: {clinical.acuity.value}."
        )

        chain = self.prompt | self.llm
        result = await chain.ainvoke({
            "weight_reasoning": int(settings.weight_clinical_reasoning * 100),
            "weight_evidence": int(settings.weight_evidence_quality * 100),
            "weight_breadth": int(settings.weight_differential_breadth * 100),
            "weight_safety": int(settings.weight_safety_awareness * 100),
            "case_summary": case_summary,
            "dx_a": arg_a.primary_hypothesis.diagnosis,
            "support_a": ", ".join(arg_a.primary_hypothesis.supporting_findings),
            "patho_a": arg_a.primary_hypothesis.pathophysiology,
            "crit_a": "\n".join([c.weakness for c in crit_a.critiques]),
            "reb_a": "\n".join([r.defense for r in reb_a.rebuttals]),
            "dx_b": arg_b.primary_hypothesis.diagnosis,
            "support_b": ", ".join(arg_b.primary_hypothesis.supporting_findings),
            "patho_b": arg_b.primary_hypothesis.pathophysiology,
            "crit_b": "\n".join([c.weakness for c in crit_b.critiques]),
            "reb_b": "\n".join([r.defense for r in reb_b.rebuttals])
        })

        return result

Differential Synthesizer

# src/agents/synthesizer.py
from langchain_openai import ChatOpenAI
from langchain_core.prompts import ChatPromptTemplate
from ..models.clinical import ClinicalPresentation
from ..models.arguments import DiagnosticArgument
from ..models.scoring import AttendingVerdict, SynthesizedDifferential
from ..config import settings

class DifferentialSynthesizer:
    """Produces final synthesized differential from the debate."""

    def __init__(self):
        self.llm = ChatOpenAI(
            model=settings.openai_model,
            api_key=settings.openai_api_key,
            temperature=0.2
        ).with_structured_output(SynthesizedDifferential)

        self.prompt = ChatPromptTemplate.from_messages([
            ("system", """You are synthesizing a final differential diagnosis
from a diagnostic debate.

Produce a ranked differential that:
1. Lists diagnoses by probability (high, moderate, low)
2. Highlights ALL "can't miss" diagnoses regardless of probability
3. Notes key supporting and contradicting findings for each
4. Recommends specific workup for each diagnosis
5. Provides admission/disposition recommendation
6. Acknowledges remaining uncertainty

The differential should reflect BOTH physicians' arguments,
not just the winner. Even if one diagnosis is more likely,
the alternative may need to be ruled out.

Format as a teaching case with clinical pearls."""),
            ("human", """Case: {case_summary}

Debate outcome: {outcome}

Physician A argued: {dx_a}
Physician B argued: {dx_b}

Attending noted:
- Teaching points: {teaching}
- Missed diagnoses: {missed}

Synthesize the final differential and workup plan.""")
        ])

    async def synthesize(
        self,
        clinical: ClinicalPresentation,
        arg_a: DiagnosticArgument,
        arg_b: DiagnosticArgument,
        verdict: AttendingVerdict
    ) -> SynthesizedDifferential:
        """Produce final differential diagnosis."""

        case_summary = (
            f"{clinical.age}yo {clinical.sex} presenting with "
            f"{clinical.chief_complaint}. Acuity: {clinical.acuity.value}."
        )

        outcome = (
            f"{verdict.stronger_case.value} made stronger case "
            f"({verdict.margin} margin)"
        )

        chain = self.prompt | self.llm
        result = await chain.ainvoke({
            "case_summary": case_summary,
            "outcome": outcome,
            "dx_a": f"{arg_a.primary_hypothesis.diagnosis} "
                    f"(confidence: {arg_a.primary_hypothesis.confidence})",
            "dx_b": f"{arg_b.primary_hypothesis.diagnosis} "
                    f"(confidence: {arg_b.primary_hypothesis.confidence})",
            "teaching": ", ".join(verdict.teaching_points),
            "missed": ", ".join(verdict.missed_diagnoses) or "None noted"
        })

        return result

LangGraph Workflow

# src/workflow/debate.py
from langgraph.graph import StateGraph, END
from ..models.state import DiagnosticDebateState, DiagnosticPhase
from ..agents.parser import ClinicalParser
from ..agents.physician_a import PhysicianAAgent
from ..agents.physician_b import PhysicianBAgent
from ..agents.attending import AttendingPhysician
from ..agents.synthesizer import DifferentialSynthesizer

# Initialize agents
parser = ClinicalParser()
physician_a = PhysicianAAgent()
physician_b = PhysicianBAgent()
attending = AttendingPhysician()
synthesizer = DifferentialSynthesizer()


async def parse_clinical_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Parse the clinical presentation."""
    clinical = await parser.parse(state["raw_presentation"])
    return {
        **state,
        "clinical": clinical,
        "phase": DiagnosticPhase.HYPOTHESIS,
        "current_round": 1,
        "reasoning_trace": [f"Parsed: {clinical.age}yo {clinical.sex}, {clinical.chief_complaint}"]
    }


async def physician_a_hypothesis_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician A generates primary hypothesis."""
    argument = await physician_a.generate_hypothesis(state["clinical"])
    return {
        "physician_a_argument": argument,
        "reasoning_trace": [f"Physician A: {argument.primary_hypothesis.diagnosis}"]
    }


async def physician_b_hypothesis_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician B generates alternative hypothesis."""
    argument = await physician_b.generate_hypothesis(state["clinical"])
    return {
        "physician_b_argument": argument,
        "reasoning_trace": [f"Physician B: {argument.primary_hypothesis.diagnosis}"]
    }


async def advance_to_critique_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Advance to critique phase."""
    return {
        **state,
        "phase": DiagnosticPhase.CRITIQUE,
        "current_round": 2,
        "reasoning_trace": ["Advancing to critique round"]
    }


async def physician_a_critique_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician A critiques Physician B."""
    critiques = await physician_a.generate_critiques(
        state["clinical"],
        state["physician_b_argument"]
    )
    return {
        "physician_a_critiques": critiques,
        "reasoning_trace": [f"Physician A critiqued B with {len(critiques.critiques)} points"]
    }


async def physician_b_critique_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician B critiques Physician A."""
    critiques = await physician_b.generate_critiques(
        state["clinical"],
        state["physician_a_argument"]
    )
    return {
        "physician_b_critiques": critiques,
        "reasoning_trace": [f"Physician B critiqued A with {len(critiques.critiques)} points"]
    }


async def advance_to_rebuttal_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Advance to rebuttal phase."""
    return {
        **state,
        "phase": DiagnosticPhase.REBUTTAL,
        "current_round": 3,
        "reasoning_trace": ["Advancing to rebuttal round"]
    }


async def physician_a_rebuttal_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician A rebuts B's critiques."""
    rebuttals = await physician_a.generate_rebuttals(
        state["clinical"],
        state["physician_a_argument"],
        state["physician_b_critiques"]
    )
    return {
        "physician_a_rebuttals": rebuttals,
        "reasoning_trace": ["Physician A defended diagnosis"]
    }


async def physician_b_rebuttal_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Physician B rebuts A's critiques."""
    rebuttals = await physician_b.generate_rebuttals(
        state["clinical"],
        state["physician_b_argument"],
        state["physician_a_critiques"]
    )
    return {
        "physician_b_rebuttals": rebuttals,
        "reasoning_trace": ["Physician B defended diagnosis"]
    }


async def attending_review_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Attending evaluates the debate."""
    verdict = await attending.evaluate(
        state["clinical"],
        state["physician_a_argument"],
        state["physician_b_argument"],
        state["physician_a_critiques"],
        state["physician_b_critiques"],
        state["physician_a_rebuttals"],
        state["physician_b_rebuttals"]
    )
    return {
        **state,
        "attending_verdict": verdict,
        "phase": DiagnosticPhase.SYNTHESIS,
        "reasoning_trace": [f"Attending: {verdict.stronger_case.value} stronger ({verdict.margin})"]
    }


async def synthesize_node(state: DiagnosticDebateState) -> DiagnosticDebateState:
    """Synthesize final differential."""
    differential = await synthesizer.synthesize(
        state["clinical"],
        state["physician_a_argument"],
        state["physician_b_argument"],
        state["attending_verdict"]
    )
    return {
        **state,
        "final_differential": differential,
        "phase": DiagnosticPhase.COMPLETE,
        "reasoning_trace": [f"Final DDx: {len(differential.differential)} diagnoses"]
    }


def create_diagnostic_debate_workflow() -> StateGraph:
    """Create the diagnostic debate workflow."""
    workflow = StateGraph(DiagnosticDebateState)

    # Add nodes
    workflow.add_node("parse_clinical", parse_clinical_node)
    workflow.add_node("physician_a_hypothesis", physician_a_hypothesis_node)
    workflow.add_node("physician_b_hypothesis", physician_b_hypothesis_node)
    workflow.add_node("advance_critique", advance_to_critique_node)
    workflow.add_node("physician_a_critique", physician_a_critique_node)
    workflow.add_node("physician_b_critique", physician_b_critique_node)
    workflow.add_node("advance_rebuttal", advance_to_rebuttal_node)
    workflow.add_node("physician_a_rebuttal", physician_a_rebuttal_node)
    workflow.add_node("physician_b_rebuttal", physician_b_rebuttal_node)
    workflow.add_node("attending_review", attending_review_node)
    workflow.add_node("synthesize", synthesize_node)

    # Entry point
    workflow.set_entry_point("parse_clinical")

    # Round 1: Hypotheses (parallel)
    workflow.add_edge("parse_clinical", "physician_a_hypothesis")
    workflow.add_edge("parse_clinical", "physician_b_hypothesis")

    workflow.add_edge("physician_a_hypothesis", "advance_critique")
    workflow.add_edge("physician_b_hypothesis", "advance_critique")

    # Round 2: Critiques (parallel)
    workflow.add_edge("advance_critique", "physician_a_critique")
    workflow.add_edge("advance_critique", "physician_b_critique")

    workflow.add_edge("physician_a_critique", "advance_rebuttal")
    workflow.add_edge("physician_b_critique", "advance_rebuttal")

    # Round 3: Rebuttals (parallel)
    workflow.add_edge("advance_rebuttal", "physician_a_rebuttal")
    workflow.add_edge("advance_rebuttal", "physician_b_rebuttal")

    workflow.add_edge("physician_a_rebuttal", "attending_review")
    workflow.add_edge("physician_b_rebuttal", "attending_review")

    # Evaluation and synthesis
    workflow.add_edge("attending_review", "synthesize")
    workflow.add_edge("synthesize", END)

    return workflow.compile()


diagnostic_debate_agent = create_diagnostic_debate_workflow()

FastAPI Application

# src/api/main.py
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel
from typing import List, Optional

from ..workflow.debate import diagnostic_debate_agent, DiagnosticDebateState
from ..models.state import DiagnosticPhase

app = FastAPI(
    title="Differential Diagnosis Debate",
    description="Adversarial diagnostic reasoning for clinical decision support",
    version="1.0.0"
)


class DiagnosticRequest(BaseModel):
    presentation: str


class DifferentialItem(BaseModel):
    rank: int
    diagnosis: str
    probability: str
    supporting_findings: List[str]
    contradicting_findings: List[str]
    is_cant_miss: bool
    recommended_tests: List[str]


class DiagnosticResponse(BaseModel):
    patient_summary: str
    physician_a_diagnosis: str
    physician_b_diagnosis: str
    winner: str
    margin: str
    differential: List[DifferentialItem]
    cant_miss_diagnoses: List[str]
    immediate_workup: List[str]
    admission_recommendation: str
    clinical_pearls: List[str]
    teaching_points: List[str]
    reasoning_trace: List[str]


@app.post("/diagnose", response_model=DiagnosticResponse)
async def run_diagnostic_debate(request: DiagnosticRequest):
    """Run a diagnostic debate on a clinical presentation."""
    initial_state: DiagnosticDebateState = {
        "raw_presentation": request.presentation,
        "clinical": None,
        "phase": DiagnosticPhase.PARSING,
        "current_round": 0,
        "physician_a_argument": None,
        "physician_b_argument": None,
        "physician_a_critiques": None,
        "physician_b_critiques": None,
        "physician_a_rebuttals": None,
        "physician_b_rebuttals": None,
        "attending_verdict": None,
        "final_differential": None,
        "reasoning_trace": []
    }

    try:
        result = await diagnostic_debate_agent.ainvoke(initial_state)

        if not result.get("final_differential"):
            raise HTTPException(
                status_code=500,
                detail="Diagnostic debate did not produce a differential"
            )

        diff = result["final_differential"]
        verdict = result["attending_verdict"]
        arg_a = result["physician_a_argument"]
        arg_b = result["physician_b_argument"]

        return DiagnosticResponse(
            patient_summary=diff.patient_summary,
            physician_a_diagnosis=arg_a.primary_hypothesis.diagnosis,
            physician_b_diagnosis=arg_b.primary_hypothesis.diagnosis,
            winner=verdict.stronger_case.value,
            margin=verdict.margin,
            differential=[
                DifferentialItem(
                    rank=item.rank,
                    diagnosis=item.diagnosis,
                    probability=item.probability,
                    supporting_findings=item.key_supporting_findings,
                    contradicting_findings=item.key_contradicting_findings,
                    is_cant_miss=item.is_cant_miss,
                    recommended_tests=item.recommended_tests
                )
                for item in diff.differential
            ],
            cant_miss_diagnoses=diff.cant_miss_diagnoses,
            immediate_workup=diff.immediate_workup,
            admission_recommendation=diff.admission_recommendation,
            clinical_pearls=diff.clinical_pearls,
            teaching_points=verdict.teaching_points,
            reasoning_trace=result.get("reasoning_trace", [])
        )
    except Exception as e:
        raise HTTPException(
            status_code=500,
            detail=f"Diagnostic debate failed: {str(e)}"
        )


@app.get("/health")
async def health():
    return {"status": "healthy", "service": "diagnostic-debate"}

Example Usage

curl -X POST http://localhost:8000/diagnose \
  -H "Content-Type: application/json" \
  -d '{
    "presentation": "55 year old male with sudden onset chest pain radiating to the left arm, diaphoresis, and nausea. History of hypertension and diabetes. Vitals: BP 160/100, HR 95, SpO2 97%. ECG shows no acute ST changes."
  }'

Key Learnings

Alternative diagnoses prevent anchoring - Requiring a second physician to argue for a DIFFERENT diagnosis forces consideration of alternatives that might be missed.
Can't-miss framing improves safety - Explicitly asking whether a diagnosis is "can't miss" (life-threatening if missed) ensures dangerous conditions are considered even if less likely.
Concessions build trust - When physicians acknowledge valid points from the other side, the final differential is more calibrated and trustworthy.
Teaching points emerge naturally - The attending's evaluation produces learning points that wouldn't surface from a single-perspective analysis.

Key Concepts Recap

Concept	What It Is	Why It Matters
Adversarial Diagnosis	Two physicians argue for different diagnoses	Prevents anchoring and premature closure
Can't-Miss Diagnoses	Life-threatening conditions to rule out	Safety-first even if low probability
OPQRST	Symptom characterization framework	Structured clinical data extraction
Diagnostic Scoring	Reasoning, evidence, breadth, safety	Multi-dimensional evaluation
Teaching Points	Learning insights from the case	Educational value from every case

Next Steps

Continue with:

Drug Interaction Arbitrator - Debate pattern for pharmacy
Tumor Board Simulator - Multi-expert debate (4+ specialists)

Differential Diagnosis Debate

On this page

Differential Diagnosis Debate

On this page