A Practical Blueprint for Multimodal, Safety-First Clinical AI

Most “clinical AI” demos collapse the moment you ask three questions:

1) What happens when the data is split across a report, an image, and messy notes?

2) How do you prevent unsafe recommendations (drug–drug interactions) without hallucinating?

3) Can you audit every decision the system made, end to end?

AI Medical Swarm is my attempt to answer those questions with a system that’s deliberately boring in the right ways: deterministic safety gates, traceable artifacts, and an architecture you can plausibly deploy on AWS.


Multimodal intake: treating each modality like an expert witness

This stage demonstrates multimodal extraction + explainability + structured outputs.

PDF Agent (BioClinicalBERT): converts free-text reports into typed clinical fields (diagnosis, findings, confidence) suitable for validation and logging.
X-ray Agent (TorchXRayVision + Grad-CAM): GPU-backed multi-label inference with heatmaps to make predictions reviewable.
Notes Agent (SBERT/BioClinicalBERT embeddings): normalizes messy symptoms into retrieval-ready signals (risk factors, severity cues).
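
To make "typed clinical fields" concrete, here is a minimal sketch of a record that any of the three agents could emit before validation and logging. The class name `ModalityFinding`, its fields, and the assumption that confidence is a probability in [0, 1] are all illustrative; the actual schema may differ.

```python
from dataclasses import dataclass, asdict
import json


@dataclass(frozen=True)
class ModalityFinding:
    """One agent's structured output (hypothetical schema, for illustration only)."""
    modality: str         # assumed to be one of "pdf", "xray", "notes"
    diagnosis: str        # label extracted or predicted by the agent
    confidence: float     # assumed to be a probability in [0, 1]
    findings: tuple = ()  # supporting observations, e.g. report phrases

    def __post_init__(self):
        # Reject malformed records at the boundary so downstream stages can trust them.
        if self.modality not in {"pdf", "xray", "notes"}:
            raise ValueError(f"unknown modality: {self.modality!r}")
        if not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be in [0, 1]")

    def to_json(self) -> str:
        # Stable key order keeps logged artifacts easy to diff.
        return json.dumps(asdict(self), sort_keys=True)
```

Because the record validates itself on construction, a bad confidence value fails at intake rather than surfacing later in fusion.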

Fusion + triage: confidence is a first-class output

This stage demonstrates decision fusion under uncertainty.

Confidence-weighted fusion merges modality outputs without “averaging the vibes.”
Conflict detection + conservative routing produces an explicit decision: proceed, request more input, or flag for review.
Output is a single working diagnosis + confidence + rationale, not just text.
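
One way the fusion and routing rules above could fit together is sketched below. The scoring formula (share of summed confidence multiplied by the mean confidence of the supporting modalities) and the thresholds `proceed_at` and `conflict_margin` are assumptions chosen for illustration, not the project's tuned values.

```python
from collections import defaultdict


def fuse(findings, proceed_at=0.6, conflict_margin=0.2):
    """Fuse (modality, diagnosis, confidence) tuples into one routed decision.

    Thresholds and formula are illustrative assumptions.
    """
    score = defaultdict(float)  # summed confidence per candidate diagnosis
    votes = defaultdict(int)    # number of modalities backing each diagnosis
    for _modality, dx, conf in findings:
        score[dx] += conf
        votes[dx] += 1
    total = sum(score.values())
    if total == 0:
        return {"decision": "request_more_input", "diagnosis": None,
                "confidence": 0.0, "rationale": "no usable findings"}

    ranked = sorted(score, key=score.get, reverse=True)
    top = ranked[0]
    share = score[top] / total                      # agreement across modalities
    runner_share = score[ranked[1]] / total if len(ranked) > 1 else 0.0
    strength = score[top] / votes[top]              # mean confidence of supporters
    confidence = round(share * strength, 3)

    if share - runner_share < conflict_margin:
        decision = "flag_for_review"                # modalities disagree too closely
    elif confidence >= proceed_at:
        decision = "proceed"
    else:
        decision = "request_more_input"
    rationale = (f"{votes[top]}/{len(findings)} modalities support {top!r}; "
                 f"share={share:.2f}")
    return {"decision": decision, "diagnosis": top,
            "confidence": confidence, "rationale": rationale}
```

Two modalities that nearly tie on different diagnoses get flagged for review instead of being averaged into a misleading middle value, and a single weak signal triggers a request for more input.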

Treatment planning: protocol-first beats freeform text

This stage demonstrates constraint-aware generation.

Produces a structured plan (drug, dose, duration, monitoring, follow-ups) mapped to ICD-10.
Keeps an internal representation that is testable (unit tests, regression suites) and easy to audit.
The “nice narrative” is generated only at the end (report composer).
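
A rough sketch of the protocol-first idea: the plan lives as typed data that can be validated and unit-tested, and prose is rendered from it only as the last step. The class names, field names, and validation rules are hypothetical, and the drug in the usage note is a placeholder rather than a clinical recommendation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlanItem:
    drug: str
    dose_mg: float
    frequency_per_day: int
    duration_days: int
    monitoring: tuple = ()


@dataclass(frozen=True)
class TreatmentPlan:
    icd10: str            # ICD-10 code the plan is mapped to
    items: tuple          # tuple of PlanItem
    follow_up_days: int


def validate(plan):
    """Return a list of problems; an empty list means the plan is well-formed."""
    problems = []
    if not plan.icd10:
        problems.append("missing ICD-10 code")
    if not plan.items:
        problems.append("plan has no items")
    for item in plan.items:
        if item.dose_mg <= 0:
            problems.append(f"{item.drug}: dose must be positive")
        if item.frequency_per_day < 1 or item.duration_days < 1:
            problems.append(f"{item.drug}: frequency and duration must be >= 1")
    return problems


def compose_narrative(plan):
    """Render the readable text from the structured plan (the final step)."""
    lines = [f"Plan for ICD-10 {plan.icd10}:"]
    for item in plan.items:
        lines.append(f"- {item.drug} {item.dose_mg:g} mg, "
                     f"{item.frequency_per_day}x/day for {item.duration_days} days")
    lines.append(f"Follow-up in {plan.follow_up_days} days.")
    return "\n".join(lines)
```

For example, `TreatmentPlan("J18.9", (PlanItem("drug_a", 500, 3, 5),), 7)` passes `validate` with no problems, and the narrative is derived from the same object that the tests check.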

Drug safety (DDI): the non-negotiable gate

This stage demonstrates deterministic safety engineering.

Neo4j DDI/KG queries return severity/mechanism deterministically (no hallucinated interactions).
A safety critic combines DDI severity, diagnosis confidence, and risk flags into pass/warn/block decisions.
Safety decisions emit machine-readable artifacts suitable for compliance review.
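
The gate logic can be sketched without a database. Below, an in-memory table stands in for the Neo4j lookup, and the drug names, severity labels, mechanisms, and the 0.6 confidence cutoff are placeholders I am assuming for illustration. The point is only that the same inputs always produce the same decision.

```python
from itertools import combinations

# Stand-in for the graph query; a real deployment would ask Neo4j instead.
# All entries are placeholders, not real pharmacology.
DDI_TABLE = {
    frozenset({"drug_a", "drug_b"}): ("major", "placeholder mechanism 1"),
    frozenset({"drug_a", "drug_c"}): ("moderate", "placeholder mechanism 2"),
}
SEVERITY_RANK = {"none": 0, "minor": 1, "moderate": 2, "major": 3}


def check_ddi(drugs):
    """Look up every unordered pair; no model is involved, so nothing is invented."""
    hits = []
    for a, b in combinations(sorted(set(drugs)), 2):
        hit = DDI_TABLE.get(frozenset({a, b}))
        if hit:
            hits.append({"pair": [a, b], "severity": hit[0], "mechanism": hit[1]})
    return hits


def safety_critic(drugs, diagnosis_confidence, risk_flags=(), min_confidence=0.6):
    """Combine DDI severity, diagnosis confidence, and risk flags into a decision."""
    hits = check_ddi(drugs)
    worst = max((h["severity"] for h in hits),
                key=SEVERITY_RANK.__getitem__, default="none")
    if worst == "major":
        decision = "block"
    elif worst == "moderate" or diagnosis_confidence < min_confidence or risk_flags:
        decision = "warn"
    else:
        decision = "pass"
    # The returned dict is the machine-readable artifact for later review.
    return {"decision": decision, "worst_severity": worst, "interactions": hits,
            "diagnosis_confidence": diagnosis_confidence,
            "risk_flags": list(risk_flags)}
```

Since the returned dict carries the interacting pairs and the inputs that drove the decision, it can be logged as-is.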

Alternatives: safe substitutes require more than a nearest-neighbor lookup

This stage demonstrates hybrid retrieval (semantic + graph) and re-validation.

FAISS semantic similarity proposes indication-aligned alternatives.
Graph similarity (Neo4j neighborhood / GraphSAGE-style) captures structural equivalence beyond embeddings.
Every candidate is re-run through the same DDI gate before recommendation.
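
The three steps above can be sketched as a single ranking function. To keep the example dependency-free, brute-force cosine similarity stands in for FAISS and Jaccard overlap of graph neighbors stands in for a GraphSAGE-style similarity; both substitutions, the blending weight `alpha`, and the `is_safe` callback (representing the DDI gate) are assumptions for illustration.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def propose_alternatives(blocked, embeddings, neighbors, current_meds,
                         is_safe, alpha=0.5, top_k=3):
    """Rank substitutes for `blocked`, keeping only those that pass the gate.

    embeddings: drug -> vector (semantic signal)
    neighbors:  drug -> set of graph neighbors (structural signal)
    is_safe:    hypothetical callable(list_of_drugs) -> bool, i.e. the DDI gate
    """
    scored = []
    for drug, vec in embeddings.items():
        if drug == blocked:
            continue
        semantic = cosine(embeddings[blocked], vec)
        structural = jaccard(neighbors.get(blocked, set()), neighbors.get(drug, set()))
        scored.append((alpha * semantic + (1 - alpha) * structural, drug))
    scored.sort(reverse=True)
    # Re-validation: similarity alone never makes a candidate recommendable.
    safe = [(drug, round(score, 3)) for score, drug in scored
            if is_safe(list(current_meds) + [drug])]
    return safe[:top_k]
```

The filter runs after ranking, so the most similar candidate is dropped if it fails the gate, however high it scored.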

Evidence grounding: citations, not vibes

This stage demonstrates RAG with quality control.

Retrieves PubMed evidence and produces citation-backed claims.
Summarizes into clinician-readable snippets (BART-style) and filters with a relevance scorer (SBERT-style).
Outputs include citations + relevance scores for auditability.
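
A small sketch of the quality-control step: each retrieved passage is scored against the query, weak matches are dropped, and whatever survives keeps its citation and score. The token-overlap scorer is a deliberately crude stand-in for an SBERT-style model, and the function names, the passage dict layout, and the `min_relevance` threshold are assumptions.

```python
def token_overlap(query, text):
    """Crude relevance stand-in: fraction of query tokens found in the text."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def ground_claims(query, passages, score=token_overlap, min_relevance=0.5):
    """Keep passages that clear the threshold, each with its citation and score.

    passages: list of dicts with 'citation' and 'text' keys (assumed layout).
    """
    kept = []
    for passage in passages:
        relevance = score(query, passage["text"])
        if relevance >= min_relevance:
            kept.append({"citation": passage["citation"],
                         "snippet": passage["text"],
                         "relevance": round(relevance, 3)})
    kept.sort(key=lambda entry: entry["relevance"], reverse=True)
    return kept
```

Swapping in a real embedding model only changes the `score` argument; the output shape, and therefore the audit record, stays the same.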

Orchestration: making the pipeline explicit

This stage demonstrates agentic orchestration + observability.

State-machine orchestration (LangGraph-like) makes transitions explicit and debuggable.
Each agent has a single responsibility; routing logic is visible (not buried in prompts).
Produces a traceable run timeline (stage outputs + confidence + safety flags).
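
To show what "routing logic is visible" can mean in practice, here is a minimal state machine in plain Python. It does not use LangGraph's API; the stage names, outcome labels, and transition table are hypothetical, and the handlers are stubs.

```python
TERMINAL = {"done", "human_review"}

# Every allowed transition is listed here, so routing can be read and tested
# without opening any prompt. Stage and outcome names are illustrative.
TRANSITIONS = {
    ("intake", "ok"): "fusion",
    ("fusion", "proceed"): "safety",
    ("fusion", "low_confidence"): "human_review",
    ("safety", "pass"): "done",
    ("safety", "block"): "human_review",
}


def intake(state):
    return "ok"


def fusion(state):
    return "proceed" if state["confidence"] >= 0.6 else "low_confidence"


def safety(state):
    return "block" if state["ddi_severity"] == "major" else "pass"


HANDLERS = {"intake": intake, "fusion": fusion, "safety": safety}


def run(state, start="intake", max_steps=20):
    """Walk the state machine and return the timeline of stages and outcomes."""
    timeline, stage = [], start
    for _ in range(max_steps):
        if stage in TERMINAL:
            timeline.append({"stage": stage, "outcome": "terminal"})
            return timeline
        outcome = HANDLERS[stage](state)
        timeline.append({"stage": stage, "outcome": outcome})
        # An unknown (stage, outcome) pair raises KeyError instead of guessing.
        stage = TRANSITIONS[(stage, outcome)]
    raise RuntimeError("max_steps exceeded; possible routing loop")
```

The returned timeline doubles as the run trace: one entry per stage, in execution order.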

The production piece that’s usually missing: ACP + MCP

A notebook can chain function calls. A production swarm needs two explicit boundaries to make coordination and tool use reliable.

ACP (Agent Communication Protocol): tasks + state references

ACP is the messaging contract for orchestrator ↔ agents ↔ services coordination.

The key design choice: agents exchange references (S3 URIs), not bulky payloads.
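
As a sketch of that design choice, the envelope below carries only identifiers and S3 URIs and refuses anything that looks like an inline payload. The class name `AcpTask`, its field names, and the bucket in the usage note are hypothetical; the real message contract may carry more fields.

```python
from dataclasses import dataclass, field, asdict
import json
import uuid


@dataclass(frozen=True)
class AcpTask:
    """Hypothetical task envelope: references to data, never the data itself."""
    task_type: str
    sender: str
    recipient: str
    trace_id: str          # propagated so every stage can be joined into one trace
    input_refs: tuple      # S3 URIs pointing at the artifacts to process
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __post_init__(self):
        for ref in self.input_refs:
            if not isinstance(ref, str) or not ref.startswith("s3://"):
                raise ValueError(f"input_refs must be S3 URIs, got {ref!r}")

    def to_wire(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)
```

A task such as `AcpTask("xray_inference", "orchestrator", "xray-agent", "trace-1", ("s3://example-bucket/runs/r1/xray.png",))` serializes to a few hundred bytes regardless of image size, which keeps messages small and leaves the image in storage rather than in the message bus.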

AWS mapping (reference deployment):

workflow + retries: AWS Step Functions
task fan-out: Amazon EventBridge (or Amazon MSK if you need ordered streaming with consumer-side replay)
agent workers: EKS/ECS services consuming tasks and emitting results

MCP Gateway: standardized tool calls (schema + policy + telemetry)

MCP is the boundary for agent → tool calls. Instead of every agent implementing its own integrations, all tool calls go through one gateway that enforces:

schema validation (typed inputs/outputs)
policy enforcement (PHI controls, tool allowlists, version pinning)
telemetry (latency, errors, cost per tool) with trace propagation

In AWS terms, the MCP Gateway is a small EKS service (or ECS Fargate) that fronts tools like:

SageMaker inference endpoints (vision/text)
Neo4j DDI/KG queries
vector retrieval
PubMed retrieval + summarization

One-line boundary rule: ACP moves tasks/results between agents; MCP moves tool calls from agents to services.
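
The gateway's three duties (schema validation, policy, telemetry) fit in a small in-process sketch. This is not the MCP SDK or wire protocol; `ToolGateway`, its method names, and the type-map schema format are assumptions made to illustrate the boundary.

```python
import time


class ToolGateway:
    """Illustrative in-process gateway; a real one would sit behind a network API."""

    def __init__(self, allowlist):
        self._allowlist = allowlist   # agent name -> set of permitted tool names
        self._tools = {}              # tool name -> (pinned version, schema, callable)
        self.telemetry = []           # one record per attempted tool execution

    def register(self, name, version, schema, fn):
        self._tools[name] = (version, schema, fn)

    def call(self, agent, name, args, trace_id):
        # Policy: an agent may only reach tools it has been granted.
        if name not in self._allowlist.get(agent, set()):
            raise PermissionError(f"{agent} may not call {name}")
        if name not in self._tools:
            raise KeyError(f"tool {name} is not registered")
        version, schema, fn = self._tools[name]
        # Schema: exact argument names, each with the declared type.
        if set(args) != set(schema):
            raise TypeError(f"{name} expects arguments {sorted(schema)}")
        for key, expected in schema.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"{name}.{key} must be {expected.__name__}")
        start, error = time.perf_counter(), None
        try:
            return fn(**args)
        except Exception as exc:
            error = type(exc).__name__
            raise
        finally:
            # Telemetry is recorded on success and on failure.
            self.telemetry.append({
                "trace_id": trace_id, "agent": agent, "tool": name,
                "version": version, "error": error,
                "latency_s": time.perf_counter() - start,
            })
```

Because agents never hold tool clients directly, adding a PHI filter or pinning a new tool version is a change in one place.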

The audit trail: turning “a pipeline” into “a system”

Every run produces structured artifacts that make the system testable and reviewable:

scan results (PDF + vision)
fusion decision + confidence
treatment plan + prescription proposal
DDI results + safety critic scores
alternatives + evidence + citations

This is what lets you build evaluation harnesses, regression tests, and human review workflows.
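
One possible shape for those artifacts is sketched below: each stage emits a JSON-serializable record, and each record stores the hash of the previous one so reviewers can detect edits or gaps. The hash chain is an extra assumption added here for illustration, as are the function and field names; the list above only requires that artifacts be structured.

```python
import hashlib
import json


def _digest(body):
    # Canonical JSON (sorted keys, fixed separators) gives a stable hash.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_artifact(run_id, stage, payload, prev_hash=""):
    """Build one audit record linked to the previous record's hash."""
    body = {"run_id": run_id, "stage": stage,
            "payload": payload, "prev_hash": prev_hash}
    return {**body, "hash": _digest(body)}


def verify_chain(artifacts):
    """Return True if no record was altered, reordered, or removed mid-chain."""
    prev = ""
    for artifact in artifacts:
        body = {k: v for k, v in artifact.items() if k != "hash"}
        if artifact["prev_hash"] != prev or _digest(body) != artifact["hash"]:
            return False
        prev = artifact["hash"]
    return True
```

A regression test can then replay a stored run and compare stage payloads, while `verify_chain` confirms the stored run itself is intact.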

What I’d do next (if this were going to production)

containerize agents + MCP gateway; deploy to EKS with the Horizontal Pod Autoscaler (HPA)
add offline evaluation harness + golden clinical test suites per stage
add drift and safety monitors (DDI distribution shifts, false positives)
add a human-in-the-loop review queue for low-confidence cases

If you’re building safety-critical AI (healthcare, finance, industrial), the pattern is consistent: hybrid systems, explicit safety gates, and operational traceability. That’s the point of this project.

Leo ooooo
