A Practical Blueprint for Multimodal, Safety-First Clinical AI
Most “clinical AI” demos collapse the moment you ask three questions:
1) What happens when the data is split across a report, an image, and messy notes?
2) How do you prevent unsafe recommendations (drug–drug interactions) without hallucinating?
3) Can you audit every decision the system made, end to end?
AI Medical Swarm is my attempt to answer those questions with a system that’s deliberately boring in the right ways: deterministic safety gates, traceable artifacts, and an architecture you can plausibly deploy on AWS.
Multimodal intake: treating each modality like an expert witness
This stage demonstrates multimodal extraction + explainability + structured outputs.
- PDF Agent (BioClinicalBERT): converts free-text reports into typed clinical fields (diagnosis, findings, confidence) suitable for validation and logging.
- X-ray Agent (TorchXRayVision + Grad-CAM): GPU-backed multi-label inference with heatmaps to make predictions reviewable.
- Notes Agent (SBERT/BioClinicalBERT embeddings): normalizes messy symptoms into retrieval-ready signals (risk factors, severity cues).
Fusion + triage: confidence is a first-class output
This stage demonstrates decision fusion under uncertainty.
- Confidence-weighted fusion merges modality outputs so that a high-confidence modality counts for more than a low-confidence one, instead of “averaging the vibes.”
- Conflict detection + conservative routing produces an explicit decision: proceed, request more input, or flag for review.
- Output is a single working diagnosis + confidence + rationale, not just text.
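To make the fusion and routing bullets above concrete, here is a minimal sketch of a confidence-weighted vote with conservative routing. It is an illustration under stated assumptions, not the project's actual fusion code: the `ModalityResult` type, the `fuse` function, and both thresholds are hypothetical names and values picked for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ModalityResult:
    modality: str      # e.g. "pdf", "xray", "notes"
    diagnosis: str
    confidence: float  # expected in [0, 1]


# Hypothetical thresholds, chosen only for illustration.
PROCEED_MIN = 0.50      # minimum fused confidence to proceed
CONFLICT_MARGIN = 0.15  # top two diagnoses closer than this count as a conflict


def fuse(results):
    """Confidence-weighted vote over diagnoses with conservative routing."""
    if not results:
        return {"decision": "request_more_input", "diagnosis": None,
                "confidence": 0.0, "rationale": "no modality produced output"}

    support = defaultdict(float)
    for r in results:
        support[r.diagnosis] += r.confidence

    # Divide by the number of modalities so disagreement lowers the fused score.
    ranked = sorted(((dx, s / len(results)) for dx, s in support.items()),
                    key=lambda pair: pair[1], reverse=True)
    top_dx, top_conf = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0

    if len(ranked) > 1 and top_conf - runner_up < CONFLICT_MARGIN:
        decision = "flag_for_review"
    elif top_conf < PROCEED_MIN:
        decision = "request_more_input"
    else:
        decision = "proceed"

    rationale = f"{top_dx}: fused {top_conf:.2f}, runner-up {runner_up:.2f}"
    return {"decision": decision, "diagnosis": top_dx,
            "confidence": round(top_conf, 3), "rationale": rationale}
```

The point of the sketch is that the routing decision is a plain function of numbers you can log and test, so a change in thresholds shows up as a diff rather than as a change in model behavior.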
Treatment planning: protocol-first beats freeform text
This stage demonstrates constraint-aware generation.
- Produces a structured plan (drug, dose, duration, monitoring, follow-ups) mapped to ICD-10.
- Keeps an internal representation that is testable (unit tests, regression suites) and easy to audit.
- The “nice narrative” is generated only at the end (report composer).
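As a sketch of what a "testable internal representation" could look like, the example below models a plan as frozen dataclasses with a validator that returns a list of problems. Everything here is assumed for illustration: the field names, the `validate` helper, and the ICD-10 regular expression, which only checks the general shape of a code (such as "J18.9") and does not consult the real code set. Drug names and doses are placeholders, not clinical guidance.

```python
import re
from dataclasses import dataclass

# Simplified shape check for an ICD-10 code (e.g. "J18.9"); this is not a
# lookup against the real code set.
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")


@dataclass(frozen=True)
class PlanItem:
    drug: str
    dose_mg: float
    doses_per_day: int
    duration_days: int
    monitoring: tuple = ()


@dataclass(frozen=True)
class TreatmentPlan:
    icd10: str
    items: tuple
    follow_ups: tuple = ()


def validate(plan):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    if not ICD10_SHAPE.match(plan.icd10):
        errors.append(f"icd10 {plan.icd10!r} does not look like an ICD-10 code")
    if not plan.items:
        errors.append("plan has no items")
    for i, item in enumerate(plan.items):
        if not item.drug.strip():
            errors.append(f"items[{i}].drug is empty")
        if item.dose_mg <= 0:
            errors.append(f"items[{i}].dose_mg must be positive")
        if item.doses_per_day < 1:
            errors.append(f"items[{i}].doses_per_day must be at least 1")
        if item.duration_days < 1:
            errors.append(f"items[{i}].duration_days must be at least 1")
    return errors


# Placeholder values only.
example_plan = TreatmentPlan(
    icd10="J18.9",
    items=(PlanItem("drug_a", 500.0, 2, 7, monitoring=("renal function",)),),
    follow_ups=("review in 7 days",),
)
```

Because the plan is data rather than prose, a regression suite can assert on `validate(example_plan)` directly, and the report composer only ever sees plans that already passed.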
Drug safety (DDI): the non-negotiable gate
This stage demonstrates deterministic safety engineering.
- Neo4j DDI/KG queries return severity/mechanism deterministically (no hallucinated interactions).
- A safety critic combines DDI severity, diagnosis confidence, and risk flags into pass/warn/block decisions.
- Safety decisions emit machine-readable artifacts suitable for compliance review.
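The gate described above can be sketched as a pure function over a lookup table. This is a simplified illustration, not the project's critic: an in-memory dictionary stands in for the Neo4j query, and the drug names, severity labels, mechanisms, confidence threshold, and function names are all placeholders invented for the example.

```python
import json
from itertools import combinations

# In-memory stand-in for the graph lookup. Drug names, severities, and
# mechanisms are placeholders, not clinical data.
DDI_TABLE = {
    frozenset({"drug_a", "drug_b"}): {"severity": "major",
                                      "mechanism": "placeholder mechanism 1"},
    frozenset({"drug_a", "drug_c"}): {"severity": "moderate",
                                      "mechanism": "placeholder mechanism 2"},
}

MIN_DX_CONFIDENCE = 0.60  # hypothetical threshold


def check_ddi(drugs):
    """Return every known interaction among the given drugs, in a stable order."""
    findings = []
    for a, b in combinations(sorted(set(drugs)), 2):
        hit = DDI_TABLE.get(frozenset({a, b}))
        if hit:
            findings.append({"pair": [a, b], **hit})
    return findings


def safety_critic(drugs, dx_confidence, risk_flags=()):
    """Combine DDI severity, diagnosis confidence, and risk flags into one decision."""
    findings = check_ddi(drugs)
    severities = {f["severity"] for f in findings}
    reasons = []
    if "major" in severities:
        decision = "block"
        reasons.append("major interaction present")
    else:
        if "moderate" in severities:
            reasons.append("moderate interaction present")
        if dx_confidence < MIN_DX_CONFIDENCE:
            reasons.append("low diagnosis confidence")
        if risk_flags:
            reasons.append("risk flags: " + ", ".join(sorted(risk_flags)))
        decision = "warn" if reasons else "pass"
    return {"decision": decision, "reasons": reasons,
            "ddi_findings": findings, "dx_confidence": dx_confidence}


def to_artifact(result):
    """Serialize with sorted keys so identical inputs give identical bytes."""
    return json.dumps(result, sort_keys=True)
```

The same inputs always yield the same decision and the same serialized artifact, which is the property a compliance reviewer needs.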
Alternatives: safe substitutes require more than a nearest-neighbor lookup
This stage demonstrates hybrid retrieval (semantic + graph) and re-validation.
- FAISS semantic similarity proposes indication-aligned alternatives.
- Graph similarity (Neo4j neighborhood / GraphSAGE-style) captures structural equivalence beyond embeddings.
- Every candidate is re-run through the same DDI gate before recommendation.
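The propose-then-revalidate loop above can be sketched in a few lines. To keep the example self-contained, a brute-force cosine ranking stands in for the FAISS index, the graph-similarity signal is left out, and the gate is a toy function; the embeddings, drug names, and blocked pair are made-up placeholders.

```python
import math


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def propose(target, embeddings, k):
    """Rank other drugs by embedding similarity to the target (brute force)."""
    scored = [(cosine(embeddings[target], vec), name)
              for name, vec in embeddings.items() if name != target]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:k]]


def safe_alternatives(target, current_meds, embeddings, gate, k=3):
    """Keep only candidates the gate does not block against any current medication."""
    kept = []
    for candidate in propose(target, embeddings, k):
        if all(gate(candidate, med) != "block" for med in current_meds):
            kept.append(candidate)
    return kept


# Toy data: placeholder drug names and made-up 2-D vectors.
EMBEDDINGS = {
    "drug_a": [1.0, 0.0],
    "drug_c": [0.9, 0.1],
    "drug_d": [0.8, 0.2],
    "drug_e": [0.0, 1.0],
}
BLOCKED_PAIRS = {frozenset({"drug_c", "drug_b"})}


def toy_gate(candidate, med):
    """Stand-in for the DDI gate: block only the listed pairs."""
    return "block" if frozenset({candidate, med}) in BLOCKED_PAIRS else "pass"
```

With this toy data, `drug_c` is the nearest neighbor of `drug_a` but is dropped for a patient already on `drug_b`, which is exactly the case a nearest-neighbor lookup alone would get wrong.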
Evidence grounding: citations, not vibes
This stage demonstrates RAG with quality control.
- Retrieves PubMed evidence and produces citation-backed claims.
- Summarizes into clinician-readable snippets (BART-style) and filters with a relevance scorer (SBERT-style).
- Outputs include citations + relevance scores for auditability.
Orchestration: making the pipeline explicit
This stage demonstrates agentic orchestration + observability.
- State-machine orchestration (LangGraph-like) makes transitions explicit and debuggable.
- Each agent has a single responsibility; routing logic is visible (not buried in prompts).
- Produces a traceable run timeline (stage outputs + confidence + safety flags).
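A hand-rolled state machine is enough to show the idea, without depending on any orchestration library. In the sketch below, each handler returns the next stage plus its output, and the runner records a timeline. The stage names, the stubbed handlers, and the 0.5 routing threshold are assumptions made for the example; a real run would call the agents instead of returning constants.

```python
def intake(state):
    return "fusion", {"modalities": ["pdf", "xray", "notes"]}


def fusion(state):
    # The fused confidence is supplied by the caller here; a real handler
    # would compute it from the modality outputs.
    confidence = state["fused_confidence"]
    next_stage = "plan" if confidence >= 0.5 else "review"
    return next_stage, {"confidence": confidence}


def plan(state):
    return "safety", {"plan_ready": True}


def safety(state):
    return "done", {"safety_flag": "pass"}


def review(state):
    return "done", {"needs_human": True}


HANDLERS = {"intake": intake, "fusion": fusion, "plan": plan,
            "safety": safety, "review": review}


def run(state, start="intake", max_steps=20):
    """Walk the state machine and return the final state plus a run timeline."""
    timeline = []
    current = start
    while current != "done":
        if len(timeline) >= max_steps:
            raise RuntimeError("orchestrator exceeded max_steps")
        next_stage, output = HANDLERS[current](state)
        state = {**state, **output}
        timeline.append({"stage": current, "output": output, "next": next_stage})
        current = next_stage
    return state, timeline
```

Routing lives in ordinary code (`fusion` chooses between `plan` and `review`), so a reviewer can read the transition table instead of reverse-engineering it from prompts.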

The production piece that’s usually missing: ACP + MCP
A notebook can chain function calls. A production swarm needs two explicit boundaries to make coordination and tool use reliable.
ACP (Agent Communication Protocol): tasks + state references
ACP is the messaging contract for orchestrator ↔ agents ↔ services coordination.
The key design choice: agents exchange references (S3 URIs), not bulky payloads.
AWS mapping (reference deployment):
- workflow + retries: AWS Step Functions
- task fan-out: Amazon EventBridge (or MSK if you need replay/streaming)
- agent workers: EKS/ECS services consuming tasks and emitting results
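A task envelope that carries references instead of payloads might look like the sketch below. The `TaskMessage` fields, the `is_s3_uri` helper, and the bucket name are hypothetical and chosen for illustration; the source does not specify the actual ACP schema.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from urllib.parse import urlparse


def is_s3_uri(value):
    """True for strings shaped like s3://bucket/key."""
    parsed = urlparse(value)
    return parsed.scheme == "s3" and bool(parsed.netloc) and len(parsed.path) > 1


@dataclass(frozen=True)
class TaskMessage:
    run_id: str
    stage: str
    input_refs: tuple  # S3 URIs pointing at stored artifacts, never raw payloads
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)

    def __post_init__(self):
        bad = [ref for ref in self.input_refs if not is_s3_uri(ref)]
        if bad:
            raise ValueError(f"input_refs must be S3 URIs, got: {bad}")

    def to_json(self):
        return json.dumps(asdict(self), sort_keys=True)


# Hypothetical bucket and key.
example_task = TaskMessage(
    run_id="run-001",
    stage="xray",
    input_refs=("s3://example-bucket/runs/run-001/xray.png",),
)
```

Keeping messages this small means retries and fan-out stay cheap, and access to the underlying data is governed by storage permissions rather than by whoever can read the queue.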
MCP Gateway: standardized tool calls (schema + policy + telemetry)
MCP is the boundary for agent → tool calls. Instead of every agent implementing its own integrations, all tool calls go through one gateway that enforces:
- schema validation (typed inputs/outputs)
- policy enforcement (PHI controls, tool allowlists, version pinning)
- telemetry (latency, errors, cost per tool) with trace propagation
In AWS terms, the MCP Gateway is a small EKS service (or ECS Fargate) that fronts tools like:
- SageMaker inference endpoints (vision/text)
- Neo4j DDI/KG queries
- vector retrieval
- PubMed retrieval + summarization
One-line boundary rule: ACP moves tasks/results between agents; MCP moves tool calls from agents to services.
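The three gateway duties (schema validation, policy enforcement, telemetry) fit in one small function. The sketch below is an in-process illustration only, not the MCP specification or the project's gateway: the `TOOLS` registry, the `call_tool` signature, the exception names, and the stubbed handler are all assumptions.

```python
import time

# Hypothetical registry: tool name -> pinned version, input schema,
# allowed agents, and a handler (stubbed here in place of a real service call).
TOOLS = {
    "ddi_lookup": {
        "version": "1.0.0",
        "schema": {"drug_a": str, "drug_b": str},
        "allowed_agents": {"safety_critic"},
        "handler": lambda args: {"severity": "none"},
    },
}

TELEMETRY = []  # one record per executed tool call


class PolicyError(Exception):
    pass


class SchemaError(Exception):
    pass


def call_tool(agent, tool, args, trace_id):
    """Enforce allowlist and schema, run the tool, and record telemetry."""
    spec = TOOLS.get(tool)
    if spec is None or agent not in spec["allowed_agents"]:
        raise PolicyError(f"{agent!r} may not call {tool!r}")

    schema = spec["schema"]
    if set(args) != set(schema):
        raise SchemaError(f"expected fields {sorted(schema)}, got {sorted(args)}")
    for name, expected in schema.items():
        if not isinstance(args[name], expected):
            raise SchemaError(f"{name} must be {expected.__name__}")

    started = time.perf_counter()
    status = "error"
    try:
        result = spec["handler"](args)
        status = "ok"
        return result
    finally:
        TELEMETRY.append({
            "trace_id": trace_id, "agent": agent, "tool": tool,
            "version": spec["version"], "status": status,
            "latency_ms": (time.perf_counter() - started) * 1000,
        })
```

Because every call passes through `call_tool`, rejected calls never reach a tool, and the `trace_id` from the ACP task can be carried into each telemetry record.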
The audit trail: turning “a pipeline” into “a system”
Every run produces structured artifacts that make the system testable and reviewable:
- scan results (PDF + vision)
- fusion decision + confidence
- treatment plan + prescription proposal
- DDI results + safety critic scores
- alternatives + evidence + citations
This is what lets you build evaluation harnesses, regression tests, and human review workflows.
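One way to make such artifacts reviewable is to store each one with a content hash over a canonical serialization, so later tampering or accidental edits are detectable. This is a sketch under assumptions: the artifact fields and the `make_artifact` and `verify` helpers are illustrative, and the source does not say how the project stores or signs its artifacts.

```python
import hashlib
import json


def canonical(payload):
    """Serialize with sorted keys and fixed separators so the bytes are stable."""
    return json.dumps(payload, sort_keys=True, separators=(",", ":"))


def make_artifact(run_id, stage, payload):
    digest = hashlib.sha256(canonical(payload).encode("utf-8")).hexdigest()
    return {"run_id": run_id, "stage": stage, "payload": payload, "sha256": digest}


def verify(artifact):
    """Recompute the hash and compare it with the stored one."""
    expected = hashlib.sha256(
        canonical(artifact["payload"]).encode("utf-8")).hexdigest()
    return artifact["sha256"] == expected


# Placeholder payload for illustration.
fusion_artifact = make_artifact(
    "run-001", "fusion",
    {"decision": "proceed", "diagnosis": "pneumonia", "confidence": 0.57},
)
```

A regression harness can then replay a run and compare hashes stage by stage, flagging the first stage whose output changed.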
What I’d do next (if this were going to production)
- containerize agents + MCP gateway; deploy to EKS with HPA
- add offline evaluation harness + golden clinical test suites per stage
- add drift and safety monitors (DDI distribution shifts, false positives)
- add a human-in-the-loop review queue for low-confidence cases
If you’re building safety-critical AI (healthcare, finance, industrial), the pattern is consistent: hybrid systems, explicit safety gates, and operational traceability. That’s the point of this project.
