Medical AI Agents Swarm: A Practical Blueprint for Multimodal, Safety-First Clinical AI

Most “clinical AI” demos collapse the moment you ask three questions:

  1. *What happens when the data is split across a report, an image, and messy notes?*

  2. *How do you catch unsafe recommendations (such as drug–drug interactions) without relying on a model that can hallucinate them?*

  3. *Can you audit every decision the system made, end to end?*

Medical AI Agents Swarm is my attempt to answer those questions with a system that’s deliberately boring in the right ways: deterministic safety gates, traceable artifacts, and an architecture you can plausibly deploy on AWS.
 

Disclaimer: Portfolio prototype. Not medical advice. Not for clinical use.

Multimodal intake: treating each modality like an expert witness

This stage demonstrates multimodal extraction + explainability + structured outputs.

  • PDF Agent (BioClinicalBERT): converts free-text reports into typed clinical fields (diagnosis, findings, confidence) suitable for validation and logging.
  • X-ray Agent (TorchXRayVision + Grad-CAM): GPU-backed multi-label inference with heatmaps to make predictions reviewable.
  • Notes Agent (SBERT/BioClinicalBERT embeddings): normalizes messy symptoms into retrieval-ready signals (risk factors, severity cues).

Fusion + triage: confidence is a first-class output

This stage demonstrates decision fusion under uncertainty.

  • Confidence-weighted fusion merges modality outputs without “averaging the vibes.”
  • Conflict detection + conservative routing produces an explicit decision: proceed, request more input, or flag for review.
  • Output is a single working diagnosis + confidence + rationale, not just text.
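
To make the routing above concrete, here is a minimal sketch of a confidence-weighted vote with conservative routing. It is an illustration under stated assumptions, not the project's actual fusion code: the `Finding` and `FusionDecision` types, the `fuse` function, and both thresholds (`proceed_at`, `conflict_margin`) are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Finding:
    modality: str      # e.g. "pdf", "xray", "notes"
    diagnosis: str
    confidence: float  # expected in [0, 1]


@dataclass(frozen=True)
class FusionDecision:
    action: str                # "proceed" | "request_more_input" | "flag_for_review"
    diagnosis: Optional[str]
    confidence: float
    rationale: str


def fuse(findings: List[Finding], proceed_at: float = 0.7,
         conflict_margin: float = 0.2) -> FusionDecision:
    """Confidence-weighted vote; thresholds are illustrative assumptions."""
    if not findings:
        return FusionDecision("request_more_input", None, 0.0, "no findings")

    # Each modality votes for its diagnosis with weight equal to its confidence.
    # Dividing by the number of modalities keeps every score in [0, 1].
    scores = {}
    for f in findings:
        scores[f.diagnosis] = scores.get(f.diagnosis, 0.0) + f.confidence / len(findings)

    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_dx, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0

    # Close competing diagnoses are treated as a conflict, not averaged away.
    if len(ranked) > 1 and top - runner_up < conflict_margin:
        return FusionDecision("flag_for_review", top_dx, top,
                              f"conflict: {ranked[0][0]} vs {ranked[1][0]}")
    if top < proceed_at:
        return FusionDecision("request_more_input", top_dx, top,
                              "fused confidence below threshold")
    return FusionDecision("proceed", top_dx, top, "modalities agree")
```

The design choice worth noting is that a single low-confidence modality cannot produce "proceed" on its own, and two modalities that disagree at similar confidence are routed to review instead of being merged.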

Treatment planning: protocol-first beats freeform text

This stage demonstrates constraint-aware generation.

  • Produces a structured plan (drug, dose, duration, monitoring, follow-ups) mapped to an ICD-10 diagnosis code.
  • Keeps an internal representation that is testable (unit tests, regression suites) and easy to audit.
  • The “nice narrative” is generated only at the end (report composer).
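
As a rough sketch of what "testable internal representation" can mean, the plan can be a typed structure with a validator that returns a list of errors. Everything here is an assumption for illustration: the field names, the `validate_plan` helper, and the ICD-10 pattern, which is only a loose shape check and not a lookup against the real code set.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Loose shape check only (letter, digit, alphanumeric, optional ".xxxx").
ICD10_SHAPE = re.compile(r"^[A-Z][0-9][0-9A-Z](\.[0-9A-Z]{1,4})?$")


@dataclass
class PlanItem:
    drug: str
    dose_mg: float
    doses_per_day: int
    duration_days: int
    monitoring: List[str] = field(default_factory=list)


@dataclass
class TreatmentPlan:
    icd10_code: str
    items: List[PlanItem]
    follow_ups: List[str] = field(default_factory=list)


def validate_plan(plan: TreatmentPlan) -> List[str]:
    """Return human-readable errors; an empty list means the plan is well-formed."""
    errors = []
    if not ICD10_SHAPE.match(plan.icd10_code):
        errors.append(f"icd10_code has unexpected shape: {plan.icd10_code!r}")
    if not plan.items:
        errors.append("plan has no items")
    for i, item in enumerate(plan.items):
        if not item.drug.strip():
            errors.append(f"items[{i}].drug is empty")
        if item.dose_mg <= 0:
            errors.append(f"items[{i}].dose_mg must be positive")
        if item.doses_per_day < 1:
            errors.append(f"items[{i}].doses_per_day must be at least 1")
        if item.duration_days < 1:
            errors.append(f"items[{i}].duration_days must be at least 1")
    return errors


# Placeholder drug name; "J18.9" is used purely as a sample code value.
plan = TreatmentPlan("J18.9", [PlanItem("drug_a", 500.0, 2, 5, ["renal function"])])
print(validate_plan(plan))
```

Because the validator returns data instead of raising, the same function can back unit tests, regression suites, and an audit log entry.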

Drug safety (DDI): the non-negotiable gate

This stage demonstrates deterministic safety engineering.

  • Neo4j DDI/KG queries return severity/mechanism deterministically (no hallucinated interactions).
  • A safety critic combines DDI severity, diagnosis confidence, and risk flags into pass/warn/block decisions.
  • Safety decisions emit machine-readable artifacts suitable for compliance review.
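
A minimal sketch of such a safety critic is below. The severity scale, the rule ordering, the `min_confidence` threshold, and the artifact fields are all assumptions made for illustration; the real gate would take its severities from the Neo4j query results rather than from a hard-coded list.

```python
import json
from enum import IntEnum
from typing import List


class Severity(IntEnum):
    NONE = 0
    MINOR = 1
    MODERATE = 2
    MAJOR = 3


def safety_critic(ddi_severities: List[Severity], diagnosis_confidence: float,
                  risk_flags: List[str], min_confidence: float = 0.6) -> dict:
    """Pure function of its inputs: the same inputs always give the same decision."""
    worst = max(ddi_severities, default=Severity.NONE)
    reasons = []

    if worst >= Severity.MAJOR:
        decision = "block"
        reasons.append("major interaction present")
    elif worst == Severity.MODERATE and risk_flags:
        decision = "block"
        reasons.append("moderate interaction combined with patient risk flags")
    elif worst == Severity.MODERATE or diagnosis_confidence < min_confidence:
        decision = "warn"
        if worst == Severity.MODERATE:
            reasons.append("moderate interaction present")
        if diagnosis_confidence < min_confidence:
            reasons.append("diagnosis confidence below threshold")
    else:
        decision = "pass"

    # The returned dict is JSON-serializable so it can be stored as an artifact.
    return {
        "decision": decision,
        "worst_severity": worst.name,
        "diagnosis_confidence": diagnosis_confidence,
        "risk_flags": sorted(risk_flags),
        "reasons": reasons,
    }


print(json.dumps(safety_critic([Severity.MODERATE], 0.9, ["renal_impairment"]), indent=2))
```

There is no model call anywhere in this function, which is the point: the gate can be exhaustively unit-tested, and every "block" comes with the rule that triggered it.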

Alternatives: safe substitutes require more than a nearest-neighbor lookup

This stage demonstrates hybrid retrieval (semantic + graph) and re-validation.

  • FAISS semantic similarity proposes indication-aligned alternatives.
  • Graph similarity (Neo4j neighborhood / GraphSAGE-style) captures structural equivalence beyond embeddings.
  • Every candidate is re-run through the same DDI gate before recommendation.
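
The combination of hybrid scoring and re-validation can be sketched as follows. This is a simplified stand-in, not the project's retrieval code: the scores are assumed to be precomputed by FAISS and the graph model, `DDI_TABLE` is a toy dictionary standing in for the graph query, and the drug names, the `alpha` blend weight, and the `block_at` level are placeholders.

```python
from typing import Callable, List, Tuple

# (drug, semantic_score, graph_score); both scores assumed to lie in [0, 1].
Candidate = Tuple[str, float, float]

# Toy interaction table standing in for the graph query. 3 means "major" here.
DDI_TABLE = {frozenset({"drug_b", "drug_x"}): 3}


def ddi_severity(a: str, b: str) -> int:
    """Order-independent lookup; unknown pairs are treated as severity 0 in this toy."""
    return DDI_TABLE.get(frozenset({a, b}), 0)


def rank_alternatives(candidates: List[Candidate], current_meds: List[str],
                      severity_of: Callable[[str, str], int] = ddi_severity,
                      block_at: int = 3, alpha: float = 0.5,
                      top_k: int = 3) -> List[str]:
    scored = []
    for drug, semantic, graph in candidates:
        # Re-run the same gate against every drug the patient already takes.
        worst = max((severity_of(drug, med) for med in current_meds), default=0)
        if worst >= block_at:
            continue  # a good similarity score never overrides the safety gate
        scored.append((alpha * semantic + (1 - alpha) * graph, drug))
    # Highest blended score first; ties broken by name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [drug for _, drug in scored[:top_k]]


candidates = [("drug_b", 0.95, 0.9), ("drug_c", 0.8, 0.6), ("drug_d", 0.6, 0.9)]
print(rank_alternatives(candidates, current_meds=["drug_x"]))
```

In this toy data the best-scoring candidate is dropped because it interacts with a current medication, which is exactly the failure mode a pure nearest-neighbor lookup would miss. A production gate would also need an explicit policy for pairs missing from the knowledge graph, since "not found" is not the same as "safe".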

Evidence grounding: citations, not vibes

This stage demonstrates RAG with quality control.

  • Retrieves PubMed evidence and produces citation-backed claims.
  • Summarizes retrieved abstracts into clinician-readable snippets (BART-style) and filters them with a relevance scorer (SBERT-style).
  • Outputs include citations + relevance scores for auditability.
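
The relevance filter can be sketched with plain cosine similarity. This assumes the claim and each snippet have already been embedded (for example by an SBERT-style encoder); the toy two-dimensional vectors, the `ref-N` citation ids, and the `min_score` threshold are placeholders rather than real data.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filter_evidence(claim_vec: Sequence[float],
                    snippets: List[Tuple[str, str, Sequence[float]]],
                    min_score: float = 0.5) -> List[dict]:
    """snippets holds (citation_id, summary_text, embedding) triples."""
    kept = []
    for citation_id, text, vec in snippets:
        score = cosine(claim_vec, vec)
        if score >= min_score:
            # Keep the score next to the citation so reviewers can audit the cut-off.
            kept.append({"citation": citation_id, "snippet": text,
                         "relevance": round(score, 3)})
    kept.sort(key=lambda row: row["relevance"], reverse=True)
    return kept


snippets = [
    ("ref-1", "closely related summary", [1.0, 0.1]),
    ("ref-2", "unrelated summary", [0.0, 1.0]),
    ("ref-3", "partially related summary", [1.0, 1.0]),
]
for row in filter_evidence([1.0, 0.0], snippets):
    print(row["citation"], row["relevance"])
```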

Orchestration: making the pipeline explicit

This stage demonstrates agentic orchestration + observability.

  • State-machine orchestration (LangGraph-like) makes transitions explicit and debuggable.
  • Each agent has a single responsibility; routing logic is visible (not buried in prompts).
  • Produces a traceable run timeline (stage outputs + confidence + safety flags).
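
To show what "explicit transitions" looks like in code, here is a small state machine in plain Python. It deliberately does not use LangGraph or any other framework API; the stage names, the 0.7 routing threshold, and the timeline fields are assumptions for the sketch.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Stage = Callable[[State], str]  # a stage updates state and returns the next stage name
TERMINAL = {"done", "human_review"}


def fusion(state: State) -> str:
    # Low confidence is routed to a human instead of to treatment planning.
    return "treatment" if state["confidence"] >= 0.7 else "human_review"


def treatment(state: State) -> str:
    state["plan"] = "structured-plan-placeholder"
    return "safety"


def safety(state: State) -> str:
    state["safety"] = "pass"
    return "done"


STAGES: Dict[str, Stage] = {"fusion": fusion, "treatment": treatment, "safety": safety}


def run(state: State, start: str = "fusion",
        max_steps: int = 20) -> Tuple[str, List[dict]]:
    """Execute stages until a terminal state; return (final_state_name, timeline)."""
    timeline: List[dict] = []
    current = start
    while current not in TERMINAL:
        if len(timeline) >= max_steps:
            raise RuntimeError("step limit exceeded; possible routing loop")
        next_stage = STAGES[current](state)
        timeline.append({"stage": current, "next": next_stage,
                         "confidence": state.get("confidence"),
                         "safety": state.get("safety")})
        current = next_stage
    return current, timeline


final, timeline = run({"confidence": 0.9})
print(final, [step["stage"] for step in timeline])
```

Because routing lives in ordinary functions and a lookup table, each transition can be unit-tested and the timeline doubles as a debugging trace.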

The production piece that’s usually missing: ACP + MCP

A notebook can chain function calls. A production swarm needs two explicit boundaries to make coordination and tool use reliable.

ACP (Agent Communication Protocol): tasks + state references

ACP is the messaging contract for orchestrator ↔ agents ↔ services coordination. The key design choice: agents exchange references (S3 URIs), not bulky payloads.

AWS mapping (reference deployment):

  • workflow + retries: AWS Step Functions
  • task fan-out: Amazon EventBridge (or Amazon MSK if you need log-style streaming with consumer-controlled replay)
  • agent workers: EKS/ECS services consuming tasks and emitting results
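
A sketch of what an ACP task envelope could look like is shown below. The field names (`run_id`, `agent`, `input_refs`, `attempt`, `task_id`) and the example bucket are hypothetical; the only idea taken from the design above is that messages carry S3 URIs and reject anything that is not a reference.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from typing import List
from urllib.parse import urlparse


def _require_s3_uri(uri: str) -> None:
    parsed = urlparse(uri)
    # Require the s3 scheme, a bucket (netloc), and a non-empty object key (path).
    if parsed.scheme != "s3" or not parsed.netloc or not parsed.path.strip("/"):
        raise ValueError(f"not an S3 object URI: {uri!r}")


@dataclass
class AcpTask:
    run_id: str
    agent: str
    input_refs: List[str]   # references only, never inline payloads
    attempt: int = 1
    task_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def __post_init__(self) -> None:
        for ref in self.input_refs:
            _require_s3_uri(ref)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "AcpTask":
        return cls(**json.loads(raw))


task = AcpTask("run-1", "xray_agent", ["s3://example-bucket/runs/run-1/xray.png"])
print(task.to_json())
```

Keeping messages this small means retries are cheap, and the bulky data stays in one place under one access policy instead of being copied through the event bus.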

MCP Gateway: standardized tool calls (schema + policy + telemetry)

MCP is the boundary for agent → tool calls. All tool calls go through one gateway that enforces:

  • schema validation (typed inputs/outputs)
  • policy enforcement (PHI controls, tool allowlists, version pinning)
  • telemetry (latency, errors, cost per tool) with trace propagation
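
The three responsibilities above can be sketched in a few dozen lines. This is plain Python written for illustration, not code from any MCP SDK: the `ToolGateway` class, its method names, the flat `{argument: type}` schema format, and the telemetry fields are all assumptions.

```python
import time
from typing import Any, Callable, Dict, List, Tuple


class ToolGateway:
    def __init__(self) -> None:
        # Keyed by (tool name, pinned version); anything else is not allowed.
        self._tools: Dict[Tuple[str, str], Tuple[Dict[str, type], Callable[..., Any]]] = {}
        self.telemetry: List[dict] = []

    def register(self, name: str, version: str,
                 schema: Dict[str, type], fn: Callable[..., Any]) -> None:
        self._tools[(name, version)] = (schema, fn)

    def call(self, name: str, version: str, args: Dict[str, Any], trace_id: str) -> Any:
        # Policy: allowlist plus version pinning.
        if (name, version) not in self._tools:
            raise PermissionError(f"{name}@{version} is not on the allowlist")
        schema, fn = self._tools[(name, version)]

        # Schema: exact argument names and expected types.
        if set(args) != set(schema):
            raise ValueError(f"expected arguments {sorted(schema)}, got {sorted(args)}")
        for key, expected in schema.items():
            if not isinstance(args[key], expected):
                raise TypeError(f"argument {key!r} must be {expected.__name__}")

        # Telemetry: record latency and outcome, and carry the trace id through.
        start = time.perf_counter()
        ok = False
        try:
            result = fn(**args)
            ok = True
            return result
        finally:
            self.telemetry.append({
                "tool": name, "version": version, "trace_id": trace_id,
                "latency_ms": (time.perf_counter() - start) * 1000.0, "ok": ok,
            })


gateway = ToolGateway()
gateway.register("ddi_lookup", "1.0", {"drug_a": str, "drug_b": str},
                 lambda drug_a, drug_b: {"severity": "none"})  # stub tool
print(gateway.call("ddi_lookup", "1.0", {"drug_a": "drug_x", "drug_b": "drug_y"}, "trace-1"))
```

Rejected calls fail before the tool runs, so a misbehaving agent cannot reach an unpinned tool version or pass unvalidated input to it.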

In AWS terms, the MCP Gateway is a small service running on EKS (or ECS on Fargate) that fronts tools like:

  • SageMaker inference endpoints (vision/text)
  • Neo4j DDI/KG queries
  • vector retrieval
  • PubMed retrieval + summarization

One-line boundary rule: ACP moves tasks/results between agents; MCP moves tool calls from agents to services.


The audit trail: turning “a pipeline” into “a system”

Every run produces structured artifacts that make the system testable and reviewable:

  • scan results (PDF + vision)
  • fusion decision + confidence
  • treatment plan + prescription proposal
  • DDI results + safety critic scores
  • alternatives + evidence + citations

This is what lets you build evaluation harnesses, regression tests, and human review workflows.
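
One way to make such artifacts tamper-evident is to store a content hash next to each stage output. The sketch below assumes a simple record shape (`run_id`, `stage`, `payload`, `sha256`) and canonical JSON serialization; both are illustrative choices, not the project's actual artifact format.

```python
import hashlib
import json
from typing import Any, Dict


def _digest(run_id: str, stage: str, payload: Dict[str, Any]) -> str:
    # Sorted keys and fixed separators give a stable byte string to hash.
    canonical = json.dumps({"run_id": run_id, "stage": stage, "payload": payload},
                           sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_artifact(run_id: str, stage: str, payload: Dict[str, Any]) -> Dict[str, Any]:
    return {"run_id": run_id, "stage": stage, "payload": payload,
            "sha256": _digest(run_id, stage, payload)}


def verify_artifact(artifact: Dict[str, Any]) -> bool:
    """Recompute the hash and compare it with the stored one."""
    expected = _digest(artifact["run_id"], artifact["stage"], artifact["payload"])
    return expected == artifact["sha256"]


artifact = make_artifact("run-1", "fusion", {"diagnosis": "pneumonia", "confidence": 0.8})
print(verify_artifact(artifact))
```

A regression test can then replay a stored run and compare hashes stage by stage, which localizes a behavior change to the first stage whose digest differs.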


What I’d do next (if this were going to production)

  • containerize agents + MCP gateway; deploy to EKS with the Horizontal Pod Autoscaler (HPA)
  • add offline evaluation harness + golden clinical test suites per stage
  • add drift and safety monitors (shifts in the DDI severity distribution, false-positive block rates)
  • add a human-in-the-loop review queue for low-confidence cases

If you’re building safety-critical AI (healthcare, finance, industrial), the pattern is consistent: hybrid systems, explicit safety gates, and operational traceability. That’s the point of this project.

ai_medical_swarm.png