top of page

From Single Chatbot to a Stateful Multi‑Agent Travel Planner (Google ADK)

Most “travel chatbots” fail the same way: they forget constraints, return generic advice, and confidently invent details. Travel planning is a high‑variance task — it includes preferences, logistics, and time‑sensitive information — so a single prompt that tries to do everything becomes brittle.

This project demonstrates a better approach: build travel planning as a small workflow using Google’s Agent Development Kit (ADK). The system captures constraints, runs parallel research tasks, and synthesizes a plan from the collected artifacts — while persisting state across turns.

Why travel planning is not “one prompt”

A request like:

“I want a 5-day trip in April, budget $2500, leaving from NYC. Prefer food and museums.”

contains multiple sub-tasks:

  1. Constraint extraction (dates, duration, budget, origin, party size)
  2. Destination ideation (shortlist options that fit)
  3. Time-sensitive research (events, advisories, seasonal notes)
  4. Local grounding (specific places near the chosen area)
  5. Synthesis (turn findings into a structured plan)

If you force all of this into one model call, the output is hard to verify. More importantly, the model is incentivized to “fill in gaps” with plausible but wrong details. That’s why production agent systems usually introduce structure: intermediate artifacts, tool grounding, and explicit state.


The core architecture: sequential + parallel agents with state

This implementation uses a production-friendly orchestration pattern:

  • Sequential stages for deterministic steps with clear input/output contracts
  • Parallel fan-out for independent research tasks
  • State + memory to preserve preferences across multiple turns
  • Tool grounding to reduce hallucination
  • Callbacks for guardrails + observability

At a high level, the root agent is a sequential pipeline:

Trip Profiler → Parallel Research → Synthesizer

The parallel research stage fans out into three leaf agents:

  • Destination agent: proposes 2–4 candidate destinations with rationale
  • News agent: collects time-sensitive bullets via a search grounding wrapper
  • Places agent: retrieves nearby POIs using OpenStreetMap (Nominatim + Overpass)

Business value: why this design is safer and cheaper to evolve

From a business standpoint, this architecture addresses three recurring risks:

1) Personalization failures destroy trust

If an assistant forgets the user’s budget, dates, or preferences, users stop relying on it. Persisting preferences and trip constraints in state prevents “chatbot amnesia” and makes multi-turn planning viable.

2) Hallucinated travel details create brand risk

Travel is high-stakes. Fake addresses, invented events, or incorrect advisories lead to immediate credibility loss. Grounding time-sensitive claims in tool output (search) and sourcing POIs from an external database (OSM) reduces the surface area for confident nonsense.

3) Monolith prompts are expensive to maintain

In a single-prompt system, changing one behavior often destabilizes everything. In a multi-agent pipeline, each stage has a narrow responsibility. You can replace the places tool or adjust the destination prompt without rewriting the entire planner.


Stage 1: Trip Profiler (constraint capture)

The profiler’s job is simple and critical: extract structured constraints and merge them into persistent state.

It writes to stable keys:

  • user:travel_prefs — long-lived preferences (food, pace, interests)
  • current_trip — session-specific constraints (dates, origin, party size, budget)
  • tp:trip_profile — the profiler output summary and missing fields

This turns messy natural language into something downstream stages can rely on.

A practical design choice: the profiler also identifies “missing essentials.” If a user didn’t specify dates or budget, the system should not guess. Instead, it should ask 1–2 clarifying questions.


Stage 2: Parallel research (modular and fast)

Once constraints exist, the system runs three research tasks concurrently.

Destination agent

Produces a shortlist of destinations based on constraints and preferences. This agent is responsible for fit, not for inventing “facts.” If time-sensitive claims are required, they should be grounded through the research artifacts (not guessed).

News agent (grounded)

Time-sensitive information is where hallucinations become dangerous: events, closures, advisories, seasonal issues. In this implementation, “news” is retrieved through a wrapper tool, google_search_grounding, implemented as an AgentTool.

Two important points:

  1. The search wrapper makes it explicit what came from a tool versus what came from the model.
  2. The tool returns concise bullet summaries suitable for inclusion in the final plan.

This is not a guarantee that all output is perfect — but it gives the system a structured place to attach evidence and improves debuggability.

Places agent (OpenStreetMap)

For local POIs, the system uses OpenStreetMap APIs:

  • Nominatim for geocoding (turn “Paris” into lat/lon)
  • Overpass for querying POIs near that coordinate

This is a pragmatic choice for demos and portfolios because it’s reproducible without paid API keys. It also creates a clean separation: the planner can recommend places it actually retrieved, rather than inventing addresses.


Stage 3: Synthesizer (turn artifacts into a plan)

The synthesizer reads:

  • preferences (user:travel_prefs)
  • trip constraints (current_trip)
  • research artifacts (tp:destinations, tp:news, tp:places)

and emits a structured plan. A solid structure for readability and evaluation:

  1. 2–4 destination options with rationale
  2. Top pick + mini itinerary (3–5 bullets)
  3. Timely considerations grounded in tp:news
  4. Place suggestions sourced from tp:places (avoid invented addresses)

The synthesizer also uses the profiler’s “missing essentials” output: if critical fields are missing, it asks clarifying questions rather than producing a low-quality plan.


Guardrails and observability: callbacks are not optional

If you claim “production-grade,” you need control and visibility. This project wires agents into ADK’s lifecycle callbacks (before/after agent, model, and tool calls) to support:

  • Prompt injection filtering (block obvious override attempts)
  • Tool argument sanitization (clean user text before it becomes a query; clamp limits)
  • Per-turn logging (counts and timings stored in tp:* keys)

This is what makes the system maintainable. Without logs, you can’t debug failures. Without sanitization, you can’t safely expose tools.


What makes this portfolio-grade

Hiring teams care less about “which model” and more about system design:

  • Clear contracts between stages (stable state keys)
  • Multi-turn correctness (session + memory)
  • Grounding patterns (tool outputs as artifacts)
  • Modularity (replace parts without collapse)
  • Observability (callbacks, logs, counters)

This project is small enough to understand quickly and realistic enough to demonstrate modern agent engineering.


Practical next steps (high ROI upgrades)

  1. Add caching for geocoding + Overpass responses to reduce throttling risk.
  2. Add retries/backoff for Overpass failures so the system degrades gracefully.
  3. Add exports: Markdown + JSON + calendar .ics output.
  4. Add evaluation: a small test set + rubric (constraint adherence, grounding usage, structure compliance).
  5. Add cost estimation: rough ranges for lodging/food/transport tied to user budget.

Key takeaways

  • Travel planning is best treated as a workflow, not a single prompt.
  • Sequential + parallel multi-agent design improves reliability and makes debugging feasible.
  • State + memory are mandatory for multi-turn personalization.
  • Grounding via tools reduces hallucination risk, but does not magically guarantee correctness.
  • Callbacks + logs are the backbone of safety and maintainability.

How to evaluate it (simple but effective)

A lightweight evaluation loop makes this more than a demo:

  • Constraint adherence: does the final plan respect dates, budget, origin, and stated interests?
  • Grounding usage: are time-sensitive claims attributable to tp:news bullets rather than free-form guesses?
  • Structure compliance: does the output always follow the same headings and limits (options, itinerary bullets, POIs)?
  • Failure behavior: when key fields are missing, does it ask 1–2 questions instead of hallucinating?

Even 30–50 test prompts plus a small scoring rubric will catch most regressions.


Resources

bottom of page