02 — Multi-agent orchestration with Bedrock Agents + Step Functions
Long-running, multi-step workflows where multiple specialized agents coordinate through a durable state machine.
Problem statement
A single LLM call is not the right shape for a workflow that takes minutes to hours, calls multiple external systems, may need human approval in the middle, and must survive a Lambda restart. You need durable orchestration with specialized agents as the workers.
Concrete example: a "research report" workflow that (1) gathers sources via a search agent, (2) summarizes each source via a summary agent, (3) drafts a report via a writer agent, (4) gates on human approval, (5) publishes.
Components
- AWS Step Functions Standard. The control plane: durable, visible, and supports human-approval tasks via waitForTaskToken.
- Amazon Bedrock Agents. Each step is an agent invocation. Each agent has its own action groups and prompt template.
- AWS Lambda. Adapters between Step Functions tasks and Bedrock agent invocations; also implements action groups.
- Amazon DynamoDB. Workflow state and intermediate artifacts (URLs, summaries, draft sections).
- Amazon S3. Final artifacts and any large intermediate payloads (Step Functions payload limit is 256 KB).
- Amazon EventBridge. Cross-workflow events (completion, failure, approval requested).
- Amazon SNS / SES. Human approval notifications.
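A minimal sketch of one Lambda adapter between a Step Functions task and a Bedrock agent. It assumes the state machine passes agent_id, alias_id, role, input and execution_arn (from $$.Execution.Id) in the task input; these field names are hypothetical. The adapter calls the bedrock-agent-runtime InvokeAgent API through boto3 and joins the streamed completion chunks. The client is injectable, so the logic can be exercised without AWS.

```python
"""Hypothetical Lambda adapter: Step Functions task -> Bedrock agent -> task result."""
from typing import Any, Iterable, Mapping, Optional


def make_agent_client():
    # Imported lazily so the helpers can be tested without boto3 installed.
    import boto3
    return boto3.client("bedrock-agent-runtime")


def session_id_for(execution_arn: str, role: str) -> str:
    # One Bedrock session per (execution, role) keeps each workflow's agent sessions separate.
    execution_name = execution_arn.rsplit(":", 1)[-1]
    return f"{execution_name}-{role}"[:100]  # sessionId is length-limited (100 chars)


def collect_completion(events: Iterable[Mapping[str, Any]]) -> str:
    # InvokeAgent streams its answer: text arrives as bytes in "chunk" events,
    # possibly interleaved with other event types (e.g. "trace") that are ignored here.
    parts = [event["chunk"]["bytes"] for event in events if "chunk" in event]
    return b"".join(parts).decode("utf-8")


def handler(event: Mapping[str, Any], context: Any, client: Optional[Any] = None) -> dict:
    # Hypothetical input contract, set by the state machine's Parameters block.
    client = client or make_agent_client()
    response = client.invoke_agent(
        agentId=event["agent_id"],
        agentAliasId=event["alias_id"],
        sessionId=session_id_for(event["execution_arn"], event["role"]),
        inputText=event["input"],
    )
    return {"role": event["role"], "output": collect_completion(response["completion"])}
```

The adapter joins raw bytes before decoding, so a multi-byte character split across two chunks still decodes correctly.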
Diagram
```mermaid
flowchart TB
    Start([Trigger]) --> SF[Step Functions Standard]
    SF --> A1[Task: Search Agent]
    A1 --> A2[Task: Summary Agent fan-out]
    A2 --> A3[Task: Writer Agent]
    A3 --> Approval{Human approval?}
    Approval -->|Wait for token| Notify[SNS/SES → reviewer]
    Notify --> Approval
    Approval -->|Approved| Publish[Task: Publish]
    Approval -->|Rejected| Revise[Task: Revise Agent]
    Revise --> A3
    Publish --> Done([End])
    SF -.workflow state.-> DDB[(DynamoDB)]
    A1 & A2 & A3 & Revise -.invoke.-> Agents[Bedrock Agents]
    Agents --> KB[(Knowledge Bases)]
```
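A sketch of the diagram as an Amazon States Language definition, built as a Python dict so it can be serialized for Terraform. Several parts are hypothetical: the Lambda and SNS ARNs, the Retry policy, the 7-day review timeout, and the assumption that the search adapter returns a list of S3 keys. Rejection is modelled here as SendTaskFailure with a "Rejected" error, which a Catch routes to Revise. A Choice state on a SendTaskSuccess payload would work equally well.

```python
"""Hypothetical ASL skeleton for the research-report workflow."""
import json

ACCOUNT = "us-east-1:111122223333"  # placeholder region:account


def fn(name: str) -> str:
    # Placeholder Lambda ARNs; substitute the adapters you actually deploy.
    return f"arn:aws:lambda:{ACCOUNT}:function:{name}"


RETRY = [{"ErrorEquals": ["States.TaskFailed"], "IntervalSeconds": 5,
          "MaxAttempts": 3, "BackoffRate": 2.0}]


def agent_task(name: str, next_state: str, result_path: str) -> dict:
    # Keep the state input and attach this step's result at result_path.
    return {"Type": "Task", "Resource": fn(name), "ResultPath": result_path,
            "Retry": RETRY, "Next": next_state}


definition = {
    "Comment": "Research report: search, summarize (fan-out), write, approve, publish",
    "StartAt": "Search",
    "States": {
        # Assumed contract: the search adapter returns a list of S3 keys, one per source.
        "Search": agent_task("search-agent", "Summarize", "$.sources"),
        "Summarize": {
            "Type": "Map",
            "ItemsPath": "$.sources",
            "MaxConcurrency": 10,  # bound parallel agent calls
            "ItemProcessor": {
                "ProcessorConfig": {"Mode": "INLINE"},
                "StartAt": "SummarizeOne",
                "States": {"SummarizeOne": {"Type": "Task", "Resource": fn("summary-agent"),
                                            "Retry": RETRY, "End": True}},
            },
            "ResultPath": "$.summaries",
            "Next": "Write",
        },
        "Write": agent_task("writer-agent", "AwaitApproval", "$.draft"),
        "AwaitApproval": {
            "Type": "Task",
            # Pauses until SendTaskSuccess / SendTaskFailure is called with this token.
            "Resource": "arn:aws:states:::sns:publish.waitForTaskToken",
            "Parameters": {
                "TopicArn": f"arn:aws:sns:{ACCOUNT}:report-approvals",
                "Message": {"taskToken.$": "$$.Task.Token", "draft.$": "$.draft"},
            },
            "TimeoutSeconds": 7 * 24 * 3600,  # hypothetical review window
            "ResultPath": "$.review",
            "Catch": [{"ErrorEquals": ["Rejected"], "ResultPath": "$.review", "Next": "Revise"}],
            "Next": "Publish",
        },
        # Matches the diagram: Revise feeds back into the Writer step.
        "Revise": agent_task("revise-agent", "Write", "$.revision_notes"),
        "Publish": {"Type": "Task", "Resource": fn("publish"), "Retry": RETRY, "End": True},
    },
}

ASL_JSON = json.dumps(definition, indent=2)  # e.g. pass to the Terraform state machine definition
```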
Decisions
D1 — Step Functions Standard, not Express
Context. The workflow can run for minutes to hours, and Express workflows have a hard 5-minute cap.
Decision. Standard. Pay-per-state-transition pricing is fine at low volume; Express savings only matter at high event rates.
Alternatives. Chaining Express executions through EventBridge: possible, but it adds complexity. Standard wins on readability.
Consequences. At $0.025 per 1k transitions, Map iterations, retries and revision loops add up; budget early.
D2 — One agent per role, not one mega-agent
Context. We could build a single agent with every action group, or split agents by role.
Decision. Split per role: Search, Summary, Writer, Reviser. Each one has a tight system prompt, only the tools it needs, and its own evals.
Alternatives. One generalist agent: simpler configuration, but it mixes responsibilities and is harder to evaluate.
Consequences. More IaC, but each agent is independently testable. Easier to swap or A/B test one role.
D3 — Human-in-the-loop via waitForTaskToken
Context. Editorial / compliance workflows need approval before publication.
Decision. Use the Step Functions waitForTaskToken pattern: publish an SNS notification that carries the task token. The reviewer clicks an "approve" (or "reject") link that calls a small Lambda, which in turn calls SendTaskSuccess / SendTaskFailure.
Alternatives. Polling DynamoDB — wasteful. EventBridge ↔ Step Functions integration — viable but more moving parts.
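Consequences. A waiting task can stay paused up to the 1-year Standard execution limit, which is plenty. Still set TimeoutSeconds (and optionally HeartbeatSeconds) so abandoned reviews fail instead of hanging. Make sure the approval Lambda authenticates the reviewer.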
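A minimal sketch of the approval Lambda, assuming the notification contains signed links with hypothetical query parameters token, decision, expires and sig, and a hypothetical APPROVAL_SIGNING_KEY environment variable. The HMAC proves that the link came from the workflow and was not edited. It does not prove who clicked it, so stronger reviewer identity still needs Cognito or similar in front. The "Rejected" error name is an assumption and must match the state machine's Catch.

```python
"""Hypothetical approval Lambda: verify a signed approve/reject link, then resume the workflow."""
import hashlib
import hmac
import json
import time
from typing import Any, Mapping, Optional


def sign(secret: bytes, token: str, decision: str, expires: int) -> str:
    # The link carries decision, expiry and task token; the HMAC stops forged or edited links.
    message = f"{decision}:{expires}:{token}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def handle_decision(params: Mapping[str, str], secret: bytes, sfn: Any,
                    now: Optional[float] = None) -> dict:
    token, decision, expires = params["token"], params["decision"], int(params["expires"])
    if not hmac.compare_digest(sign(secret, token, decision, expires), params["sig"]):
        return {"statusCode": 403, "body": "invalid signature"}
    if (time.time() if now is None else now) > expires:
        return {"statusCode": 410, "body": "link expired"}
    if decision == "approve":
        sfn.send_task_success(taskToken=token, output=json.dumps({"decision": "approved"}))
    elif decision == "reject":
        # The error name is what the state machine's Catch clause matches on.
        sfn.send_task_failure(taskToken=token, error="Rejected",
                              cause="Reviewer rejected the draft")
    else:
        return {"statusCode": 400, "body": "unknown decision"}
    return {"statusCode": 200, "body": f"recorded: {decision}"}


def handler(event: Mapping[str, Any], context: Any) -> dict:
    # Assumes a Lambda function URL / API Gateway event carrying query string parameters.
    import os
    import boto3
    secret = os.environ["APPROVAL_SIGNING_KEY"].encode("utf-8")
    return handle_decision(event["queryStringParameters"], secret, boto3.client("stepfunctions"))
```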
D4 — Intermediate artifacts in S3 with pointers in payload
Context. Step Functions payload limit is 256 KB. Sources, summaries and drafts blow past that easily.
Decision. Put artifacts in S3 (s3://bucket/workflow-id/...), pass S3 keys in the payload, dereference inside each task.
Alternatives. Storing artifacts inline in DynamoDB items: the 400 KB item limit only raises the ceiling slightly, so long drafts hit the same problem.
Consequences. Slight extra latency per task (S3 round trip). Worth it.
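A minimal sketch of the pointer pattern, assuming a boto3-style S3 client (put_object / get_object) and a hypothetical {"bucket", "key"} pointer shape under the s3://bucket/workflow-id/... layout. The fits_in_payload guard measures serialized UTF-8 size against the 256 KB state limit.

```python
"""Hypothetical helpers for D4: artifacts live in S3, the state payload carries only pointers."""
import json
from typing import Any

PAYLOAD_LIMIT_BYTES = 256 * 1024  # Step Functions input/output limit per state


def put_artifact(s3: Any, bucket: str, workflow_id: str, name: str, data: Any) -> dict:
    # Layout follows s3://bucket/workflow-id/...; the returned pointer goes into the payload.
    key = f"{workflow_id}/{name}.json"
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(data).encode("utf-8"),
                  ContentType="application/json")
    return {"bucket": bucket, "key": key}


def get_artifact(s3: Any, pointer: dict) -> Any:
    # Dereference inside the task that actually needs the content.
    body = s3.get_object(Bucket=pointer["bucket"], Key=pointer["key"])["Body"].read()
    return json.loads(body)


def fits_in_payload(payload: Any) -> bool:
    # Cheap guard before returning from a task: compare the serialized UTF-8 size to the limit.
    return len(json.dumps(payload).encode("utf-8")) <= PAYLOAD_LIMIT_BYTES
```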
Cost analysis
| Sizing | Workflows / mo | Tasks / workflow | Approx. monthly USD |
|---|---|---|---|
| S — pilot | 100 | 8 | ~ $120 |
| M — team | 2 000 | 12 | ~ $1 050 |
| L — biz unit | 20 000 | 16 | ~ $8 200 |
Inputs (M sizing):
- Step Functions: 2k × 12 = 24k transitions (counting one per task); the first 4k are free, the remaining 20k cost ~$0.50
- Bedrock Claude Sonnet across all agents: ~$600
- Lambda: ~$30
- DynamoDB on-demand: ~$20
- S3 storage + requests: ~$10
- Knowledge Base retrieval: ~$200
- EventBridge / SNS: ~$5
- Logs & misc: ~$185
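A quick back-of-envelope check of the M sizing list. Like the list, it counts one state transition per task. Real executions also bill Choice states, Map iterations and retries, so treat the Step Functions line as a lower bound. The non-Step-Functions figures are copied from the list, not derived.

```python
"""Back-of-envelope check of the M sizing inputs above (USD per month)."""
FREE_TRANSITIONS = 4_000          # Step Functions Standard free tier per month
USD_PER_1K_TRANSITIONS = 0.025    # Standard workflow pricing


def step_functions_usd(workflows: int, transitions_per_workflow: int) -> float:
    billable = max(0, workflows * transitions_per_workflow - FREE_TRANSITIONS)
    return billable / 1_000 * USD_PER_1K_TRANSITIONS


m_sizing = {
    # One transition per task, as in the list above (a simplification).
    "step_functions": step_functions_usd(2_000, 12),
    "bedrock_sonnet": 600,
    "lambda": 30,
    "dynamodb": 20,
    "s3": 10,
    "knowledge_base": 200,
    "eventbridge_sns": 5,
    "logs_misc": 185,
}
total_usd = sum(m_sizing.values())
print(f"Step Functions ~ ${m_sizing['step_functions']:.2f}, total ~ ${total_usd:,.1f}")
```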
Well-Architected review
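Operational excellence. Step Functions execution history is gold for debugging: it is effectively a free distributed trace. Individual executions can't be tagged (tags apply to the state machine), so put workflow_id, tenant and version in the execution name, the execution input, and structured logs for filtering.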
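A minimal sketch of encoding workflow_id, tenant and version in both the execution name and the input, assuming a boto3-style Step Functions client (start_execution). The helper names and the name layout are hypothetical. Execution names are capped at 80 characters and must be unique within the state machine. The sketch puts workflow_id first so truncation is less likely to cut the unique part, and it keeps to a conservative character set.

```python
"""Hypothetical helper: carry workflow_id, tenant and version in the execution name and input."""
import json
import re
from typing import Any


def execution_name(tenant: str, workflow_id: str, version: str) -> str:
    # Execution names are limited to 80 characters; keep to letters, digits, '-' and '_'.
    raw = f"{workflow_id}-{tenant}-v{version}"
    return re.sub(r"[^A-Za-z0-9_-]", "_", raw)[:80]


def start_workflow(sfn: Any, state_machine_arn: str, tenant: str, workflow_id: str,
                   version: str, payload: dict) -> dict:
    # The same fields go into the input so every state, and every log line, can see them.
    execution_input = {"workflow_id": workflow_id, "tenant": tenant, "version": version, **payload}
    return sfn.start_execution(
        stateMachineArn=state_machine_arn,
        name=execution_name(tenant, workflow_id, version),
        input=json.dumps(execution_input),
    )
```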
Security. Each agent's action-group Lambda has its own IAM role. The orchestrator role can states:StartExecution only on the specific workflow ARN. Approval Lambdas validate the reviewer identity via Cognito or signed URLs.
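Reliability. Standard workflows persist state after every transition, so a failed worker does not lose the workflow, and Retry/Catch are first-class. Bedrock agent invocations are not idempotent: reusing a sessionId continues the same conversation, so a retried task can add a duplicate turn and re-run action-group side effects. Make action-group Lambdas idempotent, e.g. keyed on execution ARN plus state name.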
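A minimal sketch of that idempotency guard, assuming a hypothetical DynamoDB table with a string partition key pk and a boto3-style low-level client. The first attempt claims the key with a conditional put, and retries see ConditionalCheckFailedException and skip the side effect. If the side effect raises, the claim is released so a Step Functions retry can try again. A crash between the claim and the side effect still loses the work, so this is at-most-once, not exactly-once.

```python
"""Hypothetical at-most-once guard for action-group side effects (publish, send email, ...)."""
from typing import Any, Callable


def idempotency_key(execution_arn: str, state_name: str, item: str = "") -> str:
    # Stable across Step Functions retries of one state; pass $$.Map.Item.Index for Map items.
    return f"{execution_arn}#{state_name}#{item}"


def run_once(ddb: Any, table: str, key: str, side_effect: Callable[[], None]) -> bool:
    """Return True if this call ran side_effect, False if an earlier attempt already did."""
    try:
        ddb.put_item(TableName=table, Item={"pk": {"S": key}},
                     ConditionExpression="attribute_not_exists(pk)")
    except Exception as err:  # botocore.exceptions.ClientError when using boto3
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False
        raise
    try:
        side_effect()
    except Exception:
        # Release the claim so a Step Functions retry can attempt the side effect again.
        ddb.delete_item(TableName=table, Key={"pk": {"S": key}})
        raise
    return True
```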
Performance efficiency. The parallel summary stage uses a Map state with MaxConcurrency set to a sane cap (e.g. 10) to stay under Bedrock throughput quotas (requests and tokens per minute).
Cost optimization. For agents that don't need Claude Sonnet, drop to Haiku — savings stack across the workflow. Cache the search-agent results in DynamoDB with a TTL.
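A minimal sketch of the search-result cache, assuming a hypothetical DynamoDB table with partition key pk and TTL enabled on a hypothetical expires_at attribute. DynamoDB TTL deletes expired items asynchronously, so an expired item can still be read. The reader therefore checks expiry itself. Cached results must also fit the 400 KB item limit; store S3 keys rather than full documents.

```python
"""Hypothetical DynamoDB cache for search-agent results with a TTL attribute."""
import hashlib
import json
import time
from typing import Any, Optional


def cache_key(query: str) -> str:
    # Normalize case and surrounding whitespace so near-identical queries share an entry.
    return "search#" + hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()


def get_cached(ddb: Any, table: str, query: str, now: Optional[float] = None) -> Optional[Any]:
    now = time.time() if now is None else now
    item = ddb.get_item(TableName=table, Key={"pk": {"S": cache_key(query)}}).get("Item")
    # TTL deletion is asynchronous, so an expired item can still come back: check it here.
    if item is None or int(item["expires_at"]["N"]) <= now:
        return None
    return json.loads(item["results"]["S"])


def put_cached(ddb: Any, table: str, query: str, results: Any,
               ttl_seconds: int = 24 * 3600, now: Optional[float] = None) -> None:
    now = time.time() if now is None else now
    ddb.put_item(TableName=table, Item={
        "pk": {"S": cache_key(query)},
        "results": {"S": json.dumps(results)},
        # TTL must be enabled on this attribute; it holds epoch seconds as a Number.
        "expires_at": {"N": str(int(now + ttl_seconds))},
    })
```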
Sustainability. No idle compute — Step Functions and Lambda only run when invoked. Bedrock is multi-tenant on AWS's side.
Trade-offs
Use this when:
- Workflow exceeds the 15-min Lambda cap or has human-in-the-loop gates.
- You want each step's reasoning surfaced (Step Functions UI shows inputs/outputs per state).
- Volume is hundreds to low-thousands of workflows per day.
Do NOT use this when:
- Workflow is short (< 30 s) and stateless — call Bedrock directly from a single Lambda.
- You need true sub-second latency on every step — Step Functions adds ~50–100 ms per state.
- The pattern is fan-out only (no orchestration) — EventBridge + SQS is simpler. See arch 04.
Terraform skeleton
See terraform/ — creates the state machine, IAM, DynamoDB and S3. Agents are referenced by ID (you provision them separately or via the AWS console while iterating).