04 — Event-driven AI processing¶
Async classification, enrichment or moderation of incoming items at scale, with retries, DLQ and back-pressure.
Problem statement¶
A stream of items (uploaded documents, customer messages, transactions) needs to be processed by a model — classify, summarize, extract entities, moderate. Latency requirement is "minutes", not seconds. Volume is bursty: 10x spikes during business hours.
You want at-least-once processing, automatic retries, a dead-letter destination for poison pills, and a clean way to fan out to multiple downstream consumers of the result.
Components¶
- Amazon EventBridge. Schema-aware bus that accepts events from multiple producers (apps, SaaS, S3 notifications). Rule-based routing.
- Amazon SQS Standard. Decoupling queue between EventBridge and the worker — gives us back-pressure and a DLQ.
- AWS Lambda — worker. Pulls from SQS, calls Bedrock, writes results.
- Amazon Bedrock — InvokeModel. The actual AI work.
- Amazon DynamoDB. Idempotency table (dedup keys) and results store.
- Amazon SQS — DLQ. For messages that fail after max receive count.
- Amazon EventBridge — result bus. Emits a ProcessingCompleted event downstream consumers subscribe to (notifications, indexing, etc.).
- Amazon CloudWatch. Queue-depth alarms, Bedrock throttle metric, worker error rate.
Diagram¶
```mermaid
flowchart LR
    Producers[Producers<br/>app / SaaS / S3] --> EB[EventBridge<br/>input bus]
    EB --> Queue[(SQS Standard)]
    Queue --> Worker[Lambda worker]
    Worker -->|InvokeModel| Bedrock[Amazon Bedrock]
    Worker --> DDB[(DynamoDB<br/>idempotency + results)]
    Worker --> EBout[EventBridge<br/>result bus]
    EBout --> Notify[SNS / email]
    EBout --> Index[Indexer / search]
    Queue -.failed.-> DLQ[(SQS DLQ)]
    DLQ --> CW[CloudWatch alarm]
```
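The worker's control flow can be sketched as a Lambda handler that reports partial-batch failures (the ReportBatchItemFailures setting on the SQS event source mapping), so one poison message does not force redelivery of the whole batch. The `process_item` hook is injected here for testability and stands in for the Bedrock + DynamoDB work; the wiring is illustrative, not the shipped implementation.

```python
# Sketch of the SQS-triggered worker. `process_item` stands in for the
# Bedrock call + results write; it is injected so the handler's retry
# shape can be tested without AWS.
import json


def handler(event, context, process_item=None):
    """Consume a batch of SQS records carrying EventBridge envelopes.

    Returns a partial-batch response: only the failed messages become
    visible on the queue again (requires ReportBatchItemFailures on the
    event source mapping). Raising instead would retry the whole batch.
    """
    failures = []
    for record in event["Records"]:
        try:
            envelope = json.loads(record["body"])  # EventBridge event as delivered by the rule
            process_item(envelope["detail"])       # classify / enrich / moderate
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

With a batch of 10 where one item fails, only that message returns to the queue and, after maxReceiveCount, lands in the DLQ.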
Decisions¶
D1 — SQS between EventBridge and the worker, not direct EventBridge → Lambda¶
Context. EventBridge → Lambda is supported, but the failure semantics are weak: the invocation is asynchronous, Lambda retries a failed event at most twice, and after that the event is lost unless a DLQ is configured on the rule or the function. There is no back-pressure and no visibility timeout.
Decision. EventBridge → SQS → Lambda. SQS provides visibility timeout, max receives, configurable retry, and a DLQ that we own.
Alternatives. EventBridge Pipes (newer) — also good, more sources. Worth revisiting once your producers grow.
Consequences. One extra hop (~10 ms). Tiny price for first-class retry semantics.
D2 — Idempotency keys in DynamoDB, not "best effort"¶
Context. SQS is at-least-once. Bedrock calls are expensive and have side effects (write to results store).
Decision. Compute a deterministic idempotency_key from the event payload, conditional-PutItem into DynamoDB before invoking Bedrock. If the put fails (key exists), skip.
Alternatives. Rely on the worker being deterministic — fragile, breaks the moment you add a side effect.
Consequences. One extra DynamoDB call per event. Saves money and prevents duplicate downstream events.
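A minimal sketch of the dedup step, assuming a table with partition key `pk` and a TTL attribute named `ttl` (both names illustrative). The key derivation is pure and deterministic; the conditional write is what makes the step safe under at-least-once delivery.

```python
import hashlib
import json
import time


def idempotency_key(detail: dict) -> str:
    """Deterministic key: sort_keys makes it stable across dict ordering."""
    canonical = json.dumps(detail, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode()).hexdigest()


def claim(table_name: str, detail: dict, ttl_hours: int = 24) -> bool:
    """Return True if this worker won the claim and should call Bedrock.

    The ConditionExpression makes the put atomic: exactly one worker wins
    for a given key; duplicate deliveries hit a failed condition and skip.
    """
    import boto3  # imported lazily so the pure helper is testable without AWS
    from botocore.exceptions import ClientError

    ddb = boto3.client("dynamodb")
    try:
        ddb.put_item(
            TableName=table_name,
            Item={
                "pk": {"S": idempotency_key(detail)},
                # Short TTL (see cost optimization below): the claim only
                # needs to outlive the redelivery window.
                "ttl": {"N": str(int(time.time()) + ttl_hours * 3600)},
            },
            ConditionExpression="attribute_not_exists(pk)",
        )
        return True
    except ClientError as e:
        if e.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # duplicate delivery: already claimed
        raise
```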
D3 — Result bus separate from input bus¶
Context. Downstream consumers want to react to completed processing — but should not see the raw inputs.
Decision. Emit ProcessingCompleted events on a second EventBridge bus. Schema includes only the result + a reference to the original event.
Alternatives. Reuse the input bus and tag events. Couples producers to consumers; harder to govern.
Consequences. Two buses to monitor. Worth it for governance.
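One way to keep raw inputs off the result bus: build the ProcessingCompleted detail from the result plus a reference only. The bus name, source and field names below are illustrative, but the shape matches EventBridge's put_events API.

```python
import json


def completed_entry(result: dict, source_event_id: str,
                    bus_name: str = "ai-results") -> dict:
    """Build a put_events entry: the result and a reference, never the raw input."""
    return {
        "EventBusName": bus_name,
        "Source": "ai.processing",
        "DetailType": "ProcessingCompleted",
        "Detail": json.dumps({
            "result": result,
            "sourceEventId": source_event_id,  # reference back to the input event
        }),
    }


def emit(entry: dict) -> None:
    import boto3  # lazy import: the builder above is testable without AWS
    events = boto3.client("events")
    resp = events.put_events(Entries=[entry])
    if resp.get("FailedEntryCount"):
        raise RuntimeError("result event not delivered")
```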
D4 — Reserved concurrency on the worker¶
Context. Without a cap, a backlog spike triggers thousands of concurrent Bedrock calls — instant throttling.
Decision. Reserved concurrency on the worker matched to your Bedrock TPS quota (e.g. 20).
Alternatives. Provisioned concurrency — costs idle capacity. Reserved is free.
Consequences. Under a spike, the SQS queue grows but Bedrock stays healthy. Backlog drains gracefully.
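The back-pressure arithmetic is worth making explicit. With reserved concurrency 20 and ~1.5 s per Bedrock call (the figures used in the cost inputs below), steady-state throughput is about 13 events/s, so a 10,000-message spike drains in roughly 12–13 minutes instead of tripping Bedrock throttles.

```python
def drain_minutes(backlog: int, concurrency: int, seconds_per_call: float) -> float:
    """Time to clear a backlog at capped concurrency (one call in flight
    per worker; batching changes the constant, not the shape)."""
    throughput = concurrency / seconds_per_call  # events per second
    return backlog / throughput / 60


# e.g. drain_minutes(10_000, 20, 1.5) -> ~12.5 minutes
```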
Cost analysis¶
| Sizing | Events / mo | Avg tokens / event | Approx. monthly USD |
|---|---|---|---|
| S — pilot | 50 000 | 800 | ~ $80 |
| M — product | 1 000 000 | 1 200 | ~ $1 100 |
| L — high volume | 25 000 000 | 1 500 | ~ $28 000 |
Inputs (M sizing):
- EventBridge: 1M events × $1/M × 2 buses = ~$2
- SQS: 1M × 2 (send + receive) = 2M requests, > free tier → ~$0.40
- Lambda: 1M × 1.5 s × 512 MB = ~750k GB-s → ~$12
- Bedrock Claude Haiku: 1M × 1.2k tokens = 1.2B tokens (60/40 in/out) → ~$900
- DynamoDB on-demand: 1M reads + 2M writes + storage → ~$50
- CloudWatch: ~$50
- Misc: ~$85
Note the use of Haiku at M scale — switch to Sonnet for higher quality at ~3x cost.
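The token line dominates, so it helps to have the arithmetic as a function. Prices move; the figures in the example call are illustrative per-million-token rates, not necessarily the ones behind the table above.

```python
def bedrock_cost(events: int, tokens_per_event: int, input_fraction: float,
                 price_in_per_m: float, price_out_per_m: float) -> float:
    """Monthly Bedrock cost in USD given per-million-token prices."""
    tokens = events * tokens_per_event
    input_cost = tokens * input_fraction * price_in_per_m
    output_cost = tokens * (1 - input_fraction) * price_out_per_m
    return (input_cost + output_cost) / 1_000_000


# 1M events x 1.2k tokens, 60/40 in/out, at illustrative $0.25/$1.25 per M tokens
# bedrock_cost(1_000_000, 1200, 0.6, 0.25, 1.25) -> 780.0
```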
Well-Architected review¶
Operational excellence. ApproximateAgeOfOldestMessage alarm on the queue (catches stuck consumers). DLQ alarm that pages when messages visible > 0 persists for 5 minutes.
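A sketch of the queue-age alarm, assuming an existing SNS topic for paging (the ARN and thresholds are illustrative). The kwargs match CloudWatch's put_metric_alarm API.

```python
def queue_age_alarm(queue_name: str, topic_arn: str, max_age_s: int = 600) -> dict:
    """Alarm when the oldest message exceeds max_age_s (stuck consumer)."""
    return {
        "AlarmName": f"{queue_name}-oldest-message-age",
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateAgeOfOldestMessage",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 5,   # sustained for 5 minutes before paging
        "Threshold": max_age_s,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],
    }


# import boto3
# boto3.client("cloudwatch").put_metric_alarm(**queue_age_alarm("ai-input-queue", topic_arn))
```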
Security. Worker has minimal IAM — sqs:Receive* on one queue, bedrock:InvokeModel on a specific model ARN. Result bus events do not include the raw input — only a reference.
Reliability. SQS visibility timeout set to at least 6× the worker's function timeout (AWS guidance for Lambda event source mappings; covers retries). DLQ is the safety net. The idempotency table prevents duplicate side effects.
Performance efficiency. Batch size on the SQS trigger: 5–10 messages per invocation amortizes Lambda init. Bedrock calls inside the worker can run in parallel; boto3 is synchronous, so use a small thread pool (or aioboto3 if you want asyncio).
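The in-worker parallelism can be sketched with a thread pool; boto3 low-level clients are thread-safe, so one shared client can serve all workers. The `call_model` hook stands in for the Bedrock invocation.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch(items, call_model, max_workers=5):
    """Run the model call for each item in a small thread pool.

    Keep max_workers modest: total in-flight Bedrock calls is
    reserved_concurrency x max_workers, and that product must stay
    within the Bedrock quota, not reserved concurrency alone.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(call_model, items))
```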
Cost optimization. Drop to Haiku unless quality demands Sonnet. Compress / dedup events at the producer side. Set short TTL on the idempotency table (24 h is plenty).
Sustainability. No idle compute; queue-driven scaling minimises waste.
Trade-offs¶
Use this when:
- Latency is minutes-tolerant.
- You need durable retries and a DLQ.
- Multiple downstream consumers want to react to completion.
Do NOT use this when:
- The user is waiting on the result interactively — go synchronous, see arch 03.
- Strict ordering is required — SQS Standard does not preserve order. Switch to SQS FIFO (lower throughput) or Kinesis.
- The processing is sub-100 ms — the queueing overhead dominates.
Terraform skeleton¶
See terraform/ — input/output buses, SQS + DLQ, worker Lambda with reserved concurrency, IAM roles, DynamoDB tables.