04 — Event-driven AI processing

Async classification, enrichment or moderation of incoming items at scale, with retries, DLQ and back-pressure.

Problem statement

A stream of items (uploaded documents, customer messages, transactions) needs to be processed by a model — classify, summarize, extract entities, moderate. Latency requirement is "minutes", not seconds. Volume is bursty: 10x spikes during business hours.

You want at-least-once processing, automatic retries, a dead-letter destination for poison pills, and a clean way to fan out to multiple downstream consumers of the result.

Components

  • Amazon EventBridge. Schema-aware bus that accepts events from multiple producers (apps, SaaS, S3 notifications). Rule-based routing.
  • Amazon SQS Standard. Decoupling queue between EventBridge and the worker — gives us back-pressure and a DLQ.
  • AWS Lambda — worker. Pulls from SQS, calls Bedrock, writes results.
  • Amazon Bedrock — InvokeModel. The actual AI work.
  • Amazon DynamoDB. Idempotency table (dedup keys) and results store.
  • Amazon SQS — DLQ. For messages that fail after max receive count.
  • Amazon EventBridge — result bus. Emits a ProcessingCompleted event downstream consumers subscribe to (notifications, indexing, etc.).
  • Amazon CloudWatch. Queue-depth alarms, Bedrock throttle metric, worker error rate.

Diagram

flowchart LR
    Producers[Producers<br/>app / SaaS / S3] --> EB[EventBridge<br/>input bus]
    EB --> Queue[(SQS Standard)]
    Queue --> Worker[Lambda worker]
    Worker -->|InvokeModel| Bedrock[Amazon Bedrock]
    Worker --> DDB[(DynamoDB<br/>idempotency + results)]
    Worker --> EBout[EventBridge<br/>result bus]
    EBout --> Notify[SNS / email]
    EBout --> Index[Indexer / search]
    Queue -.failed.-> DLQ[(SQS DLQ)]
    DLQ --> CW[CloudWatch alarm]

Decisions

D1 — SQS between EventBridge and the worker, not direct EventBridge → Lambda

Context. EventBridge → Lambda is supported, but the failure semantics are weak: the invocation is asynchronous, the retry policy is coarse (you tune maximum attempts and event age, not per-message visibility), and a failed event lands in the rule's dead-letter queue only if one was configured.

Decision. EventBridge → SQS → Lambda. SQS provides visibility timeout, max receives, configurable retry, and a DLQ that we own.

Alternatives. EventBridge Pipes — also viable; adds built-in filtering and enrichment and supports more source types (Kinesis, DynamoDB Streams, SQS). Worth revisiting as your producers grow.

Consequences. One extra hop (~10 ms). Tiny price for first-class retry semantics.
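Those retry semantics depend on the worker reporting failures per message rather than per batch. A minimal sketch of the Lambda handler, assuming `ReportBatchItemFailures` is enabled on the event source mapping; `process` is a placeholder for the real work (Bedrock call, results write):

```python
import json


def process(record: dict) -> None:
    """Placeholder for the real work (Bedrock call, DynamoDB write).
    Raise on failure so SQS redelivers only this message."""
    body = json.loads(record["body"])
    if "detail" not in body:
        raise ValueError("malformed event")


def handler(event, context):
    # Partial batch response: list the messageIds that failed so SQS
    # retries only those, not the whole batch.
    failures = []
    for record in event["Records"]:
        try:
            process(record)
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without the partial batch response, one poison pill forces redelivery of every message in its batch, inflating Bedrock spend on the retries.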

D2 — Idempotency keys in DynamoDB, not "best effort"

Context. SQS is at-least-once. Bedrock calls are expensive and have side effects (write to results store).

Decision. Compute a deterministic idempotency_key from the event payload, conditional-PutItem into DynamoDB before invoking Bedrock. If the put fails (key exists), skip.

Alternatives. Rely on the worker being deterministic — fragile, breaks the moment you add a side effect.

Consequences. One extra DynamoDB call per event. Saves money and prevents duplicate downstream events.

D3 — Result bus separate from input bus

Context. Downstream consumers want to react to completed processing — but should not see the raw inputs.

Decision. Emit ProcessingCompleted events on a second EventBridge bus. Schema includes only the result + a reference to the original event.

Alternatives. Reuse the input bus and tag events. Couples producers to consumers; harder to govern.

Consequences. Two buses to monitor. Worth it for governance.
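A sketch of the completion event, assuming a result bus named `ai-results` and source string `ai.processing` (both illustrative); the `ProcessingCompleted` detail-type is the one named above. The detail carries only the result plus a reference to the original item:

```python
import json


def completion_entry(item_id: str, result: dict,
                     bus_name: str = "ai-results") -> dict:
    # Shape of one entry for events.put_events. The detail never
    # includes the raw input payload, only a reference to it.
    return {
        "EventBusName": bus_name,
        "Source": "ai.processing",
        "DetailType": "ProcessingCompleted",
        "Detail": json.dumps({"item_id": item_id, "result": result}),
    }


# Usage (requires boto3 and AWS credentials):
# events = boto3.client("events")
# events.put_events(Entries=[completion_entry("doc-123", {"label": "invoice"})])
```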

D4 — Reserved concurrency on the worker

Context. Without a cap, a backlog spike triggers thousands of concurrent Bedrock calls — instant throttling.

Decision. Reserved concurrency on the worker matched to your Bedrock TPS quota (e.g. 20).

Alternatives. Provisioned concurrency — costs idle capacity. Reserved is free.

Consequences. Under a spike, the SQS queue grows but Bedrock stays healthy. Backlog drains gracefully.

Cost analysis

Sizing          | Events / mo | Avg tokens / event | Approx. monthly USD
S — pilot       | 50 000      | 800                | ~ $80
M — product     | 1 000 000   | 1 200              | ~ $1 100
L — high volume | 25 000 000  | 1 500              | ~ $28 000

Inputs (M sizing):

  • EventBridge: 1M events × $1/M × 2 buses = ~$2
  • SQS: 1M × 2 (send + receive) = 2M requests, > free tier → ~$0.40
  • Lambda: 1M × 1.5 s × 512 MB = ~750k GB-s → ~$12
  • Bedrock Claude Haiku: 1M × 1.2k tokens = 1.2B tokens (60/40 in/out) → ~$900
  • DynamoDB on-demand: 1M reads + 2M writes + storage → ~$50
  • CloudWatch: ~$50
  • Misc: ~$85

Note the use of Haiku at M scale — switch to Sonnet for higher quality at ~3x cost.
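The Bedrock line dominates, so it is worth being able to recompute it as volumes or models change. A small calculator; the per-1k-token prices are parameters you fill in from the current price list, not authoritative figures:

```python
def bedrock_cost(events: int, tokens_per_event: int, in_ratio: float,
                 price_in_per_1k: float, price_out_per_1k: float) -> float:
    """Monthly token cost. in_ratio is the input share of total tokens
    (the sizing above assumes a 60/40 input/output split)."""
    total = events * tokens_per_event
    tokens_in = total * in_ratio
    tokens_out = total * (1 - in_ratio)
    return (tokens_in / 1000) * price_in_per_1k \
         + (tokens_out / 1000) * price_out_per_1k
```

For the M sizing, plug in 1 000 000 events × 1 200 tokens at 0.6 input ratio with current Haiku prices; swapping in Sonnet prices shows the ~3x jump directly.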

Well-Architected review

Operational excellence. ApproximateAgeOfOldestMessage alarm on the queue (catches stuck consumers). DLQ alarm that pages when the visible-message count stays above 0 for 5 minutes.

Security. Worker has minimal IAM — sqs:Receive* on one queue, bedrock:InvokeModel on a specific model ARN. Result bus events do not include the raw input — only a reference.

Reliability. Set the SQS visibility timeout to at least 6× the p99 worker duration (e.g. p99 of 30 s → 180 s timeout) so in-flight retries never overlap. The DLQ is the safety net. The idempotency table prevents double side effects.

Performance efficiency. Batch size on the SQS trigger: 5–10 messages per invocation amortizes Lambda init. Bedrock calls inside the worker can run in parallel via asyncio.
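Since boto3's Bedrock client is synchronous, the in-worker parallelism mentioned above is easiest with `asyncio.to_thread` (or a plain thread pool). A sketch, with a semaphore capping parallelism so one batch cannot blow through the TPS quota; `invoke` stands in for the actual `bedrock-runtime` `invoke_model` call:

```python
import asyncio


async def invoke_all(invoke, payloads, max_parallel: int = 5):
    """Run blocking `invoke` calls concurrently, at most `max_parallel`
    at a time. Returns results in the same order as `payloads`."""
    sem = asyncio.Semaphore(max_parallel)

    async def one(payload):
        async with sem:
            # boto3 is synchronous; to_thread keeps the event loop free.
            return await asyncio.to_thread(invoke, payload)

    return await asyncio.gather(*(one(p) for p in payloads))
```

Inside the handler: `results = asyncio.run(invoke_all(call_bedrock, batch))`, where `call_bedrock` is your own wrapper around the Bedrock client.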

Cost optimization. Drop to Haiku unless quality demands Sonnet. Compress / dedup events at the producer side. Set a short TTL on the idempotency table (24 h is plenty).

Sustainability. No idle compute; queue-driven scaling minimises waste.

Trade-offs

Use this when:

  • Latency is minutes-tolerant.
  • You need durable retries and a DLQ.
  • Multiple downstream consumers want to react to completion.

Do NOT use this when:

  • The user is waiting on the result interactively — go synchronous, see arch 03.
  • Strict ordering is required — SQS Standard does not preserve order. Switch to SQS FIFO (lower throughput) or Kinesis.
  • The processing is sub-100 ms — the queueing overhead dominates.

Terraform skeleton

See terraform/ — input/output buses, SQS + DLQ, worker Lambda with reserved concurrency, IAM roles, DynamoDB tables.