The Outbox Pattern

AUTHOR: M. ABDELNABY
DATE: 2026-04-08
CATEGORY: Architecture

SYNOPSIS

Dual-writes are a trap. Every time you write to a database and publish an event in the same codepath without a transaction boundary around both, you are one crash away from silent data inconsistency. The outbox pattern is how you fix that — without distributed transactions.

DESCRIPTION

The pattern is simple enough to explain in one sentence: write your event to the same database as your state change, in the same transaction, then have a separate process relay it to the message broker.

Simple to explain. Consistently misunderstood. Most engineers reach for the dual-write before they reach for the outbox, usually because the dual-write feels like the straightforward solution. It is not. It is a race condition with a marketing problem.

This article walks through why dual-writes fail, what the outbox pattern actually looks like in practice, and the failure modes you still have to handle even after you adopt it.

THE DUAL-WRITE TRAP

The naive implementation looks like this:

def place_order(user_id, items):
    order = db.orders.insert({
        "user_id": user_id,
        "items": items,
        "status": "placed"
    })

    event_bus.publish("order.placed", {
        "order_id": order.id,
        "user_id": user_id
    })

    return order

There are two failure windows here. In the first, the process crashes after the DB write but before the publish: the order exists, the event never fires, and downstream services never know. In the second, the DB write succeeds and the publish succeeds, but the caller never sees the acknowledgement (a timeout, a dropped connection), so a retry loop runs the whole function again. Now you have a duplicate order and a duplicate event.

Neither of these is a theoretical edge case. They are what happens when your deployment restarts mid-request. They are what happens on a Tuesday.
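The first failure window can be reproduced in a few lines. This is a minimal sketch, not the article's code: FakeBus, CrashAfterWrite, and place_order_dual_write are hypothetical stand-ins, with a dict playing the database and an in-memory list playing the broker.

```python
class CrashAfterWrite(Exception):
    """Stands in for the process dying between the two writes."""


class FakeBus:
    """Hypothetical in-memory event bus that can be told to fail."""

    def __init__(self, fail=False):
        self.fail = fail
        self.published = []

    def publish(self, topic, payload):
        if self.fail:
            raise CrashAfterWrite(topic)
        self.published.append((topic, payload))


def place_order_dual_write(orders, bus, user_id, items):
    # Write 1: the state change is durable as soon as this assignment runs.
    order_id = len(orders) + 1
    orders[order_id] = {"user_id": user_id, "items": items, "status": "placed"}
    # Write 2: a separate system, with nothing tying it to write 1.
    bus.publish("order.placed", {"order_id": order_id, "user_id": user_id})
    return order_id


orders = {}
bus = FakeBus(fail=True)
try:
    place_order_dual_write(orders, bus, user_id=7, items=["book"])
except CrashAfterWrite:
    pass

# One order was stored, zero events were published.
print(len(orders), len(bus.published))
```

Nothing in this codepath can undo the first write once the second one fails, which is the gap the rest of the article closes.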

THE CORRECT MODEL: TRANSACTIONAL OUTBOX

Write the event as a row in your own database, inside the same transaction as the state change. Atomicity is guaranteed at the DB level — either both records commit, or neither does. A separate relay process reads undelivered outbox rows and publishes them.

-- The outbox table lives in your application database
CREATE TABLE outbox_events (
    id          UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    topic       TEXT NOT NULL,
    payload     JSONB NOT NULL,
    created_at  TIMESTAMPTZ DEFAULT NOW(),
    published_at TIMESTAMPTZ
);

def place_order(user_id, items):
    with db.transaction():
        order = db.orders.insert({
            "user_id": user_id,
            "items": items,
            "status": "placed"
        })

        db.outbox_events.insert({
            "topic": "order.placed",
            "payload": {
                "order_id": str(order.id),
                "user_id": str(user_id)
            }
        })

    return order

The application codepath is now complete. No event bus touched. No network call made. The transaction either commits both rows or neither. The race condition is gone.
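The all-or-nothing behaviour is easy to check locally. The sketch below assumes SQLite (through Python's standard sqlite3 module) in place of the application database, a trimmed-down schema, and a hypothetical fail_before_commit flag that simulates a crash inside the transaction.

```python
import json
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id TEXT PRIMARY KEY, user_id TEXT, status TEXT);
    CREATE TABLE outbox_events (
        id TEXT PRIMARY KEY,
        topic TEXT NOT NULL,
        payload TEXT NOT NULL,
        published_at TEXT
    );
""")


def place_order(conn, user_id, fail_before_commit=False):
    order_id = str(uuid.uuid4())
    # "with conn" commits on success and rolls back if the block raises.
    with conn:
        conn.execute(
            "INSERT INTO orders (id, user_id, status) VALUES (?, ?, 'placed')",
            (order_id, user_id),
        )
        conn.execute(
            "INSERT INTO outbox_events (id, topic, payload) VALUES (?, ?, ?)",
            (str(uuid.uuid4()), "order.placed",
             json.dumps({"order_id": order_id, "user_id": user_id})),
        )
        if fail_before_commit:
            raise RuntimeError("simulated crash before commit")
    return order_id


def counts(conn):
    orders = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    events = conn.execute("SELECT COUNT(*) FROM outbox_events").fetchone()[0]
    return orders, events


place_order(conn, "u1")
try:
    place_order(conn, "u2", fail_before_commit=True)
except RuntimeError:
    pass

# The failed call left neither an order nor an outbox row behind.
print(counts(conn))
```

The two counts always move together: the successful call adds one row to each table, and the failed call adds none.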

THE RELAY PROCESS

A background worker polls the outbox table, publishes unpublished events to the message broker, then marks them delivered.

import time

def relay_outbox():
    while True:
        # Select and mark inside one transaction: the row locks taken by
        # FOR UPDATE SKIP LOCKED are released when the transaction ends.
        with db.transaction():
            events = db.query("""
                SELECT id, topic, payload
                FROM outbox_events
                WHERE published_at IS NULL
                ORDER BY created_at
                LIMIT 100
                FOR UPDATE SKIP LOCKED
            """)

            for event in events:
                event_bus.publish(event.topic, event.payload)

                db.execute("""
                    UPDATE outbox_events
                    SET published_at = NOW()
                    WHERE id = %s
                """, event.id)

        time.sleep(0.5)

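FOR UPDATE SKIP LOCKED is load-bearing here. It lets you run multiple relay instances without them stepping on each other: each instance locks a batch, processes it, and moves on. Rows locked by one instance are skipped by the other instances' queries instead of blocking them. The locks last only as long as the enclosing transaction, so the SELECT and the UPDATE must run inside the same one.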
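A single relay pass can be exercised end to end without a broker. The sketch below makes several assumptions: SQLite stands in for the application database (it has no FOR UPDATE SKIP LOCKED, so this models one relay instance only), relay_once and the crash_before_mark flag are hypothetical, and the "broker" is a plain list.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE outbox_events (
        id INTEGER PRIMARY KEY,
        topic TEXT NOT NULL,
        payload TEXT NOT NULL,
        published_at TEXT
    );
    INSERT INTO outbox_events (topic, payload)
    VALUES ('order.placed', '{"order_id": "o-1"}');
""")


def relay_once(conn, publish, crash_before_mark=False):
    """Publish every unpublished row, then mark it as published."""
    rows = conn.execute(
        "SELECT id, topic, payload FROM outbox_events "
        "WHERE published_at IS NULL ORDER BY id LIMIT 100"
    ).fetchall()
    for event_id, topic, payload in rows:
        publish(topic, json.loads(payload))
        if crash_before_mark:
            raise RuntimeError("simulated crash after publish")
        with conn:
            conn.execute(
                "UPDATE outbox_events SET published_at = datetime('now') "
                "WHERE id = ?",
                (event_id,),
            )


sent = []


def publish(topic, payload):
    sent.append((topic, payload))


try:
    relay_once(conn, publish, crash_before_mark=True)
except RuntimeError:
    pass
relay_once(conn, publish)  # next poll cycle picks the same row up again
relay_once(conn, publish)  # nothing left to publish

print(len(sent))
```

Because the row is marked only after the publish returns, a crash between the two steps never loses the event. The price is that the same event can be sent more than once.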

FAILURE MODES YOU STILL OWN

The outbox pattern eliminates the dual-write race. It does not eliminate all delivery concerns.

At-least-once delivery. If the relay publishes an event then crashes before marking it delivered, the event will be published again on the next poll cycle. Consumers must be idempotent. This is not optional.

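# Consumer must handle duplicate events gracefully.
# Assumes event.id is the outbox row id, carried through by the relay.
def handle_order_placed(event):
    # Record the event and do the work in one transaction, so a crash
    # cannot leave the event marked as processed but never fulfilled.
    with db.transaction():
        if db.processed_events.exists(event.id):
            return  # already handled, skip

        db.processed_events.insert(event.id)
        fulfill_order(event.payload["order_id"])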
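One way to make the duplicate check race-free is to let a unique constraint do it. The sketch below is an illustration under assumptions, not a prescribed design: SQLite stands in for the consumer's database, the fulfillments table is a hypothetical side effect, and the approach only works as written when that side effect lives in the same database as the processed_events table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE processed_events (event_id TEXT PRIMARY KEY);
    CREATE TABLE fulfillments (order_id TEXT NOT NULL);
""")


def handle_order_placed(conn, event_id, payload):
    """Return True if the event was handled, False if it was a duplicate."""
    try:
        with conn:  # one transaction: dedup marker plus side effect
            # The primary key rejects a second insert of the same event_id.
            conn.execute(
                "INSERT INTO processed_events (event_id) VALUES (?)",
                (event_id,),
            )
            conn.execute(
                "INSERT INTO fulfillments (order_id) VALUES (?)",
                (payload["order_id"],),
            )
    except sqlite3.IntegrityError:
        return False
    return True


first = handle_order_placed(conn, "evt-1", {"order_id": "o-1"})
second = handle_order_placed(conn, "evt-1", {"order_id": "o-1"})  # redelivery
fulfilled = conn.execute("SELECT COUNT(*) FROM fulfillments").fetchone()[0]
print(first, second, fulfilled)
```

The redelivered event hits the primary key, the transaction rolls back, and the order is fulfilled exactly once even though the event arrived twice.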

Relay lag. The outbox introduces a polling delay between the state change and the event delivery. With the 0.5-second sleep in the relay above, that delay is typically under a second, which is acceptable for most workloads. If you need lower latency, replace the polling relay with CDC (Change Data Capture) against your database's replication log. Debezium is a widely used implementation for Postgres and MySQL.
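Relay lag is measurable: it is roughly the age of the oldest row the relay has not yet published. The helper below is a hypothetical monitoring sketch, assuming SQLite as a stand-in and timestamps stored as epoch seconds to keep the arithmetic simple.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE outbox_events (
        id INTEGER PRIMARY KEY,
        created_at REAL NOT NULL,   -- epoch seconds, for easy arithmetic
        published_at REAL
    )
""")


def outbox_lag_seconds(conn, now):
    """Age of the oldest unpublished event, or 0.0 if the outbox is drained."""
    oldest = conn.execute(
        "SELECT MIN(created_at) FROM outbox_events WHERE published_at IS NULL"
    ).fetchone()[0]
    return 0.0 if oldest is None else now - oldest


conn.execute(
    "INSERT INTO outbox_events (created_at, published_at) VALUES (100.0, 100.4)"
)
conn.execute("INSERT INTO outbox_events (created_at) VALUES (101.0)")
conn.execute("INSERT INTO outbox_events (created_at) VALUES (102.5)")

lag = outbox_lag_seconds(conn, now=103.0)
print(lag)
```

A value that keeps climbing suggests the relay is stalled or falling behind, which is worth an alert before anyone reaches for CDC.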

Outbox table growth. Without a cleanup job, the outbox table grows indefinitely. Archive or delete published events on a schedule. Keep enough history to debug delivery failures; not enough to bloat your primary database.

-- Run on a schedule: delete events published more than 7 days ago
DELETE FROM outbox_events
WHERE published_at IS NOT NULL
  AND published_at < NOW() - INTERVAL '7 days';
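On a large table, a single DELETE can hold locks for a long time. One option is to purge in small batches, each in its own transaction. The sketch below assumes SQLite as a stand-in, epoch-second timestamps, and a hypothetical purge_published helper; the seven-day retention mirrors the query above.

```python
import sqlite3

SEVEN_DAYS = 7 * 24 * 3600

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE outbox_events (
        id INTEGER PRIMARY KEY,
        published_at REAL          -- epoch seconds; NULL means unpublished
    )
""")


def purge_published(conn, now, batch_size=1000):
    """Delete events published more than seven days before `now`."""
    cutoff = now - SEVEN_DAYS
    total = 0
    while True:
        with conn:  # one short transaction per batch
            deleted = conn.execute(
                "DELETE FROM outbox_events WHERE id IN ("
                "  SELECT id FROM outbox_events"
                "  WHERE published_at IS NOT NULL AND published_at < ?"
                "  LIMIT ?)",
                (cutoff, batch_size),
            ).rowcount
        total += deleted
        if deleted < batch_size:
            return total


now = 10_000_000.0
old = now - SEVEN_DAYS - 1
rows = [(old,)] * 5 + [(now - 60,), (None,)]
conn.executemany("INSERT INTO outbox_events (published_at) VALUES (?)", rows)

removed = purge_published(conn, now, batch_size=2)
remaining = conn.execute("SELECT COUNT(*) FROM outbox_events").fetchone()[0]
print(removed, remaining)
```

Unpublished rows and recently published rows are never touched, so the purge is safe to run while the relay is working.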

WHEN NOT TO USE IT

The outbox pattern is overhead. Do not reach for it by default.

If your operation does not need to notify downstream services, it does not need an outbox. If your system already uses a database that supports atomic transactions spanning external systems (rare, and you would know), you may not need it. If your event publishing is fully synchronous and the only consumer is within the same service boundary, skip it.

The outbox earns its complexity when a state change in your service must reliably trigger work in another service, and "reliably" means you cannot tolerate silent drops or undetected duplicates.

MODEL

Write your side effects into the same transaction as your state. Publish them asynchronously. Accept that delivery is at-least-once and design consumers accordingly. The outbox does not give you exactly-once — it gives you durable, inspectable, replayable delivery. That is the correct tradeoff for most systems.

SEE ALSO

idempotency-keys(3), change-data-capture(5), at-least-once-delivery(2), distributed-transactions(8)
