Tape vs.
The honest comparisons, organised by feature. Tape is the agent-runtime ceiling
(scoped to ADK); Temporal is a (much more mature) durable-execution floor; the
others sit in the same agent layer Tape sits in, with different opinions about
where the runtime should live. They're not all rivals — the spec lists
"Temporal/Restate engine behind tape.proto" as v2 — but they're the live
choices for "make my agent durable" today.
- §1 Tape vs Temporal — feature-parity audit
- §2 Tape vs LangGraph durable execution — same layer, different shape
- §3 Tape vs Pydantic AI + DBOS — same conclusion, different framework
1. Tape vs Temporal
What Tape covers, end to end
| Temporal feature | Tape | Notes |
|---|---|---|
| Workflows (durable orchestration) | ✓ (your ADK agent) | Tape doesn't ask for workflow code — your agent stays as it is; TapePlugin journals decisions and effects through ADK's existing hooks |
| Activities (boundary-crossing units) | ✓ (every tool call) | @tape.effect(compensate=…, status_check=…, retry=…) declares them; TapePlugin journals input/output, status, errors |
| Activity retries with backoff | ✓ | @tape.effect(retry=tape.RetryPolicy(max_attempts=N, initial_interval_s=…, backoff_coefficient=…, max_interval_s=…, jitter=…, retry_on=(…,), non_retryable=(…,))) — same idempotency key passed to the counterparty on every attempt |
| Signals (named messages to a running run) | ✓ | AwaitSignal / SendSignal; tape.gate_tool("approval") is the LongRunningFunctionTool shape |
| Timers (workflow.sleep) | ✓ | tape.set_timer(run_id, fire_at_ms, kind) / tape.cancel_timer; the timer reactor (tape.reactors.fire_due_timers_once) fires them. Built-in kinds: gate_timeout, redrive, reconcile, plus your own via a callback |
| Cancellation | ✓ (cooperative) | tape.cancel_run(run_id, reason=…) marks the run CANCELLED; TapePlugin(check_cancellation=True) checks at the next model/tool boundary and bails (or tape.is_cancelled(tool_context) on demand). Not preemptive — a tool body that's mid-syscall keeps running until it returns |
| Compensation / Sagas | ✓ | @tape.effect(compensate=…) registers the inverse; tape.compensate_run(run_id) walks obligations LIFO; failures land in stuck (never silently "compensated") |
| Side effects (workflow.side_effect) | ✓ | tape.sample(tool_context, fn) calls fn once per run, journals the result, returns-from-history on re-drive; tape.now(), tape.uuid(), tape.random() are pre-wrapped |
| Heartbeats (extend an activity's deadline) | ✓ (run-level) | tape.heartbeat(tool_context) extends the run's lease — for long-running tool bodies, so the recovery reactor doesn't decide the run is stale and re-drive concurrently |
| Replay (re-derive workflow state from history) | ✓ | The whole point: the agent re-drives via ADK's invocation_id; TapePlugin short-circuits confirmed effects and replays recorded decisions; the resume point is the first seq the journal has no record of |
| Idempotency keys | ✓ (and named-by-decision) | The key is run/decision-N/<tool>/<call_idx> (or the ADK function_call_id derivative) — not a hash of inputs, which can be recomputed differently on replay |
| The third outcome (unknown ack) | ✓ | EffectStatus.UNKNOWN + the reconciler reactor that calls the registered status_check. Temporal doesn't have this as a first-class status; you build it on top |
| Budget as run state | ✓ | tape.Budget(usd_cap=…, token_cap=…); AdmitBudget before, ChargeBudget after; spent counters survive crashes |
| Continue-as-new (truncate history; restart) | ✗ (planned) | Sketched in spec §6 / §13; not yet implemented. Workaround: end the run TERMINAL with a summary state, start a new session with that state as the seed |
| Schedules (cron / interval) | ⚙️ (use timers) | Pattern: set_timer(fire_at_ms=next_cron_tick, kind="periodic", payload=…); the handler does the work and re-arms with set_timer(fire_at_ms=next_after_that). A dedicated tape_schedules table + cron parsing is a v2 add |
| Child workflows | ✗ (planned) | Sketched — parent_run_id on tape_runs + cascading cancel + "wait for child" via signals. Workaround: spawn a fresh begin_run and signal back when done |
| Versioning / patching (workflow.patched) | ✓ (manual) | tape.policy_is(tool_context, "cfo-2026.05") reads the recorded policy_version; TapePlugin records it on every decision, so agents can branch on it. No automatic "use the new code path only for new runs" mechanism; you write the branch |
| Queries (read-only call on a running workflow) | ⚙️ (read the journal) | The journal + the session state are the queryable surface — tape.TapeClient.get_run / get_effect / get_session / list_obligations / subscribe_events. No "register a named query handler on the workflow" mechanism. The cross-run WAL tail (SubscribeEvents) covers observability |
| Updates (synchronous-mutating RPC, with validation) | ✗ | Use signals (SendSignal is the closest — async, no return value beyond the eventual run output) |
| Workers / task queues | ⚙️ (different model) | Tape doesn't have a worker pool; the agent process is the worker. Recovery is the reactor pulling from ListRunsToRecover. Multiple agent replicas serialise per-run via the lease |
| Pluggable persistence backends | ✓ | sqlite:… / postgres:… / alloydb:… / bigtable:… chosen by URL at deploy time (TAPE_STORE). Adding a backend = implementing the RunStore trait |
| Horizontal scaling of the server | ✓ | The Rust server is stateless; run N replicas behind a load balancer; the lease + idempotent RPCs make a double-drive harmless |
| Reactive shared state (coordinate through journaled state, not messages) | ✓ (Tape-original; treatise §IX ⑥) | tape.set_value(ns, key, v, if_version=…) writes monotonically-versioned values with optional CAS; tape.get_value(ns, key) reads; tape.watch_value(ns, key, from_version=0) streams ValueEvents for the snapshot + every change with (prev_version, prev_value_json) attached — so a subscriber observes the transition (X: 70 → 90) rather than just the latest. tape.delete_value tombstones (watchers see one final event with deleted=True). Different primitive from signals (point-to-point, single-consumer) and the WAL tail (cross-run, journal-of-everything) — this is shared state, fan-out, by-key. Temporal's nearest analogue is workflow updates + queries; the agent equivalent there requires rolling your own |
| Push-based event consumption | ✓ (WAL tail) | SubscribeEvents streams cross-run journal entries (ts, run_id, seq-ordered); wire it through tape.reactors.run_event_fanout(url, sink=…) for an in-process consumer or tape.reactors.run_outbox_relay(url, sink, cursor_path=…) for an exactly-once-effective publisher (durable cursor + at-least-once delivery + consumer dedup on (run_id, seq)). Built-in sinks: LogSink, WebhookSink, PubSubSink (Google Cloud Pub/Sub, lazy import). On Bigtable: "use change streams" for cross-run tail |
| Multi-language SDKs | ✓ (Python full · TS / Go / Java wired-client + tests) | The Python SDK is the reference (ADK plugin + session service + reactors + sinks). TS / Go / Java each ship: a working TapeClient covering all RPCs (run lifecycle, decisions, effects with the dedup short-circuit, obligations, budget admit/charge, gates, timers, reconciliation, the WAL tail, sessions), tape:// plaintext + tapes:// TLS (Java accepts a Bearer token; TS/Go auto-attach a Google ID token from ADC), and a smoke test that round-trips the full lifecycle against a real tape-server. The per-language ADK adapter (TapePlugin / TapeSessionService for each language's ADK port) is mechanical work on top — the protocol is the stable surface |
| Web UI / Cloud (Temporal Cloud) | ✗ | Tape has no UI (just SubscribeRun / SubscribeEvents as machine feeds), no managed offering. Temporal Cloud removes ops; Tape is yours to run |
| Search attributes (custom indexed metadata) | ✗ | Roadmap |
| Replay-testing tooling (re-execute history with new code) | ✗ | Roadmap (the journal + the session events have everything needed) |
| Determinism enforcement (sandbox detects diverging replay) | ✗ | Tape can't sandbox ADK Python code. P11 documents the contract ("your code must be deterministic, route non-determinism through tape.sample / tools"); Temporal enforces it. Real difference, with real footguns on the Tape side if ignored |
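The retry row names the RetryPolicy fields but not the schedule they produce. The sketch below is a self-contained toy, not Tape's implementation: it assumes Temporal-style exponential backoff (initial_interval_s multiplied by backoff_coefficient per retry, capped at max_interval_s), treats jitter as an extra random fraction of each delay, and uses a hypothetical call_with_retry helper to show the one property the table does state: every attempt carries the same idempotency key.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    """Hypothetical toy mirroring tape.RetryPolicy's field names; semantics are assumed."""
    max_attempts: int = 3
    initial_interval_s: float = 1.0
    backoff_coefficient: float = 2.0
    max_interval_s: float = 30.0
    jitter: float = 0.0            # assumed: up to this fraction is added to each delay
    retry_on: tuple = (Exception,)
    non_retryable: tuple = ()


def backoff_delays(policy, rng=random.random):
    """Seconds to sleep between attempts (one fewer entry than max_attempts)."""
    delays = []
    for n in range(policy.max_attempts - 1):
        base = min(policy.initial_interval_s * policy.backoff_coefficient ** n,
                   policy.max_interval_s)
        delays.append(base * (1 + policy.jitter * rng()))
    return delays


def call_with_retry(fn, idempotency_key, policy, sleep=lambda s: None):
    """Hypothetical helper: call fn(idempotency_key) until it succeeds, raises a
    non-retryable error, or runs out of attempts. The key never changes."""
    delays = backoff_delays(policy)
    for attempt in range(policy.max_attempts):
        try:
            return fn(idempotency_key)
        except policy.non_retryable:
            raise                       # e.g. a declined card: retrying cannot help
        except policy.retry_on:
            if attempt == policy.max_attempts - 1:
                raise                   # attempts exhausted
            sleep(delays[attempt])
```

Pass time.sleep as sleep for real use; the no-op default keeps the example instant. Checking non_retryable before retry_on matters because a "card declined" error is usually also an Exception.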
Choosing between Tape and Temporal
Pick Tape when the job is "make my ADK agent durable, with minimal change to
the agent, and I want the agent-shaped primitives" — decision ledger,
decision-keyed idempotency, the unknown state + reconciler, gates as durable
suspends, budget as run state, model-written compensation, journaled
non-determinism, without a workflow rewrite. You self-host the Rust server plus a
backend (SQLite, Postgres, AlloyDB, or Bigtable); the agent stays as ADK code.
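Of the primitives listed above, journaled non-determinism is the easiest to misread. A minimal model of the tape.sample contract, assuming results are keyed by call order within a run (the step-by-call-order scheme Tape uses for seq): the first drive calls the function and records the value; a re-drive returns the recorded value instead of calling again. SampleJournal and Drive are hypothetical names for illustration only.

```python
import random
import uuid


class SampleJournal:
    """Hypothetical toy journal of non-deterministic results, keyed by (run_id, call order)."""

    def __init__(self):
        self.records = {}

    def drive(self, run_id):
        """Start a (re-)drive of run_id; call order restarts at 0 each time."""
        return Drive(self, run_id)


class Drive:
    def __init__(self, journal, run_id):
        self.journal = journal
        self.run_id = run_id
        self.seq = 0

    def sample(self, fn):
        """Call fn once per (run_id, seq); on re-drive return the recorded value."""
        key = (self.run_id, self.seq)
        self.seq += 1
        if key not in self.journal.records:
            self.journal.records[key] = fn()   # first drive: call and journal
        return self.journal.records[key]       # re-drive: return from history


journal = SampleJournal()
first = journal.drive("run-1")
r = first.sample(random.random)
u = first.sample(lambda: str(uuid.uuid4()))
```

Because the key is call order, reordering sample calls between deploys would make a re-drive read the wrong record; that is the P11 determinism contract in miniature.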
Pick Temporal when you need a battle-tested, multi-language, generally-useful durable-execution platform — non-agent use cases included — with a managed option (Temporal Cloud), determinism enforced by the SDK, mature versioning, schedules, child workflows, search attributes, and a Web UI. The cost is expressing your agent as a workflow + activities, which for ADK is a rewrite.
Pick both — by putting Temporal under Tape (v2 in the spec: a Temporal-backed
RunStore). The agent keeps Tape's API, the durable execution is Temporal's;
you get the agent ceiling on top of the production-grade floor, which is the
combination the treatise's architecture diagrams point at.
What's deferred, and how to bridge it today
- Continue-as-new: end the run TERMINAL with a summary state in the
session; start a new session seeded with it. Plan: a
ContinueAsNew RPC + a tape.continue_as_new(tool_context) helper.
- Child runs: begin_run a sub-run, signal back when done. Plan: a parent_run_id qualifier + cascading cancel.
- Cron-style schedules: a periodic timer that re-arms itself. Plan: a
tape_schedules table + cron parsing, with a schedules reactor.
- Named queries: read get_run / get_session / subscribe_events. Plan: a RunQuery RPC that routes to an agent-registered handler (only useful while the agent process is alive, so of limited utility for the always-on case).
- Updates: use signals + the next decision boundary for the response. No near-term plan.
- Sandbox-enforced determinism: Python can't be sandboxed safely; lint
tape.sample usage; document P11 prominently.
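The cron-style workaround (a periodic timer whose handler re-arms itself) can be modelled in a few lines. Everything here is a self-contained toy: TimerTable, Timer and periodic_handler are hypothetical and only borrow the set_timer / fire_due_timers_once names from the text; a real deployment would persist timers in the Tape store and compute the next tick from a cron expression rather than a fixed interval.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Timer:
    fire_at_ms: int
    timer_id: int                                   # tie-breaker for equal fire times
    run_id: str = field(compare=False)
    kind: str = field(compare=False)
    payload: dict = field(compare=False, default_factory=dict)


class TimerTable:
    """Hypothetical in-memory stand-in for the server's timer table."""

    def __init__(self):
        self._heap = []
        self._ids = itertools.count()

    def set_timer(self, run_id, fire_at_ms, kind, payload=None):
        heapq.heappush(self._heap, Timer(fire_at_ms, next(self._ids), run_id, kind, payload or {}))

    def fire_due_timers_once(self, now_ms, handlers):
        """Pop every timer due at or before now_ms and dispatch it by kind."""
        fired = 0
        while self._heap and self._heap[0].fire_at_ms <= now_ms:
            timer = heapq.heappop(self._heap)
            handlers[timer.kind](self, timer)
            fired += 1
        return fired


def periodic_handler(interval_ms, work):
    """Handler for kind="periodic": do the work, then re-arm one interval later."""
    def handle(table, timer):
        work(timer)
        table.set_timer(timer.run_id, timer.fire_at_ms + interval_ms, timer.kind, timer.payload)
    return handle
```

An overdue timer here fires once per missed tick (catch-up). Re-arming at the first tick after now would skip the backlog instead; which one you want depends on whether the periodic work is cumulative.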
Test coverage of this parity work
- tape/tests/test_features.py: retry policies (succeeds-after-retries, gives-up-on-non-retryable, exhausts-max-attempts), cancellation (cancel_run → CANCELLED; cancelled runs are not recoverable), policy-version branch.
- tape/tests/test_resume.py: the original kill-and-resume (3 cases).
- tape/tests/test_reactors.py: the timer reactor + the reconciler reactor.
- tape/tests/test_bigtable.py: the same kill-and-resume against the Bigtable backend (emulator-bootstrapped).
- tape/tests/test_values.py: the reactive store (write/get/CAS/delete roundtrip, CAS version-conflict rejection, and the headline X-70-to-90 watcher seeing both the snapshot and the transition with the previous value attached).
- Rust: cargo test covers the in-process store + the gRPC service.
2. Tape vs LangGraph durable execution
Source for LangGraph claims: https://docs.langchain.com/oss/python/langgraph/durable-execution.
LangGraph and Tape sit in the same layer: they make a graph-shaped agent
durable without asking the developer to rewrite the agent as a workflow. The
shape of the answer is different — LangGraph cuts the journal at node /
entrypoint boundaries and asks the user to wrap non-determinism in @task;
Tape cuts the journal at the decision and effect boundaries the
treatise's §IX names, and the user decorates tools with @tape.effect(...). Both then ride
on the counterparty's idempotency for the actual exactly-once-effective
guarantee at the wire.
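Since both layers end at the counterparty's dedup, what the key is named after matters. The toy below contrasts a key in the run/decision-N/<tool>/<call_idx> shape (assuming "run" means the run_id) with a hash of the tool's arguments, against a hypothetical Counterparty that dedupes on whatever key it receives. When a re-drive re-derives an argument slightly differently, only the decision-named key still collapses the two calls.

```python
import hashlib
import json


def decision_key(run_id, decision_n, tool, call_idx):
    """Named by the decision: identical on every re-drive of the same decision."""
    return f"{run_id}/decision-{decision_n}/{tool}/{call_idx}"


def input_hash_key(tool, args):
    """The alternative the text warns against: a key derived from the arguments."""
    blob = json.dumps({"tool": tool, "args": args}, sort_keys=True)
    return hashlib.sha256(blob.encode()).hexdigest()


class Counterparty:
    """Hypothetical payment API that applies each idempotency key at most once."""

    def __init__(self):
        self.applied = {}

    def refund(self, key, args):
        self.applied.setdefault(key, args)
        return len(self.applied)


# The model decided "refund 50.00" once; the re-drive re-derived the memo with a new timestamp.
first_args = {"amount_cents": 5000, "memo": "refund issued 10:00:01"}
redrive_args = {"amount_cents": 5000, "memo": "refund issued 10:00:07"}

by_decision, by_hash = Counterparty(), Counterparty()
for args in (first_args, redrive_args):
    by_decision.refund(decision_key("run-42", 3, "issue_refund", 0), args)
    by_hash.refund(input_hash_key("issue_refund", args), args)
# by_decision.applied holds 1 refund; by_hash.applied holds 2 (a double refund).
```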
| Question (the treatise's reactive-defence ⓵–⑦, applied to both) | LangGraph (durable execution) | Tape |
|---|---|---|
| Is state durable? | ✓ via checkpointer= on compile() (memory / SQLite / Postgres saver). Cut: at node boundaries (StateGraph) or entrypoint boundaries (Functional API). durability="sync"\|"async"\|"exit" chooses when the cut commits | ✓ via the Rust server + RunStore (SQLite/Postgres/AlloyDB/Bigtable). Cut: per decision (every model call) and per effect (every tool call's intent + outcome), each written in one txn with the ADK event |
| Does the trigger fire exactly-once? | Within a thread (thread_id), the node either ran to completion (its checkpoint is committed) or it didn't (it re-runs on resume). The dedup for an effect inside a node still rides on the counterparty | The decision-keyed idempotency key (run/decision-N/<tool>/<call_idx>) names the decision the model made, not its inputs. A confirmed effect short-circuits on re-drive; a pending effect re-issues with the same key; the counterparty dedupes |
| Is the handler itself durable? | Node body re-runs from the top on resume. Inside a Functional API entrypoint, @task-wrapped sub-units cache their results: a completed task returns from history, an incomplete one re-runs | The agent re-drives via ADK's invocation_id; recorded decisions short-circuit at before_model_callback; confirmed effects short-circuit at before_tool_callback. The body never runs for already-confirmed work |
| Where is the timer? | Within a node: ordinary Python (not durable across crashes). HITL interrupt() does hold the run across deploys via the checkpointer | tape.set_timer(run_id, fire_at_ms, kind); a timer reactor fires due timers across processes. Kinds: gate_timeout, redrive, reconcile, custom |
| Is the condition still true when the handler runs? | User discipline; no built-in atomic check-and-set | Optimistic versioning on the reactive store (tape.set_value(ns, key, v, if_version=…)) closes the TOCTOU race |
| Where is the journal? | Checkpoints are state-versioned snapshots of the graph state. Not a decision/effect/obligation ledger — that shape is user-built on top | Three explicit ledgers — decision, effect, obligation — interleaved by (run_id, seq). The journal is the audit |
| Is replay deterministic? | Documented contract: wrap non-deterministic operations in @task (Functional API) or in nodes; otherwise replay drifts. Not sandbox-enforced | Documented (P11): route non-determinism through tape.sample / tools. Not sandbox-enforced. Same footgun shape as LangGraph |
| Human in the loop | interrupt(payload) pauses the graph; invoke(Command(resume=…), config) resumes with the user's reply. Survives crashes via the checkpointer | tape.gate_tool("approval") returns pending (a LongRunningFunctionTool); SendSignal resolves it; the recovery loop re-invokes the run; ADK injects the signal payload as the tool result |
| Graceful drain | RunControl.request_drain() stops after the current superstep and saves a resumable checkpoint; invoke(None, config) resumes | tape.cancel_run(run_id, reason=…) marks the run CANCELLED; the plugin bails at the next model/tool boundary. Cooperative, not preemptive |
| The third outcome (unknown ack) | ✗ Not first-class. You either rerun the node (re-issuing without a counterparty-side key is unsafe) or build your own reconciler | ✓ EffectStatus.UNKNOWN + a registered status_check reactor that asks the counterparty and flips pending/unknown → confirmed or re-issues with the same key |
| Compensation / sagas | User-built. The graph can have a compensating branch, but there's no obligation ledger that runs LIFO on failure | @tape.effect(compensate=…) registers the inverse at commit; tape.compensate_run(run_id) walks obligations LIFO; failures land in stuck (never silently "compensated") |
| Budget as run state | User-built (carry counters in graph state; the user enforces the cap in a node) | tape.Budget(usd_cap=…, token_cap=…); AdmitBudget before, ChargeBudget after; spent counters survive crashes |
| Multi-agent coordination through journaled state | Subgraphs + shared state in the parent graph; no monotonically-versioned, fan-out-watchable shared store as a primitive | tape.set_value / get_value / watch_value — monotonically-versioned, CAS-able, watchers see the transition (X: 70 → 90) with the previous value attached |
| Time travel / fork | The checkpointer keeps every state version on the thread; you can invoke(Command(...), config={"configurable": {"thread_id": t, "checkpoint_id": c}}) to resume from any past checkpoint (the linked durable-execution page does not document fork-from-checkpoint; the broader docs do). The durable-execution doc's focus is resume, not time travel | Not a feature. Tape replays forward from the resume point; the journal supports replay-testing but Tape is not a versioned state store you fork |
| Language scope | Python and TypeScript SDKs of LangGraph itself | Python full (the ADK reference); TS / Go / Java wired-client + tests. Wire protocol is gRPC, so the runtime survives the agent's choice of language |
| Operational footprint | In-process library + a checkpointer backend (your DB). No separate server | A separate, stateless Rust server (one or more replicas behind a load balancer) + a backend (SQLite/Postgres/AlloyDB/Bigtable) |
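The "Does the trigger fire exactly-once?" and "Is the handler itself durable?" rows describe one mechanism from two sides. A toy model of it, assuming nothing about Tape's real storage: the intent is journaled as pending before the tool body runs, a crash leaves it pending, the re-drive re-issues with the same key (which the counterparty dedupes), and a confirmed effect is returned from the journal without running the body. EffectJournal, EmailService and Crash are hypothetical.

```python
from enum import Enum


class EffectStatus(Enum):
    PENDING = "pending"
    CONFIRMED = "confirmed"


class Crash(Exception):
    """Simulates the agent process dying mid-tool-call."""


class EffectJournal:
    """Hypothetical toy effect ledger keyed by idempotency key."""

    def __init__(self):
        self.status = {}
        self.output = {}

    def run_effect(self, key, body):
        if self.status.get(key) is EffectStatus.CONFIRMED:
            return self.output[key]                # confirmed: short-circuit, body skipped
        self.status[key] = EffectStatus.PENDING    # intent journaled BEFORE the act
        result = body(key)                         # a crash here leaves the effect PENDING
        self.status[key] = EffectStatus.CONFIRMED  # outcome journaled after the act
        self.output[key] = result
        return result


class EmailService:
    """Hypothetical counterparty that dedupes on the idempotency key."""

    def __init__(self):
        self.sent = {}

    def send(self, key, crash_after_send=False):
        self.sent.setdefault(key, f"message-{len(self.sent) + 1}")
        if crash_after_send:
            raise Crash("process died before the ack was journaled")
        return self.sent[key]
```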
The honest summary. LangGraph's durable execution and Tape are answering
adjacent questions with overlapping vocabulary. LangGraph asks "how do I make
this graph survive a crash?" and gives you a checkpointer, a @task
decorator, interrupt() for HITL, three durability modes, and a drain
primitive, all in-process and all bound to the graph's notion of "what is a
step". Tape asks "how do I make this ADK agent's decisions, effects,
and obligations survive a crash?" and gives you a separate journaling
server that the agent talks to over gRPC, with the third outcome (unknown),
obligation-ledger compensation, decision-keyed idempotency, and a reconciler
as first-class primitives — the §IX list, by name.
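The reconciler is the piece the table above marks as user-built on the LangGraph side. A minimal, self-contained sketch of one reconciler pass, under stated assumptions: status_check returns True (the counterparty has it), False (it never landed), or None (can't tell yet), and a reissue that returns without raising confirms the effect. The function and its tri-state contract are hypothetical, not the signature of Tape's registered status_check.

```python
from enum import Enum


class EffectStatus(Enum):
    PENDING = "pending"
    UNKNOWN = "unknown"
    CONFIRMED = "confirmed"


def reconcile_once(effects, status_check, reissue):
    """One pass of a hypothetical reconciler over {idempotency_key: EffectStatus}."""
    actions = {}
    for key, status in list(effects.items()):
        if status is EffectStatus.CONFIRMED:
            continue
        landed = status_check(key)
        if landed is True:
            effects[key] = EffectStatus.CONFIRMED
            actions[key] = "confirmed"
        elif landed is False:
            reissue(key)                          # same key: a late original is still deduped
            effects[key] = EffectStatus.CONFIRMED
            actions[key] = "reissued"
        else:
            effects[key] = EffectStatus.UNKNOWN   # leave it for the next pass
            actions[key] = "still-unknown"
    return actions
```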
Pick LangGraph's durable execution when your agent is already a
LangGraph graph, your durability needs end at "resume the graph from the last
node boundary", and the in-process checkpointer model fits your operational
shape. The Functional API + @task + interrupt() is a real, mature
implementation of the "checkpointed graph" pattern.
Pick Tape when your agent is an ADK agent, you need the §IX primitives
(unknown, model-written compensation, decision-keyed idempotency, gates as
durable suspends, budget as run state, coordination through journaled state),
and you can run the Rust server alongside your existing DB. The contract is
"the agent stays as ADK code; the journal lives somewhere built to survive."
The composition. Putting LangGraph on top of Tape is out of scope (Tape is wired to ADK's callbacks, not LangGraph's). Putting Tape on top of a LangGraph checkpointer is the wrong shape (LangGraph already commits at node boundaries; Tape would duplicate the cut). Where they meet honestly is the landscape claim the treatise's §XII makes: pick one runtime layer per agent, and prefer the one built for the boundaries you actually care about.
3. Tape vs Pydantic AI + DBOS (DBOSAgent)
Source for Pydantic AI + DBOS claims: https://pydantic.dev/articles/pydantic-ai-dbos.
Pydantic AI's DBOSAgent is the closest spiritual cousin Tape has. Both
inherit the §XII conclusion — put a durable engine underneath — and apply
it to a single agent framework. The differences are framework (Pydantic AI vs
ADK), engine (DBOS in-process Postgres library vs Tape's stand-alone Rust
server), and which §IX primitives are first-class.
| Concern | Pydantic AI + DBOS (DBOSAgent) | Tape (ADK) |
|---|---|---|
| Integration shape | DBOSAgent(agent) wraps Agent.run() / Agent.run_sync() as a @DBOS.workflow and model + MCP calls as DBOS steps. Two lines: from pydantic_ai.durable_exec.dbos import DBOSAgent; dbos_agent = DBOSAgent(agent) | Runner(..., plugins=[TapePlugin()], session_service=TapeSessionService(...)). Two lines, no agent rewrite |
| Durable engine | DBOS — an in-process Python library backed by Postgres. No separate server; the DB is the control plane | Tape — a stand-alone Rust server (gRPC) + pluggable RunStore (SQLite/Postgres/AlloyDB/Bigtable). Separate process; the agent talks to it |
| Workflow identity | DBOS workflow UUID; identity flows from the framework | ADK's invocation_id (one runner.run() call) + session_id; Tape's run_id keyed to (app_name, user_id, session_id, invocation_id) |
| Step identity (the recovery model) | Step ID by call order inside the workflow — the model the treatise §VI calls out as the DBOS pattern | seq per run, monotonic by call order within (run_id, kind). Same model (Tape acknowledges this in tape.md §6.5: "DBOS's step-id-by-call-order") |
| Decision journal (the LLM call) | Model calls are auto-wrapped as DBOS steps; the response is checkpointed and replayed from the DB on re-drive | before_model_callback short-circuits with the recorded LlmResponse; after_model_callback writes to tape_decisions. Same outcome (the decision is replayed, not re-sampled) |
| Effect journal (the tool call) | Tool invocations wrap as DBOS steps; results are checkpointed | before_tool_callback → BeginEffect(pending) commits before the body runs; after_tool_callback → CompleteEffect(confirmed). The intent-before-act split is explicit |
| Idempotency at the wire | The user supplies the idempotency key inside the tool body (no decision-keyed key is mentioned in the article) | Tape derives the key — run/decision-N/<tool>/<call_idx> — and hands it back via the plugin; the body passes it to the counterparty |
| The third outcome (unknown ack) | Not documented as first-class. A failed step retries (DBOS retries failed steps); reconciliation against the counterparty is user-built | EffectStatus.UNKNOWN + status_check reactor, first-class |
| Compensation / sagas | DBOS has step-level retries; the article doesn't show a compensation primitive. Compensation is user-built as a separate workflow path | @tape.effect(compensate=…) registers the inverse at commit time; tape.compensate_run walks LIFO |
| Human in the loop | Not covered in the article. Pattern would be a DBOS workflow that awaits a signal/event | tape.gate_tool("approval") — LongRunningFunctionTool + signal; the run holds in waiting across deploys |
| Sub-agent / fan-out | "Sub-agent runs as child workflows": DBOS.start_workflow_async(...) for fan-out/fan-in. End-to-end reliability across agents | Tape's child runs are sketched (planned in §13). Workaround: spawn a fresh begin_run and signal back |
| Durable queues | Yes — DBOS includes Postgres-backed queues with concurrency limits, rate limits, retries, prioritisation | No queue primitive — the run lease + the reactor pattern serve the recovery case; cross-run fan-out is via the WAL tail + sinks (PubSubSink, WebhookSink) |
| Budget as run state | Not documented in the article | tape.Budget — admit before / charge after, survives crashes |
| Coordination through journaled state | Not documented as a primitive | tape.set_value / watch_value — monotonically-versioned, CAS, watchers observe the transition |
| Observability | DBOS Conductor (web UI), Pydantic Logfire via OpenTelemetry, MCP servers for natural-language queries | SubscribeRun / SubscribeEvents as machine feeds; no UI. The journal is the queryable surface |
| Determinism | Article doesn't address it explicitly. DBOS, like Tape, relies on user discipline to keep workflows replay-safe | P11: documented, not enforced. Same shape |
| Operational footprint | DBOS as a library + Postgres. No new infra | Rust server + the chosen backend. New infra |
| Language scope | Python (Pydantic AI's home) | Python (reference SDK) + TS / Go / Java wired-client. The wire protocol means the agent's language is independent of the runtime's |
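The compensation row contrasts DBOS's user-built path with Tape's LIFO walk over obligations. A toy version of that walk, assuming (the text does not say) that a failing inverse is recorded as stuck and the walk carries on with older obligations rather than halting; Obligation, CompensationResult and this compensate_run are hypothetical stand-ins, not Tape's API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Obligation:
    effect_key: str
    compensate: Callable[[], None]   # the inverse registered when the effect committed


@dataclass
class CompensationResult:
    compensated: list[str] = field(default_factory=list)
    stuck: list[tuple[str, str]] = field(default_factory=list)


def compensate_run(obligations):
    """Walk obligations newest-first (LIFO). A failing inverse lands in stuck and
    is never reported as compensated; the walk continues with older obligations."""
    result = CompensationResult()
    for ob in reversed(obligations):
        try:
            ob.compensate()
        except Exception as exc:
            result.stuck.append((ob.effect_key, repr(exc)))
        else:
            result.compensated.append(ob.effect_key)
    return result
```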
The honest summary. Pydantic AI + DBOS and Tape are the same conclusion applied to two different agent frameworks: the runtime layer is something else, the agent stays as the framework's code, and the framework cedes durability to a purpose-built engine. The two diverge on three real axes:
- Engine deployment. DBOS is an in-process library on Postgres. Tape is a separate Rust server with multiple backends. DBOS is operationally simpler if you already run Postgres; Tape decouples the runtime from the agent's process and language at the cost of a new service.
- Which §IX primitives are first-class. DBOS gives you durable
workflows, steps, child workflows, queues, retries, and observability:
the floor primitives. Tape adds the agent-shaped ceiling:
unknown acks, decision-keyed idempotency, gates as suspend-until-signal, model-written compensation walked LIFO, budget as run state, and coordination through versioned shared state (the §IX list, by name).
- Framework scope. DBOSAgent is Pydantic AI only. Tape is ADK only. Picking one is mostly picking the agent framework.
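The last ceiling primitive in that list, coordination through versioned shared state, is the least familiar. A single-process toy of the semantics described for tape.set_value / watch_value: each write bumps a monotonic version, if_version turns it into a compare-and-set, a new watcher first gets a snapshot, and every later event carries the previous value so a subscriber sees X: 70 → 90 rather than just 90. ValueStore, ValueEvent and VersionConflict are hypothetical names; the real store is server-side and streams its events.

```python
from dataclasses import dataclass
from typing import Any, Optional


class VersionConflict(Exception):
    pass


@dataclass(frozen=True)
class ValueEvent:
    key: str
    version: int
    value: Any
    prev_version: Optional[int]   # None on the initial snapshot event
    prev_value: Any
    deleted: bool = False


class ValueStore:
    """Hypothetical single-namespace store with monotonic versions, CAS and watchers."""

    def __init__(self):
        self._data = {}       # key -> (version, value)
        self._watchers = {}   # key -> [callback]

    def get(self, key):
        return self._data.get(key, (0, None))

    def set(self, key, value, if_version=None):
        prev_version, prev_value = self.get(key)
        if if_version is not None and if_version != prev_version:
            raise VersionConflict(f"{key}: expected v{if_version}, found v{prev_version}")
        self._data[key] = (prev_version + 1, value)
        self._emit(ValueEvent(key, prev_version + 1, value, prev_version, prev_value))
        return prev_version + 1

    def delete(self, key):
        """Tombstone: the version still advances and watchers get one deleted event."""
        prev_version, prev_value = self.get(key)
        self._data[key] = (prev_version + 1, None)
        self._emit(ValueEvent(key, prev_version + 1, None, prev_version, prev_value, deleted=True))

    def watch(self, key, callback):
        version, value = self.get(key)
        if version:
            callback(ValueEvent(key, version, value, None, None))   # snapshot first
        self._watchers.setdefault(key, []).append(callback)

    def _emit(self, event):
        for callback in self._watchers.get(event.key, []):
            callback(event)
```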
Pick Pydantic AI + DBOS when you want Pydantic AI's typed-agent ergonomics
and DBOS's operationally-simple Postgres-backed runtime, and the §IX floor
(workflows, steps, queues, retries) is enough — you'll build the ceiling
primitives (compensation walked LIFO, unknown reconciliation, decision-keyed
idempotency, budget as state) yourself when you need them.
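For a sense of what building budget-as-state yourself involves, here is a minimal sketch of the admit-before / charge-after pattern the comparison tables attribute to tape.Budget. It assumes caps are checked against an estimate at admit time and actual spend is recorded at charge time; Budget, checkpoint and restore are hypothetical, and the JSON round-trip only stands in for whatever durable store keeps the counters across a crash.

```python
import json
from dataclasses import asdict, dataclass


class BudgetExceeded(Exception):
    pass


@dataclass
class Budget:
    """Hypothetical run budget; the spent counters are plain data so they can be journaled."""
    usd_cap: float
    token_cap: int
    spent_usd: float = 0.0
    spent_tokens: int = 0

    def admit(self, est_usd, est_tokens):
        """Before a model/tool call: refuse work whose estimate would cross a cap."""
        if self.spent_usd + est_usd > self.usd_cap:
            raise BudgetExceeded(f"usd: {self.spent_usd} + {est_usd} > {self.usd_cap}")
        if self.spent_tokens + est_tokens > self.token_cap:
            raise BudgetExceeded(f"tokens: {self.spent_tokens} + {est_tokens} > {self.token_cap}")

    def charge(self, usd, tokens):
        """After the call: record what was actually spent."""
        self.spent_usd += usd
        self.spent_tokens += tokens


def checkpoint(budget):
    """Serialise the counters, as a durable store would persist them."""
    return json.dumps(asdict(budget))


def restore(blob):
    """Rebuild the budget after a crash from the persisted counters."""
    return Budget(**json.loads(blob))
```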
Pick Tape when you're on ADK, you want the §IX ceiling primitives as table stakes, and you can run the Rust server alongside your DB. The runtime is independent of your agent's language; the journal is the audit; the recovery model is the same step-by-call-order DBOS uses, expressed through ADK's callbacks.
The composition. A RunStore backed by DBOS is not in the spec (Tape's
backends are SQLite/Postgres/AlloyDB/Bigtable — storage, not durable
execution). A Temporal-backed RunStore is (v2). The composition that makes
sense across all three is the §XII picture: one runtime layer per agent —
DBOS-under-Pydantic-AI, Tape-over-ADK, Temporal-under-Tape — chosen for the
boundaries the workload cares about.