Replay & resume

When the agent re-attaches to a run — because it crashed, because the pod got evicted, because the recovery reactor decided the lease was stale — what does "resume" actually mean?

The resume point

The resume point is the lowest seq for which the journal has no record yet.

Concretely:

The agent (re-)constructs an ADK Runner via the runner factory.
The TapePlugin re-attaches the same invocation_id. ADK's resume semantics kick in.
The plugin replays journalled decisions for seq < resume_point, so the agent doesn't re-ask the model for choices it already made.
The plugin also replays the recorded results of confirmed effects from history, so the tool body doesn't run again.
At seq == resume_point, the agent does what it would have done if it had never crashed.
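
As a rough illustration of how the resume point falls out of the journal, the sketch below models the journal as a plain mapping from seq to record. The `resume_point` function and the dict-based journal are hypothetical stand-ins for illustration; they are not Tape's storage model or API.

```python
def resume_point(journal: dict[int, object]) -> int:
    """Return the lowest seq (counting from 0) that has no record."""
    seq = 0
    while seq in journal:
        seq += 1
    return seq


# Hypothetical journal: seqs 0..2 were recorded before the crash.
journal = {0: "decision", 1: "effect", 2: "decision"}
point = resume_point(journal)
print(point)  # 3: everything below this seq is replayed, live work starts here
```

Everything below the returned seq is served from history; the record at that seq is the first one the agent produces live.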

What gets replayed

| Record at `seq < resume_point` | What replay does |
| --- | --- |
| Decision | Hand the recorded choice back to the model loop |
| Effect, `CONFIRMED` | Hand the recorded result back to the tool caller |
| Effect, `PENDING` | The reconciler is on it; agent waits |
| Effect, `UNKNOWN` | Same: wait for the reactor to resolve |
| Budget admit/charge | Restore spent counters |
| Gate, signalled | Hand the signal to the agent |
| Gate, awaiting | Suspend until signalled or timed out |

The replay is passive — it doesn't call any tool bodies, doesn't call the model. It rebuilds the agent's view from history. The first thing that runs is the work at seq == resume_point.
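
The table above can be read as a dispatch over record kinds. The following is a minimal sketch of that idea, assuming a simplified in-memory record shape; `Record`, `View`, and `replay` are hypothetical names invented for this example and do not reflect Tape's internals. Note that the loop only copies data out of history and never invokes a tool or the model.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Record:
    seq: int
    kind: str            # "decision", "effect", "budget", or "gate"
    status: str = ""     # e.g. "CONFIRMED", "PENDING", "UNKNOWN", "signalled", "awaiting"
    payload: Any = None


@dataclass
class View:
    """The agent's view after replay, rebuilt purely from history."""
    decisions: dict[int, Any] = field(default_factory=dict)
    results: dict[int, Any] = field(default_factory=dict)
    signals: dict[int, Any] = field(default_factory=dict)
    spent: float = 0.0
    waiting_on: list[int] = field(default_factory=list)


def replay(records: list[Record], resume_point: int) -> View:
    view = View()
    for rec in sorted(records, key=lambda r: r.seq):
        if rec.seq >= resume_point:
            break  # live work starts here; replay never executes it
        if rec.kind == "decision":
            view.decisions[rec.seq] = rec.payload    # recorded choice
        elif rec.kind == "effect" and rec.status == "CONFIRMED":
            view.results[rec.seq] = rec.payload      # recorded result
        elif rec.kind == "effect":
            view.waiting_on.append(rec.seq)          # PENDING or UNKNOWN: wait
        elif rec.kind == "budget":
            view.spent += rec.payload                # restore spent counters
        elif rec.kind == "gate" and rec.status == "signalled":
            view.signals[rec.seq] = rec.payload      # hand the signal over
        elif rec.kind == "gate":
            view.waiting_on.append(rec.seq)          # awaiting: stay suspended
    return view


records = [
    Record(0, "decision", payload="call_search"),
    Record(1, "effect", "CONFIRMED", payload={"hits": 3}),
    Record(2, "budget", payload=0.25),
    Record(3, "effect", "PENDING"),
]
view = replay(records, resume_point=4)
```

In this toy run the view ends up with one recorded decision, one recorded result, a restored spend of 0.25, and one effect the agent still has to wait on.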

Determinism is your job

For replay to be correct, your tool bodies and your agent's prompt construction must be deterministic given the journal.

Three rules:

No wall-clock reads, random draws, or fresh UUIDs outside Tape's journalled helpers. Wrap them:

import random

now = tape.now(tool_context)                # journalled once per run
uid = tape.uuid(tool_context)               # journalled once per run
r   = tape.sample(tool_context, lambda: random.random())  # any other nondeterministic value

No reads from mutable external state outside tape.sample or an @tape.effect. If the agent's prompt reads from a DB row, that read goes through tape.sample or is itself an effect.
Same code path on re-drive. If you bump the agent's model version between attempts, replay will diverge. Tape can't sandbox this, but it can warn you via tape.policy_is(tool_context, "...") so you branch correctly.
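
To make the first two rules concrete, here is one way a journalled sampling helper could work in principle. The first time a call site is reached, the value is computed and recorded; on a later attempt, the recorded value is returned and the function is not called. `SampleJournal` is a hypothetical toy class written for this example, not how `tape.sample` is actually implemented.

```python
import time
from typing import Any, Callable, Optional


class SampleJournal:
    """Toy record-or-replay log for nondeterministic values (illustrative only)."""

    def __init__(self, recorded: Optional[list] = None) -> None:
        self.recorded = list(recorded or [])
        self.cursor = 0

    def sample(self, fn: Callable[[], Any]) -> Any:
        if self.cursor < len(self.recorded):
            value = self.recorded[self.cursor]   # replay: fn is never called
        else:
            value = fn()                         # first run: compute and record
            self.recorded.append(value)
        self.cursor += 1
        return value


# First attempt: the wall clock is read once and the reading is recorded.
first = SampleJournal()
t1 = first.sample(time.time)

# A re-driven attempt starts from the recorded history and sees the same value.
second = SampleJournal(first.recorded)
t2 = second.sample(time.time)
```

The sketch only works if both attempts reach their sampling calls in the same order, which is why the third rule (same code path on re-drive) matters.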

What about model nondeterminism?

The model's output is the decision. The decision is journalled. On replay we hand the recorded decision back to the model loop — we don't ask the model again. So model temperature doesn't matter for replay.

The model's input — the prompt — needs to be the same shape on every call. ADK gives you tools, message history, and tool results in a deterministic order; if you build prompts yourself, route through tape.sample for anything that isn't already in the history.

Replay vs. rewind vs. fork

These are three different operations. Don't confuse them.

| Operation | What it does | Use it when |
| --- | --- | --- |
| Replay | Reconstruct an in-flight run from `seq=0` to the resume point. | Recovery after crash. |
| Rewind | Truncate the journal at `seq=N`, replay to that point, and continue. | "Run that again, but make this choice differently." |
| Fork | Copy a run's journal up to `seq=N` into a new `run_id`, continue independently. | "What would have happened if we'd done X?" |

Replay is automatic. Rewind and fork are explicit operator actions and should be rare — they're the closest thing Tape has to a git rebase.
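
The difference between rewind and fork is easiest to see on a toy journal. The sketch below models each journal as a list of dicts keyed by `run_id`; `rewind` and `fork` here are hypothetical helpers, not Tape's operator commands. It also assumes that the record at `seq=N` itself is dropped (rewind) or not copied (fork), so that the choice at N can be made afresh; the text above does not say whether N is inclusive.

```python
import copy
import uuid


def rewind(journal: list[dict], n: int) -> list[dict]:
    """Drop every record with seq >= n (assumption: N itself is dropped)."""
    return [rec for rec in journal if rec["seq"] < n]


def fork(journals: dict[str, list[dict]], run_id: str, n: int) -> str:
    """Copy records with seq < n into a new run_id; the source run is untouched."""
    new_run_id = str(uuid.uuid4())
    journals[new_run_id] = copy.deepcopy(
        [rec for rec in journals[run_id] if rec["seq"] < n]
    )
    return new_run_id


journals = {"run-a": [{"seq": i, "kind": "decision"} for i in range(5)]}

# Fork: a second, independent run that shares history below seq 2.
forked_id = fork(journals, "run-a", 2)

# Rewind: the same run, with its own history cut back to seq < 3.
journals["run-a"] = rewind(journals["run-a"], 3)
```

Fork leaves the original run's history intact and creates a new one; rewind discards history in place, which is why it deserves the same caution as rewriting shared git history.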

The redrive

tape.redrive(run_id) is "wake this run up, even though nothing changed in the world." It's the operator's poke-with-a-stick. Useful when:

A reactor wedged and you've fixed it.
A connector was misconfigured at first dispatch; you've updated it and want the run to retry on the new config.
You added a new signal handler in code and want existing waiting runs to re-evaluate.

Internally, redrive is just a timer of kind redrive that fires immediately, which the recovery reactor picks up the normal way.
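
A minimal sketch of the "timer that fires immediately" idea, assuming a simple in-memory timer heap that a reactor polls. `Timer`, `TimerQueue`, and this `redrive` function are hypothetical illustrations, not Tape's implementation; the `gate_timeout` kind is likewise made up for the example.

```python
import heapq
import time
from dataclasses import dataclass, field


@dataclass(order=True)
class Timer:
    fire_at: float
    run_id: str = field(compare=False)
    kind: str = field(compare=False)


class TimerQueue:
    def __init__(self) -> None:
        self._heap: list[Timer] = []

    def schedule(self, timer: Timer) -> None:
        heapq.heappush(self._heap, timer)

    def pop_due(self, now: float) -> list[Timer]:
        """Return every timer whose fire time has passed, earliest first."""
        due = []
        while self._heap and self._heap[0].fire_at <= now:
            due.append(heapq.heappop(self._heap))
        return due


def redrive(queue: TimerQueue, run_id: str, now: float) -> None:
    # "Fires immediately": the due time is now, so the next reactor pass sees it.
    queue.schedule(Timer(fire_at=now, run_id=run_id, kind="redrive"))


queue = TimerQueue()
now = time.time()
queue.schedule(Timer(fire_at=now + 3600, run_id="run-b", kind="gate_timeout"))
redrive(queue, "run-a", now)

due = queue.pop_due(now)  # what a reactor pass would pick up right away
```

Because the redrive is an ordinary timer, the reactor needs no special code path for it: it is handled like any other due timer.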

When replay can't help you

The journal is corrupted. (It shouldn't be; the server runs on a transactional store.)
Your code diverges between attempts in a way Tape didn't sandbox. (Real footgun. Watch policy_version on every decision.)
An effect that was CONFIRMED actually didn't commit and the upstream has since changed its mind. (This is a contract violation by the upstream. STUCK + human.)

In all three, the answer is the same: stop, look, decide. Tape doesn't pretend it can recover from inconsistent state.

Next

The journal — what gets recorded, structurally.
Reactors — the recovery reactor that triggers replay.
Reactive shared state — the other primitive for "things that change while the run is alive."