Observability
Tape emits structured logs (JSON) and OpenTelemetry spans. On GCP, logs go to Cloud Logging and spans go to Cloud Trace automatically once the standard service-account roles are granted.
Structured logs
Every server + reactor log line carries these fields (when known):
tenant_id app_name run_id invocation_id session_id
seq effect_key decision_index reactor lease_owner
tape.obs.log_json("msg", **fields) writes one line in that order. Cloud
Logging indexes the JSON keys; your queries look like:
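A minimal sketch of both halves, under stated assumptions. `log_json_sketch` is a hypothetical stand-in for `tape.obs.log_json` that only mirrors the documented field order, and the `message` key is a guess at how the message is stored. `logging_filter` is a hypothetical helper that assumes the fields land under `jsonPayload`, which is where Cloud Logging normally places parsed JSON log lines.

```python
import json

# Field order as listed above; fields outside this list are appended afterwards.
FIELD_ORDER = (
    "tenant_id", "app_name", "run_id", "invocation_id", "session_id",
    "seq", "effect_key", "decision_index", "reactor", "lease_owner",
)


def log_json_sketch(msg, **fields):
    """Hypothetical stand-in for tape.obs.log_json: returns one JSON line."""
    record = {"message": msg}  # assumption: the real key name may differ
    for key in FIELD_ORDER:
        if key in fields:
            record[key] = fields[key]  # known fields, in the documented order
    for key, value in fields.items():
        record.setdefault(key, value)  # extra fields keep their call order
    return json.dumps(record)


def logging_filter(**fields):
    """Build a Cloud Logging filter matching each JSON key exactly."""
    clauses = [
        f"jsonPayload.{key}={json.dumps(value)}"  # json.dumps quotes strings
        for key, value in fields.items()
    ]
    return " AND ".join(clauses)


if __name__ == "__main__":
    print(log_json_sketch("effect completed", run_id="r-123", tenant_id="acme"))
    print(logging_filter(tenant_id="acme", run_id="r-123"))
```

The second call yields a filter of the form `jsonPayload.tenant_id="acme" AND jsonPayload.run_id="r-123"`, which can be pasted into the Logs Explorer if the `jsonPayload` assumption holds for your deployment.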
OpenTelemetry spans
The SDK opens spans on the boundaries you care about:
tape.begin_run tape.resume_run
tape.record_decision
tape.begin_effect tape.complete_effect
tape.reconcile_effect tape.dispatch_effect
tape.compensate tape.redrive
tape.await_signal tape.send_signal
Bring your own exporter, or call tape.obs.configure_cloud_trace_exporter()
at process start. The exporter is lazy: if the OTel SDK is not installed, no
spans are emitted and nothing crashes.
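The "bring your own exporter" path might look like the sketch below. It uses the standard OpenTelemetry Python SDK and follows the same lazy pattern: if the SDK is missing, it does nothing. `configure_tracing` is a hypothetical helper name, `ConsoleSpanExporter` is only a placeholder for whichever exporter you prefer, and it is an assumption (not confirmed here) that Tape's spans go through the globally registered tracer provider.

```python
def configure_tracing() -> bool:
    """Register a tracer provider if the OTel SDK is installed.

    Returns True when tracing was configured, False when the SDK is absent.
    """
    try:
        from opentelemetry import trace
        from opentelemetry.sdk.trace import TracerProvider
        from opentelemetry.sdk.trace.export import (
            BatchSpanProcessor,
            ConsoleSpanExporter,
        )
    except ImportError:
        # No OTel SDK installed: no spans, no crash.
        return False

    provider = TracerProvider()
    # Placeholder exporter; swap in the exporter for your tracing backend.
    provider.add_span_processor(BatchSpanProcessor(ConsoleSpanExporter()))
    trace.set_tracer_provider(provider)
    return True


if __name__ == "__main__":
    # Call once at process start, before any runs begin.
    print("tracing configured:", configure_tracing())
```

Calling this once at process start keeps the import cost and the failure mode in one place; the rest of the application never needs to know whether the SDK is present.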
Dashboard + log-based metrics
The observability Terraform module provisions:
- tape/runs/running
- tape/runs/stuck
- tape/effects/unknown
- tape/obligations/unresolved
- tape/reactor/lag_ms
…and a Cloud Monitoring dashboard wiring all five. Customize the dashboard
JSON in
tape/deploy/gcp/terraform/modules/observability/dashboard.json.
What to alert on
- tape/runs/stuck > 0 for > 5m: escalate.
- tape/effects/unknown non-zero and rising for > 10m: the reconciler is failing to resolve, the upstream is degraded, or a status check is missing.
- tape/obligations/unresolved > 0 for > 15m: compensation is failing.
- tape/reactor/lag_ms > 60_000: a reactor is starved; check pod CPU.
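To make the four conditions above concrete, here is a small local sketch of their semantics. It is not how Cloud Monitoring evaluates alert policies, and every name in it (`held_above`, `rising_nonzero`, `firing_alerts`, the sample format) is hypothetical. "Rising" is read here as a trailing run of non-zero samples that never decreases and ends higher than it started, which is one possible interpretation.

```python
from typing import Dict, List, Tuple

Sample = Tuple[float, float]  # (timestamp in seconds, metric value)


def held_above(samples: List[Sample], threshold: float, min_seconds: float) -> bool:
    """True if the trailing run of samples above `threshold` spans more than `min_seconds`."""
    start = None
    for ts, value in reversed(samples):
        if value <= threshold:
            break
        start = ts
    return start is not None and samples[-1][0] - start > min_seconds


def rising_nonzero(samples: List[Sample], min_seconds: float) -> bool:
    """True if the trailing non-zero, non-decreasing run spans more than `min_seconds` and ends higher than it began."""
    start_ts = start_val = later = None
    for ts, value in reversed(samples):
        # Walking backwards: an earlier value above a later one means a drop.
        if value <= 0 or (later is not None and value > later):
            break
        start_ts, start_val, later = ts, value, value
    if start_ts is None:
        return False
    last_ts, last_val = samples[-1]
    return last_ts - start_ts > min_seconds and last_val > start_val


def firing_alerts(metrics: Dict[str, List[Sample]]) -> List[str]:
    """Return the metric names whose alert condition currently holds."""
    firing = []
    if held_above(metrics.get("tape/runs/stuck", []), 0, 5 * 60):
        firing.append("tape/runs/stuck")
    if rising_nonzero(metrics.get("tape/effects/unknown", []), 10 * 60):
        firing.append("tape/effects/unknown")
    if held_above(metrics.get("tape/obligations/unresolved", []), 0, 15 * 60):
        firing.append("tape/obligations/unresolved")
    lag = metrics.get("tape/reactor/lag_ms", [])
    if lag and lag[-1][1] > 60_000:  # no duration given, so check the latest sample
        firing.append("tape/reactor/lag_ms")
    return firing
```

In a real deployment these rules would more likely live in Cloud Monitoring alert policies; the sketch is only meant to pin down what "for > 5m" and "rising" mean when reading the dashboard.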
See also
- Observability (Python reference): the tape.obs API for emitting your own structured logs + spans.
- Reactors: what each reactor logs and why lag_ms matters.
- Troubleshooting: what to do when one of these alerts fires.