Safety & observability

The production perimeter: what the agent outputs, what it costs, what it logs, and how you prove it still works a week from now.

    flowchart LR
    I[input] --> MW[Middleware<br/>M.retry, M.cost,<br/>M.trace, M.timeout]
    MW --> A[agent]
    A --> G[Guards<br/>G.pii, G.toxicity,<br/>G.length, G.schema]
    G --> O[output]
    A -.telemetry.-> OBS[(logs, traces,<br/>metrics)]
    A -.regressions.-> EV[Evaluation<br/>E.case, .eval_suite]

    classDef gate fill:#ffebee,stroke:#c62828,color:#b71c1c
    classDef obs fill:#e8f5e9,stroke:#2e7d32,color:#1b5e20
    class MW,G gate
    class OBS,EV obs

Chapters

Chapter	Use it for
Middleware	Cross-cutting concerns that wrap every agent call: retry, timeout, logging, cost, circuit-breaker.
Guards	Output validation that must run before the response leaves the agent — PII, toxicity, max-length, schema.
Evaluation	LLM-as-judge scoring, criterion scoring, eval suites for regression testing.
Testing	Deterministic substitutes: `.mock()` replaces the LLM; `.test()` asserts inline; both are CI-safe.
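To make the Evaluation and Testing rows concrete, here is a minimal, library-independent sketch of a deterministic stand-in for the LLM driving a small regression suite. The names `Case`, `make_mock`, and `run_suite` are hypothetical and exist only for illustration; they are not the library's `.mock()`, `.test()`, `E.case`, or `.eval_suite` APIs, whose signatures are covered in the chapters themselves.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Case:
    """One regression case: a prompt and a substring the answer must contain."""
    prompt: str
    must_contain: str


def make_mock(canned: dict[str, str]) -> Callable[[str], str]:
    """Return a deterministic stand-in for the LLM (hypothetical helper).

    The same prompt always yields the same answer, so the suite is CI-safe.
    Unknown prompts yield an empty string.
    """
    def mock_llm(prompt: str) -> str:
        return canned.get(prompt, "")
    return mock_llm


def run_suite(agent: Callable[[str], str], cases: list[Case]) -> list[Case]:
    """Run every case and return the ones that failed (empty means no regressions)."""
    return [c for c in cases if c.must_contain not in agent(c.prompt)]


mock = make_mock({
    "What is 2 + 2?": "The answer is 4.",
    "Capital of France?": "Paris.",
})
cases = [
    Case("What is 2 + 2?", "4"),
    Case("Capital of France?", "Paris"),
]
failures = run_suite(mock, cases)
```

Because the stand-in never calls a model, the suite returns the same result on every run, which is the property that makes such checks safe to gate a CI pipeline on.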

Pair this tier with Patterns & control flow — production deployments almost always combine the two.

Tip

Middleware vs guards: two different jobs

Middleware (M.*) wraps the call: retries on network blips, cost accounting, latency metrics. Guards (G.*) validate the response: “does this output contain PII?”, “is it under 500 chars?”. Don’t put retry logic in a guard, and don’t put PII detection in middleware; they run in different lifecycle phases.
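The split between the two phases can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: `with_retry`, `guard_no_email`, `guard_max_length`, `run`, and `GuardViolation` are hypothetical names standing in for the roles that `M.*` and `G.*` play, and the email regex is a deliberately crude proxy for real PII detection.

```python
import re
from typing import Callable

# Hypothetical stand-ins; these are NOT the library's M.* / G.* APIs.
Agent = Callable[[str], str]
Guard = Callable[[str], str]


class GuardViolation(Exception):
    """Raised when a response fails output validation."""


def with_retry(agent: Agent, attempts: int = 3) -> Agent:
    """Middleware role: wrap the call and retry on transient connection errors."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    def wrapped(prompt: str) -> str:
        for attempt in range(1, attempts + 1):
            try:
                return agent(prompt)
            except ConnectionError:
                if attempt == attempts:
                    raise  # out of retries: surface the original error
        raise AssertionError("unreachable")

    return wrapped


EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def guard_no_email(text: str) -> str:
    """Guard role: reject responses containing an email address (crude PII proxy)."""
    if EMAIL.search(text):
        raise GuardViolation("response contains an email address")
    return text


def guard_max_length(limit: int) -> Guard:
    """Guard role: reject responses longer than `limit` characters."""
    def check(text: str) -> str:
        if len(text) > limit:
            raise GuardViolation(f"response exceeds {limit} characters")
        return text
    return check


def run(agent: Agent, prompt: str, guards: list[Guard]) -> str:
    """Call the (middleware-wrapped) agent, then validate its response."""
    response = agent(prompt)   # phase 1: the call, retried by middleware if needed
    for guard in guards:       # phase 2: the response, checked before it leaves
        response = guard(response)
    return response


calls = {"n": 0}


def flaky_agent(prompt: str) -> str:
    """Toy agent that fails once with a network error, then succeeds."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise ConnectionError("network blip")
    return f"Echo: {prompt}"


safe_agent = with_retry(flaky_agent)
result = run(safe_agent, "hello", [guard_no_email, guard_max_length(500)])
```

In this sketch the retry wrapper never inspects the response text and the guards never see a network error, which is the separation the tip describes: a guard that fails signals a bad output, not a failed call, so retrying the transport would not help.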